Skip to content

feat/tdp alert pull workflow#267

Merged
xiami762 merged 48 commits into
devfrom
feat/tdp-alert-pull-workflow
May 14, 2026
Merged

feat/tdp alert pull workflow#267
xiami762 merged 48 commits into
devfrom
feat/tdp-alert-pull-workflow

Conversation

@duguwanglong
Copy link
Copy Markdown
Contributor

1. Alert dedup / triage workflow family

  • alert_dedupnetwork_alert_deduphttp_alert_dedup progression: full 4-stage LogProcessPipeline ported from aisoc_mini, then trimmed to a dedup-only variant with explicit branch nodes for log-type and filter routing.
  • MinHash LSH dedup replaces brute-force Jaccard in dedup_logs, with cross-platform on-disk LSH state, file-locking, and a clearer stats panel.
  • stream_alert_dedup workflow + ISO-3164 syslog parsing for real-time streaming input.
  • tdp_alert_pull_dedup workflow for polling TDP API in a loop with built-in dedup.
  • alert_dedup_triage combines dedup with triage; supports both syslog real-time mode (no summary, low latency) and batch mode (full report). Sub-workflow calls are now embedded via the engine, removing the previous HTTP-service dependency.
  • tdp_alert_triage rewritten as the canonical "NDR/TDP alert investigation" workflow (replaces the older HTTP-log-only variant); supports mixed TDP/Skyeye batches and flat-format TDP triage; final generate_summary no longer prints the duplicated (cached) label.
  • Legacy / superseded workflows removed: alert_dedup_triage, http_alert_dedup, stream_alert_dedup, tdp_alert_pull_dedup.

2. New syslog ingestion subsystem (flocks/ingest/syslog/)

  • UDP + TCP listeners with format auto-detection (RFC3164 / RFC5424 / auto).
  • Per-workflow lifecycle manager with bounded queue (200) + per-workflow semaphore (8) for backpressure under syslog floods.
  • Synchronous on_msg callback so the UDP protocol layer doesn't spawn an asyncio.Task per datagram (preserves backpressure).
  • Best-effort drain + cancel on shutdown so in-flight dispatches finish their Storage writes.
  • Renamed module path flocks/syslog → flocks/ingest/syslog to make room for other ingestion sources.
  • Server lifespan delays syslog listener startup by 3 s so the main startup path can finish before the first message flood (breaks the crash-restart loop).

3. Workflow execution store

  • Persist execution records + stats for syslog-triggered runs (previously the syslog path bypassed history).
  • Per-workflow stats (callCount / successCount / errorCount / avgRuntime) now serialized with an asyncio.Lock so concurrent completions no longer lose increments.
  • Recorder.record_workflow_execution and history-trim run as background tasks — the syslog dispatcher releases its concurrency slot immediately.
  • Execution history is auto-trimmed to 500 records / workflow, with the O(N) scan amortised every 50 completions.
  • /api/workflow/{id}/history endpoint switched from N+1 reads to a single Storage.list_entries query (response time dropped from ~180 s to sub-second on large stores).
  • Step-progress and finish/error paths now tolerate a missing/non-dict execution record (no more NoneType.update crashes when trim races with a long-running step).
  • _active_workflow_executions is cleaned up via task.add_done_callback in addition to finally, eliminating registry leaks on cancellation paths.

4. WebUI

  • New IntegrationTab for managing syslog config (host / port / protocol / format / input key) and showing live status.
  • WorkflowDetail right panel adds run / integration / runs tabs; RunTab refactored and RunTab.test.tsx added.
  • api/workflow.ts exposes syslog + integration endpoints; workflow.json locales (en/zh) added.

5. run_workflow tool hardening

  • Handles JSON-encoded string paths, dict-typed JSON input, raw alert_file mode, and split type branches with clear per-type errors.
  • Cross-platform file-lock with cleaner stats output.
  • Re-applies workspace path for LSH state (regression from earlier dev merge).
  • Workflow ID lookup added.

6. Plugin / agent file watchers — stop self-sustaining reload loop

  • watchdog.on_any_event was firing for opened / closed / closed_no_write events; every reload re-opens YAML/Python files and re-triggered the watcher every 2-3 s, keeping ToolRegistry and the Agent cache thrashing.
  • Tool / agent / skill watchers now react only to modified/created/deleted/moved and ignore __pycache__ and dotfile noise.

7. Startup hygiene

  • Safe-access guard so missing config.memory doesn't crash hook registration.
  • Storage.get / list_entries tolerate non-Pydantic types (fixes legacy session migration crashing on dict.model_validate_json).
  • Provider auto-register: opencode entry now references the real OpenCodeProvider (was the missing FlocksCompatProvider alias).
  • Fix command_loader flocks_global_dir NameError typo.
  • Demote benign mcp.already_initialized warn to debug.

8. Log noise reduction

  • plugin.tool.duplicate only warns on genuine cross-source collisions; idempotent re-scans of the same source are silent.
  • agent.toolset.tool_missing demoted to debug for built-in agents declaring optional lsp_* / ast_grep_search tools.

9. Windows installer

  • python-build-standalone runtime bundled in the installer staging directory.
  • Installer shortcuts now require elevation.

10. Channel message tool

  • Attach Bearer API token on local HTTP send.
  • Reuse API_TOKEN_SECRET_ID and clarify the fallback path.

duguwanglong and others added 30 commits May 8, 2026 16:00
基于 aisoc_mini 的告警去重算法(URI 归一化 + 5-gram Shingling + Jaccard
相似度)移植为 flocks 工作流,包含 5 个线性节点:接收解析 → URI 归一化
→ 去重键计算 → 分组去重 → 报告生成。支持严格字段精确匹配与 LSH 字段近似
匹配,输出唯一告警、重复告警及 Markdown 分析报告。

Co-authored-by: Cursor <cursoragent@cursor.com>
…stages)

重写工作流以完整对齐 aisoc_mini 的 LogProcessPipeline 四阶段主流程:
- normalize_logs: TDP/Skyeye 字段映射 + 嵌套结构扁平化
- filter_logs: jsonLogic 规则过滤(扫描类/出站/非HTTP告警剔除)
- dedup_logs: URI 归一化 + 5-gram Jaccard 相似度去重,生成 dedup_key
- analyze_unique: 仅对唯一 dedup_key 调用 LLM is_attack 研判并回填重复告警
- generate_report: 输出四阶段统计 Markdown 报告及 JSONL 数据文件

Co-authored-by: Cursor <cursoragent@cursor.com>
…rallelize LLM analysis

修复审阅中发现的功能性偏差:

1. filter_logs: 完整对齐 LogFilter._get_tdp_process_type() 9 种 process_type 分类
   - 修正:HTTP 非扫描告警无论方向(in/out/lateral)都需研判(之前错误地只保留
     direction=in/none,会丢失大量本应分析的出站/横向 HTTP 告警)
   - HTTP 协议判断兼容 application_layer_protocol/net_type/net_app_proto 多字段
   - threat_type 取值与原版一致:tdp 取 threat_name,skyeye 取 threat_type
   - 新增 _process_type、_need_analysis_attack_status 字段
   - 统计中加入 filter_process_type_counts

2. analyze_unique: ThreadPoolExecutor 并行调用 LLM(与 LogAnalysis.process_parallel
   一致,可通过 analyze_max_workers 配置),单条失败不影响其他

3. dedup_logs: 移除函数体内重复的 import hashlib as _hl

4. 各节点新增轻量 print 进度日志,便于大批量调试

5. workflow.md 同步修正过滤逻辑描述

Co-authored-by: Cursor <cursoragent@cursor.com>
…result routing

用显式 branch 节点替换代码内部的 if-else 路由,拓扑结构变为:

  receive_alerts
      → branch_log_type (select_key: source_log_type)
          label:"tdp"    → normalize_tdp    → filter_logs
          label:"skyeye" → normalize_skyeye → filter_logs
      → branch_has_alerts (select_key: _has_alerts)
          label:"true"  → dedup_logs → analyze_unique → generate_report
          label:"false" → generate_empty_report(无 LLM 调用的快速终点)

修复:
  - branch_log_type 两条边均使用显式 label ("tdp"/"skyeye"),
    使 lint 正确识别为互斥路径,消除 multi_incoming_no_join 报错
  - false 分支独立终结于 generate_empty_report,
    避免 generate_report 多入边 lint 错误
  - 测试验证:TDP/Skyeye 字段映射、has_alerts true/false 四条路径均正确路由

Co-authored-by: Cursor <cursoragent@cursor.com>
…y to dedup-only

- 目录重命名 alert_dedup → network_alert_dedup
- workflow.json / meta.json name & id 改为 network_alert_dedup
- 移除 analyze_unique / generate_report / generate_empty_report / branch_has_alerts 四个节点
- 移除 LLM 研判相关参数(analyze_enabled / analyze_max_workers)
- dedup_logs 成为终点节点,直接输出 dict:
    deduped_alerts(全量含 dedup_key)/ unique_alerts(唯一簇)/ stats / dedup_summary
- filter_logs 清理预填空结果逻辑,精简 outputs 传递
- workflow.md 更新为三阶段流程说明(归一化→过滤→去重)
- lint 无警告,模型验证通过

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…ert_dedup

- filter_logs 新增 _has_alerts 输出
- 插入 branch_has_alerts 分支节点(select_key: _has_alerts)
- true 路径 → dedup_logs(执行去重,终点)
- false 路径 → dedup_empty(无告警,直接返回空 dict,终点)
- 更新 workflow.md 流程图及节点说明
- lint 无警告,模型验证通过(8 节点 / 8 边)

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…p_alert_dedup

filter_logs 直接连接 dedup_logs,dedup_logs 自然处理空列表输入

Co-authored-by: Cursor <cursoragent@cursor.com>
…_logs

使用 datasketch MinHash + MinHashLSH 替换原有 O(n²) 暴力 Jaccard:
- NUM_PERM=128, MINHASH_SEED=2024(对齐 lsh_processor.py)
- 共享 permutations,对齐 LSHProcessor 初始化方式
- query_most_similar:LSH 快速候选 → 精确 Jaccard 取最相似(对齐原版逻辑)
- normalize_uri 对齐 utils.py(DATETIME/UUID/TRAVERSAL/NULL_REPLACED/HEXADECIMAL)
- dedup_key = MD5(strict_text + '. ' + lsh_key),对齐 LogDedup._generate_dedup_key_text
- 实测 453 条 TDP 过滤后告警 → 360 个唯一簇,压缩率 20.5%

Co-authored-by: Cursor <cursoragent@cursor.com>
…dup_logs

- All Chinese comments replaced with English
- cluster_id (int) written to alert['_lsh_cluster_id'] so callers can inspect LSH cluster membership
- lsh_cache now stores MinHash directly (not wrapped in dict) for cleaner re-ranking
- summary string switched to English

Co-authored-by: Cursor <cursoragent@cursor.com>
LSH state (lsh_index + lsh_cache) is now saved to and loaded from:
  ~/.flocks/data/workflows/http_alert_dedup/lsh_state_np128_th<threshold>.pkl

- get_lsh_state_path(): builds path via Config().get_data_path()
- load_lsh_state(): loads pickle, validates num_perm/threshold params match
- dump_lsh_state(): saves after each run; mirrors LogDedup.dump()
- Filename encodes NUM_PERM and threshold so param changes auto-reset state
- stats gains lsh_total_clusters and lsh_state_path fields
- Verified: run2 correctly loads run1 clusters and accumulates new ones

Co-authored-by: Cursor <cursoragent@cursor.com>
Four correctness fixes plus the datasketch dependency:

1. Persist dedup_key_cache (the set of MD5 keys ever seen) into the same pkl
   so that 'dedup_key_already_exists' is correct across restarts and batches.
   Previously it was a per-run local set, breaking cross-batch dedup detection.

2. Atomic write: pickle to <state>.tmp, fsync, then os.replace() over the
   target. A crash mid-write no longer corrupts the persisted state.

3. fcntl.flock(LOCK_EX) on a sibling .lock file serializes load+modify+dump
   across concurrent workflow runs, eliminating the read-modify-write race
   that previously caused lost updates.

4. dedup_enabled=False branch now sets _lsh_cluster_id=None so that downstream
   consumers see a consistent schema regardless of dedup mode. In-batch
   duplicate detection still works in this branch.

Also: warn when persisted cluster count exceeds 100k so operators can rotate
state. New stat: lsh_total_dedup_keys.

Verified end-to-end:
  - cross-batch: same alert in run2 reports already_exists=True
  - corruption:  garbage pkl is auto-discarded, no crash
  - concurrent:  5 parallel processes writing 10 alerts each end up with
                 50 persisted clusters (not lost to read-modify-write)
  - disabled:    _lsh_cluster_id=None, in-batch duplicate detected

Co-authored-by: Cursor <cursoragent@cursor.com>
Cross-platform support:
- Replace bare 'import fcntl' with platform detection.
- POSIX path keeps fcntl.flock(LOCK_EX).
- Windows path uses msvcrt.locking(LK_LOCK, 1) on a single byte of the lock
  file, looping on OSError to wait beyond LK_LOCK's built-in 10-second cap.
- Both branches release the lock symmetrically in release_lock().

Review fixes:
- 'lsh_state_path', 'lsh_total_clusters', 'lsh_total_dedup_keys' are no longer
  written to stats when dedup_enabled=False (was misleading; no file is touched
  in that mode). Added 'dedup_state_persisted' bool for callers to branch on.
- Cluster-overflow warning now also checks dedup_key_cache, since pkl size is
  dominated by per-key entries, not per-cluster.
- normalize_uri's UUID regex now uses re.IGNORECASE so uppercase UUIDs map to
  the same 'UUID' placeholder (and thus the same LSH cluster) as lowercase ones.
- Disabled-mode summary clearly states 'no state persisted, in-batch only'.

Verified: cross-batch, disabled-mode stats, uppercase UUID dedup, concurrent writers.
Co-authored-by: Cursor <cursoragent@cursor.com>
…; add alert_file support

run_workflow.py:
- When workflow is a JSON-decoded non-dict (e.g. the AI wraps the path in extra
  quotes producing '\"..path..\"'), treat the decoded string as a file path instead
  of crashing with 'str object has no attribute name'.
- Add upfront dict sanity-check: if the dict has no 'start' key it's almost certainly
  the inputs dict passed to the wrong parameter; return a clear error instead of a
  Pydantic validation traceback.
- Add a comment clarifying that the else-branch delivers a Path, not a str.

workflow.py:
- Add sync_workflows_from_filesystem() which app.py imports at startup.
  Triggers the one-time storage→filesystem migration then returns the count of
  discovered workflows so the startup log entry is informative.

http_alert_dedup/workflow.json:
- receive_alerts now accepts alert_file (absolute or ~-prefixed path to a JSON file)
  as an alternative to inlining the alerts list directly in inputs.
  This removes the need to manually load large log files via bash before running the
  workflow (the main friction observed in session 询问可用工作流).
- sampleInputs annotated with a _comment_alert_file hint.

Co-authored-by: Cursor <cursoragent@cursor.com>
…type errors

Previous fix used 'parsed if isinstance(parsed, str) else raw' which silently
passed list/int/bool results to Path(), giving a misleading 'file not found' error.

Now three explicit branches:
- parsed is str  → double-encoded path; use decoded value as file path
- parsed is None → JSONDecodeError; use raw string as file path (original behaviour)
- parsed is other → clear error stating the unexpected JSON type

Co-authored-by: Cursor <cursoragent@cursor.com>
…ID lookup

Resolve the merge conflict in the `workflow` string parsing block by
combining both sides:

- HEAD: explicit type dispatch on json.loads() result (dict / str / None /
  other) with clear per-type error messages.
- Upstream (d8b44e5): on JSONDecodeError, try read_workflow_from_fs() first
  so callers can pass a plain workflow ID instead of a full JSON string.

Merged result: keep the type-dispatch structure from HEAD; in the
`parsed is None` (JSONDecodeError) branch, attempt workflow-ID resolution
via read_workflow_from_fs() before falling back to a file path.

Co-authored-by: Cursor <cursoragent@cursor.com>
The get_state_paths() change to ~/.flocks/workspace/workflows/ was lost
when merging dev into feat/alert-dedup-workflow (da58f7b). Re-apply:

- Remove top-level `from flocks.config import Config` import
- Replace Config().get_data_path() with
  Config().get_global().data_dir.parent / 'workspace' / 'workflows'
- Update node description accordingly

Also sync the running release snapshot and restart the service process
so the /invoke API immediately uses the new path.

Co-authored-by: Cursor <cursoragent@cursor.com>
…calls

workflow_center_invoke (POST /workflow-center/{id}/invoke) now records
execution stats via create_execution_record + _update_workflow_stats +
_record_execution_result, so that UI callCount / successCount / errorCount
counters are updated for every /workflow-center call — not only for
agent-driven /workflow/{id}/run calls.

Co-authored-by: Cursor <cursoragent@cursor.com>
New workflow:
- Add alert_dedup_triage workflow: chains http_alert_dedup → tdp_alert_triage
  in a single pipeline; per-alert dedup via MinHash LSH, then LLM triage for
  first-seen unique alerts; duplicate alerts are annotated with cached triage
  results from a persisted triage_cache.pkl (FIFO LRU, max_dedup_keys cap).

http_alert_dedup improvements:
- Upgrade dedup_key_cache from set to ordered dict for FIFO LRU eviction.
- Add max_dedup_keys input param (default 100 000); oldest entries are evicted
  before each persist so the state file stays bounded.
- Use a monotonic cluster_id counter (_cid_box) instead of len(lsh_cache) so
  IDs never collide after eviction; remove evicted cluster_ids from the
  MinHashLSH index to prevent stale query results.
- Expose lsh_max_dedup_keys / lsh_evicted_keys / lsh_evicted_clusters in stats.
- Forward max_dedup_keys through normalize_* and filter_logs nodes.

tdp_alert_triage:
- Remove web-log detection branch; workflow now accepts HTTP alerts directly.
- Add _strip_think() to all LLM nodes to strip <think>…</think> blocks.
- Update workflow.md to reflect simplified node structure.

Test tools (moved from scripts/ → tests/integration/):
- test_http_alert_dedup_stream.py: streaming simulation for http_alert_dedup.
- test_alert_dedup_triage_stream.py: end-to-end dedup → triage pipeline test.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add flocks/syslog package (constants, parser, listener, manager) to
  receive UDP/TCP syslog messages and trigger workflow execution via the
  Channel-GatewayManager pattern
- Expose POST/GET /workflow/{id}/syslog-config API endpoints; start/stop
  all enabled listeners in the FastAPI lifespan
- Add SyslogConfig TS interface and saveSyslogConfig/getSyslogConfig API
  methods to the frontend workflow client
- Extract PublishSection, KafkaSection, SyslogSection from RunTab into a
  new IntegrationTab component; wire it as a fourth "Integration" tab in
  RightPanel so the Run tab only shows test-run and execution history
- Add tabIntegration i18n keys (zh-CN / en-US) and syslogActive badge
  shown when listener is enabled and collapsed

Co-authored-by: Cursor <cursoragent@cursor.com>
Group data-ingestion triggers under a shared flocks/ingest/ namespace so
that future connectors (kafka, webhook, …) live alongside syslog as
sibling sub-packages rather than top-level packages.

- Move flocks/syslog/* → flocks/ingest/syslog/*
- Add flocks/ingest/__init__.py (empty package marker)
- Update all internal imports (flocks.syslog → flocks.ingest.syslog)
  in listener.py, manager.py, __init__.py, server/app.py and
  server/routes/workflow.py

Co-authored-by: Cursor <cursoragent@cursor.com>
receive_alerts node now handles three input sources in priority order:
1. syslog_message (injected by flocks syslog listener, RFC3164/5424)
   - TDP alert JSON parsed from syslog_message.message
   - Syslog metadata (hostname, severity, timestamp…) attached to alert
     under _syslog_meta for traceability; does not affect dedup/triage logic
2. alerts (batch list, existing)
3. alert_file (local JSON path, existing)

Also adds syslog-config API documentation and example to workflow.md and
syslog_message sample to metadata.sampleInputs.

Co-authored-by: Cursor <cursoragent@cursor.com>
…move HTTP service dependency

Replace direct HTTP POST calls to published service ports (19000/19001)
with in-process embedded invocations using flocks.workflow.runner.run_workflow
and flocks.workflow.fs_store.workflow_scan_dirs.

- _invoke_workflow(): locates sub-workflow.json via workflow_scan_dirs(),
  then calls run_workflow() directly in the same process
- Removes urllib/HTTP helpers, dedup_service_url, triage_service_url inputs
- Adds dedup_workflow_id / triage_workflow_id inputs (default: http_alert_dedup,
  tdp_alert_triage) so callers can override the sub-workflow IDs if needed
- http_alert_dedup and tdp_alert_triage no longer need to be published as
  running services for alert_dedup_triage to function

Co-authored-by: Cursor <cursoragent@cursor.com>
…nglish

- Node description fields: translated to English
- generate_summary: verdict_label / stage_label dicts and summary_md
  template converted from Chinese to English
- Code comments in receive_alerts and dedup_and_triage were already English
- description_cn field and LLM prompt content in tdp_alert_triage left as-is
  (intentionally Chinese for LLM interaction)

Co-authored-by: Cursor <cursoragent@cursor.com>
…able

Co-authored-by: Cursor <cursoragent@cursor.com>
…report in batch mode

Add branch_output_mode node after dedup_and_triage:
- input_mode == 'syslog' -> direct_output: returns single-alert triage
  result directly (one-liner summary_report, no table, no file write)
- input_mode == 'alerts' | 'alert_file' -> generate_summary: full
  statistics table + top-risk report written to pipeline_summary.md

Both paths emit identical output field names for downstream compatibility.

Co-authored-by: Cursor <cursoragent@cursor.com>
…iggered runs

Syslog-triggered workflow runs were invisible in the WebUI history panel and
not counted in the workflow stats card because `_trigger_workflow` called the
runner directly, bypassing both the execution record write and the call counter
update. Run records were never written, and `callCount` only ever reflected
HTTP-triggered runs.

This change makes the syslog path go through the same persistence helpers as
the HTTP API path so all trigger sources are uniformly visible in the UI:

- syslog/manager: wrap each run with create_execution_record /
  record_execution_result; write the same fields the WebUI consumes
  (`outputResults`, `errorMessage`, `duration`, `executionLog`, `currentNodeId`,
  `currentPhase`, `currentStepIndex`); tag input params with `_trigger=syslog`
  for source identification.
- workflow/execution_store: move the workflow stats counter update into
  `record_execution_result` so every persisted result automatically increments
  callCount/successCount/errorCount/totalRuntime/avgRuntime, regardless of
  trigger source.
- server/routes/workflow: remove the now-redundant `_update_workflow_stats`
  calls from the HTTP run/invoke paths and drop the orphaned helper to avoid
  double counting.

Co-authored-by: Cursor <cursoragent@cursor.com>
… triage

http_alert_dedup:
- Replace branch_log_type + normalize_tdp + normalize_skyeye nodes with a
  single unified normalize node; each alert is individually classified via
  field signatures (nested net dict / behave_uuid for TDP; uri / vuln_name /
  attack_result for Skyeye) before field mapping, enabling mixed-type batches
  in a single invocation.
- Update filter_logs to read per-alert _source_type set by normalize instead
  of the batch-level source_log_type, ensuring correct threat/process type
  classification in mixed batches.
- source_log_type is retained as an optional batch-level fallback hint when
  per-alert detection is inconclusive.

alert_dedup_triage:
- Implement 4-priority source_log_type resolution in receive_alerts:
  (1) explicit input param, (2) syslog app_name/hostname hint,
  (3) JSON field auto-detection on first alert, (4) default 'tdp'.
- Emit source_log_type_reason for traceability.

tdp_alert_triage:
- Extend receive_alert pick() lookups to cover flat TDP field names
  (net_real_src_ip, net_dest_ip, net_http_url, net_http_reqs_body,
  net_http_resp_body, net_http_status, etc.) alongside nested TDP and
  normalized schema, fixing empty-payload LLM analysis for pre-flattened
  TDP alerts arriving via alert_dedup_triage.

Co-authored-by: Cursor <cursoragent@cursor.com>
duguwanglong and others added 16 commits May 12, 2026 10:10
…slog support

- Add stream_alert_dedup workflow: streaming single-record deduplication via
  syslog UDP input (RFC3164/5424/iso3164), with MinHash LSH (128 perms,
  5-gram shingles, Jaccard threshold 0.7). Output is original alert JSON
  enriched with dedup_key and is_duplicate fields, written to date-partitioned
  JSONL files (~/.flocks/workspace/workflows/stream_alert_dedup/YYYY-MM-DD/)
  with a max of 10,000 records per file and a timestamped file header.

- Fix syslog parser to handle non-standard iso3164 format used by TDP devices:
  <PRI>ISO_TS HOSTNAME APP[PID]: msg (ISO 8601 timestamp, no RFC5424 version
  number). Adds _ISO3164_REST_RE regex, _parse_iso3164() handler, and auto-
  detection in parse_syslog(). Result carries format="iso3164" for traceability.
  Existing rfc3164 and rfc5424 parsing is unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add a long-running workflow that actively polls TDP attack alerts via the
tdp_log_search tool (TDP v3.3.10) instead of relying on syslog ingestion,
reusing the stream_alert_dedup pipeline (normalize → filter → MinHash LSH
dedup) at each iteration.

Key behavior:
- Single python node runs `while True` with configurable pull_interval_s
  (default 60s), max_iterations and max_runtime_s for stop control;
  node_timeout_s is raised to 30 days so the node can run indefinitely.
- Time cursor at ~/.flocks/workflows/tdp_alert_pull_dedup/cursor.json is
  persisted atomically each round; restarts resume from next_from with no
  gaps and no overlap. TDP failures do not advance the cursor so the same
  window is retried on the next round.
- Enriched alerts are appended to JSONL files under
  ~/.flocks/workflows/tdp_alert_pull_dedup/<YYYY-MM-DD>/alerts_NNN.jsonl,
  with a file_header line and a 10,000-records-per-file rollover.
- LSH state is persisted independently from stream_alert_dedup at
  ~/.flocks/workflows/tdp_alert_pull_dedup/lsh_state_np128_th*.pkl with
  atomic write + file lock and FIFO LRU eviction (max_dedup_keys).
- Robust TDP response unwrapping accepts list / {"log":[...]} /
  {"list":[...]} / {"data":[...]} / nested {data: {list: [...]}} shapes.

Layout follows existing workflows (workflow.json + workflow.md); the
companion _node_pull_dedup_loop.py source file plus _build_workflow.py
script keep the embedded node code readable and regenerable. Both
underscore-prefixed files are ignored by the workflow scanner which only
picks up workflow.json.

Verified with Workflow.from_dict + workflow_lint (0 issues) and
compile_workflow_file inside the flocks venv.

Co-authored-by: Cursor <cursoragent@cursor.com>
Route installer-created desktop and Start menu launchers through an elevated PowerShell helper so Windows prompts for UAC before starting Flocks. This keeps the shared flocks.cmd wrapper as the real entrypoint while preventing updater failures caused by insufficient permissions.

Co-authored-by: Cursor <cursoragent@cursor.com>
…aging (#256)

- Pin CPython 3.12.12 in versions.manifest.json with standalone archive metadata
- Download and extract install-only tarball in build-staging.ps1 (mirror env support)
- Configure PATH and UV_* env vars in bootstrap-windows.ps1 for bundled Python
- Assert bundled Python in Windows packaging CI workflows
- Extend manifest and bootstrap script contract tests
_http_session_send was posting to /api/channel/session-send without an
Authorization header, so the server-side auth middleware rejected it as a
non-browser request and returned HTTP 401.

Read the API token from the secret manager (server_api_token) and inject
it as Authorization: Bearer <token>. If no token is configured locally
and the server still returns 401, silently fall back to the in-process
delivery path so the tool keeps working in unauthenticated setups.

Co-authored-by: Cursor <cursoragent@cursor.com>
…back

- Import API_TOKEN_SECRET_ID from flocks.server.auth instead of
  hardcoding "server_api_token" so the client and server-side auth
  middleware cannot drift out of sync and silently start failing 401.
- Refine the comment on the 401 fallback: distinguish "client did not
  obtain a token" from "server has no token configured", and make it
  explicit that when we DID send a token but it was rejected we do not
  fall back, surfacing the server detail so misconfiguration is visible.

Co-authored-by: Cursor <cursoragent@cursor.com>
Document the contribution workflow in English, link it from the READMEs, and allow docs markdown files to be tracked in git.
)

* feat(server,webui): phased startup, route timing, session/tools UX

- Run server startup in timed phases; defer non-critical work to background
  tasks and cancel them on shutdown; log blocking startup duration
- Add duration logging for session list, tools list/refresh, task dashboard APIs
- Export workflow filesystem sync helper for startup
- WebUI: stop auto-selecting first session; enable live/SSE only with session id;
  reorder tools refresh vs list; reduce loading flicker on polling hooks;
  drop redundant task queue refetch from Task page
- TUI: inline tsconfig compiler options (drop extends)
- Add useTools tests and adjust Session page tests

* fix(mcp): owner task for session lifecycle and serialized RPC

- Run remote/stdio MCP I/O in one asyncio task with a command queue so
  list/call/read/disconnect share the same ClientSession context
- Use streamable_http_client transport factory pattern; improve cleanup
  and pending-command failure handling on disconnect
- Update MCP client/SSE tests for owner-task wiring; add same-task assertions
- Simplify browser-use SKILL CDP fallback instructions

* fix(mcp): normalize client timeout when config is missing or invalid

Default to 30s for None/non-numeric/<=0 values so connect wait_for uses a
finite bound; add regression test for timeout=None.

* feat(server): log slow route timings at INFO; fix(mcp) stdio stderr lifecycle

- Add log_route_timing helper: fast requests emit DEBUG, >=300ms emit INFO
- Wire session list, task dashboard routes, and tools list/refresh to helper
- MCP: wrap stdio stderr TemporaryFile in context manager; drop unused
  _streams_context; add regression test ensuring stderr file closes on failure
- Clarify SSE-first shutdown comment in app lifespan
Co-authored-by: Cursor <cursoragent@cursor.com>
…kills (#263)

- browser-use: document product browser experience and link new reference
- Add browser-experience-in-skill.md for workflow and templates
- web2cli: output under outputs/web2cli, iteration step 11, auth troubleshooting
- cli-in-skill: browser-workflow.md scope and writing guide
Plugin/agent/skill file watchers used watchdog's ``on_any_event`` and
re-fired on ``opened``/``closed``/``closed_no_write`` events. Each reload
opens every YAML/Python file, which retriggers the watcher and creates a
self-sustaining loop that scanned plugins every 2-3 s, kept ToolRegistry
and Agent caches thrashing, and silently amplified syslog-driven load.
Only react to actual content events (modified/created/deleted/moved) and
ignore __pycache__/dotfile noise.

Workflow execution path:
- Serialize per-workflow stats RMW with an asyncio.Lock so concurrent
  syslog/HTTP completions no longer drop callCount/successCount.
- Run Recorder audit + history trim as background tasks so the syslog
  dispatcher releases its semaphore slot without waiting on SQLite.
- Tolerate a missing/non-dict execution record in step progress and
  finish/error paths instead of crashing on ``NoneType.update``.
- Add done_callback cleanup so ``_active_workflow_executions`` cannot
  leak when a run is cancelled before reaching its own ``finally``.

Startup hygiene:
- Skip built-in hook registration safely when ``config.memory`` is None.
- Migrate-legacy-sessions: drop the bogus ``dict`` model arg to
  ``Storage.get`` and make ``Storage.get/list_entries`` tolerate
  non-Pydantic types via ``json.loads`` fallback.
- Provider auto-register: opencode entry referenced the missing
  ``FlocksCompatProvider`` symbol on the submodule; use the real
  ``OpenCodeProvider`` class.
- command_loader: fix ``flocks_global_dir`` NameError typo.
- mcp: demote benign ``mcp.already_initialized`` warn to debug.

Logs noise:
- ``plugin.tool.duplicate``: silently skip idempotent re-scans of the
  same plugin source; only warn on genuine cross-source collisions.
- ``agent.toolset.tool_missing``: demote to debug for built-in agents
  that declare optional ``lsp_*``/``ast_grep_search`` tools.

Co-authored-by: Cursor <cursoragent@cursor.com>
…riage

- Remove four obsolete workflows that have been superseded by the
  current alert-triage pipeline: alert_dedup_triage, http_alert_dedup,
  stream_alert_dedup, tdp_alert_pull_dedup.
- Rewrite tdp_alert_triage workflow (definition + docs) as the canonical
  "NDR/TDP alert investigation" workflow, replacing the previous
  HTTP-log oriented variant.

Co-authored-by: Cursor <cursoragent@cursor.com>
@duguwanglong duguwanglong requested a review from xiami762 May 14, 2026 03:46
duguwanglong and others added 2 commits May 14, 2026 14:11
…e, bind-error reporting

## fix: syslog ingest — replace semaphore+create_task with a bounded worker pool

The previous design drained the bounded queue immediately by spawning an
`asyncio.Task` per message; the semaphore was only acquired inside those
tasks, so pending coroutines could grow without bound under a syslog flood.

Replace with a fixed pool of `_MAX_CONCURRENT_EXECUTIONS` `_worker_loop`
coroutines that each `await queue.get()` serially.  The pool size is now the
*only* concurrency knob; total in-flight workflow runs is exactly the pool
size, making backpressure a hard invariant rather than a soft hint.

## fix(watcher): check dest_path on atomic-save events in tool/agent/skill watchers

Editors (vim, VS Code "useAtomicSave", …) persist edits by writing a sibling
temp file then renaming it onto the real target.  watchdog reports this as a
`moved` event where `src_path` is the throwaway name and `dest_path` is the
actual `tool.yaml` / `agent.yaml` / `SKILL.md`.  Filtering only by `src_path`
(the previous behaviour) silently skipped these valid updates.

Extract the path-matching logic into module-level predicates
`_tool_event_should_reload`, `_agent_event_should_reload`, and
`_skill_event_should_reload` that inspect both endpoints before deciding to
skip an event.

## fix(syslog): save-config endpoint surfaces bind failures synchronously

`restart_workflow` now blocks (up to `_BIND_WAIT_TIMEOUT_S = 3 s`) until the
socket either binds successfully or raises `OSError`.  Bind failures are
recorded in `_listener_status` with `state="failed"` and propagated back to
`POST /api/workflow/{id}/syslog-config` as `409 Conflict`, so the UI sees an
actionable error instead of a false "Listening" badge.

Add `GET /api/workflow/{id}/syslog-status` to expose the runtime state
(binding / listening / failed / stopped, queue depth, worker count)
independently of the persisted config.

Update `IntegrationTab.tsx` to derive the "Listening" indicator from the
runtime `syslogStatus.state` rather than the saved `enabled` field, add an
amber "binding…" indicator, red error banners on failure, and a live
queue-depth readout while listening.

## test: add pytest-runnable regression tests for all three areas

- `tests/ingest/test_syslog_manager_backpressure.py` — worker pool bounds
  in-flight dispatches, queue drops on full, stop_workflow cancels pool cleanly
- `tests/ingest/test_syslog_manager_bind_failure.py` — restart_workflow reports
  state="failed" on port conflict; state="stopped" when disabled
- `tests/tool/test_watcher_atomic_save.py` — all three watcher predicates
  accept dest_path on rename and reject unrelated / hidden / pycache paths

15 new tests, all passing.

Co-authored-by: Cursor <cursoragent@cursor.com>
@xiami762 xiami762 merged commit c0e0d28 into dev May 14, 2026
duguwanglong pushed a commit to DearEmma/flocks that referenced this pull request May 18, 2026
…ll-workflow

feat/tdp alert pull workflow
duguwanglong added a commit to DearEmma/flocks that referenced this pull request May 18, 2026
Follow-up to PR AgentFlocks#267: the syslog config form already validated host but
accepted any integer for port, allowing values such as 0, 22, or 99999
to be saved and only blow up at bind time with a confusing OS error
(e.g. "[Errno 49] can't assign requested address").

- Backend SyslogConfigRequest.port now enforces ge=1, le=65535 via
  Pydantic so out-of-range ports are rejected with HTTP 422 before the
  config is persisted or the listener is restarted.
- IntegrationTab.tsx adds an inline validator that strictly matches a
  numeric string (rejecting "5140abc", "3.14", "-1", etc.) and the
  1..65535 range; the input shows a red border with a localized hint
  and the save button is disabled while the value is invalid.
- extractErrorMessage now flattens Pydantic 422 detail arrays
  ([{loc, msg, type}, ...]) into a readable "; "-joined string so
  scripted callers (or any caller that bypasses the UI validator)
  see the actual validation message instead of "[object Object]".
- Adds zh-CN/en-US copy for detail.run.syslogPortError.

Co-authored-by: Cursor <cursoragent@cursor.com>
duguwanglong added a commit to DearEmma/flocks that referenced this pull request May 18, 2026
Follow-up to PR AgentFlocks#267: the syslog config form already validated host but
accepted any integer for port, allowing values such as 0, 22, or 99999
to be saved and only blow up at bind time with a confusing OS error
(e.g. "[Errno 49] can't assign requested address").

- Backend SyslogConfigRequest.port now enforces ge=1, le=65535 via
  Pydantic so out-of-range ports are rejected with HTTP 422 before the
  config is persisted or the listener is restarted.
- IntegrationTab.tsx adds an inline validator that strictly matches a
  numeric string (rejecting "5140abc", "3.14", "-1", etc.) and the
  1..65535 range; the input shows a red border with a localized hint
  and the save button is disabled while the value is invalid.
- extractErrorMessage now flattens Pydantic 422 detail arrays
  ([{loc, msg, type}, ...]) into a readable "; "-joined string so
  scripted callers (or any caller that bypasses the UI validator)
  see the actual validation message instead of "[object Object]".
- Adds zh-CN/en-US copy for detail.run.syslogPortError.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants