feat/tdp alert pull workflow by duguwanglong · Pull Request #267 · AgentFlocks/flocks

duguwanglong · 2026-05-14T03:46:54Z

1. Alert dedup / triage workflow family

alert_dedup → network_alert_dedup → http_alert_dedup progression: full 4-stage LogProcessPipeline ported from aisoc_mini, then trimmed to a dedup-only variant with explicit branch nodes for log-type and filter routing.
MinHash LSH dedup replaces brute-force Jaccard in dedup_logs, with cross-platform on-disk LSH state, file-locking, and a clearer stats panel.
stream_alert_dedup workflow + ISO-3164 syslog parsing for real-time streaming input.
tdp_alert_pull_dedup workflow for polling TDP API in a loop with built-in dedup.
alert_dedup_triage combines dedup with triage; supports both syslog real-time mode (no summary, low latency) and batch mode (full report). Sub-workflow calls are now embedded via the engine, removing the previous HTTP-service dependency.
tdp_alert_triage rewritten as the canonical "NDR/TDP alert investigation" workflow (replaces the older HTTP-log-only variant); supports mixed TDP/Skyeye batches and flat-format TDP triage; final generate_summary no longer prints the duplicated (cached) label.
Legacy / superseded workflows removed: alert_dedup_triage, http_alert_dedup, stream_alert_dedup, tdp_alert_pull_dedup.

2. New syslog ingestion subsystem (`flocks/ingest/syslog/`)

UDP + TCP listeners with format auto-detection (RFC3164 / RFC5424 / auto).
Per-workflow lifecycle manager with bounded queue (200) + per-workflow semaphore (8) for backpressure under syslog floods.
Synchronous on_msg callback so the UDP protocol layer doesn't spawn an asyncio.Task per datagram (preserves backpressure).
Best-effort drain + cancel on shutdown so in-flight dispatches finish their Storage writes.
Renamed module path flocks/syslog → flocks/ingest/syslog to make room for other ingestion sources.
Server lifespan delays syslog listener startup by 3 s so the main startup path can finish before the first message flood (breaks the crash-restart loop).

3. Workflow execution store

Persist execution records + stats for syslog-triggered runs (previously the syslog path bypassed history).
Per-workflow stats (callCount / successCount / errorCount / avgRuntime) now serialized with an asyncio.Lock so concurrent completions no longer lose increments.
Recorder.record_workflow_execution and history-trim run as background tasks — the syslog dispatcher releases its concurrency slot immediately.
Execution history is auto-trimmed to 500 records / workflow, with the O(N) scan amortised every 50 completions.
/api/workflow/{id}/history endpoint switched from N+1 reads to a single Storage.list_entries query (response time dropped from ~180 s to sub-second on large stores).
Step-progress and finish/error paths now tolerate a missing/non-dict execution record (no more NoneType.update crashes when trim races with a long-running step).
_active_workflow_executions is cleaned up via task.add_done_callback in addition to finally, eliminating registry leaks on cancellation paths.

4. WebUI

New IntegrationTab for managing syslog config (host / port / protocol / format / input key) and showing live status.
WorkflowDetail right panel adds run / integration / runs tabs; RunTab refactored and RunTab.test.tsx added.
api/workflow.ts exposes syslog + integration endpoints; workflow.json locales (en/zh) added.

5. `run_workflow` tool hardening

Handles JSON-encoded string paths, dict-typed JSON input, raw alert_file mode, and split type branches with clear per-type errors.
Cross-platform file-lock with cleaner stats output.
Re-applies workspace path for LSH state (regression from earlier dev merge).
Workflow ID lookup added.

6. Plugin / agent file watchers — stop self-sustaining reload loop

watchdog.on_any_event was firing for opened / closed / closed_no_write events; every reload re-opens YAML/Python files and re-triggered the watcher every 2-3 s, keeping ToolRegistry and the Agent cache thrashing.
Tool / agent / skill watchers now react only to modified/created/deleted/moved and ignore __pycache__ and dotfile noise.

7. Startup hygiene

Safe-access guard so missing config.memory doesn't crash hook registration.
Storage.get / list_entries tolerate non-Pydantic types (fixes legacy session migration crashing on dict.model_validate_json).
Provider auto-register: opencode entry now references the real OpenCodeProvider (was the missing FlocksCompatProvider alias).
Fix command_loader flocks_global_dir NameError typo.
Demote benign mcp.already_initialized warn to debug.

8. Log noise reduction

plugin.tool.duplicate only warns on genuine cross-source collisions; idempotent re-scans of the same source are silent.
agent.toolset.tool_missing demoted to debug for built-in agents declaring optional lsp_* / ast_grep_search tools.

9. Windows installer

python-build-standalone runtime bundled in the installer staging directory.
Installer shortcuts now require elevation.

10. Channel message tool

Attach Bearer API token on local HTTP send.
Reuse API_TOKEN_SECRET_ID and clarify the fallback path.

基于 aisoc_mini 的告警去重算法（URI 归一化 + 5-gram Shingling + Jaccard 相似度）移植为 flocks 工作流，包含 5 个线性节点：接收解析 → URI 归一化 → 去重键计算 → 分组去重 → 报告生成。支持严格字段精确匹配与 LSH 字段近似匹配，输出唯一告警、重复告警及 Markdown 分析报告。 Co-authored-by: Cursor <cursoragent@cursor.com>

…stages) 重写工作流以完整对齐 aisoc_mini 的 LogProcessPipeline 四阶段主流程： - normalize_logs: TDP/Skyeye 字段映射 + 嵌套结构扁平化 - filter_logs: jsonLogic 规则过滤（扫描类/出站/非HTTP告警剔除） - dedup_logs: URI 归一化 + 5-gram Jaccard 相似度去重，生成 dedup_key - analyze_unique: 仅对唯一 dedup_key 调用 LLM is_attack 研判并回填重复告警 - generate_report: 输出四阶段统计 Markdown 报告及 JSONL 数据文件 Co-authored-by: Cursor <cursoragent@cursor.com>

…rallelize LLM analysis 修复审阅中发现的功能性偏差： 1. filter_logs: 完整对齐 LogFilter._get_tdp_process_type() 9 种 process_type 分类 - 修正：HTTP 非扫描告警无论方向（in/out/lateral）都需研判（之前错误地只保留 direction=in/none，会丢失大量本应分析的出站/横向 HTTP 告警） - HTTP 协议判断兼容 application_layer_protocol/net_type/net_app_proto 多字段 - threat_type 取值与原版一致：tdp 取 threat_name，skyeye 取 threat_type - 新增 _process_type、_need_analysis_attack_status 字段 - 统计中加入 filter_process_type_counts 2. analyze_unique: ThreadPoolExecutor 并行调用 LLM（与 LogAnalysis.process_parallel 一致，可通过 analyze_max_workers 配置），单条失败不影响其他 3. dedup_logs: 移除函数体内重复的 import hashlib as _hl 4. 各节点新增轻量 print 进度日志，便于大批量调试 5. workflow.md 同步修正过滤逻辑描述 Co-authored-by: Cursor <cursoragent@cursor.com>

…result routing 用显式 branch 节点替换代码内部的 if-else 路由，拓扑结构变为： receive_alerts → branch_log_type (select_key: source_log_type) label:"tdp" → normalize_tdp → filter_logs label:"skyeye" → normalize_skyeye → filter_logs → branch_has_alerts (select_key: _has_alerts) label:"true" → dedup_logs → analyze_unique → generate_report label:"false" → generate_empty_report（无 LLM 调用的快速终点）修复： - branch_log_type 两条边均使用显式 label ("tdp"/"skyeye")，使 lint 正确识别为互斥路径，消除 multi_incoming_no_join 报错 - false 分支独立终结于 generate_empty_report，避免 generate_report 多入边 lint 错误 - 测试验证：TDP/Skyeye 字段映射、has_alerts true/false 四条路径均正确路由 Co-authored-by: Cursor <cursoragent@cursor.com>

…y to dedup-only - 目录重命名 alert_dedup → network_alert_dedup - workflow.json / meta.json name & id 改为 network_alert_dedup - 移除 analyze_unique / generate_report / generate_empty_report / branch_has_alerts 四个节点 - 移除 LLM 研判相关参数（analyze_enabled / analyze_max_workers） - dedup_logs 成为终点节点，直接输出 dict： deduped_alerts（全量含 dedup_key）/ unique_alerts（唯一簇）/ stats / dedup_summary - filter_logs 清理预填空结果逻辑，精简 outputs 传递 - workflow.md 更新为三阶段流程说明（归一化→过滤→去重） - lint 无警告，模型验证通过 Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

…ert_dedup - filter_logs 新增 _has_alerts 输出 - 插入 branch_has_alerts 分支节点（select_key: _has_alerts） - true 路径 → dedup_logs（执行去重，终点） - false 路径 → dedup_empty（无告警，直接返回空 dict，终点） - 更新 workflow.md 流程图及节点说明 - lint 无警告，模型验证通过（8 节点 / 8 边） Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

…p_alert_dedup filter_logs 直接连接 dedup_logs，dedup_logs 自然处理空列表输入 Co-authored-by: Cursor <cursoragent@cursor.com>

…_logs 使用 datasketch MinHash + MinHashLSH 替换原有 O(n²) 暴力 Jaccard： - NUM_PERM=128, MINHASH_SEED=2024（对齐 lsh_processor.py） - 共享 permutations，对齐 LSHProcessor 初始化方式 - query_most_similar：LSH 快速候选 → 精确 Jaccard 取最相似（对齐原版逻辑） - normalize_uri 对齐 utils.py（DATETIME/UUID/TRAVERSAL/NULL_REPLACED/HEXADECIMAL） - dedup_key = MD5(strict_text + '. ' + lsh_key)，对齐 LogDedup._generate_dedup_key_text - 实测 453 条 TDP 过滤后告警 → 360 个唯一簇，压缩率 20.5% Co-authored-by: Cursor <cursoragent@cursor.com>

…dup_logs - All Chinese comments replaced with English - cluster_id (int) written to alert['_lsh_cluster_id'] so callers can inspect LSH cluster membership - lsh_cache now stores MinHash directly (not wrapped in dict) for cleaner re-ranking - summary string switched to English Co-authored-by: Cursor <cursoragent@cursor.com>

LSH state (lsh_index + lsh_cache) is now saved to and loaded from: ~/.flocks/data/workflows/http_alert_dedup/lsh_state_np128_th<threshold>.pkl - get_lsh_state_path(): builds path via Config().get_data_path() - load_lsh_state(): loads pickle, validates num_perm/threshold params match - dump_lsh_state(): saves after each run; mirrors LogDedup.dump() - Filename encodes NUM_PERM and threshold so param changes auto-reset state - stats gains lsh_total_clusters and lsh_state_path fields - Verified: run2 correctly loads run1 clusters and accumulates new ones Co-authored-by: Cursor <cursoragent@cursor.com>

Four correctness fixes plus the datasketch dependency: 1. Persist dedup_key_cache (the set of MD5 keys ever seen) into the same pkl so that 'dedup_key_already_exists' is correct across restarts and batches. Previously it was a per-run local set, breaking cross-batch dedup detection. 2. Atomic write: pickle to <state>.tmp, fsync, then os.replace() over the target. A crash mid-write no longer corrupts the persisted state. 3. fcntl.flock(LOCK_EX) on a sibling .lock file serializes load+modify+dump across concurrent workflow runs, eliminating the read-modify-write race that previously caused lost updates. 4. dedup_enabled=False branch now sets _lsh_cluster_id=None so that downstream consumers see a consistent schema regardless of dedup mode. In-batch duplicate detection still works in this branch. Also: warn when persisted cluster count exceeds 100k so operators can rotate state. New stat: lsh_total_dedup_keys. Verified end-to-end: - cross-batch: same alert in run2 reports already_exists=True - corruption: garbage pkl is auto-discarded, no crash - concurrent: 5 parallel processes writing 10 alerts each end up with 50 persisted clusters (not lost to read-modify-write) - disabled: _lsh_cluster_id=None, in-batch duplicate detected Co-authored-by: Cursor <cursoragent@cursor.com>

Cross-platform support: - Replace bare 'import fcntl' with platform detection. - POSIX path keeps fcntl.flock(LOCK_EX). - Windows path uses msvcrt.locking(LK_LOCK, 1) on a single byte of the lock file, looping on OSError to wait beyond LK_LOCK's built-in 10-second cap. - Both branches release the lock symmetrically in release_lock(). Review fixes: - 'lsh_state_path', 'lsh_total_clusters', 'lsh_total_dedup_keys' are no longer written to stats when dedup_enabled=False (was misleading; no file is touched in that mode). Added 'dedup_state_persisted' bool for callers to branch on. - Cluster-overflow warning now also checks dedup_key_cache, since pkl size is dominated by per-key entries, not per-cluster. - normalize_uri's UUID regex now uses re.IGNORECASE so uppercase UUIDs map to the same 'UUID' placeholder (and thus the same LSH cluster) as lowercase ones. - Disabled-mode summary clearly states 'no state persisted, in-batch only'. Verified: cross-batch, disabled-mode stats, uppercase UUID dedup, concurrent writers. Co-authored-by: Cursor <cursoragent@cursor.com>

…; add alert_file support run_workflow.py: - When workflow is a JSON-decoded non-dict (e.g. the AI wraps the path in extra quotes producing '\"..path..\"'), treat the decoded string as a file path instead of crashing with 'str object has no attribute name'. - Add upfront dict sanity-check: if the dict has no 'start' key it's almost certainly the inputs dict passed to the wrong parameter; return a clear error instead of a Pydantic validation traceback. - Add a comment clarifying that the else-branch delivers a Path, not a str. workflow.py: - Add sync_workflows_from_filesystem() which app.py imports at startup. Triggers the one-time storage→filesystem migration then returns the count of discovered workflows so the startup log entry is informative. http_alert_dedup/workflow.json: - receive_alerts now accepts alert_file (absolute or ~-prefixed path to a JSON file) as an alternative to inlining the alerts list directly in inputs. This removes the need to manually load large log files via bash before running the workflow (the main friction observed in session 询问可用工作流). - sampleInputs annotated with a _comment_alert_file hint. Co-authored-by: Cursor <cursoragent@cursor.com>

…type errors Previous fix used 'parsed if isinstance(parsed, str) else raw' which silently passed list/int/bool results to Path(), giving a misleading 'file not found' error. Now three explicit branches: - parsed is str → double-encoded path; use decoded value as file path - parsed is None → JSONDecodeError; use raw string as file path (original behaviour) - parsed is other → clear error stating the unexpected JSON type Co-authored-by: Cursor <cursoragent@cursor.com>

…ID lookup Resolve the merge conflict in the `workflow` string parsing block by combining both sides: - HEAD: explicit type dispatch on json.loads() result (dict / str / None / other) with clear per-type error messages. - Upstream (d8b44e5): on JSONDecodeError, try read_workflow_from_fs() first so callers can pass a plain workflow ID instead of a full JSON string. Merged result: keep the type-dispatch structure from HEAD; in the `parsed is None` (JSONDecodeError) branch, attempt workflow-ID resolution via read_workflow_from_fs() before falling back to a file path. Co-authored-by: Cursor <cursoragent@cursor.com>

…edup-workflow

The get_state_paths() change to ~/.flocks/workspace/workflows/ was lost when merging dev into feat/alert-dedup-workflow (da58f7b). Re-apply: - Remove top-level `from flocks.config import Config` import - Replace Config().get_data_path() with Config().get_global().data_dir.parent / 'workspace' / 'workflows' - Update node description accordingly Also sync the running release snapshot and restart the service process so the /invoke API immediately uses the new path. Co-authored-by: Cursor <cursoragent@cursor.com>

…calls workflow_center_invoke (POST /workflow-center/{id}/invoke) now records execution stats via create_execution_record + _update_workflow_stats + _record_execution_result, so that UI callCount / successCount / errorCount counters are updated for every /workflow-center call — not only for agent-driven /workflow/{id}/run calls. Co-authored-by: Cursor <cursoragent@cursor.com>

New workflow: - Add alert_dedup_triage workflow: chains http_alert_dedup → tdp_alert_triage in a single pipeline; per-alert dedup via MinHash LSH, then LLM triage for first-seen unique alerts; duplicate alerts are annotated with cached triage results from a persisted triage_cache.pkl (FIFO LRU, max_dedup_keys cap). http_alert_dedup improvements: - Upgrade dedup_key_cache from set to ordered dict for FIFO LRU eviction. - Add max_dedup_keys input param (default 100 000); oldest entries are evicted before each persist so the state file stays bounded. - Use a monotonic cluster_id counter (_cid_box) instead of len(lsh_cache) so IDs never collide after eviction; remove evicted cluster_ids from the MinHashLSH index to prevent stale query results. - Expose lsh_max_dedup_keys / lsh_evicted_keys / lsh_evicted_clusters in stats. - Forward max_dedup_keys through normalize_* and filter_logs nodes. tdp_alert_triage: - Remove web-log detection branch; workflow now accepts HTTP alerts directly. - Add _strip_think() to all LLM nodes to strip <think>…</think> blocks. - Update workflow.md to reflect simplified node structure. Test tools (moved from scripts/ → tests/integration/): - test_http_alert_dedup_stream.py: streaming simulation for http_alert_dedup. - test_alert_dedup_triage_stream.py: end-to-end dedup → triage pipeline test. Co-authored-by: Cursor <cursoragent@cursor.com>

- Add flocks/syslog package (constants, parser, listener, manager) to receive UDP/TCP syslog messages and trigger workflow execution via the Channel-GatewayManager pattern - Expose POST/GET /workflow/{id}/syslog-config API endpoints; start/stop all enabled listeners in the FastAPI lifespan - Add SyslogConfig TS interface and saveSyslogConfig/getSyslogConfig API methods to the frontend workflow client - Extract PublishSection, KafkaSection, SyslogSection from RunTab into a new IntegrationTab component; wire it as a fourth "Integration" tab in RightPanel so the Run tab only shows test-run and execution history - Add tabIntegration i18n keys (zh-CN / en-US) and syslogActive badge shown when listener is enabled and collapsed Co-authored-by: Cursor <cursoragent@cursor.com>

Group data-ingestion triggers under a shared flocks/ingest/ namespace so that future connectors (kafka, webhook, …) live alongside syslog as sibling sub-packages rather than top-level packages. - Move flocks/syslog/* → flocks/ingest/syslog/* - Add flocks/ingest/__init__.py (empty package marker) - Update all internal imports (flocks.syslog → flocks.ingest.syslog) in listener.py, manager.py, __init__.py, server/app.py and server/routes/workflow.py Co-authored-by: Cursor <cursoragent@cursor.com>

receive_alerts node now handles three input sources in priority order: 1. syslog_message (injected by flocks syslog listener, RFC3164/5424) - TDP alert JSON parsed from syslog_message.message - Syslog metadata (hostname, severity, timestamp…) attached to alert under _syslog_meta for traceability; does not affect dedup/triage logic 2. alerts (batch list, existing) 3. alert_file (local JSON path, existing) Also adds syslog-config API documentation and example to workflow.md and syslog_message sample to metadata.sampleInputs. Co-authored-by: Cursor <cursoragent@cursor.com>

…move HTTP service dependency Replace direct HTTP POST calls to published service ports (19000/19001) with in-process embedded invocations using flocks.workflow.runner.run_workflow and flocks.workflow.fs_store.workflow_scan_dirs. - _invoke_workflow(): locates sub-workflow.json via workflow_scan_dirs(), then calls run_workflow() directly in the same process - Removes urllib/HTTP helpers, dedup_service_url, triage_service_url inputs - Adds dedup_workflow_id / triage_workflow_id inputs (default: http_alert_dedup, tdp_alert_triage) so callers can override the sub-workflow IDs if needed - http_alert_dedup and tdp_alert_triage no longer need to be published as running services for alert_dedup_triage to function Co-authored-by: Cursor <cursoragent@cursor.com>

…nglish - Node description fields: translated to English - generate_summary: verdict_label / stage_label dicts and summary_md template converted from Chinese to English - Code comments in receive_alerts and dedup_and_triage were already English - description_cn field and LLM prompt content in tdp_alert_triage left as-is (intentionally Chinese for LLM interaction) Co-authored-by: Cursor <cursoragent@cursor.com>

…able Co-authored-by: Cursor <cursoragent@cursor.com>

…report in batch mode Add branch_output_mode node after dedup_and_triage: - input_mode == 'syslog' -> direct_output: returns single-alert triage result directly (one-liner summary_report, no table, no file write) - input_mode == 'alerts' | 'alert_file' -> generate_summary: full statistics table + top-risk report written to pipeline_summary.md Both paths emit identical output field names for downstream compatibility. Co-authored-by: Cursor <cursoragent@cursor.com>

…iggered runs Syslog-triggered workflow runs were invisible in the WebUI history panel and not counted in the workflow stats card because `_trigger_workflow` called the runner directly, bypassing both the execution record write and the call counter update. Run records were never written, and `callCount` only ever reflected HTTP-triggered runs. This change makes the syslog path go through the same persistence helpers as the HTTP API path so all trigger sources are uniformly visible in the UI: - syslog/manager: wrap each run with create_execution_record / record_execution_result; write the same fields the WebUI consumes (`outputResults`, `errorMessage`, `duration`, `executionLog`, `currentNodeId`, `currentPhase`, `currentStepIndex`); tag input params with `_trigger=syslog` for source identification. - workflow/execution_store: move the workflow stats counter update into `record_execution_result` so every persisted result automatically increments callCount/successCount/errorCount/totalRuntime/avgRuntime, regardless of trigger source. - server/routes/workflow: remove the now-redundant `_update_workflow_stats` calls from the HTTP run/invoke paths and drop the orphaned helper to avoid double counting. Co-authored-by: Cursor <cursoragent@cursor.com>

… triage http_alert_dedup: - Replace branch_log_type + normalize_tdp + normalize_skyeye nodes with a single unified normalize node; each alert is individually classified via field signatures (nested net dict / behave_uuid for TDP; uri / vuln_name / attack_result for Skyeye) before field mapping, enabling mixed-type batches in a single invocation. - Update filter_logs to read per-alert _source_type set by normalize instead of the batch-level source_log_type, ensuring correct threat/process type classification in mixed batches. - source_log_type is retained as an optional batch-level fallback hint when per-alert detection is inconclusive. alert_dedup_triage: - Implement 4-priority source_log_type resolution in receive_alerts: (1) explicit input param, (2) syslog app_name/hostname hint, (3) JSON field auto-detection on first alert, (4) default 'tdp'. - Emit source_log_type_reason for traceability. tdp_alert_triage: - Extend receive_alert pick() lookups to cover flat TDP field names (net_real_src_ip, net_dest_ip, net_http_url, net_http_reqs_body, net_http_resp_body, net_http_status, etc.) alongside nested TDP and normalized schema, fixing empty-payload LLM analysis for pre-flattened TDP alerts arriving via alert_dedup_triage. Co-authored-by: Cursor <cursoragent@cursor.com>

…edup-workflow

…slog support - Add stream_alert_dedup workflow: streaming single-record deduplication via syslog UDP input (RFC3164/5424/iso3164), with MinHash LSH (128 perms, 5-gram shingles, Jaccard threshold 0.7). Output is original alert JSON enriched with dedup_key and is_duplicate fields, written to date-partitioned JSONL files (~/.flocks/workspace/workflows/stream_alert_dedup/YYYY-MM-DD/) with a max of 10,000 records per file and a timestamped file header. - Fix syslog parser to handle non-standard iso3164 format used by TDP devices: <PRI>ISO_TS HOSTNAME APP[PID]: msg (ISO 8601 timestamp, no RFC5424 version number). Adds _ISO3164_REST_RE regex, _parse_iso3164() handler, and auto- detection in parse_syslog(). Result carries format="iso3164" for traceability. Existing rfc3164 and rfc5424 parsing is unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

Add a long-running workflow that actively polls TDP attack alerts via the tdp_log_search tool (TDP v3.3.10) instead of relying on syslog ingestion, reusing the stream_alert_dedup pipeline (normalize → filter → MinHash LSH dedup) at each iteration. Key behavior: - Single python node runs `while True` with configurable pull_interval_s (default 60s), max_iterations and max_runtime_s for stop control; node_timeout_s is raised to 30 days so the node can run indefinitely. - Time cursor at ~/.flocks/workflows/tdp_alert_pull_dedup/cursor.json is persisted atomically each round; restarts resume from next_from with no gaps and no overlap. TDP failures do not advance the cursor so the same window is retried on the next round. - Enriched alerts are appended to JSONL files under ~/.flocks/workflows/tdp_alert_pull_dedup/<YYYY-MM-DD>/alerts_NNN.jsonl, with a file_header line and a 10,000-records-per-file rollover. - LSH state is persisted independently from stream_alert_dedup at ~/.flocks/workflows/tdp_alert_pull_dedup/lsh_state_np128_th*.pkl with atomic write + file lock and FIFO LRU eviction (max_dedup_keys). - Robust TDP response unwrapping accepts list / {"log":[...]} / {"list":[...]} / {"data":[...]} / nested {data: {list: [...]}} shapes. Layout follows existing workflows (workflow.json + workflow.md); the companion _node_pull_dedup_loop.py source file plus _build_workflow.py script keep the embedded node code readable and regenerable. Both underscore-prefixed files are ignored by the workflow scanner which only picks up workflow.json. Verified with Workflow.from_dict + workflow_lint (0 issues) and compile_workflow_file inside the flocks venv. Co-authored-by: Cursor <cursoragent@cursor.com>

Route installer-created desktop and Start menu launchers through an elevated PowerShell helper so Windows prompts for UAC before starting Flocks. This keeps the shared flocks.cmd wrapper as the real entrypoint while preventing updater failures caused by insufficient permissions. Co-authored-by: Cursor <cursoragent@cursor.com>

…aging (#256) - Pin CPython 3.12.12 in versions.manifest.json with standalone archive metadata - Download and extract install-only tarball in build-staging.ps1 (mirror env support) - Configure PATH and UV_* env vars in bootstrap-windows.ps1 for bundled Python - Assert bundled Python in Windows packaging CI workflows - Extend manifest and bootstrap script contract tests

_http_session_send was posting to /api/channel/session-send without an Authorization header, so the server-side auth middleware rejected it as a non-browser request and returned HTTP 401. Read the API token from the secret manager (server_api_token) and inject it as Authorization: Bearer <token>. If no token is configured locally and the server still returns 401, silently fall back to the in-process delivery path so the tool keeps working in unauthenticated setups. Co-authored-by: Cursor <cursoragent@cursor.com>

…back - Import API_TOKEN_SECRET_ID from flocks.server.auth instead of hardcoding "server_api_token" so the client and server-side auth middleware cannot drift out of sync and silently start failing 401. - Refine the comment on the 401 fallback: distinguish "client did not obtain a token" from "server has no token configured", and make it explicit that when we DID send a token but it was rejected we do not fall back, surfacing the server detail so misconfiguration is visible. Co-authored-by: Cursor <cursoragent@cursor.com>

Document the contribution workflow in English, link it from the READMEs, and allow docs markdown files to be tracked in git.

) * feat(server,webui): phased startup, route timing, session/tools UX - Run server startup in timed phases; defer non-critical work to background tasks and cancel them on shutdown; log blocking startup duration - Add duration logging for session list, tools list/refresh, task dashboard APIs - Export workflow filesystem sync helper for startup - WebUI: stop auto-selecting first session; enable live/SSE only with session id; reorder tools refresh vs list; reduce loading flicker on polling hooks; drop redundant task queue refetch from Task page - TUI: inline tsconfig compiler options (drop extends) - Add useTools tests and adjust Session page tests * fix(mcp): owner task for session lifecycle and serialized RPC - Run remote/stdio MCP I/O in one asyncio task with a command queue so list/call/read/disconnect share the same ClientSession context - Use streamable_http_client transport factory pattern; improve cleanup and pending-command failure handling on disconnect - Update MCP client/SSE tests for owner-task wiring; add same-task assertions - Simplify browser-use SKILL CDP fallback instructions * fix(mcp): normalize client timeout when config is missing or invalid Default to 30s for None/non-numeric/<=0 values so connect wait_for uses a finite bound; add regression test for timeout=None. * feat(server): log slow route timings at INFO; fix(mcp) stdio stderr lifecycle - Add log_route_timing helper: fast requests emit DEBUG, >=300ms emit INFO - Wire session list, task dashboard routes, and tools list/refresh to helper - MCP: wrap stdio stderr TemporaryFile in context manager; drop unused _streams_context; add regression test ensuring stderr file closes on failure - Clarify SSE-first shutdown comment in app lifespan

Co-authored-by: Cursor <cursoragent@cursor.com>

…rt-pull-workflow

…kills (#263) - browser-use: document product browser experience and link new reference - Add browser-experience-in-skill.md for workflow and templates - web2cli: output under outputs/web2cli, iteration step 11, auth troubleshooting - cli-in-skill: browser-workflow.md scope and writing guide

Plugin/agent/skill file watchers used watchdog's ``on_any_event`` and re-fired on ``opened``/``closed``/``closed_no_write`` events. Each reload opens every YAML/Python file, which retriggers the watcher and creates a self-sustaining loop that scanned plugins every 2-3 s, kept ToolRegistry and Agent caches thrashing, and silently amplified syslog-driven load. Only react to actual content events (modified/created/deleted/moved) and ignore __pycache__/dotfile noise. Workflow execution path: - Serialize per-workflow stats RMW with an asyncio.Lock so concurrent syslog/HTTP completions no longer drop callCount/successCount. - Run Recorder audit + history trim as background tasks so the syslog dispatcher releases its semaphore slot without waiting on SQLite. - Tolerate a missing/non-dict execution record in step progress and finish/error paths instead of crashing on ``NoneType.update``. - Add done_callback cleanup so ``_active_workflow_executions`` cannot leak when a run is cancelled before reaching its own ``finally``. Startup hygiene: - Skip built-in hook registration safely when ``config.memory`` is None. - Migrate-legacy-sessions: drop the bogus ``dict`` model arg to ``Storage.get`` and make ``Storage.get/list_entries`` tolerate non-Pydantic types via ``json.loads`` fallback. - Provider auto-register: opencode entry referenced the missing ``FlocksCompatProvider`` symbol on the submodule; use the real ``OpenCodeProvider`` class. - command_loader: fix ``flocks_global_dir`` NameError typo. - mcp: demote benign ``mcp.already_initialized`` warn to debug. Logs noise: - ``plugin.tool.duplicate``: silently skip idempotent re-scans of the same plugin source; only warn on genuine cross-source collisions. - ``agent.toolset.tool_missing``: demote to debug for built-in agents that declare optional ``lsp_*``/``ast_grep_search`` tools. Co-authored-by: Cursor <cursoragent@cursor.com>

…rt-pull-workflow

…riage - Remove four obsolete workflows that have been superseded by the current alert-triage pipeline: alert_dedup_triage, http_alert_dedup, stream_alert_dedup, tdp_alert_pull_dedup. - Rewrite tdp_alert_triage workflow (definition + docs) as the canonical "NDR/TDP alert investigation" workflow, replacing the previous HTTP-log oriented variant. Co-authored-by: Cursor <cursoragent@cursor.com>

…rt-pull-workflow

…e, bind-error reporting ## fix: syslog ingest — replace semaphore+create_task with a bounded worker pool The previous design drained the bounded queue immediately by spawning an `asyncio.Task` per message; the semaphore was only acquired inside those tasks, so pending coroutines could grow without bound under a syslog flood. Replace with a fixed pool of `_MAX_CONCURRENT_EXECUTIONS` `_worker_loop` coroutines that each `await queue.get()` serially. The pool size is now the *only* concurrency knob; total in-flight workflow runs is exactly the pool size, making backpressure a hard invariant rather than a soft hint. ## fix(watcher): check dest_path on atomic-save events in tool/agent/skill watchers Editors (vim, VS Code "useAtomicSave", …) persist edits by writing a sibling temp file then renaming it onto the real target. watchdog reports this as a `moved` event where `src_path` is the throwaway name and `dest_path` is the actual `tool.yaml` / `agent.yaml` / `SKILL.md`. Filtering only by `src_path` (the previous behaviour) silently skipped these valid updates. Extract the path-matching logic into module-level predicates `_tool_event_should_reload`, `_agent_event_should_reload`, and `_skill_event_should_reload` that inspect both endpoints before deciding to skip an event. ## fix(syslog): save-config endpoint surfaces bind failures synchronously `restart_workflow` now blocks (up to `_BIND_WAIT_TIMEOUT_S = 3 s`) until the socket either binds successfully or raises `OSError`. Bind failures are recorded in `_listener_status` with `state="failed"` and propagated back to `POST /api/workflow/{id}/syslog-config` as `409 Conflict`, so the UI sees an actionable error instead of a false "Listening" badge. Add `GET /api/workflow/{id}/syslog-status` to expose the runtime state (binding / listening / failed / stopped, queue depth, worker count) independently of the persisted config. Update `IntegrationTab.tsx` to derive the "Listening" indicator from the runtime `syslogStatus.state` rather than the saved `enabled` field, add an amber "binding…" indicator, red error banners on failure, and a live queue-depth readout while listening. ## test: add pytest-runnable regression tests for all three areas - `tests/ingest/test_syslog_manager_backpressure.py` — worker pool bounds in-flight dispatches, queue drops on full, stop_workflow cancels pool cleanly - `tests/ingest/test_syslog_manager_bind_failure.py` — restart_workflow reports state="failed" on port conflict; state="stopped" when disabled - `tests/tool/test_watcher_atomic_save.py` — all three watcher predicates accept dest_path on rename and reject unrelated / hidden / pycache paths 15 new tests, all passing. Co-authored-by: Cursor <cursoragent@cursor.com>

…ll-workflow feat/tdp alert pull workflow

Follow-up to PR AgentFlocks#267: the syslog config form already validated host but accepted any integer for port, allowing values such as 0, 22, or 99999 to be saved and only blow up at bind time with a confusing OS error (e.g. "[Errno 49] can't assign requested address"). - Backend SyslogConfigRequest.port now enforces ge=1, le=65535 via Pydantic so out-of-range ports are rejected with HTTP 422 before the config is persisted or the listener is restarted. - IntegrationTab.tsx adds an inline validator that strictly matches a numeric string (rejecting "5140abc", "3.14", "-1", etc.) and the 1..65535 range; the input shows a red border with a localized hint and the save button is disabled while the value is invalid. - extractErrorMessage now flattens Pydantic 422 detail arrays ([{loc, msg, type}, ...]) into a readable "; "-joined string so scripted callers (or any caller that bypasses the UI validator) see the actual validation message instead of "[object Object]". - Adds zh-CN/en-US copy for detail.run.syslogPortError. Co-authored-by: Cursor <cursoragent@cursor.com>

duguwanglong and others added 30 commits May 8, 2026 16:00

chore(workflow): remove aisoc_mini references from network_alert_dedup

edd9d63

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(workflow): rename network_alert_dedup → http_alert_dedup

4f59e98

Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(workflow): remove branch_has_alerts and dedup_empty from htt…

248aa19

…p_alert_dedup filter_logs 直接连接 dedup_logs，dedup_logs 自然处理空列表输入 Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'dev' of github.com:AgentFlocks/flocks into feat/alert-d…

da58f7b

…edup-workflow

fix(generate_summary): remove duplicate '(cached)' label in summary t…

ce8ef77

…able Co-authored-by: Cursor <cursoragent@cursor.com>

duguwanglong and others added 16 commits May 12, 2026 10:10

Merge branch 'dev' of github.com:AgentFlocks/flocks into feat/alert-d…

2b0fd1b

…edup-workflow

docs: add contributing guide (#257)

b33ba18

Document the contribution workflow in English, link it from the READMEs, and allow docs markdown files to be tracked in git.

chore: bump package version to v2026.5.12

2415272

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'dev' of github.com:AgentFlocks/flocks into feat/tdp-ale…

e211501

…rt-pull-workflow

Merge branch 'dev' of github.com:AgentFlocks/flocks into feat/tdp-ale…

21ca0d7

…rt-pull-workflow

Merge branch 'dev' of github.com:AgentFlocks/flocks into feat/tdp-ale…

785147d

…rt-pull-workflow

duguwanglong requested a review from xiami762 May 14, 2026 03:46

duguwanglong and others added 2 commits May 14, 2026 14:11

Merge branch 'dev' of github.com:AgentFlocks/flocks into feat/tdp-ale…

7a66670

…rt-pull-workflow

xiami762 approved these changes May 14, 2026

View reviewed changes

xiami762 merged commit c0e0d28 into dev May 14, 2026

duguwanglong pushed a commit to DearEmma/flocks that referenced this pull request May 18, 2026

Merge pull request AgentFlocks#267 from AgentFlocks/feat/tdp-alert-pu…

bac9f4c

…ll-workflow feat/tdp alert pull workflow

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat/tdp alert pull workflow#267

feat/tdp alert pull workflow#267
xiami762 merged 48 commits into
devfrom
feat/tdp-alert-pull-workflow

duguwanglong commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

duguwanglong commented May 14, 2026

1. Alert dedup / triage workflow family

2. New syslog ingestion subsystem (flocks/ingest/syslog/)

3. Workflow execution store

4. WebUI

5. run_workflow tool hardening

6. Plugin / agent file watchers — stop self-sustaining reload loop

7. Startup hygiene

8. Log noise reduction

9. Windows installer

10. Channel message tool

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

2. New syslog ingestion subsystem (`flocks/ingest/syslog/`)

5. `run_workflow` tool hardening