feat/tdp alert pull workflow#267
Merged
Merged
Conversation
基于 aisoc_mini 的告警去重算法(URI 归一化 + 5-gram Shingling + Jaccard 相似度)移植为 flocks 工作流,包含 5 个线性节点:接收解析 → URI 归一化 → 去重键计算 → 分组去重 → 报告生成。支持严格字段精确匹配与 LSH 字段近似 匹配,输出唯一告警、重复告警及 Markdown 分析报告。 Co-authored-by: Cursor <cursoragent@cursor.com>
…stages) 重写工作流以完整对齐 aisoc_mini 的 LogProcessPipeline 四阶段主流程: - normalize_logs: TDP/Skyeye 字段映射 + 嵌套结构扁平化 - filter_logs: jsonLogic 规则过滤(扫描类/出站/非HTTP告警剔除) - dedup_logs: URI 归一化 + 5-gram Jaccard 相似度去重,生成 dedup_key - analyze_unique: 仅对唯一 dedup_key 调用 LLM is_attack 研判并回填重复告警 - generate_report: 输出四阶段统计 Markdown 报告及 JSONL 数据文件 Co-authored-by: Cursor <cursoragent@cursor.com>
…rallelize LLM analysis
修复审阅中发现的功能性偏差:
1. filter_logs: 完整对齐 LogFilter._get_tdp_process_type() 9 种 process_type 分类
- 修正:HTTP 非扫描告警无论方向(in/out/lateral)都需研判(之前错误地只保留
direction=in/none,会丢失大量本应分析的出站/横向 HTTP 告警)
- HTTP 协议判断兼容 application_layer_protocol/net_type/net_app_proto 多字段
- threat_type 取值与原版一致:tdp 取 threat_name,skyeye 取 threat_type
- 新增 _process_type、_need_analysis_attack_status 字段
- 统计中加入 filter_process_type_counts
2. analyze_unique: ThreadPoolExecutor 并行调用 LLM(与 LogAnalysis.process_parallel
一致,可通过 analyze_max_workers 配置),单条失败不影响其他
3. dedup_logs: 移除函数体内重复的 import hashlib as _hl
4. 各节点新增轻量 print 进度日志,便于大批量调试
5. workflow.md 同步修正过滤逻辑描述
Co-authored-by: Cursor <cursoragent@cursor.com>
…result routing
用显式 branch 节点替换代码内部的 if-else 路由,拓扑结构变为:
receive_alerts
→ branch_log_type (select_key: source_log_type)
label:"tdp" → normalize_tdp → filter_logs
label:"skyeye" → normalize_skyeye → filter_logs
→ branch_has_alerts (select_key: _has_alerts)
label:"true" → dedup_logs → analyze_unique → generate_report
label:"false" → generate_empty_report(无 LLM 调用的快速终点)
修复:
- branch_log_type 两条边均使用显式 label ("tdp"/"skyeye"),
使 lint 正确识别为互斥路径,消除 multi_incoming_no_join 报错
- false 分支独立终结于 generate_empty_report,
避免 generate_report 多入边 lint 错误
- 测试验证:TDP/Skyeye 字段映射、has_alerts true/false 四条路径均正确路由
Co-authored-by: Cursor <cursoragent@cursor.com>
…y to dedup-only
- 目录重命名 alert_dedup → network_alert_dedup
- workflow.json / meta.json name & id 改为 network_alert_dedup
- 移除 analyze_unique / generate_report / generate_empty_report / branch_has_alerts 四个节点
- 移除 LLM 研判相关参数(analyze_enabled / analyze_max_workers)
- dedup_logs 成为终点节点,直接输出 dict:
deduped_alerts(全量含 dedup_key)/ unique_alerts(唯一簇)/ stats / dedup_summary
- filter_logs 清理预填空结果逻辑,精简 outputs 传递
- workflow.md 更新为三阶段流程说明(归一化→过滤→去重)
- lint 无警告,模型验证通过
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…ert_dedup - filter_logs 新增 _has_alerts 输出 - 插入 branch_has_alerts 分支节点(select_key: _has_alerts) - true 路径 → dedup_logs(执行去重,终点) - false 路径 → dedup_empty(无告警,直接返回空 dict,终点) - 更新 workflow.md 流程图及节点说明 - lint 无警告,模型验证通过(8 节点 / 8 边) Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…p_alert_dedup filter_logs 直接连接 dedup_logs,dedup_logs 自然处理空列表输入 Co-authored-by: Cursor <cursoragent@cursor.com>
…_logs 使用 datasketch MinHash + MinHashLSH 替换原有 O(n²) 暴力 Jaccard: - NUM_PERM=128, MINHASH_SEED=2024(对齐 lsh_processor.py) - 共享 permutations,对齐 LSHProcessor 初始化方式 - query_most_similar:LSH 快速候选 → 精确 Jaccard 取最相似(对齐原版逻辑) - normalize_uri 对齐 utils.py(DATETIME/UUID/TRAVERSAL/NULL_REPLACED/HEXADECIMAL) - dedup_key = MD5(strict_text + '. ' + lsh_key),对齐 LogDedup._generate_dedup_key_text - 实测 453 条 TDP 过滤后告警 → 360 个唯一簇,压缩率 20.5% Co-authored-by: Cursor <cursoragent@cursor.com>
…dup_logs - All Chinese comments replaced with English - cluster_id (int) written to alert['_lsh_cluster_id'] so callers can inspect LSH cluster membership - lsh_cache now stores MinHash directly (not wrapped in dict) for cleaner re-ranking - summary string switched to English Co-authored-by: Cursor <cursoragent@cursor.com>
LSH state (lsh_index + lsh_cache) is now saved to and loaded from: ~/.flocks/data/workflows/http_alert_dedup/lsh_state_np128_th<threshold>.pkl - get_lsh_state_path(): builds path via Config().get_data_path() - load_lsh_state(): loads pickle, validates num_perm/threshold params match - dump_lsh_state(): saves after each run; mirrors LogDedup.dump() - Filename encodes NUM_PERM and threshold so param changes auto-reset state - stats gains lsh_total_clusters and lsh_state_path fields - Verified: run2 correctly loads run1 clusters and accumulates new ones Co-authored-by: Cursor <cursoragent@cursor.com>
Four correctness fixes plus the datasketch dependency:
1. Persist dedup_key_cache (the set of MD5 keys ever seen) into the same pkl
so that 'dedup_key_already_exists' is correct across restarts and batches.
Previously it was a per-run local set, breaking cross-batch dedup detection.
2. Atomic write: pickle to <state>.tmp, fsync, then os.replace() over the
target. A crash mid-write no longer corrupts the persisted state.
3. fcntl.flock(LOCK_EX) on a sibling .lock file serializes load+modify+dump
across concurrent workflow runs, eliminating the read-modify-write race
that previously caused lost updates.
4. dedup_enabled=False branch now sets _lsh_cluster_id=None so that downstream
consumers see a consistent schema regardless of dedup mode. In-batch
duplicate detection still works in this branch.
Also: warn when persisted cluster count exceeds 100k so operators can rotate
state. New stat: lsh_total_dedup_keys.
Verified end-to-end:
- cross-batch: same alert in run2 reports already_exists=True
- corruption: garbage pkl is auto-discarded, no crash
- concurrent: 5 parallel processes writing 10 alerts each end up with
50 persisted clusters (not lost to read-modify-write)
- disabled: _lsh_cluster_id=None, in-batch duplicate detected
Co-authored-by: Cursor <cursoragent@cursor.com>
Cross-platform support: - Replace bare 'import fcntl' with platform detection. - POSIX path keeps fcntl.flock(LOCK_EX). - Windows path uses msvcrt.locking(LK_LOCK, 1) on a single byte of the lock file, looping on OSError to wait beyond LK_LOCK's built-in 10-second cap. - Both branches release the lock symmetrically in release_lock(). Review fixes: - 'lsh_state_path', 'lsh_total_clusters', 'lsh_total_dedup_keys' are no longer written to stats when dedup_enabled=False (was misleading; no file is touched in that mode). Added 'dedup_state_persisted' bool for callers to branch on. - Cluster-overflow warning now also checks dedup_key_cache, since pkl size is dominated by per-key entries, not per-cluster. - normalize_uri's UUID regex now uses re.IGNORECASE so uppercase UUIDs map to the same 'UUID' placeholder (and thus the same LSH cluster) as lowercase ones. - Disabled-mode summary clearly states 'no state persisted, in-batch only'. Verified: cross-batch, disabled-mode stats, uppercase UUID dedup, concurrent writers. Co-authored-by: Cursor <cursoragent@cursor.com>
…; add alert_file support run_workflow.py: - When workflow is a JSON-decoded non-dict (e.g. the AI wraps the path in extra quotes producing '\"..path..\"'), treat the decoded string as a file path instead of crashing with 'str object has no attribute name'. - Add upfront dict sanity-check: if the dict has no 'start' key it's almost certainly the inputs dict passed to the wrong parameter; return a clear error instead of a Pydantic validation traceback. - Add a comment clarifying that the else-branch delivers a Path, not a str. workflow.py: - Add sync_workflows_from_filesystem() which app.py imports at startup. Triggers the one-time storage→filesystem migration then returns the count of discovered workflows so the startup log entry is informative. http_alert_dedup/workflow.json: - receive_alerts now accepts alert_file (absolute or ~-prefixed path to a JSON file) as an alternative to inlining the alerts list directly in inputs. This removes the need to manually load large log files via bash before running the workflow (the main friction observed in session 询问可用工作流). - sampleInputs annotated with a _comment_alert_file hint. Co-authored-by: Cursor <cursoragent@cursor.com>
…type errors Previous fix used 'parsed if isinstance(parsed, str) else raw' which silently passed list/int/bool results to Path(), giving a misleading 'file not found' error. Now three explicit branches: - parsed is str → double-encoded path; use decoded value as file path - parsed is None → JSONDecodeError; use raw string as file path (original behaviour) - parsed is other → clear error stating the unexpected JSON type Co-authored-by: Cursor <cursoragent@cursor.com>
…ID lookup Resolve the merge conflict in the `workflow` string parsing block by combining both sides: - HEAD: explicit type dispatch on json.loads() result (dict / str / None / other) with clear per-type error messages. - Upstream (d8b44e5): on JSONDecodeError, try read_workflow_from_fs() first so callers can pass a plain workflow ID instead of a full JSON string. Merged result: keep the type-dispatch structure from HEAD; in the `parsed is None` (JSONDecodeError) branch, attempt workflow-ID resolution via read_workflow_from_fs() before falling back to a file path. Co-authored-by: Cursor <cursoragent@cursor.com>
The get_state_paths() change to ~/.flocks/workspace/workflows/ was lost when merging dev into feat/alert-dedup-workflow (da58f7b). Re-apply: - Remove top-level `from flocks.config import Config` import - Replace Config().get_data_path() with Config().get_global().data_dir.parent / 'workspace' / 'workflows' - Update node description accordingly Also sync the running release snapshot and restart the service process so the /invoke API immediately uses the new path. Co-authored-by: Cursor <cursoragent@cursor.com>
…calls
workflow_center_invoke (POST /workflow-center/{id}/invoke) now records
execution stats via create_execution_record + _update_workflow_stats +
_record_execution_result, so that UI callCount / successCount / errorCount
counters are updated for every /workflow-center call — not only for
agent-driven /workflow/{id}/run calls.
Co-authored-by: Cursor <cursoragent@cursor.com>
New workflow: - Add alert_dedup_triage workflow: chains http_alert_dedup → tdp_alert_triage in a single pipeline; per-alert dedup via MinHash LSH, then LLM triage for first-seen unique alerts; duplicate alerts are annotated with cached triage results from a persisted triage_cache.pkl (FIFO LRU, max_dedup_keys cap). http_alert_dedup improvements: - Upgrade dedup_key_cache from set to ordered dict for FIFO LRU eviction. - Add max_dedup_keys input param (default 100 000); oldest entries are evicted before each persist so the state file stays bounded. - Use a monotonic cluster_id counter (_cid_box) instead of len(lsh_cache) so IDs never collide after eviction; remove evicted cluster_ids from the MinHashLSH index to prevent stale query results. - Expose lsh_max_dedup_keys / lsh_evicted_keys / lsh_evicted_clusters in stats. - Forward max_dedup_keys through normalize_* and filter_logs nodes. tdp_alert_triage: - Remove web-log detection branch; workflow now accepts HTTP alerts directly. - Add _strip_think() to all LLM nodes to strip <think>…</think> blocks. - Update workflow.md to reflect simplified node structure. Test tools (moved from scripts/ → tests/integration/): - test_http_alert_dedup_stream.py: streaming simulation for http_alert_dedup. - test_alert_dedup_triage_stream.py: end-to-end dedup → triage pipeline test. Co-authored-by: Cursor <cursoragent@cursor.com>
- Add flocks/syslog package (constants, parser, listener, manager) to
receive UDP/TCP syslog messages and trigger workflow execution via the
Channel-GatewayManager pattern
- Expose POST/GET /workflow/{id}/syslog-config API endpoints; start/stop
all enabled listeners in the FastAPI lifespan
- Add SyslogConfig TS interface and saveSyslogConfig/getSyslogConfig API
methods to the frontend workflow client
- Extract PublishSection, KafkaSection, SyslogSection from RunTab into a
new IntegrationTab component; wire it as a fourth "Integration" tab in
RightPanel so the Run tab only shows test-run and execution history
- Add tabIntegration i18n keys (zh-CN / en-US) and syslogActive badge
shown when listener is enabled and collapsed
Co-authored-by: Cursor <cursoragent@cursor.com>
Group data-ingestion triggers under a shared flocks/ingest/ namespace so that future connectors (kafka, webhook, …) live alongside syslog as sibling sub-packages rather than top-level packages. - Move flocks/syslog/* → flocks/ingest/syslog/* - Add flocks/ingest/__init__.py (empty package marker) - Update all internal imports (flocks.syslog → flocks.ingest.syslog) in listener.py, manager.py, __init__.py, server/app.py and server/routes/workflow.py Co-authored-by: Cursor <cursoragent@cursor.com>
receive_alerts node now handles three input sources in priority order:
1. syslog_message (injected by flocks syslog listener, RFC3164/5424)
- TDP alert JSON parsed from syslog_message.message
- Syslog metadata (hostname, severity, timestamp…) attached to alert
under _syslog_meta for traceability; does not affect dedup/triage logic
2. alerts (batch list, existing)
3. alert_file (local JSON path, existing)
Also adds syslog-config API documentation and example to workflow.md and
syslog_message sample to metadata.sampleInputs.
Co-authored-by: Cursor <cursoragent@cursor.com>
…move HTTP service dependency Replace direct HTTP POST calls to published service ports (19000/19001) with in-process embedded invocations using flocks.workflow.runner.run_workflow and flocks.workflow.fs_store.workflow_scan_dirs. - _invoke_workflow(): locates sub-workflow.json via workflow_scan_dirs(), then calls run_workflow() directly in the same process - Removes urllib/HTTP helpers, dedup_service_url, triage_service_url inputs - Adds dedup_workflow_id / triage_workflow_id inputs (default: http_alert_dedup, tdp_alert_triage) so callers can override the sub-workflow IDs if needed - http_alert_dedup and tdp_alert_triage no longer need to be published as running services for alert_dedup_triage to function Co-authored-by: Cursor <cursoragent@cursor.com>
…nglish - Node description fields: translated to English - generate_summary: verdict_label / stage_label dicts and summary_md template converted from Chinese to English - Code comments in receive_alerts and dedup_and_triage were already English - description_cn field and LLM prompt content in tdp_alert_triage left as-is (intentionally Chinese for LLM interaction) Co-authored-by: Cursor <cursoragent@cursor.com>
…able Co-authored-by: Cursor <cursoragent@cursor.com>
…report in batch mode Add branch_output_mode node after dedup_and_triage: - input_mode == 'syslog' -> direct_output: returns single-alert triage result directly (one-liner summary_report, no table, no file write) - input_mode == 'alerts' | 'alert_file' -> generate_summary: full statistics table + top-risk report written to pipeline_summary.md Both paths emit identical output field names for downstream compatibility. Co-authored-by: Cursor <cursoragent@cursor.com>
…iggered runs Syslog-triggered workflow runs were invisible in the WebUI history panel and not counted in the workflow stats card because `_trigger_workflow` called the runner directly, bypassing both the execution record write and the call counter update. Run records were never written, and `callCount` only ever reflected HTTP-triggered runs. This change makes the syslog path go through the same persistence helpers as the HTTP API path so all trigger sources are uniformly visible in the UI: - syslog/manager: wrap each run with create_execution_record / record_execution_result; write the same fields the WebUI consumes (`outputResults`, `errorMessage`, `duration`, `executionLog`, `currentNodeId`, `currentPhase`, `currentStepIndex`); tag input params with `_trigger=syslog` for source identification. - workflow/execution_store: move the workflow stats counter update into `record_execution_result` so every persisted result automatically increments callCount/successCount/errorCount/totalRuntime/avgRuntime, regardless of trigger source. - server/routes/workflow: remove the now-redundant `_update_workflow_stats` calls from the HTTP run/invoke paths and drop the orphaned helper to avoid double counting. Co-authored-by: Cursor <cursoragent@cursor.com>
… triage http_alert_dedup: - Replace branch_log_type + normalize_tdp + normalize_skyeye nodes with a single unified normalize node; each alert is individually classified via field signatures (nested net dict / behave_uuid for TDP; uri / vuln_name / attack_result for Skyeye) before field mapping, enabling mixed-type batches in a single invocation. - Update filter_logs to read per-alert _source_type set by normalize instead of the batch-level source_log_type, ensuring correct threat/process type classification in mixed batches. - source_log_type is retained as an optional batch-level fallback hint when per-alert detection is inconclusive. alert_dedup_triage: - Implement 4-priority source_log_type resolution in receive_alerts: (1) explicit input param, (2) syslog app_name/hostname hint, (3) JSON field auto-detection on first alert, (4) default 'tdp'. - Emit source_log_type_reason for traceability. tdp_alert_triage: - Extend receive_alert pick() lookups to cover flat TDP field names (net_real_src_ip, net_dest_ip, net_http_url, net_http_reqs_body, net_http_resp_body, net_http_status, etc.) alongside nested TDP and normalized schema, fixing empty-payload LLM analysis for pre-flattened TDP alerts arriving via alert_dedup_triage. Co-authored-by: Cursor <cursoragent@cursor.com>
…slog support - Add stream_alert_dedup workflow: streaming single-record deduplication via syslog UDP input (RFC3164/5424/iso3164), with MinHash LSH (128 perms, 5-gram shingles, Jaccard threshold 0.7). Output is original alert JSON enriched with dedup_key and is_duplicate fields, written to date-partitioned JSONL files (~/.flocks/workspace/workflows/stream_alert_dedup/YYYY-MM-DD/) with a max of 10,000 records per file and a timestamped file header. - Fix syslog parser to handle non-standard iso3164 format used by TDP devices: <PRI>ISO_TS HOSTNAME APP[PID]: msg (ISO 8601 timestamp, no RFC5424 version number). Adds _ISO3164_REST_RE regex, _parse_iso3164() handler, and auto- detection in parse_syslog(). Result carries format="iso3164" for traceability. Existing rfc3164 and rfc5424 parsing is unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
Add a long-running workflow that actively polls TDP attack alerts via the
tdp_log_search tool (TDP v3.3.10) instead of relying on syslog ingestion,
reusing the stream_alert_dedup pipeline (normalize → filter → MinHash LSH
dedup) at each iteration.
Key behavior:
- Single python node runs `while True` with configurable pull_interval_s
(default 60s), max_iterations and max_runtime_s for stop control;
node_timeout_s is raised to 30 days so the node can run indefinitely.
- Time cursor at ~/.flocks/workflows/tdp_alert_pull_dedup/cursor.json is
persisted atomically each round; restarts resume from next_from with no
gaps and no overlap. TDP failures do not advance the cursor so the same
window is retried on the next round.
- Enriched alerts are appended to JSONL files under
~/.flocks/workflows/tdp_alert_pull_dedup/<YYYY-MM-DD>/alerts_NNN.jsonl,
with a file_header line and a 10,000-records-per-file rollover.
- LSH state is persisted independently from stream_alert_dedup at
~/.flocks/workflows/tdp_alert_pull_dedup/lsh_state_np128_th*.pkl with
atomic write + file lock and FIFO LRU eviction (max_dedup_keys).
- Robust TDP response unwrapping accepts list / {"log":[...]} /
{"list":[...]} / {"data":[...]} / nested {data: {list: [...]}} shapes.
Layout follows existing workflows (workflow.json + workflow.md); the
companion _node_pull_dedup_loop.py source file plus _build_workflow.py
script keep the embedded node code readable and regenerable. Both
underscore-prefixed files are ignored by the workflow scanner which only
picks up workflow.json.
Verified with Workflow.from_dict + workflow_lint (0 issues) and
compile_workflow_file inside the flocks venv.
Co-authored-by: Cursor <cursoragent@cursor.com>
Route installer-created desktop and Start menu launchers through an elevated PowerShell helper so Windows prompts for UAC before starting Flocks. This keeps the shared flocks.cmd wrapper as the real entrypoint while preventing updater failures caused by insufficient permissions. Co-authored-by: Cursor <cursoragent@cursor.com>
…aging (#256) - Pin CPython 3.12.12 in versions.manifest.json with standalone archive metadata - Download and extract install-only tarball in build-staging.ps1 (mirror env support) - Configure PATH and UV_* env vars in bootstrap-windows.ps1 for bundled Python - Assert bundled Python in Windows packaging CI workflows - Extend manifest and bootstrap script contract tests
_http_session_send was posting to /api/channel/session-send without an Authorization header, so the server-side auth middleware rejected it as a non-browser request and returned HTTP 401. Read the API token from the secret manager (server_api_token) and inject it as Authorization: Bearer <token>. If no token is configured locally and the server still returns 401, silently fall back to the in-process delivery path so the tool keeps working in unauthenticated setups. Co-authored-by: Cursor <cursoragent@cursor.com>
…back - Import API_TOKEN_SECRET_ID from flocks.server.auth instead of hardcoding "server_api_token" so the client and server-side auth middleware cannot drift out of sync and silently start failing 401. - Refine the comment on the 401 fallback: distinguish "client did not obtain a token" from "server has no token configured", and make it explicit that when we DID send a token but it was rejected we do not fall back, surfacing the server detail so misconfiguration is visible. Co-authored-by: Cursor <cursoragent@cursor.com>
Document the contribution workflow in English, link it from the READMEs, and allow docs markdown files to be tracked in git.
) * feat(server,webui): phased startup, route timing, session/tools UX - Run server startup in timed phases; defer non-critical work to background tasks and cancel them on shutdown; log blocking startup duration - Add duration logging for session list, tools list/refresh, task dashboard APIs - Export workflow filesystem sync helper for startup - WebUI: stop auto-selecting first session; enable live/SSE only with session id; reorder tools refresh vs list; reduce loading flicker on polling hooks; drop redundant task queue refetch from Task page - TUI: inline tsconfig compiler options (drop extends) - Add useTools tests and adjust Session page tests * fix(mcp): owner task for session lifecycle and serialized RPC - Run remote/stdio MCP I/O in one asyncio task with a command queue so list/call/read/disconnect share the same ClientSession context - Use streamable_http_client transport factory pattern; improve cleanup and pending-command failure handling on disconnect - Update MCP client/SSE tests for owner-task wiring; add same-task assertions - Simplify browser-use SKILL CDP fallback instructions * fix(mcp): normalize client timeout when config is missing or invalid Default to 30s for None/non-numeric/<=0 values so connect wait_for uses a finite bound; add regression test for timeout=None. * feat(server): log slow route timings at INFO; fix(mcp) stdio stderr lifecycle - Add log_route_timing helper: fast requests emit DEBUG, >=300ms emit INFO - Wire session list, task dashboard routes, and tools list/refresh to helper - MCP: wrap stdio stderr TemporaryFile in context manager; drop unused _streams_context; add regression test ensuring stderr file closes on failure - Clarify SSE-first shutdown comment in app lifespan
Co-authored-by: Cursor <cursoragent@cursor.com>
…kills (#263) - browser-use: document product browser experience and link new reference - Add browser-experience-in-skill.md for workflow and templates - web2cli: output under outputs/web2cli, iteration step 11, auth troubleshooting - cli-in-skill: browser-workflow.md scope and writing guide
Plugin/agent/skill file watchers used watchdog's ``on_any_event`` and re-fired on ``opened``/``closed``/``closed_no_write`` events. Each reload opens every YAML/Python file, which retriggers the watcher and creates a self-sustaining loop that scanned plugins every 2-3 s, kept ToolRegistry and Agent caches thrashing, and silently amplified syslog-driven load. Only react to actual content events (modified/created/deleted/moved) and ignore __pycache__/dotfile noise. Workflow execution path: - Serialize per-workflow stats RMW with an asyncio.Lock so concurrent syslog/HTTP completions no longer drop callCount/successCount. - Run Recorder audit + history trim as background tasks so the syslog dispatcher releases its semaphore slot without waiting on SQLite. - Tolerate a missing/non-dict execution record in step progress and finish/error paths instead of crashing on ``NoneType.update``. - Add done_callback cleanup so ``_active_workflow_executions`` cannot leak when a run is cancelled before reaching its own ``finally``. Startup hygiene: - Skip built-in hook registration safely when ``config.memory`` is None. - Migrate-legacy-sessions: drop the bogus ``dict`` model arg to ``Storage.get`` and make ``Storage.get/list_entries`` tolerate non-Pydantic types via ``json.loads`` fallback. - Provider auto-register: opencode entry referenced the missing ``FlocksCompatProvider`` symbol on the submodule; use the real ``OpenCodeProvider`` class. - command_loader: fix ``flocks_global_dir`` NameError typo. - mcp: demote benign ``mcp.already_initialized`` warn to debug. Logs noise: - ``plugin.tool.duplicate``: silently skip idempotent re-scans of the same plugin source; only warn on genuine cross-source collisions. - ``agent.toolset.tool_missing``: demote to debug for built-in agents that declare optional ``lsp_*``/``ast_grep_search`` tools. Co-authored-by: Cursor <cursoragent@cursor.com>
…riage - Remove four obsolete workflows that have been superseded by the current alert-triage pipeline: alert_dedup_triage, http_alert_dedup, stream_alert_dedup, tdp_alert_pull_dedup. - Rewrite tdp_alert_triage workflow (definition + docs) as the canonical "NDR/TDP alert investigation" workflow, replacing the previous HTTP-log oriented variant. Co-authored-by: Cursor <cursoragent@cursor.com>
…e, bind-error reporting
## fix: syslog ingest — replace semaphore+create_task with a bounded worker pool
The previous design drained the bounded queue immediately by spawning an
`asyncio.Task` per message; the semaphore was only acquired inside those
tasks, so pending coroutines could grow without bound under a syslog flood.
Replace with a fixed pool of `_MAX_CONCURRENT_EXECUTIONS` `_worker_loop`
coroutines that each `await queue.get()` serially. The pool size is now the
*only* concurrency knob; total in-flight workflow runs is exactly the pool
size, making backpressure a hard invariant rather than a soft hint.
## fix(watcher): check dest_path on atomic-save events in tool/agent/skill watchers
Editors (vim, VS Code "useAtomicSave", …) persist edits by writing a sibling
temp file then renaming it onto the real target. watchdog reports this as a
`moved` event where `src_path` is the throwaway name and `dest_path` is the
actual `tool.yaml` / `agent.yaml` / `SKILL.md`. Filtering only by `src_path`
(the previous behaviour) silently skipped these valid updates.
Extract the path-matching logic into module-level predicates
`_tool_event_should_reload`, `_agent_event_should_reload`, and
`_skill_event_should_reload` that inspect both endpoints before deciding to
skip an event.
## fix(syslog): save-config endpoint surfaces bind failures synchronously
`restart_workflow` now blocks (up to `_BIND_WAIT_TIMEOUT_S = 3 s`) until the
socket either binds successfully or raises `OSError`. Bind failures are
recorded in `_listener_status` with `state="failed"` and propagated back to
`POST /api/workflow/{id}/syslog-config` as `409 Conflict`, so the UI sees an
actionable error instead of a false "Listening" badge.
Add `GET /api/workflow/{id}/syslog-status` to expose the runtime state
(binding / listening / failed / stopped, queue depth, worker count)
independently of the persisted config.
Update `IntegrationTab.tsx` to derive the "Listening" indicator from the
runtime `syslogStatus.state` rather than the saved `enabled` field, add an
amber "binding…" indicator, red error banners on failure, and a live
queue-depth readout while listening.
## test: add pytest-runnable regression tests for all three areas
- `tests/ingest/test_syslog_manager_backpressure.py` — worker pool bounds
in-flight dispatches, queue drops on full, stop_workflow cancels pool cleanly
- `tests/ingest/test_syslog_manager_bind_failure.py` — restart_workflow reports
state="failed" on port conflict; state="stopped" when disabled
- `tests/tool/test_watcher_atomic_save.py` — all three watcher predicates
accept dest_path on rename and reject unrelated / hidden / pycache paths
15 new tests, all passing.
Co-authored-by: Cursor <cursoragent@cursor.com>
xiami762
approved these changes
May 14, 2026
duguwanglong
pushed a commit
to DearEmma/flocks
that referenced
this pull request
May 18, 2026
…ll-workflow feat/tdp alert pull workflow
duguwanglong
added a commit
to DearEmma/flocks
that referenced
this pull request
May 18, 2026
Follow-up to PR AgentFlocks#267: the syslog config form already validated host but accepted any integer for port, allowing values such as 0, 22, or 99999 to be saved and only blow up at bind time with a confusing OS error (e.g. "[Errno 49] can't assign requested address"). - Backend SyslogConfigRequest.port now enforces ge=1, le=65535 via Pydantic so out-of-range ports are rejected with HTTP 422 before the config is persisted or the listener is restarted. - IntegrationTab.tsx adds an inline validator that strictly matches a numeric string (rejecting "5140abc", "3.14", "-1", etc.) and the 1..65535 range; the input shows a red border with a localized hint and the save button is disabled while the value is invalid. - extractErrorMessage now flattens Pydantic 422 detail arrays ([{loc, msg, type}, ...]) into a readable "; "-joined string so scripted callers (or any caller that bypasses the UI validator) see the actual validation message instead of "[object Object]". - Adds zh-CN/en-US copy for detail.run.syslogPortError. Co-authored-by: Cursor <cursoragent@cursor.com>
duguwanglong
added a commit
to DearEmma/flocks
that referenced
this pull request
May 18, 2026
Follow-up to PR AgentFlocks#267: the syslog config form already validated host but accepted any integer for port, allowing values such as 0, 22, or 99999 to be saved and only blow up at bind time with a confusing OS error (e.g. "[Errno 49] can't assign requested address"). - Backend SyslogConfigRequest.port now enforces ge=1, le=65535 via Pydantic so out-of-range ports are rejected with HTTP 422 before the config is persisted or the listener is restarted. - IntegrationTab.tsx adds an inline validator that strictly matches a numeric string (rejecting "5140abc", "3.14", "-1", etc.) and the 1..65535 range; the input shows a red border with a localized hint and the save button is disabled while the value is invalid. - extractErrorMessage now flattens Pydantic 422 detail arrays ([{loc, msg, type}, ...]) into a readable "; "-joined string so scripted callers (or any caller that bypasses the UI validator) see the actual validation message instead of "[object Object]". - Adds zh-CN/en-US copy for detail.run.syslogPortError. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1. Alert dedup / triage workflow family
alert_dedup→network_alert_dedup→http_alert_dedupprogression: full 4-stageLogProcessPipelineported fromaisoc_mini, then trimmed to a dedup-only variant with explicit branch nodes for log-type and filter routing.dedup_logs, with cross-platform on-disk LSH state, file-locking, and a clearer stats panel.stream_alert_dedupworkflow + ISO-3164 syslog parsing for real-time streaming input.tdp_alert_pull_dedupworkflow for polling TDP API in a loop with built-in dedup.alert_dedup_triagecombines dedup with triage; supports both syslog real-time mode (no summary, low latency) and batch mode (full report). Sub-workflow calls are now embedded via the engine, removing the previous HTTP-service dependency.tdp_alert_triagerewritten as the canonical "NDR/TDP alert investigation" workflow (replaces the older HTTP-log-only variant); supports mixed TDP/Skyeye batches and flat-format TDP triage; finalgenerate_summaryno longer prints the duplicated(cached)label.alert_dedup_triage,http_alert_dedup,stream_alert_dedup,tdp_alert_pull_dedup.2. New syslog ingestion subsystem (
flocks/ingest/syslog/)on_msgcallback so the UDP protocol layer doesn't spawn anasyncio.Taskper datagram (preserves backpressure).flocks/syslog → flocks/ingest/syslogto make room for other ingestion sources.3. Workflow execution store
callCount/successCount/errorCount/avgRuntime) now serialized with anasyncio.Lockso concurrent completions no longer lose increments.Recorder.record_workflow_executionand history-trim run as background tasks — the syslog dispatcher releases its concurrency slot immediately./api/workflow/{id}/historyendpoint switched from N+1 reads to a singleStorage.list_entriesquery (response time dropped from ~180 s to sub-second on large stores).NoneType.updatecrashes when trim races with a long-running step)._active_workflow_executionsis cleaned up viatask.add_done_callbackin addition tofinally, eliminating registry leaks on cancellation paths.4. WebUI
RunTabrefactored andRunTab.test.tsxadded.api/workflow.tsexposes syslog + integration endpoints;workflow.jsonlocales (en/zh) added.5.
run_workflowtool hardeningalert_filemode, and split type branches with clear per-type errors.6. Plugin / agent file watchers — stop self-sustaining reload loop
watchdog.on_any_eventwas firing foropened/closed/closed_no_writeevents; every reload re-opens YAML/Python files and re-triggered the watcher every 2-3 s, keepingToolRegistryand theAgentcache thrashing.modified/created/deleted/movedand ignore__pycache__and dotfile noise.7. Startup hygiene
config.memorydoesn't crash hook registration.Storage.get / list_entriestolerate non-Pydantic types (fixes legacy session migration crashing ondict.model_validate_json).opencodeentry now references the realOpenCodeProvider(was the missingFlocksCompatProvideralias).command_loaderflocks_global_dirNameError typo.mcp.already_initializedwarn to debug.8. Log noise reduction
plugin.tool.duplicateonly warns on genuine cross-source collisions; idempotent re-scans of the same source are silent.agent.toolset.tool_missingdemoted to debug for built-in agents declaring optionallsp_*/ast_grep_searchtools.9. Windows installer
python-build-standaloneruntime bundled in the installer staging directory.10. Channel message tool
API_TOKEN_SECRET_IDand clarify the fallback path.