fix/stream alert dedup memory leak#273
Merged
Merged
Conversation
…lert_dedup Root causes identified: 1. Large alert lists (enriched_alerts, unique_alerts, …) were written verbatim into every workflow_execution SQLite row — both in outputResults and in each executionLog snapshot emitted by _on_step_complete — duplicating megabytes of data that is already persisted to JSONL on disk. 2. The problem affected both the syslog trigger path (manager.py) and the HTTP background execution path (routes/workflow.py), but only the former had a local fix applied; the HTTP path was left unprotected. Changes: - execution_store.py: add compact_outputs_for_storage() and compact_history_for_storage() as the canonical compact API. Lists whose key belongs to DEFAULT_LARGE_LIST_KEYS and whose length exceeds DEFAULT_COMPACT_SIZE_THRESHOLD (100) are replaced with a lightweight _<key>_count integer; small lists and unknown keys pass through unchanged, so ordinary metadata arrays are never silently dropped. - syslog/manager.py: remove the local _compact_outputs/_compact_history implementation and import the shared helpers from execution_store. - routes/workflow.py: apply compact_outputs_for_storage to outputResults and compact_history_for_storage to executionLog in both the background execution path and the workflow-center invoke path. Also compact each step dict inside _on_step_complete before appending to step_history so that every intermediate _write_progress snapshot stays bounded. - tests/workflow/test_execution_store_compact.py: 14 new unit tests covering the size-threshold guard, non-mutation contract, defensive pass-through for non-dict inputs, and the end-to-end size reduction (>1 000x on a 10K-alert payload). Co-authored-by: Cursor <cursoragent@cursor.com>
inputParams was written verbatim into every workflow_execution SQLite row.
For HTTP /run batch calls that pass a large alert list as an input
parameter (e.g. {"raw_alerts": [...10k items...]}) this inflates each row
by the same order of magnitude as the uncompacted outputResults bug fixed
in the previous commit.
Changes:
- execution_store.py: create_execution_record() now passes input_params
through compact_outputs_for_storage() before building the record, so
known large-list keys (raw_alerts, enriched_alerts, etc.) are replaced
with lightweight _<key>_count integers. Scalar fields and unknown keys
are preserved unchanged, so audit / replay use-cases are not affected.
- tests/workflow/test_execution_store_compact.py: two new tests pin the
expected behaviour — "alerts" (not in DEFAULT_LARGE_LIST_KEYS) passes
through unchanged, while "raw_alerts" (in the default set) is compacted.
Co-authored-by: Cursor <cursoragent@cursor.com>
Five issues raised in code review, all fixed in this commit:
1. Docstring/behaviour mismatch — alerts key
create_execution_record()'s docstring previously used {"alerts":[…10k]}
as the example of what gets compacted. "alerts" is NOT in
DEFAULT_LARGE_LIST_KEYS, so the row would silently stay large. The
docstring now cites "raw_alerts" (which IS in the set) and adds an
explicit note that keys outside the default set are stored verbatim.
compact_outputs_for_storage() gains the same clarification so callers
can easily see the full list of compacted keys.
2. workflow_center_invoke — executionLog path
The invoke path proxies to an external service and never populates
executionLog locally, so no data was at risk. Added an explanatory
comment and an explicit compact_history_for_storage() call on the
executionLog field as a forward-compatible guard against future code
that might populate it before the final write.
3. tuple sequences not compacted
compact_outputs_for_storage() checked isinstance(v, list), missing
tuple values that some serialisation paths may produce. Changed to
isinstance(v, (list, tuple)). New test test_compact_outputs_compacts_
tuple_sequences() pins this behaviour.
4. Unused import pytest
Removed the bare "import pytest" from test_execution_store_compact.py;
no pytest.raises / @pytest.mark usages exist in the file.
5. Observability note
compact_outputs_for_storage() docstring now explicitly states that
compacted keys are replaced with _<key>_count and that callers who
need the full list contents must read from the JSONL files written by
the workflow.
Co-authored-by: Cursor <cursoragent@cursor.com>
xiami762
approved these changes
May 15, 2026
duguwanglong
added a commit
to DearEmma/flocks
that referenced
this pull request
May 18, 2026
Five issues raised in code review, all fixed in this commit:
1. Docstring/behaviour mismatch — alerts key
create_execution_record()'s docstring previously used {"alerts":[…10k]}
as the example of what gets compacted. "alerts" is NOT in
DEFAULT_LARGE_LIST_KEYS, so the row would silently stay large. The
docstring now cites "raw_alerts" (which IS in the set) and adds an
explicit note that keys outside the default set are stored verbatim.
compact_outputs_for_storage() gains the same clarification so callers
can easily see the full list of compacted keys.
2. workflow_center_invoke — executionLog path
The invoke path proxies to an external service and never populates
executionLog locally, so no data was at risk. Added an explanatory
comment and an explicit compact_history_for_storage() call on the
executionLog field as a forward-compatible guard against future code
that might populate it before the final write.
3. tuple sequences not compacted
compact_outputs_for_storage() checked isinstance(v, list), missing
tuple values that some serialisation paths may produce. Changed to
isinstance(v, (list, tuple)). New test test_compact_outputs_compacts_
tuple_sequences() pins this behaviour.
4. Unused import pytest
Removed the bare "import pytest" from test_execution_store_compact.py;
no pytest.raises / @pytest.mark usages exist in the file.
5. Observability note
compact_outputs_for_storage() docstring now explicitly states that
compacted keys are replaced with _<key>_count and that callers who
need the full list contents must read from the JSONL files written by
the workflow.
Co-authored-by: Cursor <cursoragent@cursor.com>
duguwanglong
pushed a commit
to DearEmma/flocks
that referenced
this pull request
May 18, 2026
…dedup-memory-leak fix/stream alert dedup memory leak
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(workflow): compact inputParams before persisting execution record