feat: add red-blue correlation engine and learning system for investigation coverage by l50 · Pull Request #26 · dreadnode/ares

l50 · 2026-01-11T01:09:19Z

Key Changes:

Introduced a red-blue correlation engine to assess detection coverage and gaps
Added investigation result persistence and learning system with SQLite backend
Implemented learning tools for querying past investigations and query effectiveness
Enhanced blue agent workflow with query rate limiting, improved evidence handling, and robust timeouts

Added:

Red-Blue Correlation Engine: New src/ares/core/correlation.py parses red and blue reports, correlates activities to detections, generates gap/coverage reports, and outputs markdown
Investigation Persistence: src/ares/core/persistence.py provides SQLite-backed storage for investigation results, query effectiveness stats, and similarity lookup
Learning Tools: src/ares/tools/blue/learning.py exposes tools for querying historical investigations, effective queries, and false positive patterns for agent learning
Query Resilience: src/ares/core/query_resilience.py adds automatic retry, time range reduction, and chunking for large log queries
Remote Command Execution: src/ares/core/remote.py enables AWS SSM-based remote execution for red team tools, with robust SSO credential validation
Comprehensive red-blue correlation and learning tests: tests/test_correlation.py, tests/test_persistence.py, tests/test_learning.py, tests/test_query_resilience.py
Query template tools: src/ares/tools/blue/query_templates.py provides pre-built LogQL queries mapped to MITRE techniques
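
For reviewers who want a mental model of the correlation step, here is a minimal sketch. It assumes activities and detections are matched on a shared MITRE technique ID; the class and function names (`RedActivity`, `BlueDetection`, `correlate`, `to_markdown`) are hypothetical and are not taken from `src/ares/core/correlation.py`, which may match on other fields as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedActivity:
    """One red team action tagged with a MITRE technique ID (hypothetical shape)."""
    technique_id: str
    description: str


@dataclass(frozen=True)
class BlueDetection:
    """One blue team finding tagged with a MITRE technique ID (hypothetical shape)."""
    technique_id: str
    alert_name: str


def correlate(activities, detections):
    """Split red activities into detected and missed, keyed on technique ID."""
    detected_ids = {d.technique_id for d in detections}
    detected = [a for a in activities if a.technique_id in detected_ids]
    gaps = [a for a in activities if a.technique_id not in detected_ids]
    # Coverage is the fraction of red activities with at least one matching detection.
    coverage = len(detected) / len(activities) if activities else 0.0
    return {"detected": detected, "gaps": gaps, "coverage": coverage}


def to_markdown(result):
    """Render a short markdown gap report from a correlate() result."""
    lines = [f"Coverage: {result['coverage']:.0%}", "", "Gaps:"]
    lines += [f"- {a.technique_id}: {a.description}" for a in result["gaps"]]
    return "\n".join(lines)
```

Matching on technique ID alone is the simplest possible join; a real engine would likely also consider hosts and time windows.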

Changed:

Blue agent now enforces strict query rate limiting (default 5 per investigation), duplicate query detection, and improved evidence extraction
Investigation orchestrator adds watchdog thread for hard timeouts and generates partial reports on timeout
Taskfile and documentation updated for new log/coverage workflow, reduced default max steps, and log management commands
Agent system instructions and investigation prompt templates improved for IOC extraction and anti-loop guidance
Red team agent tools now execute on remote Kali via SSM, with robust error handling and output parsing
Blue agent now records executed queries and integrates query effectiveness into persistence/learning
Added boto3 as a required dependency
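
The query limiting and duplicate detection described in this list could look roughly like the sketch below. `QueryBudget` and its methods are hypothetical names used for illustration only; the default of 5 comes from the description above, and the whitespace normalization is an assumption about how duplicates might be recognized.

```python
class QueryBudget:
    """Per-investigation query limiter with duplicate detection (illustrative only)."""

    def __init__(self, max_queries=5):
        self.max_queries = max_queries
        self.seen = set()
        self.count = 0

    def reset(self):
        """Clear tracking at the start of a new investigation."""
        self.seen.clear()
        self.count = 0

    def check(self, query):
        """Return (allowed, reason); only allowed queries count against the budget."""
        # Assumption: queries differing only in whitespace are treated as duplicates.
        normalized = " ".join(query.split())
        if normalized in self.seen:
            return False, "duplicate query"
        if self.count >= self.max_queries:
            return False, "query limit reached"
        self.seen.add(normalized)
        self.count += 1
        return True, "ok"
```

A wrapper around each query tool would call `check()` first and return the reason to the agent instead of executing the query when it is refused.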

Removed:

Legacy local subprocess usage for red team tools; all execution now via remote SSM executor
Unused aiobotocore/aioitertools dependencies from lockfile to resolve S3 compatibility

…n query templates

**Added:**

- Introduced `src/ares/core/remote.py` for remote command execution on the Kali attack box via AWS SSM, including SSO credential validation, error handling, and a `run_remote` convenience function
- Added `QueryTemplateTools` to `src/ares/tools/blue/query_templates.py`, providing MITRE-mapped LogQL query templates for detecting red team attack patterns and AD attacks
- Registered `QueryTemplateTools` in blue team toolset and included in agent factory for investigation agent
- Added `boto3>=1.42.25` as a dependency for AWS API integration

**Changed:**

- Updated all red team network toolsets in `src/ares/tools/red/network.py` to execute commands remotely via SSM instead of subprocess, centralizing command execution and error handling
- Refactored Taskfile and documentation defaults: lowered polling mode steps to 50 and once mode steps to 15 for agent timeouts; clarified timeout behaviors in `README.md` and `docs/taskfile_usage.md`
- Updated AWS region defaults in `Taskfile.yaml` from `us-west-2` to `us-west-1`
- In red team orchestrator, added fail-fast SSO credential validation before starting operations
- Improved admin access finding validation in red team reporting to reject error-containing results and require success indicators
- Improved blue agent orchestrator with a hard signal-based timeout and robust MCP connection handling
- Registered new blue team tools and query templates in import/export lists
- Updated dependency and lock files (`pyproject.toml`, `uv.lock`) to add and pin `boto3` and compatible AWS packages, and remove unused aiobotocore/aioitertools
- Cleaned up subprocess error handling in red team tools, removing timeouts and local file usage in favor of remote SSM execution

**Removed:**

- Eliminated all local subprocess execution for red team operations in favor of SSM-based remote execution
- Removed unused and incompatible `aiobotocore` and `aioitertools` packages from lock file
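
As a rough illustration of SSM-based remote execution, the sketch below sends a shell command and polls for its result. It assumes `ssm_client` behaves like `boto3.client("ssm")` and uses that client's `send_command` and `get_command_invocation` calls; the function name `run_remote_sketch`, its parameters, and the returned dict are hypothetical and do not describe the actual `run_remote` in `src/ares/core/remote.py`.

```python
import time

# Statuses after which SSM will not change the invocation any further.
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def run_remote_sketch(ssm_client, instance_id, command, timeout_s=60.0, poll_s=2.0):
    """Run one shell command on an SSM-managed instance and wait for the result.

    ssm_client is assumed to behave like boto3.client("ssm").
    """
    sent = ssm_client.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [command]},
    )
    command_id = sent["Command"]["CommandId"]

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Sleep before polling: the invocation may not be visible immediately.
        time.sleep(poll_s)
        inv = ssm_client.get_command_invocation(
            CommandId=command_id, InstanceId=instance_id
        )
        if inv["Status"] in TERMINAL_STATUSES:
            return {
                "status": inv["Status"],
                "exit_code": inv.get("ResponseCode"),
                "stdout": inv.get("StandardOutputContent", ""),
                "stderr": inv.get("StandardErrorContent", ""),
            }
    raise TimeoutError(f"SSM command {command_id} did not finish in {timeout_s}s")
```

Passing the client in as an argument keeps credential handling (for example the SSO validation mentioned above) outside the function and makes it easy to substitute a fake client in tests.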

…igation agent

**Added:**

- Introduced `WatchdogTimer` class for enforcing hard investigation timeout using a background thread, enabling forced exit and partial report generation even if the event loop is blocked

**Changed:**

- Replaced Unix-only signal-based hard timeout with cross-platform watchdog thread in `InvestigationOrchestrator`
- Updated timeout handling logic to use the new watchdog and improved partial report generation upon timeout
- Cleaned up code by removing signal handler setup and exception raising for timeout, delegating forced exit to the watchdog
- Adjusted logging to reflect new watchdog mechanism and clarify timeout events

**Removed:**

- Removed dependency on `signal` module and associated signal handler logic for timeouts
- Eliminated `InvestigationTimeoutError` usage and related exception handling from the orchestration flow
- Removed code for restoring old signal handlers and alarm cleanup, as they're no longer needed
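
The watchdog idea can be sketched with a daemon `threading.Timer`. This is an illustration under assumptions, not the `WatchdogTimer` added in this commit: the class name `WatchdogSketch`, the `on_timeout` callback, and the `fired` event are all hypothetical. In a real orchestrator the callback would presumably write the partial report and then force the process to exit.

```python
import threading


class WatchdogSketch:
    """Run a callback from a background thread unless cancelled in time."""

    def __init__(self, timeout_s, on_timeout):
        self._on_timeout = on_timeout
        self.fired = threading.Event()
        self._timer = threading.Timer(timeout_s, self._fire)
        # Daemon thread: a pending watchdog never keeps the process alive.
        self._timer.daemon = True

    def _fire(self):
        # Runs in the timer thread, so it works even if the main loop is blocked.
        self._on_timeout()
        self.fired.set()

    def start(self):
        self._timer.start()

    def cancel(self):
        """Call when the investigation finishes normally."""
        self._timer.cancel()
```

Unlike `signal.alarm`, a timer thread does not depend on Unix signals or on the main thread returning to the interpreter loop at a convenient point, which matches the cross-platform motivation stated above.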

…vestigation flow

**Added:**

- Introduced /logs/ directory for agent log files and updated .gitignore to exclude it
- Added log directory configuration and automatic log file creation for blue and red team tasks in Taskfile.yaml
- Implemented Taskfile log management tasks: list, tail (latest/all/blue/red), and clean
- Added log management usage docs to `docs/taskfile_usage.md`
- Created timeline event from alert at investigation start for improved reporting
- Added `reset_query_tracking()` and query counting utilities to blue_factory to enforce query and tool call limits per investigation
- Wrapped Grafana MCP query tools with rate limiting and duplicate query detection
- Added max queries/tool calls stop conditions to investigation agent
- Blue `record_evidence()` tool now resolves and caches MITRE technique names/tactics
- Red agent event logging now debounces rapid/duplicate events for cleaner logs
- Red team `secretsdump` tool now includes SMB connectivity check, dc_ip param, and connection timeouts

**Changed:**

- Default max_steps for blue investigation agent lowered from 150 to 30 for tighter control
- Updated all relevant blue and red team tasks to log to per-run logfiles in /logs/
- Blue team investigation flow now enforces strict query and tool call limits; agent is forced to complete if limits are hit
- Blue `complete_investigation()` tool now auto-extracts recommendations from alert annotations if none provided, generates fallback synopsis from evidence, and logs more completion details
- Enhanced evidence recording: technique metadata resolved and timeline event auto-added from alert
- Initial alert prompt and system instructions templates now emphasize query limits, correct IOC extraction, and completion criteria; anti-patterns highlighted
- Investigation docs and usage updated to clarify new stop conditions, log management, and completion requirements
- Improved blue investigation docs and templates to stress the importance of IOC extraction, evidence recording, and attack synopsis requirements

**Removed:**

- Removed unused/obsolete warnings and manual validations from blue completion tool
- Legacy query loop detection logic replaced by new global query/tool call limiters

…and query resilience

**Added:**

- Introduced a Red-Blue Correlation Engine for mapping red team activities to blue team detections, generating coverage metrics and detailed markdown reports (`src/ares/core/correlation.py`)
- Implemented a persistence layer for storing investigation results, tracking query effectiveness, and similarity-based lookup for new alerts (`src/ares/core/persistence.py`)
- Added query resilience module to provide automatic retry, time range reduction, and chunking for large queries to Loki/Prometheus backends (`src/ares/core/query_resilience.py`)
- Added `LearningTools` agent toolset to expose past investigation data, effective queries, false positive patterns, and statistics to the agent (`src/ares/tools/blue/learning.py`)
- Introduced workflow for generating and updating coverage badge in CI (`.github/workflows/coverage-badge.yaml`)
- Added static badge for code coverage to repo (`.github/badges/coverage.svg`)
- Added comprehensive test suites for correlation, learning, persistence, and query resilience modules (`tests/test_correlation.py`, `tests/test_learning.py`, `tests/test_persistence.py`, `tests/test_query_resilience.py`)

**Changed:**

- Extended `InvestigationOrchestrator` to persist all completed, escalated, timed out, and failed investigations for later learning and analysis
- Updated query tool wrapping in `blue_factory.py` to integrate rate limiting, duplicate detection, and resilient execution via the new resilience module
- Added `LearningTools` to agent toolset for blue investigations
- Updated `.pre-commit-config.yaml` to exclude `tests/` from mypy type checks
- Modified test workflow to output coverage as XML and upload coverage artifact for badge generation (`.github/workflows/tests.yaml`)
- Updated `src/ares/tools/blue/__init__.py` to export new learning tools
- Various code comments and docstrings cleaned up for clarity and conciseness

**Removed:**

- None
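
To make "automatic retry with time range reduction" concrete, here is a minimal sketch. It assumes the backend call raises `TimeoutError` on an oversized query and that timestamps are plain numbers (epoch seconds); `query_with_shrinking_range`, `run_query`, and the halving factor are hypothetical and are not the API of `src/ares/core/query_resilience.py`.

```python
def query_with_shrinking_range(run_query, start, end, max_attempts=3, shrink=0.5):
    """Call run_query(start, end), narrowing the window after each timeout.

    The window shrinks toward `end` so the most recent data is kept.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return run_query(start, end)
        except TimeoutError as exc:
            last_error = exc
            # Keep only the most recent `shrink` fraction of the window.
            start = end - (end - start) * shrink
    # Every attempt timed out: surface the last failure to the caller.
    raise last_error
```

Chunking, the other strategy named above, would instead split the original window into consecutive slices and merge the results, trading more requests for full coverage of the range.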

**Changed:**

- Refactored LearningTools to use a public `store` attribute instead of a private `_store` with property logic, simplifying initialization and access
- Replaced all direct store accesses with a `get_store()` method to ensure store is initialized when needed
- Updated tests to use the public `store` attribute and `get_store()` method, reflecting the new initialization and access pattern
- Improved class and attribute documentation for clarity
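
The access pattern described in this commit (a public `store` attribute plus a `get_store()` that initializes on first use) might look like the sketch below. Only `store` and `get_store()` come from the commit message; `InvestigationStoreSketch`, `LearningToolsSketch`, the table schema, and the exact-match lookup are hypothetical stand-ins for the SQLite-backed persistence layer.

```python
import sqlite3


class InvestigationStoreSketch:
    """Tiny SQLite-backed store (hypothetical schema, in-memory by default)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS investigations (alert_name TEXT, verdict TEXT)"
        )

    def save(self, alert_name, verdict):
        # Using the connection as a context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT INTO investigations VALUES (?, ?)", (alert_name, verdict)
            )

    def find_similar(self, alert_name):
        # Assumption: "similar" is reduced to an exact alert-name match here.
        rows = self.conn.execute(
            "SELECT alert_name, verdict FROM investigations WHERE alert_name = ?",
            (alert_name,),
        )
        return rows.fetchall()


class LearningToolsSketch:
    """Toolset holding a public `store` that is created lazily."""

    def __init__(self, store=None):
        self.store = store  # public attribute; may stay None until first use

    def get_store(self):
        """Return the store, creating it on first access."""
        if self.store is None:
            self.store = InvestigationStoreSketch()
        return self.store
```

Tests can then inject a prepared store through the constructor or assign `tools.store` directly, which is the simplification the commit message points to.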

linear · 2026-01-11T01:09:22Z

CAP-822 Add AWS SSM Remote Execution & Enhance Blue/Red Tooling

Description:
Implement AWS SSM-based remote command execution, expand blue team Grafana query templates, and improve both blue and red team tools. This upgrade aims to streamline SOC investigations, strengthen network scanning for red teams, and clarify system configuration defaults.

Objective:

Enable remote command execution on EC2 instances via AWS SSM, enhance blue team investigation capabilities with new query templates and SOC tools, improve red team network scanning, and update configuration defaults and documentation for better usability.

Scope of Work:

Implement remote.py module to support AWS SSM command execution on remote EC2 (Kali attack box)
Create and integrate Grafana query templates in query_templates.py for blue team use
Enhance soc_investigator.py to improve SOC investigation workflows
Update actions.py with new or improved blue team investigation actions
Improve red team network scanning capabilities in network.py
Update default configuration (max-steps) and document timeout behavior in README, Taskfile.yaml, and docs

Dependencies:

AWS IAM permissions for SSM access
EC2 instances registered with AWS Systems Manager
Grafana environment for query template integration
None identified beyond above

Acceptance Criteria:

remote.py enables successful execution of remote shell commands on registered EC2 instances via AWS SSM.
Blue team Grafana query templates are present, well-documented, and usable within investigations.
SOC investigator tool improvements are functional and demonstrably enhance investigation workflows.
Red team network scanning tool updates are implemented and tested.
Configuration defaults (max-steps) and timeout behaviors are clearly updated in documentation.
All changes are reflected in the relevant source files and documentation.

Additional Notes:

Ensure all AWS credentials and permissions are handled securely.
Reference AWS SSM documentation for implementation: https://docs.aws.amazon.com/systems-manager/latest/userguide/execute-remote-commands.html
Include code samples in documentation where relevant for new features.
Coordinate with DevOps for deployment/testing on EC2 instances.

…enhance-bluered-tooling

l50 added 5 commits January 9, 2026 19:19

dreadnode-renovate-bot added the area/docs, area/python, and area/pre-commit labels Jan 11, 2026

Merge branch 'main' into jayson/cap-822-add-aws-ssm-remote-execution-…

6e8251a

…enhance-bluered-tooling

dreadnode-renovate-bot added the area/templates, area/github, and type/core labels Jan 11, 2026

l50 merged commit 5454ff0 into main Jan 11, 2026
8 checks passed

l50 deleted the jayson/cap-822-add-aws-ssm-remote-execution-enhance-bluered-tooling branch January 11, 2026 01:14

l50 merged 6 commits into main from jayson/cap-822-add-aws-ssm-remote-execution-enhance-bluered-tooling
