feat: improve blue team detection prompt accuracy and log query performance #217
Merged
…res-core (#214)

**Key Changes:**
- Moved YAML-driven detection configuration and logic from ares-tools to ares-core
- Converted all lateral movement mappings, patterns, and MITRE technique lookups to be YAML-driven
- Refactored query building and pattern detection to use shared config and types
- Improved host/computer label handling and made LogQL selectors more accurate

**Added:**
- Shared detection config module in `ares-core/src/detection/mod.rs` for loading and querying `detections.yaml`
- Centralized `ares-core/src/detection/detections.yaml` with lateral movement patterns, template connection types, and MITRE mappings
- New tests for lateral pattern loading, brute force query logic, and MSSQL template resolution in ares-tools

**Changed:**
- Lateral movement analyzer and patterns now load connection types and regexes from YAML config, eliminating hardcoded match arms
- MITRE technique mapping for connection types now derives from template YAML, with hardcoded values as fallback only
- Refactored detection config usage in ares-tools to re-export types from ares-core
- LogQL query builder and detection template lookup in ares-tools now use shared types and support negative/exclusion patterns
- Changed LogQL label selection to use the `computer` label instead of `hostname` for greater accuracy and coverage
- Refined suggested queries in investigation tools and reports to use correct job labels and computer names
- Added more accurate event ID, pattern, and connection type metadata to detection templates
- Improved report generator to aggregate MITRE techniques from both state and evidence items

**Removed:**
- All hardcoded detection patterns, connection type match arms, and technique ID maps in ares-core lateral movement modules
- Redundant detection config types and logic from ares-tools in favor of using the shared ares-core module
- Unnecessary use of `Box::leak` for static string lifetimes in lateral pattern code

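The "YAML first, hardcoded fallback second" technique lookup described above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the ares-core code: `DetectionTemplate`, `TechniqueMap`, and `fallback_technique` are hypothetical names, the templates are assumed to be already deserialized from `detections.yaml`, and the connection-type keys and technique IDs are illustrative. Lookups return owned `String`s, so no `Box::leak` is needed to satisfy lifetimes.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one detection template after deserializing
/// `detections.yaml`; the real field names may differ.
#[derive(Debug, Clone)]
struct DetectionTemplate {
    connection_type: String,
    mitre_technique: Option<String>,
}

/// Hardcoded table consulted only when no template supplies a technique.
/// Keys are illustrative; IDs are MITRE ATT&CK technique IDs.
fn fallback_technique(connection_type: &str) -> Option<&'static str> {
    match connection_type {
        "smb" => Some("T1021.002"),   // Remote Services: SMB/Windows Admin Shares
        "rdp" => Some("T1021.001"),   // Remote Services: Remote Desktop Protocol
        "winrm" => Some("T1021.006"), // Remote Services: Windows Remote Management
        "wmi" => Some("T1047"),       // Windows Management Instrumentation
        _ => None,
    }
}

/// Connection type (lowercased) -> technique ID, built from the templates.
struct TechniqueMap {
    from_yaml: HashMap<String, String>,
}

impl TechniqueMap {
    fn from_templates(templates: &[DetectionTemplate]) -> Self {
        // Only templates that actually declare a technique contribute entries.
        let from_yaml = templates
            .iter()
            .filter_map(|t| {
                t.mitre_technique
                    .as_ref()
                    .map(|id| (t.connection_type.to_lowercase(), id.clone()))
            })
            .collect();
        TechniqueMap { from_yaml }
    }

    /// YAML-derived mapping wins; the hardcoded table is a fallback only.
    fn technique_for(&self, connection_type: &str) -> Option<String> {
        let key = connection_type.to_lowercase();
        self.from_yaml
            .get(&key)
            .cloned()
            .or_else(|| fallback_technique(&key).map(str::to_string))
    }
}
```

Here a template entry for a connection type overrides the hardcoded value, and connection types with no template entry still resolve through the fallback table.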

… query performance

**Added:**
- Inject deployment label into blue team system prompts if available, allowing sub-agents to use the correct deployment in Loki queries for improved accuracy
- Partition tool callbacks in the agent loop to run `dispatch_*` tools in parallel and lifecycle callbacks sequentially for better performance and correct short-circuiting
- Add ready-to-use and triage detection query templates with deployment variable support in blue team agent documentation for operational speed and clarity
- Implement a 5-minute TTL cache for Loki log queries to eliminate duplicate queries within investigations, improving response time and reducing load
- Extend system prompt and template rendering to accept arbitrary extra context values (e.g., deployment label) for greater flexibility

**Changed:**
- Blue team prompt builder and orchestrator now extract deployment label from alert metadata or environment variable and pass it into templates
- Blue team agent templates (triage, threat hunter, lateral analyst) now conditionally render deployment label guidance and queries based on provided deployment value
- Blue callback handler and investigation flow updated to propagate deployment context to all blue agent sub-tasks
- `build_pattern_filter` in detection utilities now uses fast `|=` contains filters for up to 3 simple literals, falling back to regex only when needed, improving Loki query speed
- Maximum stored evidence validation results increased from 10 to 50 to avoid premature eviction and improve confidence scoring in multi-agent investigations
- Increase Loki query retry attempts to 3 and improve backoff, using the cache for all identical queries within TTL
- Update and expand detection utility tests for new pattern filter logic and cache behaviors
- Improve orchestrator and worker prompt-building logic to support new prompt signatures and deployment context injection

**Removed:**
- Hardcoded deployment label strings from documentation and code paths, relying instead on dynamic injection via context

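The callback partitioning can be sketched as below, assuming a tool call is identified only by its name and that `dispatch_*` calls are independent of one another. `ToolCall`, `Flow`, and `run_tool_calls` are hypothetical names; the sketch uses scoped OS threads (`std::thread::scope`) in place of whatever async runtime the agent loop actually uses, and running the dispatches before the lifecycle callbacks is an assumption.

```rust
use std::thread;

/// Result of a lifecycle callback; `Stop` short-circuits the rest.
#[derive(Debug, PartialEq)]
enum Flow {
    Continue,
    Stop,
}

/// Hypothetical tool-call record: only the name matters for partitioning.
struct ToolCall {
    name: String,
}

/// Runs `dispatch_*` calls concurrently, then lifecycle callbacks one at a
/// time, stopping at the first `Flow::Stop`. Returns the dispatch results
/// (in input order) and the names of the lifecycle callbacks that ran.
fn run_tool_calls<D, L>(calls: &[ToolCall], dispatch: D, mut lifecycle: L) -> (Vec<String>, Vec<String>)
where
    D: Fn(&ToolCall) -> String + Sync,
    L: FnMut(&ToolCall) -> Flow,
{
    let (dispatch_calls, lifecycle_calls): (Vec<&ToolCall>, Vec<&ToolCall>) =
        calls.iter().partition(|c| c.name.starts_with("dispatch_"));

    // Share the dispatcher by reference so every scoped thread can call it.
    let dispatch = &dispatch;
    let results = thread::scope(|s| {
        let handles: Vec<_> = dispatch_calls
            .iter()
            .map(|&call| s.spawn(move || dispatch(call)))
            .collect();
        // Joining in spawn order keeps results aligned with the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("dispatch_* tool panicked"))
            .collect::<Vec<String>>()
    });

    // Lifecycle callbacks run sequentially so a stop signal is honored.
    let mut ran = Vec::new();
    for call in lifecycle_calls {
        ran.push(call.name.clone());
        if lifecycle(call) == Flow::Stop {
            break;
        }
    }
    (results, ran)
}
```

Keeping the lifecycle pass sequential is what makes short-circuiting well defined: callbacks after a `Stop` are never invoked, which would not be guaranteed if they ran concurrently.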

…ion logic and docs

**Added:**
- Added `exclude_patterns` to DCSync and replication detections to filter out machine accounts (usernames ending with `$`) in `detections.yaml`
- Introduced test cases verifying DCSync templates exclude machine accounts in `tests.rs`
- Expanded blue team agent docs with mandatory DCSync attribution decision tree and clearer detection logic in `threat_hunter.md.tera`

**Changed:**
- Updated LogQL queries for DCSync detection in both triage and threat hunter docs to exclude machine accounts, clarifying attack vs. normal replication
- Improved documentation in triage and threat hunter agents to explain the exclusion logic and correct interpretation of DCSync results
- Enhanced JSON examples in triage workflow to reference attacker-only DCSync queries and clarify descriptions

**Removed:**
- Removed outdated DCSync interpretation rules and explanations that did not account for machine account exclusion in `threat_hunter.md.tera` and `triage.md.tera`

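The machine-account rule behind `exclude_patterns` reduces to a suffix check on the account that performed the replication. The sketch below applies it to already-parsed usernames rather than raw log lines (the real exclusion is a regex in `detections.yaml`), and `is_machine_account` and `dcsync_candidates` are hypothetical names.

```rust
/// Active Directory computer accounts end with `$`; domain controllers
/// replicate under these accounts, so they are treated as normal traffic.
fn is_machine_account(user: &str) -> bool {
    user.trim().ends_with('$')
}

/// Keeps only non-machine accounts as potential DCSync actors.
fn dcsync_candidates<'a>(subject_users: &[&'a str]) -> Vec<&'a str> {
    subject_users
        .iter()
        .copied()
        .filter(|user| !is_machine_account(user))
        .collect()
}
```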

**Added:**
- Added test assertions to ensure DCSync exclusion patterns use `.u003e` to match JSON-escaped `>` in Loki logs

**Changed:**
- Updated detection templates and documentation to use `.u003e` instead of `>` in regex patterns for DCSync exclusion, ensuring compatibility with Loki's JSON-escaped XML output
- Updated LogQL queries and triage documentation to reflect `.u003e` usage in DCSync machine account exclusions

**Why:**
- Loki stores XML `>` as `\u003e`, so regex patterns must match `.u003e` to correctly exclude machine accounts and avoid false positives in DCSync detection

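Why `.u003e` works: in a regex, `.` matches any single character, including the backslash that precedes `u003e` in the stored line, whereas a literal `>` never appears there. The toy matcher below supports only that one regex feature, so it is a demonstration rather than a regex engine; `contains_with_dot` is a hypothetical helper and the sample line fragment is made up for illustration.

```rust
/// Returns true if `pattern` occurs in `haystack`, where `.` in the pattern
/// matches any single character and every other character matches itself.
fn contains_with_dot(haystack: &str, pattern: &str) -> bool {
    let hay: Vec<char> = haystack.chars().collect();
    let pat: Vec<char> = pattern.chars().collect();
    if pat.len() > hay.len() {
        return false;
    }
    // Try every start offset where the pattern could still fit.
    (0..=hay.len() - pat.len()).any(|start| {
        pat.iter()
            .zip(&hay[start..])
            .all(|(p, h)| *p == '.' || p == h)
    })
}
```

A plausible side benefit of `.` over an escaped backslash is that it avoids escaping `\` through several layers (YAML, the LogQL string literal, and the regex itself).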
**Key Changes:**

**Added:**
- …optional deployment label, automatically injected into prompt templates for precise log query instructions in sub-agents
- …eliminating duplicate queries and reducing API load during investigations
- …DCSync, Kerberoasting, Golden Ticket, and lateral movement detection, with copy-paste LogQL for live environments

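A minimal TTL cache in the spirit of the 5-minute Loki query cache, keyed by the LogQL string plus the query window and pruned on insert. `QueryCache` and `CacheKey` are hypothetical; timestamps are passed in explicitly so expiry can be tested without sleeping, and the key layout (start/end as nanosecond integers) is an assumption.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// (LogQL text, window start, window end); the time unit is an assumption.
type CacheKey = (String, i64, i64);

struct QueryCache {
    ttl: Duration,
    entries: HashMap<CacheKey, (Instant, String)>,
}

impl QueryCache {
    fn new(ttl: Duration) -> Self {
        QueryCache { ttl, entries: HashMap::new() }
    }

    /// Returns a cached result only if it is younger than the TTL.
    fn get(&self, key: &CacheKey, now: Instant) -> Option<&String> {
        self.entries
            .get(key)
            .filter(|entry| now.duration_since(entry.0) < self.ttl)
            .map(|entry| &entry.1)
    }

    /// Drops every expired entry.
    fn prune(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries.retain(|_, entry| now.duration_since(entry.0) < ttl);
    }

    /// Serves identical queries from the cache; otherwise runs `fetch`,
    /// prunes stale entries, and stores the fresh result.
    fn get_or_fetch<F: FnOnce() -> String>(&mut self, key: CacheKey, now: Instant, fetch: F) -> String {
        if let Some(hit) = self.get(&key, now) {
            return hit.clone();
        }
        let fresh = fetch();
        self.prune(now);
        self.entries.insert(key, (now, fresh.clone()));
        fresh
    }
}
```

Including the time window in the key means the same LogQL over a different range is a cache miss, which keeps results from one window from being reused for another.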
**Changed:**
- …deployment label from alert or environment and pass it to prompt builder, ensuring all sub-agents use the correct deployment context
- …render the deployment label and offer tailored detection query advice
- `build_pattern_filter` now uses Loki's fast `|=` contains filter for 1-3 simple patterns, falling back to regex only when needed, drastically improving query speed for common detection patterns
- …query results, ensuring evidence validation remains robust even with many sub-agent queries
- `query_logs` now caches results by (logql, time window), reducing repeat calls; retry logic increased to 3 attempts with better backoff; cache is pruned as needed
- …hunting after triage, adds stronger language to avoid premature completion

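A sketch of the `build_pattern_filter` idea: simple literals become chained `|=` line filters, and anything else falls back to `|~`. Assumptions, not confirmed by the PR: the patterns are ANDed (each must match), a pattern counts as a simple literal when it contains no regex metacharacters, and beyond three patterns the sketch switches every stage to `|~`. The real function's signature and fallback may differ; `is_simple_literal` and `logql_quote` are hypothetical helpers.

```rust
/// Characters that make a pattern a regex rather than a plain literal.
const REGEX_META: &[char] = &['.', '^', '$', '*', '+', '?', '(', ')', '[', ']', '{', '}', '|', '\\'];

/// Threshold from the PR description: up to 3 simple literals use `|=`.
const MAX_CONTAINS_FILTERS: usize = 3;

fn is_simple_literal(pattern: &str) -> bool {
    !pattern.is_empty() && !pattern.contains(REGEX_META)
}

/// Wraps a value in a double-quoted LogQL string, escaping `"` and `\`.
fn logql_quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
    out
}

/// Emits one line-filter stage per pattern: `|=` (substring match) when all
/// patterns are simple literals and there are few enough, otherwise `|~`.
fn build_pattern_filter(patterns: &[&str]) -> String {
    let use_contains = patterns.len() <= MAX_CONTAINS_FILTERS
        && patterns.iter().all(|p| is_simple_literal(p));
    let op = if use_contains { "|=" } else { "|~" };
    patterns
        .iter()
        .map(|p| format!("{op} {}", logql_quote(p)))
        .collect::<Vec<_>>()
        .join(" ")
}
```

The design rationale is that `|=` is a plain substring check, so Loki can skip compiling and running a regex for the common case of a few fixed strings.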
**Removed:**
- …dynamically if available, or fallback instructions shown otherwise
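A sketch of resolving the deployment label (alert metadata first, then the environment) and choosing between tailored and fallback guidance. The metadata key `deployment`, the Loki label name in the rendered selector, and the wording of both messages are assumptions; the environment value is passed in by the caller because the PR does not name the variable, and plain `format!` stands in for the Tera templates.

```rust
use std::collections::HashMap;

/// Treats blank values as absent.
fn non_empty(value: &str) -> Option<String> {
    let trimmed = value.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Alert metadata wins; the environment value is only a fallback.
fn resolve_deployment(alert_metadata: &HashMap<String, String>, env_value: Option<&str>) -> Option<String> {
    alert_metadata
        .get("deployment")
        .and_then(|d| non_empty(d))
        .or_else(|| env_value.and_then(non_empty))
}

/// Tailored query guidance when a label is known, fallback text otherwise.
fn deployment_guidance(deployment: Option<&str>) -> String {
    match deployment {
        Some(d) => format!("Use deployment=\"{d}\" in every Loki selector, e.g. {{deployment=\"{d}\"}}."),
        None => "No deployment label is available; use the generic query instructions.".to_string(),
    }
}
```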