refactor: optimize and clarify Loki query templates for detection tools#30

Merged
l50 merged 2 commits into main from jayson/cap-834-optimize-logql-query-templates-to-prevent-blue-agent
Jan 11, 2026

Conversation

l50 commented Jan 11, 2026

Key Changes:

  • Refactored query construction to prioritize label selectors for better Loki performance
  • Reduced default log query range from 4 hours to 1 hour for faster results
  • Replaced broad regex and line filters with more targeted event and tool filters
  • Centralized and documented query-building logic for maintainability

Added:

  • Optimized query builder methods in QueryTemplateTools:
    • _build_selector: Composes efficient label selectors, prioritizing narrow
      host/job filters as recommended by Grafana Loki best practices
    • _build_event_filter: Selectively filters by event IDs using contains or
      simple regex for performance
    • _build_pattern_filter: Generates case-insensitive regex filters for
      attacker/tool patterns
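
The three helpers described above could look roughly like the following. This is a minimal sketch, not the merged code: the standalone function names, the `hostname` and `job` label names, and the backtick-quoted regex are assumptions made for illustration.

```python
import re
from typing import Iterable, Optional


def build_selector(default_selector: str = '{job=~".+"}',
                   host: Optional[str] = None,
                   job: Optional[str] = None) -> str:
    """Compose a stream selector, preferring exact job/host label matchers."""
    matchers = []
    if job:
        matchers.append(f'job="{job}"')
    if host:
        # Assumption: hosts are exposed under a "hostname" label.
        matchers.append(f'hostname="{host}"')
    if not matchers:
        # Nothing narrower is known, so fall back to the configured default.
        return default_selector
    return "{" + ", ".join(matchers) + "}"


def build_event_filter(event_ids: Iterable[int]) -> str:
    """Use a contains filter for one ID and a simple alternation for several."""
    ids = [str(event_id) for event_id in event_ids]
    if not ids:
        return ""
    if len(ids) == 1:
        return f' |= "{ids[0]}"'
    return ' |~ "' + "|".join(ids) + '"'


def build_pattern_filter(patterns: Iterable[str]) -> str:
    """Build one case-insensitive regex line filter from literal tool names."""
    escaped = [re.escape(pattern) for pattern in patterns]
    if not escaped:
        return ""
    # Backticks give a raw LogQL string, so regex backslashes need no escaping.
    return " |~ `(?i)(?:" + "|".join(escaped) + ")`"
```

For example, `build_selector(job="eventlog", host="dc01") + build_event_filter([4625])` yields `{job="eventlog", hostname="dc01"} |= "4625"`.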

Changed:

  • Default query label selector is now {job=~".+"} with an option to override
    using default_label_selector for better targeting
  • Default query time range reduced to 1 hour via default_hours_back for
    faster queries (configurable per query)
  • All detection methods now:
    • Use optimized label selectors for host/job targeting rather than line-based
      regex
    • Place event ID filters before tool/pattern filters for greater selectivity
    • Use contains (|=) before regex (|~) wherever possible for faster
      matching
    • Accept hours_back as an optional parameter, defaulting to the optimized
      value
  • In detection functions (e.g., detect_port_scanning, detect_brute_force),
    reorganized the logical construction of queries for clarity and efficiency,
    and updated docstrings to reflect the new defaults and optimizations
  • Host and user activity functions now leverage _build_selector for label
    targeting instead of regex line filters, improving query performance and
    readability
  • Added extensive inline documentation explaining Loki query optimization
    rationale throughout the query template code
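
As an illustration of the ordering and defaults listed above, a detection method might assemble its query as follows. This is a sketch under stated assumptions: the class name, the return shape, and the tool names `hydra` and `medusa` are hypothetical, and event ID 4625 (Windows failed logon) is used only as an example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class QueryTemplateSketch:
    """Hypothetical stand-in for QueryTemplateTools; not the merged class."""

    default_label_selector: str = '{job=~".+"}'
    default_hours_back: int = 1

    def detect_brute_force(self, hours_back: Optional[int] = None,
                           now: Optional[datetime] = None) -> Dict[str, object]:
        # Fall back to the 1-hour default unless the caller overrides it.
        hours = self.default_hours_back if hours_back is None else hours_back
        end = now or datetime.now(timezone.utc)
        start = end - timedelta(hours=hours)

        # Order matters: label selector, then the selective event ID
        # contains filter, and only then the costlier regex filter.
        query = (
            f"{self.default_label_selector}"
            ' |= "4625"'
            " |~ `(?i)(?:hydra|medusa)`"
        )
        return {"query": query, "start": start, "end": end}
```

Constructing the sketch with `default_label_selector='{job="eventlog"}'` narrows every generated query to that job, while `hours_back=4` restores the previous window for a single call.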

Removed:

  • Deprecated line-based regex host filters (e.g., hostname=~".*host.*") in
    favor of direct label selectors
  • Obsolete multi-hour (4h) query defaults for all detection methods, replaced
    by a 1-hour default to minimize load and improve responsiveness

Addresses CAP-834

- Add configurable default_label_selector parameter to QueryTemplateTools
  for better label filtering instead of scanning all streams
- Create _build_selector(), _build_event_filter(), and _build_pattern_filter()
  helper methods for optimized query construction
- Reduce default time range from 4 hours to 1 hour for faster queries
- Put event ID filters first (most selective) before tool/pattern filters
- Remove leading .* from hostname regex patterns per Grafana best practices
- Update all query methods to use new optimized helpers
- Document configuration options in blue_factory.py

Addresses CAP-834

linear Bot commented Jan 11, 2026

CAP-834 Optimize LogQL Query Templates to Prevent Blue Agent Timeouts

Description:
The blue agent SOC investigator is experiencing persistent timeouts when running LogQL queries against Loki, due to inefficient query patterns in query_templates.py. This task involves refactoring query templates to follow Grafana best practices, reducing query timeouts and improving overall performance.


Objective:

Refactor LogQL query templates to eliminate anti-patterns and optimize performance, ensuring blue agent queries execute successfully within acceptable timeframes.


Scope of Work:

  • Add configurable default_label_selector parameter to QueryTemplateTools
  • Replace {job=~".+"} with specific labels (e.g., {job="eventlog"}) in all templates
  • Refactor event ID filters to use |= (contains) instead of regex, and place them before tool name patterns
  • Update hostname selectors to remove leading .* in regex patterns (e.g., hostname=~"host")
  • Reorder filters to put the most selective filters (event IDs) first
  • Reduce default query time range from 4 hours to 1 hour
  • Implement a query_loki_stats pre-check before executing expensive queries
  • Update blue_factory.py to pass label context when creating QueryTemplateTools
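
The stats pre-check item could be sketched as a guard that runs before the real query. Everything here is an assumption for illustration: `fetch_stats` stands in for whatever query_loki_stats returns, the `bytes` field name is assumed, and the 500 MiB budget is arbitrary.

```python
from typing import Callable, Mapping, Tuple

# Hypothetical budget: refuse queries whose matching data exceeds 500 MiB.
DEFAULT_MAX_BYTES = 500 * 1024 * 1024


def precheck_query(
    query: str,
    fetch_stats: Callable[[str], Mapping[str, int]],
    max_bytes: int = DEFAULT_MAX_BYTES,
) -> Tuple[bool, str]:
    """Return (ok, reason) based on the estimated volume the query would scan."""
    stats = fetch_stats(query)
    # Assumption: the stats mapping reports the estimated volume under "bytes".
    scanned = int(stats.get("bytes", 0))
    if scanned > max_bytes:
        return False, (
            f"query would scan {scanned} bytes (limit {max_bytes}); "
            "narrow the selector or time range"
        )
    return True, f"query would scan {scanned} bytes"
```

Passing the stats lookup in as a callable keeps the guard testable without a running Loki instance.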

Dependencies:

  • Access to and understanding of src/ares/tools/blue/query_templates.py and src/ares/core/factories/blue_factory.py
  • Review of Grafana Loki best practices documentation
  • No other dependencies identified

Acceptance Criteria:

  1. All query templates use specific label selectors instead of {job=~".+"}
  2. Event ID filters are implemented using |= and are the first filter in the query
  3. Hostname matching uses optimized regex patterns (no leading .*)
  4. Default time range for queries is set to 1 hour
  5. query_loki_stats pre-check is called before running expensive queries
  6. Blue agent queries no longer time out under normal operating conditions (verified in test or staging)
  7. All code changes are reviewed and pass existing tests

Additional Notes:


- Add _optimize_logql_query() function to warn about broad selectors
- Call optimizer in rate_limited_wrapper before executing queries
- Add LogQL Performance Optimization section to agent system prompt
- Update example queries to use specific label selectors
- Guide agent to use list_loki_label_values before constructing queries

This complements the QueryTemplateTools fix by also addressing queries
constructed directly by the LLM agent when using MCP tools.
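
A check in the spirit of _optimize_logql_query() might look like the sketch below. It is not the merged function: the name `lint_logql_query`, the warning wording, and the two regex heuristics are assumptions, and the selector extraction is deliberately naive (it does not handle braces inside label values).

```python
import re
from typing import List

# Matchers such as job=~".+" or job=~".*" select every stream.
_BROAD_MATCHER = re.compile(r'(\w+)\s*=~\s*"\.[+*]"')
# Matchers such as hostname=~".*host.*" start with a redundant wildcard.
_LEADING_WILDCARD = re.compile(r'(\w+)\s*=~\s*"\.\*[^"]')


def lint_logql_query(query: str) -> List[str]:
    """Return human-readable warnings about broad LogQL stream selectors."""
    selector_match = re.search(r"\{([^}]*)\}", query)
    if selector_match is None:
        return ["no stream selector found"]

    selector = selector_match.group(1)
    warnings = []
    for match in _BROAD_MATCHER.finditer(selector):
        warnings.append(
            f'label "{match.group(1)}" matches every value; use an exact matcher'
        )
    for match in _LEADING_WILDCARD.finditer(selector):
        warnings.append(
            f'label "{match.group(1)}" regex starts with ".*"; drop the leading wildcard'
        )
    return warnings
```

A wrapper could log these warnings and still run the query, which matches the "warn" behaviour described above rather than rejecting the agent's request outright.
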
dreadnode-renovate-bot Bot added the area/templates label (Changes made to warpgate template configurations) on Jan 11, 2026
l50 merged commit da466b7 into main on Jan 11, 2026
8 checks passed
l50 deleted the jayson/cap-834-optimize-logql-query-templates-to-prevent-blue-agent branch on January 11, 2026 18:13