
feat: add Grafana detection rule creation and improve LogQL guidance #33

Merged
l50 merged 3 commits into main from
jayson/cap-835-optimize-blue-agent-loki-querying-and-evidence-validation
Jan 11, 2026

l50 commented on Jan 11, 2026

Key Changes:

  • Introduced create_detection_rule tool for automated Grafana alert rule creation
  • Updated agent instructions to include advanced alert rule creation workflow
  • Enforced stricter use of specific LogQL label selectors in templates and docs
  • Improved query planning and performance optimization guidance for LogQL
  • Added auto-rewriting of broad selectors to prevent timeouts

Added:

  • create_detection_rule async method to GrafanaTools for creating alert rules based
    on investigation findings, with validation for label selectors, severity, and folder
    management
  • _ensure_alert_folder internal helper to automatically provision the required
    Grafana folder for new alert rules
  • _is_mitre_technique_description to skip validation for MITRE technique IDs (they're
    classifications, not raw log data)
  • Auto-rewriting of broad LogQL selectors ({deployment=~".+"}, etc.) to {job="eventlog"}
    to prevent query timeouts
  • New documentation in agent system instructions for alert rule creation, including
    guidance on when and how to use create_detection_rule, with examples
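
As a rough illustration of the validation that `create_detection_rule` is described as performing, the sketch below checks the severity and the label selector and then builds a request body for Grafana's alert rule provisioning endpoint. It is not the code from this PR: `build_alert_rule_payload`, `ALLOWED_SEVERITIES`, the `detections` rule group, and the UIDs are hypothetical, and the field names reflect the provisioning schema as generally documented rather than anything shown here. A real rule would likely also need reduce and threshold expressions.

```python
import re
from typing import Any

# Assumed severity values for illustration; the PR does not list the accepted set.
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}
# A selector counts as specific here if it contains at least one exact-match label.
EXACT_LABEL = re.compile(r'\{[^{}]*\b\w+="[^"]+"[^{}]*\}')


def build_alert_rule_payload(
    title: str,
    logql: str,
    severity: str,
    folder_uid: str,
    datasource_uid: str,
) -> dict[str, Any]:
    """Validate the inputs, then build a provisioning-style alert rule body."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    if not EXACT_LABEL.search(logql):
        raise ValueError('query needs a specific selector such as {job="eventlog"}')
    return {
        "title": title,
        "folderUID": folder_uid,      # folder assumed to exist already
        "ruleGroup": "detections",    # hypothetical group name
        "condition": "A",
        "for": "0s",
        "noDataState": "OK",
        "execErrState": "Error",
        "labels": {"severity": severity},
        "data": [
            {
                "refId": "A",
                "datasourceUid": datasource_uid,
                "relativeTimeRange": {"from": 300, "to": 0},
                "model": {"refId": "A", "expr": logql, "queryType": "instant"},
            }
        ],
    }
```

Rejecting the rule before any request is sent keeps a broad selector from ever becoming a recurring alert query, which is where a timeout would be most costly.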

Changed:

  • Enhanced instructions in system_instructions.md.jinja to require label discovery
    (using list_loki_label_names and list_loki_label_values) before running queries,
    and to use only specific labels like {job="eventlog"} to avoid timeouts
  • Expanded the LogQL query optimization section with stricter requirements and practical
    examples for contains (|=) vs regex (|~) usage, combining queries, and avoiding
    anti-patterns
  • Updated default label selector in QueryTemplateTools to {job="eventlog"} instead
    of broad patterns, and clarified override instructions
  • Strengthened query planning workflow, instructing agents to plan and combine queries
    before execution
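
To make the contains (`|=`) versus regex (`|~`) guidance concrete, here is a minimal sketch of how queries might be assembled with a specific selector. The helpers `line_filter` and `build_query` are hypothetical and are not part of `QueryTemplateTools`; the event IDs are only examples.

```python
import json


def line_filter(term: str, *, regex: bool = False) -> str:
    """Return a LogQL line filter: |= for literal matches, |~ for regex."""
    op = "|~" if regex else "|="
    # json.dumps produces a double-quoted string with quotes and backslashes escaped.
    return f"{op} {json.dumps(term)}"


def build_query(selector: str, *filters: str) -> str:
    """Join a specific label selector with any number of line filters."""
    return " ".join([selector, *filters])


# Exact string: prefer the literal filter.
failed_logons = build_query('{job="eventlog"}', line_filter("4625"))

# Two related event IDs: one combined regex query instead of two separate queries.
logon_events = build_query('{job="eventlog"}', line_filter("4624|4625", regex=True))

print(failed_logons)
print(logon_events)
```

The literal form is used whenever the search term is a fixed string; regex is reserved for cases where it replaces several separate queries.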

Removed:

  • Deprecated or redundant broad LogQL label selectors from default values and examples
    in query templates and agent instructions
  • Outdated anti-patterns in agent workflow that allowed repeated broad queries without
    evidence recording or label discovery

l50 added 3 commits January 11, 2026 14:07
**Added:**

- Introduced `create_detection_rule` method to GrafanaTools for creating alert
  rules via the provisioning API, including label validation, severity handling,
  and automatic folder management
- Added `_ensure_alert_folder` helper to automatically manage the alert rule
  folder in Grafana
- Implemented `_is_mitre_technique_description` utility to skip MITRE technique
  descriptions during evidence value validation

**Changed:**

- Updated evidence validation logic to skip validation for MITRE technique
  description values and log the action for transparency
- Improved documentation and step-by-step instructions in
  `system_instructions.md.jinja`:
  - Emphasized mandatory label discovery before querying logs
  - Updated LogQL performance advice and provided concrete selector examples
  - Clarified and reordered investigation workflow steps for better clarity
  - Provided explicit examples for Windows event log queries with correct label usage
  - Added section detailing when and how to create alert rules using the new
    `create_detection_rule` tool
- Changed default label selector in `QueryTemplateTools` from a broad
  `{job=~".+"}` to `{job="eventlog"}` for better performance and safer queries

**Removed:**

- Removed outdated examples and advice using broad label selectors from
  documentation in favor of specific, performant patterns
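
The MITRE skip described in the commit above could look roughly like the following. This is a sketch under assumptions: the actual logic of `_is_mitre_technique_description` is not shown in this PR, so the ID pattern (`T` plus four digits, with an optional three-digit sub-technique) and the `values_to_validate` helper are illustrative only.

```python
import re

# Matches technique IDs such as T1110 or T1110.001.
MITRE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def is_mitre_technique_description(value: str) -> bool:
    """True if the value mentions a MITRE ATT&CK technique ID."""
    return MITRE_ID.search(value) is not None


def values_to_validate(evidence: dict[str, str]) -> dict[str, str]:
    """Drop classification-style values that will not appear in raw logs."""
    return {
        key: value
        for key, value in evidence.items()
        if not is_mitre_technique_description(value)
    }
```

Only the remaining values would then be checked against query results, since a technique ID is a classification and is not expected to appear in raw log lines.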
**Changed:**

- Removed redundant or obvious comments that restated code actions across multiple
  modules, including alert correlation, evidence validation, query templates, and
  factory files, to enhance readability and reduce noise
- Improved clarity by keeping only non-obvious comments or those with explanatory
  value
…ry usage

**Added:**

- Added '{hostname=~".+"}' to list of broad selector patterns to detect more
  problematic queries
- Updated logic to auto-rewrite broad LogQL selectors to '{job="eventlog"}'
  instead of just warning, reducing risk of timeouts
- Modified query handling to update kwargs with the optimized query if
  rewriting occurs, ensuring downstream code uses the corrected query

**Changed:**

- Changed _optimize_logql_query docstring to clarify that it now rewrites
  queries for safety rather than just warning
- Adjusted logging messages to reflect that queries are auto-rewritten instead
  of only warning the user
- Improved pattern matching to allow multiple broad selectors in a single
  query to be rewritten, rather than stopping after the first
- Ensured duplicate query checking and later logic operates on the optimized
  query when a rewrite occurs
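
A minimal sketch of the rewrite behavior described in this commit, assuming the query is passed under a `query` key in `kwargs` (the key name and both function names are assumptions; the real `_optimize_logql_query` is not shown here). The broad patterns listed are the ones named in this PR.

```python
# Broad selectors named in this PR, all rewritten to the specific default.
BROAD_SELECTORS = ('{deployment=~".+"}', '{job=~".+"}', '{hostname=~".+"}')
SAFE_SELECTOR = '{job="eventlog"}'


def optimize_logql_query(query: str) -> tuple[str, bool]:
    """Rewrite every broad selector in the query; report whether anything changed."""
    rewritten = query
    for pattern in BROAD_SELECTORS:  # keep going after the first match
        rewritten = rewritten.replace(pattern, SAFE_SELECTOR)
    return rewritten, rewritten != query


def prepare_kwargs(kwargs: dict) -> dict:
    """Return kwargs carrying the optimized query so later steps use it."""
    optimized, changed = optimize_logql_query(kwargs["query"])
    if changed:
        return {**kwargs, "query": optimized}
    return kwargs
```

Because the loop does not stop at the first match, a query containing several broad selectors is fully rewritten, and any duplicate check that runs afterwards compares the optimized form.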

linear Bot commented Jan 11, 2026

CAP-835 Optimize Blue Agent Loki Querying and Evidence Validation

Description:
Enhance the Blue Agent's interaction with Loki by refining query selectors, improving evidence validation, reducing redundant queries, and optimizing filter usage. These improvements aim to decrease timeouts, improve accuracy, and boost overall agent performance.


Objective:

Reduce query timeouts and duplication while improving evidence validation accuracy and query efficiency in the Blue Agent's Loki integration.


Scope of Work:

  • Replace broad Loki selectors (e.g., {deployment=~".+"}) with specific labels based on discovered label names.
  • Implement logic for the agent to retrieve and utilize available Loki label names via list_loki_label_names.
  • Update evidence validation to either exclude MITRE technique descriptions or enhance extraction to match technique IDs in query results.
  • Refactor query planning to minimize duplicate or redundant Loki queries.
  • Update agent/system prompts to prefer literal filters (|=) over regex filters (|~) for exact string matching.
  • Research and outline approach for the agent to create alert rules based on observed patterns or evidence.
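
One way to approach the duplicate-query item in this scope is a small per-investigation guard, sketched below. The `QueryLog` class is hypothetical and is not taken from the agent's code; it treats two queries as duplicates when they differ only in whitespace.

```python
class QueryLog:
    """Tracks queries already issued during one investigation."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def _normalize(query: str) -> str:
        # Collapse runs of whitespace; used only as a comparison key.
        return " ".join(query.split())

    def should_run(self, query: str) -> bool:
        """Return False if an equivalent query was already recorded."""
        key = self._normalize(query)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Counting how often `should_run` returns False across an investigation would also give a simple measure of how many redundant queries were avoided.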

Dependencies:

  • Access to Loki API and permissions to use list_loki_label_names.
  • Knowledge of existing agent query planning and evidence validation logic.
  • Existing prompt templates for system/agent interaction.
  • None identified beyond these.

Acceptance Criteria:

  1. Agent no longer uses broad selectors such as {deployment=~".+"}; queries utilize specific, discovered labels.
  2. Evidence validation either excludes MITRE technique descriptions or successfully matches technique IDs in query results.
  3. The number of redundant or duplicate queries issued per investigation is reduced by at least 50%.
  4. Queries use literal (|=) filters for exact string matching in at least 90% of applicable cases.
  5. Documentation or implementation plan exists for agent alert rule creation based on identified patterns.

Additional Notes:


dreadnode-renovate-bot (bot) added the area/templates (Changes made to warpgate template configurations) and area/src labels on Jan 11, 2026
l50 merged commit 9626281 into main on Jan 11, 2026
8 checks passed
l50 deleted the jayson/cap-835-optimize-blue-agent-loki-querying-and-evidence-validation branch on January 11, 2026 at 21:45