Skip to content

feat: add agent decision reasoning extraction and tracing for tool selection#153

Merged
l50 merged 4 commits intomainfrom
feature/agent-decision-tracing
Mar 18, 2026
Merged

feat: add agent decision reasoning extraction and tracing for tool selection#153
l50 merged 4 commits intomainfrom
feature/agent-decision-tracing

Conversation

@l50
Copy link
Copy Markdown
Contributor

@l50 l50 commented Mar 17, 2026

Key Changes:

  • Introduced extraction and storage of agent tool selection reasoning for traceability
  • Added OpenTelemetry span creation for agent tool selection decisions
  • Enhanced TimelineEvent model with structured extra_data for decision events
  • Provided helper functions and comprehensive unit tests for reasoning extraction and confidence estimation

Added:

  • Decision reasoning extraction helpers _extract_reasoning_text and _estimate_confidence to parse LLM responses and estimate confidence heuristically
  • Async hook in create_role_hooks to capture agent reasoning and tool selection, storing results in both the operation timeline and Redis for crash recovery
  • OpenTelemetry tracing via trace_decision, recording tool selection decisions and reasoning for post-hoc analysis
  • TimelineEvent model now includes an extra_data_json field for structured, JSON-encoded decision metadata
  • Multi-forest helpers in SharedRedTeamState to identify and track undominated forests for multi-domain operations
  • Unit tests for reasoning extraction, confidence estimation, and decision tracing in test_red_agents.py and test_tracing.py

Changed:

  • Agent instruction templates now receive multi-forest context and undominated forests for more accurate role guidance
  • Role-specific instruction loading updated to handle additional context
  • TimelineEvent serialization in publishing and Redis persistence now uses the model's to_dict(), with consistent timestamp formatting and extra_data handling
  • Domain admin tracking updated to manage multi-forest mode, allowing for DA status across multiple trusted domains

Removed:

  • Manual, per-field TimelineEvent dict serialization in publishing; replaced with single to_dict() call for consistency

**Added:**

- Decision reasoning extraction helpers (`_extract_reasoning_text`, `_estimate_confidence`)
  for capturing agent LLM reasoning behind tool selection
- Hook to log and persist agent tool selection decisions with reasoning,
  confidence, and OTel span correlation for traceability
- TimelineEvent `extra_data_json` field to store structured metadata (e.g.,
  decision details) and `extra_data` property for deserialization
- `trace_decision` function to record agent decision spans with tool,
  reasoning, confidence, and MITRE/attack category enrichment for Tempo/OTel
- Multi-forest helpers to track DA status per trusted domain/forest
  (`all_forests_dominated`, `get_undominated_forests`) for red team ops
- Tests for decision reasoning helpers and decision tracing

**Changed:**

- Role-specific agent instruction loading now injects multi-forest context
  (`multi_forest_mode`, `undominated_forests`) for template rendering
- Timeline event serialization now consistently uses `to_dict()` and ensures
  ISO timestamp formatting for backend storage
- DA hash acquisition now updates per-domain DA tracking for multi-forest ops
  and only sets global DA flag on first acquisition
- TimelineEvent `to_dict()` now outputs `extra_data` as a dict for storage if
  present

**Removed:**

- Legacy manual TimelineEvent dict construction in publishing, replaced with
  `to_dict()` for consistency
l50 added 3 commits March 17, 2026 22:52
**Added:**

- Introduced `multi_forest_mode` option to operation configuration to allow
  operations to continue until domain admin is achieved on all trusted forests
- Added `get_multi_forest_mode()` function to access the new config option

**Changed:**

- Updated internal config builder to support the new `multi_forest_mode` field

**Removed:**

- None
…est environments

**Changed:**

- Updated SharedRedTeamState logic to ensure domain admin achievement is only
  logged, weaknesses added, and OpenTelemetry traces emitted once per domain,
  supporting multi-forest scenarios and preventing duplicate signals for the
  same domain - src/ares/core/models.py
- Improved unit test to verify that traces for domain admin achievement are
  emitted once for each unique domain, and not duplicated for repeated hashes
  from the same domain - tests/core/dispatcher/test_dispatcher.py
- Fixed test monkeypatch for loading agent instructions to accept generic kwargs,
  preventing argument errors in agent factory tests - tests/core/factories/test_red_agents.py
@l50 l50 merged commit 03d701f into main Mar 18, 2026
7 checks passed
@l50 l50 deleted the feature/agent-decision-tracing branch March 18, 2026 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant