
Report research ROI in token accounting and run-health artifacts #202

Description

@azalio

Summary
MAP records tokens by subtask, agent, and phase, including research-agent tokens, but it does not show whether delegated research reduced downstream Actor/Monitor exploration or paid for itself in tokens saved. Add run-level metrics that make research ROI visible.

Paper basis
FastContext reports both end-to-end score and main-agent token savings. Its key efficiency claim is that exploration tokens move out of the expensive main trajectory and come back as compact evidence. For MAP, the practical equivalent is tracking research-agent cost against downstream Actor/Monitor token and turn cost, together with artifact quality.

Current evidence

  • src/mapify_cli/templates_src/hooks/map-token-meter.py.jinja:13-31 meters SubagentStop and Stop events.
  • src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja:839-923 records token events attributed to subtask/phase/agent.
  • src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja:929-1028 rolls up token_accounting.json by subtask, agent, and phase.
  • tests/test_map_token_meter.py:86-206 validates subagent and main-session token attribution.
  • README.md:166-172 advertises token budget and run-health diagnostics, but not research ROI.

Proposal
Extend token_accounting.json, token_report, or run_health_report with research-specific metrics:

  • research token share by subtask
  • Actor/Monitor tokens after research
  • count of research artifacts and locations returned
  • malformed/low-confidence research counts once a research validator exists
  • optional before/after proxy: Actor broad-search commands after research, when detectable from transcripts
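The first two metrics above could be derived from the existing per-event attribution. The following is a minimal sketch, not the actual rollup in map_step_runner.py.jinja: the event keys ("subtask", "agent", "tokens"), the agent names, and the function name `research_metrics` are all hypothetical, and events are assumed to arrive in chronological order.

```python
from collections import defaultdict

# Assumed agent names; the real identifiers in token events may differ.
RESEARCH_AGENTS = {"research-agent"}
DOWNSTREAM_AGENTS = {"actor", "monitor"}


def research_metrics(events):
    """Aggregate research share and post-research Actor/Monitor tokens per subtask.

    `events` is an iterable of dicts in chronological order with the
    hypothetical keys "subtask", "agent", and "tokens".
    """
    stats = defaultdict(lambda: {
        "research_tokens": 0,
        "total_tokens": 0,
        "actor_monitor_tokens_after_research": 0,
    })
    research_seen = set()  # subtasks that already have a research event

    for event in events:
        subtask, agent, tokens = event["subtask"], event["agent"], event["tokens"]
        row = stats[subtask]
        row["total_tokens"] += tokens
        if agent in RESEARCH_AGENTS:
            row["research_tokens"] += tokens
            research_seen.add(subtask)
        elif agent in DOWNSTREAM_AGENTS and subtask in research_seen:
            row["actor_monitor_tokens_after_research"] += tokens

    report = {}
    for subtask, row in stats.items():
        total = row["total_tokens"]
        # Share is None when a subtask has no tokens, so the report never divides by zero.
        share = row["research_tokens"] / total if total else None
        report[subtask] = {**row, "research_token_share": share}
    return report
```

A subtask with no research events reports a share of 0.0 and zero post-research tokens, which keeps it distinguishable from a subtask with no token data at all (share of None).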

Acceptance criteria

  • Token report shows research-agent/researcher cost separately from Actor/Monitor/orchestrator.
  • Run health includes a concise research section: artifacts present, confidence/status if parseable, low-confidence warnings, and token share.
  • Tests cover aggregation when research-agent has tokens, when only direct research is saved, and when no research tokens are available.
  • The metric is advisory and never blocks workflow completion.
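One possible shape for the run-health research section is sketched below. It is an illustration under assumptions, not the existing run_health_report format: the function name `build_research_section`, the "confidence" artifact key, the output field names, and the 0.5 low-confidence cutoff are all hypothetical. It covers the three aggregation cases from the criteria (research-agent tokens present, only saved research artifacts, no research tokens) and always marks itself non-blocking.

```python
LOW_CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff, not specified in this issue


def build_research_section(research_tokens, total_tokens, artifacts):
    """Build an advisory research summary for a run-health report.

    `research_tokens` may be None when no research-agent tokens were metered.
    `artifacts` is a list of dicts; a hypothetical "confidence" key is parsed
    when present and silently skipped when missing or malformed.
    """
    confidences = []
    for artifact in artifacts:
        try:
            confidences.append(float(artifact.get("confidence")))
        except (TypeError, ValueError):
            continue  # confidence absent or unparseable; counted as an artifact only

    low = [c for c in confidences if c < LOW_CONFIDENCE_THRESHOLD]
    warnings = []
    if low:
        warnings.append(f"{len(low)} low-confidence research artifact(s)")

    if research_tokens is None or not total_tokens:
        token_share = None  # token data unavailable; report artifacts only
    else:
        token_share = research_tokens / total_tokens

    return {
        "artifacts_present": len(artifacts),
        "parseable_confidence_count": len(confidences),
        "low_confidence_count": len(low),
        "token_share": token_share,
        "warnings": warnings,
        "blocking": False,  # advisory only: never gates workflow completion
    }
```

Returning warnings as data rather than raising keeps the metric advisory: a caller can print them in the report without affecting the run's exit status.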

Metadata

Labels: enhancement (New feature or request)
