[Enhancement]: Track topology-aware ACG cache/token benefit #323

Description

@teerthsharma

Affected area

Adaptive runtime

Problem or opportunity

This is a feature request.

ACG can observe repeated agent workflows, but the current benefit is hard to evaluate if the metric is only "converged" or "not converged". For repeated agent tasks, the useful question is whether ACG can keep the reusable prompt prefix stable enough for cache reuse while allowing task-specific suffix content to change.

In a repeated agent pipeline, the reusable prefix can include the system policy, tool schema, workflow scaffold, and output contract. The task-specific suffix changes on each run, but the reusable prefix should remain measurable as stable structure.
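To make "stable prefix" concrete, here is a minimal sketch of how a stable prefix length and fingerprint could be derived from tokenized prompts across runs. The function names, the `u32` token representation, and the use of `DefaultHasher` are assumptions for illustration only; they are not the existing ACG API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Length of the longest token prefix shared by every observed run.
/// Returns 0 when no runs have been observed.
/// (Hypothetical helper; tokens are modeled as plain u32 ids.)
pub fn stable_prefix_len(runs: &[Vec<u32>]) -> usize {
    let Some((first, rest)) = runs.split_first() else {
        return 0;
    };
    let mut len = first.len();
    for run in rest {
        // Count matching leading tokens between the first run and this one.
        let common = first
            .iter()
            .zip(run.iter())
            .take_while(|(a, b)| a == b)
            .count();
        len = len.min(common);
    }
    len
}

/// Fingerprint of a prefix, so two runs can be compared without
/// storing the prefix itself. DefaultHasher is used only to keep the
/// sketch dependency-free; its output is not guaranteed to be stable
/// across Rust releases, so persisted fingerprints would need a fixed hash.
pub fn prefix_fingerprint(prefix: &[u32]) -> u64 {
    let mut hasher = DefaultHasher::new();
    prefix.hash(&mut hasher);
    hasher.finish()
}
```

With this shape, a prefix counts as stable across changed tasks when the shared length stays the same and the fingerprint of the first `stable_prefix_len` tokens matches from run to run, even though the suffix tokens differ.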

Proposed enhancement

Track topology-aware ACG benefit with explicit cache/token-oriented metrics for repeated agent workflows.

The enhancement should make it possible to report whether ACG learned a reusable prefix, whether that prefix stayed stable across changed tasks, and whether the run avoided repeated uncached prefix processing.

Useful metrics include:

  • stable prefix length and fingerprint
  • profile convergence state
  • observations per profile
  • profile reuse count
  • modeled uncached tokens
  • modeled cached/read tokens
  • repeated uncached prefix tokens avoided
  • provider/local cache hit count when available
  • full end-to-end timing when measured by a harness
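As a rough illustration of how the token-oriented metrics above relate to each other, the following sketch models them from per-run observations. All type and field names are hypothetical, and the model is deliberately simple: it assumes the suffix is always processed uncached, and that the prefix is either fully read from cache or fully reprocessed on each run.

```rust
/// One observed run of a repeated workflow (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct RunObservation {
    /// Task-specific suffix tokens for this run.
    pub suffix_tokens: u64,
    /// Whether the reusable prefix was served from cache on this run.
    pub prefix_cache_hit: bool,
}

/// Modeled token counters for one profile (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModeledTokens {
    pub uncached_tokens: u64,
    pub cached_read_tokens: u64,
    pub repeated_uncached_prefix_tokens_avoided: u64,
    pub cache_hits: u64,
}

/// Models cache/token benefit for a fixed-length prefix across runs.
pub fn model_tokens(prefix_tokens: u64, runs: &[RunObservation]) -> ModeledTokens {
    let mut uncached = 0u64;
    let mut cached_read = 0u64;
    let mut baseline_uncached = 0u64;
    let mut hits = 0u64;

    for run in runs {
        // Baseline: no caching, so every run pays prefix plus suffix.
        baseline_uncached += prefix_tokens + run.suffix_tokens;

        // The suffix is always processed uncached in this model.
        uncached += run.suffix_tokens;
        if run.prefix_cache_hit {
            cached_read += prefix_tokens;
            hits += 1;
        } else {
            uncached += prefix_tokens;
        }
    }

    ModeledTokens {
        uncached_tokens: uncached,
        cached_read_tokens: cached_read,
        // Prefix work avoided relative to the no-cache baseline.
        repeated_uncached_prefix_tokens_avoided: baseline_uncached - uncached,
        cache_hits: hits,
    }
}
```

For example, a 1000-token prefix over three runs with suffixes of 50, 70, and 30 tokens, where only the first run misses the cache, gives 1150 uncached tokens, 2000 cached/read tokens, and 2000 repeated uncached prefix tokens avoided. Under this model the avoided count equals the cached/read count; a model with partial prefix hits would let the two diverge.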

Runtime contract and binding impact

The source of truth should remain in the Rust adaptive runtime. Rust should expose the telemetry/state needed to report topology-aware ACG benefit. Node.js and Python should preserve access to the same adaptive fields where they expose adaptive runtime behavior.

Go, WebAssembly, and C FFI impact is N/A unless those experimental bindings expose the same adaptive telemetry surface later.

Alternatives considered

Leave convergence as the only visible metric. That is weaker because a converged/not-converged result does not show whether repeated prefix work was avoided or whether the stable workflow structure stayed reusable while task content changed.

Benchmark only end-to-end latency. That is useful but not sufficient because local harness latency can be affected by machine load, ordering, time drift, or model scheduling. Token/cache counters are needed alongside latency to explain the mechanism.

Acceptance criteria

  • ACG can report stable reusable prefix behavior separately from task-specific suffix variation.
  • Topology-aware profiles expose convergence state, observations, and reuse count.
  • Cache/token benefit can be reported with uncached tokens, cached/read tokens, and repeated uncached prefix tokens avoided.
  • Latency claims are supported by full end-to-end timing, success/timeout counts, and exact experiment setup.
  • Baseline reporting uses N/A for topology convergence fields that do not apply.
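One possible way to satisfy the N/A criterion is to model topology fields as optional values and render a missing value as `N/A` at report time. The sketch below assumes hypothetical `Convergence` and `ProfileRow` types and an illustrative row format; none of these names come from the existing runtime.

```rust
use std::fmt;

/// Convergence state of a topology-aware profile (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Convergence {
    Learning,
    Converged,
}

impl fmt::Display for Convergence {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Convergence::Learning => write!(f, "learning"),
            Convergence::Converged => write!(f, "converged"),
        }
    }
}

/// One row of a benefit report. Topology fields are `None` for
/// baseline runs, where they do not apply.
#[derive(Debug, Clone)]
pub struct ProfileRow {
    pub label: String,
    pub convergence: Option<Convergence>,
    pub observations: Option<u64>,
    pub reuse_count: Option<u64>,
}

/// Renders an optional field, using "N/A" when it does not apply.
fn cell<T: fmt::Display>(value: &Option<T>) -> String {
    match value {
        Some(v) => v.to_string(),
        None => "N/A".to_string(),
    }
}

impl fmt::Display for ProfileRow {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} | convergence={} | observations={} | reuse={}",
            self.label,
            cell(&self.convergence),
            cell(&self.observations),
            cell(&self.reuse_count),
        )
    }
}
```

Using `Option` rather than a sentinel such as `0` or `false` keeps a baseline row distinguishable from a profile that genuinely has zero observations or has not yet converged.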
