Instrument metadata digest utilisation #93

Closed
jbarnes850 wants to merge 5 commits into main from feature/adapter-metadata-digest

@jbarnes850
Contributor

Summary

  • add per-section size accounting and budget utilisation metrics to the prompt digest helper
  • log a warning when the digest exceeds 75% of the provider budget so prompt bloat is visible without extra config
  • extend docs and unit tests to cover the new stats and warning path while keeping the digested payload unchanged for users
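
The accounting and warning described above could be sketched roughly as follows. This is a minimal illustration, not the helper from this PR: the function name `digest_stats`, the logger name, the returned keys, and the choice to measure each section by its JSON-serialized length are all assumptions. Only the 75% threshold comes from the summary.

```python
import json
import logging
from typing import Any, Dict

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("prompt_digest_sketch")

WARN_THRESHOLD = 0.75  # warn above 75% of the provider budget, per the summary


def digest_stats(sections: Dict[str, Any], budget_chars: int) -> Dict[str, Any]:
    """Return per-section sizes and overall budget utilisation (assumed shape)."""
    if budget_chars <= 0:
        raise ValueError("budget_chars must be positive")
    # Assumption: a section's size is the length of its JSON serialization.
    sizes = {
        name: len(json.dumps(value, default=str))
        for name, value in sections.items()
    }
    total = sum(sizes.values())
    utilisation = total / budget_chars
    if utilisation > WARN_THRESHOLD:
        logger.warning(
            "digest uses %.0f%% of the %d-character budget",
            utilisation * 100,
            budget_chars,
        )
    return {"section_chars": sizes, "total_chars": total, "utilisation": utilisation}
```

Because the stats are computed alongside the digest rather than written into it, a helper of this shape leaves the payload itself untouched, which matches the stated goal of keeping the digested payload unchanged for users.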

Testing

  • pytest tests/unit/connectors/test_prompt_digest.py tests/unit/test_openai_adapter.py

Copilot AI review requested due to automatic review settings October 29, 2025 19:10
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request implements a comprehensive learning evaluation infrastructure for the Atlas system, introducing playbook entries (formerly "policy nuggets"), usage tracking instrumentation, impact metrics, and prompt digest functionality to handle large metadata payloads. The changes enable systematic measurement of adaptive efficiency and cross-incident transfer for learning-based agent behavior.

Key Changes:

  • Renamed "policy nuggets" to "playbook entries" with structured schema, rubric gates, and provenance tracking
  • Added runtime usage instrumentation to track cue hits, action adoptions, and failure signals per playbook entry
  • Implemented prompt digest system to trim large metadata blobs for providers with smaller context windows (e.g., Claude)
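
To make the trimming idea concrete, here is a small sketch of fitting metadata sections into a character budget. It assumes sections are plain strings and that a caller-supplied priority list decides which sections count as high-signal; `trim_sections` and its parameters are hypothetical and not taken from `atlas/connectors/prompt_digest.py`.

```python
from typing import Dict, List, Tuple


def trim_sections(
    sections: Dict[str, str], budget_chars: int, priority: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Keep sections in priority order; truncate or drop once the budget runs out."""
    kept: Dict[str, str] = {}
    omitted: List[str] = []
    remaining = max(budget_chars, 0)
    # Prioritised sections first, then everything else in its original order.
    ordered = [name for name in priority if name in sections]
    ordered += [name for name in sections if name not in priority]
    for name in ordered:
        text = sections[name]
        if remaining == 0:
            omitted.append(name)  # no budget left: drop the section entirely
        elif len(text) <= remaining:
            kept[name] = text  # fits as-is
            remaining -= len(text)
        else:
            kept[name] = text[:remaining]  # partially fits: truncate
            remaining = 0
    return kept, omitted
```

Returning the omitted section names alongside the trimmed payload keeps the trimming observable, so a caller can report what was dropped.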

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| atlas/learning/playbook_entries.py | New module implementing playbook entry schema validation, rubric scoring (actionability, generality, hookability, concision), and gate enforcement |
| atlas/learning/usage.py | New runtime tracker capturing cue hits, action adoptions, and session outcomes (reward, tokens, incident IDs, retry counts) |
| atlas/learning/synthesizer.py | Extended to evaluate playbook entries against rubric gates, merge impact metrics into learning state, and maintain provenance metadata |
| atlas/connectors/prompt_digest.py | New digest builder that trims metadata sections to fit provider-specific character budgets while preserving high-signal content |
| atlas/connectors/openai.py | Integrated prompt digest into message building; metadata now digested before serialization |
| atlas/evaluation/learning_report.py | Added playbook metrics, lifecycle summary, per-entry impact tracking, usage metrics, and efficiency snapshots to summary/markdown outputs |
| atlas/core/__init__.py | Wired usage tracker into session lifecycle; captures reward/token deltas, incident context, and merges impact rolls into learning state |
| atlas/personas/student.py & teacher.py | Instrumented cue detection and action adoption recording at runtime |
| atlas/config/models.py | Added config models for digest, playbook schema, gates, rubric weights, and usage tracking |
| scripts/eval_learning.py | Extended CLI with prompt variant, synthesis model, pamphlet injection, and playbook entry label flags |
| docs/learning_eval.md | Documented prompt digest, playbook schema, impact metrics, experiment configs, and evaluation workflow |
| configs/eval/*.yaml | New evaluation configs for baseline, scope-shift, and Claude synthesis variants |
| tests/unit/*.py | Added test coverage for usage tracker, prompt digest, learning report impact sections, and synthesizer gate failures |
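
As an illustration of the per-entry usage tracking listed for `atlas/learning/usage.py`, a tracker might look something like the sketch below. The class name, method names, and snapshot shape are hypothetical; the real module also records session outcomes (reward, tokens, incident IDs, retry counts), which are left out here.

```python
from collections import defaultdict
from typing import Dict


class UsageTrackerSketch:
    """Counts cue hits and action adoptions per playbook entry (hypothetical API)."""

    def __init__(self) -> None:
        # One counter pair per playbook entry ID, created on first use.
        self._counts: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"cue_hits": 0, "action_adoptions": 0}
        )

    def record_cue_hit(self, entry_id: str) -> None:
        self._counts[entry_id]["cue_hits"] += 1

    def record_action_adoption(self, entry_id: str) -> None:
        self._counts[entry_id]["action_adoptions"] += 1

    def snapshot(self) -> Dict[str, Dict[str, int]]:
        # Copy the counters so callers cannot mutate internal state.
        return {entry_id: dict(counts) for entry_id, counts in self._counts.items()}
```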


__all__ = ["OpenAIAdapter"]

logger = logging.getLogger(__name__)

Copilot AI Oct 29, 2025

Duplicate logger initialization. The logger is already initialized at line 27. Remove this redundant declaration at line 205.

Suggested change
- logger = logging.getLogger(__name__)
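
For context on why the second declaration is redundant rather than harmful: `logging.getLogger` returns the same `Logger` object for a given name, so repeating the assignment only rebinds the module-level name to the instance it already holds. A quick check, using a placeholder logger name:

```python
import logging

# "example.module" is a placeholder name, not the real module path.
first = logging.getLogger("example.module")
second = logging.getLogger("example.module")

# Both calls resolve to the same Logger object, so the second assignment adds nothing.
assert first is second
```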

Comment thread atlas/config/models.py
allowed_runtime_handles: List[str] = Field(default_factory=list)
runtime_handle_prefixes: List[str] = Field(default_factory=list)
cue_types: List[str] = Field(default_factory=lambda: ["regex", "keyword", "predicate"])
default_scope_category: str = "differentiation"

Copilot AI Oct 29, 2025

[nitpick] The default scope category is set to "differentiation" in the schema config, but in learning_overhaul_base.yaml and learning_overhaul_claude.yaml it's overridden to "reinforcement" (lines 183 and 127 respectively). This inconsistency could lead to confusion. Consider documenting why the global default differs from the config-specific defaults or aligning them for clarity.

Suggested change
- default_scope_category: str = "differentiation"
+ default_scope_category: str = "reinforcement"
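
The interaction the comment describes, a model-level default that individual config files override, can be illustrated with a small stand-in. This sketch uses a plain dataclass instead of the project's config model, and `SchemaConfigSketch` and `load_schema` are hypothetical names; only the field name and the two values come from the review comment.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class SchemaConfigSketch:
    # Stand-in for the schema config model; only one field is reproduced here.
    default_scope_category: str = "differentiation"


def load_schema(overrides: Dict[str, Any]) -> SchemaConfigSketch:
    """Apply config-file overrides on top of the model defaults."""
    known = {f.name for f in fields(SchemaConfigSketch)}
    return SchemaConfigSketch(**{k: v for k, v in overrides.items() if k in known})


# With no override the model default applies; a config file value replaces it.
print(load_schema({}).default_scope_category)
print(load_schema({"default_scope_category": "reinforcement"}).default_scope_category)
```

Under this arrangement the model default only matters for configs that omit the key, which is why a mismatch between the two can go unnoticed.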

Comment thread atlas/core/__init__.py
try:
    return float(total)
except (TypeError, ValueError):
    total = None

Copilot AI Oct 29, 2025

Variable total is not used.

Suggested change
- total = None
+ pass
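
An alternative to `pass` is to return the fallback explicitly from the handler. The sketch below assumes the surrounding function is meant to yield `None` when the value cannot be converted. The excerpt does not show what follows the `except` block, so that behavior and the helper name `coerce_float` are assumptions.

```python
from typing import Any, Optional


def coerce_float(total: Any) -> Optional[float]:
    """Return total as a float, or None when it cannot be converted."""
    try:
        return float(total)
    except (TypeError, ValueError):
        # Assumed fallback: signal "no usable value" explicitly.
        return None
```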

jbarnes850 self-assigned this Oct 29, 2025
jbarnes850 added the enhancement label Oct 29, 2025
@jbarnes850
Contributor Author

Superseded by #94

jbarnes850 closed this Oct 29, 2025
jbarnes850 deleted the feature/adapter-metadata-digest branch November 1, 2025 00:41
Labels

enhancement New feature or request

2 participants