Pull Request Overview
This pull request implements a comprehensive learning evaluation infrastructure for the Atlas system, introducing playbook entries (formerly "policy nuggets"), usage tracking instrumentation, impact metrics, and prompt digest functionality to handle large metadata payloads. The changes enable systematic measurement of adaptive efficiency and cross-incident transfer for learning-based agent behavior.
Key Changes:
- Renamed "policy nuggets" to "playbook entries" with structured schema, rubric gates, and provenance tracking
- Added runtime usage instrumentation to track cue hits, action adoptions, and failure signals per playbook entry
- Implemented prompt digest system to trim large metadata blobs for providers with smaller context windows (e.g., Claude)
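The digest idea can be sketched as follows. This is a minimal illustration under assumed names (`build_digest`, `budget`, and the section format are hypothetical), not the PR's actual `prompt_digest.py` implementation:

```python
# Sketch: keep metadata sections in priority order until a provider-specific
# character budget is exhausted, truncating the first section that no longer fits.
# build_digest and its parameters are illustrative, not the PR's real API.

def build_digest(sections: list[tuple[str, str]], budget: int) -> str:
    """sections: (name, body) pairs ordered from highest to lowest signal."""
    kept: list[str] = []
    remaining = budget
    for name, body in sections:
        block = f"[{name}] {body}"
        if len(block) + 1 <= remaining:  # +1 for the joining newline
            kept.append(block)
            remaining -= len(block) + 1
        elif remaining > len(name) + 4:
            # Partially keep the first over-budget section, then stop.
            kept.append(block[: remaining - 1])
            break
        else:
            break
    return "\n".join(kept)

digest = build_digest(
    [("incident", "retry storm in us-east"), ("history", "x" * 500)],
    budget=60,
)
```

The priority ordering is what lets the digest "preserve high-signal content": low-priority sections are the ones sacrificed when a smaller context window (e.g. Claude's) forces trimming.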
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `atlas/learning/playbook_entries.py` | New module implementing playbook entry schema validation, rubric scoring (actionability, generality, hookability, concision), and gate enforcement |
| `atlas/learning/usage.py` | New runtime tracker capturing cue hits, action adoptions, and session outcomes (reward, tokens, incident IDs, retry counts) |
| `atlas/learning/synthesizer.py` | Extended to evaluate playbook entries against rubric gates, merge impact metrics into learning state, and maintain provenance metadata |
| `atlas/connectors/prompt_digest.py` | New digest builder that trims metadata sections to fit provider-specific character budgets while preserving high-signal content |
| `atlas/connectors/openai.py` | Integrated prompt digest into message building; metadata is now digested before serialization |
| `atlas/evaluation/learning_report.py` | Added playbook metrics, lifecycle summary, per-entry impact tracking, usage metrics, and efficiency snapshots to summary/markdown outputs |
| `atlas/core/__init__.py` | Wired the usage tracker into the session lifecycle; captures reward/token deltas and incident context, and merges impact rollups into learning state |
| `atlas/personas/student.py`, `atlas/personas/teacher.py` | Instrumented cue detection and action adoption recording at runtime |
| `atlas/config/models.py` | Added config models for digest, playbook schema, gates, rubric weights, and usage tracking |
| `scripts/eval_learning.py` | Extended CLI with prompt variant, synthesis model, pamphlet injection, and playbook entry label flags |
| `docs/learning_eval.md` | Documented prompt digest, playbook schema, impact metrics, experiment configs, and evaluation workflow |
| `configs/eval/*.yaml` | New evaluation configs for baseline, scope-shift, and Claude synthesis variants |
| `tests/unit/*.py` | Added test coverage for usage tracker, prompt digest, learning report impact sections, and synthesizer gate failures |
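A per-entry usage tracker of the kind described for `atlas/learning/usage.py` might look like the following sketch. The class and method names (`UsageTracker`, `record_cue_hit`, etc.) are assumptions for illustration, not the module's real API:

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative sketch: count cue hits, action adoptions, and failure signals
# per playbook entry, and derive an adoption rate from them.

@dataclass
class UsageTracker:
    cue_hits: defaultdict = field(default_factory=lambda: defaultdict(int))
    adoptions: defaultdict = field(default_factory=lambda: defaultdict(int))
    failures: defaultdict = field(default_factory=lambda: defaultdict(int))

    def record_cue_hit(self, entry_id: str) -> None:
        self.cue_hits[entry_id] += 1

    def record_adoption(self, entry_id: str) -> None:
        self.adoptions[entry_id] += 1

    def record_failure(self, entry_id: str) -> None:
        self.failures[entry_id] += 1

    def adoption_rate(self, entry_id: str) -> float:
        hits = self.cue_hits[entry_id]
        return self.adoptions[entry_id] / hits if hits else 0.0

tracker = UsageTracker()
tracker.record_cue_hit("pb-001")
tracker.record_cue_hit("pb-001")
tracker.record_adoption("pb-001")
rate = tracker.adoption_rate("pb-001")  # 0.5
```

Separating cue hits from adoptions is what makes the adoption rate meaningful: an entry whose cues fire often but is rarely adopted is a candidate for rubric-gate review.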
In the diff context around `OpenAIAdapter`:

> `__all__ = ["OpenAIAdapter"]`
> `logger = logging.getLogger(__name__)`

Duplicate logger initialization. The logger is already initialized at line 27; remove this redundant declaration at line 205.

Suggested change: delete the duplicate `logger = logging.getLogger(__name__)` line.
> `allowed_runtime_handles: List[str] = Field(default_factory=list)`
> `runtime_handle_prefixes: List[str] = Field(default_factory=list)`
> `cue_types: List[str] = Field(default_factory=lambda: ["regex", "keyword", "predicate"])`
> `default_scope_category: str = "differentiation"`

[nitpick] The default scope category is set to "differentiation" in the schema config, but learning_overhaul_base.yaml and learning_overhaul_claude.yaml override it to "reinforcement" (lines 183 and 127 respectively). This inconsistency could lead to confusion. Consider documenting why the global default differs from the config-specific defaults, or aligning them for clarity.

Suggested change: replace `default_scope_category: str = "differentiation"` with `default_scope_category: str = "reinforcement"`.
> `try:`
> `    return float(total)`
> `except (TypeError, ValueError):`
> `    total = None`

The variable `total` is assigned here but never used afterwards.

Suggested change: replace `total = None` with `pass`.
|
Superseded by #94 |