Conversation
Pull Request Overview
This PR introduces a provider-aware prompt digest system to prevent oversized execution metadata from exceeding LLM context window limits. The digest compresses large metadata sections into summaries while preserving high-signal information.
- Adds `MetadataDigestConfig` to configure budget limits and trimming behavior per provider
- Implements `build_prompt_digest()` to intelligently trim metadata sections based on character budgets
- Integrates digest logic into the OpenAI adapter's message-building pipeline
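As context for the comments below, a minimal sketch of the budget-based trimming idea. The function name matches the PR, but the trimming strategy and the `digest_stats` shape here are illustrative assumptions, not the PR's actual implementation:

```python
import json

def build_prompt_digest(metadata: dict, char_budget: int = 2000) -> str:
    """Compress metadata so its serialized form fits a character budget.

    Oversized sections are replaced with short summaries, and the omitted
    keys are recorded under 'digest_stats' for observability.
    """
    digest = dict(metadata)
    omitted = []
    while len(json.dumps(digest)) > char_budget and len(omitted) < len(metadata):
        # Replace the largest not-yet-summarized section with a one-line summary.
        largest = max(
            (k for k in digest if k not in omitted),
            key=lambda k: len(json.dumps(digest[k])),
        )
        omitted.append(largest)
        digest[largest] = {"summary": f"omitted {len(json.dumps(metadata[largest]))} chars"}
    digest["digest_stats"] = {"omitted_metadata_keys": omitted}
    return json.dumps(digest)

metadata = {"session": {"log": ["payload"] * 500}, "step": {"id": "1-setup"}}
compact = build_prompt_digest(metadata, char_budget=300)
```

The large `session` section is summarized away while the small `step` section survives intact, and the omission is visible to callers via `digest_stats`.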
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| atlas/connectors/prompt_digest.py | New module implementing metadata digest builder with provider-aware budgets and intelligent section trimming |
| atlas/config/models.py | Adds MetadataDigestConfig class and integrates it into OpenAIAdapterConfig |
| atlas/connectors/openai.py | Updates _build_messages to use digest system and handle errors |
| tests/unit/test_openai_adapter.py | Adds test verifying adapter trims large metadata blobs |
| tests/unit/connectors/test_prompt_digest.py | New test file with comprehensive digest functionality tests |
| docs/learning_eval.md | Documents the prompt digest feature, configuration options, and playbook entry schema updates |
```python
class MetadataDigestConfig(BaseModel):
    """Controls how execution metadata is projected into LLM-facing prompts.

    Defaults reserve roughly 10%% of the provider's published context window, assuming ~4 characters per token.
```

Double percent sign '%%' should be a single '%':

```diff
-    Defaults reserve roughly 10%% of the provider's published context window, assuming ~4 characters per token.
+    Defaults reserve roughly 10% of the provider's published context window, assuming ~4 characters per token.
```
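The docstring's default-budget arithmetic can be sanity-checked directly. The 128k-token context window below is a hypothetical figure for illustration, not a value taken from the PR:

```python
# Hypothetical provider numbers, purely for checking the docstring's math.
context_window_tokens = 128_000  # provider's published context window
reserve_fraction = 0.10          # docstring: "roughly 10%"
chars_per_token = 4              # docstring: "~4 characters per token"

default_char_budget = int(context_window_tokens * reserve_fraction * chars_per_token)
print(default_char_budget)  # → 51200
```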
```python
assert "session_learning_audit" in digest["digest_stats"]["omitted_metadata_keys"]
audit_summary = digest["session"]["session_reward_audit_summary"]
assert audit_summary["entries"] == 50
assert "payload" not in audit_summary["sample"][0]
```

This assertion checks that the string 'payload' is not in `audit_summary["sample"][0]`, but the test metadata (line 16) builds the list as `["payload"] * 200`, so each sample entry is the string 'payload' itself, not a dict/mapping. The assertion should verify the sample was trimmed, which this check does not do for the data structure under test:

```diff
-assert "payload" not in audit_summary["sample"][0]
+# The sample should be trimmed; check its length is less than the original (200)
+assert len(audit_summary["sample"]) < 200
```
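The suggested assertion can be exercised against a stand-in summarizer. Keeping only the first few entries as the sample is an assumed trimming rule, chosen here only to illustrate the check:

```python
def summarize_audit(entries: list, sample_size: int = 5) -> dict:
    """Record the entry count and keep only a small leading sample."""
    return {"entries": len(entries), "sample": entries[:sample_size]}

audit_summary = summarize_audit(["payload"] * 200)

# The sample should be trimmed; check its length is less than the original (200)
assert len(audit_summary["sample"]) < 200
print(audit_summary["entries"])  # → 200
```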
```python
if provider in config.provider_char_budgets:
    return config.provider_char_budgets[provider]
token_limit = PROVIDER_TOKEN_LIMITS.get(provider)
if token_limit:
```

[nitpick] The magic-number calculation `token_limit * 0.025 * 4` could be clearer. Consider extracting the formula to a named constant or adding an inline comment explaining the relationship (e.g., '2.5% of context window * 4 chars/token'):

```diff
 if token_limit:
+    # Character budget: 2.5% of context window (token_limit) * average chars per token (4)
```
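One way to apply the nitpick is to name both factors. The constant values mirror the review's reading of the code; the helper function itself is a sketch, not the PR's code:

```python
CONTEXT_FRACTION_FOR_METADATA = 0.025  # 2.5% of the context window
AVG_CHARS_PER_TOKEN = 4                # rough characters-per-token estimate

def char_budget_for(token_limit: int) -> int:
    """Character budget: a small slice of the context window, in characters."""
    return int(token_limit * CONTEXT_FRACTION_FOR_METADATA * AVG_CHARS_PER_TOKEN)

print(char_budget_for(128_000))  # → 12800
```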
```python
if value.isdigit():
    return int(value), value
try:
    return int(value.split("-", 1)[0]), value
```

The step-key parsing assumes step IDs follow a 'number-suffix' format but doesn't handle the case where the split produces an empty string before the hyphen (e.g., '-123'). Consider validating that the first part is non-empty before calling `int()`:

```diff
-return int(value.split("-", 1)[0]), value
+prefix = value.split("-", 1)[0]
+if prefix == "":
+    return math.inf, value
+return int(prefix), value
```
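Putting the suggestion together, a self-contained version of the sort-key helper. The function name and the `math.inf` fallback for malformed IDs are assumptions about the surrounding code (here the guard uses `isdigit()`, which also rejects non-numeric prefixes, not just empty ones):

```python
import math

def step_sort_key(value: str):
    """Order step IDs like '3' or '12-setup' numerically; malformed IDs sort last."""
    if value.isdigit():
        return (int(value), value)
    prefix = value.split("-", 1)[0]
    if prefix.isdigit():  # rejects '' (e.g. '-123') as well as non-numeric prefixes
        return (int(prefix), value)
    return (math.inf, value)

print(sorted(["10-run", "-123", "2", "1-setup"], key=step_sort_key))
# → ['1-setup', '2', '10-run', '-123']
```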
```python
digest = build_prompt_digest(metadata, self._config.llm, self._config.metadata_digest)
except PromptDigestTooLargeError as exc:
    raise AdapterError(str(exc)) from exc
messages.append({"role": "system", "content": digest})
try:
    stats = json.loads(digest).get("digest_stats", {})
except json.JSONDecodeError:
    stats = {}
```

The digest JSON is parsed twice: once in `build_prompt_digest()` during encoding (line 48), and again here (line 53) for logging. Consider having `build_prompt_digest()` return both the encoded string and the stats dict to avoid the redundant parse:

```diff
-    digest = build_prompt_digest(metadata, self._config.llm, self._config.metadata_digest)
+    digest, stats = build_prompt_digest(metadata, self._config.llm, self._config.metadata_digest)
 except PromptDigestTooLargeError as exc:
     raise AdapterError(str(exc)) from exc
 messages.append({"role": "system", "content": digest})
-try:
-    stats = json.loads(digest).get("digest_stats", {})
-except json.JSONDecodeError:
-    stats = {}
```
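A sketch of the suggested two-value return shape, with placeholder digest logic. The real trimming lives in `build_prompt_digest`; only the return signature is the point here:

```python
import json

def build_prompt_digest(metadata: dict) -> tuple[str, dict]:
    """Encode the digest once and hand back its stats alongside the string."""
    stats = {"omitted_metadata_keys": []}  # placeholder; real stats come from trimming
    encoded = json.dumps({**metadata, "digest_stats": stats})
    return encoded, stats

digest, stats = build_prompt_digest({"session": {"id": "abc"}})
# The caller can log `stats` directly instead of re-parsing the JSON string.
assert json.loads(digest)["digest_stats"] == stats
```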
Summary
Testing