feature: Enhance agent context management with compression and metrics logging #2875
Defines architecture, data flow, file layout, and acceptance criteria for a standalone document Q&A agent built on the Nexent SDK. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er, reference counting in register_agent_run/unregister_agent_run
… (though we didn't modify it, we analyzed it)
- Split agent_context.py into smaller modules: summary_cache.py, summary_config.py
- Convert all Chinese comments/docstrings to English (per .cursor/rules/english_comments.mdc)
- Add module-level docstrings for public API documentation
- Update __init__.py exports to include new module classes
- Convert test files' Chinese comments to English for compliance
- Default summary prompts to English with proper documentation

- Add context_manager_config field to AgentConfig
- Create ContextManagerConfig in create_agent_info
- Enhance step metrics with compression ratio and cache hit tracking
- Add _render_steps_with_truncation for fallback truncation
- Add cache hit logging (previous_cache_hit, current_cache_hit, stable_bypass)
- Add cache_types to compression stats output
- Simplify estimate_tokens to flat message list approach
- Remove auto-clear ContextManager logic (keeps cache valid)
- Stop tracking test scripts (keep locally)

- Fix TestM13StepLocalLogCleared: cache hit is recorded in _step_local_log (count_after_second should be 1, not 0)
- Update summary_json_schema: chars -> words for clearer units
…time context metrics
- Add TokenUsageIndicator component with circular progress visualization
- Emit TOKEN_COUNT messages via observer for real-time frontend updates
- Include step_number, input/output tokens, estimated context, and threshold
- Preserve context manager and metrics logging from refactor/agent_context

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

…ynamic token threshold

…timation
- Add stream_options to request usage info from streaming API
- Handle empty choices in streaming chunks (usage-only chunks)
- Add fallback token estimation when API doesn't return usage
- Add None handling in msg_token_count and _extract_text_from_chat_message
Pull request overview
This PR introduces conversation-level context compression via a reusable ContextManager, adds step-level token metrics emission/collection for UI display, and improves configuration/devops scaffolding (token estimation utilities, docker env/docs, compose version detection).
Changes:
- Add ContextManager + summary cache/config modules and integrate them into agent execution flows (including per-conversation reuse and cleanup).
- Emit token-usage JSON per step from backend and parse/display it in the frontend (new TokenUsageIndicator).
- Add shared token estimation utilities and update OpenAI streaming usage handling; add Docker env template and compose version parsing improvements.
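The PR's token estimation utility appears to be a character-ratio heuristic (the `ContextManagerConfig` exposes a `chars_per_token` setting, and a later commit "simplifies estimate_tokens to a flat message list approach"). A minimal sketch under those assumptions; the function name and exact handling are hypothetical, not the actual `sdk/nexent/core/utils/token_estimation.py`:

```python
def estimate_tokens(messages: list, chars_per_token: float = 1.5) -> int:
    """Estimate token count from a flat message list by total character length.

    Hypothetical sketch of a chars-per-token estimator; None content is
    treated as empty (the PR mentions adding None handling), and
    multimodal content blocks contribute their text parts.
    """
    total_chars = 0
    for msg in messages:
        content = msg.get("content") or ""  # None-safe
        if isinstance(content, list):  # multimodal content blocks
            content = "".join(
                part.get("text", "") for part in content if isinstance(part, dict)
            )
        total_chars += len(str(content))
    return int(total_chars / chars_per_token)


msgs = [{"role": "user", "content": "Hello, world!"}]  # 13 chars
print(estimate_tokens(msgs))  # -> 8  (int(13 / 1.5))
```

A flat-list estimator like this trades accuracy for speed and zero dependencies; the real utility may use a different ratio or a proper tokenizer.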
Reviewed changes
Copilot reviewed 39 out of 41 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| test/sdk/core/agents/test_agent_context/unit/test_pure_functions.py | Adds unit coverage for ContextManager pure helpers (formatting/fingerprints/text rendering). |
| test/sdk/core/agents/test_agent_context/unit/test_estimate_token.py | Adds consistency tests for token estimation paths. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_with_cache_extra.py | Adds extra branch coverage for cache-based compression fall-through cases. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_with_cache.py | Tests previous/current cache hit/incremental/fresh compression behavior. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_if_needed_extra.py | Expands coverage for compress_if_needed branch matrix and edge cases. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_if_needed.py | Core tests for threshold behavior, cache shortcuts, and structure expectations. |
| test/sdk/core/agents/test_agent_context/unit/test_cache_valid.py | Tests cache validation helpers for previous/current summaries. |
| test/sdk/core/agents/test_agent_context/unit/test_build_message.py | Tests message building with/without summaries and system prompt. |
| test/sdk/core/agents/test_agent_context/unit/test_budget_trim.py | Tests budget trimming for pairs/actions, including tool-call/observation pairing rules. |
| test/sdk/core/agents/test_agent_context/stubs.py | Adds isolated stubs for smolagents types used by context logic. |
| test/sdk/core/agents/test_agent_context/loader.py | Loads agent_context.py in isolation with stubbed module tree/token estimation. |
| test/sdk/core/agents/test_agent_context/factories.py | Provides factories for ContextManager/memory/test steps and mock model responses. |
| test/sdk/core/agents/test_agent_context/conftest.py | Adjusts test import pathing for the new test harness. |
| sdk/nexent/core/utils/token_estimation.py | Introduces reusable token estimation utilities extracted from context logic. |
| sdk/nexent/core/utils/observer.py | Changes TOKEN_COUNT transformation to pass through JSON for frontend parsing. |
| sdk/nexent/core/models/openai_llm.py | Ensures streaming usage inclusion and estimates tokens when the API doesn't return usage. |
| sdk/nexent/core/agents/summary_config.py | Adds ContextManagerConfig for compression thresholds/budgets/prompts. |
| sdk/nexent/core/agents/summary_cache.py | Adds cache/metrics dataclasses for previous/current summaries and call records. |
| sdk/nexent/core/agents/run_agent.py | Mounts a provided conversation-level ContextManager onto agents during runs. |
| sdk/nexent/core/agents/nexent_agent.py | Creates/mounts ContextManager from config and emits per-step token metrics JSON; logs step metrics. |
| sdk/nexent/core/agents/core_agent.py | Integrates compression before model calls and collects per-step compression/token metrics. |
| sdk/nexent/core/agents/agent_model.py | Extends agent/run models with context manager config/instance fields. |
| sdk/nexent/core/agents/agent_context.py | Adds full ContextManager implementation: cache validation, compression, trimming, metrics. |
| sdk/nexent/core/agents/__init__.py | Re-exports new context and cache/config types from the agents package. |
| frontend/types/chat.ts | Introduces TokenMetrics and changes AgentStep.metrics to structured JSON or null. |
| frontend/lib/chatMessageExtractor.ts | Parses TOKEN_COUNT JSON into metrics (null on parse failure). |
| frontend/lib/chat/chatMessageExtractor.ts | Same TOKEN_COUNT JSON parsing for the alternate extractor entrypoint. |
| frontend/components/ui/tokenUsageIndicator.tsx | Adds UI indicator component for latest token metrics. |
| frontend/app/[locale]/chat/streaming/chatStreamMain.tsx | Extracts latest step metrics and passes them to ChatInput. |
| frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx | Updates streaming step creation and matches TOKEN_COUNT to steps (pending/out-of-order support). |
| frontend/app/[locale]/chat/components/chatInput.tsx | Displays TokenUsageIndicator in the chat input UI. |
| frontend/app/[locale]/agents/components/agentInfo/DebugConfig.tsx | Aligns debug step metrics initialization to null. |
| docker/deploy.sh | Loosens docker compose version parsing to handle suffixes (e.g. -desktop.1). |
| docker/.env.bak | Adds a docker env template/documentation file. |
| backend/services/conversation_management_service.py | Clears conversation-level ContextManager on conversation deletion. |
| backend/services/agent_service.py | Attaches conversation-level ContextManager to agent runs when enabled. |
| backend/nexent_context_metrics.log | Adds a metrics log artifact file. |
| backend/agents/create_agent_info.py | Creates/passes ContextManagerConfig in agent configuration based on model max tokens. |
| backend/agents/agent_run_manager.py | Adds per-conversation ContextManager storage and explicit cleanup API. |
| .gitignore | Ignores *.log and ensures logs/ is ignored. |
| .claude/settings.local.json | Adds local Claude permissions/settings file. |
Comments suppressed due to low confidence (1)
sdk/nexent/core/agents/agent_model.py:59
AgentConfig defines managed_agents twice (once as List[AgentConfig] and again as List["AgentConfig"]). In Pydantic this will override/duplicate the field definition and can lead to validation or schema issues. Remove the duplicate definition and keep a single field with the intended type/Field configuration.

```python
model_name: str = Field(description="Model alias from ModelConfig")
provide_run_summary: Optional[bool] = Field(description="Whether to provide run summary to upper-level Agent", default=False)
managed_agents: List[AgentConfig] = Field(description="Managed Agents", default=[])
instructions: Optional[str] = Field(description="Additional instructions to prepend to system prompt", default=None)
managed_agents: List["AgentConfig"] = Field(
    description="Internal managed sub-agents created locally",
    default=[]
)
```
```python
prefix: str = "Summary of earlier steps in this task:"  # default prefix

def to_messages(self, summary_mode: bool = False) -> list:
    content = [{"type": "text", "text": f"{self.prefix}:\n{self.task}"}]
```

SummaryTaskStep.prefix already ends with a colon, but to_messages() appends another `:` (f"{self.prefix}:\n..."), resulting in duplicated punctuation in user-visible text. Either remove the trailing colon from the default prefix or stop appending `:` in to_messages().

```diff
-    content = [{"type": "text", "text": f"{self.prefix}:\n{self.task}"}]
+    content = [{"type": "text", "text": f"{self.prefix}\n{self.task}"}]
```
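The duplicated punctuation is easy to see in isolation; the variable names below mirror the snippet but are otherwise just an illustration:

```python
prefix = "Summary of earlier steps in this task:"  # default prefix ends with ":"
task = "Find the report"

buggy = f"{prefix}:\n{task}"  # appends a second colon after the prefix
fixed = f"{prefix}\n{task}"   # suggested fix: no extra ":" in the f-string

print(buggy.splitlines()[0])  # -> Summary of earlier steps in this task::
print(fixed.splitlines()[0])  # -> Summary of earlier steps in this task:
```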
```python
def _effective_tokens(self, memory: AgentMemory, current_run_start_idx: int) -> int:
    """Estimates the actual token burden of the upcoming _build_messages call.

    Uses summary_text for the covered prefix when cache is valid; falls back to raw otherwise.
    """
    system_prompt_tokens = estimate_tokens_for_system_prompt(memory)
    prev_steps = memory.steps[:current_run_start_idx]
    curr_steps = memory.steps[current_run_start_idx:]
    return (system_prompt_tokens + self._effective_prev_tokens(prev_steps)
            + self._effective_curr_tokens(curr_steps))
```
_effective_tokens() calls estimate_tokens_for_system_prompt(memory) without passing chars_per_token, so it always uses the default 1.5 even when ContextManagerConfig.chars_per_token is customized. Pass self.config.chars_per_token to keep effective-token estimation consistent with the rest of the ContextManager estimators.
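The fix amounts to threading the configured ratio through to the system-prompt estimator. A sketch under assumed signatures (the estimator body and class shape here are hypothetical, modeled only on the review comment):

```python
from types import SimpleNamespace


def estimate_tokens_for_system_prompt(memory, chars_per_token: float = 1.5) -> int:
    # Hypothetical implementation: character count divided by the ratio
    return int(len(memory.system_prompt) / chars_per_token)


class ContextManagerSketch:
    def __init__(self, config):
        self.config = config

    def effective_system_tokens(self, memory) -> int:
        # The suggested fix: pass the configured ratio explicitly instead of
        # silently falling back to the 1.5 default
        return estimate_tokens_for_system_prompt(
            memory, chars_per_token=self.config.chars_per_token
        )


memory = SimpleNamespace(system_prompt="x" * 30)
cm = ContextManagerSketch(SimpleNamespace(chars_per_token=3.0))
print(estimate_tokens_for_system_prompt(memory))  # default ratio: 20
print(cm.effective_system_tokens(memory))         # configured ratio: 10
```

With a customized `chars_per_token`, the two paths disagree by design here, which is exactly the inconsistency the comment warns about.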
```python
for step_log in self.agent.run(query, stream=True, reset=reset):
    # Add content to observer
    print(f"DEBUG step_log type: {type(step_log)}")
    if not isinstance(step_log, ActionStep):
        continue
```
There are unconditional print() debug statements in the agent run loop. This will spam stdout in production, impact performance, and can leak internal data. Replace with structured logging at an appropriate log level (or guard behind a debug flag) and avoid printing per-step in the hot path.
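One conventional replacement is a module-level logger with the per-step message at DEBUG, so the hot path stays quiet unless diagnostics are explicitly enabled. A sketch (logger name is illustrative):

```python
import logging

logger = logging.getLogger("nexent.core_agent")


def inspect_step(step_log) -> None:
    # Unlike an unconditional print(), this record is dropped early when the
    # logger's effective level is above DEBUG, so production runs stay quiet.
    logger.debug("step_log type: %s", type(step_log).__name__)
```

To opt in during debugging, configure the logger (e.g. `logging.basicConfig(level=logging.DEBUG)`) rather than editing the hot path.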
```python
# Format as JSON with truncation for readability
messages_json = json.dumps(messages_data, indent=2, ensure_ascii=False, default=str)
truncated_messages = truncate_content(messages_json, max_length=1000)
truncated_messages = messages_json
```

truncated_messages is computed with truncate_content(...) but then immediately overwritten with the full messages_json, defeating truncation. This can produce extremely large logs and may leak sensitive prompt/history content. Keep truncation (or make full logging explicitly opt-in) to avoid excessive logging and data exposure.
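An opt-in pattern for the full dump might look like the following sketch. The env flag name is hypothetical, and the head/tail truncation stands in for the project's own `truncate_content` helper:

```python
import json
import os


def render_messages_for_log(messages_data, max_length: int = 1000) -> str:
    """Truncate the message dump for logging unless full output is opted in."""
    messages_json = json.dumps(messages_data, indent=2, ensure_ascii=False, default=str)
    # NEXENT_LOG_FULL_MESSAGES is a hypothetical flag, not an existing setting;
    # full (potentially sensitive) dumps require it to be set explicitly.
    if os.getenv("NEXENT_LOG_FULL_MESSAGES") == "1":
        return messages_json
    if len(messages_json) <= max_length:
        return messages_json
    half = max_length // 2
    return messages_json[:half] + "\n...[truncated]...\n" + messages_json[-half:]
```

This keeps the default behavior safe (bounded log size, no history leakage) while still allowing a full dump when someone deliberately asks for it.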
```diff
 case chatConfig.messageTypes.STEP_COUNT:
-  // Increment the counter for each new step
+  // Increment the counter for each new step (for unique ID generation)
   stepIdCounter.current += 1;

+  // Extract the raw numeric step number from formatted content like "\n**Step 1** \n"
+  // TOKEN_COUNT sends step_number as an integer, so IDs must use only the digit
+  const stepTitle = messageContent.trim();
+  const stepNumMatch = stepTitle.match(/\d+/);
+  const stepNumber = stepNumMatch ? stepNumMatch[0] : String(stepIdCounter.current);

-  // Create a new step - use the counter and UUID combination to generate a unique ID
+  // Create a new step - use step number as part of ID for reliable matching
   currentStep = {
-    id: `step-${stepIdCounter.current}-${Date.now()}-${Math.random()
-      .toString(36)
-      .substring(2, 9)}`,
-    title: messageContent.trim(),
+    id: `step-${stepNumber}`,
+    title: stepTitle,
     content: "",
     expanded: true,
     contents: [], // Use an array to store all content in order
-    metrics: "",
+    metrics: null,
```
Step IDs are now generated as step-${stepNumber} (derived only from the displayed step number). Step numbers typically restart from 1 for each assistant message/run, so these IDs can collide across messages and break React key stability and your pending-metrics matching logic. Keep the parsed step number for matching TOKEN_COUNT, but include an additional unique component (e.g., current assistant message id, a run UUID, or the existing counter/timestamp) in step.id.
```python
# Optional: write to local file
with open("nexent_context_metrics.log", "a", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```
_log_step_metrics() always appends to a local file (nexent_context_metrics.log). Writing to the container filesystem on every run can cause disk growth and operational issues, and it’s not configurable. Consider emitting via the existing logger/monitoring system or gating file output behind an explicit config/env flag and ensuring the path is writable/rotated.
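Gating the file write behind an explicit env flag, with the logger as the default sink, could look like this sketch (the env var name is hypothetical, not an existing setting):

```python
import logging
import os

metrics_logger = logging.getLogger("nexent.context_metrics")


def log_step_metrics(lines: list) -> None:
    """Emit step metrics through the logging system; file output is strictly opt-in."""
    metrics_logger.info("%s", "\n".join(lines))
    # NEXENT_CONTEXT_METRICS_FILE is a hypothetical env var; nothing touches
    # the filesystem unless an operator explicitly sets it to a writable path.
    path = os.getenv("NEXENT_CONTEXT_METRICS_FILE")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
```

Routing through the logger also means rotation and shipping are handled by whatever handlers the deployment already configures, instead of an ever-growing file inside the container.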
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…factor for cognitive complexity
- Add enable_context_manager field to AgentInfo database model (default False)
- Update create_agent_config to read setting from agent_info instead of hardcoded True
- Add field to API request models (backend and frontend)
- Add database migration for new column
- Refactor _trim_actions_to_budget to reduce cognitive complexity (19 -> 15)
- Refactor _render_steps_with_truncation to reduce cognitive complexity (23 -> 15)

greenlet 3.5.0 lacks wheels for ARM64 Linux (aarch64), causing CI failures. Pin to <3.5.0 so a compatible version (3.4.0) is resolved.
```python
ChatCompletionMessage(role=role if role else "assistant",  # If there is no explicit role, default to "assistant"
                      content=model_output).model_dump(include={"role", "content", "tool_calls"}))

from smolagents.monitoring import TokenUsage
```
- Add enable_context_manager attribute to MockAgent in test_agent_db.py
- Add AgentRunInfo, agent_context, and agent_run_manager stubs in test_conversation_management_service.py
- Add nexent.core.agents.agent_context stub in test_create_agent_info.py
- Add smolagents.memory stub with AgentMemory/MemoryStep in SDK model tests
- Update TokenCountTransformer tests to match new passthrough behavior
- Update test_create_agent_config assertions to include context_manager_config parameter
- Fix TaskStep/ActionStep mocks to use real classes for dataclass inheritance
- Add proper package stubs for sdk.nexent.core.agents and utils modules

- Add timing.duration attribute to mock action steps (implementation expects step_log.timing.duration)
- Add step_number attribute to mock action steps
- Import ANY from unittest.mock for flexible assertions
- Update TOKEN_COUNT assertions to use ANY (implementation now sends JSON token data)
- Fix test_agent_run_with_observer_with_none_duration: implementation now handles None gracefully (0.0)
Implementation now accesses context_manager and step_metrics in _collect_step_metrics method. Tests need these attributes initialized to avoid AttributeError.
This pull request introduces a robust, reusable conversation-level ContextManager for agent runs, improving memory management and token usage tracking across the backend and frontend. It also enhances environment configuration and improves Docker Compose version detection. The most important changes are grouped below:
Backend: Conversation Context Management & Cleanup
- Add a conversation-level ContextManager to agent_run_manager, with logic to create, retrieve, and clean up context managers for each conversation, preventing memory leaks and supporting token threshold configuration. (backend/agents/agent_run_manager.py [1] [2])
- Integrate ContextManagerConfig into agent configuration creation, passing model max token limits and enabling context manager features. (backend/agents/create_agent_info.py [1] [2] [3])
- Attach the conversation-level ContextManager if enabled, ensuring its availability during agent execution. (backend/services/agent_service.py R1646-R1656)
- Clear ContextManager instances when deleting conversations to avoid memory leaks in edge cases. (backend/services/conversation_management_service.py R369-R374)

Frontend: Token Usage Metrics Handling

- Update metrics fields from empty string to null for consistency. (frontend/app/[locale]/agents/components/agentInfo/DebugConfig.tsx L217-R217, frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx L91-R100, L194-R214)
- Update streaming step creation and TOKEN_COUNT-to-step matching. (frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx L194-R214, L222-R241)
- Add the TokenUsageIndicator UI component to the chat input area, displaying the latest token usage metrics. (frontend/app/[locale]/chat/components/chatInput.tsx R30-R31, R125, R144, R823-R824)

DevOps & Configuration

- Add a new .env.bak template file for Docker, documenting all major environment variables and service endpoints for easier setup and migration. (docker/.env.bak R1-R168)
- Loosen Docker Compose version parsing to handle suffixes (e.g. -desktop.1). (docker/deploy.sh L410-R419)

Security/Permissions

- Update .claude/settings.local.json with new Bash command permissions for development and debugging. (.claude/settings.local.json R1-R18)