
feature: Enhance agent context management with compression and metrics logging#2875

Merged
Dallas98 merged 38 commits into ModelEngine-Group:develop from liudfgoo:refactor/agent_context
Apr 27, 2026
Conversation

@JasonW404
Contributor

This pull request introduces a robust, reusable conversation-level ContextManager for agent runs, improving memory management and token usage tracking across the backend and frontend. It also enhances environment configuration and improves Docker Compose version detection. The most important changes are grouped below:

Backend: Conversation Context Management & Cleanup

  • Added a reusable, per-conversation ContextManager to agent_run_manager, with logic to create, retrieve, and clean up context managers for each conversation, preventing memory leaks and supporting token threshold configuration. (backend/agents/agent_run_manager.py)
  • Integrated ContextManagerConfig into agent configuration creation, passing model max token limits and enabling context manager features. (backend/agents/create_agent_info.py)
  • Updated agent run preparation to mount the conversation-level ContextManager if enabled, ensuring its availability during agent execution. (backend/services/agent_service.py R1646-R1656)
  • Ensured explicit cleanup of ContextManager instances when deleting conversations to avoid memory leaks in edge cases. (backend/services/conversation_management_service.py R369-R374)
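The per-conversation lifecycle described above can be sketched roughly as follows. This is an illustrative sketch only; the class and method names here are assumptions, not the actual agent_run_manager.py API:

```python
import threading


class AgentRunManagerSketch:
    """Minimal sketch of per-conversation ContextManager bookkeeping.

    Illustrative names; the real module is backend/agents/agent_run_manager.py.
    """

    def __init__(self):
        self._context_managers = {}   # conversation_id -> ContextManager instance
        self._lock = threading.Lock()

    def get_context_manager(self, conversation_id, factory):
        # Reuse one ContextManager per conversation so the summary cache
        # survives across runs of the same conversation.
        with self._lock:
            if conversation_id not in self._context_managers:
                self._context_managers[conversation_id] = factory()
            return self._context_managers[conversation_id]

    def remove_context_manager(self, conversation_id):
        # Explicit cleanup on conversation deletion keeps the dict from leaking.
        with self._lock:
            self._context_managers.pop(conversation_id, None)
```

Conversation deletion then calls the cleanup method, which is the behavior the last bullet describes.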

Frontend: Token Usage Metrics Handling

DevOps & Configuration

  • Added a comprehensive .env.bak file for Docker, documenting all major environment variables and service endpoints for easier setup and migration. (docker/.env.bak R1-R168)
  • Improved Docker Compose version detection in the deployment script to handle versions with suffixes (e.g., -desktop.1). (docker/deploy.sh L410-R419)
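The suffix-tolerant version detection can be illustrated in Python (the actual fix lives in the shell script docker/deploy.sh; the function name and regex here are assumptions):

```python
import re


def parse_compose_version(raw: str) -> tuple:
    """Extract (major, minor, patch) from `docker compose version` output,
    tolerating suffixes such as '-desktop.1'. Illustrative sketch only."""
    m = re.search(r"v?(\d+)\.(\d+)\.(\d+)", raw)
    if not m:
        raise ValueError(f"unrecognized compose version: {raw!r}")
    return tuple(int(x) for x in m.groups())
```

The key point is matching only the leading numeric triple and ignoring anything after it, so Docker Desktop builds are no longer rejected.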

Security/Permissions

  • Updated .claude/settings.local.json with new Bash command permissions for development and debugging. (.claude/settings.local.json R1-R18)

liudfgoo and others added 30 commits April 16, 2026 17:41
Defines architecture, data flow, file layout, and acceptance criteria
for a standalone document Q&A agent built on the Nexent SDK.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er, reference counting in register_agent_run/unregister_agent_run
… (though we didn't modify it, we analyzed it)
- Split agent_context.py into smaller modules: summary_cache.py, summary_config.py
- Convert all Chinese comments/docstrings to English (per .cursor/rules/english_comments.mdc)
- Add module-level docstrings for public API documentation
- Update __init__.py exports to include new module classes
- Convert test files' Chinese comments to English for compliance
- Default summary prompts to English with proper documentation
- Add context_manager_config field to AgentConfig
- Create ContextManagerConfig in create_agent_info
- Enhanced step metrics with compression ratio and cache hit tracking
- Add _render_steps_with_truncation for fallback truncation
- Add cache hit logging (previous_cache_hit, current_cache_hit, stable_bypass)
- Add cache_types to compression stats output
- Simplify estimate_tokens to flat message list approach
- Remove auto-clear ContextManager logic (keeps cache valid)
- Stop tracking test scripts (keep locally)
- Fix TestM13StepLocalLogCleared: cache hit is recorded in _step_local_log
  (count_after_second should be 1, not 0)
- Update summary_json_schema: chars -> words for clearer units
…time context metrics

- Add TokenUsageIndicator component with circular progress visualization

- Emit TOKEN_COUNT messages via observer for real-time frontend updates

- Include step_number, input/output tokens, estimated context, and threshold

- Preserve context manager and metrics logging from refactor/agent_context

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…timation

- Add stream_options to request usage info from streaming API
- Handle empty choices in streaming chunks (usage-only chunks)
- Add fallback token estimation when API doesn't return usage
- Add None handling in msg_token_count and _extract_text_from_chat_message
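A char-count fallback of the kind these commits describe might look like the sketch below. The function name and the characters-per-token ratio are illustrative; the real utilities live in sdk/nexent/core/utils/token_estimation.py:

```python
def estimate_tokens(text, chars_per_token: float = 1.5) -> int:
    """Rough fallback estimate for when the streaming API returns no usage.

    Handles None input (per the commit's None-handling fix) and scales the
    character count by an assumed chars-per-token ratio.
    """
    if text is None:
        return 0
    # At least one token for any non-empty text.
    return max(1, int(len(text) / chars_per_token))
```

A fallback like this is only an approximation; the real token count from `stream_options` usage chunks is preferred whenever the API provides it.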
Copilot AI review requested due to automatic review settings April 27, 2026 09:36
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces conversation-level context compression via a reusable ContextManager, adds step-level token metrics emission/collection for UI display, and improves configuration/devops scaffolding (token estimation utilities, docker env/docs, compose version detection).

Changes:

  • Add ContextManager + summary cache/config modules and integrate them into agent execution flows (including per-conversation reuse and cleanup).
  • Emit token-usage JSON per step from backend and parse/display it in the frontend (new TokenUsageIndicator).
  • Add shared token estimation utilities and update OpenAI streaming usage handling; add Docker env template and compose version parsing improvements.

Reviewed changes

Copilot reviewed 39 out of 41 changed files in this pull request and generated 11 comments.

Summary per file:

  • test/sdk/core/agents/test_agent_context/unit/test_pure_functions.py: Adds unit coverage for ContextManager pure helpers (formatting/fingerprints/text rendering).
  • test/sdk/core/agents/test_agent_context/unit/test_estimate_token.py: Adds consistency tests for token estimation paths.
  • test/sdk/core/agents/test_agent_context/unit/test_compress_with_cache_extra.py: Adds extra branch coverage for cache-based compression fall-through cases.
  • test/sdk/core/agents/test_agent_context/unit/test_compress_with_cache.py: Tests previous/current cache hit/incremental/fresh compression behavior.
  • test/sdk/core/agents/test_agent_context/unit/test_compress_if_needed_extra.py: Expands coverage for the compress_if_needed branch matrix and edge cases.
  • test/sdk/core/agents/test_agent_context/unit/test_compress_if_needed.py: Core tests for threshold behavior, cache shortcuts, and structure expectations.
  • test/sdk/core/agents/test_agent_context/unit/test_cache_valid.py: Tests cache validation helpers for previous/current summaries.
  • test/sdk/core/agents/test_agent_context/unit/test_build_message.py: Tests message building with/without summaries and system prompt.
  • test/sdk/core/agents/test_agent_context/unit/test_budget_trim.py: Tests budget trimming for pairs/actions, including tool-call/observation pairing rules.
  • test/sdk/core/agents/test_agent_context/stubs.py: Adds isolated stubs for smolagents types used by context logic.
  • test/sdk/core/agents/test_agent_context/loader.py: Loads agent_context.py in isolation with a stubbed module tree/token estimation.
  • test/sdk/core/agents/test_agent_context/factories.py: Provides factories for ContextManager/memory/test steps and mock model responses.
  • test/sdk/core/agents/test_agent_context/conftest.py: Adjusts test import pathing for the new test harness.
  • sdk/nexent/core/utils/token_estimation.py: Introduces reusable token estimation utilities extracted from context logic.
  • sdk/nexent/core/utils/observer.py: Changes the TOKEN_COUNT transformation to pass through JSON for frontend parsing.
  • sdk/nexent/core/models/openai_llm.py: Ensures streaming usage inclusion and estimates tokens when the API doesn't return usage.
  • sdk/nexent/core/agents/summary_config.py: Adds ContextManagerConfig for compression thresholds/budgets/prompts.
  • sdk/nexent/core/agents/summary_cache.py: Adds cache/metrics dataclasses for previous/current summaries and call records.
  • sdk/nexent/core/agents/run_agent.py: Mounts a provided conversation-level ContextManager onto agents during runs.
  • sdk/nexent/core/agents/nexent_agent.py: Creates/mounts ContextManager from config and emits per-step token metrics JSON; logs step metrics.
  • sdk/nexent/core/agents/core_agent.py: Integrates compression before model calls and collects per-step compression/token metrics.
  • sdk/nexent/core/agents/agent_model.py: Extends agent/run models with context manager config/instance fields.
  • sdk/nexent/core/agents/agent_context.py: Adds the full ContextManager implementation: cache validation, compression, trimming, metrics.
  • sdk/nexent/core/agents/__init__.py: Re-exports new context and cache/config types from the agents package.
  • frontend/types/chat.ts: Introduces TokenMetrics and changes AgentStep.metrics to structured JSON or null.
  • frontend/lib/chatMessageExtractor.ts: Parses TOKEN_COUNT JSON into metrics (null on parse failure).
  • frontend/lib/chat/chatMessageExtractor.ts: Same TOKEN_COUNT JSON parsing for the alternate extractor entrypoint.
  • frontend/components/ui/tokenUsageIndicator.tsx: Adds a UI indicator component for the latest token metrics.
  • frontend/app/[locale]/chat/streaming/chatStreamMain.tsx: Extracts the latest step metrics and passes them to ChatInput.
  • frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx: Updates streaming step creation and matches TOKEN_COUNT to steps (pending/out-of-order support).
  • frontend/app/[locale]/chat/components/chatInput.tsx: Displays TokenUsageIndicator in the chat input UI.
  • frontend/app/[locale]/agents/components/agentInfo/DebugConfig.tsx: Aligns debug step metrics initialization to null.
  • docker/deploy.sh: Loosens docker compose version parsing to handle suffixes (e.g. -desktop.1).
  • docker/.env.bak: Adds a docker env template/documentation file.
  • backend/services/conversation_management_service.py: Clears the conversation-level ContextManager on conversation deletion.
  • backend/services/agent_service.py: Attaches the conversation-level ContextManager to agent runs when enabled.
  • backend/nexent_context_metrics.log: Adds a metrics log artifact file.
  • backend/agents/create_agent_info.py: Creates/passes ContextManagerConfig in agent configuration based on model max tokens.
  • backend/agents/agent_run_manager.py: Adds per-conversation ContextManager storage and an explicit cleanup API.
  • .gitignore: Ignores *.log and ensures logs/ is ignored.
  • .claude/settings.local.json: Adds a local Claude permissions/settings file.
Comments suppressed due to low confidence (1)

sdk/nexent/core/agents/agent_model.py:59

  • AgentConfig defines managed_agents twice (once as List[AgentConfig] and again as List["AgentConfig"]). In Pydantic this will override/duplicate the field definition and can lead to validation or schema issues. Remove the duplicate definition and keep a single field with the intended type/Field configuration.
    model_name: str = Field(description="Model alias from ModelConfig")
    provide_run_summary: Optional[bool] = Field(description="Whether to provide run summary to upper-level Agent", default=False)
    managed_agents: List[AgentConfig] = Field(description="Managed Agents", default=[])
    instructions: Optional[str] = Field(description="Additional instructions to prepend to system prompt", default=None)
    managed_agents: List["AgentConfig"] = Field(
        description="Internal managed sub-agents created locally",
        default=[]
    )


prefix: str = "Summary of earlier steps in this task:"  # default prefix

def to_messages(self, summary_mode: bool = False) -> list:
    content = [{"type": "text", "text": f"{self.prefix}:\n{self.task}"}]

Copilot AI Apr 27, 2026


SummaryTaskStep.prefix already ends with a colon, but to_messages() appends another : (f"{self.prefix}:\n..."), resulting in duplicated punctuation in user-visible text. Either remove the trailing colon from the default prefix or stop appending : in to_messages().

Suggested change
content = [{"type": "text", "text": f"{self.prefix}:\n{self.task}"}]
content = [{"type": "text", "text": f"{self.prefix}\n{self.task}"}]

Comment on lines +104 to +112
def _effective_tokens(self, memory: AgentMemory, current_run_start_idx: int) -> int:
    """Estimates the actual token burden of the upcoming _build_messages call.

    Uses summary_text for the covered prefix when cache is valid; falls back to raw otherwise.
    """
    system_prompt_tokens = estimate_tokens_for_system_prompt(memory)
    prev_steps = memory.steps[:current_run_start_idx]
    curr_steps = memory.steps[current_run_start_idx:]
    return (system_prompt_tokens + self._effective_prev_tokens(prev_steps)
            + self._effective_curr_tokens(curr_steps))

Copilot AI Apr 27, 2026


_effective_tokens() calls estimate_tokens_for_system_prompt(memory) without passing chars_per_token, so it always uses the default 1.5 even when ContextManagerConfig.chars_per_token is customized. Pass self.config.chars_per_token to keep effective-token estimation consistent with the rest of the ContextManager estimators.
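A minimal sketch of the suggested fix, with a stand-in for the SDK helper (both signatures here are illustrative, not the actual SDK API):

```python
def estimate_tokens_for_system_prompt(memory_text: str, chars_per_token: float = 1.5) -> int:
    """Illustrative stand-in for the SDK helper: char count scaled by a ratio.

    The 1.5 default mirrors the one the review comment describes.
    """
    return int(len(memory_text) / chars_per_token)


class ContextManagerSketch:
    """Shows the fix: thread the configured ratio through every estimator call."""

    def __init__(self, chars_per_token: float):
        self.chars_per_token = chars_per_token

    def system_prompt_tokens(self, memory_text: str) -> int:
        # Before the fix the call omitted chars_per_token, silently using 1.5
        # even when the config specified a different ratio.
        return estimate_tokens_for_system_prompt(
            memory_text, chars_per_token=self.chars_per_token
        )
```

With the ratio passed explicitly, a customized ContextManagerConfig.chars_per_token affects the system-prompt estimate the same way it affects the other estimators.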

Comment on lines 328 to 332
for step_log in self.agent.run(query, stream=True, reset=reset):
    # Add content to observer
    print(f"DEBUG step_log type: {type(step_log)}")
    if not isinstance(step_log, ActionStep):
        continue

Copilot AI Apr 27, 2026


There are unconditional print() debug statements in the agent run loop. This will spam stdout in production, impact performance, and can leak internal data. Replace with structured logging at an appropriate log level (or guard behind a debug flag) and avoid printing per-step in the hot path.
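A hedged sketch of the suggested replacement, using the standard logging module (the helper name is illustrative):

```python
import logging

logger = logging.getLogger(__name__)


def log_step_type(step_log) -> None:
    """Leveled logging instead of print(): emitted only when DEBUG is enabled.

    The isEnabledFor guard keeps the hot path cheap when debugging is off.
    """
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("step_log type: %s", type(step_log).__name__)
```

Lazy %-formatting also avoids building the message string at all when the DEBUG level is disabled.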

# Format as JSON with truncation for readability
messages_json = json.dumps(messages_data, indent=2, ensure_ascii=False, default=str)
truncated_messages = truncate_content(messages_json, max_length=1000)
truncated_messages = messages_json

Copilot AI Apr 27, 2026


truncated_messages is computed with truncate_content(...) but then immediately overwritten with the full messages_json, defeating truncation. This can produce extremely large logs and may leak sensitive prompt/history content. Keep truncation (or make full logging explicitly opt-in) to avoid excessive logging and data exposure.

Suggested change
truncated_messages = messages_json

Comment on lines 197 to +214
case chatConfig.messageTypes.STEP_COUNT:
// Increment the counter for each new step
// Increment the counter for each new step (for unique ID generation)
stepIdCounter.current += 1;

// Create a new step - use the counter and UUID combination to generate a unique ID
// Extract the raw numeric step number from formatted content like "\n**Step 1** \n"
// TOKEN_COUNT sends step_number as an integer, so IDs must use only the digit
const stepTitle = messageContent.trim();
const stepNumMatch = stepTitle.match(/\d+/);
const stepNumber = stepNumMatch ? stepNumMatch[0] : String(stepIdCounter.current);

// Create a new step - use step number as part of ID for reliable matching
currentStep = {
id: `step-${
stepIdCounter.current
}-${Date.now()}-${Math.random()
.toString(36)
.substring(2, 9)}`,
title: messageContent.trim(),
id: `step-${stepNumber}`,
title: stepTitle,
content: "",
expanded: true,
contents: [], // Use an array to store all content in order
metrics: "",
metrics: null,

Copilot AI Apr 27, 2026


Step IDs are now generated as step-${stepNumber} (derived only from the displayed step number). Step numbers typically restart from 1 for each assistant message/run, so these IDs can collide across messages and break React key stability and your pending-metrics matching logic. Keep the parsed step number for matching TOKEN_COUNT, but include an additional unique component (e.g., current assistant message id, a run UUID, or the existing counter/timestamp) in step.id.

Comment on lines +488 to +490
# Optional: write to local file
with open("nexent_context_metrics.log", "a", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

Copilot AI Apr 27, 2026


_log_step_metrics() always appends to a local file (nexent_context_metrics.log). Writing to the container filesystem on every run can cause disk growth and operational issues, and it’s not configurable. Consider emitting via the existing logger/monitoring system or gating file output behind an explicit config/env flag and ensuring the path is writable/rotated.
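One way to gate the file output as suggested; the NEXENT_METRICS_FILE_LOG variable name and function signature are hypothetical:

```python
import os


def log_step_metrics(lines, path: str = "nexent_context_metrics.log") -> None:
    """Sketch of the suggested fix: file output is opt-in via an env flag.

    NEXENT_METRICS_FILE_LOG is an assumed variable name; by default nothing
    is written, so containers don't accumulate an ever-growing log file.
    """
    if os.environ.get("NEXENT_METRICS_FILE_LOG", "").lower() not in ("1", "true"):
        return
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

In production the same records would more naturally flow through the existing logger or monitoring pipeline, which handles rotation and retention.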

JasonW404 and others added 4 commits April 27, 2026 20:10
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…factor for cognitive complexity

- Add enable_context_manager field to AgentInfo database model (default False)
- Update create_agent_config to read setting from agent_info instead of hardcoded True
- Add field to API request models (backend and frontend)
- Add database migration for new column
- Refactor _trim_actions_to_budget to reduce cognitive complexity (19 -> 15)
- Refactor _render_steps_with_truncation to reduce cognitive complexity (23 -> 15)
greenlet 3.5.0 lacks wheels for ARM64 Linux (aarch64), causing CI failures.
Pin to <3.5.0 to ensure compatible version (3.4.0) is resolved.
ChatCompletionMessage(role=role if role else "assistant", # If there is no explicit role, default to "assistant"
content=model_output).model_dump(include={"role", "content", "tool_calls"}))

from smolagents.monitoring import TokenUsage
Contributor


Move this import to the top of the file so imports are managed in one place.

Jinglong Wang added 3 commits April 27, 2026 22:12
- Add enable_context_manager attribute to MockAgent in test_agent_db.py
- Add AgentRunInfo, agent_context, and agent_run_manager stubs in test_conversation_management_service.py
- Add nexent.core.agents.agent_context stub in test_create_agent_info.py
- Add smolagents.memory stub with AgentMemory/MemoryStep in SDK model tests
- Update TokenCountTransformer tests to match new passthrough behavior
- Update test_create_agent_config assertions to include context_manager_config parameter
- Fix TaskStep/ActionStep mocks to use real classes for dataclass inheritance
- Add proper package stubs for sdk.nexent.core.agents and utils modules
- Add timing.duration attribute to mock action steps (implementation expects step_log.timing.duration)
- Add step_number attribute to mock action steps
- Import ANY from unittest.mock for flexible assertions
- Update TOKEN_COUNT assertions to use ANY (implementation now sends JSON token data)
- Fix test_agent_run_with_observer_with_none_duration: implementation now handles None gracefully (0.0)
Implementation now accesses context_manager and step_metrics in _collect_step_metrics method.
Tests need these attributes initialized to avoid AttributeError.
@Dallas98 Dallas98 merged commit 55079a6 into ModelEngine-Group:develop Apr 27, 2026
14 of 15 checks passed