feature: Enhance agent context management with compression and metrics logging #2875
Defines architecture, data flow, file layout, and acceptance criteria for a standalone document Q&A agent built on the Nexent SDK. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er, reference counting in register_agent_run/unregister_agent_run
… (though we didn't modify it, we analyzed it)
- Split agent_context.py into smaller modules: summary_cache.py, summary_config.py
- Convert all Chinese comments/docstrings to English (per .cursor/rules/english_comments.mdc)
- Add module-level docstrings for public API documentation
- Update __init__.py exports to include new module classes
- Convert test files' Chinese comments to English for compliance
- Default summary prompts to English with proper documentation

- Add context_manager_config field to AgentConfig
- Create ContextManagerConfig in create_agent_info
- Enhance step metrics with compression ratio and cache hit tracking
- Add _render_steps_with_truncation for fallback truncation
- Add cache hit logging (previous_cache_hit, current_cache_hit, stable_bypass)
- Add cache_types to compression stats output
- Simplify estimate_tokens to flat message list approach
- Remove auto-clear ContextManager logic (keeps cache valid)
- Stop tracking test scripts (keep locally)

- Fix TestM13StepLocalLogCleared: cache hit is recorded in _step_local_log (count_after_second should be 1, not 0)
- Update summary_json_schema: chars -> words for clearer units
…time context metrics
- Add TokenUsageIndicator component with circular progress visualization
- Emit TOKEN_COUNT messages via observer for real-time frontend updates
- Include step_number, input/output tokens, estimated context, and threshold
- Preserve context manager and metrics logging from refactor/agent_context

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

…ynamic token threshold

…timation
- Add stream_options to request usage info from streaming API
- Handle empty choices in streaming chunks (usage-only chunks)
- Add fallback token estimation when API doesn't return usage
- Add None handling in msg_token_count and _extract_text_from_chat_message
Pull request overview
This PR introduces conversation-level context compression via a reusable ContextManager, adds step-level token metrics emission/collection for UI display, and improves configuration/devops scaffolding (token estimation utilities, docker env/docs, compose version detection).
Changes:
- Add ContextManager + summary cache/config modules and integrate them into agent execution flows (including per-conversation reuse and cleanup).
- Emit token-usage JSON per step from backend and parse/display it in the frontend (new TokenUsageIndicator).
- Add shared token estimation utilities and update OpenAI streaming usage handling; add Docker env template and compose version parsing improvements.
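The PR's token estimation utility appears to be a character-ratio heuristic (the `ContextManagerConfig` exposes a `chars_per_token` setting, and a later commit "simplifies estimate_tokens to a flat message list approach"). A minimal sketch under those assumptions; the function name and exact handling are hypothetical, not the actual `sdk/nexent/core/utils/token_estimation.py`:

```python
def estimate_tokens(messages: list, chars_per_token: float = 1.5) -> int:
    """Estimate token count from a flat message list by total character length.

    Hypothetical sketch of a chars-per-token estimator; None content is
    treated as empty (the PR mentions adding None handling), and
    multimodal content blocks contribute their text parts.
    """
    total_chars = 0
    for msg in messages:
        content = msg.get("content") or ""  # None-safe
        if isinstance(content, list):  # multimodal content blocks
            content = "".join(
                part.get("text", "") for part in content if isinstance(part, dict)
            )
        total_chars += len(str(content))
    return int(total_chars / chars_per_token)


msgs = [{"role": "user", "content": "Hello, world!"}]  # 13 chars
print(estimate_tokens(msgs))  # -> 8  (int(13 / 1.5))
```

A flat-list estimator like this trades accuracy for speed and zero dependencies; the real utility may use a different ratio or a proper tokenizer.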
Reviewed changes
Copilot reviewed 39 out of 41 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| test/sdk/core/agents/test_agent_context/unit/test_pure_functions.py | Adds unit coverage for ContextManager pure helpers (formatting/fingerprints/text rendering). |
| test/sdk/core/agents/test_agent_context/unit/test_estimate_token.py | Adds consistency tests for token estimation paths. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_with_cache_extra.py | Adds extra branch coverage for cache-based compression fall-through cases. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_with_cache.py | Tests previous/current cache hit/incremental/fresh compression behavior. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_if_needed_extra.py | Expands coverage for compress_if_needed branch matrix and edge cases. |
| test/sdk/core/agents/test_agent_context/unit/test_compress_if_needed.py | Core tests for threshold behavior, cache shortcuts, and structure expectations. |
| test/sdk/core/agents/test_agent_context/unit/test_cache_valid.py | Tests cache validation helpers for previous/current summaries. |
| test/sdk/core/agents/test_agent_context/unit/test_build_message.py | Tests message building with/without summaries and system prompt. |
| test/sdk/core/agents/test_agent_context/unit/test_budget_trim.py | Tests budget trimming for pairs/actions, including tool-call/observation pairing rules. |
| test/sdk/core/agents/test_agent_context/stubs.py | Adds isolated stubs for smolagents types used by context logic. |
| test/sdk/core/agents/test_agent_context/loader.py | Loads agent_context.py in isolation with stubbed module tree/token estimation. |
| test/sdk/core/agents/test_agent_context/factories.py | Provides factories for ContextManager/memory/test steps and mock model responses. |
| test/sdk/core/agents/test_agent_context/conftest.py | Adjusts test import pathing for the new test harness. |
| sdk/nexent/core/utils/token_estimation.py | Introduces reusable token estimation utilities extracted from context logic. |
| sdk/nexent/core/utils/observer.py | Changes TOKEN_COUNT transformation to pass through JSON for frontend parsing. |
| sdk/nexent/core/models/openai_llm.py | Ensures streaming usage inclusion and estimates tokens when the API doesn't return usage. |
| sdk/nexent/core/agents/summary_config.py | Adds ContextManagerConfig for compression thresholds/budgets/prompts. |
| sdk/nexent/core/agents/summary_cache.py | Adds cache/metrics dataclasses for previous/current summaries and call records. |
| sdk/nexent/core/agents/run_agent.py | Mounts a provided conversation-level ContextManager onto agents during runs. |
| sdk/nexent/core/agents/nexent_agent.py | Creates/mounts ContextManager from config and emits per-step token metrics JSON; logs step metrics. |
| sdk/nexent/core/agents/core_agent.py | Integrates compression before model calls and collects per-step compression/token metrics. |
| sdk/nexent/core/agents/agent_model.py | Extends agent/run models with context manager config/instance fields. |
| sdk/nexent/core/agents/agent_context.py | Adds full ContextManager implementation: cache validation, compression, trimming, metrics. |
| sdk/nexent/core/agents/__init__.py | Re-exports new context and cache/config types from the agents package. |
| frontend/types/chat.ts | Introduces TokenMetrics and changes AgentStep.metrics to structured JSON or null. |
| frontend/lib/chatMessageExtractor.ts | Parses TOKEN_COUNT JSON into metrics (null on parse failure). |
| frontend/lib/chat/chatMessageExtractor.ts | Same TOKEN_COUNT JSON parsing for the alternate extractor entrypoint. |
| frontend/components/ui/tokenUsageIndicator.tsx | Adds UI indicator component for latest token metrics. |
| frontend/app/[locale]/chat/streaming/chatStreamMain.tsx | Extracts latest step metrics and passes them to ChatInput. |
| frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx | Updates streaming step creation and matches TOKEN_COUNT to steps (pending/out-of-order support). |
| frontend/app/[locale]/chat/components/chatInput.tsx | Displays TokenUsageIndicator in the chat input UI. |
| frontend/app/[locale]/agents/components/agentInfo/DebugConfig.tsx | Aligns debug step metrics initialization to null. |
| docker/deploy.sh | Loosens docker compose version parsing to handle suffixes (e.g. -desktop.1). |
| docker/.env.bak | Adds a docker env template/documentation file. |
| backend/services/conversation_management_service.py | Clears conversation-level ContextManager on conversation deletion. |
| backend/services/agent_service.py | Attaches conversation-level ContextManager to agent runs when enabled. |
| backend/nexent_context_metrics.log | Adds a metrics log artifact file. |
| backend/agents/create_agent_info.py | Creates/passes ContextManagerConfig in agent configuration based on model max tokens. |
| backend/agents/agent_run_manager.py | Adds per-conversation ContextManager storage and explicit cleanup API. |
| .gitignore | Ignores *.log and ensures logs/ is ignored. |
| .claude/settings.local.json | Adds local Claude permissions/settings file. |
Comments suppressed due to low confidence (1)
sdk/nexent/core/agents/agent_model.py:59
AgentConfig defines managed_agents twice (once as List[AgentConfig] and again as List["AgentConfig"]). In Pydantic this will override/duplicate the field definition and can lead to validation or schema issues. Remove the duplicate definition and keep a single field with the intended type/Field configuration.

```python
model_name: str = Field(description="Model alias from ModelConfig")
provide_run_summary: Optional[bool] = Field(description="Whether to provide run summary to upper-level Agent", default=False)
managed_agents: List[AgentConfig] = Field(description="Managed Agents", default=[])
instructions: Optional[str] = Field(description="Additional instructions to prepend to system prompt", default=None)
managed_agents: List["AgentConfig"] = Field(
    description="Internal managed sub-agents created locally",
    default=[]
)
```
```python
prefix: str = "Summary of earlier steps in this task:"  # default prefix

def to_messages(self, summary_mode: bool = False) -> list:
    content = [{"type": "text", "text": f"{self.prefix}:\n{self.task}"}]
```

SummaryTaskStep.prefix already ends with a colon, but to_messages() appends another `:` (f"{self.prefix}:\n..."), resulting in duplicated punctuation in user-visible text. Either remove the trailing colon from the default prefix or stop appending `:` in to_messages().

```diff
-    content = [{"type": "text", "text": f"{self.prefix}:\n{self.task}"}]
+    content = [{"type": "text", "text": f"{self.prefix}\n{self.task}"}]
```
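The duplicated punctuation is easy to see in isolation; the variable names below mirror the snippet but are otherwise just an illustration:

```python
prefix = "Summary of earlier steps in this task:"  # default prefix ends with ":"
task = "Find the report"

buggy = f"{prefix}:\n{task}"  # appends a second colon after the prefix
fixed = f"{prefix}\n{task}"   # suggested fix: no extra ":" in the f-string

print(buggy.splitlines()[0])  # -> Summary of earlier steps in this task::
print(fixed.splitlines()[0])  # -> Summary of earlier steps in this task:
```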
```python
def _effective_tokens(self, memory: AgentMemory, current_run_start_idx: int) -> int:
    """Estimates the actual token burden of the upcoming _build_messages call.

    Uses summary_text for the covered prefix when cache is valid; falls back to raw otherwise.
    """
    system_prompt_tokens = estimate_tokens_for_system_prompt(memory)
    prev_steps = memory.steps[:current_run_start_idx]
    curr_steps = memory.steps[current_run_start_idx:]
    return (system_prompt_tokens + self._effective_prev_tokens(prev_steps)
            + self._effective_curr_tokens(curr_steps))
```
_effective_tokens() calls estimate_tokens_for_system_prompt(memory) without passing chars_per_token, so it always uses the default 1.5 even when ContextManagerConfig.chars_per_token is customized. Pass self.config.chars_per_token to keep effective-token estimation consistent with the rest of the ContextManager estimators.
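The fix amounts to threading the configured ratio through to the system-prompt estimator. A sketch under assumed signatures (the estimator body and class shape here are hypothetical, modeled only on the review comment):

```python
from types import SimpleNamespace


def estimate_tokens_for_system_prompt(memory, chars_per_token: float = 1.5) -> int:
    # Hypothetical implementation: character count divided by the ratio
    return int(len(memory.system_prompt) / chars_per_token)


class ContextManagerSketch:
    def __init__(self, config):
        self.config = config

    def effective_system_tokens(self, memory) -> int:
        # The suggested fix: pass the configured ratio explicitly instead of
        # silently falling back to the 1.5 default
        return estimate_tokens_for_system_prompt(
            memory, chars_per_token=self.config.chars_per_token
        )


memory = SimpleNamespace(system_prompt="x" * 30)
cm = ContextManagerSketch(SimpleNamespace(chars_per_token=3.0))
print(estimate_tokens_for_system_prompt(memory))  # default ratio: 20
print(cm.effective_system_tokens(memory))         # configured ratio: 10
```

With a customized `chars_per_token`, the two paths disagree by design here, which is exactly the inconsistency the comment warns about.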
```python
for step_log in self.agent.run(query, stream=True, reset=reset):
    # Add content to observer
    print(f"DEBUG step_log type: {type(step_log)}")
    if not isinstance(step_log, ActionStep):
        continue
```
There are unconditional print() debug statements in the agent run loop. This will spam stdout in production, impact performance, and can leak internal data. Replace with structured logging at an appropriate log level (or guard behind a debug flag) and avoid printing per-step in the hot path.
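One conventional replacement is a module-level logger with the per-step message at DEBUG, so the hot path stays quiet unless diagnostics are explicitly enabled. A sketch (logger name is illustrative):

```python
import logging

logger = logging.getLogger("nexent.core_agent")


def inspect_step(step_log) -> None:
    # Unlike an unconditional print(), this record is dropped early when the
    # logger's effective level is above DEBUG, so production runs stay quiet.
    logger.debug("step_log type: %s", type(step_log).__name__)
```

To opt in during debugging, configure the logger (e.g. `logging.basicConfig(level=logging.DEBUG)`) rather than editing the hot path.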
```python
# Format as JSON with truncation for readability
messages_json = json.dumps(messages_data, indent=2, ensure_ascii=False, default=str)
truncated_messages = truncate_content(messages_json, max_length=1000)
truncated_messages = messages_json
```

truncated_messages is computed with truncate_content(...) but then immediately overwritten with the full messages_json, defeating truncation. This can produce extremely large logs and may leak sensitive prompt/history content. Keep truncation (or make full logging explicitly opt-in) to avoid excessive logging and data exposure.
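An opt-in pattern for the full dump might look like the following sketch. The env flag name is hypothetical, and the head/tail truncation stands in for the project's own `truncate_content` helper:

```python
import json
import os


def render_messages_for_log(messages_data, max_length: int = 1000) -> str:
    """Truncate the message dump for logging unless full output is opted in."""
    messages_json = json.dumps(messages_data, indent=2, ensure_ascii=False, default=str)
    # NEXENT_LOG_FULL_MESSAGES is a hypothetical flag, not an existing setting;
    # full (potentially sensitive) dumps require it to be set explicitly.
    if os.getenv("NEXENT_LOG_FULL_MESSAGES") == "1":
        return messages_json
    if len(messages_json) <= max_length:
        return messages_json
    half = max_length // 2
    return messages_json[:half] + "\n...[truncated]...\n" + messages_json[-half:]
```

This keeps the default behavior safe (bounded log size, no history leakage) while still allowing a full dump when someone deliberately asks for it.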
```diff
 case chatConfig.messageTypes.STEP_COUNT:
-  // Increment the counter for each new step
+  // Increment the counter for each new step (for unique ID generation)
   stepIdCounter.current += 1;

+  // Extract the raw numeric step number from formatted content like "\n**Step 1** \n"
+  // TOKEN_COUNT sends step_number as an integer, so IDs must use only the digit
+  const stepTitle = messageContent.trim();
+  const stepNumMatch = stepTitle.match(/\d+/);
+  const stepNumber = stepNumMatch ? stepNumMatch[0] : String(stepIdCounter.current);

-  // Create a new step - use the counter and UUID combination to generate a unique ID
+  // Create a new step - use step number as part of ID for reliable matching
   currentStep = {
-    id: `step-${stepIdCounter.current}-${Date.now()}-${Math.random()
-      .toString(36)
-      .substring(2, 9)}`,
-    title: messageContent.trim(),
+    id: `step-${stepNumber}`,
+    title: stepTitle,
     content: "",
     expanded: true,
     contents: [], // Use an array to store all content in order
-    metrics: "",
+    metrics: null,
```
Step IDs are now generated as step-${stepNumber} (derived only from the displayed step number). Step numbers typically restart from 1 for each assistant message/run, so these IDs can collide across messages and break React key stability and your pending-metrics matching logic. Keep the parsed step number for matching TOKEN_COUNT, but include an additional unique component (e.g., current assistant message id, a run UUID, or the existing counter/timestamp) in step.id.
```python
# Optional: write to local file
with open("nexent_context_metrics.log", "a", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```
_log_step_metrics() always appends to a local file (nexent_context_metrics.log). Writing to the container filesystem on every run can cause disk growth and operational issues, and it’s not configurable. Consider emitting via the existing logger/monitoring system or gating file output behind an explicit config/env flag and ensuring the path is writable/rotated.
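Gating the file write behind an explicit env flag, with the logger as the default sink, could look like this sketch (the env var name is hypothetical, not an existing setting):

```python
import logging
import os

metrics_logger = logging.getLogger("nexent.context_metrics")


def log_step_metrics(lines: list) -> None:
    """Emit step metrics through the logging system; file output is strictly opt-in."""
    metrics_logger.info("%s", "\n".join(lines))
    # NEXENT_CONTEXT_METRICS_FILE is a hypothetical env var; nothing touches
    # the filesystem unless an operator explicitly sets it to a writable path.
    path = os.getenv("NEXENT_CONTEXT_METRICS_FILE")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
```

Routing through the logger also means rotation and shipping are handled by whatever handlers the deployment already configures, instead of an ever-growing file inside the container.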
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…factor for cognitive complexity
- Add enable_context_manager field to AgentInfo database model (default False)
- Update create_agent_config to read setting from agent_info instead of hardcoded True
- Add field to API request models (backend and frontend)
- Add database migration for new column
- Refactor _trim_actions_to_budget to reduce cognitive complexity (19 -> 15)
- Refactor _render_steps_with_truncation to reduce cognitive complexity (23 -> 15)

greenlet 3.5.0 lacks wheels for ARM64 Linux (aarch64), causing CI failures. Pin to <3.5.0 so a compatible version (3.4.0) is resolved.
```python
ChatCompletionMessage(role=role if role else "assistant",  # If there is no explicit role, default to "assistant"
                      content=model_output).model_dump(include={"role", "content", "tool_calls"}))

from smolagents.monitoring import TokenUsage
```
- Add enable_context_manager attribute to MockAgent in test_agent_db.py
- Add AgentRunInfo, agent_context, and agent_run_manager stubs in test_conversation_management_service.py
- Add nexent.core.agents.agent_context stub in test_create_agent_info.py
- Add smolagents.memory stub with AgentMemory/MemoryStep in SDK model tests
- Update TokenCountTransformer tests to match new passthrough behavior
- Update test_create_agent_config assertions to include context_manager_config parameter
- Fix TaskStep/ActionStep mocks to use real classes for dataclass inheritance
- Add proper package stubs for sdk.nexent.core.agents and utils modules

- Add timing.duration attribute to mock action steps (implementation expects step_log.timing.duration)
- Add step_number attribute to mock action steps
- Import ANY from unittest.mock for flexible assertions
- Update TOKEN_COUNT assertions to use ANY (implementation now sends JSON token data)
- Fix test_agent_run_with_observer_with_none_duration: implementation now handles None gracefully (0.0)
Implementation now accesses context_manager and step_metrics in _collect_step_metrics method. Tests need these attributes initialized to avoid AttributeError.
This pull request introduces a robust, reusable conversation-level ContextManager for agent runs, improving memory management and token usage tracking across the backend and frontend. It also enhances environment configuration and improves Docker Compose version detection. The most important changes are grouped below:
Backend: Conversation Context Management & Cleanup
- Add a conversation-level ContextManager to agent_run_manager, with logic to create, retrieve, and clean up context managers for each conversation, preventing memory leaks and supporting token threshold configuration. (backend/agents/agent_run_manager.py [1] [2])
- Integrate ContextManagerConfig into agent configuration creation, passing model max token limits and enabling context manager features. (backend/agents/create_agent_info.py [1] [2] [3])
- Attach the conversation-level ContextManager if enabled, ensuring its availability during agent execution. (backend/services/agent_service.py R1646-R1656)
- Clear ContextManager instances when deleting conversations to avoid memory leaks in edge cases. (backend/services/conversation_management_service.py R369-R374)

Frontend: Token Usage Metrics Handling

- Update metrics fields from empty string to null for consistency. (frontend/app/[locale]/agents/components/agentInfo/DebugConfig.tsx L217-R217, frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx L91-R100, L194-R214)
- Update streaming step creation and TOKEN_COUNT-to-step matching. (frontend/app/[locale]/chat/streaming/chatStreamHandler.tsx L194-R214, L222-R241)
- Add the TokenUsageIndicator UI component to the chat input area, displaying the latest token usage metrics. (frontend/app/[locale]/chat/components/chatInput.tsx R30-R31, R125, R144, R823-R824)

DevOps & Configuration

- Add a new .env.bak template file for Docker, documenting all major environment variables and service endpoints for easier setup and migration. (docker/.env.bak R1-R168)
- Loosen Docker Compose version parsing to handle suffixes (e.g. -desktop.1). (docker/deploy.sh L410-R419)

Security/Permissions

- Update .claude/settings.local.json with new Bash command permissions for development and debugging. (.claude/settings.local.json R1-R18)