
fix(llm): ensure second LLM gets independent metrics and telemetry #1997

Merged: xingyaoww merged 5 commits into main from openhands/fix-condenser-metrics-accumulation, Feb 13, 2026
Conversation

@xingyaoww (Collaborator) commented Feb 11, 2026

Summary

This PR fixes issue #418, where condensation breaks metrics accumulation.

Root Cause

When LLM.model_copy() was called (e.g., to create a condenser LLM from an agent LLM), the _metrics and _telemetry private attributes were shared between the original and copied LLM instances. This is because Pydantic's model_copy() does a shallow copy of private attributes by default.

This caused metrics to be double-counted when both LLMs were used, as they were updating the same Metrics object. The visualizer would then display incorrect metrics in the "Condensation" cell block.
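The sharing can be reproduced without the SDK using plain shallow-copy semantics; `FakeLLM` and `Metrics` below are hypothetical stand-ins for the SDK classes, not the real implementation:

```python
import copy

class Metrics:
    """Stand-in for the SDK's Metrics accumulator."""
    def __init__(self):
        self.accumulated_cost = 0.0

    def add_cost(self, cost):
        self.accumulated_cost += cost

class FakeLLM:
    """Stand-in for LLM: holds a private Metrics object."""
    def __init__(self):
        self._metrics = Metrics()

agent_llm = FakeLLM()
# A shallow copy (what Pydantic's model_copy() does for private attributes)
# copies the reference, not the Metrics object itself.
condenser_llm = copy.copy(agent_llm)

agent_llm._metrics.add_cost(1.0)
print(condenser_llm._metrics is agent_llm._metrics)  # True: same object
print(condenser_llm._metrics.accumulated_cost)       # 1.0: double-counted
```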

Fix

This PR overrides model_copy() in the LLM class to create fresh Metrics and Telemetry instances for the copied LLM, ensuring each LLM tracks its own usage independently.
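A minimal sketch of that override, again using hypothetical stand-in classes rather than the SDK's actual code:

```python
import copy

class Metrics:
    def __init__(self):
        self.accumulated_cost = 0.0

class Telemetry:
    pass

class FakeLLM:
    """Stand-in for the SDK's LLM; model_copy() here mimics the PR's override."""
    def __init__(self):
        self._metrics = Metrics()
        self._telemetry = Telemetry()

    def model_copy(self):
        # Shallow-copy the instance, then replace the shared trackers with
        # fresh ones so each copy accumulates usage independently.
        copied = copy.copy(self)
        copied._metrics = Metrics()
        copied._telemetry = Telemetry()
        return copied

agent_llm = FakeLLM()
condenser_llm = agent_llm.model_copy()
print(condenser_llm._metrics is agent_llm._metrics)  # False
```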

Testing

Added 3 new tests to verify:

  1. test_llm_model_copy_creates_independent_metrics - Verifies that copied LLMs have separate metrics objects
  2. test_llm_model_copy_creates_independent_telemetry - Verifies that copied LLMs have separate telemetry objects
  3. test_llm_model_copy_preserves_configuration - Verifies that the LLM configuration is preserved when copying

Fixes #418

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b05e9e0-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-b05e9e0-python \
  ghcr.io/openhands/agent-server:b05e9e0-python

All tags pushed for this build

ghcr.io/openhands/agent-server:b05e9e0-golang-amd64
ghcr.io/openhands/agent-server:b05e9e0-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:b05e9e0-golang-arm64
ghcr.io/openhands/agent-server:b05e9e0-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:b05e9e0-java-amd64
ghcr.io/openhands/agent-server:b05e9e0-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:b05e9e0-java-arm64
ghcr.io/openhands/agent-server:b05e9e0-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:b05e9e0-python-amd64
ghcr.io/openhands/agent-server:b05e9e0-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:b05e9e0-python-arm64
ghcr.io/openhands/agent-server:b05e9e0-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:b05e9e0-golang
ghcr.io/openhands/agent-server:b05e9e0-java
ghcr.io/openhands/agent-server:b05e9e0-python

About Multi-Architecture Support

  • Each variant tag (e.g., b05e9e0-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., b05e9e0-python-amd64) are also available if needed

When LLM.model_copy() was called (e.g., to create a condenser LLM from
an agent LLM), the _metrics and _telemetry private attributes were
shared between the original and copied LLM instances. This caused
metrics to be double-counted when both LLMs were used, as they were
updating the same Metrics object.

This fix overrides model_copy() to create fresh Metrics and Telemetry
instances for the copied LLM, ensuring each LLM tracks its own usage
independently.

Fixes #418

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot (Collaborator) left a comment

🟡 Taste Rating: Acceptable

Core logic solves a real bug, but violates parameter semantics and bundles unrelated changes.


VERDICT: ✅ Worth merging after addressing the critical parameter inconsistency

KEY INSIGHT: You fixed the symptom (shared metrics) but ignored the deep parameter, creating an API that lies about what it does—document this choice or respect the parameter.


See inline comments for specific issues. The metrics fix is sound, but the implementation needs clarity around parameter semantics and change scope.

@github-actions (Contributor) bot commented Feb 11, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/llm/llm.py | 465 | 86 | 81% | 400, 421–422, 458, 656, 757, 759–760, 788, 838, 849–851, 855–859, 867–869, 879–881, 884–885, 889, 891–892, 894, 917–922, 1045, 1050–1051, 1248–1249, 1258, 1271, 1273–1278, 1280–1297, 1300–1304, 1306–1307, 1313–1322, 1373, 1375
TOTAL | 17855 | 5367 | 69% |

Replace the model_copy override with __copy__ implementation for cleaner
separation of concerns. Pydantic's model_copy() uses __copy__ internally,
so this achieves the same result without overriding a Pydantic method.

Changes:
- Implement __copy__ to reset _metrics and _telemetry to None
- Change metrics and telemetry properties to use lazy initialization
- Remove unused Mapping import

This approach is cleaner because:
1. __copy__ is a standard Python protocol, not Pydantic-specific
2. Lazy initialization in properties centralizes the creation logic
3. Less likely to break with future Pydantic updates

Co-authored-by: openhands <openhands@all-hands.dev>
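The `__copy__` plus lazy-initialization approach described in this commit can be sketched with stand-in classes (`FakeLLM` and `Metrics` are hypothetical, not the SDK's implementation):

```python
import copy

class Metrics:
    def __init__(self):
        self.accumulated_cost = 0.0

class FakeLLM:
    """Stand-in illustrating __copy__ plus a lazily-initialized property."""
    def __init__(self):
        self._metrics = None  # created on first access

    @property
    def metrics(self):
        # Lazy initialization: creation logic lives in one place.
        if self._metrics is None:
            self._metrics = Metrics()
        return self._metrics

    def __copy__(self):
        # copy.copy() (and Pydantic's model_copy()) routes through __copy__;
        # resetting the tracker gives the copy independent state on first use.
        new = type(self).__new__(type(self))
        new.__dict__.update(self.__dict__)
        new._metrics = None
        return new

original = FakeLLM()
original.metrics.accumulated_cost = 2.0
copied = copy.copy(original)
print(copied.metrics is original.metrics)  # False
print(copied.metrics.accumulated_cost)     # 0.0: fresh tracker
```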
@enyst (Collaborator) commented Feb 11, 2026

Could we use the LLMRegistry instead? I think the registry should be responsible for this.

@xingyaoww marked this pull request as draft February 11, 2026 22:56
@xingyaoww (Author) commented:

@OpenHands can you think about this #1997 (comment) and come up with a better solution for this?

@openhands-ai bot commented Feb 11, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

This change addresses the review feedback to use LLMRegistry for ensuring
metrics independence instead of overriding __copy__ in the LLM class.

Changes:
- Remove __copy__ override from LLM class
- Add reset_metrics() method to LLM for resetting metrics/telemetry
- Add _ensure_independent_metrics() to LLMRegistry that detects and fixes
  shared metrics when LLMs are registered
- Update tests to verify the registry-based approach

The registry is the right place for this logic because:
1. It's the central point where LLMs are registered for a conversation
2. It has visibility into all registered LLMs
3. It can detect shared metrics by tracking metrics object IDs

This approach fixes the issue where model_copy() creates LLMs with shared
metrics (due to Pydantic's shallow copy of private attributes), which caused
metrics to be double-counted when both the original and copied LLM were used.

Fixes #418
@openhands-ai bot commented Feb 11, 2026

I've addressed @enyst's feedback to use the LLMRegistry instead of overriding __copy__ in the LLM class. The changes have been pushed to the PR branch.

Summary of Changes

Approach

Instead of modifying copy behavior at the LLM level, the fix is now implemented in the LLMRegistry, which is the central point where LLMs are registered for a conversation. This is a better design because:

  1. The registry has visibility into all registered LLMs
  2. It can detect shared metrics by tracking metrics object IDs
  3. It centralizes the concern of metrics independence

Changes Made

openhands-sdk/openhands/sdk/llm/llm.py:

  • Removed the __copy__ override
  • Added reset_metrics() method that resets _metrics and _telemetry to None (for lazy recreation)
  • Changed metrics and telemetry properties to use lazy initialization (return fresh instances if None)

openhands-sdk/openhands/sdk/llm/llm_registry.py:

  • Added _metrics_ids: set[int] to track metrics object IDs
  • Added _ensure_independent_metrics() method that:
    • Detects if an LLM's metrics are already tracked by another LLM
    • Calls llm.reset_metrics() to create fresh metrics if sharing is detected
  • Updated add() to call _ensure_independent_metrics() before registering
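The registry-side detection can be sketched as follows; `FakeLLM` and `FakeRegistry` are illustrative stand-ins under the same id-tracking idea, not the SDK's actual classes:

```python
class Metrics:
    def __init__(self):
        self.accumulated_cost = 0.0

class FakeLLM:
    """Stand-in for LLM with reset_metrics() and lazy re-creation."""
    def __init__(self, metrics=None):
        self._metrics = metrics if metrics is not None else Metrics()

    @property
    def metrics(self):
        if self._metrics is None:
            self._metrics = Metrics()
        return self._metrics

    def reset_metrics(self):
        self._metrics = None

class FakeRegistry:
    """Stand-in for LLMRegistry's id-based sharing detection."""
    def __init__(self):
        self._metrics_ids: set[int] = set()
        self._llms: list[FakeLLM] = []

    def _ensure_independent_metrics(self, llm):
        # If an already-registered LLM owns this Metrics object,
        # give the newcomer a fresh one.
        if id(llm.metrics) in self._metrics_ids:
            llm.reset_metrics()

    def add(self, llm):
        self._ensure_independent_metrics(llm)
        self._metrics_ids.add(id(llm.metrics))
        self._llms.append(llm)

original = FakeLLM()
shared = FakeLLM(metrics=original.metrics)  # simulates model_copy() sharing
registry = FakeRegistry()
registry.add(original)
registry.add(shared)  # detected and reset
print(shared.metrics is original.metrics)  # False
```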

Tests:

  • Updated tests/sdk/llm/test_llm.py: Replaced __copy__ tests with test_llm_reset_metrics()
  • Added 3 new tests in tests/sdk/llm/test_llm_registry.py:
    • test_llm_registry_ensures_independent_metrics_for_copied_llms
    • test_llm_registry_ensures_independent_telemetry_for_copied_llms
    • test_llm_registry_does_not_reset_metrics_for_independent_llms

All tests pass (586 passed in the LLM test suite).


@openhands-ai bot commented Feb 11, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Review Thread Gate

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1997 at branch `openhands/fix-condenser-metrics-accumulation`

Feel free to include any additional details that might help me get this PR into a better state.


@all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Pragmatic fix for real bug

Context: All previous review threads are outdated - they commented on an old implementation that overrode model_copy(). This new approach is fundamentally different and better: lazy initialization + registry-based detection.

Why this works: Instead of fighting Pydantic's shallow-copy behavior, you work with it. Metrics are created on-demand, and the registry detects/fixes shared instances. Clean delegation of responsibility.


Review Feedback

🟡 Lazy Initialization Trade-off (llm.py, metrics/telemetry properties)

Before: Assertion failed immediately if metrics wasn't initialized (fail-fast)
After: Silently creates new metrics (hides potential bugs)

This is intentional for your use case (reset_metrics + lazy recreation), but worth noting: you're trading fail-fast for convenience. If someone expects metrics to be pre-initialized and finds None, they now get a fresh object instead of an error.

Not a blocker - the approach works for your design.

🟡 Unbounded Set Growth (llm_registry.py, _metrics_ids)

_metrics_ids grows forever. Each registered LLM adds an ID that's never removed, even if the LLM is later unregistered.

Impact: Probably negligible (registries are typically small), but technically a memory leak.
Fix if needed: Track LLM-to-metrics-id mapping in a dict instead of a set, and clean up on removal. But honestly, I'd ship this as-is unless you expect hundreds of registrations.
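The dict-based cleanup suggested above could look like this sketch; `BoundedRegistry` and its method names are hypothetical, not part of the SDK:

```python
class BoundedRegistry:
    """Sketch: map usage_id -> metrics object id, so entries can be
    removed when an LLM is unregistered and the mapping stays bounded."""
    def __init__(self):
        self._metrics_id_by_usage: dict[str, int] = {}

    def add(self, usage_id, metrics):
        # Sharing detection still works by checking the tracked values.
        if id(metrics) in self._metrics_id_by_usage.values():
            raise ValueError(f"metrics object already registered: {usage_id}")
        self._metrics_id_by_usage[usage_id] = id(metrics)

    def remove(self, usage_id):
        # Cleanup on removal keeps the mapping proportional to live LLMs.
        self._metrics_id_by_usage.pop(usage_id, None)

reg = BoundedRegistry()
agent_metrics, condenser_metrics = object(), object()
reg.add("agent", agent_metrics)
reg.add("condenser", condenser_metrics)
reg.remove("agent")
print(len(reg._metrics_id_by_usage))  # 1
```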

🟢 Side Effect in add() (llm_registry.py)

add() now modifies the LLM being added (via reset_metrics()). Registries typically don't mutate their inputs.

But: You documented this clearly in the docstring, and it solves a real problem. Pragmatism wins over purity here. Just noting it for future maintainers.

🟢 Good: Tests Prove the Fix (test_llm_registry.py)

Your tests actually exercise the real scenario:

  1. Create original LLM
  2. Copy with model_copy()
  3. Register both
  4. Verify metrics are independent

This is exactly what was needed. Well done.


Verdict

Approve - Fixes issue #418 pragmatically without over-engineering. The suggestions above are minor and don't block merging.

Key insight: When you can't change library behavior (Pydantic shallow copy), design around it rather than fighting it.

@xingyaoww (Author) commented:

@OpenHands can you review the diff against main and understand the purpose of this PR, then help me reproduce the original issue? I believe this PR contains a test case that will reproduce it. Can you first git reset to the original commit where no fix is applied and make sure the test fails, then reset to the latest state of this PR and rerun the same test? Make sure that this PR actually fixes the original issue.

@openhands-ai bot commented Feb 12, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai bot commented Feb 12, 2026

Summary: PR #1997 Verification Complete

I successfully verified that this PR fixes issue #418 (condensation breaks metrics accumulation) by reproducing the original issue and confirming the fix works.

What I Did

  1. Reviewed the PR diff - The fix is implemented in LLMRegistry by:

    • Adding reset_metrics() method to LLM class that resets _metrics and _telemetry to None
    • Modifying metrics and telemetry properties to lazily initialize if None
    • Adding _ensure_independent_metrics() method to LLMRegistry that detects shared metrics and resets them
    • Tracking metrics object IDs in _metrics_ids set to detect sharing
  2. Reproduced the bug on pre-fix commit (7fe69a0):

    Before registering:
      original.metrics id: 134986697522576
      copied.metrics id: 134986697522576
      Metrics are shared: True
    
    After registering:
      original.metrics id: 134986697522576
      copied.metrics id: 134986697522576
      Metrics are shared: True
    
    ❌ TEST FAILED: BUG: Metrics are still shared after registering!
    
  3. Verified the fix on PR branch (b700c2f):

    Before registering:
      original.metrics id: 131807657327936
      copied.metrics id: 131807657327936
      Metrics are shared: True
    
    After registering:
      original.metrics id: 131807657327936
      copied.metrics id: 131807657494160
      Metrics are shared: False
    
    ✅ TEST PASSED: Metrics are independent after registering!
    
  4. Ran the PR's test suite - All 3 new tests pass:

    • test_llm_registry_ensures_independent_metrics_for_copied_llms
    • test_llm_registry_ensures_independent_telemetry_for_copied_llms
    • test_llm_registry_does_not_reset_metrics_for_independent_llms

Conclusion

This PR correctly fixes the original issue. When LLM.model_copy() is used (e.g., to create a condenser LLM from an agent LLM), the LLMRegistry now detects shared metrics and creates independent instances, preventing metrics from being double-counted.


@xingyaoww marked this pull request as ready for review February 12, 2026 11:13
@xingyaoww requested a review from enyst February 12, 2026 11:16
@all-hands-bot (Collaborator) left a comment

🟡 Taste Rating: Acceptable - But Incomplete Fix

This fixes the reported bug (#418) in the specific scenario (agent + condenser using registry), but the solution is a pragmatic workaround rather than a root cause fix.


Critical Issues

🟠 Incomplete Fix - Registry-Dependent Solution

File: openhands-sdk/openhands/sdk/llm/llm.py (lazy metrics initialization)

The core problem: model_copy() still creates shared metrics. You've added a registry-layer patch to detect and fix it, but this means:

# THIS STILL BREAKS (shared metrics):
original = LLM(model="gpt-4", api_key=SecretStr("test"))
copied = original.model_copy(update={"usage_id": "copied"})
original.metrics.add_cost(1.0)
print(copied.metrics.accumulated_cost)  # 1.0 - STILL SHARED!

Anyone using model_copy() outside the registry still gets shared metrics. The bug lives at the LLM level, not the registry level.

Simpler alternative:

def model_copy(self, *, update=None, deep=False):
    copied = super().model_copy(update=update, deep=deep)
    copied._metrics = None
    copied._telemetry = None
    return copied

This is 4 lines and fixes the bug at the source for all cases, not just registry usage.

🟠 Registry Side Effects - Surprising Behavior

File: openhands-sdk/openhands/sdk/llm/llm_registry.py (lines 80-105, _ensure_independent_metrics)

registry.add() now mutates the LLM by calling reset_metrics(). This violates the principle of least surprise:

llm = create_my_llm()
llm.metrics.add_cost(5.0)  # Track some usage
registry.add(llm)  # SURPRISE! Your metrics just got reset

A registry's add() method should register the object, not modify it. This is a leaky abstraction - the registry is now responsible for fixing copy-related bugs in LLM.


Improvement Opportunities

🟡 Lazy Init Removes Safety Guarantees

File: openhands-sdk/openhands/sdk/llm/llm.py (lines 507-514)

Changed from assertions (fail-fast) to lazy initialization (create-on-demand):

# Old: Explicit failure if not initialized
assert self._metrics is not None, "Metrics should be initialized"

# New: Silently create if missing
if self._metrics is None:
    self._metrics = Metrics(...)

This hides bugs. If there's a code path where metrics aren't properly initialized, you'll never know - they'll just be created lazily with potentially incorrect state.

🟡 Memory Leak Potential

File: openhands-sdk/openhands/sdk/llm/llm_registry.py (line 50)

self._metrics_ids: set[int] = set()

This set grows forever. In a long-lived registry:

  • Add 1000 LLMs → track 1000 IDs
  • LLMs get garbage collected → IDs remain in the set
  • No cleanup mechanism exists

Not critical (IDs are small ints), but it's sloppy.

🟡 Missing Test for Failure Case

File: tests/sdk/llm/test_llm_registry.py

Your tests verify the registry fixes shared metrics, but don't test that the bug still exists outside the registry. Consider adding:

def test_model_copy_without_registry_still_shares_metrics():
    """Document that the underlying bug exists if registry isn't used."""
    from pydantic import SecretStr
    
    original = LLM(model="gpt-4o", api_key=SecretStr("test-key"))
    copied = original.model_copy(update={"usage_id": "copied"})
    
    # Without registry, metrics are still shared (the incomplete fix)
    assert original.metrics is copied.metrics
    
    original.metrics.add_cost(1.0)
    assert copied.metrics.accumulated_cost == 1.0  # Bug still exists

If you're shipping an incomplete fix, at least document the limitation explicitly.


Verdict

🟡 Worth merging - Fixes the reported bug in practice, but the design is compromised

KEY INSIGHT: You've built a registry-layer detection system to patch a problem that could be fixed in 4 lines at the LLM level with a model_copy() override. The tests pass and it works for the agent+condenser use case, but you've traded simplicity for avoiding a straightforward model_copy() override.

@enyst (Collaborator) commented Feb 12, 2026

Ugh, OH fights with itself, and the last comment sounds quite bad, if it's really the case. 😓

Please let me take a step back here. I feel I have a question:

  • why are we using model_copy or copy in the first place?
  • these are independent LLM instances, after all
  • because we don't want to force the user to enter another api key, so we'd like to default them to the same LLM profile as they have already defined.

If so, the way forward here is to load the default profile for 'different_usage_id' as long as the user hasn't specified another. That has to pass through the LLMRegistry, it seems to me. And that won't be a model_copy, just initialize a new LLM from some profile. 🤔

@enyst (Collaborator) left a comment

IMHO this is actually better than overriding model_copy for a situation that is a bit of a temporary shortcut in the first place...

Unless you hate it, I'd go with it, and refactor later - this way, only llm and registry will need changes, but they do anyway; and we're not fighting Pydantic.

@enyst changed the title fix(llm): ensure model_copy creates independent metrics and telemetry → fix(llm): ensure second LLM gets independent metrics and telemetry, Feb 12, 2026
@xingyaoww (Author) commented:

@OpenHands, please clone OpenHands/Openhands-CLI and OpenHands/OpenHands, and search for all occurrences of model_copy for the LLM class (including in the current repo) -- Tell me, how will this PR impact that code? Will it introduce some weird behaviors?

@openhands-ai bot commented Feb 12, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai bot commented Feb 12, 2026

Analysis: Impact of PR #1997 on model_copy Usage for LLM Class

I cloned and searched all three repositories for model_copy usages on the LLM class. Here's my analysis:

Summary of model_copy Usages Found

software-agent-sdk (5 usages):

  • llm.model_copy(update={"usage_id": "condenser"}) - in default.py, gpt5.py, gemini.py, planning.py (condenser LLMs)
  • parent_llm.model_copy(update={"stream": False}) - in delegate/impl.py (sub-agent LLMs)

OpenHands (4 usages):

  • agent.llm.model_copy(update={'usage_id': 'condenser'}) - in experiment code
  • llm.model_copy(...) - in app_conversation services for updating LLM configurations

Openhands-CLI (7 usages):

  • llm.model_copy(update={"usage_id": "condenser"}) - in utils.py, settings/utils.py
  • Various llm.model_copy() calls in agent_store.py for configuration updates

Will This PR Introduce Weird Behaviors?

Short answer: No, but with caveats.

How the fix works:

  1. The PR modifies LLMRegistry.add() to detect when an LLM shares metrics with an already-registered LLM
  2. When detected, it calls llm.reset_metrics() to create fresh, independent metrics
  3. The metrics and telemetry properties now lazily initialize if None

Why it's safe:

  1. All LLMs go through the registry: In both LocalConversation and RemoteConversation, LLMs are registered via agent.get_all_llms() → llm_registry.add(llm). The fix will apply.

  2. Sub-agents get their own registry: When delegate creates sub-agents via model_copy(), each LocalConversation creates its own LLMRegistry, so the fix applies there too.

  3. Order is correct: The original LLM keeps its metrics; the copied LLM gets fresh metrics (since it's registered second).

  4. Deep copy works correctly: Pydantic's model_copy(deep=True) already creates independent private attributes, but none of the LLM usages use deep=True.

Potential edge cases (unlikely to cause issues):

  1. LLMs not registered: If someone creates an LLM via model_copy() but never registers it, metrics will still be shared. However, this is an unusual pattern not seen in the codebase.

  2. Metrics before registration: If an LLM accumulates metrics before registration and is detected as sharing metrics, those metrics would be lost. But in practice, LLMs are registered before use.

  3. Telemetry callbacks: When reset_metrics() is called, any telemetry callbacks would be lost. But LLMs are typically registered before callbacks are set.

Conclusion

This PR is a safe and correct fix for the metrics accumulation issue. It properly handles all the model_copy patterns found across the three repositories by ensuring the LLMRegistry creates independent metrics for copied LLMs. The fix is transparent to existing code - no changes needed in OpenHands or Openhands-CLI.


@xingyaoww merged commit fa4583c into main Feb 13, 2026
30 checks passed
@xingyaoww deleted the openhands/fix-condenser-metrics-accumulation branch February 13, 2026 10:59


Development

Successfully merging this pull request may close these issues.

Condensation breaks the metrics accumulation

5 participants