[Bug]: KvEvent Metrics, usedNumBlocks, can have negative block sizes in disagg/prefill mode #11879

@michaelfeil

System Info

Using tensorrt-llm 1.3.0rc4, I was surprised to see a negative value for the number of used KV blocks in the metrics:

prefillworker-1       | 2026-03-04T00:39:03.034016Z [warning  ] Received negative values for kv blocks: kv_active_block: -2, kv_total_blocks: 7934. Setting them to 0 in published metrics. [common.publisher] file=/workspace/trtllm/common/publisher.py line=385

stats = await self._llm_engine.get_kv_cache_events_async(timeout=5)

async for stat in stats:
    request_active_slots = stat["numActiveRequests"]
    request_total_slots = stat["maxNumActiveRequests"]
    kv_active_block = stat["kvCacheStats"]["usedNumBlocks"]
    kv_total_blocks = stat["kvCacheStats"]["maxNumBlocks"]
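
For readers who want to reproduce the warning outside the worker, here is a minimal sketch of the kind of clamping the log message describes. It is not the actual code from publisher.py: the function name `clamp_kv_block_stats` and the logger name are hypothetical, and clamping each value independently is an assumption. Only the dictionary keys and the sample values come from the snippet and log line above.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("publisher_sketch")


def clamp_kv_block_stats(stat: dict) -> tuple[int, int]:
    """Return (kv_active_block, kv_total_blocks), replacing negatives with 0.

    Assumes `stat` has the same layout as in the snippet above:
    stat["kvCacheStats"]["usedNumBlocks"] and ["maxNumBlocks"].
    """
    kv_stats = stat["kvCacheStats"]
    kv_active_block = kv_stats["usedNumBlocks"]
    kv_total_blocks = kv_stats["maxNumBlocks"]

    if kv_active_block < 0 or kv_total_blocks < 0:
        logger.warning(
            "Received negative values for kv blocks: "
            "kv_active_block: %d, kv_total_blocks: %d. "
            "Setting them to 0 in published metrics.",
            kv_active_block,
            kv_total_blocks,
        )
        # Assumption: each negative value is clamped on its own.
        kv_active_block = max(kv_active_block, 0)
        kv_total_blocks = max(kv_total_blocks, 0)

    return kv_active_block, kv_total_blocks


# Sample values taken from the warning in the log line above.
sample = {"kvCacheStats": {"usedNumBlocks": -2, "maxNumBlocks": 7934}}
print(clamp_kv_block_stats(sample))
```

Clamping like this only hides the symptom in the published metrics. The negative `usedNumBlocks` itself still comes from the engine's stats.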

Who can help?

I am just running a minimal example for disagg prefill, with prefill first.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Expected behavior

actual behavior

additional notes

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Assignees

No one assigned

    Labels

    Disaggregated serving<NV>: Deploying with separated, distributed components (params, kv-cache, compute). Arch & perf.
    bug: Something isn't working
