
[BUG] Token count inflated due to duplicate/transformed messages in monitoring #2376

@epamLDadayan

Description

Token counting for monitoring (Tokens In/Out) is inaccurate because the indexer receives duplicate or transformed versions of the same user message. For a simple input like "hi," three messages are counted instead of one, inflating the token count significantly.

Steps to reproduce:

  1. Start a conversation with an LLM (e.g., GPT-4o).
  2. Send a simple message (e.g., "hi").
  3. Check the Tokens In count in Monitoring.

Actual Result:
For input "hi," the indexer receives and counts 3 messages:

[
   SystemMessage(content='You are a helpful assistant.', ...),  # 18 tokens
   HumanMessage(content=[{'type': 'text', 'text': 'hi'}], ...),  # 46 tokens
   HumanMessage(content='hi', ...)  # 14 tokens
]

Total: 78 tokens in (expected: ~32 tokens — ~14 for "hi" plus ~18 for the system message).

Expected Result:
Token count should reflect the actual tokens sent to the LLM:

  • System message: ~18 tokens
  • User message "hi": ~14 tokens
  • Total: ~32 tokens (not 78)

Notes (optional):

  • The issue is in num_tokens_from_messages() in the indexer's worker utils.
  • Duplicate/transformed messages (e.g., structured content [{'type': 'text', 'text': 'hi'}] vs plain 'hi') are being counted separately.
  • Action required: Review and rewrite num_tokens_from_messages() to:
    • Deduplicate messages before counting.
    • Normalize message formats (structured vs plain text).
    • Count only the final payload sent to the LLM.
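A minimal sketch of the dedupe/normalize steps described above. The dict-based message shape, `normalize_content`, and `dedupe_messages` are assumptions for illustration; the actual fix would operate on the LangChain message objects that `num_tokens_from_messages()` receives in the indexer's worker utils.

```python
# Hypothetical sketch: normalize structured content and drop transformed
# duplicates before counting tokens. Message dicts here stand in for the
# real LangChain SystemMessage/HumanMessage objects.

def normalize_content(content):
    """Flatten structured content like [{'type': 'text', 'text': 'hi'}] to 'hi'."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only the text parts of a structured content list.
        return "".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return str(content)

def dedupe_messages(messages):
    """Keep only the first message per (role, normalized content) pair."""
    seen = set()
    result = []
    for msg in messages:
        key = (msg["role"], normalize_content(msg["content"]))
        if key not in seen:
            seen.add(key)
            result.append(msg)
    return result

# The three messages from the "hi" reproduction above:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},
    {"role": "user", "content": "hi"},  # transformed duplicate of the previous message
]
unique = dedupe_messages(messages)
# Only the system message and one user message remain; token counting
# would then run over `unique` (the final payload) instead of all three.
```

With the duplicate removed, counting the remaining two messages would yield the expected ~32 tokens rather than 78.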

Metadata

Labels: Monitoring (to group issues regarding Monitoring functionality)

Status: Done
