Closed
Labels: Monitoring (To group issues regarding Monitoring functionality)
Description
Title: Token count inflated due to duplicate/transformed messages in monitoring
Description:
Token counting for monitoring (Tokens In/Out) is inaccurate because the indexer receives duplicate or transformed versions of the same user message. For a simple input like "hi", three messages are counted instead of the expected two (system + user), significantly inflating the token count.
Steps to reproduce:
- Start a conversation with an LLM (e.g., GPT-4o).
- Send a simple message (e.g., "hi").
- Check the Tokens In count in Monitoring.
Actual Result:
For input "hi," the indexer receives and counts 3 messages:
```
[
    SystemMessage(content='You are a helpful assistant.', ...),   # 18 tokens
    HumanMessage(content=[{'type': 'text', 'text': 'hi'}], ...),  # 46 tokens
    HumanMessage(content='hi', ...),                              # 14 tokens
]
```
Total: 78 tokens in (expected: ~14 tokens for "hi" + ~18 for the system message, i.e. ~32 total).
Expected Result:
Token count should reflect the actual tokens sent to the LLM:
- System message: ~18 tokens
- User message "hi": ~14 tokens
- Total: ~32 tokens (not 78)
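The double counting can be reproduced in isolation: the structured and plain forms of the same user message carry identical text, so any counter that treats them as distinct messages counts the payload twice. A minimal check, assuming the structured shape shown in the report:

```python
# Two representations of the same user message, as seen by the indexer.
structured = [{"type": "text", "text": "hi"}]
plain = "hi"

# Flatten the structured form to its text payload (assumed shape:
# a list of {'type': 'text', 'text': ...} parts).
flattened = "".join(p["text"] for p in structured if p.get("type") == "text")

# Same payload -> should be counted once, not twice.
assert flattened == plain
```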
Notes (optional):
- The issue is in `num_tokens_from_messages()` in the indexer's worker utils.
- Duplicate/transformed messages (e.g., structured content `[{'type': 'text', 'text': 'hi'}]` vs plain `'hi'`) are being counted separately.
- Action required: Review and rewrite `num_tokens_from_messages()` to:
  - Deduplicate messages before counting.
  - Normalize message formats (structured vs plain text).
  - Count only the final payload sent to the LLM.
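A minimal sketch of the rewrite described above. Helper names, the `(role, content)` message shape, and the whitespace-based `count_tokens` stand-in are all assumptions made to keep the example self-contained; the real `num_tokens_from_messages()` in the worker utils would take the actual message objects and use the model's tokenizer:

```python
def normalize_content(content):
    """Flatten structured content ([{'type': 'text', 'text': ...}]) to plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return " ".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return str(content)


def num_tokens_from_messages(messages, count_tokens=lambda text: len(text.split())):
    """Count tokens over deduplicated, normalized messages.

    `count_tokens` is a placeholder for the real tokenizer (e.g. the model's
    encoding); whitespace splitting is used here only for illustration.
    """
    seen = set()
    total = 0
    for role, content in messages:
        key = (role, normalize_content(content))
        if key in seen:  # skip duplicate/transformed copies of the same message
            continue
        seen.add(key)
        total += count_tokens(key[1])
    return total
```

With the three messages from the report, the structured and plain `'hi'` collapse to one entry, so only the system message and a single user message are counted.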