[BUG] Rough token estimation in WMC may overflow Cosmos 2048 token limit #14

@OppaAI

Description

wmc.py estimates token counts as 1 token ≈ 4 characters. Cosmos Reason2 2B uses a sub-word tokenizer, so complex vocabulary, code, emojis, and non-English text can produce significantly more tokens than this estimate. Text that fills the 1400-token WMC budget by estimate may actually be 1600–1800 real tokens, pushing the total context over the 2048-token hard limit.
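
A rough way to see the size of the gap, assuming the budget is enforced purely on the character-based estimate: the sketch below converts a full budget back into real tokens for a few chars-per-token ratios. The ratios 3.5 and 3.0 are illustrative assumptions, not measurements of the Cosmos tokenizer, and `real_tokens_at_full_budget` is a hypothetical helper.

```python
CHARS_PER_TOKEN = 4       # heuristic used in wmc.py
WMC_TOKEN_BUDGET = 1400   # estimated-token budget named in this issue


def real_tokens_at_full_budget(actual_chars_per_token: float) -> int:
    """Real token count of text that exactly fills the estimated budget."""
    chars_allowed = WMC_TOKEN_BUDGET * CHARS_PER_TOKEN  # 5600 characters
    return round(chars_allowed / actual_chars_per_token)


# Assumed ratios for illustration only (4.0 = heuristic is exact).
for ratio in (4.0, 3.5, 3.0):
    print(ratio, real_tokens_at_full_budget(ratio))
# 4.0 -> 1400, 3.5 -> 1600, 3.0 -> 1867
```

Under these assumptions, a real ratio between roughly 3.1 and 3.5 characters per token is enough to land in the 1600–1800 range described above.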

Steps to Reproduce

  1. Have a conversation containing code snippets or technical terms
  2. Let WMC fill up to budget
  3. Send another message — vLLM may return 400 Bad Request

Expected Behavior

Context window always stays safely under 2048 tokens.

Actual Behavior

The 4-characters-per-token heuristic underestimates the real token count. Combined with the system prompt (~300 tokens) and the EMC injection (~300 tokens), the total can exceed 2048 tokens, at which point vLLM returns an error or the prompt is truncated.
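
The headroom can be worked out directly from the figures in this issue. This is a minimal sketch: the two 300-token values are the approximations quoted above, and `headroom` is a hypothetical helper, not a function in wmc.py.

```python
CONTEXT_LIMIT = 2048         # hard limit stated in this issue
SYSTEM_PROMPT_TOKENS = 300   # approximate figure from this issue
EMC_TOKENS = 300             # approximate figure from this issue


def headroom(wmc_real_tokens: int) -> int:
    """Tokens left for the new message and the reply; negative means overflow."""
    used = SYSTEM_PROMPT_TOKENS + EMC_TOKENS + wmc_real_tokens
    return CONTEXT_LIMIT - used


for wmc in (1400, 1600, 1800):
    print(wmc, headroom(wmc))
# 1400 -> 48, 1600 -> -152, 1800 -> -352
```

Even when the estimate is exact, only about 48 tokens remain for the incoming message and the generated reply, so any underestimation tips the request over the limit.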

Error Logs

# Possible:
[ERROR] [cnc]: ❌ vLLM HTTP 400
# Or silent truncation mid-sentence

Environment

  • Hardware: Jetson Orin Nano Super 8GB
  • OS: Ubuntu 22.04 / JetPack 6.2.2
  • ROS2: Humble
  • Python: 3.10
  • Package: scs
  • Node: wmc

Affected Files

wmc.py line 34 — CHARS_PER_TOKEN = 4
wmc.py line 37 — _estimate_tokens()

Possible Fix

Option A — Conservative budget (quick fix):

WMC_TOKEN_BUDGET = 1100  # reduce budget to give more headroom
CHARS_PER_TOKEN  = 3     # more conservative estimate

Option B — Use actual tokenizer (accurate fix):

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")

def _estimate_tokens(text: str) -> int:
    return len(tokenizer.encode(text))
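
One possible way to combine both options is to load the tokenizer lazily and fall back to the character heuristic if loading fails, for example when the model files are not cached on the device. This is a sketch under that assumption: `make_token_counter` is a hypothetical name, and passing `add_special_tokens=False` is a choice made here so that only the text itself is counted.

```python
import math
from typing import Callable, Optional

CHARS_PER_TOKEN = 3  # conservative fallback from Option A


def make_token_counter(model_id: Optional[str] = None) -> Callable[[str], int]:
    """Return a counter backed by the real tokenizer, or the heuristic on failure."""
    if model_id is not None:
        try:
            # Imported lazily so nodes that use the heuristic skip the import cost.
            from transformers import AutoTokenizer

            tokenizer = AutoTokenizer.from_pretrained(model_id)
            return lambda text: len(tokenizer.encode(text, add_special_tokens=False))
        except Exception:
            # Broad on purpose: missing package, no network, or no cached files.
            pass
    return lambda text: math.ceil(len(text) / CHARS_PER_TOKEN)


# Heuristic only; pass a model id to try the real tokenizer instead.
count_tokens = make_token_counter()
```

Calling `make_token_counter("Qwen/Qwen3-VL-2B-Instruct")` would attempt the accurate path, and the rest of the node can use the returned callable without knowing which path was taken.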

Additional Context

Cosmos Reason2 2B is based on Qwen3-VL-2B. Option A is safe for M1. Option B is accurate but adds startup latency and RAM usage. Recommended: Option A now, Option B in M2.

Metadata

Labels

bug: Something isn't working

Projects

Status

Todo
