
Conversation

@ammar-agent (Collaborator) commented:

Reduces app startup time by deferring tokenizer module loading until first use.

Changes

  • Converted ai-tokenizer imports to dynamic imports
  • Uses a /4 character-per-token approximation until the tokenizer modules are loaded
  • Background loading starts on the first getTokenizerForModel() call (sketched below)
  • Cached token counts switch to accurate values once the modules are loaded
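
A minimal sketch of the pattern. The `ai-tokenizer` package, `getTokenizerForModel()`, and `loadTokenizerModules()` appear in this PR; the `Tokenizer` shape, module cache, and `countTokens` helper are illustrative assumptions:

```typescript
// Sketch only: the Tokenizer shape and module-cache details are assumptions.
type Tokenizer = { encode(text: string): number[] };

let tokenizerModules: Record<string, Tokenizer> | null = null;
let loadPromise: Promise<void> | null = null;

export function loadTokenizerModules(): Promise<void> {
  // Dynamic import keeps ai-tokenizer out of the startup path; the first
  // caller kicks off the load, later callers share the same promise.
  loadPromise ??= import("ai-tokenizer").then((mod) => {
    tokenizerModules = mod as unknown as Record<string, Tokenizer>;
  });
  return loadPromise;
}

export function getTokenizerForModel(model: string): Tokenizer | null {
  void loadTokenizerModules(); // start background loading on first use
  return tokenizerModules?.[model] ?? null;
}

export function countTokens(model: string, text: string): number {
  const tokenizer = getTokenizerForModel(model);
  if (tokenizer) return tokenizer.encode(text).length;
  return Math.ceil(text.length / 4); // ~4 chars/token approximation
}
```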

Performance

Before: ~8.83 seconds baseline startup
After: tokenizer modules are no longer loaded during initialization, removing that cost from the startup path

Testing

Added CMUX_DEBUG_START_TIME environment variable to measure baseline startup time without full initialization:

time CMUX_DEBUG_START_TIME=1 make start

Generated with cmux

When set to '1', the app exits immediately after logging version info.
This allows measuring the baseline startup time without full initialization.
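
A minimal sketch of how the flag might be wired in the Electron main process (the version-logging line and exact placement are assumptions):

```typescript
// Hypothetical wiring in the main-process entry point.
import { app } from "electron";

console.log(`cmux ${app.getVersion()}`);

if (process.env.CMUX_DEBUG_START_TIME === "1") {
  // Bail out right after version logging so `time` measures only the
  // baseline startup cost, before windows or services initialize.
  app.exit(0);
}
```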
- Convert ai-tokenizer imports to dynamic imports
- Use /4 approximation until tokenizer modules are loaded
- Background load starts on first getTokenizerForModel call
- Cached tokens use accurate count once loaded

This should significantly reduce initial app startup time by deferring
the expensive tokenizer module loading.

@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


Addresses Codex P1 feedback: The previous implementation always used
async callbacks, which caused countTokensCached to return approximations
even after tokenizer modules were loaded.

Now:
- If tokenizer is loaded: use synchronous callback → accurate counts
- If tokenizer is loading: use async callback → approximation until loaded

This ensures token budgeting, cost estimation, and max-token checks
use accurate numbers once the tokenizer is ready.
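
A sketch of that branching, reusing the hypothetical loader above; `countTokensCached`'s actual signature and cache keying are assumptions:

```typescript
// Illustrative only: cache keying and signature are assumptions.
const tokenCache = new Map<string, number>();

export function countTokensCached(model: string, text: string): number {
  const key = `${model}:${text}`;
  const cached = tokenCache.get(key);
  if (cached !== undefined) return cached;

  const tokenizer = getTokenizerForModel(model);
  if (tokenizer) {
    // Loaded: count synchronously and cache the accurate value.
    const exact = tokenizer.encode(text).length;
    tokenCache.set(key, exact);
    return exact;
  }

  // Still loading: return the /4 approximation now (without caching it)
  // and store the accurate count once the modules are ready.
  const approx = Math.ceil(text.length / 4);
  void loadTokenizerModules().then(() => {
    const loaded = getTokenizerForModel(model);
    if (loaded) tokenCache.set(key, loaded.encode(text).length);
  });
  return approx;
}
```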
When unresolved review comments are found, the script now shows the thread ID
and suggests the exact command to resolve it:

  ./scripts/resolve_codex_comment.sh <thread_id>
Export loadTokenizerModules() and preload it in test environment setup.
This ensures accurate token counts before API calls are made.

Root cause: on slow CI machines, the tokenizer modules weren't loaded
by the time the first API call happened, so the /4 approximation
drastically overestimated the token count (215k instead of the actual
count), which exceeded the 200k limit and caused streams to never start.

Now tests preload the tokenizer during environment setup, ensuring
accurate counts from the first call.
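
A sketch of the preload, with an assumed import path and setup shape:

```typescript
// loadTokenizerModules is exported per this commit; the import path
// and the test-setup helper's shape are assumptions.
import { loadTokenizerModules } from "@/services/tokenizer";

export async function createTestEnvironment(): Promise<void> {
  // Await the preload so the first API call in a test counts tokens
  // accurately instead of hitting the /4 overestimate seen on slow CI.
  await loadTokenizerModules();
  // ... rest of environment setup
}
```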
Moved tokenizer preloading from createTestEnvironment() to
setupWorkspace() to avoid slowing down tests that don't make API calls.

Tests using setupWorkspace() need accurate token counts, while tests
like doubleRegister don't make API calls and were timing out waiting
for the tokenizer to load.
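
After the move, the split might look like this (function names per the commit message; the import path and bodies are placeholders):

```typescript
import { loadTokenizerModules } from "@/services/tokenizer"; // path assumed

// Only tests that make API calls go through setupWorkspace, so only
// they pay the tokenizer-load cost.
export async function setupWorkspace(): Promise<void> {
  await loadTokenizerModules();
  // ... create the workspace
}

// No preload here anymore: tests like doubleRegister that never call
// the API skip the load entirely.
export async function createTestEnvironment(): Promise<void> {
  // ... environment setup without tokenizer preload
}
```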
@ammario ammario enabled auto-merge October 13, 2025 15:48
@ammario ammario added this pull request to the merge queue Oct 13, 2025
@ammario ammario removed this pull request from the merge queue due to a manual request Oct 13, 2025
Start loading tokenizer modules in background immediately after app.whenReady().
This ensures the tokenizer is loaded by the time e2e tests make their first
API call, preventing /4 approximation from causing token count overflow.

The loading is non-blocking (void promise) so it doesn't slow down window
creation or affect the startup time improvement.
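
A sketch of the non-blocking kick-off (the import path and handler contents are assumptions):

```typescript
import { app } from "electron";
import { loadTokenizerModules } from "./services/tokenizer"; // path assumed

void app.whenReady().then(() => {
  // Fire-and-forget: warm the tokenizer without blocking window
  // creation, so it is ready before e2e tests make their first API call.
  void loadTokenizerModules();
  // ... create windows, register handlers, etc.
});
```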
@ammario ammario enabled auto-merge October 13, 2025 15:57
@ammario ammario added this pull request to the merge queue Oct 13, 2025
Merged via the queue into main with commit 4463436 Oct 13, 2025
7 checks passed
@ammario ammario deleted the startup-slow branch October 13, 2025 16:09
