🤖 Lazy load ai-tokenizer to reduce startup time #215
Merged
Conversation
When set to '1', the app exits immediately after logging version info. This allows measuring the baseline startup time without full initialization.
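A minimal sketch of how such a flag could be checked — only the `CMUX_DEBUG_START_TIME` name comes from the PR; the helper name and signature are assumptions:

```typescript
// Hypothetical helper: decide whether to exit right after logging version info.
// When CMUX_DEBUG_START_TIME is set to '1', the caller exits before full
// initialization, giving a baseline startup-time measurement.
function shouldExitAfterVersionLog(
  env: Record<string, string | undefined>,
): boolean {
  return env.CMUX_DEBUG_START_TIME === "1";
}
```

With a check like this early in startup, `time CMUX_DEBUG_START_TIME=1 make start` measures only the path up to the version log.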
- Convert ai-tokenizer imports to dynamic imports
- Use /4 approximation until tokenizer modules are loaded
- Background load starts on first getTokenizerForModel call
- Cached tokens use accurate count once loaded

This should significantly reduce initial app startup time by deferring the expensive tokenizer module loading.
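The lazy-load pattern described above can be sketched as follows. The function names echo the PR (`loadTokenizerModules`, the chars/4 fallback); the types and bodies are stand-ins, not the actual cmux implementation:

```typescript
// Assumed shape for a loaded tokenizer.
type Tokenizer = { countTokens: (text: string) => number };

let tokenizer: Tokenizer | null = null;
let loading: Promise<void> | null = null;

function loadTokenizerModules(): Promise<void> {
  // In the real code this would be a dynamic import of ai-tokenizer modules;
  // here a whitespace-based counter stands in for the loaded tokenizer.
  loading ??= Promise.resolve().then(() => {
    tokenizer = { countTokens: (t) => t.split(/\s+/).filter(Boolean).length };
  });
  return loading;
}

function countTokens(text: string): number {
  if (tokenizer) return tokenizer.countTokens(text); // accurate once loaded
  void loadTokenizerModules(); // kick off the background load, non-blocking
  return Math.ceil(text.length / 4); // chars/4 approximation until then
}
```

The first call returns the approximation and starts the background load; later calls use the real tokenizer once it resolves.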
Addresses Codex P1 feedback: the previous implementation always used async callbacks, which caused countTokensCached to return approximations even after tokenizer modules were loaded. Now:

- If the tokenizer is loaded: use a synchronous callback → accurate counts
- If the tokenizer is loading: use an async callback → approximation until loaded

This ensures token budgeting, cost estimation, and max-token checks use accurate numbers once the tokenizer is ready.
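The loaded-vs-loading branch might look like this — only the behavior (synchronous accurate counts once loaded, uncached approximations before) comes from the PR; the names and caching details are assumptions:

```typescript
type CountFn = (text: string) => number;

// Returns a counter that caches only accurate counts, so approximations
// produced while the tokenizer is still loading are replaced once it is ready.
function makeCachedCounter(getLoaded: () => CountFn | null): CountFn {
  const cache = new Map<string, number>();
  return (text) => {
    const exact = getLoaded();
    if (exact) {
      // Loaded: synchronous callback, accurate count, safe to cache.
      let n = cache.get(text);
      if (n === undefined) {
        n = exact(text);
        cache.set(text, n);
      }
      return n;
    }
    // Still loading: return the chars/4 approximation without caching it.
    return Math.ceil(text.length / 4);
  };
}
```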
When unresolved review comments are found, the script now shows the thread ID and suggests the exact command to resolve it: ./scripts/resolve_codex_comment.sh <thread_id>
Export loadTokenizerModules() and preload it in test environment setup, ensuring accurate token counts before API calls are made.

Root cause: on slow CI machines, the tokenizer modules weren't loaded by the time the first API call happened, so the /4 approximation drastically overestimated tokens (215k instead of the actual count), exceeded the 200k limit, and caused streams to never start. Tests now preload the tokenizer during environment setup, ensuring accurate counts from the first call.
Moved tokenizer preloading from createTestEnvironment() to setupWorkspace() to avoid slowing down tests that don't make API calls. Tests using setupWorkspace() need accurate token counts, while tests like doubleRegister don't make API calls and were timing out waiting for the tokenizer to load.
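A sketch of this preload placement — `setupWorkspace`, `createTestEnvironment`, and `loadTokenizerModules` are named in the PR, but their bodies here are stand-ins:

```typescript
let tokenizerReady = false;

async function loadTokenizerModules(): Promise<void> {
  // Stand-in for the real dynamic import of ai-tokenizer modules.
  tokenizerReady = true;
}

// Preloading lives here rather than in createTestEnvironment(), so tests that
// never call the API (e.g. doubleRegister) skip the tokenizer load entirely.
async function setupWorkspace(): Promise<void> {
  // Awaiting the preload means API-calling tests get accurate token counts
  // from their very first call instead of the /4 approximation.
  await loadTokenizerModules();
  // ... rest of workspace setup ...
}
```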
Start loading tokenizer modules in background immediately after app.whenReady(). This ensures the tokenizer is loaded by the time e2e tests make their first API call, preventing /4 approximation from causing token count overflow. The loading is non-blocking (void promise) so it doesn't slow down window creation or affect the startup time improvement.
Reduces app startup time by deferring tokenizer module loading until first use.

Changes

Tokenizer modules now load lazily, in the background, on the first getTokenizerForModel() call, with a /4 approximation used until they finish loading.

Performance

Before: ~8.83 seconds baseline startup
After: tokenizer modules are no longer loaded during initialization

Testing

Added the CMUX_DEBUG_START_TIME environment variable to measure baseline startup time without full initialization:

time CMUX_DEBUG_START_TIME=1 make start

Generated with cmux