🤖 feat: update Gemini Flash to Gemini 3.5 Flash #3334
Conversation
|
/coder-agents-review |
|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! |
Well-structured model repoint with good separation of concerns: shared Flash detection helper prevents policy/provider drift, defensive xhigh/max clamping replaces an unsafe type assertion, and Flash off-to-minimal explicitly sends what the old code left to server defaults. Test coverage is thorough, ratio is solid, and the diff is proportional.
Severity count: 3 P2, 4 P3, 1 Nit.
The P2s are a wrong knowledge cutoff (trivial fix, verified against GA-day sources), versioned Flash ID fallthrough in the exact-match Set, and a misleading test name that hides a coverage gap for Pro+off. The P3s are naming debt in two locations plus a comment/test gap.
Pariston tried to break the change and couldn't: "I tried to build a case against this change and could not. The problem is correctly understood across all four framings."
Process note: the commit subject lacks a type prefix (feat(knownModels): repoint gemini-flash alias to Gemini 3.5 Flash or similar would match the PR title convention).
🤖 This review was automatically generated with Coder Agents.
|
Addressed coder-agents-review findings:
Validation rerun after these fixes:
|
|
/coder-agents-review |
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c4505dd4e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
All 8 R1 findings (3 P2, 4 P3, 1 Nit) addressed in a single clean fix commit. Each fix targets the root cause: versioned ID detection uses prefix matching instead of adding one more entry to the Set, the mislabeled test was both renamed and supplemented with the missing Pro+off coverage, and naming changes reflect actual semantics rather than implementation details. Fix-to-finding ratio is 1:1 with no scope drift.
R2 panel (9 reviewers): 6 found no new issues; 3 raised minor gaps (3 P3, 3 Nit). The P3s are untested branches in the new code: the legacy Flash model's behavior change, the flash-lite exclusion guard, and a JSDoc that omits the new Flash branch. The Nits are polish on the new doc comment, the now-single-entry Set name, and missing source URL in metadata.
Pariston again tried to break the change: "I tried to build a case against this change and couldn't. The problem is correctly understood, the solution is proportional, and the fix is at the right level."
src/common/utils/thinking/policy.ts:53
P3 [DEREM-12] getThinkingPolicyForModel JSDoc rules list (lines 42-58) enumerates every model branch but omits Gemini Flash entirely. The code has a dedicated Flash branch at line 112-114 returning ["off", "low", "medium", "high"], but the JSDoc only shows gemini-3 → ["low", "high"]. Someone reading the doc to understand Flash levels gets the Pro policy instead.
Add a rule line:
* - gemini-3.5-flash (and gemini-3-flash-preview) → ["off", "low", "medium", "high"] (Flash thinking levels)
(Leorio)
🤖 This review was automatically generated with Coder Agents.
|
Addressed the Codex finding about versioned Gemini 3 Flash Preview IDs:
Validation rerun:
|
|
Addressed coder-agents-review round 2 findings:
Validation rerun after these fixes:
|
|
/coder-agents-review |
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70d6ea6272
5 of 6 R2 findings addressed (DEREM-10, 11, 13, 14, 15). One finding is unaddressed:
DEREM-12 (P3, policy.ts:53): getThinkingPolicyForModel JSDoc rules list omits the Gemini Flash branch entirely. The code has a dedicated Flash branch at line 112-114 returning ["off", "low", "medium", "high"], but the JSDoc only shows gemini-3 → ["low", "high"]. Someone reading the doc to understand Flash thinking levels gets the Pro policy instead.
Further review is blocked until DEREM-12 is addressed (fix or explicit response). The fix is a one-line JSDoc addition.
🤖 This review was automatically generated with Coder Agents.
|
Addressed the latest review feedback:
Validation rerun:
|
|
/coder-agents-review |
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. |
All prior findings addressed. DEREM-12 (the R3 blocker) is fixed. DEREM-3 (knowledge cutoff) is resolved in the author's favor: Google DeepMind's official model card at deepmind.google/models/gemini/flash/ lists "Knowledge cutoff: January 2025" for Gemini 3.5 Flash. The third-party aggregators (llm-stats.com, handyai, felloai) that reported January 2026 were wrong. The R1 finding was based on those aggregators; the author's reversion to "2025-01" based on the primary source was correct. Apologies for the churn.
R4 panel (8 reviewers): 5 found no new issues. New findings are 1 P3 and 2 Nits.
Pariston investigated the source conflict independently and reached the right conclusion: "Google DeepMind's own model card lists 'Knowledge cutoff: January 2025' for the 3.5 Flash row. The third-party aggregators appear to be inferring or copying from each other rather than from Google's published model card."
🤖 This review was automatically generated with Coder Agents.
|
Addressed coder-agents-review round 4 findings:
Validation rerun:
|
|
/coder-agents-review |
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7ed21f11db
|
Addressed Codex feedback about non-preview Gemini 3 Flash IDs:
Validation rerun:
|
|
/coder-agents-review |
|
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 769e8977d6
All 18 findings from rounds 1-4 are addressed. R5 panel (6 reviewers): no new findings above Note.
One observation worth flagging: Hisoka noted that the old curated Flash (gemini-3-flash-preview) inherited max_pdf_size_mb: 30 from models.json, but the new gemini-3.5-flash entry in models-extra.ts does not include it. Three call sites guard on caps?.maxPdfSizeMb !== undefined and skip validation when absent, meaning oversized PDFs now pass the client and fail at the Google API instead of getting a clean rejection. If Gemini 3.5 Flash has the same 30MB ceiling, adding max_pdf_size_mb: 30 restores parity.
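To illustrate the gap described above, here is a minimal sketch of a guard that skips validation when the cap is absent. The `ModelCaps` shape and the `validatePdfSize` name are hypothetical; only the `caps?.maxPdfSizeMb !== undefined` condition comes from this review, and the real call sites may look different.

```typescript
// Hypothetical capability shape; only maxPdfSizeMb is taken from the review text.
interface ModelCaps {
  maxPdfSizeMb?: number;
}

// Returns an error message for an oversized PDF, or null when the upload may proceed.
// When the cap is undefined the check is skipped, so any size passes the client.
function validatePdfSize(caps: ModelCaps | undefined, pdfSizeMb: number): string | null {
  if (caps?.maxPdfSizeMb !== undefined && pdfSizeMb > caps.maxPdfSizeMb) {
    return `PDF is ${pdfSizeMb} MB, which exceeds the ${caps.maxPdfSizeMb} MB limit`;
  }
  return null;
}

// With the cap present, a 45 MB PDF is rejected before any request is made.
const withCap = validatePdfSize({ maxPdfSizeMb: 30 }, 45);
// Without the cap, the same PDF passes client validation and would only fail upstream.
const withoutCap = validatePdfSize({}, 45);
```

Under this assumption, restoring the metadata field is enough to bring back the clean client-side rejection; no call site needs to change.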
This PR is in good shape. Clean test coverage, proportional scope, all review findings addressed across 5 rounds.
🤖 This review was automatically generated with Coder Agents.
|
Addressed Codex feedback about namespaced Google model IDs:
Validation rerun:
|
|
/coder-agents-review |
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 |
All prior findings addressed. R6 delta (18 lines) broadens old Flash detection to cover non-preview IDs (gemini-3-flash, gemini-3-flash-001). Netero and 5 of 6 panel reviewers found no issues. One P3 from Bisky: the new gemini-3-flash-lite exclusion guard lacks its own test (sibling of the gemini-3.5-flash-lite guard that DEREM-11 covered).
🤖 This review was automatically generated with Coder Agents.
|
Addressed coder-agents-review DEREM-19:
Validation rerun:
|
|
/coder-agents-review |
|
@codex review |
DEREM-19 (P3) is unaddressed: gemini-3-flash-lite exclusion guard at policy.ts:37 has no test. The R7 commit addressed Codex's namespaced ID feedback but did not respond to DEREM-19.
Further review is blocked until DEREM-19 is addressed. The fix is one line:
expect(isGeminiFlashThinkingLevelModelName("gemini-3-flash-lite")).toBe(false);
🤖 This review was automatically generated with Coder Agents.
|
Clarifying DEREM-19: this is already addressed at PR head.
The requested assertion exists in
expect(isGeminiFlashThinkingLevelModelName("gemini-3-flash-lite")).toBe(false);
It is in the test named
Validation after adding it:
|
|
/coder-agents-review |
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 |
All 19 findings from rounds 1-6 are closed. DEREM-19 (the last open P3) is fixed. R8 panel (6 reviewers): zero new findings. Netero clean. 214 tests pass.
This PR is ready for human review and merge.
🤖 This review was automatically generated with Coder Agents.
Summary
Updates the curated Gemini Flash slot so the stable \ alias now resolves to , with matching local metadata, docs, and provider thinking controls.
Background
Gemini Flash is a stable user-facing alias in Mux. The new Gemini 3.5 Flash release should be the first-class Flash target without adding a separate curated preview entry for the older Gemini 3 Flash Preview model.
Implementation
Validation
Risks
Low-to-moderate risk, scoped to model selection, model metadata, and Google thinking options. Existing Gemini 3.1 Pro behavior is covered by tests and left unchanged.
📋 Implementation Plan
Plan: Repoint Gemini Flash to Gemini 3.5 Flash
Decision
Use Option A: update the existing first-class Flash slot so gemini-flash tracks the latest Flash tier.
- google:gemini-3-flash-preview to the Gemini 3.5 Flash API model ID after verifying the exact ID from Google API/AI Studio (gemini-3.5-flash is the likely ID, but the implementer must confirm against an API model list or official developer docs before committing metadata).
- gemini-flash as the stable user-facing alias.
- gemini-3-flash-preview unless verification shows the old preview must remain curated for compatibility.
Recommended approach net product LoC estimate: ~45–75 LoC if local models-extra.ts metadata is needed; ~20–35 LoC if bun scripts/update_models.ts now pulls complete LiteLLM metadata. This excludes tests, docs, and generated models.json churn.
Evidence and constraints
- src/common/constants/knownModels.ts; KNOWN_MODELS, aliases, tokenizer overrides, and selector built-ins derive from MODEL_DEFINITIONS.
- GEMINI_31_PRO → google:gemini-3.1-pro-preview, aliases gemini, gemini-pro.
- GEMINI_3_FLASH → google:gemini-3-flash-preview, alias gemini-flash.
- gemini-flash, implying it should track latest Flash.
- src/common/utils/tokens/models.json probe found gemini-3-flash-preview, but not gemini-3.5-flash.
- src/common/constants/knownModels.test.ts will fail unless the new providerModelId exists in either models.json or models-extra.ts.
- gemini-3.5-flash-style ID: includes("gemini-3-flash") misses it, while generic includes("gemini-3") catches it as Pro-style.
Phase 0 — Verify exact provider facts before editing
- listModels call using a configured Google API key, if available.
- bun scripts/update_models.ts, or
- minimal, low, medium, high on the Google API side.
- off, low, medium, high, mapping off to Google minimal for Flash models that do not support true thinking-off.
Quality gate: record the exact source used for model ID, limits, pricing, and thinking levels in code comments near local metadata or provider mapping if official docs are incomplete/ambiguous.
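One way to carry out the listModels check in Phase 0 is sketched below, assuming the public Gemini API `models.list` endpoint (`/v1beta/models`, API key passed as the `key` query parameter, results under `models[].name` with a `models/` prefix). The helper names `buildListModelsUrl` and `findFlashCandidates` are hypothetical and not part of Mux; the model names in the sample response are placeholders, not verified IDs.

```typescript
// Minimal subset of the list response used by this sketch.
interface ListedModel {
  name: string; // e.g. "models/<model-id>"
}
interface ListModelsResponse {
  models?: ListedModel[];
  nextPageToken?: string;
}

// Builds the request URL; pass nextPageToken back in to read further pages.
function buildListModelsUrl(apiKey: string, pageToken?: string): string {
  const base = "https://generativelanguage.googleapis.com/v1beta/models";
  const params = [`key=${encodeURIComponent(apiKey)}`];
  if (pageToken) params.push(`pageToken=${encodeURIComponent(pageToken)}`);
  return `${base}?${params.join("&")}`;
}

// Strips the "models/" prefix and keeps IDs that mention "flash", sorted for stable output.
function findFlashCandidates(response: ListModelsResponse): string[] {
  return (response.models ?? [])
    .map((model) => model.name.replace(/^models\//, ""))
    .filter((id) => id.includes("flash"))
    .sort();
}

// Placeholder response for illustration only; real output must come from the API.
const sampleResponse: ListModelsResponse = {
  models: [
    { name: "models/gemini-3.5-flash" },
    { name: "models/gemini-3.1-pro-preview" },
    { name: "models/gemini-3-flash-preview" },
  ],
};
const flashCandidates = findFlashCandidates(sampleResponse);
```

The URL can be fetched with any HTTP client and the parsed JSON passed to `findFlashCandidates`; the exact ID recorded in metadata should be whatever the live response or the official developer docs report.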
Phase 1 — Repoint the curated model registry
Edit src/common/constants/knownModels.ts:
Keep the existing GEMINI_3_FLASH key by default for a minimal Option A diff. Add or update its comment to say it tracks the latest Flash tier. Only rename to GEMINI_35_FLASH if rg "GEMINI_3_FLASH" shows negligible references and the resulting diff is smaller/clearer.
Set providerModelId to the verified API ID, expected:
providerModelId: "gemini-3.5-flash"
Keep only the stable alias unless product explicitly wants version-specific slash aliases:
Users can still select the exact full model string with /model google:gemini-3.5-flash; avoiding a version alias minimizes future cleanup.
Keep tokenizer override unless ai-tokenizer has added a better exact tokenizer:
tokenizerOverride: "google/gemini-2.5-pro"
Quality gate: run bun test src/common/constants/knownModels.test.ts after metadata work; alias uniqueness and token metadata coverage should pass. Add a targeted alias assertion if not already covered by nearby tests: MODEL_ABBREVIATIONS["gemini-flash"] === "google:<verified-id>" or resolveModelAlias("gemini-flash") === "google:<verified-id>".
Phase 2 — Add or refresh token/capability metadata
Preferred path:
- bun scripts/update_models.ts before adding manual metadata.
- "gemini-3.5-flash", with complete pricing/context/capability fields.
- gemini/gemini-3.5-flash, knownModels.test.ts will still fail for a google: known model; add a bare-key fallback in models-extra.ts instead of relying on scoped-only metadata.
Fallback path if LiteLLM is not updated, creates broad unrelated churn, or lacks a bare key:
- src/common/utils/tokens/models-extra.ts keyed by the bare provider model ID, expected "gemini-3.5-flash".
  - max_input_tokens: 1048576
  - max_output_tokens: 65536
  - input_cost_per_token and output_cost_per_token from a verified pricing source
  - cache_read_input_token_cost only if the verified pricing source confirms context-cache pricing
  - litellm_provider: "vertex_ai-language-models"
  - mode: "chat"
  - supports_function_calling: true
  - supports_vision: true
  - supports_pdf_input: true
  - supports_reasoning: true
  - supports_response_schema: true
  - knowledge_cutoff: "2025-01"
- ModelData interface in models-extra.ts to include:
  - supports_audio_input?: boolean
  - supports_video_input?: boolean
Quality gate: add/adjust src/common/utils/tokens/modelStats.test.ts and src/common/utils/ai/modelCapabilities.test.ts only around behavior that matters: context size, nonzero pricing, and media support. Avoid tautological tests that only repeat static prose.
Phase 3 — Fix Gemini Flash thinking policy and provider mapping
Edit src/common/utils/thinking/policy.ts:
Replace literal substring detection for Flash with a narrow helper that matches only verified chat Flash IDs, for example:
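The snippet that originally followed is not preserved here, so the block below is a reconstruction sketch rather than the code that landed. The function name `isGeminiFlashThinkingLevelModelName` appears in the review thread; the base-ID list, the prefix stripping, and the suffix rule are assumptions chosen to match the behaviors the reviewers discuss (versioned and non-preview IDs accepted, flash-lite variants rejected, namespaced IDs handled).

```typescript
// Assumed base IDs for the Flash chat family; replace with the verified API IDs.
const FLASH_BASE_IDS = ["gemini-3.5-flash", "gemini-3-flash"];

function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
  // Drop provider prefixes such as "google:" or "mux-gateway:google/".
  const bare = (modelName.split(/[:/]/).pop() ?? "").toLowerCase();
  return FLASH_BASE_IDS.some((base) => {
    if (bare === base) return true;
    if (!bare.startsWith(base + "-")) return false;
    // Accept only "preview" or numeric version suffixes (e.g. "-preview", "-001").
    // Anything else (lite, image, TTS, ...) is treated as a different model family.
    const suffix = bare.slice(base.length + 1);
    return /^(preview|\d+)(-|$)/.test(suffix);
  });
}
```

Matching on an allowlisted suffix rather than a bare prefix keeps `gemini-3-flash-lite` and `gemini-3.5-flash-lite` out without needing a separate exclusion for each sibling family.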
Use the helper before the generic Gemini 3/3.1 Pro branch. Avoid a broad regex that accidentally treats gemini-3.1-flash-lite-preview, image, TTS, or other non-chat variants as the same model.
Return Mux levels for verified Flash chat models:
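A rough sketch of the branch ordering this step asks for, assuming the level sets quoted in the review (`["off", "low", "medium", "high"]` for Flash, `["low", "high"]` for the generic gemini-3 branch). `sketchGeminiThinkingPolicy` and `looksLikeFlashChatModel` are hypothetical stand-ins, not the real `getThinkingPolicyForModel` or the shared helper, and all non-Gemini branches are omitted.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";

const FLASH_LEVELS: readonly ThinkingLevel[] = ["off", "low", "medium", "high"];
const GEMINI_3_PRO_LEVELS: readonly ThinkingLevel[] = ["low", "high"];

// Simplified stand-in for the shared Flash detection helper.
function looksLikeFlashChatModel(name: string): boolean {
  return /^gemini-3(\.5)?-flash(-preview)?$/.test(name);
}

function sketchGeminiThinkingPolicy(modelName: string): readonly ThinkingLevel[] | undefined {
  const name = modelName.toLowerCase();
  // The Flash check must run first: "gemini-3.5-flash" also contains "gemini-3",
  // so the generic branch would otherwise hand it the Pro-style levels.
  if (looksLikeFlashChatModel(name)) return FLASH_LEVELS;
  if (name.includes("gemini-3")) return GEMINI_3_PRO_LEVELS;
  return undefined; // other model families are handled elsewhere
}
```

Swapping the two `if` statements reproduces the bug the plan describes, where a `gemini-3.5-flash`-style ID is caught by the generic `includes("gemini-3")` check.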
Keep Pro behavior separate. If current docs now say Gemini 3.1 Pro supports medium, decide whether to broaden Pro in a separate change; do not conflate that with Gemini 3.5 Flash support unless required by failing tests or verified product behavior.
Edit src/common/utils/ai/providerOptions.ts as a required part of this change:
Reuse the same Flash detection helper, or extract a tiny shared helper, so policy and provider option mapping cannot drift.
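For the provider side of this phase, the Flash mapping could be sketched as follows: Mux off becomes Google minimal without includeThoughts, low/medium/high pass through with includeThoughts: true, and xhigh/max are clamped to high. `sketchFlashThinkingConfig` and the type names are hypothetical; the object shapes follow the `{ thinkingConfig: { ... } }` examples in this plan, and the real providerOptions.ts may structure this differently.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";
type GoogleFlashThinkingLevel = "minimal" | "low" | "medium" | "high";

interface GoogleThinkingConfig {
  includeThoughts?: boolean;
  thinkingLevel: GoogleFlashThinkingLevel;
}

function sketchFlashThinkingConfig(level: ThinkingLevel): { thinkingConfig: GoogleThinkingConfig } {
  switch (level) {
    case "off":
      // Send "minimal" explicitly instead of omitting thinkingConfig,
      // so the API default does not silently apply.
      return { thinkingConfig: { thinkingLevel: "minimal" } };
    case "low":
    case "medium":
    case "high":
      return { thinkingConfig: { includeThoughts: true, thinkingLevel: level } };
  }
  // Policy should clamp xhigh/max before this point; map defensively to "high"
  // rather than throwing in the request path or sending an invalid value.
  return { thinkingConfig: { includeThoughts: true, thinkingLevel: "high" } };
}
```

Keeping the clamp as the fall-through return means a level added to `ThinkingLevel` later still produces a valid Google value instead of a runtime error.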
The current Google branch sends thinkingConfig.thinkingLevel for capModelName.includes("gemini-3"); gemini-3.5-flash should still enter that branch.
For verified Flash chat models, map Mux off to Google minimal and do not set includeThoughts for that lowest mode unless verified docs require it:
Do not rely on omitting thinkingConfig; Gemini 3.5 Flash may default to medium, which would make Mux off misleading.
For Flash low, medium, and high, pass through the level and keep includeThoughts: true:
If xhigh or max somehow reaches provider mapping despite policy enforcement, defensively map to high rather than throwing in the request path. Add a short comment that policy should clamp before provider options, but the provider adapter avoids sending invalid Google values.
Quality gate: extend src/common/utils/thinking/policy.test.ts and src/common/utils/ai/providerOptions.test.ts to prove:
- google:gemini-3.5-flash gets off/low/medium/high.
- mux-gateway:google/gemini-3.5-flash behaves the same.
- openrouter:google/gemini-3.5-flash behaves correctly if current normalization supports it.
- off maps to { thinkingConfig: { thinkingLevel: "minimal" } } without includeThoughts unless docs prove otherwise.
- medium maps to { thinkingConfig: { includeThoughts: true, thinkingLevel: "medium" } }.
- mappedToModel: "google:gemini-3.5-flash" uses Flash mapping for policy/provider options.
Phase 4 — Update docs and generated/model-adjacent outputs
- scripts/gen_docs.ts output so docs/config/models.mdx lists:
  - Gemini 3.5 Flash
  - google:<verified-id>
  - gemini-flash
- src/common/utils/ai/modelDisplay.test.ts case. The current generic Gemini formatter likely needs no production change, but a dotted-version expectation is cheap if touched nearby.
- KNOWN_MODELS.GEMINI_3_FLASH references only if the key is renamed. If the key is kept, no reference churn is expected.
Quality gate: do not hand-edit generated docs if an existing generation script owns the table; run the generator and keep only expected diffs.
Phase 5 — Validation
Run targeted tests first:
Then run broader checks:
If bun scripts/update_models.ts produces broad generated churn, inspect whether it is acceptable; if too broad, prefer models-extra.ts for this targeted launch support.
Phase 6 — Dogfooding plan
Because this is a model-selection/provider behavior change, dogfood in the desktop app with a configured Google provider.
Start Mux:
In Settings → Providers, confirm Google is configured and enabled.
Use the model selector and confirm:
- Gemini 3.5 Flash appears.
- gemini-flash resolves to google:<verified-id>.
- Gemini 3 Flash Preview is no longer the curated gemini-flash target.
Send smoke prompts at all Flash thinking levels:
- off / numeric 0
- low
- medium
- high
Use agent-browser to capture reviewer evidence:
- gemini-flash.
Multimodal smoke check if provider/API key allows it:
Acceptance criteria
- gemini-flash resolves to the verified Gemini 3.5 Flash Google model ID.
- models.json or models-extra.ts.
- off/low/medium/high.
- off to Google minimal for Gemini 3.5 Flash instead of accidentally using the API default, and omit includeThoughts for this lowest mode unless docs prove otherwise.
- low/medium/high through with includeThoughts: true.
Risks and mitigations
- off → minimal and absence of includeThoughts for the lowest mode unless docs require it.
- models.json refresh touches many unrelated entries or lacks a bare key, use models-extra.ts for a surgical release.
- google:gemini-3-flash-preview explicitly can still use it as a custom model; only the curated gemini-flash alias changes.
_Generated with [](https://github.com/coder/mux) • Model: \ • Thinking: \ • Cost: _