From 27167f9d5f2ee7db40f88042bec5fe00c7c71712 Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 16:20:22 +0530 Subject: [PATCH 1/7] blog(gemini-3-5-flash): add deep-dive benchmark and capability review --- .optimize-cache.json | 1 + .../gemini-3-5-flash-deep-dive/+page.markdoc | 177 ++++++++++++++++++ .../gemini-3-5-flash-deep-dive/cover.avif | Bin 0 -> 6921 bytes 3 files changed, 178 insertions(+) create mode 100644 src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc create mode 100644 static/images/blog/gemini-3-5-flash-deep-dive/cover.avif diff --git a/.optimize-cache.json b/.optimize-cache.json index ecaeb30e0c..578e19c11f 100644 --- a/.optimize-cache.json +++ b/.optimize-cache.json @@ -653,6 +653,7 @@ "static/images/blog/gdpr-mobile-apps-guide/1.png": "d3521c227ad9fa7fce40e66caa3e3f5fc982cf95086c590cc0a326031f6646d5", "static/images/blog/gdpr-mobile-apps-guide/cover.png": "11d53b8884d5ca45e7d9ba8fb904633795886d30cff6cc06a6af6b9fb7d1225f", "static/images/blog/gdpr.png": "e253390207e4d3e0ff28d3a4b94bee549aa6c8dc040bce604f5c6ff746dd9a1b", + "static/images/blog/gemini-3-5-flash-deep-dive/cover.png": "6b9257a7ba879bc37e7f81ca7d5c014e8760260dfd0d53d089016ed5c2f27f39", "static/images/blog/get-inspired-for-hackathon/1.png": "bdb21244945f4c483d23f84e5c429f548a45047a34d24d9c7f263cfca951ec3e", "static/images/blog/get-inspired-for-hackathon/2.png": "cec920ba9aa9996041e2b9134c52fdb09f91db307035334d458e47f6f116146c", "static/images/blog/get-inspired-for-hackathon/3.png": "87d6484adbe6049ab39bead992ffb57ab13bf1e1b3157b736f0bb5ad3ef1dde4", diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc new file mode 100644 index 0000000000..1b1c4ff4a8 --- /dev/null +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -0,0 +1,177 @@ +--- +layout: post +title: "Gemini 3.5 Flash: a detailed benchmark and capability review" +description: "A detailed look at 
Gemini 3.5 Flash: what shipped at Google I/O 2026, pricing, Google's own benchmark table, Artificial Analysis numbers, and how it scores on the Appwrite Arena benchmark." +date: 2026-05-20 +cover: /images/blog/gemini-3-5-flash-deep-dive/cover.avif +timeToRead: 11 +author: atharva +category: ai +unlisted: true +faqs: + - question: "What is Gemini 3.5 Flash?" + answer: "Gemini 3.5 Flash is Google DeepMind's mid-2026 Flash-tier reasoning model, built on the Gemini 3 Flash foundation with explicit thinking levels that trade quality for cost and latency. It accepts text, images, audio, video, and PDF input, outputs up to 64K text tokens, and has a 1M token context window." + - question: "Is Gemini 3.5 Flash better than Gemini 3.1 Pro?" + answer: "On the benchmarks Google publishes, 3.5 Flash beats 3.1 Pro on Terminal-Bench 2.1 (76.2% vs 70.3%), MCP Atlas (83.6% vs 78.2%), Finance Agent v2 (57.9% vs 43.0%), and GDPval-AA Elo (1656 vs 1314). It still trails 3.1 Pro on Humanity's Last Exam, ARC-AGI-2, and the 128K MRCR v2 long-context test, so it is not a clean replacement for the Pro tier." + - question: "How much does Gemini 3.5 Flash cost?" + answer: "API pricing is $1.50 per million input tokens and $9.00 per million output tokens, with a 90% discount on cached input ($0.15 per million tokens). It is free to use in the Gemini app and inside AI Mode in Google Search." + - question: "What is the context window for Gemini 3.5 Flash?" + answer: "1 million input tokens, with a 64K token output cap. The knowledge cutoff is January 2025." + - question: "Is Gemini 3.5 Flash multimodal?" + answer: "Yes. It accepts text, images, audio, video, and PDFs as input. Output is text only. Function calling, structured output, code execution, and search-as-a-tool are all supported." + - question: "Where can I use Gemini 3.5 Flash?" 
+ answer: "Through the Gemini app, the Gemini API, Google AI Studio, Gemini Enterprise, the Gemini Enterprise Agent Platform, Google AI Mode in Search, Google Antigravity, and Android Studio." +--- + +Gemini 3.5 Flash shipped on May 19, 2026 at Google I/O. Google positions it as "Pro-level reasoning at Flash-class latency," with the claim that a mid-tier model can carry agentic and coding workloads previously handled by the Pro tier. + +This post evaluates that claim against three data sources: Google's published model card, [Artificial Analysis](https://artificialanalysis.ai/models/gemini-3-5-flash), and [Appwrite Arena](https://arena.appwrite.io), an open-source benchmark covering 191 questions across nine Appwrite service categories. + +# Model overview + +Gemini 3.5 Flash is built on the Gemini 3 Flash reasoning foundation with explicit thinking levels that control quality, cost, and latency. The variant on the [Artificial Analysis leaderboard](https://artificialanalysis.ai/models/gemini-3-5-flash) and in most of Google's published numbers is the "high" thinking configuration. + +Model specifications: + +- **Inputs.** Text, images, audio, video, and PDFs, up to a 1M token context window. +- **Output.** Text only, with a 64K token output cap. +- **Knowledge cutoff.** January 2025. +- **Tooling.** Function calling, structured output, code execution, and search-as-a-tool are all first-party. +- **Distribution.** Gemini app, Gemini API, Google AI Studio, Gemini Enterprise, the Gemini Enterprise Agent Platform, Google Search AI Mode, Google Antigravity, and Android Studio. +- **Status.** Public preview at launch, free in the consumer Gemini app and Search AI Mode. + +# Pricing + +API pricing per million tokens: + +- **Input:** $1.50 +- **Output:** $9.00 +- **Cached input:** $0.15 (90% discount) + +How it compares: + +- **vs Gemini 3 Flash** ($0.50 / $3.00): 3x more on both input and output. +- **vs Gemini 3.1 Pro** ($2.00 / $12.00): 25% cheaper on both input and output.
+- **Within the Flash tier:** the most expensive Flash-tier model Google has released. + +# Google's published benchmark table + +The model card lists head-to-head numbers against Gemini 3 Flash, Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT-5.5. The full table: + +| Category | Benchmark | Gemini 3.5 Flash | Gemini 3 Flash | Gemini 3.1 Pro | Claude Sonnet 4.6 | Claude Opus 4.7 | GPT-5.5 | +| -------- | --------- | ---------------- | -------------- | -------------- | ----------------- | --------------- | ------- | +| Coding | Terminal-bench 2.1 (Terminus-2 harness) | **76.2%** | 58.0% | 70.3% | n/a | 66.1% | 78.2% | +| Coding | SWE-Bench Pro (Public, single attempt) | 55.1% | 49.6% | 54.2% | n/a | **64.3%** | 58.6% | +| Agentic | MCP Atlas (multi-step MCP workflows) | **83.6%** | 62.0% | 78.2% | 69.5% | 79.1% | 75.3% | +| Agentic | Toolathlon (real-world tool use) | **56.5%** | 49.4% | n/a | n/a | n/a | 55.6% | +| UI Control | OSWorld-Verified | 78.4% | 65.1% | 76.2% | 72.5% | 78.0% | **78.7%** | +| Expert tasks | Finance Agent v2 | **57.9%** | 42.6% | 43.0% | 51.0% | 51.5% | 51.8% | +| Expert tasks | GDPval-AA (Elo) | 1656 | 1204 | 1314 | 1676 | 1753 | **1769** | +| Multimodal | CharXiv Reasoning (no tools) | **84.2%** | 80.3% | 83.3% | 72.4% | 82.1% | 84.1% | +| Multimodal | MMMU-Pro (no tools) | **83.6%** | 81.2% | 80.5% | 74.5% | 75.2% | 81.2% | +| Multimodal | Blueprint-Bench 2 (normalized) | 33.6% | 0.0% | 26.5% | 6.7% | 24.5% | **36.2%** | +| Long context | MRCR v2 (8-needle, 128k average) | 77.3% | 67.2% | 84.9% | 84.9% | 59.3% | **94.8%** | +| Long context | MRCR v2 (1M, pointwise) | **26.6%** | 22.1% | 26.3% | n/a | n/a | n/a | +| Reasoning | Humanity's Last Exam (full set) | 40.2% | 33.7% | 44.4% | 33.2% | **46.9%** | 41.4% | +| Reasoning | ARC-AGI-2 | 72.1% | 33.6% | 77.1% | 58.3% | 75.8% | **84.6%** | + +Gemini 3.5 Flash leads Pro-class models on agentic tasks (MCP Atlas, Toolathlon, Finance Agent v2) and on multimodal reasoning (CharXiv, 
MMMU-Pro). It trails on academic reasoning (Humanity's Last Exam, ARC-AGI-2). For coding, results sit between 3.1 Pro and GPT-5.5 depending on the benchmark. + +The largest gain is MCP Atlas: a 21.6 point increase over Gemini 3 Flash and 4.5 points over 3.1 Pro. On MCP tool-call workloads, 3.5 Flash is Google's strongest model in the Gemini 3 series. + +# Artificial Analysis + +[Artificial Analysis](https://artificialanalysis.ai/models/gemini-3-5-flash) runs an independent evaluation suite and ranks models by Intelligence Index, a composite of 10 evaluations: GDPval-AA, 𝜏²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt. + +Gemini 3.5 Flash on Artificial Analysis: + +- **Intelligence Index: 55** (rank #7 of 147). Top three: GPT-5.5 (xhigh) at 60.2, GPT-5.5 (high) at 58.9, Claude Opus 4.7 (max) at 57.3. +- **Speed: 278 output tokens per second** (rank #2 of 147). Faster: gpt-oss-120b (high) at 246. Slower frontier-class models: Gemini 3.1 Pro Preview at 123, GPT-5.5 (xhigh) at 64, Claude Opus 4.7 (max) at 50. +- **Verbosity: 73M tokens** generated across the Intelligence Index suite, against a leaderboard average of 36M. Verbosity counts how many output tokens the model produced to complete the eval suite. Higher means the model spent more reasoning tokens per answer, which raises latency and bill size even when the per-token price is low. +- **Cost to evaluate the Intelligence Index: $1,551.60.** That is 5.5x Gemini 3 Flash and 75% more than Gemini 3.1 Pro despite the lower per-token rate. This is the total dollar cost to run the full Intelligence Index once, combining per-token pricing and token volume. It serves as a proxy for what the model costs on heavy reasoning workloads in production. +- **Hallucination rate: 61%** on the AA hallucination measure, 31 points lower than Gemini 3 Flash. 
The hallucination measure is the share of responses on a fabrication-probing prompt set where the model produces incorrect or invented content. Lower is better, and a 31-point drop versus the predecessor indicates a material gain in factual reliability. + +On the intelligence-versus-speed axis, Artificial Analysis ranks Gemini 3.5 Flash as the Pareto leader. No model in the same intelligence bracket runs near 278 tokens per second. + +## Intelligence per token against SOTA peers + +Per-model summaries from Artificial Analysis: + +| Model | AA Intelligence Index | Output tokens (full Index) | Total eval cost | Speed (tok/s) | Input $/Mtok | Output $/Mtok | +| ----- | --------------------: | -------------------------: | --------------: | ------------: | -----------: | ------------: | +| GPT-5.5 (xhigh) | 60.2 | 75M | $3,357 | 65 | $5.00 | $30.00 | +| Claude Opus 4.7 (max) | 57.3 | 110M | $5,117 | 50 | $6.25 | $25.00 | +| Gemini 3.1 Pro Preview | 57.2 | 57M | $892 | 123 | $2.00 | $12.00 | +| **Gemini 3.5 Flash (high)** | **55.3** | **73M** | **$1,552** | **278** | **$1.50** | **$9.00** | +| Kimi K2.6 | 53.9 | 170M | $948 | 98 | $0.95 | $4.00 | + +Two points are worth calling out. + +**GPT-5.5 is more intelligent on a similar token budget.** GPT-5.5 (xhigh) generates 75M tokens for the full Intelligence Index against 3.5 Flash's 73M, a 3% difference. For roughly the same output token count, GPT-5.5 scores 60.2 versus 55.3. The reason GPT-5.5's eval cost lands at $3,357 against 3.5 Flash's $1,552 is per-token pricing ($5/$30 vs $1.50/$9), not token efficiency. On quality per token, GPT-5.5 leads. + +**Gemini 3.1 Pro is the sharper internal comparison.** 3.1 Pro Preview generates 57M tokens, 22% fewer than 3.5 Flash, and scores 57.2 on the Intelligence Index, 1.9 points higher. Total eval cost is $892, 42% lower than 3.5 Flash. 3.5 Flash leads only on speed (278 tokens per second versus 3.1 Pro's 123) and per-token price ($1.50 / $9.00 versus $2.00 / $12.00).
Google's "Pro-level reasoning at Flash-class latency" claim holds on latency. On the Intelligence Index itself, 3.5 Flash is the second-best Gemini and uses more tokens than 3.1 Pro to reach a lower score. + +# Appwrite Arena: backend SDK and API performance + +Public leaderboards measure general capability, not whether a model can drive an SDK without hallucinating method names. [Appwrite Arena](https://arena.appwrite.io) is an open-source benchmark covering 191 questions across nine Appwrite service categories: Foundation, Auth, Databases, Functions, Storage, Sites, Messaging, Realtime, and CLI. Each model is evaluated twice: once with the relevant [Appwrite Skill](/docs/tooling/ai/skills) loaded into context, and once without. Results are published on [GitHub](https://github.com/appwrite/arena). + +Top finishers on the May 20, 2026 run: + +**With Skills loaded (Skill files in context, 191 questions):** + +| Model | Overall | MCQ | Freeform | Cost (USD) | Duration | +| ----- | ------: | --: | -------: | ---------: | -------: | +| GPT 5.5 | 97.70 | 98.20 | 94.80 | $4.51 | 33m | +| Claude Opus 4.7 | 97.10 | 97.60 | 94.20 | $3.07 | 53m | +| Qwen 3.6 Plus | 96.50 | 97.60 | 89.80 | $0.58 | 54m | +| Kimi K2.6 | 96.30 | 97.00 | 91.90 | $1.64 | 135m | +| **Gemini 3.5 Flash** | **96.20** | **96.90** | **91.90** | **$3.78** | **20m** | +| DeepSeek V4 Flash | 96.10 | 96.40 | 94.20 | $0.37 | 125m | +| Gemini 3.1 Pro (Preview) | 92.70 | 93.30 | 88.80 | $4.44 | 45m | +| Gemini 3.1 Flash Lite (Preview) | 88.30 | 89.70 | 79.40 | $0.59 | 19m | + +**Without Skills (model's built-in knowledge only):** + +| Model | Overall | MCQ | Freeform | Cost (USD) | Duration | +| ----- | ------: | --: | -------: | ---------: | -------: | +| Claude Opus 4.7 | 96.20 | 96.40 | 94.80 | $1.89 | 25m | +| GPT 5.5 | 94.20 | 94.50 | 90.00 | $2.19 | 27m | +| Kimi K2.6 | 93.60 | 95.20 | 83.50 | $0.48 | 103m | +| Gemini 3.1 Pro (Preview) | 92.50 | 95.30 | 76.90 | $1.34 | 26m | +| **Gemini 3.5 Flash** | 
**90.70** | **92.90** | **77.50** | **$1.14** | **13m** | +| GLM 5.1 | 90.20 | 91.50 | 81.90 | $0.30 | 45m | + +Three observations from the Arena data. + +**It is the fastest model in the top tier.** 20 minutes with Skills and 13 minutes without is faster than every model scoring above 90. The next-fastest top finisher is Gemini 3.1 Flash Lite at 19 minutes with Skills, but it scores 7.9 points lower. + +**Skills materially improve the freeform score.** Without Skills, freeform scores 77.5%. With Skills, freeform reaches 91.9%, a 14.4-point increase. The same delta for GPT 5.5 is 4.8 points (94.8 to 90.0), and for Claude Opus 4.7 is 0.6 points (94.8 to 94.2). 3.5 Flash relies more on in-context documentation than its frontier peers, consistent with the January 2025 knowledge cutoff. + +**Category profile.** With Skills, 3.5 Flash scores 100% on Messaging, MCQ Foundation, MCQ Auth, MCQ Functions, and MCQ Sites, and 94.1% on Realtime. The weakest categories are TablesDB (89.1% with Skills, 77.8% without) and CLI (95.0% with Skills, 73.3% without). Both require the most current API surface, which the knowledge cutoff does not cover. + +# Workloads where 3.5 Flash is the right choice + +- **MCP-driven agents.** MCP Atlas at 83.6% is the highest result Google has published on the benchmark. For agents driving an MCP server such as [Appwrite's API MCP](/docs/tooling/ai/mcp-servers/api), 3.5 Flash is the most cost-efficient frontier option. +- **Throughput-bound multimodal pipelines.** CharXiv at 84.2% and MMMU-Pro at 83.6% at 278 tokens per second is a combination no other top-ten Intelligence Index model provides. Document ingestion with charts, audio and video reasoning, and pipelines with many small multimodal calls benefit directly. +- **Iterative coding agents on bounded scope.** Terminal-Bench 2.1 at 76.2%, a 1M context window, and the highest throughput in the top ten allow more iterations per wall-clock minute than any frontier alternative. 
+The reasoning gap to Opus 4.7 and GPT-5.5, visible on Humanity's Last Exam, ARC-AGI-2, and SWE-Bench Pro, matters most on harder, open-ended tasks. + +# Model selection for Appwrite projects + +[Appwrite](https://cloud.appwrite.io) provides the primitives an agent needs to operate on a project: typed tables, scoped API keys, an [API MCP server](/docs/tooling/ai/mcp-servers/api), a [Docs MCP server](/docs/tooling/ai/mcp-servers/docs), and [Agent Skills](/docs/tooling/ai/skills) for every major SDK. The Arena results above show how each model performs against this surface. + +Speed is the column where Gemini 3.5 Flash dominates, but speed is not coding intelligence. On Arena freeform answers, GPT 5.5 and Claude Opus 4.7 lead 3.5 Flash by a meaningful margin: 94.8 and 94.2 versus 91.9 with Skills, and 90.0 and 94.8 versus 77.5 without. Both also score higher on the Artificial Analysis Intelligence Index (GPT-5.5 xhigh at 60.2 and Opus 4.7 max at 57.3, versus 55.3). + +Two recommended defaults: + +1. For interactive workloads where a developer waits on the response, **Gemini 3.5 Flash with the Appwrite Skill loaded** is the fastest top-tier option. Use it when iteration speed beats per-response correctness. +2. For coding work where correctness matters more than wall-clock latency, choose **GPT 5.5 or Claude Opus 4.7**. Both score higher than 3.5 Flash on the same Appwrite tasks, with or without Skills loaded. + +For other cases, optimize on the price-to-throughput frontier, where 3.5 Flash sits. + +# Next steps + +Select Gemini 3.5 Flash inside a tool that supports it: Cursor, Google AI Studio, Google Antigravity, or the Gemini API directly. To connect Appwrite to the model, follow the [Cursor plugin docs](/docs/tooling/ai/ai-dev-tools/cursor) for Cursor, or the [Antigravity MCP setup docs](/docs/tooling/ai/ai-dev-tools/antigravity) for Antigravity. Both walk through adding the Appwrite API MCP and Docs MCP servers so the model can act on your project.
+ +- [Appwrite Arena](https://arena.appwrite.io) +- [Gemini 3.5 Flash model card](https://deepmind.google/models/model-cards/gemini-3-5-flash/) +- [Artificial Analysis: Gemini 3.5 Flash](https://artificialanalysis.ai/models/gemini-3-5-flash) diff --git a/static/images/blog/gemini-3-5-flash-deep-dive/cover.avif b/static/images/blog/gemini-3-5-flash-deep-dive/cover.avif new file mode 100644 index 0000000000000000000000000000000000000000..d54a1b14f17cdcc541bcac93d92a72bc25e5da12 GIT binary patch literal 6921 zcmZu$V{~5K){Si^jh)75j3>4l+cq0Ec7w*Y)u?eAJ9%Q;w)uSN``$b5pKpzE7Uo=Y z?6daywa*3v10ytd^>j3Hw=xI&Fgq)A*1x%(mC;8H?r3M`YV;TXFfvP1TjzgMFfd0e z6W9Od|BWb)R<3sc6yP78!^+mdZ z+5htz|FGB}h1oxCA8I3a7S_Lt|A8OOf5Bf?j?RuBWTcg&iNi-kGID1X4WRl*i)H2F zX!chL1_u4{Kmb1+nxmD6)n5b>4i4@k{M!#r_!s(^;$IH@FT-u*<{|nQc5}ApwR1GH z_{WLxnpl}Q^167sxSH9y@P71I*;*Mnd-59DI$Hj-Uqn`pcK;+F(vRsO10W#5AfSRF zphG_Z0xM^;f5-e-?n7+*&jx)AiSKJ?YUGLmhJe6hZSj`T13`dH$`-;VDb4_fCWfX2 z7687!MkLs0K#|cA*wy(>U~!yfHK+4i70Iw-q|PvSEy@_gfM};D!O)D^(&QnjU&TU7 zNF2V_=FRZP>{?O&XNowtxOse;V`G?JM!N`)zNGo1bM>kpA*<`R%w3lB5V%HH`|4k* zIYTvf*KV?lP-gBZxOlX+nar{MyRyBgXawpbE{1VoP3(!q1s*zve1PdVAX{wWZq9Uv zCJuOM))s(3B0JLCak%;uneDW-{ zgt9T{GkKmavjNvWyvkABqc1m=*HAp&5KhB*UB`F!S6GYYRI4HlCQZD9nSrJFd^TZ~Nz23wc z`rsWVTGS{0Yjv5Gi9?7Mx-UNBggjV2E~7c8Ix#LPdNzUb1=3y2p?jew)xp$k9^!Bs zch^xH2Bh-VZlondi!lXt(q)kMt^+;nK8!em++YJn+GcFKIP4xFqHcbAx{{a1bJL~Q zKNZItYe-`#?@7C-3=IKd>6$k`w1S*=KFd!U6_EO}tFq-|Wb>L4Y`_39?L7|c`PjNh z8sHrrKAYs-S1B6XE{o26DI-=EW|EaRy^iwZFbt2o8xH>Wbr zoFf2XWSo?zx7PQV2xoPYUJ+5!jr|txZKQWl z5Hub&{DAgl2SCzKK-jW)Qk%gTf2L0kf)2dr zBH$q{cKv9CXVt;*uC+F(%-Xv+Pjkl&EKP1I< z(jF=NZFMY76xQtY8U=IQ2RHGOR`&vxp}*)UI{j60%M$uwXrRU>iS|py3#y#UtBL;8 z;i9Z6eT4`$Lw$hXZ=PzjKjNF_L)YsHU0n=D;x;iZCNI@z%lTJKeqS=qi=+{gca>J_wl}rh{&78jn~Nx%6T#b-nnA0WsJz? 
z1P(FhFdu7>c_vsVKIk_3MHqRiulB37Aad&8;AsW>_3G>Bua`+r6K}J+f9cEQ`g`?2L z+~-QRl138|)v35=*27=@xFSpuC|EVtHFd7*93%>*D_UE|uH9A0mYUFE+k=Os#}I3b z>T8C>9osyEwb7y+1UqP%VHZ%AH^L2R7rPS%}=OnJp`iR~xsItZKL-pFWN5OKPB!|+8T35nO zRzMw*-V}sXxz|ppo^Ow2U4~eIkP0b-*Sf3y%Pns}roXxI>tH$4q0kWS)Ohu`>@@Fe zsG9(^IXFD(aSFvB_O7gYbfdUB-oqnmV+L;S?MB|)0bPMbyw^kJTmmqiF6p)-3`yv2 z3+8^e5^#+ys*WsfU;L8%2$*?(wcZNW=6caA-Dy}tN7nM?hg#m0%!QFPI36bwOK!@_ z@DC~|HDbSi*MZOjaoMmA*JY!qrMexPt=zWiMWoj!-{fGf8b@V_%ckkVx!BakdN@M$ z8w&V*(pF8!>PwIEUKXG=*~y3MOAB<8I_eb?o?DJ~N;Ww+Mh;pwRg|IhYWAra1*`1? z((vy$$c8GX?$b7zR?FT6t#Mc#3-4#x{0gEsCxx~&6(wAOeLg)9_qru#rEeB~?ljfK z4xvAk0a>`aLGcv{X$$^>s)d`%G&5P#wgx(4N%{81V64GwU+zj7??fYIOaQzDFMN%un5T@q3Dk z)OF}=zZt`_N~d#_sGLMDTO6(1qfccpcD+9HSb(ed9NQ2?yJsBKO zQX85mQ_66xDa}!YBJ=^_X_# zQjq4N=jvBU*xIM=v&+VKP7&j$>u*JecG+PxUCC?~+zgb>>&!vyVy!Z#H3-z9G(ROs zzcBJj4DXFn>5okDWRjw5nP6;+6|=)HNqvgyi+rz=nxKisA(~5JNgg09-}|C5L<67% zQcjCxX@-=eX!0S8#{FE4G0jE^3S?tm6;9Bd_Nl*f*jraF`jSmP;H~>$-5h4sGFggw zl9e(shw#&$D<|<_llp>2 zKQWP*WS6oveUORdPUYdHRFd`);2?1ex~6Dg=d$s z%WR)%{&}qLMJMRwcNOzW-vOX1IRt@p9Ldj+Ge(;W3Zl=8W z*{)C+*<&&H9B^{7P&TyV3&y%p)rh?0#|>DRwnYR$m?B7kttQ|Lh}Ghk4kbQEAM7^I zMy22Xh9-W;^bnGupyAoc*<&DFiTkaHw4NS%J&h~dJVv>zjqBsYG0fY$r9Zhq10 zJ+{rT@?2|0x-0F1K8M@!@f0Y&L9YqyszVR#PEJ`z>IjqW@Xy3TKfuln!EFe1V1T-9 zwUjYBL-i9!N!EsQJv_%P!m8hZz=wLpl~?w*|B!^dO7$!yv0r`0CDz8 zw4m79V9>O4YLX*)b?e?J*)ulMJ%h(sCnxEv2E0G*Egg{W=fiO+*fcyjeN(r>MHqQU zR@Qs^j^j!?icS|th&V}|$euVcV;N@U?ppsR^s@=YGULyq5go+dPqr`Api?bKX>^gb zjbrD+_cHO-MLtYMph)Zen2JaUYfQ?fif6>wls(RnIsn54$zmJp^uR`&I;~P}fS8&+ z+%)N@F?Ft;s09r&; z$l4ksbFpO2I6=G~N~v1+OB?8YUB+d)VyVFX@FZ?k245*c8VtT4uaJ?si zv2f@DtA5ZwPFqMhXQ$HmCwW7q;)7UP&WhV zJV2=%DcS5_5m*X2o;38iqhrZ4S!)vVA`p>-VQpQ%?_f6DUQgEM&=XM|m5HwaaP{gd z;yvR-w906^MoHWxV(zx;7+_}_Dh$#FN%(;VR7cz)uf#VH#d$b0TovcyIA@G(ceI~` z7tRtp!lFTDkzsSsEG%?Od9SNB5J^7YkuvdV3~9ETv29Ma!|R$Qe#I4{AtmljeM#FE^ky=ZBRjMXqeC@?Iv95_LZKMy$sya8sYFLCLSX=DP z3hBgndZm8-^zCpZq$F!XC*y%&j~>}r(%6-j5p(C0as#A&G`u<6neAx*O?^}P*eAh^ 
z2ZSq`v}}lxHYwr(ISXH3zvjJ9YXQud!cy)kWBMqn5GHZ_2@?ztyih8zin&k)-`T`h z5h=4-h%e$yCT|73O}2wVnB*48_Mxam7p}lAs)S~e=*p?Z-6r^hM!<7T-2f3XC{Y$C z`LzeXtrgf^jfVn`U*fDdT#dqTa%SC^{xlnLrttdNz>M^_rAK(zq?`}-rEPtU4TDrZ zhdHWwBpr`U&L1Xe4-vO+OY~1Q*V6T=!L~G`^ydbD%(GuYA;Z5qacd`cD0&Ep53XK! zKUuv$U~WCbxKQxe-e6#UD}Ex1_Odsu$~Uqkbi}54j)Zb@UXmbI9@SfQe_JB4}+w)s-XB;Y@_3$^!`-a9+=Vf*tP@RgjA!)E%nSN3D^UAtb0`kHwEIy@CYn zzgF80!Lmf@)9E-9c|gQ8#F8Ftml+674YiiwxErrRu8pI0`W8+iT{NPltuGJSylXsE z;TV3Wd95=Bkok#JZVs#E<}$5suqv`&Gc*ouH$zF%@v)%p2=Fyvw0a1;hsj|LO3=Ei z6b?6wyeyn031@W;B4RFa+J*eGh)N;X>|uqn)N|oWoiwqBc{)R{T*6?Q1ALy5l9_zw zm`EgrXY>@7;z;12^puc40^jCFh?BNJle}TX!0FuyaeUgQ+A@aDb zAX(K7HXV9A=V0^JjI1_R&j21Wia6}9#E&`iC*UsTFh=C6j%S@clQ-}je?sQU7wBgj zuEl;{G(?jl+Q;s@CX%UK=+aC7Kr@D6$sLFW`i*;ix4^tTTF0@m02`{YXQ7IIF>%?( zBR?M=Qi^W`qFBQ*9k)wck3KGk{lXZ~I;Zmpvw}?>hgyv58B-u(yg>twUdqjnRdLtI z;f0ZPEP$${hbeNV}3sf!P-b z$8V9RYsb$)#kpIV5v!tO;@~_W;?fd{D8}b$*-YI_)q?cnTTZZzt5i^fMQ@{0F_Y5s zcBU?|Za*Nq)bU4{(5{bMCqcn*N^VqMxMAX`I<8YmVQ}Af2@y_(rg6K7`z@GR9^s|; zmhap?PcrjEzjo#VqZKO@!VxPj&0pHG)Dfbg&Bx{{t=d9?GX0Uz$7@V|Zz>&dYuAhf zp-ho&y1@w(1;5BzKwE1I$(058M|S9-YY~|3_sl*PApC1PIljP*mai%&B9(+ugMax* zb&@|#sM;gERy8`dp8Ec9t2-eY#>0V}9p+mwz7#pCzr;hG4~CG!wf-6h;Ee!OCM@K9 zR_yXdjLq(G6HeB2biq!fQ$*9K?sp1m%M*^zaGxWGv2F*! 
zjMpq3+j)H^&Mtc{@JgqGB5$~LJ-W3N!6Zp8IpzAlwlI=K>?rJCDNzt`G(AFzsahH&10I|t#(Uz z#rn3T*)i>*y881NIln3GC766ouHaGkD-Sf!9FeNKtwNl2+vWDN>}q(oMvLlio*6#@ z8~F+3u}Z^m)1Pq&hR>=l?Yw;dDU=e~M z2n6cWDHo#HHa2mZm&jT?8DJ}{!hvsKNZJ}uf^gzw(prEPP=?b`tZeSddiY5&(W)M2NW|r@^TVvWmmQlM^}PT zU+Zg%>G$D_$_K)>tiGBNB1*9*NnoR>lk-Wx9o>dj<>U+RFx3d`x-vY!eyS$zyLWgmNpa%u$4OEiM zV?$|Oa;`deEha=W`F&D#Nu+FC0zAqp>_O^Hpse-5YxSFNC2r>X z%)lkFM~_P9!jl#)%5t#`Tx{qf1$&Y_I{w)ZO$BCjVANAFcgEZuCo-p6a9L4O_Zzy6 z2AT}@k*b3AMGtA~|Um|0CL=L;YM;yDwiI0I7@8YF&W;i5oE}s~e zaLUs*##CJjp-mtRF6?dIlvuocg(Xnxz?jkcFi4s(n3S1xc&wOJnld;@w7b35g{tZ5xldmw|ZZX?o#KU=VcMJt1SO=1ZU$8NY99jSY;Q|{< zMu$T)iWEo>X;Xchr<@O;T1AVyn;tRyC99$QQwXa;knCphqe1UaqxlC12x%MZ7Kv>f zZwrw1wc&OzyGp+8mS_i?t}cbgj~}rRZ;Evq1HVsoBMVXaxID(63*oPb+~!#i{rQ{j`XArRwxml3vGL^ChS)IUpR3kscva5;-rcXDHV<4o3JCO{GYO|*;NrPpw# z`29+(u|>9~LYI+Bo9y9yi({vK&Sie5T0i_`p*L+rv2UFtRysoHOD|jA>o@TN*e);@B@-6BqWBFKQY{MZUo2{PcN{ znD|s7nHZHZ4R$YmBhc*Dhh#ZrM+gJr3IV1%5xYkibCK@~gbhhby58LNvvvt zZ9*QObwyAMH}X9%R%~9=WsB!zHd-G#^t=1Ir74~|adtrDbUF}&gO2lg!ndzN+O6Xo z#ly|hpX;{yKdBh(w!ixkVBkw#qdndx2(DzAI}INz>@Z%8IyD2S#_A_%TWs=4f*SV& zwP5Vf-2yTPy%U7qQ8+Z^=eYFkScEim6BVI;^M%3>aF|n zaGb|;tTy|TT{>OIyI|XYe$XZXqN;|_s0f=FIViQi^{X(_a+@6i#CjJ5^IEg4<&d?j zl`{%Ko)w1t#Isb`QD_~LD}qZ7mRH4TBJnAv6D{IT>nXYbdz2TX8Dy86`BtI>q4rp>*H1g7XP1H nkzBT~L|`>wU`k+c7(AF$H;B_${{LSF^Dn#k|8f7r{?qh-aE|rt literal 0 HcmV?d00001 From 615c8bef14b9015681145b75b56af5eda732cd9c Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 16:27:02 +0530 Subject: [PATCH 2/7] blog(gemini-3-5-flash): fix contradictory speed comparison Gemini 3.5 Flash is the fastest frontier-class peer at 278 tok/s; gpt-oss-120b (high) at 246 is the next closest, not faster. 
--- src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc index 1b1c4ff4a8..b3180599b1 100644 --- a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -86,7 +86,7 @@ The largest gain is MCP Atlas: a 21.6 point increase over Gemini 3 Flash and 4.5 Gemini 3.5 Flash on Artificial Analysis: - **Intelligence Index: 55** (rank #7 of 147). Top three: GPT-5.5 (xhigh) at 60.2, GPT-5.5 (high) at 58.9, Claude Opus 4.7 (max) at 57.3. -- **Speed: 278 output tokens per second** (rank #2 of 147). Faster: gpt-oss-120b (high) at 246. Slower frontier-class models: Gemini 3.1 Pro Preview at 123, GPT-5.5 (xhigh) at 64, Claude Opus 4.7 (max) at 50. +- **Speed: 278 output tokens per second** (rank #2 of 147 in its AA price class). The closest frontier peer is gpt-oss-120b (high) at 246. Other frontier-class models are well behind: Gemini 3.1 Pro Preview at 123, GPT-5.5 (xhigh) at 64, Claude Opus 4.7 (max) at 50. - **Verbosity: 73M tokens** generated across the Intelligence Index suite, against a leaderboard average of 36M. Verbosity counts how many output tokens the model produced to complete the eval suite. Higher means the model spent more reasoning tokens per answer, which raises latency and bill size even when the per-token price is low. - **Cost to evaluate the Intelligence Index: $1,551.60.** That is 5.5x Gemini 3 Flash and 75% more than Gemini 3.1 Pro despite the lower per-token rate. This is the total dollar cost to run the full Intelligence Index once, combining per-token pricing and token volume. It serves as a proxy for what the model costs on heavy reasoning workloads in production. - **Hallucination rate: 61%** on the AA hallucination measure, 31 points lower than Gemini 3 Flash. 
The hallucination measure is the share of responses on a fabrication-probing prompt set where the model produces incorrect or invented content. Lower is better, and a 31-point drop versus the predecessor indicates a material gain in factual reliability. From 2528c533fc1196d2592a28928ccd8a2a3689b324 Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 16:38:09 +0530 Subject: [PATCH 3/7] blog(gemini-3-5-flash): fix Skills delta direction and sign GPT 5.5 parenthetical reversed (90.0 to 94.8 = +4.8). Claude Opus 4.7 delta is -0.6, not +0.6: Skills reduced its freeform score from 94.8 to 94.2. --- src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc index b3180599b1..5be0524f5d 100644 --- a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -145,7 +145,7 @@ Three observations from the Arena data. **It is the fastest model in the top tier.** 20 minutes with Skills and 13 minutes without is faster than every model scoring above 90. The next-fastest top finisher is Gemini 3.1 Flash Lite at 19 minutes with Skills, but it scores 7.9 points lower. -**Skills materially improve the freeform score.** Without Skills, freeform scores 77.5%. With Skills, freeform reaches 91.9%, a 14.4-point increase. The same delta for GPT 5.5 is 4.8 points (94.8 to 90.0), and for Claude Opus 4.7 is 0.6 points (94.8 to 94.2). 3.5 Flash relies more on in-context documentation than its frontier peers, consistent with the January 2025 knowledge cutoff. +**Skills materially improve the freeform score.** Without Skills, freeform scores 77.5%. With Skills, freeform reaches 91.9%, a 14.4-point increase. 
The same delta for GPT 5.5 is +4.8 points (90.0 to 94.8), and for Claude Opus 4.7 is −0.6 points (94.8 to 94.2), where Skills slightly lowered the score because the model's built-in Appwrite knowledge is already near the ceiling. 3.5 Flash relies more on in-context documentation than its frontier peers, consistent with the January 2025 knowledge cutoff. **Category profile.** With Skills, 3.5 Flash scores 100% on Messaging, MCQ Foundation, MCQ Auth, MCQ Functions, and MCQ Sites, and 94.1% on Realtime. The weakest categories are TablesDB (89.1% with Skills, 77.8% without) and CLI (95.0% with Skills, 73.3% without). Both require the most current API surface, which the knowledge cutoff does not cover. From 78cd3a72b9038bcec9a0fab45e2212d84643b2a6 Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 16:45:26 +0530 Subject: [PATCH 4/7] blog(gemini-3-5-flash): fix arithmetic and consistency errors - MCP Atlas margin over 3.1 Pro: 5.4 points, not 4.5 (83.6 - 78.2). - GPT-5.5 (xhigh) speed: 65 tok/s, matching the SOTA table and AA summary. - Realtime score qualified as MCQ Realtime (94.1%); overall is 94.0. - Reframed Flash Lite reference: it scores 88.3, below the 90-point top tier. --- .../blog/post/gemini-3-5-flash-deep-dive/+page.markdoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc index 5be0524f5d..e5dd34b66c 100644 --- a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -77,7 +77,7 @@ The model card lists head-to-head numbers against Gemini 3 Flash, Gemini 3.1 Pro Gemini 3.5 Flash leads Pro-class models on agentic tasks (MCP Atlas, Toolathlon, Finance Agent v2) and on multimodal reasoning (CharXiv, MMMU-Pro). It trails on academic reasoning (Humanity's Last Exam, ARC-AGI-2). 
For coding, results sit between 3.1 Pro and GPT-5.5 depending on the benchmark. -The largest gain is MCP Atlas: a 21.6 point increase over Gemini 3 Flash and 4.5 points over 3.1 Pro. On MCP tool-call workloads, 3.5 Flash is Google's strongest model in the Gemini 3 series. +The largest gain is MCP Atlas: a 21.6 point increase over Gemini 3 Flash and 5.4 points over 3.1 Pro. On MCP tool-call workloads, 3.5 Flash is Google's strongest model in the Gemini 3 series. # Artificial Analysis @@ -86,7 +86,7 @@ The largest gain is MCP Atlas: a 21.6 point increase over Gemini 3 Flash and 4.5 Gemini 3.5 Flash on Artificial Analysis: - **Intelligence Index: 55** (rank #7 of 147). Top three: GPT-5.5 (xhigh) at 60.2, GPT-5.5 (high) at 58.9, Claude Opus 4.7 (max) at 57.3. -- **Speed: 278 output tokens per second** (rank #2 of 147 in its AA price class). The closest frontier peer is gpt-oss-120b (high) at 246. Other frontier-class models are well behind: Gemini 3.1 Pro Preview at 123, GPT-5.5 (xhigh) at 64, Claude Opus 4.7 (max) at 50. +- **Speed: 278 output tokens per second** (rank #2 of 147 in its AA price class). The closest frontier peer is gpt-oss-120b (high) at 246. Other frontier-class models are well behind: Gemini 3.1 Pro Preview at 123, GPT-5.5 (xhigh) at 65, Claude Opus 4.7 (max) at 50. - **Verbosity: 73M tokens** generated across the Intelligence Index suite, against a leaderboard average of 36M. Verbosity counts how many output tokens the model produced to complete the eval suite. Higher means the model spent more reasoning tokens per answer, which raises latency and bill size even when the per-token price is low. - **Cost to evaluate the Intelligence Index: $1,551.60.** That is 5.5x Gemini 3 Flash and 75% more than Gemini 3.1 Pro despite the lower per-token rate. This is the total dollar cost to run the full Intelligence Index once, combining per-token pricing and token volume. It serves as a proxy for what the model costs on heavy reasoning workloads in production. 
- **Hallucination rate: 61%** on the AA hallucination measure, 31 points lower than Gemini 3 Flash. The hallucination measure is the share of responses on a fabrication-probing prompt set where the model produces incorrect or invented content. Lower is better, and a 31-point drop versus the predecessor indicates a material gain in factual reliability. @@ -143,11 +143,11 @@ Top finishers on the May 20, 2026 run: Three observations from the Arena data. -**It is the fastest model in the top tier.** 20 minutes with Skills and 13 minutes without is faster than every model scoring above 90. The next-fastest top finisher is Gemini 3.1 Flash Lite at 19 minutes with Skills, but it scores 7.9 points lower. +**It is the fastest model in the top tier.** 20 minutes with Skills and 13 minutes without is faster than every other model scoring above 90. The only model in the with-Skills table with a shorter run is Gemini 3.1 Flash Lite at 19 minutes, but it scores 88.3, below the 90-point top tier. **Skills materially improve the freeform score.** Without Skills, freeform scores 77.5%. With Skills, freeform reaches 91.9%, a 14.4-point increase. The same delta for GPT 5.5 is +4.8 points (90.0 to 94.8), and for Claude Opus 4.7 is −0.6 points (94.8 to 94.2), where Skills slightly lowered the score because the model's built-in Appwrite knowledge is already near the ceiling. 3.5 Flash relies more on in-context documentation than its frontier peers, consistent with the January 2025 knowledge cutoff. -**Category profile.** With Skills, 3.5 Flash scores 100% on Messaging, MCQ Foundation, MCQ Auth, MCQ Functions, and MCQ Sites, and 94.1% on Realtime. The weakest categories are TablesDB (89.1% with Skills, 77.8% without) and CLI (95.0% with Skills, 73.3% without). Both require the most current API surface, which the knowledge cutoff does not cover. 
+**Category profile.** With Skills, 3.5 Flash scores 100% on Messaging, MCQ Foundation, MCQ Auth, MCQ Functions, and MCQ Sites, and 94.1% on MCQ Realtime. The weakest categories are TablesDB (89.1% with Skills, 77.8% without) and CLI (95.0% with Skills, 73.3% without). Both require the most current API surface, which the knowledge cutoff does not cover. # Workloads where 3.5 Flash is the right choice From c237f18204975f135ddeb99f37f04e7c029f1eae Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 16:51:05 +0530 Subject: [PATCH 5/7] blog(gemini-3-5-flash): unify precision for Intelligence Index and eval cost Intelligence Index bullet uses 55.3 to match the SOTA table. Eval cost bullet uses $1,552 to match the table and downstream prose. --- src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc index e5dd34b66c..286a6f3c76 100644 --- a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -85,10 +85,10 @@ The largest gain is MCP Atlas: a 21.6 point increase over Gemini 3 Flash and 5.4 Gemini 3.5 Flash on Artificial Analysis: -- **Intelligence Index: 55** (rank #7 of 147). Top three: GPT-5.5 (xhigh) at 60.2, GPT-5.5 (high) at 58.9, Claude Opus 4.7 (max) at 57.3. +- **Intelligence Index: 55.3** (rank #7 of 147). Top three: GPT-5.5 (xhigh) at 60.2, GPT-5.5 (high) at 58.9, Claude Opus 4.7 (max) at 57.3. - **Speed: 278 output tokens per second** (rank #2 of 147 in its AA price class). The closest frontier peer is gpt-oss-120b (high) at 246. Other frontier-class models are well behind: Gemini 3.1 Pro Preview at 123, GPT-5.5 (xhigh) at 65, Claude Opus 4.7 (max) at 50. 
- **Verbosity: 73M tokens** generated across the Intelligence Index suite, against a leaderboard average of 36M. Verbosity counts how many output tokens the model produced to complete the eval suite. Higher means the model spent more reasoning tokens per answer, which raises latency and bill size even when the per-token price is low. -- **Cost to evaluate the Intelligence Index: $1,551.60.** That is 5.5x Gemini 3 Flash and 75% more than Gemini 3.1 Pro despite the lower per-token rate. This is the total dollar cost to run the full Intelligence Index once, combining per-token pricing and token volume. It serves as a proxy for what the model costs on heavy reasoning workloads in production. +- **Cost to evaluate the Intelligence Index: $1,552.** That is 5.5x Gemini 3 Flash and 75% more than Gemini 3.1 Pro despite the lower per-token rate. This is the total dollar cost to run the full Intelligence Index once, combining per-token pricing and token volume. It serves as a proxy for what the model costs on heavy reasoning workloads in production. - **Hallucination rate: 61%** on the AA hallucination measure, 31 points lower than Gemini 3 Flash. The hallucination measure is the share of responses on a fabrication-probing prompt set where the model produces incorrect or invented content. Lower is better, and a 31-point drop versus the predecessor indicates a material gain in factual reliability. On the intelligence-versus-speed axis, Artificial Analysis ranks Gemini 3.5 Flash as the Pareto leader. No model in the same intelligence bracket runs near 278 tokens per second. From 0875798f5e6ae62dfc4144b9e9508cdf21ea29ba Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 17:03:27 +0530 Subject: [PATCH 6/7] blog(gemini-3-5-flash): remove unlisted flag Frontmatter unlisted: true was copied from a style-reference post without authorization. Removing so the post appears on the blog index. 
--- src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc | 1 - 1 file changed, 1 deletion(-) diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc index 286a6f3c76..71e2d60dbe 100644 --- a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -7,7 +7,6 @@ cover: /images/blog/gemini-3-5-flash-deep-dive/cover.avif timeToRead: 11 author: atharva category: ai -unlisted: true faqs: - question: "What is Gemini 3.5 Flash?" answer: "Gemini 3.5 Flash is Google DeepMind's mid-2026 Flash-tier reasoning model, built on the Gemini 3 Flash foundation with explicit thinking levels that trade quality for cost and latency. It accepts text, images, audio, video, and PDF input, outputs up to 64K text tokens, and has a 1M token context window." From b6f195f8cadb482ee60698bdcf3944c0648365ca Mon Sep 17 00:00:00 2001 From: Atharva Deosthale Date: Wed, 20 May 2026 17:08:40 +0530 Subject: [PATCH 7/7] blog(gemini-3-5-flash): correct vs 3.1 Pro pricing comparison 3.5 Flash is 25% cheaper per token than 3.1 Pro on both input and output ($1.50/$2.00 and $9.00/$12.00 = 0.75), not 40%. --- src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc index 71e2d60dbe..1ffb49595b 100644 --- a/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc +++ b/src/routes/blog/post/gemini-3-5-flash-deep-dive/+page.markdoc @@ -50,7 +50,7 @@ API pricing per million tokens: How it compares: - **vs Gemini 3 Flash** ($0.50 / $3.00): 3x more on both input and output. -- **vs Gemini 3.1 Pro** (blended): approximately 40% cheaper. +- **vs Gemini 3.1 Pro** ($2.00 / $12.00): 25% cheaper per token on both input and output. 
- **Within the Flash tier:** the most expensive Flash-tier model Google has released. # Google's published benchmark table
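Below is a minimal reviewer sanity check of the arithmetic these commits rely on. It checks the per-token ratios in PATCH 7/7, the "3x Gemini 3 Flash" bullet and the Skills deltas in the Arena section. It assumes Gemini 3.5 Flash list pricing of $1.50 input / $9.00 output per million tokens. That pricing is inferred from the stated ratios, not quoted directly from the post. The helper names `price_ratio` and `point_delta` are hypothetical.

```python
# Reviewer sanity check for pricing ratios and score deltas quoted in this patch series.
# Prices are USD per million tokens (input, output). The 3.5 Flash pricing is an
# assumption inferred from PATCH 7/7 ($1.50/$2.00, $9.00/$12.00) and the 3x bullet.

PRICES = {
    "gemini-3.5-flash": (1.50, 9.00),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3.1-pro": (2.00, 12.00),
}


def price_ratio(model: str, baseline: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio) of model pricing over baseline pricing."""
    m_in, m_out = PRICES[model]
    b_in, b_out = PRICES[baseline]
    return m_in / b_in, m_out / b_out


def point_delta(before: float, after: float) -> float:
    """Percentage-point change, rounded to one decimal to hide float noise."""
    return round(after - before, 1)


if __name__ == "__main__":
    print(price_ratio("gemini-3.5-flash", "gemini-3.1-pro"))  # (0.75, 0.75): 25% cheaper
    print(price_ratio("gemini-3.5-flash", "gemini-3-flash"))  # (3.0, 3.0): 3x more
    print(point_delta(77.5, 91.9))  # 3.5 Flash freeform, with vs without Skills: 14.4
    print(point_delta(90.0, 94.8))  # GPT 5.5: 4.8
    print(point_delta(94.8, 94.2))  # Claude Opus 4.7: -0.6
```

The check confirms that a 0.75 ratio on both input and output means "25% cheaper per token", which supports the correction from "approximately 40%".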