Update docs with latest benchmark results and blog post fixes #78

Merged
Wenyueh merged 5 commits into main from website-data-update
Apr 3, 2026

@Sripadkarne
Collaborator

  • benchmark-results/index.md: All tables updated with corrected numbers, added GPQA thinking effort ablation section
  • blog technical-deep-dive: Updated budget alternatives, algorithm comparison, selector summary, Opus+Opus fix, cumulative cost wording
  • mkdocs.yml: Minor config updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Sripadkarne Sripadkarne requested a review from Wenyueh April 2, 2026 20:26
Sripadkarne and others added 4 commits April 2, 2026 13:31
Arm Elimination is assumption-free (uses only observed data), while
Hill Climbing requires a hand-crafted model ranking upfront. Also
split LM Proposal into its own paragraph.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added Avg Latency (s) column to all 6 tables (Top 15, Bottom 15,
Full 81) for both 2-tuple benchmarks using server-side latency
from cache.db results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Kimi + Haiku 4.5 is rank 66, not 67. Bottom 15 now correctly
starts at rank 67 and matches the Full 81 table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
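
The rank fix above comes down to simple arithmetic: with 81 ranked pairs, a Bottom 15 table covers ranks 67 through 81, so a pair at rank 66 sits just outside it. A minimal sketch of that check follows; the helper name `bottom_n_start_rank` and the constants are illustrative assumptions, not code from this repository.

```python
def bottom_n_start_rank(total: int, n: int) -> int:
    """Return the first rank shown in a bottom-n table over 1-indexed ranks.

    Hypothetical helper for illustration only.
    """
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and total")
    return total - n + 1


TOTAL_PAIRS = 81  # size of the "Full 81" table mentioned in the commit
BOTTOM_N = 15     # size of the "Bottom 15" table

start = bottom_n_start_rank(TOTAL_PAIRS, BOTTOM_N)
bottom_ranks = list(range(start, TOTAL_PAIRS + 1))

print(start)               # 67
print(66 in bottom_ranks)  # False: rank 66 is not part of Bottom 15
```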
@Wenyueh Wenyueh merged commit 6b43355 into main Apr 3, 2026