Futureeval leaderboard entries fix by cemreinanc · Pull Request #4263 · Metaculus/metaculus

cemreinanc · 2026-02-06T14:17:25Z

Summary by CodeRabbit

Updates
- Renamed legend label to "Frontier Model Trend" and updated methodology wording accordingly.
- Model leaderboard banner now shows upcoming models from the centralized data source.
Refactor
- Leaderboard and forecasting views now consume centralized, pre-mapped aggregates, bots, and SOTA dates for consistent UI data.
- Utilities consolidated and display/filtering logic simplified for more predictable leaderboard behavior.

- Updated FutureEvalLeaderboardTable to use a utility function for determining aggregate status.
- Simplified FutureEvalMethodologySections by utilizing pre-computed SOTA crossing dates.
- Enhanced FutureEvalModelBenchmark to directly access mapped aggregates and bots.
- Refactored FutureEvalBenchmarkForecastingPerformance to streamline data mapping and SOTA date calculations.
- Consolidated leaderboard utility functions for better organization and clarity.
- Removed unused shared utility file to clean up the codebase.

coderabbitai · 2026-02-06T14:17:45Z

Walkthrough

Provider-centric refactor: leaderboard-derived aggregates, bots, upcoming models, and SOTA crossing dates are computed in the leaderboard provider and exposed to components; components now consume these mapped values instead of deriving them locally.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Provider: `front_end/src/app/(futureeval)/futureeval/components/leaderboard/futureeval-leaderboard-provider.tsx` | Provider now memoizes and exposes derived values: entries, aggregates, bots, mappedAggregates, mappedBots, upcomingModels, and sotaCrossingDates (replaces providing only raw leaderboard). |
| Mapping & SOTA logic: `.../performance-over-time/mapping.ts`, `.../performance-over-time/sota-trend.ts` | Replaced leaderboard-based getters (`getAggregates`/`getBots`) with `mapAggregates`/`mapBots` that accept `LeaderboardEntry[]`; added `computeSotaCrossingDates` and `formatCrossingDate`; removed leaderboard-centric SOTA functions. |
| Leaderboard utilities: `front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts`, `front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.shared.ts` | `utils.shared.ts` deleted; `utils.ts` now contains reimplemented helpers (MIN_RESOLVED_FORECASTS, getResolvedCount, shouldDisplayEntry, getBaseModelName, entryLabel, getUpcomingModels, isAggregate, getDisplayableAggregates, getDisplayableBots). |
| Components (consume provider data): `front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-model-benchmark.tsx`, `.../futureeval-benchmark-forecasting-performance.tsx`, `.../futureeval-methodology-sections.tsx`, `front_end/src/app/(futureeval)/futureeval/components/futureeval-leaderboard-table.tsx` | Components updated to use aggregates, bots, mappedAggregates/mappedBots, upcomingModels, and sotaCrossingDates from hook/provider instead of computing them locally; adjusted `isAggregate` usage and dependencies. |
| Minor UI text: `front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/benchmark-chart-legend.tsx` | Label changed from "SOTA Trend" to "Frontier Model Trend" only. |
| Page import update: `front_end/src/app/(futureeval)/futureeval/leaderboard/page.tsx` | Adjusted import to read `getUpcomingModels` from consolidated `leaderboard/utils`. |

Sequence Diagram(s)

sequenceDiagram
  participant Provider as LeaderboardProvider
  participant Mapping as Mapping & SOTA Utils
  participant Components as FutureEval Components

  Provider->>Mapping: call mapAggregates(entries), mapBots(entries)
  Mapping-->>Provider: mappedAggregates, mappedBots, sotaCrossingDates
  Provider->>Provider: memoize aggregates, bots, upcomingModels, sotaCrossingDates
  Components->>Provider: useFutureEvalLeaderboard() ⇒ {aggregates, bots, mappedAggregates, mappedBots, upcomingModels, sotaCrossingDates}
  Provider-->>Components: supply derived data
  Components->>Components: render UI using provider-provided mapped data
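
The flow in the diagram can be sketched without React as a pure derivation step whose result is cached per input array. This is a minimal sketch under stated assumptions: the `Entry` shape, the `isAggregate` flag, `deriveLeaderboardData`, and `memoizeByReference` are hypothetical names for illustration and are not taken from the PR. In the real provider, `useMemo` would play the caching role.

```typescript
// Hypothetical, simplified entry shape; the real LeaderboardEntry has more fields.
type Entry = {
  label: string;
  score: number;
  // Assumption: a plain flag stands in for the real isAggregate() helper.
  isAggregate: boolean;
};

type Derived = {
  entries: Entry[];
  aggregates: Entry[];
  bots: Entry[];
};

// Pure derivation: split the entries once so every consumer sees the same lists.
function deriveLeaderboardData(entries: Entry[]): Derived {
  const aggregates = entries.filter((e) => e.isAggregate);
  const bots = entries.filter((e) => !e.isAggregate);
  return { entries, aggregates, bots };
}

// Stand-in for memoization keyed on object identity.
function memoizeByReference<A extends object, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new WeakMap<A, R>();
  return (arg) => {
    if (!cache.has(arg)) cache.set(arg, fn(arg));
    return cache.get(arg) as R;
  };
}

const getDerived = memoizeByReference(deriveLeaderboardData);

const sample: Entry[] = [
  { label: "Community aggregate", score: 12.5, isAggregate: true },
  { label: "bot-a", score: 9.1, isAggregate: false },
  { label: "bot-b", score: 7.4, isAggregate: false },
];

const first = getDerived(sample);
const second = getDerived(sample);
console.log(first.aggregates.length, first.bots.length, first === second); // 1 2 true
```

Because both calls receive the same array reference, the second call returns the cached object, so consumers that compare by identity do not see spurious changes.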

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

FutureEval Branding & Page Updates #4001: Overlaps refactoring of FutureEval leaderboard provider, mapping utilities, and component wiring.
global bot leaderboard minimum contributions 190 #4085: Changes MIN_RESOLVED_FORECASTS to 190 — directly related to display/filtering logic now consolidated in utils.ts.

Suggested reviewers

elisescu
ncarazon
lsabor

Poem

🐇 "I hopped through code at break of dawn,
Mapped the bots and trimmed the brawn,
Provider holds the tidy stack,
Components fetch — no looking back,
Hooray! The leaderboard's reborn."

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Futureeval leaderboard entries fix' is vague and generic, using non-descriptive language like 'fix' without conveying what was actually changed or improved. | Consider using a more specific title that describes the actual change, such as 'Refactor futureeval leaderboard to derive entries from aggregates and bots' or 'Consolidate leaderboard data computation in provider hook'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/mapping.ts (1)
22-41: ⚠️ Potential issue | 🟡 Minor

Potential Invalid Date from unvalidated releasedAt.

Line 29 constructs new Date(meta.releasedAt) without validating the result. If releasedAt is a malformed string, you'll get an Invalid Date object where getTime() returns NaN, causing the sort on Line 36 to produce non-deterministic ordering and the >= filter on Line 39 to silently drop or keep entries incorrectly.

Consider adding a validity check after construction:
Proposed fix
     .map((e) => {
       const meta = getModelDetailsFromScoreEntry(e);
       if (!meta?.releasedAt) return null;
+      const releaseDate = new Date(meta.releasedAt);
+      if (isNaN(releaseDate.getTime())) return null;
       return {
         name: meta.label,
-        releaseDate: new Date(meta.releasedAt),
+        releaseDate,
         score: e.score,
         family: meta.family,
         familyLabel: meta.familyLabel,
       };
     })
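
To illustrate why the validity check matters, here is a small self-contained sketch of the same idea. The `ModelMeta` and `DatedModel` shapes and the `toDatedModel` helper are hypothetical and only loosely modeled on the snippet above; they are not the actual `mapBots` implementation.

```typescript
// Hypothetical metadata shape; only releasedAt matters for this sketch.
type ModelMeta = { label: string; releasedAt?: string };

type DatedModel = { name: string; releaseDate: Date };

// Returns null for missing or unparseable dates so callers can filter them out.
function toDatedModel(meta: ModelMeta): DatedModel | null {
  if (!meta.releasedAt) return null;
  const releaseDate = new Date(meta.releasedAt);
  // An Invalid Date reports NaN from getTime().
  if (Number.isNaN(releaseDate.getTime())) return null;
  return { name: meta.label, releaseDate };
}

const metas: ModelMeta[] = [
  { label: "model-b", releasedAt: "2025-06-01" },
  { label: "model-x", releasedAt: "not-a-date" },
  { label: "model-a", releasedAt: "2024-01-15" },
  { label: "model-y" },
];

const dated = metas
  .map(toDatedModel)
  .filter((m): m is DatedModel => m !== null)
  // Safe to subtract: every remaining date has a finite timestamp.
  .sort((a, b) => a.releaseDate.getTime() - b.releaseDate.getTime());

console.log(dated.map((m) => m.name)); // [ 'model-a', 'model-b' ]
```

Dropping invalid dates before sorting keeps the comparator from ever returning `NaN`, which is what makes the ordering unreliable in the unvalidated version.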

🤖 Fix all issues with AI agents

In `@front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts`:
- Around line 36-74: The dash-cleanup regex in getBaseModelName is too greedy and removes dashes even when not surrounded by spaces (mangling names like "GPT-4o"); update the final cleanup step in getBaseModelName to only collapse dashes that have a space on at least one side by replacing `.replace(/\s*-+\s*/g, " ")` with a regex that matches dashes preceded OR followed by whitespace (for example `.replace(/\s+-+|-+\s+/g, " ")`); modify this replacement in the getBaseModelName function (where patternsToRemove is applied) so legitimate hyphenated model tokens like "GPT-4o" or "Claude-3.5-Sonnet" are preserved.

🧹 Nitpick comments (3)

front_end/src/app/(futureeval)/futureeval/components/leaderboard/futureeval-leaderboard-provider.tsx (1)
22-25: Consider using ReturnType instead of duplicating the return type.

SotaCrossingDates mirrors the return type of computeSotaCrossingDates. You could derive it to keep a single source of truth:
type SotaCrossingDates = ReturnType<typeof computeSotaCrossingDates>;
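
As a standalone illustration of the `ReturnType` pattern, consider the sketch below. `computeCrossingDates` is a hypothetical stand-in; the real return shape of `computeSotaCrossingDates` is not shown in this thread.

```typescript
// Hypothetical stand-in function; its return shape is invented for illustration.
function computeCrossingDates(scores: number[]) {
  const best = scores.length > 0 ? Math.max(...scores) : null;
  return { best, count: scores.length };
}

// Derived type: stays in sync automatically if the function's return shape changes.
type CrossingDates = ReturnType<typeof computeCrossingDates>;

const example: CrossingDates = computeCrossingDates([1, 5, 3]);
console.log(example.best, example.count); // 5 3
```
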
front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts (2)

40-58: Overlapping and redundant regex patterns.

A couple of observations:

Lines 42 and 43 overlap: /\d+k\b/ is a subset of /\d+[km]b?\b/, so the first pattern is redundant.

Lines 56–57 (combined patterns like high-16k) are also redundant — the individual high and \d+k patterns on Lines 46 and 42–43 already remove each part independently across the loop iterations.

Not a bug (patterns run in sequence so earlier removals make later ones no-ops), but trimming the duplicates would simplify maintenance.
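
The subset claim can be verified with a quick experiment. In this sketch the `gi` flags, the `strip` helper, and the sample names are assumptions; only the two pattern bodies come from the comment above.

```typescript
// Pattern bodies from the review comment; the flags are an assumption.
const narrow = /\d+k\b/gi;
const broad = /\d+[km]b?\b/gi;

// Applies each pattern in sequence, then normalizes whitespace.
function strip(name: string, patterns: RegExp[]): string {
  let out = name;
  for (const p of patterns) out = out.replace(p, "");
  return out.replace(/\s+/g, " ").trim();
}

const samples = ["model 16k", "model 128K", "model 32kb", "model 1m"];

const withBoth = samples.map((s) => strip(s, [narrow, broad]));
const broadOnly = samples.map((s) => strip(s, [broad]));

// Every sample reduces to "model" either way, so the narrow pattern adds nothing.
console.log(withBoth, broadOnly);
```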

146-148: isAggregate now accepts Partial<LeaderboardEntry> but callers pass full entries.

The signature widening is fine for flexibility, but note that aggregateKind (Line 150) and entryIconPair (Line 160) still require a full LeaderboardEntry. If a caller passes a Partial to isAggregate and then feeds the same object to aggregateKind, TypeScript won't catch the mismatch unless the caller also widens aggregateKind. This is currently safe since getDisplayableAggregates and getDisplayableBots (Lines 178–197) accept LeaderboardEntry[], but worth keeping in mind.


github-actions · 2026-02-06T14:46:49Z

🧹 Preview Environment Cleaned Up

The preview environment for this PR has been destroyed.

| Resource | Status |
| --- | --- |
| 🌐 Preview App | ✅ Deleted |
| 🗄️ PostgreSQL Branch | ✅ Deleted |
| ⚡ Redis Database | ✅ Deleted |
| 🔧 GitHub Deployments | ✅ Removed |
| 📦 Docker Image | ⚠️ Retained (auto-cleanup via GHCR policies) |

Cleanup triggered by PR close at 2026-02-06T14:49:10Z

cemreinanc added 2 commits February 6, 2026 15:00

Changed references from "SOTA models" to "Frontier Models"

5dc9676

cemreinanc temporarily deployed to testing_env February 6, 2026 14:18 — with GitHub Actions Inactive

coderabbitai bot reviewed Feb 6, 2026


Fix release date handling in mapBots function

fe9cf71

cemreinanc temporarily deployed to testing_env February 6, 2026 14:28 — with GitHub Actions Inactive

cemreinanc merged commit 49659a3 into main Feb 6, 2026
19 of 20 checks passed

cemreinanc deleted the futureeval-leaderboard-entries-fix branch February 6, 2026 14:49

coderabbitai bot mentioned this pull request Feb 7, 2026

FutureEval v2.2 #4272

Merged

cemreinanc merged 3 commits into main from futureeval-leaderboard-entries-fix
