
Futureeval leaderboard entries fix #4263

Merged
cemreinanc merged 3 commits into main from futureeval-leaderboard-entries-fix
Feb 6, 2026


@cemreinanc cemreinanc commented Feb 6, 2026

Summary by CodeRabbit

  • Updates

    • Renamed legend label to "Frontier Model Trend" and updated methodology wording accordingly.
    • Model leaderboard banner now shows upcoming models from the centralized data source.
  • Refactor

    • Leaderboard and forecasting views now consume centralized, pre-mapped aggregates, bots, and SOTA dates for consistent UI data.
    • Utilities consolidated and display/filtering logic simplified for more predictable leaderboard behavior.

- Updated FutureEvalLeaderboardTable to use a utility function for determining aggregate status.
- Simplified FutureEvalMethodologySections by utilizing pre-computed SOTA crossing dates.
- Enhanced FutureEvalModelBenchmark to directly access mapped aggregates and bots.
- Refactored FutureEvalBenchmarkForecastingPerformance to streamline data mapping and SOTA date calculations.
- Consolidated leaderboard utility functions for better organization and clarity.
- Removed unused shared utility file to clean up the codebase.

coderabbitai bot commented Feb 6, 2026

📝 Walkthrough

Provider-centric refactor: leaderboard-derived aggregates, bots, upcoming models, and SOTA crossing dates are computed in the leaderboard provider and exposed to components; components now consume these mapped values instead of deriving them locally.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Provider: `front_end/src/app/(futureeval)/futureeval/components/leaderboard/futureeval-leaderboard-provider.tsx` | Provider now memoizes and exposes derived values: entries, aggregates, bots, mappedAggregates, mappedBots, upcomingModels, and sotaCrossingDates (replaces providing only raw leaderboard). |
| Mapping & SOTA logic: `.../performance-over-time/mapping.ts`, `.../performance-over-time/sota-trend.ts` | Replaced leaderboard-based getters (getAggregates/getBots) with mapAggregates/mapBots that accept LeaderboardEntry[]; added computeSotaCrossingDates and formatCrossingDate; removed leaderboard-centric SOTA functions. |
| Leaderboard utilities: `front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts`, `front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.shared.ts` | utils.shared.ts deleted; utils.ts now contains reimplemented helpers (MIN_RESOLVED_FORECASTS, getResolvedCount, shouldDisplayEntry, getBaseModelName, entryLabel, getUpcomingModels, isAggregate, getDisplayableAggregates, getDisplayableBots). |
| Components (consume provider data): `front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-model-benchmark.tsx`, `.../futureeval-benchmark-forecasting-performance.tsx`, `.../futureeval-methodology-sections.tsx`, `front_end/src/app/(futureeval)/futureeval/components/futureeval-leaderboard-table.tsx` | Components updated to use aggregates, bots, mappedAggregates/mappedBots, upcomingModels, and sotaCrossingDates from hook/provider instead of computing them locally; adjusted isAggregate usage and dependencies. |
| Minor UI text: `front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/benchmark-chart-legend.tsx` | Label changed from "SOTA Trend" to "Frontier Model Trend" only. |
| Page import update: `front_end/src/app/(futureeval)/futureeval/leaderboard/page.tsx` | Adjusted import to read getUpcomingModels from consolidated leaderboard/utils. |

Sequence Diagram(s)

sequenceDiagram
  participant Provider as LeaderboardProvider
  participant Mapping as Mapping & SOTA Utils
  participant Components as FutureEval Components

  Provider->>Mapping: call mapAggregates(entries), mapBots(entries)
  Mapping-->>Provider: mappedAggregates, mappedBots, sotaCrossingDates
  Provider->>Provider: memoize aggregates, bots, upcomingModels, sotaCrossingDates
  Components->>Provider: useFutureEvalLeaderboard() ⇒ {aggregates, bots, mappedAggregates, mappedBots, upcomingModels, sotaCrossingDates}
  Provider-->>Components: supply derived data
  Components->>Components: render UI using provider-provided mapped data
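
The flow in the diagram can be sketched without React. This is only an illustration under stated assumptions, not the code from this PR: the `Entry` shape, `deriveLeaderboardData`, and `memoizeLast` are hypothetical names, and the real provider presumably memoizes through React's `useMemo` rather than a hand-written cache.

```typescript
// Hypothetical, simplified entry shape; the real LeaderboardEntry has more fields.
interface Entry {
  label: string;
  isAggregate: boolean;
  score: number;
}

interface DerivedLeaderboard {
  entries: Entry[];
  aggregates: Entry[];
  bots: Entry[];
}

// Derive everything consumers need from the raw entries in one place.
function deriveLeaderboardData(entries: Entry[]): DerivedLeaderboard {
  return {
    entries,
    aggregates: entries.filter((e) => e.isAggregate),
    bots: entries.filter((e) => !e.isAggregate),
  };
}

// Cache the last result, keyed by argument identity (similar in spirit to
// useMemo with a single dependency).
function memoizeLast<A extends object, R>(fn: (arg: A) => R): (arg: A) => R {
  let cache: { arg: A; result: R } | null = null;
  return (arg: A): R => {
    if (cache !== null && cache.arg === arg) return cache.result;
    const result = fn(arg);
    cache = { arg, result };
    return result;
  };
}

const getDerived = memoizeLast(deriveLeaderboardData);

const sampleEntries: Entry[] = [
  { label: "Community aggregate", isAggregate: true, score: 0.42 },
  { label: "bot-a", isAggregate: false, score: 0.31 },
];

const firstResult = getDerived(sampleEntries);
const secondResult = getDerived(sampleEntries); // same array reference, so cached
console.log(firstResult === secondResult, firstResult.aggregates.length, firstResult.bots.length);
```

Components then read the already-derived object instead of filtering `entries` themselves, which is what keeps every view consistent.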

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • elisescu
  • ncarazon
  • lsabor

Poem

🐇 "I hopped through code at break of dawn,
Mapped the bots and trimmed the brawn,
Provider holds the tidy stack,
Components fetch — no looking back,
Hooray! The leaderboard's reborn."

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Futureeval leaderboard entries fix' is vague and generic, using non-descriptive language like 'fix' without conveying what was actually changed or improved. | Consider using a more specific title that describes the actual change, such as 'Refactor futureeval leaderboard to derive entries from aggregates and bots' or 'Consolidate leaderboard data computation in provider hook'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/mapping.ts (1)

22-41: ⚠️ Potential issue | 🟡 Minor

Potential Invalid Date from unvalidated releasedAt.

Line 29 constructs new Date(meta.releasedAt) without validating the result. If releasedAt is a malformed string, you'll get an Invalid Date object where getTime() returns NaN, causing the sort on Line 36 to produce non-deterministic ordering and the >= filter on Line 39 to silently drop or keep entries incorrectly.

Consider adding a validity check after construction:

Proposed fix
     .map((e) => {
       const meta = getModelDetailsFromScoreEntry(e);
       if (!meta?.releasedAt) return null;
+      const releaseDate = new Date(meta.releasedAt);
+      if (isNaN(releaseDate.getTime())) return null;
       return {
         name: meta.label,
-        releaseDate: new Date(meta.releasedAt),
+        releaseDate,
         score: e.score,
         family: meta.family,
         familyLabel: meta.familyLabel,
       };
     })
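
As a standalone illustration of the same check, here is a minimal sketch. The helper name `parseReleaseDate` and the sample strings are hypothetical and not part of mapping.ts; the point is only that an unparseable string yields a `Date` whose `getTime()` is `NaN`, so it has to be filtered out before sorting.

```typescript
// Returns a valid Date, or null when the input is missing or cannot be parsed.
function parseReleaseDate(releasedAt: string | null | undefined): Date | null {
  if (!releasedAt) return null;
  const date = new Date(releasedAt);
  // An Invalid Date reports NaN from getTime().
  return Number.isNaN(date.getTime()) ? null : date;
}

// Hypothetical sample inputs: two ISO dates and one malformed string.
const rawReleaseDates = ["2025-03-01", "not-a-date", "2024-11-15"];

const validReleaseDates = rawReleaseDates
  .map((raw) => parseReleaseDate(raw))
  .filter((d): d is Date => d !== null)
  .sort((a, b) => a.getTime() - b.getTime());

console.log(validReleaseDates.map((d) => d.toISOString().slice(0, 10)));
```

Dropping invalid dates before the sort keeps the comparator from ever returning `NaN`.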
🤖 Fix all issues with AI agents
In `@front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts`:
- Around line 36-74: The dash-cleanup regex in getBaseModelName is too greedy
and removes dashes even when not surrounded by spaces (mangling names like
"GPT-4o"); update the final cleanup step in getBaseModelName to only collapse
dashes that have a space on at least one side by replacing .replace(/\s*-+\s*/g,
" ") with a regex that matches dashes preceded OR followed by whitespace (for
example .replace(/\s+-+|-+\s+/g, " ")); modify this replacement in the
getBaseModelName function (where patternsToRemove is applied) so legitimate
hyphenated model tokens like "GPT-4o" or "Claude-3.5-Sonnet" are preserved.
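
The difference between the two regexes can be checked in isolation. This is a sketch, not the body of getBaseModelName: the function names `greedyCleanup` and `spaceAwareCleanup` are hypothetical, and the trailing whitespace collapse and `trim()` are assumptions added here so the outputs are easy to compare.

```typescript
// Original cleanup: removes every run of dashes, even inside a token.
function greedyCleanup(modelName: string): string {
  return modelName.replace(/\s*-+\s*/g, " ").trim();
}

// Suggested cleanup: only removes dashes that touch whitespace on one side.
// The extra whitespace collapse is an assumption for this sketch.
function spaceAwareCleanup(modelName: string): string {
  return modelName
    .replace(/\s+-+|-+\s+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

console.log(greedyCleanup("GPT-4o")); // "GPT 4o" (hyphen lost)
console.log(spaceAwareCleanup("GPT-4o")); // "GPT-4o" (hyphen kept)
console.log(spaceAwareCleanup("Claude-3.5-Sonnet - ")); // "Claude-3.5-Sonnet"
```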
🧹 Nitpick comments (3)
front_end/src/app/(futureeval)/futureeval/components/leaderboard/futureeval-leaderboard-provider.tsx (1)

22-25: Consider using ReturnType instead of duplicating the return type.

SotaCrossingDates mirrors the return type of computeSotaCrossingDates. You could derive it to keep a single source of truth:

type SotaCrossingDates = ReturnType<typeof computeSotaCrossingDates>;
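
A self-contained sketch of the pattern follows. The body and return fields of `computeSotaCrossingDates` below are hypothetical placeholders (the real function lives in sota-trend.ts and its shape is not shown in this review); only the `ReturnType<typeof ...>` derivation is the point.

```typescript
// Hypothetical stand-in for the real function; the field names are placeholders.
function computeSotaCrossingDates(points: { date: Date; score: number }[]) {
  const latest = points.length > 0 ? points[points.length - 1] : undefined;
  return {
    latestDate: latest ? latest.date : null,
    pointCount: points.length,
  };
}

// The type is derived from the function, so changing the return value
// updates the type automatically.
type SotaCrossingDates = ReturnType<typeof computeSotaCrossingDates>;

const emptyCrossing: SotaCrossingDates = computeSotaCrossingDates([]);
console.log(emptyCrossing.latestDate, emptyCrossing.pointCount);
```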
front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts (2)

40-58: Overlapping and redundant regex patterns.

A couple of observations:

  1. Lines 42 and 43 overlap: /\d+k\b/ is a subset of /\d+[km]b?\b/, so the first pattern is redundant.
  2. Lines 56–57 (combined patterns like high-16k) are also redundant — the individual high and \d+k patterns on Lines 46 and 42–43 already remove each part independently across the loop iterations.

Not a bug (patterns run in sequence so earlier removals make later ones no-ops), but trimming the duplicates would simplify maintenance.
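
The first observation can be verified with a quick sketch. The `gi` flags, the `stripAll` helper, and the sample strings are assumptions made for this illustration; the actual flags in utils.ts are not shown in the review.

```typescript
// Assumption: both patterns are applied globally and case-insensitively.
const narrowPattern = /\d+k\b/gi;
const broadPattern = /\d+[km]b?\b/gi;

// Apply each pattern in sequence, removing every match.
function stripAll(input: string, patterns: RegExp[]): string {
  return patterns.reduce((acc, pattern) => acc.replace(pattern, ""), input);
}

const sampleNames = ["model 16k", "model 32kb", "model 128M", "model 4o"];

// For each sample, running both patterns gives the same result as running
// only the broader one, so the narrow pattern adds nothing.
const redundancyHolds = sampleNames.every(
  (s) => stripAll(s, [narrowPattern, broadPattern]) === stripAll(s, [broadPattern]),
);
console.log(redundancyHolds);
```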


146-148: isAggregate now accepts Partial<LeaderboardEntry> but callers pass full entries.

The signature widening is fine for flexibility, but note that aggregateKind (Line 150) and entryIconPair (Line 160) still require a full LeaderboardEntry. A value typed as Partial<LeaderboardEntry> can be passed to isAggregate but not to aggregateKind: TypeScript rejects that call unless the caller narrows the value first or aggregateKind is widened as well. This is currently safe since getDisplayableAggregates and getDisplayableBots (Lines 178–197) accept LeaderboardEntry[], but worth keeping in mind.


github-actions bot commented Feb 6, 2026

🧹 Preview Environment Cleaned Up

The preview environment for this PR has been destroyed.

| Resource | Status |
|---|---|
| 🌐 Preview App | ✅ Deleted |
| 🗄️ PostgreSQL Branch | ✅ Deleted |
| ⚡ Redis Database | ✅ Deleted |
| 🔧 GitHub Deployments | ✅ Removed |
| 📦 Docker Image | ⚠️ Retained (auto-cleanup via GHCR policies) |

Cleanup triggered by PR close at 2026-02-06T14:49:10Z

@cemreinanc cemreinanc merged commit 49659a3 into main Feb 6, 2026
19 of 20 checks passed
@cemreinanc cemreinanc deleted the futureeval-leaderboard-entries-fix branch February 6, 2026 14:49
@coderabbitai coderabbitai bot mentioned this pull request Feb 7, 2026