- Updated FutureEvalLeaderboardTable to use a utility function for determining aggregate status.
- Simplified FutureEvalMethodologySections by using pre-computed SOTA crossing dates.
- Enhanced FutureEvalModelBenchmark to directly access mapped aggregates and bots.
- Refactored FutureEvalBenchmarkForecastingPerformance to streamline data mapping and SOTA date calculations.
- Consolidated leaderboard utility functions for better organization and clarity.
- Removed an unused shared utility file to clean up the codebase.
📝 Walkthrough
Provider-centric refactor: leaderboard-derived aggregates, bots, upcoming models, and SOTA crossing dates are computed in the leaderboard provider and exposed to components, which now consume these mapped values instead of deriving them locally.
Sequence Diagram(s)
sequenceDiagram
participant Provider as LeaderboardProvider
participant Mapping as Mapping & SOTA Utils
participant Components as FutureEval Components
Provider->>Mapping: call mapAggregates(entries), mapBots(entries)
Mapping-->>Provider: mappedAggregates, mappedBots, sotaCrossingDates
Provider->>Provider: memoize aggregates, bots, upcomingModels, sotaCrossingDates
Components->>Provider: useFutureEvalLeaderboard() ⇒ {aggregates, bots, mappedAggregates, mappedBots, upcomingModels, sotaCrossingDates}
Provider-->>Components: supply derived data
Components->>Components: render UI using provider-provided mapped data
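As a rough sketch of the idea behind this flow (not the actual provider code), the derivation can run once per `entries` array and be shared by every consumer. Everything below is hypothetical: the trimmed-down `LeaderboardEntry` shape, the `isAggregate` stand-in, and `deriveLeaderboardData`/`getLeaderboardData`, which substitute for `mapAggregates`, `mapBots`, and the provider's memoization. A `WeakMap` keyed on array identity plays roughly the role that `useMemo` with an `[entries]` dependency plays inside a React provider.

```typescript
// Hypothetical entry shape; the real LeaderboardEntry has more fields.
type LeaderboardEntry = {
  id: string;
  kind: "aggregate" | "bot";
  score: number;
};

type DerivedLeaderboard = {
  aggregates: LeaderboardEntry[];
  bots: LeaderboardEntry[];
};

// Hypothetical stand-in for the real isAggregate utility.
function isAggregate(entry: LeaderboardEntry): boolean {
  return entry.kind === "aggregate";
}

// Counts how many times the derivation actually runs (for demonstration).
let derivationRuns = 0;

function deriveLeaderboardData(entries: LeaderboardEntry[]): DerivedLeaderboard {
  derivationRuns += 1;
  return {
    aggregates: entries.filter(isAggregate),
    bots: entries.filter((e) => !isAggregate(e)),
  };
}

// Cache keyed on array identity: callers passing the same entries array
// get the same derived object back without recomputing it.
const derivedCache = new WeakMap<LeaderboardEntry[], DerivedLeaderboard>();

function getLeaderboardData(entries: LeaderboardEntry[]): DerivedLeaderboard {
  let derived = derivedCache.get(entries);
  if (derived === undefined) {
    derived = deriveLeaderboardData(entries);
    derivedCache.set(entries, derived);
  }
  return derived;
}

const entries: LeaderboardEntry[] = [
  { id: "agg-1", kind: "aggregate", score: 0.8 },
  { id: "bot-1", kind: "bot", score: 0.6 },
  { id: "bot-2", kind: "bot", score: 0.7 },
];

// Two "components" reading from the same source share one derivation.
const tableView = getLeaderboardData(entries);
const chartView = getLeaderboardData(entries);
console.log(tableView === chartView, derivationRuns); // true 1
```

Keying on identity means a new array, even one with equal contents, triggers a fresh derivation, which mirrors how React compares memo dependencies with `Object.is`.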
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/mapping.ts (1)
22-41: ⚠️ Potential issue | 🟡 Minor
Potential Invalid Date from unvalidated `releasedAt`.
Line 29 constructs `new Date(meta.releasedAt)` without validating the result. If `releasedAt` is a malformed string, you'll get an Invalid Date object whose `getTime()` returns `NaN`, causing the sort on Line 36 to produce non-deterministic ordering and the `>=` filter on Line 39 to silently drop or keep entries incorrectly. Consider adding a validity check after construction:
Proposed fix
  .map((e) => {
    const meta = getModelDetailsFromScoreEntry(e);
    if (!meta?.releasedAt) return null;
+   const releaseDate = new Date(meta.releasedAt);
+   if (isNaN(releaseDate.getTime())) return null;
    return {
      name: meta.label,
-     releaseDate: new Date(meta.releasedAt),
+     releaseDate,
      score: e.score,
      family: meta.family,
      familyLabel: meta.familyLabel,
    };
  })
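A minimal standalone sketch of the same guard, assuming only that `releasedAt` is an optional string; `parseReleaseDate`, `ModelPoint`, and the sample rows are hypothetical. `new Date()` does not throw on malformed input: it returns an Invalid Date whose `getTime()` is `NaN`. A sort comparator that returns `NaN` is treated as reporting equal elements, so the resulting order depends on the engine, and any `>=` comparison involving `NaN` is `false`.

```typescript
// Returns a Date, or null when the input is missing or cannot be parsed.
// new Date(...) never throws on bad input; it yields an Invalid Date
// whose getTime() is NaN, so the check has to be explicit.
function parseReleaseDate(raw: string | undefined | null): Date | null {
  if (!raw) return null;
  const date = new Date(raw);
  return Number.isNaN(date.getTime()) ? null : date;
}

type ModelPoint = { name: string; releaseDate: Date };

// Hypothetical rows; "not-a-date" stands in for a malformed releasedAt.
const rows = [
  { name: "model-a", releasedAt: "2024-05-13" },
  { name: "model-b", releasedAt: "not-a-date" },
  { name: "model-c", releasedAt: "2023-11-06" },
];

const points: ModelPoint[] = rows
  .map((row) => {
    const releaseDate = parseReleaseDate(row.releasedAt);
    return releaseDate ? { name: row.name, releaseDate } : null;
  })
  .filter((p): p is ModelPoint => p !== null)
  // Every getTime() is now a finite number, so the comparator is consistent.
  .sort((a, b) => a.releaseDate.getTime() - b.releaseDate.getTime());

console.log(points.map((p) => p.name)); // [ 'model-c', 'model-a' ]
```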
🤖 Fix all issues with AI agents
In `@front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts`:
- Around lines 36-74: The dash-cleanup regex in getBaseModelName is too greedy
and removes dashes even when not surrounded by spaces (mangling names like
"GPT-4o"); update the final cleanup step in getBaseModelName to only collapse
dashes that have a space on at least one side by replacing .replace(/\s*-+\s*/g,
" ") with a regex that matches dashes preceded OR followed by whitespace (for
example .replace(/\s+-+|-+\s+/g, " ")); modify this replacement in the
getBaseModelName function (where patternsToRemove is applied) so legitimate
hyphenated model tokens like "GPT-4o" or "Claude-3.5-Sonnet" are preserved.
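The difference between the current and suggested cleanup regexes can be checked in isolation. This sketch applies each one to a few names; `greedy`, `suggested`, and `spaced` are hypothetical helper names, and `spaced` is a variant not proposed in the review. One caveat worth verifying against the real `getBaseModelName`: the suggested `/\s+-+|-+\s+/g` turns `"Claude - Sonnet"` into `"Claude  Sonnet"` with two spaces, because the leading-whitespace alternative stops before the trailing space. That is harmless if the function collapses whitespace afterwards (not visible in this review); otherwise the `spaced` variant avoids it.

```typescript
// Current cleanup described in the review: every dash, spaced or not.
const greedy = (s: string): string => s.replace(/\s*-+\s*/g, " ");

// Suggested cleanup: only dashes with whitespace on at least one side.
const suggested = (s: string): string => s.replace(/\s+-+|-+\s+/g, " ");

// Hypothetical variant: the first alternative also consumes whitespace
// after the dash, so " - " collapses to a single space.
const spaced = (s: string): string => s.replace(/\s+-+\s*|-+\s+/g, " ");

for (const name of ["GPT-4o", "Claude-3.5-Sonnet", "Claude - Sonnet"]) {
  console.log(JSON.stringify([greedy(name), suggested(name), spaced(name)]));
}
// ["GPT 4o","GPT-4o","GPT-4o"]
// ["Claude 3.5 Sonnet","Claude-3.5-Sonnet","Claude-3.5-Sonnet"]
// ["Claude Sonnet","Claude  Sonnet","Claude Sonnet"]
```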
🧹 Nitpick comments (3)
front_end/src/app/(futureeval)/futureeval/components/leaderboard/futureeval-leaderboard-provider.tsx (1)
22-25: Consider using `ReturnType` instead of duplicating the return type.
`SotaCrossingDates` mirrors the return type of `computeSotaCrossingDates`. You could derive it to keep a single source of truth: `type SotaCrossingDates = ReturnType<typeof computeSotaCrossingDates>;`
front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts (2)
40-58: Overlapping and redundant regex patterns.
A couple of observations:
- Lines 42 and 43 overlap: `/\d+k\b/` is a subset of `/\d+[km]b?\b/`, so the first pattern is redundant.
- Lines 56–57 (combined patterns like `high-16k`) are also redundant: the individual `high` and `\d+k` patterns on Lines 46 and 42–43 already remove each part independently across the loop iterations.
Not a bug (patterns run in sequence, so earlier removals make later ones no-ops), but trimming the duplicates would simplify maintenance.
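The overlap in the first point follows directly from the patterns: any match of `/\d+k\b/` is also a match of `/\d+[km]b?\b/` with `k` chosen and the optional `b` omitted. The second point can be checked with a small sketch that strips suffixes in sequence, with and without a combined pattern appended at the end. The pattern lists, `stripSuffixes`, and its cleanup step are hypothetical simplifications; the real `patternsToRemove` list is not reproduced here.

```typescript
// Hypothetical, simplified pattern lists (the real patternsToRemove differs).
const individual: RegExp[] = [/\bhigh\b/gi, /\b\d+[km]b?\b/gi];
// Same list plus a combined "high-<n>k" pattern that runs last.
const withCombined: RegExp[] = [...individual, /\bhigh-\d+k\b/gi];

function stripSuffixes(name: string, patterns: RegExp[]): string {
  let result = name;
  // Patterns run in order; each sees the output of the previous one.
  for (const pattern of patterns) {
    result = result.replace(pattern, "");
  }
  // Collapse space-adjacent dashes, then normalize whitespace.
  return result
    .replace(/\s+-+|-+\s+/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

for (const name of ["o3 high-16k", "model-x 32k", "GPT-4o high"]) {
  const a = stripSuffixes(name, individual);
  const b = stripSuffixes(name, withCombined);
  // By the time the combined pattern runs, "high" and "16k" are already gone.
  console.log(JSON.stringify([name, a, b]));
}
// ["o3 high-16k","o3","o3"]
// ["model-x 32k","model-x","model-x"]
// ["GPT-4o high","GPT-4o","GPT-4o"]
```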
146-148: `isAggregate` now accepts `Partial<LeaderboardEntry>` but callers pass full entries.
The signature widening is fine for flexibility, but note that `aggregateKind` (Line 150) and `entryIconPair` (Line 160) still require a full `LeaderboardEntry`. A caller that holds only a `Partial` can pass it to `isAggregate` but cannot then pass the same object to `aggregateKind` without widening that signature too, or casting, which would hide the mismatch from TypeScript. This is currently safe since `getDisplayableAggregates` and `getDisplayableBots` (Lines 178–197) accept `LeaderboardEntry[]`, but worth keeping in mind.
front_end/src/app/(futureeval)/futureeval/components/leaderboard/utils.ts
🧹 Preview Environment Cleaned Up
The preview environment for this PR has been destroyed.
Cleanup triggered by PR close at 2026-02-06T14:49:10Z
Summary by CodeRabbit
Updates
Refactor