Feat(evaluation): Improve UI usability issues #80
📝 Walkthrough

This PR refactors the evaluations UI by extracting implementations from the main page component into reusable, composable components.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant EvaluationsTab as EvaluationsTab Component
    participant ConfigSelector as ConfigSelector
    participant API as Backend API
    participant DatasetsTab as DatasetsTab Component

    User->>EvaluationsTab: Select dataset & config, enter experiment name
    EvaluationsTab->>ConfigSelector: Render config selector
    ConfigSelector-->>EvaluationsTab: Return selected config
    User->>EvaluationsTab: Click "Run Evaluation"
    EvaluationsTab->>API: POST /api/evaluations (with config & dataset)
    API-->>EvaluationsTab: Return job ID & status
    EvaluationsTab->>API: GET /api/evaluations (poll jobs list)
    API-->>EvaluationsTab: Return evaluation jobs with statuses
    EvaluationsTab->>API: GET /api/assistant/:assistantId (per-job config)
    API-->>EvaluationsTab: Return assistant config
    EvaluationsTab->>EvaluationsTab: Render EvalRunCard with job & config
    User->>EvaluationsTab: Filter by status / Click refresh
    EvaluationsTab->>API: GET /api/evaluations (with filters)
    API-->>EvaluationsTab: Return filtered job list
    EvaluationsTab->>EvaluationsTab: Re-render cards with updated jobs
    User->>DatasetsTab: Upload CSV via drag-drop
    DatasetsTab->>API: POST /api/evaluations/datasets (create dataset)
    API-->>DatasetsTab: Return dataset ID
    User->>DatasetsTab: Click "View" on dataset
    DatasetsTab->>API: GET /api/evaluations/datasets/:id (fetch content)
    API-->>DatasetsTab: Return CSV text & signed URL
    DatasetsTab->>DatasetsTab: Parse CSV, render table
    User->>DatasetsTab: Click "Download"
    DatasetsTab->>User: Generate & download Blob
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 6
🧹 Nitpick comments (3)
app/evaluations/page.tsx (1)
215-237: Inconsistent return type: early returns yield `undefined` instead of `false`.

The function signature implies `Promise<boolean>` (per the `EvaluationsTabProps` interface), but validation-error paths return `undefined` via bare `return;`. While this works (since `undefined !== true`), it's cleaner to return `false` explicitly for consistency and type safety.

♻️ Suggested fix for consistent return type
```diff
 const handleRunEvaluation = async () => {
   if (!selectedKeyId) {
     toast.error('Please select an API key first');
-    return;
+    return false;
   }
   if (!selectedDatasetId) {
     toast.error('Please select a dataset first');
-    return;
+    return false;
   }
   if (!experimentName.trim()) {
     toast.error('Please enter an evaluation name');
-    return;
+    return false;
   }
   if (!selectedConfigId || !selectedConfigVersion) {
     toast.error('Please select a configuration before running evaluation');
-    return;
+    return false;
   }
   const selectedKey = apiKeys.find(k => k.id === selectedKeyId);
   if (!selectedKey) {
     toast.error('Selected API key not found');
-    return;
+    return false;
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/evaluations/page.tsx` around lines 215 - 237, The validation branches in handleRunEvaluation currently use bare returns which yield undefined; update each early-exit validation (checks for selectedKeyId, selectedDatasetId, experimentName.trim(), selectedConfigId/selectedConfigVersion, and missing selectedKey lookup) to return false explicitly so the function consistently resolves Promise<boolean>; ensure the successful path still returns true (or a boolean) at the end of handleRunEvaluation to match the EvaluationsTabProps signature.

app/components/evaluations/EvaluationsTab.tsx (2)
121-123: Missing 10-second polling for job status updates.

The current implementation only fetches evaluations on mount and manual refresh. Jobs with "processing" or "pending" status won't auto-update, requiring users to manually refresh.
Based on learnings: "Implement polling every 10 seconds for job status updates in evaluation workflows instead of using WebSockets or server-sent events."
♻️ Suggested polling implementation
```diff
 useEffect(() => {
   if (selectedKeyId) fetchEvaluations();
 }, [selectedKeyId, fetchEvaluations]);
+
+// Poll for job status updates every 10 seconds
+useEffect(() => {
+  if (!selectedKeyId) return;
+
+  const hasInProgressJobs = evalJobs.some(
+    job => job.status.toLowerCase() === 'processing' || job.status.toLowerCase() === 'pending'
+  );
+
+  if (!hasInProgressJobs) return;
+
+  const intervalId = setInterval(fetchEvaluations, 10000);
+  return () => clearInterval(intervalId);
+}, [selectedKeyId, evalJobs, fetchEvaluations]);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/components/evaluations/EvaluationsTab.tsx` around lines 121 - 123, The current useEffect that calls fetchEvaluations should be extended to implement a 10-second polling loop while there are any evaluations with status "processing" or "pending": when selectedKeyId is present, call fetchEvaluations immediately, then start a setInterval that calls fetchEvaluations every 10_000 ms; clear the interval on cleanup or when selectedKeyId becomes falsy; additionally stop/clear the interval if the latest evaluations (from the component state, e.g., evaluations) contain no "processing" or "pending" jobs to avoid unnecessary polling. Ensure the effect depends on selectedKeyId, fetchEvaluations and evaluations (or a derived boolean like hasPendingJobs) so polling starts/stops correctly and always clearInterval in the cleanup function.
113-119: Consider batching or memoizing assistant config fetches.

The effect iterates all `evalJobs` on every change, triggering individual fetch calls for each unique `assistant_id`. For large job lists with many distinct assistants, this could result in a burst of network requests. Consider:
- Extracting unique missing IDs first
- Adding a small delay between requests
- Or batching if the API supports it
This is acceptable for typical use cases but may scale poorly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/components/evaluations/EvaluationsTab.tsx` around lines 113 - 119, The effect over evalJobs currently loops and calls fetchAssistantConfig for each job, causing many concurrent requests; change it to first derive a deduplicated list of missing assistant IDs (useMemo over evalJobs and assistantConfigs to compute missingIds), then either call a batching helper (e.g., implement fetchAssistantConfigsBatch(missingIds) and call that from the useEffect) or issue requests sequentially with a small delay (e.g., iterate missingIds and await a short sleep between fetchAssistantConfig(id) calls or use Promise.all with rate-limiting). Update the useEffect to depend on the memoized missingIds (not raw assistantConfigs) and add a new helper function (fetchAssistantConfigsBatch) or a small rate-limiter to prevent bursts.
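The dedupe-then-throttle flow the prompt above describes can be sketched framework-free. The names below (`EvalJob`, `computeMissingIds`, `fetchConfigsSequentially`) and the shapes of the job and config objects are illustrative assumptions, not the repo's actual types:

```typescript
// Minimal sketch, assuming job and config-cache shapes like these:
interface EvalJob {
  assistant_id: string;
  status: string;
}
type ConfigMap = Record<string, unknown>;

// Derive the deduplicated list of assistant IDs that have no cached config yet.
function computeMissingIds(jobs: EvalJob[], cached: ConfigMap): string[] {
  const missing = new Set<string>();
  for (const job of jobs) {
    if (!(job.assistant_id in cached)) missing.add(job.assistant_id);
  }
  return [...missing];
}

// Fetch configs one at a time with a small delay between requests to avoid
// a burst. `fetchOne` stands in for the component's real fetchAssistantConfig.
async function fetchConfigsSequentially(
  ids: string[],
  fetchOne: (id: string) => Promise<unknown>,
  delayMs = 100,
): Promise<ConfigMap> {
  const result: ConfigMap = {};
  for (const id of ids) {
    result[id] = await fetchOne(id);
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return result;
}
```

In the component, the `useEffect` would memoize `computeMissingIds(evalJobs, assistantConfigs)` and only kick off fetches when that list is non-empty.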
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/components/ConfigSelector.tsx`:
- Around line 396-400: The chevron icons rendered in the conditional using
promptExpanded (ChevronUpIcon and ChevronDownIcon) are missing sizing classes;
update those components to include the same utility size classes used elsewhere
(e.g., add className="w-3.5 h-3.5" or "w-4 h-4" to ChevronUpIcon and
ChevronDownIcon) so they render consistently with other icons in this component
(match the sizes used at other usages around the file).
In `@app/components/evaluations/DatasetsTab.tsx`:
- Around line 200-227: The label elements in DatasetsTab are not associated with
their controls; update each label to use htmlFor and give the corresponding
input/select a matching id (e.g., id="dataset-name" for the input using
datasetName and id="dataset-description" for the one using datasetDescription),
and do the same for the other unlabeled fields referenced around lines 230-285
(matching ids to the state setters like setDatasetName/setDatasetDescription and
any select handlers) so assistive tech can correctly associate labels with their
controls.
- Around line 455-531: The modal opened via viewModalData lacks accessible
dialog behavior: add role="dialog", aria-modal="true", and an aria-labelledby
referencing the modal title element (e.g., the h3 showing viewModalData.name);
implement Escape-key handling to call setViewModalData(null) and add focus
management (on open save document.activeElement, move focus into the modal
container—give it tabIndex={-1}—and trap/tab-cycle focus while open, then
restore focus when closed). Put the same changes for the other overlay (the one
referenced in the review). Ensure backdrop click still closes but keyboard and
screen-reader users get proper dialog semantics and focus restoration.
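The keyboard side of that dialog behavior can be sketched as plain functions; the names are hypothetical, and the real component would wire them into React event handlers plus `document.activeElement` bookkeeping:

```typescript
// Close the dialog when Escape is pressed; ignore every other key.
function handleDialogKeyDown(key: string, close: () => void): void {
  if (key === "Escape") close();
}

// Focus trap: given the index of the currently focused element among the
// dialog's focusable elements, return the index to focus next. Shift+Tab
// moves backwards; both directions wrap so focus stays inside the dialog.
function nextFocusIndex(current: number, count: number, shiftKey: boolean): number {
  if (count === 0) return -1; // nothing focusable
  const step = shiftKey ? -1 : 1;
  return (current + step + count) % count;
}
```

A `keydown` listener on the modal container would call `handleDialogKeyDown(e.key, () => setViewModalData(null))` and, on Tab, `preventDefault()` and focus the element at `nextFocusIndex(...)`.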
- Around line 118-140: The CSV parser corrupts valid CSV by pre-splitting
csvText on '\n' and trimming every cell; update the parsing flow so you do not
split into lines before handling quoted newlines and do not trim cell contents.
Use the existing parseRow logic approach but apply it to the full csvText to
produce rows by scanning characters and detecting row boundaries only when
encountering an unquoted newline (so quoted multiline fields are preserved),
keep cell values exactly as parsed (no .trim()), and ensure headers and rows
(variables headers and rows) are derived from that parser; for downloads, use
the original csvText or serialize from the preserved parsed values without
trimming to avoid altering whitespace/newlines.
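A character-scanning parser along those lines might look like the following sketch. It assumes RFC 4180-style quoting and is not the project's actual `parseRow` implementation:

```typescript
// Parse CSV by scanning characters, so quoted fields may contain commas,
// doubled quotes (""), and newlines. Cell values are kept verbatim - no trim.
function parseCsv(csvText: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let cell = "";
  let inQuotes = false;
  for (let i = 0; i < csvText.length; i++) {
    const ch = csvText[i];
    if (inQuotes) {
      if (ch === '"') {
        if (csvText[i + 1] === '"') { cell += '"'; i++; } // escaped quote
        else inQuotes = false;                            // closing quote
      } else {
        cell += ch; // includes newlines inside quoted fields
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(cell); cell = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && csvText[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(cell); cell = "";
      rows.push(row); row = [];
    } else {
      cell += ch;
    }
  }
  if (cell !== "" || row.length > 0) { row.push(cell); rows.push(row); }
  return rows;
}
```

Headers would be `parseCsv(csvText)[0]` and data rows the rest; downloads can reuse the original `csvText` untouched.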
- Around line 279-285: The file picker is currently hidden (input ref
fileInputRef, onChange onFileSelect) while the visible upload affordance is a
non-focusable div, preventing keyboard users from opening the file dialog; fix
by giving the real input an id and using a semantic, focusable control
(preferably a <label htmlFor="..."> wrapping the visible affordance or a
<button>) so keyboard users can activate it, or if you keep the div add
role="button" tabIndex={0} and an onKeyDown handler that triggers
fileInputRef.current?.click(); ensure the input remains type="file"
accept=".csv" with onChange={onFileSelect} and update the visible upload
affordance code paths (the div currently used around lines with the upload
affordance) to use the new label/button or include the role/tabindex/keyboard
handler.
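If the div-as-button route is taken, the keyboard fallback can be isolated as a standalone handler; `openPicker` stands in for `fileInputRef.current?.click()` and the event shape is narrowed to what the sketch needs:

```typescript
// Activate the hidden file input when Enter or Space is pressed on the
// focusable upload affordance (role="button", tabIndex={0}).
function handleUploadKeyDown(
  e: { key: string; preventDefault: () => void },
  openPicker: () => void,
): void {
  if (e.key === "Enter" || e.key === " ") {
    e.preventDefault(); // stop Space from scrolling the page
    openPicker();
  }
}
```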
In `@app/components/evaluations/EvalRunCard.tsx`:
- Around line 21-23: The card treats only 'completed' as terminal; update the
completion checks to include 'success' so successful runs enable "View Results"
consistently. Change the isCompleted evaluation (wherever job.status is
compared) to consider job.status?.toLowerCase() === 'completed' ||
job.status?.toLowerCase() === 'success' (affecting the isCompleted variable and
any other places doing the same check around getStatusColor, getScoreObject, and
UI conditionals mentioned) so the UI and button enablement match the styling
from getStatusColor.
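The widened terminal-status check can live in one small predicate so the card, `getStatusColor`, and the button enablement all agree; the name `isCompletedStatus` is illustrative:

```typescript
// Treat both 'completed' and 'success' as terminal-successful states,
// tolerating mixed casing and a missing status.
function isCompletedStatus(status: string | undefined): boolean {
  const s = status?.toLowerCase();
  return s === "completed" || s === "success";
}
```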
---
Nitpick comments:
In `@app/components/evaluations/EvaluationsTab.tsx`:
- Around line 121-123: The current useEffect that calls fetchEvaluations should
be extended to implement a 10-second polling loop while there are any
evaluations with status "processing" or "pending": when selectedKeyId is
present, call fetchEvaluations immediately, then start a setInterval that calls
fetchEvaluations every 10_000 ms; clear the interval on cleanup or when
selectedKeyId becomes falsy; additionally stop/clear the interval if the latest
evaluations (from the component state, e.g., evaluations) contain no
"processing" or "pending" jobs to avoid unnecessary polling. Ensure the effect
depends on selectedKeyId, fetchEvaluations and evaluations (or a derived boolean
like hasPendingJobs) so polling starts/stops correctly and always clearInterval
in the cleanup function.
- Around line 113-119: The effect over evalJobs currently loops and calls
fetchAssistantConfig for each job, causing many concurrent requests; change it
to first derive a deduplicated list of missing assistant IDs (useMemo over
evalJobs and assistantConfigs to compute missingIds), then either call a
batching helper (e.g., implement fetchAssistantConfigsBatch(missingIds) and call
that from the useEffect) or issue requests sequentially with a small delay
(e.g., iterate missingIds and await a short sleep between
fetchAssistantConfig(id) calls or use Promise.all with rate-limiting). Update
the useEffect to depend on the memoized missingIds (not raw assistantConfigs)
and add a new helper function (fetchAssistantConfigsBatch) or a small
rate-limiter to prevent bursts.
In `@app/evaluations/page.tsx`:
- Around line 215-237: The validation branches in handleRunEvaluation currently
use bare returns which yield undefined; update each early-exit validation
(checks for selectedKeyId, selectedDatasetId, experimentName.trim(),
selectedConfigId/selectedConfigVersion, and missing selectedKey lookup) to
return false explicitly so the function consistently resolves Promise<boolean>;
ensure the successful path still returns true (or a boolean) at the end of
handleRunEvaluation to match the EvaluationsTabProps signature.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a94fc86-bbff-4fc5-b694-1a53d74b0683
📒 Files selected for processing (12)
app/components/ConfigSelector.tsx
app/components/evaluations/DatasetsTab.tsx
app/components/evaluations/EvalDatasetDescription.tsx
app/components/evaluations/EvalRunCard.tsx
app/components/evaluations/EvaluationsTab.tsx
app/components/icons/evaluations/CheckIcon.tsx
app/components/icons/evaluations/ChevronDownIcon.tsx
app/components/icons/evaluations/ChevronUpIcon.tsx
app/components/icons/evaluations/EditIcon.tsx
app/components/icons/evaluations/GearIcon.tsx
app/components/icons/index.tsx
app/evaluations/page.tsx
Issue: #78
Summary:
Additional Improvements
Summary by CodeRabbit
New Features
Refactor