
Feat(evaluation): Improve UI usability issues #80

Merged
Ayush8923 merged 2 commits into main from feat/evaluation-ui-usability-improvements
Mar 20, 2026
Conversation

Ayush8923 (Collaborator) commented on Mar 20, 2026

Issue: #78

Summary of issues addressed:

  • Max Results value overlapping with Knowledge Base ID
  • Missing date and time information for evaluation runs
  • Limited visibility in the prompt preview section

Additional Improvements

  • Performed frontend code cleanup and refactoring
  • Improved code structure and readability
  • Reduced duplication and streamlined components

Summary by CodeRabbit

  • New Features

    • Dataset management: Create, upload, view, and delete CSV datasets with drag-and-drop support
    • Evaluation job monitoring with status filtering and run summary cards
    • Expandable prompt preview in configuration selector
  • Refactor

    • Reorganized evaluation page components for improved modularity and maintainability

@Ayush8923 Ayush8923 self-assigned this Mar 20, 2026
coderabbitai bot commented on Mar 20, 2026

📝 Walkthrough

This PR refactors the evaluations UI by extracting implementations from the main page component into reusable, composable components: DatasetsTab, EvaluationsTab, EvalRunCard, and EvalDatasetDescription. It adds icon components (CheckIcon, ChevronDownIcon, ChevronUpIcon, EditIcon, GearIcon) and enhances ConfigSelector with prompt preview expansion functionality. The page component is substantially simplified by delegating UI logic to these new modules.

Changes

ConfigSelector Enhancement
Files: app/components/ConfigSelector.tsx
Added prompt preview expansion/collapse toggle with overflow detection via useLayoutEffect and promptRef. Replaced inline SVG icons with imported icon components (EditIcon, GearIcon, ChevronDownIcon, CheckIcon). Updated disabled button styling with cursor-pointer and disabled:cursor-not-allowed. Enhanced "Knowledge Base IDs" layout with break-all text wrapping and col-span-2 grid adjustment.

Evaluation Components
Files: app/components/evaluations/DatasetsTab.tsx, app/components/evaluations/EvaluationsTab.tsx, app/components/evaluations/EvalRunCard.tsx, app/components/evaluations/EvalDatasetDescription.tsx
Added four new React components for evaluation UI. DatasetsTab provides CSV dataset upload, drag-and-drop handling, deletion, and viewing with authenticated API calls. EvaluationsTab manages evaluation job submission (via ConfigSelector), filtering by status, and displays run cards with assistant config caching. EvalRunCard renders individual evaluation run summaries with config/results views. EvalDatasetDescription truncates descriptions to 100 chars with an expand/collapse toggle.

Icon Components
Files: app/components/icons/evaluations/CheckIcon.tsx, app/components/icons/evaluations/ChevronDownIcon.tsx, app/components/icons/evaluations/ChevronUpIcon.tsx, app/components/icons/evaluations/EditIcon.tsx, app/components/icons/evaluations/GearIcon.tsx
Added five new icon components with optional className and style props, each rendering a distinct SVG glyph (currentColor stroke/fill). Provides reusable icon UI elements across evaluation and config components.

Icon Export Index
Files: app/components/icons/index.tsx
Added centralized re-export module for the five evaluation icon components, simplifying import statements across the codebase.

Page Refactoring
Files: app/evaluations/page.tsx
Replaced ~1,000 lines of inline dataset and evaluation UI logic with imports of the DatasetsTab and EvaluationsTab components. Updated imports to use absolute aliases (@/app/...). Removed local implementations of dataset CRUD, evaluation job fetching, run card rendering, and modal management, delegating to the new components while maintaining state/prop threading.
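The EvalDatasetDescription behavior summarized above (truncate to 100 characters with an expand/collapse toggle) reduces to a small pure rule. A minimal sketch, with a hypothetical helper name; the real component keeps the expanded flag in React state:

```typescript
// Hypothetical helper mirroring EvalDatasetDescription's collapse rule:
// show at most 100 characters when collapsed, the full text when expanded.
const TRUNCATE_AT = 100;

function visibleDescription(description: string, expanded: boolean): string {
  if (expanded || description.length <= TRUNCATE_AT) return description;
  return description.slice(0, TRUNCATE_AT) + '…';
}
```

The component would render this string and show the toggle only when the description length exceeds the threshold.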

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant EvaluationsTab as EvaluationsTab Component
    participant ConfigSelector as ConfigSelector
    participant API as Backend API
    participant DatasetsTab as DatasetsTab Component

    User->>EvaluationsTab: Select dataset & config, enter experiment name
    EvaluationsTab->>ConfigSelector: Render config selector
    ConfigSelector-->>EvaluationsTab: Return selected config
    User->>EvaluationsTab: Click "Run Evaluation"
    EvaluationsTab->>API: POST /api/evaluations (with config & dataset)
    API-->>EvaluationsTab: Return job ID & status
    EvaluationsTab->>API: GET /api/evaluations (poll jobs list)
    API-->>EvaluationsTab: Return evaluation jobs with statuses
    EvaluationsTab->>API: GET /api/assistant/:assistantId (per-job config)
    API-->>EvaluationsTab: Return assistant config
    EvaluationsTab->>EvaluationsTab: Render EvalRunCard with job & config
    User->>EvaluationsTab: Filter by status / Click refresh
    EvaluationsTab->>API: GET /api/evaluations (with filters)
    API-->>EvaluationsTab: Return filtered job list
    EvaluationsTab->>EvaluationsTab: Re-render cards with updated jobs
    
    User->>DatasetsTab: Upload CSV via drag-drop
    DatasetsTab->>API: POST /api/evaluations/datasets (create dataset)
    API-->>DatasetsTab: Return dataset ID
    User->>DatasetsTab: Click "View" on dataset
    DatasetsTab->>API: GET /api/evaluations/datasets/:id (fetch content)
    API-->>DatasetsTab: Return CSV text & signed URL
    DatasetsTab->>DatasetsTab: Parse CSV, render table
    User->>DatasetsTab: Click "Download"
    DatasetsTab->>User: Generate & download Blob

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #56: Modifies the same ConfigSelector.tsx component with overlapping functionality adjustments.
  • PR #4: Introduces the same icon components (EditIcon, GearIcon, ChevronDownIcon, CheckIcon) and icons index infrastructure.
  • PR #44: Modifies app/evaluations/page.tsx with complementary UI label and error handling changes.

Suggested labels

enhancement

Suggested reviewers

  • Prajna1999
  • nishika26
  • AkhileshNegi

🐰 Hop, skip, and a component jump!
Evaluations refactored, no more UI clump,
Icons polished, datasets now flow,
ConfigSelector expands—watch the previews glow!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage (⚠️ Warning): Docstring coverage is 8.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

Title check (❓ Inconclusive): The title "Feat(evaluation): Improve UI usability issues" is vague and generic; "usability issues" does not clearly convey the specific changes made in this substantial refactoring and component addition. Resolution: make the title more specific, such as "Extract evaluation tab components" or "Refactor evaluations page with new component library", to better represent the scope of the changes.

✅ Passed checks (1 passed)

Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (3)
app/evaluations/page.tsx (1)

215-237: Inconsistent return type: early returns yield undefined instead of false.

The function signature implies Promise<boolean> (per the EvaluationsTabProps interface), but validation-error paths return undefined via bare return;. While this works (since undefined !== true), it's cleaner to return false explicitly for consistency and type safety.

♻️ Suggested fix for consistent return type
   const handleRunEvaluation = async () => {
     if (!selectedKeyId) {
       toast.error('Please select an API key first');
-      return;
+      return false;
     }
     if (!selectedDatasetId) {
       toast.error('Please select a dataset first');
-      return;
+      return false;
     }
     if (!experimentName.trim()) {
       toast.error('Please enter an evaluation name');
-      return;
+      return false;
     }
     if (!selectedConfigId || !selectedConfigVersion) {
       toast.error('Please select a configuration before running evaluation');
-      return;
+      return false;
     }

     const selectedKey = apiKeys.find(k => k.id === selectedKeyId);
     if (!selectedKey) {
       toast.error('Selected API key not found');
-      return;
+      return false;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/evaluations/page.tsx` around lines 215 - 237, The validation branches in
handleRunEvaluation currently use bare returns which yield undefined; update
each early-exit validation (checks for selectedKeyId, selectedDatasetId,
experimentName.trim(), selectedConfigId/selectedConfigVersion, and missing
selectedKey lookup) to return false explicitly so the function consistently
resolves Promise<boolean>; ensure the successful path still returns true (or a
boolean) at the end of handleRunEvaluation to match the EvaluationsTabProps
signature.
app/components/evaluations/EvaluationsTab.tsx (2)

121-123: Missing 10-second polling for job status updates.

The current implementation only fetches evaluations on mount and manual refresh. Jobs with "processing" or "pending" status won't auto-update, requiring users to manually refresh.

Based on learnings: "Implement polling every 10 seconds for job status updates in evaluation workflows instead of using WebSockets or server-sent events."

♻️ Suggested polling implementation
   useEffect(() => {
     if (selectedKeyId) fetchEvaluations();
   }, [selectedKeyId, fetchEvaluations]);
+
+  // Poll for job status updates every 10 seconds
+  useEffect(() => {
+    if (!selectedKeyId) return;
+    
+    const hasInProgressJobs = evalJobs.some(
+      job => job.status.toLowerCase() === 'processing' || job.status.toLowerCase() === 'pending'
+    );
+    
+    if (!hasInProgressJobs) return;
+    
+    const intervalId = setInterval(fetchEvaluations, 10000);
+    return () => clearInterval(intervalId);
+  }, [selectedKeyId, evalJobs, fetchEvaluations]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/evaluations/EvaluationsTab.tsx` around lines 121 - 123, The
current useEffect that calls fetchEvaluations should be extended to implement a
10-second polling loop while there are any evaluations with status "processing"
or "pending": when selectedKeyId is present, call fetchEvaluations immediately,
then start a setInterval that calls fetchEvaluations every 10_000 ms; clear the
interval on cleanup or when selectedKeyId becomes falsy; additionally stop/clear
the interval if the latest evaluations (from the component state, e.g.,
evaluations) contain no "processing" or "pending" jobs to avoid unnecessary
polling. Ensure the effect depends on selectedKeyId, fetchEvaluations and
evaluations (or a derived boolean like hasPendingJobs) so polling starts/stops
correctly and always clearInterval in the cleanup function.

113-119: Consider batching or memoizing assistant config fetches.

The effect iterates all evalJobs on every change, triggering individual fetch calls for each unique assistant_id. For large job lists with many distinct assistants, this could result in a burst of network requests. Consider:

  1. Extracting unique missing IDs first
  2. Adding a small delay between requests
  3. Or batching if the API supports it

This is acceptable for typical use cases but may scale poorly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/components/evaluations/EvaluationsTab.tsx` around lines 113 - 119, The
effect over evalJobs currently loops and calls fetchAssistantConfig for each
job, causing many concurrent requests; change it to first derive a deduplicated
list of missing assistant IDs (useMemo over evalJobs and assistantConfigs to
compute missingIds), then either call a batching helper (e.g., implement
fetchAssistantConfigsBatch(missingIds) and call that from the useEffect) or
issue requests sequentially with a small delay (e.g., iterate missingIds and
await a short sleep between fetchAssistantConfig(id) calls or use Promise.all
with rate-limiting). Update the useEffect to depend on the memoized missingIds
(not raw assistantConfigs) and add a new helper function
(fetchAssistantConfigsBatch) or a small rate-limiter to prevent bursts.
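The first step of that suggestion, extracting the unique missing assistant IDs before fetching, can be sketched as a pure helper. Names here are hypothetical; the real component would feed the result into its fetch logic:

```typescript
interface EvalJob {
  assistant_id: string;
}

// Collect each assistant_id that appears in the job list but is not yet
// in the cache, deduplicated, so the caller issues one fetch per unique ID
// instead of one per job.
function missingAssistantIds(
  jobs: EvalJob[],
  cache: Record<string, unknown>,
): string[] {
  const missing = new Set<string>();
  for (const job of jobs) {
    if (!(job.assistant_id in cache)) missing.add(job.assistant_id);
  }
  return [...missing];
}
```

A useMemo over evalJobs and the config cache could wrap this, so the fetch effect depends only on the memoized ID list.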
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/components/ConfigSelector.tsx`:
- Around line 396-400: The chevron icons rendered in the conditional using
promptExpanded (ChevronUpIcon and ChevronDownIcon) are missing sizing classes;
update those components to include the same utility size classes used elsewhere
(e.g., add className="w-3.5 h-3.5" or "w-4 h-4" to ChevronUpIcon and
ChevronDownIcon) so they render consistently with other icons in this component
(match the sizes used at other usages around the file).

In `@app/components/evaluations/DatasetsTab.tsx`:
- Around line 200-227: The label elements in DatasetsTab are not associated with
their controls; update each label to use htmlFor and give the corresponding
input/select a matching id (e.g., id="dataset-name" for the input using
datasetName and id="dataset-description" for the one using datasetDescription),
and do the same for the other unlabeled fields referenced around lines 230-285
(matching ids to the state setters like setDatasetName/setDatasetDescription and
any select handlers) so assistive tech can correctly associate labels with their
controls.
- Around line 455-531: The modal opened via viewModalData lacks accessible
dialog behavior: add role="dialog", aria-modal="true", and an aria-labelledby
referencing the modal title element (e.g., the h3 showing viewModalData.name);
implement Escape-key handling to call setViewModalData(null) and add focus
management (on open save document.activeElement, move focus into the modal
container—give it tabIndex={-1}—and trap/tab-cycle focus while open, then
restore focus when closed). Put the same changes for the other overlay (the one
referenced in the review). Ensure backdrop click still closes but keyboard and
screen-reader users get proper dialog semantics and focus restoration.
- Around line 118-140: The CSV parser corrupts valid CSV by pre-splitting
csvText on '\n' and trimming every cell; update the parsing flow so you do not
split into lines before handling quoted newlines and do not trim cell contents.
Use the existing parseRow logic approach but apply it to the full csvText to
produce rows by scanning characters and detecting row boundaries only when
encountering an unquoted newline (so quoted multiline fields are preserved),
keep cell values exactly as parsed (no .trim()), and ensure headers and rows
(variables headers and rows) are derived from that parser; for downloads, use
the original csvText or serialize from the preserved parsed values without
trimming to avoid altering whitespace/newlines.
- Around line 279-285: The file picker is currently hidden (input ref
fileInputRef, onChange onFileSelect) while the visible upload affordance is a
non-focusable div, preventing keyboard users from opening the file dialog; fix
by giving the real input an id and using a semantic, focusable control
(preferably a <label htmlFor="..."> wrapping the visible affordance or a
<button>) so keyboard users can activate it, or if you keep the div add
role="button" tabindex={0} and an onKeyDown handler that triggers
fileInputRef.current?.click(); ensure the input remains type="file"
accept=".csv" with onChange={onFileSelect} and update the visible upload
affordance code paths (the div currently used around lines with the upload
affordance) to use the new label/button or include the role/tabindex/keyboard
handler.
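The quote-aware parsing flow described for DatasetsTab can be sketched as a standalone function. This is an illustrative sketch, not the component's actual code: it scans the full text character by character so quoted commas, quoted newlines, and escaped quotes ("") survive, and it never trims cell contents:

```typescript
// Minimal quote-aware CSV parser: row boundaries are detected only at
// unquoted newlines, so quoted multiline fields are preserved intact,
// and cell values are kept exactly as parsed (no trimming).
function parseCsv(csvText: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let cell = '';
  let inQuotes = false;
  for (let i = 0; i < csvText.length; i++) {
    const ch = csvText[i];
    if (inQuotes) {
      if (ch === '"') {
        if (csvText[i + 1] === '"') { cell += '"'; i++; } // escaped quote
        else inQuotes = false;                            // closing quote
      } else cell += ch;                                  // includes newlines
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(cell); cell = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && csvText[i + 1] === '\n') i++;    // collapse CRLF
      row.push(cell); cell = '';
      rows.push(row); row = [];
    } else cell += ch;
  }
  if (cell !== '' || row.length > 0) { row.push(cell); rows.push(row); }
  return rows;
}
```

For downloads, keeping the original csvText avoids re-serialization altering whitespace or line endings.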

In `@app/components/evaluations/EvalRunCard.tsx`:
- Around line 21-23: The card treats only 'completed' as terminal; update the
completion checks to include 'success' so successful runs enable "View Results"
consistently. Change the isCompleted evaluation (wherever job.status is
compared) to consider job.status?.toLowerCase() === 'completed' ||
job.status?.toLowerCase() === 'success' (affecting the isCompleted variable and
any other places doing the same check around getStatusColor, getScoreObject, and
UI conditionals mentioned) so the UI and button enablement match the styling
from getStatusColor.
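The widened terminal-status check can be isolated as a small predicate (hypothetical name, mirroring the suggested comparison):

```typescript
// Treat both 'completed' and 'success' as terminal, case-insensitively,
// so "View Results" enablement matches the styling logic in getStatusColor.
function isCompletedStatus(status: string | undefined): boolean {
  const s = status?.toLowerCase();
  return s === 'completed' || s === 'success';
}
```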


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6a94fc86-bbff-4fc5-b694-1a53d74b0683

📥 Commits

Reviewing files that changed from the base of the PR, between commits f5fa6ce and 0837c95.

📒 Files selected for processing (12)
  • app/components/ConfigSelector.tsx
  • app/components/evaluations/DatasetsTab.tsx
  • app/components/evaluations/EvalDatasetDescription.tsx
  • app/components/evaluations/EvalRunCard.tsx
  • app/components/evaluations/EvaluationsTab.tsx
  • app/components/icons/evaluations/CheckIcon.tsx
  • app/components/icons/evaluations/ChevronDownIcon.tsx
  • app/components/icons/evaluations/ChevronUpIcon.tsx
  • app/components/icons/evaluations/EditIcon.tsx
  • app/components/icons/evaluations/GearIcon.tsx
  • app/components/icons/index.tsx
  • app/evaluations/page.tsx

Ayush8923 merged commit 069d9f2 into main on Mar 20, 2026.
1 check passed.
Ayush8923 deleted the feat/evaluation-ui-usability-improvements branch on March 20, 2026 at 13:02.