Skip to content

Assessment: Gemini Batch Fix & Dataset Preview Row Limiting#820

Merged
vprashrex merged 6 commits into
mainfrom
chore/assessment-gemini-batch-fix
May 13, 2026
Merged

Assessment: Gemini Batch Fix & Dataset Preview Row Limiting#820
vprashrex merged 6 commits into
mainfrom
chore/assessment-gemini-batch-fix

Conversation

@vprashrex
Copy link
Copy Markdown
Collaborator

@vprashrex vprashrex commented May 9, 2026

Target issue: #830

Summary

This pull request addresses two key issues within the assessment module:

  1. Fixed bugs in Gemini batch processing to improve the reliability and stability of AI assessment execution and testing workflows.
  2. Added limit_row support to the dataset preview endpoint, allowing clients to fetch only a limited number of dataset rows instead of the full dataset.
    Previously, large dataset responses caused frontend browser lag and UI freezes due to excessive data rendering on the client side. By introducing row limiting, the frontend can now request lightweight previews (for example, 5 rows), resulting in improved performance and a smoother user experience.

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features

    • GET dataset endpoint can return a lightweight preview (column headers + first N rows) via optional limit_rows (1–100).
  • Documentation

    • API docs updated to describe the new limit_rows preview option and behavior.
  • Refactor

    • Batch submissions now include row identifiers at the top level for clearer row tracking.
  • Bug Fixes

    • Preview requests return appropriate HTTP errors for missing/invalid or unsupported files.
  • Tests

    • Coverage added/updated for dataset preview behavior and batch identifier location.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds optional dataset preview (headers + first N rows) to GET /datasets/{id} with models, service parsing CSV/XLSX, docs and tests; moves Gemini/Google JSONL row identifier to a top-level key (tests updated).

Changes

Assessment dataset preview

Layer / File(s) Summary
Preview Pydantic models
backend/app/models/assessment.py
Adds AssessmentDatasetPreview and an optional preview field to AssessmentDatasetResponse.
Preview parsing service
backend/app/services/assessment/dataset.py
Adds _stringify, _preview_csv, _preview_excel, and preview_dataset to fetch and parse CSV/XLSX previews and return headers + rows, with HTTP error handling.
API handler and wiring for preview
backend/app/api/routes/assessment/datasets.py
Imports preview types/service, extends _dataset_to_response to accept preview, adds limit_rows query param, builds AssessmentDatasetPreview when requested, and includes it in responses.
Endpoint docs
backend/app/api/docs/assessment/get_dataset.md
Documents limit_rows (1–100) parameter and that omitting it avoids fetching the underlying file.
Preview tests
backend/app/tests/assessment/test_dataset.py, backend/app/tests/assessment/test_routes.py
Adds tests for CSV/XLSX preview outputs, encoding fallbacks, error cases, and route-level preview behavior.

Gemini Batch JSONL Schema

Layer / File(s) Summary
JSONL Row Identifier Schema
backend/app/crud/assessment/batch.py, backend/app/tests/assessment/test_batch.py
build_google_jsonl now emits row identifier as top-level key instead of metadata.key; test assertion updated accordingly.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API
  participant Service
  participant ObjectStore
  Client->>API: GET /datasets/{id}?limit_rows=N
  API->>Service: preview_dataset(dataset, limit)
  Service->>ObjectStore: fetch object_store_url bytes
  ObjectStore-->>Service: raw file bytes / error
  Service-->>API: (headers, rows) or raise HTTPException
  API-->>Client: AssessmentDatasetResponse (with preview) or error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • kartpop
  • AkhileshNegi
  • Ayush8923

Poem

🐰 A key pops up where it’s easy to see,
Preview hops in with a header and three,
CSV and sheets, trimmed neat and sweet,
Rows and columns met on a tiny treat,
Hooray — the backend’s lighter on its feet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.24% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes both main changes: fixing a Gemini batch schema issue and adding dataset preview row limiting functionality.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/assessment-gemini-batch-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@vprashrex vprashrex requested a review from Prajna1999 May 11, 2026 05:34
@vprashrex vprashrex self-assigned this May 11, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/api/routes/assessment/datasets.py`:
- Around line 149-161: The truncated flag is over-reported because the code
treats len(rows) >= limit_rows as truncated; to fix, request one extra row from
preview_assessment_dataset (call with limit=limit_rows + 1), set truncated =
len(rows) > limit_rows, and if truncated trim rows to the original limit_rows
before constructing AssessmentDatasetPreview (use the existing names session,
dataset, limit_rows, preview_assessment_dataset, headers, rows, and
AssessmentDatasetPreview).

In `@backend/app/services/assessment/dataset.py`:
- Around line 197-219: The current preview logic defaults to CSV for any
non-".xlsx" file_ext which can silently mis-handle missing/invalid metadata;
update the preview path in the preview function (where file_ext is derived) to
validate file_ext explicitly (normalize with .lower() and strip), and only allow
known extensions like ".xlsx" and ".csv"; if file_ext is None or not in the
allowed set, raise HTTPException(status_code=422, detail="Unsupported or missing
file extension.") instead of calling _preview_csv, otherwise call _preview_excel
for ".xlsx" and _preview_csv for ".csv".
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 60a75072-f1b1-49b2-b789-0b3989427bea

📥 Commits

Reviewing files that changed from the base of the PR and between e08abbc and 15ad20d.

📒 Files selected for processing (5)
  • backend/app/api/docs/assessment/get_dataset.md
  • backend/app/api/routes/assessment/datasets.py
  • backend/app/models/assessment.py
  • backend/app/services/assessment/dataset.py
  • backend/app/tests/assessment/test_batch.py
✅ Files skipped from review due to trivial changes (1)
  • backend/app/api/docs/assessment/get_dataset.md

Comment thread backend/app/api/routes/assessment/datasets.py
Comment thread backend/app/services/assessment/dataset.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 98.07692% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/services/assessment/dataset.py 94.11% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

@vprashrex vprashrex changed the title Assessment (HotFix): Gemini Batch Fix Assessment: Gemini Batch Fix May 12, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
backend/app/tests/assessment/test_dataset.py (3)

148-163: 💤 Low value

Consider importing openpyxl at module level.

openpyxl is already imported at line 7 for InvalidFileException. Importing it again inside the test function (lines 149, 151) is inconsistent with the module-level import pattern.

♻️ Proposed consolidation

At the top of the file, consolidate the imports:

 from openpyxl.utils.exceptions import InvalidFileException
+import openpyxl
+import io

Then remove the inline imports in the test functions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/assessment/test_dataset.py` around lines 148 - 163, The
test function test_preview_excel_returns_headers_and_rows imports openpyxl
locally even though openpyxl is already imported at module level for
InvalidFileException; remove the inline imports inside
test_preview_excel_returns_headers_and_rows and any other tests, and add/ensure
a single module-level import for openpyxl alongside InvalidFileException so the
test uses that top-level import instead.

142-146: ⚡ Quick win

Strengthen the latin-1 fallback assertion.

The test claims to verify latin-1 fallback but only checks that the value starts with "ca". It should verify that the invalid UTF-8 byte \xff was correctly decoded as ÿ (U+00FF in latin-1) rather than dropped.

✨ Proposed stronger assertion
     def test_preview_csv_handles_latin1_fallback(self) -> None:
         # \xff is invalid utf-8 -> falls back to latin-1
         headers, rows = _preview_csv(b"name\nca\xfffe\n", limit=5)
         assert headers == ["name"]
-        assert rows and rows[0][0].startswith("ca")
+        assert rows and rows[0][0] == "caÿfe"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/assessment/test_dataset.py` around lines 142 - 146, Update
the test_preview_csv_handles_latin1_fallback assertion to verify the latin-1
decoded character is present: call _preview_csv as before, then assert that the
first cell exactly equals "caÿ" or contains the Unicode character U+00FF (ÿ) to
ensure the invalid UTF-8 byte 0xFF was decoded via latin-1; refer to the test
function name test_preview_csv_handles_latin1_fallback and the helper
_preview_csv when locating the change.

165-175: ⚡ Quick win

Clarify expected behavior for empty workbooks.

The assertion at line 174 accepts two different outcomes ([""] or []), which suggests either:

  1. The expected behavior for empty workbooks is not well-defined, or
  2. The test is being overly permissive.

Consider determining the correct expected behavior and asserting only that outcome.

♻️ Proposed fix

If empty workbooks should return an empty list:

-        assert headers == [""] or headers == []
+        assert headers == []

Or if they should return a list with one empty string:

-        assert headers == [""] or headers == []
+        assert headers == [""]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/assessment/test_dataset.py` around lines 165 - 175, The
test test_preview_excel_empty_workbook is ambiguous because it accepts two
outcomes for headers; decide the canonical behavior for _preview_excel (either
return [] for no headers or [""] to represent a single empty header) and update
the test to assert that single expected value only; locate the test function
test_preview_excel_empty_workbook and the helper _preview_excel, then change the
assertion to assert headers == <chosen_expected_value> (and keep assert rows ==
[]), or if you choose to change _preview_excel instead, make it return the
chosen headers shape for an empty workbook and keep the test asserting that one
outcome.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/app/tests/assessment/test_dataset.py`:
- Around line 148-163: The test function
test_preview_excel_returns_headers_and_rows imports openpyxl locally even though
openpyxl is already imported at module level for InvalidFileException; remove
the inline imports inside test_preview_excel_returns_headers_and_rows and any
other tests, and add/ensure a single module-level import for openpyxl alongside
InvalidFileException so the test uses that top-level import instead.
- Around line 142-146: Update the test_preview_csv_handles_latin1_fallback
assertion to verify the latin-1 decoded character is present: call _preview_csv
as before, then assert that the first cell exactly equals "caÿ" or contains the
Unicode character U+00FF (ÿ) to ensure the invalid UTF-8 byte 0xFF was decoded
via latin-1; refer to the test function name
test_preview_csv_handles_latin1_fallback and the helper _preview_csv when
locating the change.
- Around line 165-175: The test test_preview_excel_empty_workbook is ambiguous
because it accepts two outcomes for headers; decide the canonical behavior for
_preview_excel (either return [] for no headers or [""] to represent a single
empty header) and update the test to assert that single expected value only;
locate the test function test_preview_excel_empty_workbook and the helper
_preview_excel, then change the assertion to assert headers ==
<chosen_expected_value> (and keep assert rows == []), or if you choose to change
_preview_excel instead, make it return the chosen headers shape for an empty
workbook and keep the test asserting that one outcome.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3ab2e4d5-c58f-4731-ada1-cfe46d7ba28d

📥 Commits

Reviewing files that changed from the base of the PR and between fa5e476 and 0a20a8b.

📒 Files selected for processing (2)
  • backend/app/tests/assessment/test_dataset.py
  • backend/app/tests/assessment/test_routes.py

@vprashrex vprashrex requested a review from Ayush8923 May 12, 2026 13:59
@vprashrex vprashrex changed the title Assessment: Gemini Batch Fix Assessment: Gemini Batch Fix & Dataset Preview Row Limiting May 12, 2026
@vprashrex vprashrex merged commit 1d02df6 into main May 13, 2026
2 checks passed
@vprashrex vprashrex deleted the chore/assessment-gemini-batch-fix branch May 13, 2026 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Assessment: Gemini Batch Processing Fix & Dataset Preview Row Limiting

3 participants