Assessment: Gemini Batch Fix & Dataset Preview Row Limiting by vprashrex · Pull Request #820 · ProjectTech4DevAI/kaapi-backend

vprashrex · 2026-05-09T10:13:45Z

Target issue: #830

Summary

This pull request addresses two key issues within the assessment module:

Fixed bugs in Gemini batch processing to improve the reliability and stability of AI assessment execution and testing workflows.
Added limit_row support to the dataset preview endpoint, allowing clients to fetch only a limited number of dataset rows instead of the full dataset.
Previously, large dataset responses caused frontend browser lag and UI freezes due to excessive data rendering on the client side. By introducing row limiting, the frontend can now request lightweight previews (for example, 5 rows), resulting in improved performance and a smoother user experience.

Checklist

Before submitting a pull request, please ensure that you mark these task.

Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
If you've fixed a bug or added code that is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

New Features
- GET dataset endpoint can return a lightweight preview (column headers + first N rows) via optional limit_rows (1–100).
Documentation
- API docs updated to describe the new limit_rows preview option and behavior.
Refactor
- Batch submissions now include row identifiers at the top level for clearer row tracking.
Bug Fixes
- Preview requests return appropriate HTTP errors for missing/invalid or unsupported files.
Tests
- Coverage added/updated for dataset preview behavior and batch identifier location.

coderabbitai · 2026-05-09T10:13:53Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds optional dataset preview (headers + first N rows) to GET /datasets/{id} with models, service parsing CSV/XLSX, docs and tests; moves Gemini/Google JSONL row identifier to a top-level key (tests updated).

Changes

Assessment dataset preview

Layer / File(s)	Summary
Preview Pydantic models `backend/app/models/assessment.py`	Adds `AssessmentDatasetPreview` and an optional `preview` field to `AssessmentDatasetResponse`.
Preview parsing service `backend/app/services/assessment/dataset.py`	Adds `_stringify`, `_preview_csv`, `_preview_excel`, and `preview_dataset` to fetch and parse CSV/XLSX previews and return headers + rows, with HTTP error handling.
API handler and wiring for preview `backend/app/api/routes/assessment/datasets.py`	Imports preview types/service, extends `_dataset_to_response` to accept `preview`, adds `limit_rows` query param, builds `AssessmentDatasetPreview` when requested, and includes it in responses.
Endpoint docs `backend/app/api/docs/assessment/get_dataset.md`	Documents `limit_rows` (1–100) parameter and that omitting it avoids fetching the underlying file.
Preview tests `backend/app/tests/assessment/test_dataset.py`, `backend/app/tests/assessment/test_routes.py`	Adds tests for CSV/XLSX preview outputs, encoding fallbacks, error cases, and route-level preview behavior.

Gemini Batch JSONL Schema

Layer / File(s)	Summary
JSONL Row Identifier Schema `backend/app/crud/assessment/batch.py`, `backend/app/tests/assessment/test_batch.py`	`build_google_jsonl` now emits row identifier as top-level `key` instead of `metadata.key`; test assertion updated accordingly.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant API
  participant Service
  participant ObjectStore
  Client->>API: GET /datasets/{id}?limit_rows=N
  API->>Service: preview_dataset(dataset, limit)
  Service->>ObjectStore: fetch object_store_url bytes
  ObjectStore-->>Service: raw file bytes / error
  Service-->>API: (headers, rows) or raise HTTPException
  API-->>Client: AssessmentDatasetResponse (with preview) or error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

ProjectTech4DevAI/kaapi-backend#788: Earlier changes to Gemini/Google batch JSONL construction touching the same batch code paths.

Suggested labels

enhancement

Suggested reviewers

kartpop
AkhileshNegi
Ayush8923

Poem

🐰 A key pops up where it’s easy to see,
Preview hops in with a header and three,
CSV and sheets, trimmed neat and sweet,
Rows and columns met on a tiny treat,
Hooray — the backend’s lighter on its feet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.24% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes both main changes: fixing a Gemini batch schema issue and adding dataset preview row limiting functionality.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/assessment-gemini-batch-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…ints

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/api/routes/assessment/datasets.py`:
- Around line 149-161: The truncated flag is over-reported because the code
treats len(rows) >= limit_rows as truncated; to fix, request one extra row from
preview_assessment_dataset (call with limit=limit_rows + 1), set truncated =
len(rows) > limit_rows, and if truncated trim rows to the original limit_rows
before constructing AssessmentDatasetPreview (use the existing names session,
dataset, limit_rows, preview_assessment_dataset, headers, rows, and
AssessmentDatasetPreview).

In `@backend/app/services/assessment/dataset.py`:
- Around line 197-219: The current preview logic defaults to CSV for any
non-".xlsx" file_ext which can silently mis-handle missing/invalid metadata;
update the preview path in the preview function (where file_ext is derived) to
validate file_ext explicitly (normalize with .lower() and strip), and only allow
known extensions like ".xlsx" and ".csv"; if file_ext is None or not in the
allowed set, raise HTTPException(status_code=422, detail="Unsupported or missing
file extension.") instead of calling _preview_csv, otherwise call _preview_excel
for ".xlsx" and _preview_csv for ".csv".

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 60a75072-f1b1-49b2-b789-0b3989427bea

📥 Commits

Reviewing files that changed from the base of the PR and between e08abbc and 15ad20d.

📒 Files selected for processing (5)

backend/app/api/docs/assessment/get_dataset.md
backend/app/api/routes/assessment/datasets.py
backend/app/models/assessment.py
backend/app/services/assessment/dataset.py
backend/app/tests/assessment/test_batch.py

✅ Files skipped from review due to trivial changes (1)

backend/app/api/docs/assessment/get_dataset.md

…r validation

codecov · 2026-05-12T06:53:12Z

Codecov Report

❌ Patch coverage is 98.07692% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
backend/app/services/assessment/dataset.py	94.11%	4 Missing ⚠️

📢 Thoughts on this report? Let us know!

…clude preview with limit_rows

coderabbitai

🧹 Nitpick comments (3)

backend/app/tests/assessment/test_dataset.py (3)
148-163: 💤 Low value

Consider importing openpyxl at module level.

openpyxl is already imported at line 7 for InvalidFileException. Importing it again inside the test function (lines 149, 151) is inconsistent with the module-level import pattern.
♻️ Proposed consolidation

At the top of the file, consolidate the imports:
 from openpyxl.utils.exceptions import InvalidFileException
+import openpyxl
+import io
Then remove the inline imports in the test functions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/assessment/test_dataset.py` around lines 148 - 163, The
test function test_preview_excel_returns_headers_and_rows imports openpyxl
locally even though openpyxl is already imported at module level for
InvalidFileException; remove the inline imports inside
test_preview_excel_returns_headers_and_rows and any other tests, and add/ensure
a single module-level import for openpyxl alongside InvalidFileException so the
test uses that top-level import instead.
142-146: ⚡ Quick win

Strengthen the latin-1 fallback assertion.

The test claims to verify latin-1 fallback but only checks that the value starts with "ca". It should verify that the invalid UTF-8 byte \xff was correctly decoded as ÿ (U+00FF in latin-1) rather than dropped.
✨ Proposed stronger assertion
     def test_preview_csv_handles_latin1_fallback(self) -> None:
         # \xff is invalid utf-8 -> falls back to latin-1
         headers, rows = _preview_csv(b"name\nca\xfffe\n", limit=5)
         assert headers == ["name"]
-        assert rows and rows[0][0].startswith("ca")
+        assert rows and rows[0][0] == "caÿfe"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/assessment/test_dataset.py` around lines 142 - 146, Update
the test_preview_csv_handles_latin1_fallback assertion to verify the latin-1
decoded character is present: call _preview_csv as before, then assert that the
first cell exactly equals "caÿ" or contains the Unicode character U+00FF (ÿ) to
ensure the invalid UTF-8 byte 0xFF was decoded via latin-1; refer to the test
function name test_preview_csv_handles_latin1_fallback and the helper
_preview_csv when locating the change.
165-175: ⚡ Quick win

Clarify expected behavior for empty workbooks.

The assertion at line 174 accepts two different outcomes ([""] or []), which suggests either:

The expected behavior for empty workbooks is not well-defined, or

The test is being overly permissive.

Consider determining the correct expected behavior and asserting only that outcome.
♻️ Proposed fix

If empty workbooks should return an empty list:
-        assert headers == [""] or headers == []
+        assert headers == []
Or if they should return a list with one empty string:
-        assert headers == [""] or headers == []
+        assert headers == [""]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/assessment/test_dataset.py` around lines 165 - 175, The
test test_preview_excel_empty_workbook is ambiguous because it accepts two
outcomes for headers; decide the canonical behavior for _preview_excel (either
return [] for no headers or [""] to represent a single empty header) and update
the test to assert that single expected value only; locate the test function
test_preview_excel_empty_workbook and the helper _preview_excel, then change the
assertion to assert headers == <chosen_expected_value> (and keep assert rows ==
[]), or if you choose to change _preview_excel instead, make it return the
chosen headers shape for an empty workbook and keep the test asserting that one
outcome.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@backend/app/tests/assessment/test_dataset.py`:
- Around line 148-163: The test function
test_preview_excel_returns_headers_and_rows imports openpyxl locally even though
openpyxl is already imported at module level for InvalidFileException; remove
the inline imports inside test_preview_excel_returns_headers_and_rows and any
other tests, and add/ensure a single module-level import for openpyxl alongside
InvalidFileException so the test uses that top-level import instead.
- Around line 142-146: Update the test_preview_csv_handles_latin1_fallback
assertion to verify the latin-1 decoded character is present: call _preview_csv
as before, then assert that the first cell exactly equals "caÿ" or contains the
Unicode character U+00FF (ÿ) to ensure the invalid UTF-8 byte 0xFF was decoded
via latin-1; refer to the test function name
test_preview_csv_handles_latin1_fallback and the helper _preview_csv when
locating the change.
- Around line 165-175: The test test_preview_excel_empty_workbook is ambiguous
because it accepts two outcomes for headers; decide the canonical behavior for
_preview_excel (either return [] for no headers or [""] to represent a single
empty header) and update the test to assert that single expected value only;
locate the test function test_preview_excel_empty_workbook and the helper
_preview_excel, then change the assertion to assert headers ==
<chosen_expected_value> (and keep assert rows == []), or if you choose to change
_preview_excel instead, make it return the chosen headers shape for an empty
workbook and keep the test asserting that one outcome.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3ab2e4d5-c58f-4731-ada1-cfe46d7ba28d

📥 Commits

Reviewing files that changed from the base of the PR and between fa5e476 and 0a20a8b.

📒 Files selected for processing (2)

backend/app/tests/assessment/test_dataset.py
backend/app/tests/assessment/test_routes.py

…corresponding tests

Assessment (HotFix): Gemini Batch Fix

e08abbc

vprashrex requested a review from Prajna1999 May 11, 2026 05:34

vprashrex self-assigned this May 11, 2026

vprashrex added the ready-for-review label May 11, 2026

Prajna1999 approved these changes May 12, 2026

View reviewed changes

Add dataset preview functionality and update related models and endpo…

15ad20d

…ints

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

Comment thread backend/app/api/routes/assessment/datasets.py

Comment thread backend/app/services/assessment/dataset.py Outdated

Refactor: Update limit_rows parameter type to use Annotated for bette…

fa5e476

…r validation

vprashrex added 2 commits May 12, 2026 19:17

Merge branch 'main' into chore/assessment-gemini-batch-fix

beee4bc

Add dataset preview tests and enhance get_dataset functionality to in…

0a20a8b

…clude preview with limit_rows

vprashrex changed the title ~~Assessment (HotFix): Gemini Batch Fix~~ Assessment: Gemini Batch Fix May 12, 2026

coderabbitai Bot reviewed May 12, 2026

View reviewed changes

vprashrex requested a review from Ayush8923 May 12, 2026 13:59

vprashrex changed the title ~~Assessment: Gemini Batch Fix~~ Assessment: Gemini Batch Fix & Dataset Preview Row Limiting May 12, 2026

Ayush8923 approved these changes May 12, 2026

View reviewed changes

Enhance preview_dataset function to validate file extensions and add …

d4c56ee

…corresponding tests

vprashrex added the ready-for-merge label May 13, 2026

vprashrex merged commit 1d02df6 into main May 13, 2026
2 checks passed

vprashrex deleted the chore/assessment-gemini-batch-fix branch May 13, 2026 01:52

vprashrex linked an issue May 13, 2026 that may be closed by this pull request

Assessment: Gemini Batch Processing Fix & Dataset Preview Row Limiting #830

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Assessment: Gemini Batch Fix & Dataset Preview Row Limiting#820

Assessment: Gemini Batch Fix & Dataset Preview Row Limiting#820
vprashrex merged 6 commits into
mainfrom
chore/assessment-gemini-batch-fix

vprashrex commented May 9, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 9, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 12, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vprashrex commented May 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Target issue: #830

Summary

Checklist

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vprashrex commented May 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 9, 2026 •

edited

Loading

codecov Bot commented May 12, 2026 •

edited

Loading