
Evaluation: Added type for dataset #641

Merged
AkhileshNegi merged 4 commits into main from enhancement/text-evals-dataset-fix
Mar 5, 2026

Conversation

@AkhileshNegi
Collaborator

@AkhileshNegi AkhileshNegi commented Mar 5, 2026

Summary

Target issue is ProjectTech4DevAI/kaapi-frontend#52 (review)

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  • New Features

    • Dataset operations now consistently filter to only include text-type evaluation datasets for creation, retrieval, and listing.
  • Tests

    • Added tests to verify datasets default to text-type and that non-text datasets are excluded from retrieval and listing.

Summary by CodeRabbit

  • Improvements
    • Evaluation datasets and runs now incorporate type-based filtering. New evaluations default to TEXT type, and all retrieval and listing operations automatically filter to show only TEXT-type datasets and runs. This ensures consistent data handling and establishes the foundation for supporting additional evaluation types in the future.

@AkhileshNegi AkhileshNegi self-assigned this Mar 5, 2026
@AkhileshNegi AkhileshNegi added the bug Something isn't working label Mar 5, 2026
@AkhileshNegi AkhileshNegi marked this pull request as ready for review March 5, 2026 10:32
@AkhileshNegi AkhileshNegi requested a review from nishika26 March 5, 2026 10:33
@coderabbitai

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Walkthrough

Adds type-based filtering for evaluation datasets and runs: created records default to EvaluationType.TEXT, and get/list operations now restrict results to type == TEXT across dataset and run CRUD paths.
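To illustrate the filtering contract the walkthrough describes, here is a minimal in-memory stand-in (not the actual SQLModel queries from the PR; the enum members beyond TEXT and their string values are assumptions drawn from the review comments):

```python
from dataclasses import dataclass
from enum import Enum


class EvaluationType(str, Enum):
    TEXT = "text"
    STT = "stt"
    TTS = "tts"


@dataclass
class EvaluationDataset:
    id: int
    name: str
    type: str


def list_datasets(
    datasets: list[EvaluationDataset],
    evaluation_type: EvaluationType = EvaluationType.TEXT,
) -> list[EvaluationDataset]:
    # Mirrors the `.where(EvaluationDataset.type == EvaluationType.TEXT.value)`
    # constraint: only datasets of the requested type are returned,
    # with TEXT as the default.
    return [d for d in datasets if d.type == evaluation_type.value]


datasets = [
    EvaluationDataset(1, "faq-evals", EvaluationType.TEXT.value),
    EvaluationDataset(2, "asr-evals", EvaluationType.STT.value),
]
print([d.name for d in list_datasets(datasets)])  # ['faq-evals']
```

The run-level filtering in core.py follows the same shape, with the run model in place of the dataset model.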

Changes

Cohort / File(s) Summary
Evaluation dataset CRUD
backend/app/crud/evaluations/dataset.py
Import EvaluationType; set created datasets to TEXT; add type == TEXT constraints to get-by-id, get-by-name, and list queries.
Evaluation run CRUD
backend/app/crud/evaluations/core.py
Import EvaluationType; set created evaluation runs to TEXT; add type == TEXT filters to get-by-id and list queries.
Dataset tests
backend/app/tests/crud/evaluations/test_dataset.py
Add imports for EvaluationDataset and EvaluationType; assert created datasets default to TEXT; add tests ensuring non-TEXT (e.g., STT) datasets are excluded from get/list operations.
Run tests
backend/app/tests/crud/evaluations/test_core.py
New test module: verify creation, retrieval, and listing of evaluation runs with defaults to TEXT, and that non-TEXT runs are excluded from queries.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

ready-for-review

Suggested reviewers

  • Prajna1999

Poem

🐰 I hop through code where types align,
I mark new records as TEXT by design,
I hide the STT friends from lists I keep,
Tests nod along as they pass and leap,
A little rabbit dance — tidy and fine.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Evaluation: Added type for dataset' directly relates to the main change, which adds type filtering for evaluation datasets across CRUD operations. It accurately summarizes the primary modification.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/crud/evaluations/dataset.py (1)

25-25: Consider moving EvaluationType to a neutral model module.

Importing EvaluationType from app.models.stt_evaluation in a generic evaluation dataset CRUD module introduces avoidable domain coupling. A shared location (e.g., app.models.evaluation) would keep boundaries cleaner.
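A sketch of what that extraction could look like (the module path comes from the comment; the non-TEXT members and string values are assumptions):

```python
# Hypothetical app/models/evaluation.py -- a neutral home for the enum,
# so CRUD modules no longer import from the STT-specific model.
from enum import Enum


class EvaluationType(str, Enum):
    """Evaluation modality shared by dataset and run CRUD (values assumed)."""

    TEXT = "text"
    STT = "stt"
    TTS = "tts"


# dataset.py (and core.py) would then depend only on the shared module:
# from app.models.evaluation import EvaluationType
```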

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/dataset.py` at line 25, The EvaluationType enum
is currently defined in app.models.stt_evaluation and should be moved to a
neutral model module (e.g., app.models.evaluation); extract the EvaluationType
definition into that new/shared module, export it there, then update the import
in dataset.py (and any other modules importing it) to import EvaluationType from
app.models.evaluation so the CRUD code no longer depends on the STT-specific
model.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/crud/evaluations/dataset.py`:
- Line 64: The CRUD functions in this module are hard-coding
type=EvaluationType.TEXT.value causing non-TEXT datasets to be ignored; change
the dataset CRUD functions (e.g., the create/read/list functions that currently
use type=EvaluationType.TEXT.value at the occurrences around the file) to accept
an optional dataset_type parameter (defaulting to EvaluationType.TEXT) and use
dataset_type.value when querying/creating, and update any internal calls to pass
through the correct EvaluationType (for example, have callers like
start_evaluation pass the incoming evaluation type or dataset.evaluation_type
instead of relying on the hard-coded TEXT); apply this change for each
occurrence noted (around the three lines mentioned) so STT/TTS datasets are
included while preserving TEXT as the default.

---

Nitpick comments:
In `@backend/app/crud/evaluations/dataset.py`:
- Line 25: The EvaluationType enum is currently defined in
app.models.stt_evaluation and should be moved to a neutral model module (e.g.,
app.models.evaluation); extract the EvaluationType definition into that
new/shared module, export it there, then update the import in dataset.py (and
any other modules importing it) to import EvaluationType from
app.models.evaluation so the CRUD code no longer depends on the STT-specific
model.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79237a1b-93a2-4c81-b535-7638d281f042

📥 Commits

Reviewing files that changed from the base of the PR and between a0c1f24 and fdd37e0.

📒 Files selected for processing (1)
  • backend/app/crud/evaluations/dataset.py

dataset = EvaluationDataset(
    name=name,
    description=description,
    type=EvaluationType.TEXT.value,

⚠️ Potential issue | 🟠 Major

Hard-coded TEXT scope makes generic dataset CRUD return false “not found” for valid non-TEXT datasets.

These changes force dataset creation and all reads/lists to TEXT only. Given EvaluationType includes STT and TTS (backend/app/models/stt_evaluation.py:21-26), this module now silently excludes those datasets and can surface misleading 404s in callers like start_evaluation (backend/app/services/evaluations/evaluation.py:28-85).

Suggested fix (parameterize dataset type, keep TEXT default)
 def create_evaluation_dataset(
     session: Session,
     name: str,
     dataset_metadata: dict[str, Any],
     organization_id: int,
     project_id: int,
     description: str | None = None,
     object_store_url: str | None = None,
     langfuse_dataset_id: str | None = None,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> EvaluationDataset:
@@
         dataset = EvaluationDataset(
             name=name,
             description=description,
-            type=EvaluationType.TEXT.value,
+            type=evaluation_type.value,
             dataset_metadata=dataset_metadata,
@@
 def get_dataset_by_id(
-    session: Session, dataset_id: int, organization_id: int, project_id: int
+    session: Session,
+    dataset_id: int,
+    organization_id: int,
+    project_id: int,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> EvaluationDataset | None:
@@
         .where(EvaluationDataset.organization_id == organization_id)
         .where(EvaluationDataset.project_id == project_id)
-        .where(EvaluationDataset.type == EvaluationType.TEXT.value)
+        .where(EvaluationDataset.type == evaluation_type.value)
@@
 def get_dataset_by_name(
-    session: Session, name: str, organization_id: int, project_id: int
+    session: Session,
+    name: str,
+    organization_id: int,
+    project_id: int,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> EvaluationDataset | None:
@@
         .where(EvaluationDataset.organization_id == organization_id)
         .where(EvaluationDataset.project_id == project_id)
-        .where(EvaluationDataset.type == EvaluationType.TEXT.value)
+        .where(EvaluationDataset.type == evaluation_type.value)
@@
 def list_datasets(
     session: Session,
     organization_id: int,
     project_id: int,
     limit: int = 50,
     offset: int = 0,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> list[EvaluationDataset]:
@@
         .where(EvaluationDataset.organization_id == organization_id)
         .where(EvaluationDataset.project_id == project_id)
-        .where(EvaluationDataset.type == EvaluationType.TEXT.value)
+        .where(EvaluationDataset.type == evaluation_type.value)

Also applies to: 127-127, 164-164, 201-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/crud/evaluations/dataset.py` at line 64, The CRUD functions in
this module are hard-coding type=EvaluationType.TEXT.value causing non-TEXT
datasets to be ignored; change the dataset CRUD functions (e.g., the
create/read/list functions that currently use type=EvaluationType.TEXT.value at
the occurrences around the file) to accept an optional dataset_type parameter
(defaulting to EvaluationType.TEXT) and use dataset_type.value when
querying/creating, and update any internal calls to pass through the correct
EvaluationType (for example, have callers like start_evaluation pass the
incoming evaluation type or dataset.evaluation_type instead of relying on the
hard-coded TEXT); apply this change for each occurrence noted (around the three
lines mentioned) so STT/TTS datasets are included while preserving TEXT as the
default.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/tests/crud/evaluations/test_dataset.py (1)

130-157: Well-structured test; db.add() is redundant.

The test correctly validates the type filtering behavior. However, calling db.add(dataset) at line 147 is unnecessary since the object is already tracked by the session after create_evaluation_dataset. The attribute mutation will be persisted on commit() regardless.

♻️ Suggested simplification
         # Manually update type to STT to simulate a non-text dataset
         dataset.type = EvaluationType.STT.value
-        db.add(dataset)
         db.commit()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/crud/evaluations/test_dataset.py` around lines 130 - 157,
Remove the redundant session add: in
test_get_dataset_by_id_excludes_non_text_type, after creating the dataset with
create_evaluation_dataset and mutating dataset.type = EvaluationType.STT.value,
do not call db.add(dataset) because the instance is already tracked; simply call
db.commit() to persist the change and leave the rest of the test (especially the
get_dataset_by_id assertion) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/tests/crud/evaluations/test_dataset.py`:
- Line 16: Remove the unused EvaluationDataset import from the import statement
that currently reads "from app.models import EvaluationDataset, Organization,
Project"; the tests use the create_evaluation_dataset() CRUD helper and only
reference "TestCreateEvaluationDataset" as a class name, so keep Organization
and Project but delete EvaluationDataset to avoid an unused import.

---

Nitpick comments:
In `@backend/app/tests/crud/evaluations/test_dataset.py`:
- Around line 130-157: Remove the redundant session add: in
test_get_dataset_by_id_excludes_non_text_type, after creating the dataset with
create_evaluation_dataset and mutating dataset.type = EvaluationType.STT.value,
do not call db.add(dataset) because the instance is already tracked; simply call
db.commit() to persist the change and leave the rest of the test (especially the
get_dataset_by_id assertion) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 60e1103f-2327-416a-9501-96df8d80fd2f

📥 Commits

Reviewing files that changed from the base of the PR and between fdd37e0 and 35a9e4e.

📒 Files selected for processing (1)
  • backend/app/tests/crud/evaluations/test_dataset.py

upload_csv_to_object_store,
)
from app.models import Organization, Project
from app.models import EvaluationDataset, Organization, Project

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if EvaluationDataset is directly used in the test file (beyond imports)
rg -n 'EvaluationDataset' backend/app/tests/crud/evaluations/test_dataset.py | grep -v 'import'

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 112


Remove the unused EvaluationDataset import from line 16.

The EvaluationDataset model is imported but never directly used in the test file. The tests create datasets via the create_evaluation_dataset() CRUD function, which handles instantiation implicitly. The only occurrence of "EvaluationDataset" in the file is the test class name TestCreateEvaluationDataset, not a usage of the imported model.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/crud/evaluations/test_dataset.py` at line 16, Remove the
unused EvaluationDataset import from the import statement that currently reads
"from app.models import EvaluationDataset, Organization, Project"; the tests use
the create_evaluation_dataset() CRUD helper and only reference
"TestCreateEvaluationDataset" as a class name, so keep Organization and Project
but delete EvaluationDataset to avoid an unused import.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/tests/crud/evaluations/test_core.py (1)

130-133: Use a collision-proof non-existent ID in not-found tests.

Using a hardcoded 99999 can become flaky in long-lived/shared test DBs; prefer a guaranteed-miss ID strategy (e.g., -1 if IDs are positive-only).
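A hedged sketch of the guaranteed-miss strategy (the helper name is invented for illustration; the real tests would inline whichever value fits the schema):

```python
def guaranteed_miss_id(existing_ids: list[int]) -> int:
    """Return an ID that cannot match any existing row.

    -1 is safe when primary keys are positive autoincrement integers;
    otherwise fall back to one past the current maximum.
    """
    if all(i > 0 for i in existing_ids):
        return -1
    return max(existing_ids, default=0) + 1


print(guaranteed_miss_id([1, 2, 99999]))  # -1: positive-only IDs
print(guaranteed_miss_id([-5, 3]))        # 4: mixed-sign IDs
```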

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/crud/evaluations/test_core.py` around lines 130 - 133,
Replace the hardcoded 99999 used in the not-found test for
get_evaluation_run_by_id with a collision-proof non-existent ID (e.g., use -1)
to guarantee a miss in databases where IDs are positive-only; update the test
invocation of get_evaluation_run_by_id(session=db, evaluation_id=...,
organization_id=org.id) to pass -1 (or otherwise compute a guaranteed-miss
value) so the test cannot collide with real records.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/tests/crud/evaluations/test_core.py`:
- Around line 16-42: The inline helper _create_config and repeated setup logic
should be replaced by reusable factory fixtures under backend/app/tests/ (e.g.,
ConfigFactory/ConfigVersionFactory, ProjectFactory, OrgFactory, DatasetFactory,
RunFactory); create factories that construct Config and ConfigVersion (mirroring
the fields set in _create_config and using now() for timestamps), register them
as pytest fixtures, and update tests in test_core.py (and other affected tests)
to use these fixtures instead of calling _create_config or manually creating
models; ensure factories return the same identifiers/objects (config.id and
config_version.version) or provide equivalent attributes so existing assertions
in the tests remain valid.

---

Nitpick comments:
In `@backend/app/tests/crud/evaluations/test_core.py`:
- Around line 130-133: Replace the hardcoded 99999 used in the not-found test
for get_evaluation_run_by_id with a collision-proof non-existent ID (e.g., use
-1) to guarantee a miss in databases where IDs are positive-only; update the
test invocation of get_evaluation_run_by_id(session=db, evaluation_id=...,
organization_id=org.id) to pass -1 (or otherwise compute a guaranteed-miss
value) so the test cannot collide with real records.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ffdeda3-927d-4509-aae5-bd30406b78ef

📥 Commits

Reviewing files that changed from the base of the PR and between 35a9e4e and 176b769.

📒 Files selected for processing (2)
  • backend/app/crud/evaluations/core.py
  • backend/app/tests/crud/evaluations/test_core.py

Comment on lines +16 to +42
def _create_config(db: Session, project_id: int) -> tuple:
    """Helper to create a config and config_version for evaluation runs."""
    from app.models.config import Config, ConfigVersion

    config = Config(
        name="test_config",
        project_id=project_id,
        inserted_at=now(),
        updated_at=now(),
    )
    db.add(config)
    db.commit()
    db.refresh(config)

    config_version = ConfigVersion(
        config_id=config.id,
        version=1,
        config_blob={"completion": {"params": {"model": "gpt-4o"}}},
        inserted_at=now(),
        updated_at=now(),
    )
    db.add(config_version)
    db.commit()
    db.refresh(config_version)

    return config.id, config_version.version


🛠️ Refactor suggestion | 🟠 Major

Adopt factory fixtures instead of inline object construction in tests.

This module repeats setup logic (org/project/dataset/config/run) and uses an ad-hoc helper; please move these into reusable factory fixtures under backend/app/tests/ for consistency and maintainability.

♻️ Example direction
- def _create_config(db: Session, project_id: int) -> tuple:
-     ...
+ # backend/app/tests/factories/config_factory.py
+ def create_config_with_version(db: Session, project_id: int) -> tuple[int, int]:
+     ...
- config_id, config_version = _create_config(db, project.id)
+ config_id, config_version = config_factory.create_config_with_version(db, project.id)

As per coding guidelines, "backend/app/tests/**/*.py: Use factory pattern for test fixtures in backend/app/tests/."

Also applies to: 47-245
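One possible shape for such a factory, with the DB session replaced by plain dataclasses so the structure is visible (field names mirror the inline helper; a real factory would add/commit through the session and could be registered as a pytest fixture):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

_ids = count(1)  # stand-in for autoincrement primary keys


def now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Config:
    name: str
    project_id: int
    id: int = field(default_factory=lambda: next(_ids))
    inserted_at: datetime = field(default_factory=now)
    updated_at: datetime = field(default_factory=now)


@dataclass
class ConfigVersion:
    config_id: int
    version: int
    config_blob: dict
    inserted_at: datetime = field(default_factory=now)
    updated_at: datetime = field(default_factory=now)


def create_config_with_version(project_id: int) -> tuple[int, int]:
    """Factory mirroring _create_config; returns (config_id, version)."""
    config = Config(name="test_config", project_id=project_id)
    config_version = ConfigVersion(
        config_id=config.id,
        version=1,
        config_blob={"completion": {"params": {"model": "gpt-4o"}}},
    )
    return config.id, config_version.version
```

Because the factory returns the same (config.id, version) pair as the inline helper, existing assertions in the tests would not need to change.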

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/tests/crud/evaluations/test_core.py` around lines 16 - 42, The
inline helper _create_config and repeated setup logic should be replaced by
reusable factory fixtures under backend/app/tests/ (e.g.,
ConfigFactory/ConfigVersionFactory, ProjectFactory, OrgFactory, DatasetFactory,
RunFactory); create factories that construct Config and ConfigVersion (mirroring
the fields set in _create_config and using now() for timestamps), register them
as pytest fixtures, and update tests in test_core.py (and other affected tests)
to use these fixtures instead of calling _create_config or manually creating
models; ensure factories return the same identifiers/objects (config.id and
config_version.version) or provide equivalent attributes so existing assertions
in the tests remain valid.

@codecov

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@AkhileshNegi AkhileshNegi merged commit 03a0568 into main Mar 5, 2026
3 checks passed
@AkhileshNegi AkhileshNegi deleted the enhancement/text-evals-dataset-fix branch March 5, 2026 11:18
@github-project-automation github-project-automation bot moved this to Closed in Kaapi-dev Mar 5, 2026
nishika26 pushed a commit that referenced this pull request Mar 10, 2026

Labels

bug Something isn't working

Projects

Status: Closed


2 participants