Conversation
📝 Walkthrough

Adds type-based filtering for evaluation datasets and runs: created records default to `EvaluationType.TEXT`, and get/list operations now restrict results to `type == TEXT` across dataset and run CRUD paths.

Sequence Diagram(s): (omitted)

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/crud/evaluations/dataset.py (1)
Line 25: Consider moving `EvaluationType` to a neutral model module.

Importing `EvaluationType` from `app.models.stt_evaluation` in a generic evaluation dataset CRUD module introduces avoidable domain coupling. A shared location (e.g., `app.models.evaluation`) would keep boundaries cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/crud/evaluations/dataset.py` at line 25, The EvaluationType enum is currently defined in app.models.stt_evaluation and should be moved to a neutral model module (e.g., app.models.evaluation); extract the EvaluationType definition into that new/shared module, export it there, then update the import in dataset.py (and any other modules importing it) to import EvaluationType from app.models.evaluation so the CRUD code no longer depends on the STT-specific model.
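As an illustration of the suggested extraction, a neutral module could look like the sketch below. The module path follows the review's example, but the member values are assumptions and should mirror whatever `stt_evaluation` defines today:

```python
# Hypothetical app/models/evaluation.py -- a neutral home for the shared enum.
# Member values below are assumed, not confirmed from the repository.
from enum import Enum


class EvaluationType(str, Enum):
    TEXT = "text"
    STT = "stt"
    TTS = "tts"


# CRUD modules would then depend only on the neutral module:
# from app.models.evaluation import EvaluationType
print(EvaluationType.TEXT.value)
```

Because the enum mixes in `str`, comparisons against stored string columns keep working unchanged after the move.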
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/crud/evaluations/dataset.py`:
- Line 64: The CRUD functions in this module are hard-coding
type=EvaluationType.TEXT.value causing non-TEXT datasets to be ignored; change
the dataset CRUD functions (e.g., the create/read/list functions that currently
use type=EvaluationType.TEXT.value at the occurrences around the file) to accept
an optional dataset_type parameter (defaulting to EvaluationType.TEXT) and use
dataset_type.value when querying/creating, and update any internal calls to pass
through the correct EvaluationType (for example, have callers like
start_evaluation pass the incoming evaluation type or dataset.evaluation_type
instead of relying on the hard-coded TEXT); apply this change for each
occurrence noted (around the three lines mentioned) so STT/TTS datasets are
included while preserving TEXT as the default.
---
Nitpick comments:
In `@backend/app/crud/evaluations/dataset.py`:
- Line 25: The EvaluationType enum is currently defined in
app.models.stt_evaluation and should be moved to a neutral model module (e.g.,
app.models.evaluation); extract the EvaluationType definition into that
new/shared module, export it there, then update the import in dataset.py (and
any other modules importing it) to import EvaluationType from
app.models.evaluation so the CRUD code no longer depends on the STT-specific
model.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79237a1b-93a2-4c81-b535-7638d281f042
📒 Files selected for processing (1)
backend/app/crud/evaluations/dataset.py
```python
dataset = EvaluationDataset(
    name=name,
    description=description,
    type=EvaluationType.TEXT.value,
```
Hard-coded TEXT scope makes generic dataset CRUD return false “not found” for valid non-TEXT datasets.
These changes force dataset creation and all reads/lists to TEXT only. Given EvaluationType includes STT and TTS (backend/app/models/stt_evaluation.py:21-26), this module now silently excludes those datasets and can surface misleading 404s in callers like start_evaluation (backend/app/services/evaluations/evaluation.py:28-85).
Suggested fix (parameterize dataset type, keep TEXT default):

```diff
 def create_evaluation_dataset(
     session: Session,
     name: str,
     dataset_metadata: dict[str, Any],
     organization_id: int,
     project_id: int,
     description: str | None = None,
     object_store_url: str | None = None,
     langfuse_dataset_id: str | None = None,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> EvaluationDataset:
@@
     dataset = EvaluationDataset(
         name=name,
         description=description,
-        type=EvaluationType.TEXT.value,
+        type=evaluation_type.value,
         dataset_metadata=dataset_metadata,
@@
 def get_dataset_by_id(
-    session: Session, dataset_id: int, organization_id: int, project_id: int
+    session: Session,
+    dataset_id: int,
+    organization_id: int,
+    project_id: int,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> EvaluationDataset | None:
@@
         .where(EvaluationDataset.organization_id == organization_id)
         .where(EvaluationDataset.project_id == project_id)
-        .where(EvaluationDataset.type == EvaluationType.TEXT.value)
+        .where(EvaluationDataset.type == evaluation_type.value)
@@
 def get_dataset_by_name(
-    session: Session, name: str, organization_id: int, project_id: int
+    session: Session,
+    name: str,
+    organization_id: int,
+    project_id: int,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> EvaluationDataset | None:
@@
         .where(EvaluationDataset.organization_id == organization_id)
         .where(EvaluationDataset.project_id == project_id)
-        .where(EvaluationDataset.type == EvaluationType.TEXT.value)
+        .where(EvaluationDataset.type == evaluation_type.value)
@@
 def list_datasets(
     session: Session,
     organization_id: int,
     project_id: int,
     limit: int = 50,
     offset: int = 0,
+    evaluation_type: EvaluationType = EvaluationType.TEXT,
 ) -> list[EvaluationDataset]:
@@
         .where(EvaluationDataset.organization_id == organization_id)
         .where(EvaluationDataset.project_id == project_id)
-        .where(EvaluationDataset.type == EvaluationType.TEXT.value)
+        .where(EvaluationDataset.type == evaluation_type.value)
```

Also applies to: 127-127, 164-164, 201-201
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/crud/evaluations/dataset.py` at line 64, The CRUD functions in
this module are hard-coding type=EvaluationType.TEXT.value causing non-TEXT
datasets to be ignored; change the dataset CRUD functions (e.g., the
create/read/list functions that currently use type=EvaluationType.TEXT.value at
the occurrences around the file) to accept an optional dataset_type parameter
(defaulting to EvaluationType.TEXT) and use dataset_type.value when
querying/creating, and update any internal calls to pass through the correct
EvaluationType (for example, have callers like start_evaluation pass the
incoming evaluation type or dataset.evaluation_type instead of relying on the
hard-coded TEXT); apply this change for each occurrence noted (around the three
lines mentioned) so STT/TTS datasets are included while preserving TEXT as the
default.
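Under the parameterization the review proposes, existing callers keep the TEXT default while audio datasets become reachable. The following self-contained sketch shows the intended behavior; plain dicts stand in for the SQLModel entities, and the enum values are assumptions:

```python
from enum import Enum


class EvaluationType(str, Enum):  # values assumed to mirror the app's enum
    TEXT = "text"
    STT = "stt"
    TTS = "tts"


def list_datasets(
    rows: list[dict],
    evaluation_type: EvaluationType = EvaluationType.TEXT,
) -> list[dict]:
    # Defaulting to TEXT preserves existing call sites; passing STT/TTS
    # lifts the hard-coded filter the review flags.
    return [r for r in rows if r["type"] == evaluation_type.value]


rows = [{"name": "faq", "type": "text"}, {"name": "calls", "type": "stt"}]
print([r["name"] for r in list_datasets(rows)])
print([r["name"] for r in list_datasets(rows, EvaluationType.STT)])
```

The same default-parameter shape applies to the `get_dataset_by_id` / `get_dataset_by_name` lookups, so callers like `start_evaluation` can forward the incoming evaluation type instead of inheriting a hard-coded TEXT filter.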
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/tests/crud/evaluations/test_dataset.py (1)
Lines 130-157: Well-structured test; `db.add()` is redundant.

The test correctly validates the type filtering behavior. However, calling `db.add(dataset)` at line 147 is unnecessary since the object is already tracked by the session after `create_evaluation_dataset`. The attribute mutation will be persisted on `commit()` regardless.

♻️ Suggested simplification

```diff
 # Manually update type to STT to simulate a non-text dataset
 dataset.type = EvaluationType.STT.value
-db.add(dataset)
 db.commit()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/crud/evaluations/test_dataset.py` around lines 130 - 157, Remove the redundant session add: in test_get_dataset_by_id_excludes_non_text_type, after creating the dataset with create_evaluation_dataset and mutating dataset.type = EvaluationType.STT.value, do not call db.add(dataset) because the instance is already tracked; simply call db.commit() to persist the change and leave the rest of the test (especially the get_dataset_by_id assertion) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/tests/crud/evaluations/test_dataset.py`:
- Line 16: Remove the unused EvaluationDataset import from the import statement
that currently reads "from app.models import EvaluationDataset, Organization,
Project"; the tests use the create_evaluation_dataset() CRUD helper and only
reference "TestCreateEvaluationDataset" as a class name, so keep Organization
and Project but delete EvaluationDataset to avoid an unused import.
---
Nitpick comments:
In `@backend/app/tests/crud/evaluations/test_dataset.py`:
- Around line 130-157: Remove the redundant session add: in
test_get_dataset_by_id_excludes_non_text_type, after creating the dataset with
create_evaluation_dataset and mutating dataset.type = EvaluationType.STT.value,
do not call db.add(dataset) because the instance is already tracked; simply call
db.commit() to persist the change and leave the rest of the test (especially the
get_dataset_by_id assertion) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 60e1103f-2327-416a-9501-96df8d80fd2f
📒 Files selected for processing (1)
backend/app/tests/crud/evaluations/test_dataset.py
```diff
     upload_csv_to_object_store,
 )
-from app.models import Organization, Project
+from app.models import EvaluationDataset, Organization, Project
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Check if EvaluationDataset is directly used in the test file (beyond imports)
rg -n 'EvaluationDataset' backend/app/tests/crud/evaluations/test_dataset.py | grep -v 'import'
```

Repository: ProjectTech4DevAI/kaapi-backend
Length of output: 112
Remove the unused EvaluationDataset import from line 16.
The EvaluationDataset model is imported but never directly used in the test file. The tests create datasets via the create_evaluation_dataset() CRUD function, which handles instantiation implicitly. The only occurrence of "EvaluationDataset" in the file is the test class name TestCreateEvaluationDataset, not a usage of the imported model.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/tests/crud/evaluations/test_dataset.py` at line 16, Remove the
unused EvaluationDataset import from the import statement that currently reads
"from app.models import EvaluationDataset, Organization, Project"; the tests use
the create_evaluation_dataset() CRUD helper and only reference
"TestCreateEvaluationDataset" as a class name, so keep Organization and Project
but delete EvaluationDataset to avoid an unused import.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/tests/crud/evaluations/test_core.py (1)
Lines 130-133: Use a collision-proof non-existent ID in not-found tests.

Using a hardcoded `99999` can become flaky in long-lived/shared test DBs; prefer a guaranteed-miss ID strategy (e.g., `-1` if IDs are positive-only).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/tests/crud/evaluations/test_core.py` around lines 130 - 133, Replace the hardcoded 99999 used in the not-found test for get_evaluation_run_by_id with a collision-proof non-existent ID (e.g., use -1) to guarantee a miss in databases where IDs are positive-only; update the test invocation of get_evaluation_run_by_id(session=db, evaluation_id=..., organization_id=org.id) to pass -1 (or otherwise compute a guaranteed-miss value) so the test cannot collide with real records.
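Two common guaranteed-miss strategies are sketched below; the helper names are illustrative, not from the codebase:

```python
def negative_miss() -> int:
    # -1 can never collide with positive-only autoincrement primary keys
    return -1


def max_plus_one(existing_ids: set[int]) -> int:
    # Alternative that stays positive: one past the largest known id
    return max(existing_ids, default=0) + 1


existing = {1, 2, 3, 99999}  # note: a hardcoded 99999 would collide here
print(negative_miss() in existing)
print(max_plus_one(existing) in existing)
```

The `-1` form needs no database query, but only works when the schema guarantees positive keys; `max_plus_one` is safe either way at the cost of one lookup.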
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/tests/crud/evaluations/test_core.py`:
- Around line 16-42: The inline helper _create_config and repeated setup logic
should be replaced by reusable factory fixtures under backend/app/tests/ (e.g.,
ConfigFactory/ConfigVersionFactory, ProjectFactory, OrgFactory, DatasetFactory,
RunFactory); create factories that construct Config and ConfigVersion (mirroring
the fields set in _create_config and using now() for timestamps), register them
as pytest fixtures, and update tests in test_core.py (and other affected tests)
to use these fixtures instead of calling _create_config or manually creating
models; ensure factories return the same identifiers/objects (config.id and
config_version.version) or provide equivalent attributes so existing assertions
in the tests remain valid.
---
Nitpick comments:
In `@backend/app/tests/crud/evaluations/test_core.py`:
- Around line 130-133: Replace the hardcoded 99999 used in the not-found test
for get_evaluation_run_by_id with a collision-proof non-existent ID (e.g., use
-1) to guarantee a miss in databases where IDs are positive-only; update the
test invocation of get_evaluation_run_by_id(session=db, evaluation_id=...,
organization_id=org.id) to pass -1 (or otherwise compute a guaranteed-miss
value) so the test cannot collide with real records.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ffdeda3-927d-4509-aae5-bd30406b78ef
📒 Files selected for processing (2)
backend/app/crud/evaluations/core.py
backend/app/tests/crud/evaluations/test_core.py
```python
def _create_config(db: Session, project_id: int) -> tuple:
    """Helper to create a config and config_version for evaluation runs."""
    from app.models.config import Config, ConfigVersion

    config = Config(
        name="test_config",
        project_id=project_id,
        inserted_at=now(),
        updated_at=now(),
    )
    db.add(config)
    db.commit()
    db.refresh(config)

    config_version = ConfigVersion(
        config_id=config.id,
        version=1,
        config_blob={"completion": {"params": {"model": "gpt-4o"}}},
        inserted_at=now(),
        updated_at=now(),
    )
    db.add(config_version)
    db.commit()
    db.refresh(config_version)

    return config.id, config_version.version
```
🛠️ Refactor suggestion | 🟠 Major
Adopt factory fixtures instead of inline object construction in tests.
This module repeats setup logic (org/project/dataset/config/run) and uses an ad-hoc helper; please move these into reusable factory fixtures under backend/app/tests/ for consistency and maintainability.
♻️ Example direction

```diff
-def _create_config(db: Session, project_id: int) -> tuple:
-    ...
+# backend/app/tests/factories/config_factory.py
+def create_config_with_version(db: Session, project_id: int) -> tuple[int, int]:
+    ...
```

```diff
-config_id, config_version = _create_config(db, project.id)
+config_id, config_version = config_factory.create_config_with_version(db, project.id)
```

As per coding guidelines, "backend/app/tests/**/*.py: Use factory pattern for test fixtures in backend/app/tests/."
Also applies to: 47-245
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/app/tests/crud/evaluations/test_core.py` around lines 16 - 42, The
inline helper _create_config and repeated setup logic should be replaced by
reusable factory fixtures under backend/app/tests/ (e.g.,
ConfigFactory/ConfigVersionFactory, ProjectFactory, OrgFactory, DatasetFactory,
RunFactory); create factories that construct Config and ConfigVersion (mirroring
the fields set in _create_config and using now() for timestamps), register them
as pytest fixtures, and update tests in test_core.py (and other affected tests)
to use these fixtures instead of calling _create_config or manually creating
models; ensure factories return the same identifiers/objects (config.id and
config_version.version) or provide equivalent attributes so existing assertions
in the tests remain valid.
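A minimal direction for such a factory is sketched below, using plain dicts in place of the SQLModel classes and a local `now()` helper; all names and defaults are assumptions about how the real factory would look:

```python
from datetime import datetime, timezone


def now() -> datetime:
    return datetime.now(timezone.utc)


def create_config_with_version(project_id: int, version: int = 1) -> tuple[dict, dict]:
    """Factory sketch: returns (config, config_version) with overridable defaults."""
    config = {
        "id": 1,  # a real factory would persist via the session and refresh
        "name": "test_config",
        "project_id": project_id,
        "inserted_at": now(),
        "updated_at": now(),
    }
    config_version = {
        "config_id": config["id"],
        "version": version,
        "config_blob": {"completion": {"params": {"model": "gpt-4o"}}},
        "inserted_at": now(),
        "updated_at": now(),
    }
    return config, config_version


config, config_version = create_config_with_version(project_id=42)
print(config["id"], config_version["version"])
```

Registering such a helper as a pytest fixture keeps per-test overrides (project id, version, model blob) explicit while centralizing the defaults the current inline helper duplicates.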
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
Target issue is ProjectTech4DevAI/kaapi-frontend#52 (review)
Checklist
Before submitting a pull request, please ensure that you mark these tasks.

Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Notes
Summary by CodeRabbit

New Features

Tests