Evaluation #405
- Remove `project_id` parameter from `/evaluate` endpoint
- Update `get_provider_credential` calls to not require `project_id`
- Credentials now retrieved via API key authentication
- Clean up logging configuration and imports
- Fix linting errors and update type annotations
Walkthrough

Adds a full evaluation subsystem: dataset CSV upload/management, evaluation run lifecycle, batch provider abstractions with OpenAI batch integration, embedding-based scoring and Langfuse integration, cron-driven polling, DB models/migration, storage utilities, tests, docs, and small auth/endpoint adjustments.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant API as Evaluation API
    participant DB as Database
    participant Storage as Object Store
    participant Langfuse
    participant OpenAI as OpenAI Batch
    participant Cron as Cron Worker

    User->>API: POST /upload_dataset (CSV)
    API->>Storage: (optional) upload CSV
    API->>Langfuse: create dataset & upload items
    API->>DB: store EvaluationDataset
    API-->>User: DatasetUploadResponse

    User->>API: POST /evaluations (dataset_id, config)
    API->>DB: create EvaluationRun (pending)
    API->>Langfuse: fetch dataset items
    API->>API: build JSONL
    API->>OpenAI: submit batch job
    API->>DB: update run with batch_job_id
    API-->>User: EvaluationRunPublic

    Cron->>API: GET /cron/evaluations
    API->>DB: query pending EvaluationRuns
    loop per pending run
        API->>OpenAI: poll batch status
        alt complete
            API->>OpenAI: download results
            API->>Storage: (optional) upload results
            API->>Langfuse: create traces
            API->>OpenAI: start embedding batch
            API->>DB: update run status/metrics
        else not complete
            API->>DB: no change
        end
    end
    API-->>Cron: summary
```

```mermaid
sequenceDiagram
    participant Embedding as Embedding Batch
    participant OpenAI
    participant Scorer
    participant Langfuse
    participant DB

    Embedding->>OpenAI: poll embedding batch status
    alt complete
        Embedding->>OpenAI: download embedding results
        Embedding->>Scorer: parse embeddings & pairings
        Scorer->>Scorer: compute cosine similarities (per-item)
        Scorer->>Scorer: aggregate mean/std
        Scorer->>Langfuse: update traces with scores
        Scorer->>DB: update EvaluationRun with scores
    else failed
        Embedding->>DB: mark run completed with error
    end
```
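The per-item cosine scoring and mean/std aggregation steps in the embedding flow above can be sketched as follows (function names here are illustrative, not taken from the PR's code):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors; 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def aggregate_scores(pairs: list[tuple[list[float], list[float]]]) -> dict[str, float]:
    """Score each (model output, reference answer) embedding pair, then aggregate."""
    scores = [cosine_similarity(out, ref) for out, ref in pairs]
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return {"mean": mean, "std": math.sqrt(variance), "count": len(scores)}
```

Each per-item score would then be attached to its Langfuse trace, with the aggregate written to the `EvaluationRun` row.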
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Codecov Report: ❌ Patch coverage is
Actionable comments posted: 0
♻️ Duplicate comments (5)
backend/app/models/batch_job.py (1)
75-80: Resolve forward references using TYPE_CHECKING guard.

The forward references to `Organization` and `Project` still trigger F821 lint errors despite the `# noqa` comments. As previously suggested, import these types under a `TYPE_CHECKING` guard to satisfy static analyzers without causing runtime circular imports.

Apply this diff:
```diff
-from typing import Any, Optional
+from typing import TYPE_CHECKING, Any, Optional
+
+if TYPE_CHECKING:
+    from app.models.organization import Organization
+    from app.models.project import Project

-    organization: Optional["Organization"] = Relationship(  # noqa: F821
+    organization: Optional["Organization"] = Relationship(
         back_populates="batch_jobs"
     )
-    project: Optional["Project"] = Relationship(
-        back_populates="batch_jobs"
-    )  # noqa: F821
+    project: Optional["Project"] = Relationship(back_populates="batch_jobs")
```

backend/app/models/evaluation.py (2)
119-124: Resolve forward references using TYPE_CHECKING guard.

The forward references to `Project` and `Organization` trigger F821 lint errors. Import these types under a `TYPE_CHECKING` guard to satisfy static analyzers.

Apply this diff:
```diff
-from typing import Any, Optional
+from typing import TYPE_CHECKING, Any, Optional
+
+if TYPE_CHECKING:
+    from app.models.batch_job import BatchJob
+    from app.models.organization import Organization
+    from app.models.project import Project

-    project: "Project" = Relationship(
-        back_populates="evaluation_datasets"
-    )  # noqa: F821
-    organization: "Organization" = Relationship(
-        back_populates="evaluation_datasets"
-    )  # noqa: F821
+    project: "Project" = Relationship(back_populates="evaluation_datasets")
+    organization: "Organization" = Relationship(back_populates="evaluation_datasets")
```
206-213: Use SQLField for timestamp columns to ensure persistence.

The `inserted_at` and `updated_at` fields are declared with Pydantic `Field`, which means SQLModel won't generate actual database columns. This breaks persistence for these timestamps.

Apply this diff:
```diff
     # Timestamps
-    inserted_at: datetime = Field(
-        default_factory=now,
-        description="The timestamp when the evaluation run was started",
-    )
-    updated_at: datetime = Field(
-        default_factory=now,
-        description="The timestamp when the evaluation run was last updated",
-    )
+    inserted_at: datetime = SQLField(
+        default_factory=now,
+        nullable=False,
+        description="The timestamp when the evaluation run was started",
+    )
+    updated_at: datetime = SQLField(
+        default_factory=now,
+        nullable=False,
+        description="The timestamp when the evaluation run was last updated",
+    )
```

backend/app/api/routes/evaluation.py (1)
634-662: Preserve caller's config when applying assistant defaults.

The current implementation replaces the entire `config` dictionary, which discards any additional keys provided by the caller (e.g., `include`, `tools`, custom parameters). This prevents legitimate API use cases where callers want to override or extend assistant configuration.

Merge assistant defaults into the caller's config instead:
```diff
-    config = {
-        "model": config.get("model", assistant.model),
-        "instructions": config.get("instructions", assistant.instructions),
-        "temperature": config.get("temperature", assistant.temperature),
-    }
+    # Preserve caller's config and apply assistant defaults where not provided
+    if not config.get("model"):
+        config["model"] = assistant.model
+    if not config.get("instructions"):
+        config["instructions"] = assistant.instructions
+    if "temperature" not in config and assistant.temperature is not None:
+        config["temperature"] = assistant.temperature

     # Add tools if vector stores are available
     vector_store_ids = config.get(
         "vector_store_ids", assistant.vector_store_ids or []
     )
     if vector_store_ids and len(vector_store_ids) > 0:
-        config["tools"] = [
-            {
-                "type": "file_search",
-                "vector_store_ids": vector_store_ids,
-            }
-        ]
+        # Add file_search tool if not already present
+        if not any(tool.get("type") == "file_search" for tool in config.get("tools", [])):
+            config.setdefault("tools", []).append({
+                "type": "file_search",
+                "vector_store_ids": vector_store_ids,
+            })
```

backend/app/crud/evaluations/processing.py (1)
742-747: Classify embedding outcomes correctly.

The action classification logic doesn't account for the `"embeddings_completed"` and `"embeddings_failed"` actions returned by `check_and_process_evaluation()`. These fall into the `else` branch and are incorrectly counted as `still_processing`.

Apply this diff:
```diff
-    if result["action"] == "processed":
+    action = result["action"]
+    if action in {"processed", "embeddings_completed"}:
         total_processed_count += 1
-    elif result["action"] == "failed":
+    elif action in {"failed", "embeddings_failed"}:
         total_failed_count += 1
     else:
         total_still_processing_count += 1
```
🧹 Nitpick comments (1)
backend/app/api/routes/evaluation.py (1)
195-238: Consider validating CSV headers before full file read.

Currently, the entire file is read at line 196, and then CSV headers are validated at lines 205-213. For better efficiency and security, consider validating the CSV headers first by reading only the first line, then reading the entire file if validation passes. This prevents processing large invalid files unnecessarily.
Example approach:
```python
# Read first line to validate headers
file.file.seek(0)
first_line = file.file.readline()
csv_reader = csv.DictReader(io.StringIO(first_line.decode("utf-8")))
csv_reader.fieldnames = [name.strip() for name in csv_reader.fieldnames]

if "question" not in csv_reader.fieldnames or "answer" not in csv_reader.fieldnames:
    raise HTTPException(
        status_code=422,
        detail=f"CSV must contain 'question' and 'answer' columns. "
        f"Found columns: {csv_reader.fieldnames}",
    )

# Now read the entire file
file.file.seek(0)
csv_content = await file.read()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- backend/app/api/routes/evaluation.py (1 hunks)
- backend/app/celery/celery_app.py (1 hunks)
- backend/app/crud/evaluations/core.py (1 hunks)
- backend/app/crud/evaluations/processing.py (1 hunks)
- backend/app/models/batch_job.py (1 hunks)
- backend/app/models/evaluation.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/celery/celery_app.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
- backend/app/crud/evaluations/core.py
- backend/app/api/routes/evaluation.py
- backend/app/models/evaluation.py
- backend/app/crud/evaluations/processing.py
- backend/app/models/batch_job.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
- backend/app/crud/evaluations/core.py
- backend/app/crud/evaluations/processing.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
- backend/app/models/evaluation.py
- backend/app/models/batch_job.py
🧠 Learnings (3)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
Applied to files:
backend/app/crud/evaluations/core.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/
Applied to files:
backend/app/models/evaluation.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to **/*.py : Use type hints in Python code (Python 3.11+ project)
Applied to files:
backend/app/models/batch_job.py
🧬 Code graph analysis (3)
backend/app/crud/evaluations/core.py (5)
- backend/app/core/util.py (1): `configure_langfuse` (42-73)
- backend/app/crud/credentials.py (1): `get_provider_credential` (121-159)
- backend/app/models/evaluation.py (2): `EvaluationRun` (130-230), `DatasetUploadResponse` (19-38)
- backend/app/models/user.py (1): `UserProjectOrg` (58-59)
- backend/app/core/langfuse/langfuse.py (1): `flush` (108-109)
backend/app/api/routes/evaluation.py (10)
- backend/app/core/cloud/storage.py (1): `get_cloud_storage` (262-279)
- backend/app/core/util.py (2): `configure_langfuse` (42-73), `configure_openai` (76-95)
- backend/app/crud/assistants.py (1): `get_assistant_by_id` (19-30)
- backend/app/crud/credentials.py (1): `get_provider_credential` (121-159)
- backend/app/crud/evaluations/dataset.py (5): `create_evaluation_dataset` (31-105), `get_dataset_by_id` (108-143), `list_datasets` (179-215), `upload_csv_to_object_store` (218-250), `delete_dataset` (325-387)
- backend/app/crud/evaluations/core.py (3): `create_evaluation_run` (140-182), `get_evaluation_run_by_id` (224-262), `list_evaluation_runs` (185-221)
- backend/app/crud/evaluations/batch.py (1): `start_evaluation_batch` (114-202)
- backend/app/core/storage_utils.py (1): `upload_csv_to_object_store` (22-83)
- backend/app/crud/evaluations/langfuse.py (1): `upload_dataset_to_langfuse_from_csv` (165-277)
- backend/app/models/evaluation.py (2): `DatasetUploadResponse` (19-38), `EvaluationRunPublic` (247-265)
backend/app/crud/evaluations/processing.py (10)
- backend/app/core/batch/openai_provider.py (1): `OpenAIBatchProvider` (14-267)
- backend/app/core/util.py (2): `configure_langfuse` (42-73), `configure_openai` (76-95)
- backend/app/crud/batch_job.py (1): `get_batch_job` (56-70)
- backend/app/crud/batch_operations.py (2): `upload_batch_results_to_object_store` (267-311), `poll_batch_status` (110-165)
- backend/app/crud/credentials.py (1): `get_provider_credential` (121-159)
- backend/app/crud/evaluations/batch.py (1): `fetch_dataset_items` (24-57)
- backend/app/crud/evaluations/core.py (1): `update_evaluation_run` (265-312)
- backend/app/crud/evaluations/embeddings.py (3): `calculate_average_similarity` (255-331), `parse_embedding_results` (132-218), `start_embedding_batch` (334-434)
- backend/app/crud/evaluations/langfuse.py (2): `create_langfuse_dataset_run` (18-109), `update_traces_with_cosine_scores` (112-162)
- backend/app/models/evaluation.py (1): `EvaluationRun` (130-230)
🪛 Ruff (0.14.3)
backend/app/models/evaluation.py
119-119: Undefined name Project
(F821)
122-122: Undefined name Organization
(F821)
217-217: Undefined name Organization
(F821)
backend/app/models/batch_job.py
78-78: Undefined name Project
(F821)
🔇 Additional comments (4)
backend/app/crud/evaluations/core.py (1)
1-311: LGTM! Well-structured CRUD operations.

The evaluation core CRUD module is well-implemented with:
- Comprehensive error handling in `upload_dataset_to_langfuse`
- Proper CSV validation and parsing
- Good logging throughout
- Proper access control checks in query functions
- Clean separation of concerns
backend/app/api/routes/evaluation.py (2)
165-193: Good security validation.

The file upload security checks are well-implemented:
- Extension validation against allowed list
- MIME type verification
- Size limit enforcement (1 MB)
- Empty file detection
339-385: Well-structured API endpoints.

The dataset and evaluation run management endpoints are well-implemented with:
- Proper authentication and authorization via `AuthContextDep`
- Appropriate HTTP status codes
- Good error messages
- Pagination support where needed
- Clear documentation in docstrings
Also applies to: 388-429, 432-472, 702-770
backend/app/crud/evaluations/processing.py (1)
45-284: Well-architected batch processing orchestration.

The evaluation processing module is well-designed with:
- Robust parsing logic that handles multiple output formats gracefully
- Comprehensive error handling throughout the processing pipeline
- Graceful degradation when optional steps fail (object store, Langfuse updates)
- Clear separation between evaluation and embedding batch processing
- Good logging with contextual prefixes for debugging
Also applies to: 287-409, 412-613
avirajsingh7 left a comment:
A few points to consider (now or later):

1. **Async usage.** I see extensive usage of async. While async provides concurrency benefits, it should be used carefully: if a route is async but the internal function calls are blocking/synchronous, it can actually degrade performance. Async is most useful when there are I/O-heavy operations (e.g., network calls, file access).
2. **Docstrings.** There are docstrings throughout the codebase, which is good in principle. However, some of them describe very obvious behavior and end up reducing readability. Docstrings are most valuable where behavior or intent is non-obvious; otherwise they add noise instead of clarity.
3. **Potential for OOP structure (optional / future consideration).** There's heavy use of standalone functions. Over time, grouping related logic into classes could improve structure, reusability, and readability. Not necessary right now, but something to consider if this module grows.
4. **Langfuse SDK version.** Upgrading to newer versions of the Langfuse SDK might be challenging given how extensively it is used.
5. **Magic strings and hardcoded dictionaries.** Strings used for `job_type`, `status`, `provider`, etc., are used directly in multiple places. This is okay for now, but it can lead to errors or inconsistencies; consider using enums. Replacing repeated dictionary-like structures with Pydantic models or enums could improve maintainability and reduce duplication.
```python
    auth_context: AuthContextDep,
    limit: int = 50,
    offset: int = 0,
) -> list[DatasetUploadResponse]:
```
For all Evals routes: instead of returning a dict, all responses should return a valid SQLModel class and should be in the standard format, wrapped in the `APIResponse` type.
backend/app/api/routes/evaluation.py (outdated)
```python
    List all datasets for the current organization and project.

    Args:
        limit: Maximum number of datasets to return (default 50, max 100)
```
Here, to enforce the limit you need to add something like the snippet below; also, can we use (`skip`, `limit`) instead of (`limit`, `offset`), like all other endpoints?

```python
skip: int = Query(0, ge=0, description="Number of records to skip"),
limit: int = Query(100, ge=1, le=100, description="Maximum records to return"),
```
backend/app/api/routes/evaluation.py (outdated)
| ) | ||
|
|
||
|
|
||
| @router.delete("/evaluations/datasets/{dataset_id}") |
See here: there is no response model, so the OpenAPI doc will not get updated.
backend/app/api/routes/evaluation.py (outdated)
```python
    return eval_run


@router.get("/evaluations/list", response_model=list[EvaluationRunPublic])
```
@AkhileshNegi We are fixing that endpoint; you can remove `list` from this one.
```python
async def get_evaluation_run_status(
    evaluation_id: int,
    _session: SessionDep,
    auth_context: AuthContextDep,
```
For all the routes, can you add a restriction like:

```python
dependencies=[Depends(require_permission(Permission.REQUIRE_PROJECT))],
```

Otherwise someone with a username and password can access the endpoint, and in that case the project object will be null. The above dependency ensures the project object is not null; refer to the api_keys route.
```python
    This module provides common functions for uploading various file types
    to cloud object storage, abstracting away provider-specific details.
    """
```
Can you please put this particular file in the storage directory, e.g. `storage/utils.py`?
Actionable comments posted: 1
♻️ Duplicate comments (6)
backend/app/alembic/versions/d5747495bd7c_create_evaluation_run_table.py (2)
20-20: Add return type hint to `upgrade()` function.

The function signature lacks a return type annotation, which violates the project's Python 3.11+ type hint requirements.
As per coding guidelines.
Apply this diff:
```diff
-def upgrade():
+def upgrade() -> None:
```
231-231: Add return type hint to `downgrade()` function.

The function signature lacks a return type annotation, which violates the project's Python 3.11+ type hint requirements.
As per coding guidelines.
Apply this diff:
```diff
-def downgrade():
+def downgrade() -> None:
```

backend/app/crud/evaluations/langfuse.py (1)
216-228: Guard against empty CSV to prevent TypeError.

When `csv.DictReader` processes an empty file, `fieldnames` is `None`, which causes the list comprehension at line 218 to raise `TypeError` before the validation at lines 221-228 can run. This turns a user error (empty file) into a 500 error.

Apply this diff:
```diff
 csv_text = csv_content.decode("utf-8")
 csv_reader = csv.DictReader(io.StringIO(csv_text))
-csv_reader.fieldnames = [name.strip() for name in csv_reader.fieldnames]
+
+fieldnames = csv_reader.fieldnames
+if not fieldnames:
+    raise ValueError(
+        "CSV must contain 'question' and 'answer' columns. Found columns: []"
+    )
+csv_reader.fieldnames = [name.strip() for name in fieldnames]

 # Validate CSV headers
```

backend/app/models/evaluation.py (1)
207-214: Use SQLField for timestamp columns to ensure DB persistence.

Lines 207-214 declare `inserted_at` and `updated_at` with Pydantic `Field`, which means SQLModel won't generate actual database columns for these fields. This breaks timestamp persistence for `EvaluationRun`. Notice that `EvaluationDataset` correctly uses `SQLField` for its timestamps at lines 120-121.

Apply this diff to fix the issue:
```diff
     # Timestamps
-    inserted_at: datetime = Field(
-        default_factory=now,
-        description="The timestamp when the evaluation run was started",
-    )
-    updated_at: datetime = Field(
-        default_factory=now,
-        description="The timestamp when the evaluation run was last updated",
-    )
+    inserted_at: datetime = SQLField(
+        default_factory=now,
+        nullable=False,
+        description="The timestamp when the evaluation run was started",
+    )
+    updated_at: datetime = SQLField(
+        default_factory=now,
+        nullable=False,
+        description="The timestamp when the evaluation run was last updated",
+    )
```

backend/app/api/routes/evaluation.py (1)
527-546: Merge assistant defaults into caller config instead of replacing it.

Lines 529-533 create a new config dict with only `model`, `instructions`, and `temperature`, which discards all other keys from the caller-provided config. This means important parameters like custom evaluation settings or other configuration values are silently dropped. The pattern should preserve the caller's config and only set defaults for missing keys.

Apply this diff to properly merge configurations:
```diff
-    # Build config from assistant (use provided config values to override
-    # if present)
-    config = {
-        "model": config.get("model", assistant.model),
-        "instructions": config.get("instructions", assistant.instructions),
-        "temperature": config.get("temperature", assistant.temperature),
-    }
+    # Merge assistant defaults into caller's config (preserve all caller keys)
+    config.setdefault("model", assistant.model)
+    config.setdefault("instructions", assistant.instructions)
+    if "temperature" not in config and assistant.temperature is not None:
+        config["temperature"] = assistant.temperature

     # Add tools if vector stores are available
-    vector_store_ids = config.get(
-        "vector_store_ids", assistant.vector_store_ids or []
-    )
+    vector_store_ids = config.get("vector_store_ids") or assistant.vector_store_ids or []
     if vector_store_ids and len(vector_store_ids) > 0:
-        config["tools"] = [
-            {
-                "type": "file_search",
-                "vector_store_ids": vector_store_ids,
-            }
-        ]
+        if "tools" not in config:
+            config["tools"] = []
+        # Only add file_search if not already present
+        if not any(t.get("type") == "file_search" for t in config["tools"]):
+            config["tools"].append({
+                "type": "file_search",
+                "vector_store_ids": vector_store_ids,
+            })
```

backend/app/crud/evaluations/processing.py (1)
763-768: Correctly classify embedding completion outcomes in summary counts.

Lines 763-768 only check for the `"processed"` and `"failed"` actions, causing the `"embeddings_completed"` and `"embeddings_failed"` actions (returned at lines 503 and 524) to fall into the else branch and be incorrectly counted as `still_processing`. This breaks the polling summary statistics.

Apply this diff to properly classify all outcomes:
```diff
-    if result["action"] == "processed":
+    action = result["action"]
+    if action in {"processed", "embeddings_completed"}:
         total_processed_count += 1
-    elif result["action"] == "failed":
+    elif action in {"failed", "embeddings_failed"}:
         total_failed_count += 1
     else:
         total_still_processing_count += 1
```
🧹 Nitpick comments (1)
backend/app/core/batch/provider_interface.py (1)
11-13: Consider more specific return types if provider responses are consistent.

While `dict[str, Any]` provides flexibility for provider-specific metadata, if the core return structure is consistent across providers (e.g., all return `provider_batch_id`, `provider_status`, etc.), consider using a TypedDict or a dataclass for better type safety and IDE support.

Example:
```python
from typing import TypedDict


class BatchCreationResult(TypedDict):
    provider_batch_id: str
    provider_file_id: str
    provider_status: str
    total_items: int
```

This is optional and can be deferred if provider responses vary significantly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- backend/app/alembic/versions/d5747495bd7c_create_evaluation_run_table.py (1 hunks)
- backend/app/api/docs/evaluation/create_evaluation.md (1 hunks)
- backend/app/api/docs/evaluation/delete_dataset.md (1 hunks)
- backend/app/api/docs/evaluation/get_dataset.md (1 hunks)
- backend/app/api/docs/evaluation/get_evaluation.md (1 hunks)
- backend/app/api/docs/evaluation/list_datasets.md (1 hunks)
- backend/app/api/docs/evaluation/list_evaluations.md (1 hunks)
- backend/app/api/docs/evaluation/upload_dataset.md (1 hunks)
- backend/app/api/routes/cron.py (1 hunks)
- backend/app/api/routes/evaluation.py (1 hunks)
- backend/app/core/batch/openai_provider.py (1 hunks)
- backend/app/core/batch/provider_interface.py (1 hunks)
- backend/app/crud/batch_job.py (1 hunks)
- backend/app/crud/batch_operations.py (1 hunks)
- backend/app/crud/evaluations/batch.py (1 hunks)
- backend/app/crud/evaluations/dataset.py (1 hunks)
- backend/app/crud/evaluations/langfuse.py (1 hunks)
- backend/app/crud/evaluations/processing.py (1 hunks)
- backend/app/models/batch_job.py (1 hunks)
- backend/app/models/evaluation.py (1 hunks)
- backend/app/models/organization.py (1 hunks)
- backend/app/models/project.py (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- backend/app/api/docs/evaluation/list_datasets.md
- backend/app/api/docs/evaluation/upload_dataset.md
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/api/routes/cron.py
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
- backend/app/alembic/versions/d5747495bd7c_create_evaluation_run_table.py
- backend/app/models/evaluation.py
- backend/app/core/batch/openai_provider.py
- backend/app/crud/batch_operations.py
- backend/app/crud/evaluations/langfuse.py
- backend/app/core/batch/provider_interface.py
- backend/app/crud/evaluations/batch.py
- backend/app/models/organization.py
- backend/app/api/routes/evaluation.py
- backend/app/crud/batch_job.py
- backend/app/models/project.py
- backend/app/models/batch_job.py
- backend/app/crud/evaluations/dataset.py
- backend/app/crud/evaluations/processing.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
- backend/app/models/evaluation.py
- backend/app/models/organization.py
- backend/app/models/project.py
- backend/app/models/batch_job.py
backend/app/core/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/
Files:
- backend/app/core/batch/openai_provider.py
- backend/app/core/batch/provider_interface.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
- backend/app/crud/batch_operations.py
- backend/app/crud/evaluations/langfuse.py
- backend/app/crud/evaluations/batch.py
- backend/app/crud/batch_job.py
- backend/app/crud/evaluations/dataset.py
- backend/app/crud/evaluations/processing.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/evaluation.py
🧠 Learnings (5)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to **/*.py : Use type hints in Python code (Python 3.11+ project)
Applied to files:
- backend/app/alembic/versions/d5747495bd7c_create_evaluation_run_table.py
- backend/app/models/project.py
- backend/app/models/batch_job.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/
Applied to files:
- backend/app/models/evaluation.py
- backend/app/api/routes/evaluation.py
- backend/app/models/batch_job.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
Applied to files:
backend/app/crud/evaluations/langfuse.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/services/response/**/*.py : Put OpenAI Responses API integration, conversation management, and job execution in backend/app/services/response/
Applied to files:
backend/app/api/routes/evaluation.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/crud/**/*.py : Implement database access operations in backend/app/crud/
Applied to files:
backend/app/crud/batch_job.py
🧬 Code graph analysis (13)
backend/app/models/evaluation.py (3)
- backend/app/models/batch_job.py (1): `BatchJob` (15-80)
- backend/app/models/organization.py (1): `Organization` (34-54)
- backend/app/models/project.py (1): `Project` (38-66)
backend/app/core/batch/openai_provider.py (2)
- backend/app/core/batch/provider_interface.py (6): `BatchProvider` (7-105), `create_batch` (11-32), `upload_file` (75-89), `get_batch_status` (35-52), `download_batch_results` (55-72), `download_file` (92-105)
- backend/app/crud/batch_operations.py (1): `download_batch_results` (174-215)
backend/app/crud/batch_operations.py (6)
- backend/app/core/batch/provider_interface.py (4): `BatchProvider` (7-105), `create_batch` (11-32), `get_batch_status` (35-52), `download_batch_results` (55-72)
- backend/app/core/cloud/storage.py (1): `get_cloud_storage` (262-279)
- backend/app/core/storage_utils.py (1): `upload_jsonl_to_object_store` (86-152)
- backend/app/crud/batch_job.py (2): `create_batch_job` (13-56), `update_batch_job` (76-120)
- backend/app/models/batch_job.py (3): `BatchJob` (15-80), `BatchJobCreate` (83-97), `BatchJobUpdate` (100-109)
- backend/app/core/batch/openai_provider.py (3): `create_batch` (26-87), `get_batch_status` (89-141), `download_batch_results` (143-191)
backend/app/crud/evaluations/langfuse.py (2)
backend/app/api/routes/evaluation.py (1)
get_dataset(357-388)backend/app/core/langfuse/langfuse.py (1)
flush(108-109)
backend/app/core/batch/provider_interface.py (2)
backend/app/core/batch/openai_provider.py (5)
create_batch(26-87)get_batch_status(89-141)download_batch_results(143-191)upload_file(193-223)download_file(225-254)backend/app/crud/batch_operations.py (1)
download_batch_results(174-215)
backend/app/crud/evaluations/batch.py (3)
backend/app/core/batch/openai_provider.py (1)
OpenAIBatchProvider(14-254)backend/app/crud/batch_operations.py (1)
start_batch_job(20-111)backend/app/models/evaluation.py (1)
EvaluationRun(131-229)
backend/app/models/organization.py (5)
backend/app/models/collection.py (2)
now(49-51)Collection(13-36)backend/app/models/assistants.py (1)
Assistant(33-44)backend/app/models/credentials.py (1)
Credential(48-97)backend/app/models/openai_conversation.py (1)
OpenAIConversation(58-69)backend/app/models/project.py (1)
Project(38-66)
backend/app/api/routes/evaluation.py (11)
backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/core/util.py (2)
configure_langfuse(42-73)configure_openai(76-95)backend/app/crud/assistants.py (1)
get_assistant_by_id(19-30)backend/app/crud/credentials.py (1)
get_provider_credential(121-159)backend/app/crud/evaluations/dataset.py (5)
create_evaluation_dataset(31-104)get_dataset_by_id(107-140)list_datasets(175-210)upload_csv_to_object_store(213-245)delete_dataset(323-387)backend/app/crud/evaluations/core.py (3)
create_evaluation_run(140-182)get_evaluation_run_by_id(224-262)list_evaluation_runs(185-221)backend/app/crud/evaluations/batch.py (1)
start_evaluation_batch(118-212)backend/app/core/storage_utils.py (1)
upload_csv_to_object_store(22-83)backend/app/crud/evaluations/langfuse.py (1)
upload_dataset_to_langfuse_from_csv(181-297)backend/app/models/evaluation.py (2)
DatasetUploadResponse(24-43)EvaluationRunPublic(246-264)backend/app/utils.py (1)
load_description(248-253)
backend/app/crud/batch_job.py (1)
backend/app/models/batch_job.py (3)
BatchJob(15-80)BatchJobCreate(83-97)BatchJobUpdate(100-109)
backend/app/models/project.py (6)
backend/app/models/collection.py (2)
now(49-51)Collection(13-36)backend/app/models/assistants.py (1)
Assistant(33-44)backend/app/models/credentials.py (1)
Credential(48-97)backend/app/models/fine_tuning.py (1)
Fine_Tuning(55-90)backend/app/models/openai_conversation.py (1)
OpenAIConversation(58-69)backend/app/models/organization.py (1)
Organization(34-54)
backend/app/models/batch_job.py (2)
backend/app/models/organization.py (1)
Organization(34-54)backend/app/models/project.py (1)
Project(38-66)
backend/app/crud/evaluations/dataset.py (3)
backend/app/core/cloud/storage.py (1)
CloudStorage(114-142)backend/app/core/storage_utils.py (1)
generate_timestamped_filename(155-167)backend/app/models/evaluation.py (2)
EvaluationDataset(72-128)EvaluationRun(131-229)
backend/app/crud/evaluations/processing.py (10)
backend/app/core/batch/openai_provider.py (2)
OpenAIBatchProvider(14-254)download_batch_results(143-191)backend/app/core/util.py (2)
configure_langfuse(42-73)configure_openai(76-95)backend/app/crud/batch_job.py (1)
get_batch_job(59-73)backend/app/crud/batch_operations.py (3)
download_batch_results(174-215)upload_batch_results_to_object_store(283-330)poll_batch_status(114-171)backend/app/crud/credentials.py (1)
get_provider_credential(121-159)backend/app/crud/evaluations/batch.py (1)
fetch_dataset_items(24-59)backend/app/crud/evaluations/core.py (1)
update_evaluation_run(265-312)backend/app/crud/evaluations/embeddings.py (3)
calculate_average_similarity(255-331)parse_embedding_results(132-218)start_embedding_batch(334-434)backend/app/crud/evaluations/langfuse.py (2)
create_langfuse_dataset_run(18-122)update_traces_with_cosine_scores(125-178)backend/app/models/evaluation.py (1)
EvaluationRun(131-229)
🔇 Additional comments (10)
backend/app/models/project.py (1)
9-16: LGTM! Correct use of TYPE_CHECKING for circular import avoidance. The `TYPE_CHECKING` block properly imports related model types for use in relationship annotations (lines 51-66) without causing circular import issues at runtime. This is the standard Python pattern for forward references in type hints.
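As a general illustration of the pattern being praised here (module and class names are hypothetical, not the repo's actual models):

```python
from __future__ import annotations  # annotations become strings, resolved lazily

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers (mypy, pyright), never at runtime,
    # so importing the related model cannot trigger a circular import.
    from myapp.models.project import Project  # hypothetical module


class Organization:
    def __init__(self) -> None:
        # `Project` is a forward reference; it is not evaluated at runtime.
        self.projects: list[Project] = []
```

Because the guarded import never executes, both modules can reference each other's types freely while remaining independent at import time.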
backend/app/api/docs/evaluation/delete_dataset.md (1)
1-18: Documentation looks good. The delete dataset endpoint is well-documented with clear parameter descriptions, return values, and error responses. The note about CSV files remaining in object storage for audit purposes is a helpful detail.
backend/app/api/docs/evaluation/get_dataset.md (1)
1-22: LGTM! Comprehensive endpoint documentation. The documentation clearly describes the dataset retrieval endpoint, including all response fields and error cases. The inclusion of Langfuse integration details and object store URL is helpful for users.
backend/app/api/docs/evaluation/get_evaluation.md (1)
26-28: Good usage guidance for async polling. The documentation helpfully explains that evaluation is processed asynchronously and users should poll this endpoint to check status. The 60-second Celery Beat interval is a useful implementation detail.
backend/app/api/docs/evaluation/list_evaluations.md (1)
1-25: LGTM! Clear pagination and return structure. The documentation clearly describes the list endpoint with pagination parameters and comprehensive field descriptions. The ordering (most recent first) is an important detail for users.
backend/app/models/organization.py (1)
8-13: LGTM! Proper TYPE_CHECKING pattern. The `TYPE_CHECKING` block correctly imports model types used in relationship annotations (lines 40-54) while avoiding circular import issues at runtime. All imported types are actively used in the forward references.
backend/app/api/docs/evaluation/create_evaluation.md (2)
14-27: Request for clarification on config vs assistant_id interaction. The documentation shows that `config` is optional and `assistant_id` is optional, with an example where `config: {}` when using `assistant_id`. However, it's not entirely clear what happens if both are provided with conflicting values. Does `assistant_id` override `config`, or do they merge?

Consider adding a note clarifying the precedence/merge behavior when both `config` and `assistant_id` are provided.
1-12: Excellent end-to-end flow documentation. The step-by-step flow description (lines 3-9) and async processing notes are very helpful for developers integrating this API. The explanation of Celery Beat processing and polling guidance provides clear expectations.
backend/app/core/batch/provider_interface.py (2)
7-32: LGTM! Clean abstract interface. The `BatchProvider` abstract base class provides a well-documented interface for batch processing providers. The `create_batch` method signature and return structure are clearly specified.
1-105: Inconsistency between AI summary and implementation. The AI summary claims this interface defines 6 abstract methods including `cancel_batch`, but the actual implementation only contains 5 methods: `create_batch`, `get_batch_status`, `download_batch_results`, `upload_file`, and `download_file`. The `cancel_batch` method is not present.

Verify whether a `cancel_batch` method is needed for the batch provider interface. If batch cancellation is a required operation, it should be added to the interface. If not, the summary should be updated.
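For reference, the shape of such an interface can be sketched with `abc` (the method set mirrors the review; the bodies and `DummyProvider` are invented for illustration):

```python
from abc import ABC, abstractmethod
from typing import Any


class BatchProvider(ABC):
    """Abstract interface a batch-processing provider must implement."""

    @abstractmethod
    def create_batch(self, input_file_id: str, endpoint: str) -> dict[str, Any]:
        """Submit a batch job; return provider metadata including a batch id."""

    @abstractmethod
    def get_batch_status(self, batch_id: str) -> str:
        """Return the provider-side status string for a batch."""


class DummyProvider(BatchProvider):
    """Trivial in-memory implementation, useful in tests."""

    def create_batch(self, input_file_id: str, endpoint: str) -> dict[str, Any]:
        return {"id": "batch-1", "input_file_id": input_file_id, "endpoint": endpoint}

    def get_batch_status(self, batch_id: str) -> str:
        return "completed"
```

Instantiating `BatchProvider` directly raises `TypeError`; that is how the ABC enforces the contract, and adding `cancel_batch` as a sixth `@abstractmethod` would force every concrete provider to support cancellation.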
```python
def __init__(self, client: OpenAI):
    """
    Initialize the OpenAI batch provider.

    Args:
        client: Configured OpenAI client
    """
    self.client = client
```
Add return type hint to __init__() method.
The method signature is missing a return type annotation, which violates the project's Python 3.11+ type hint requirements.
As per coding guidelines.
Apply this diff:
```diff
- def __init__(self, client: OpenAI):
+ def __init__(self, client: OpenAI) -> None:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def __init__(self, client: OpenAI) -> None:
    """
    Initialize the OpenAI batch provider.

    Args:
        client: Configured OpenAI client
    """
    self.client = client
```
🤖 Prompt for AI Agents
In backend/app/core/batch/openai_provider.py around lines 17 to 24, the __init__
method is missing a return type annotation; update the signature to include an
explicit "-> None" return type (def __init__(self, client: OpenAI) -> None:) so
it complies with the project's Python 3.11+ type hint requirements and keep the
body unchanged.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/models/__init__.py (1)
40-45: Move batch_job imports to maintain alphabetical ordering. The `batch_job` imports should be placed before the `collection` imports (line 9) to maintain the alphabetical ordering convention followed throughout this file.

Apply this refactor to maintain consistency:

```diff
+from .batch_job import (
+    BatchJob,
+    BatchJobCreate,
+    BatchJobPublic,
+    BatchJobUpdate,
+)
+
 from .collection import Collection, CollectionPublic
```

And remove the current placement at lines 40-45.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (3)
backend/app/api/main.py (2 hunks)
backend/app/models/__init__.py (1 hunks)
backend/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/models/__init__.py, backend/app/api/main.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/__init__.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/main.py
🧬 Code graph analysis (1)
backend/app/models/__init__.py (2)
- backend/app/models/batch_job.py (4): BatchJob (15-80), BatchJobCreate (83-97), BatchJobPublic (112-129), BatchJobUpdate (100-109)
- backend/app/models/evaluation.py (6): EvaluationDataset (72-128), EvaluationDatasetCreate (267-284), EvaluationDatasetPublic (287-299), EvaluationRun (131-229), EvaluationRunCreate (232-243), EvaluationRunPublic (246-264)
🔇 Additional comments (3)
backend/app/api/main.py (2)
21-22: LGTM! Router imports are correctly placed. The new `cron` and `evaluation` router imports follow the existing alphabetical ordering pattern and are consistent with other router imports in the file.
35-35: LGTM! Router inclusions are properly integrated. The new routers are correctly included in alphabetical order, following the existing pattern for router registration.
Also applies to: 38-38
backend/app/models/__init__.py (1)
47-54: Evaluation model exports are complete and correctly positioned relative to fine_tuning. All model variants from `evaluation.py` are properly re-exported. The placement relative to `fine_tuning` follows alphabetical order. However, if `batch_job` imports are moved as suggested, these imports are already in the correct position.
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/app/api/routes/evaluation.py (1)
500-518: Preserve caller config when merging assistant defaults. The current implementation replaces the entire
`config` dict, discarding caller-provided keys like `include`, `tools`, or other OpenAI Responses API parameters. This breaks legitimate use cases where the caller wants to override or add fields beyond model/instructions/temperature. Apply this diff to merge assistant defaults without losing caller config:
```diff
-    config = {
-        "model": config.get("model", assistant.model),
-        "instructions": config.get("instructions", assistant.instructions),
-        "temperature": config.get("temperature", assistant.temperature),
-    }
-
-    # Add tools if vector stores are available
-    vector_store_ids = config.get(
-        "vector_store_ids", assistant.vector_store_ids or []
-    )
-    if vector_store_ids and len(vector_store_ids) > 0:
-        config["tools"] = [
-            {
-                "type": "file_search",
-                "vector_store_ids": vector_store_ids,
-            }
-        ]
+    # Merge assistant defaults into caller config without replacing it
+    config.setdefault("model", assistant.model)
+    config.setdefault("instructions", assistant.instructions)
+    if "temperature" not in config and assistant.temperature is not None:
+        config["temperature"] = assistant.temperature
+
+    # Add file_search tool if vector stores are available and not already in tools
+    vector_store_ids = config.get("vector_store_ids") or assistant.vector_store_ids or []
+    if vector_store_ids:
+        # Check if file_search tool already exists
+        existing_tools = config.get("tools", [])
+        has_file_search = any(
+            tool.get("type") == "file_search" for tool in existing_tools
+        )
+        if not has_file_search:
+            config.setdefault("tools", []).append({
+                "type": "file_search",
+                "vector_store_ids": vector_store_ids,
+            })
```

backend/app/crud/evaluations/processing.py (1)
733-739: Classify embedding outcomes correctly. When `check_and_process_evaluation()` returns `"embeddings_completed"` or `"embeddings_failed"`, the current code treats them as still processing. Completed embedding runs should increment `processed`, and failed embeddings should increment `failed`. Apply this diff to fix the classification:
```diff
-        if result["action"] == "processed":
+        action = result["action"]
+        if action in {"processed", "embeddings_completed"}:
             total_processed_count += 1
-        elif result["action"] == "failed":
+        elif action in {"failed", "embeddings_failed"}:
             total_failed_count += 1
         else:
             total_still_processing_count += 1
```

Based on learnings
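The proposed bucketing can be isolated as a small helper (names are illustrative, not the repo's code):

```python
def tally_actions(actions: list[str]) -> dict[str, int]:
    """Bucket per-run actions into processed / failed / still-processing totals."""
    counts = {"processed": 0, "failed": 0, "still_processing": 0}
    for action in actions:
        if action in {"processed", "embeddings_completed"}:
            counts["processed"] += 1
        elif action in {"failed", "embeddings_failed"}:
            counts["failed"] += 1
        else:
            counts["still_processing"] += 1
    return counts
```

Keeping the sets of terminal actions in one place makes it harder for a new action string to silently fall into the "still processing" bucket.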
🧹 Nitpick comments (1)
backend/app/crud/evaluations/processing.py (1)
477-477: Consider moving imports to module level. Importing `poll_batch_status` inside the function (also at line 551) is unconventional and makes the code harder to follow. While this pattern is sometimes used to avoid circular imports, consider restructuring your modules to eliminate circular dependencies instead. If circular imports are unavoidable, add a comment explaining why the import must be local:
```diff
 # Poll embedding batch status
 provider = OpenAIBatchProvider(client=openai_client)
+# Local import to avoid circular dependency with batch_operations
 from app.crud.batch_operations import poll_batch_status
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backend/app/api/routes/evaluation.py (1 hunks)
backend/app/api/routes/llm.py (1 hunks)
backend/app/crud/evaluations/core.py (1 hunks)
backend/app/crud/evaluations/processing.py (1 hunks)
backend/app/tests/api/routes/test_evaluation.py (1 hunks)
backend/app/utils.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/utils.py, backend/app/api/routes/llm.py, backend/app/crud/evaluations/processing.py, backend/app/tests/api/routes/test_evaluation.py, backend/app/crud/evaluations/core.py, backend/app/api/routes/evaluation.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/llm.py, backend/app/api/routes/evaluation.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/evaluations/processing.py, backend/app/crud/evaluations/core.py
🧠 Learnings (4)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/langfuse/**/*.py : Place Langfuse observability integration under backend/app/core/langfuse/
Applied to files:
backend/app/utils.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/services/response/**/*.py : Put OpenAI Responses API integration, conversation management, and job execution in backend/app/services/response/
Applied to files:
backend/app/api/routes/evaluation.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/models/**/*.py : Define SQLModel entities (database tables and domain objects) in backend/app/models/
Applied to files:
backend/app/api/routes/evaluation.py
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/core/finetune/**/*.py : Place fine-tuning utilities under backend/app/core/finetune/
Applied to files:
backend/app/api/routes/evaluation.py
🧬 Code graph analysis (4)
backend/app/crud/evaluations/processing.py (8)
- backend/app/core/batch/openai_provider.py (2): OpenAIBatchProvider (14-254), download_batch_results (143-191)
- backend/app/crud/batch_job.py (1): get_batch_job (59-73)
- backend/app/crud/batch_operations.py (3): download_batch_results (174-215), upload_batch_results_to_object_store (283-330), poll_batch_status (114-171)
- backend/app/crud/evaluations/batch.py (1): fetch_dataset_items (24-59)
- backend/app/crud/evaluations/core.py (1): update_evaluation_run (262-309)
- backend/app/crud/evaluations/embeddings.py (3): calculate_average_similarity (255-331), parse_embedding_results (132-218), start_embedding_batch (334-434)
- backend/app/crud/evaluations/langfuse.py (2): create_langfuse_dataset_run (18-122), update_traces_with_cosine_scores (125-178)
- backend/app/utils.py (2): get_langfuse_client (209-245), get_openai_client (176-206)
backend/app/tests/api/routes/test_evaluation.py (3)
- backend/app/crud/evaluations/batch.py (1): build_evaluation_jsonl (62-115)
- backend/app/models/evaluation.py (1): EvaluationDataset (72-128)
- backend/app/tests/conftest.py (2): user_api_key_header (90-92), db (28-45)
backend/app/crud/evaluations/core.py (6)
- backend/app/crud/credentials.py (1): get_provider_credential (121-159)
- backend/app/models/evaluation.py (2): EvaluationRun (131-229), DatasetUploadResponse (24-43)
- backend/app/models/user.py (1): UserProjectOrg (58-59)
- backend/app/utils.py (1): get_langfuse_client (209-245)
- backend/app/core/langfuse/langfuse.py (1): flush (108-109)
- backend/app/api/routes/evaluation.py (1): list_evaluation_runs (573-591)
backend/app/api/routes/evaluation.py (9)
- backend/app/core/cloud/storage.py (1): get_cloud_storage (262-279)
- backend/app/crud/assistants.py (1): get_assistant_by_id (19-30)
- backend/app/crud/evaluations/dataset.py (5): create_evaluation_dataset (31-104), get_dataset_by_id (107-140), list_datasets (175-210), upload_csv_to_object_store (213-245), delete_dataset (323-387)
- backend/app/crud/evaluations/core.py (3): create_evaluation_run (137-179), get_evaluation_run_by_id (221-259), list_evaluation_runs (182-218)
- backend/app/crud/evaluations/batch.py (1): start_evaluation_batch (118-212)
- backend/app/core/storage_utils.py (1): upload_csv_to_object_store (22-83)
- backend/app/crud/evaluations/langfuse.py (1): upload_dataset_to_langfuse_from_csv (181-297)
- backend/app/models/evaluation.py (2): DatasetUploadResponse (24-43), EvaluationRunPublic (246-264)
- backend/app/utils.py (4): get_langfuse_client (209-245), get_openai_client (176-206), load_description (288-293), _ (297-298)
🔇 Additional comments (10)
backend/app/utils.py (1)
209-245: LGTM! Enhanced error handling for Langfuse client. The changes enforce explicit credential validation (HTTP 400 for missing/incomplete credentials) and wrap client construction with proper error handling (HTTP 500 for construction failures). Logging includes relevant context for debugging.
backend/app/api/routes/llm.py (1)
38-58: LGTM! Correct alignment of function signature with implementation. The function performs no asynchronous operations (no `await` statements), so making it synchronous is appropriate. The `start_job` call queues work for background processing and returns immediately, consistent with the callback pattern.

backend/app/tests/api/routes/test_evaluation.py (1)
1-688: LGTM! Comprehensive test coverage. This test file provides excellent coverage of the evaluation subsystem:
- CSV validation and parsing (valid, invalid, empty rows)
- Duplication logic with boundary testing (0, 1, 5, 6)
- Error handling (Langfuse config failure, invalid CSV, missing auth)
- Batch evaluation endpoints
- JSONL building logic with various configurations
The tests use appropriate mocking for external dependencies and verify both API responses and database state.
backend/app/crud/evaluations/core.py (4)
17-134: LGTM! Comprehensive error handling in dataset upload. The function properly handles multiple failure scenarios:
- Langfuse client creation failure (caught and returned as error tuple)
- CSV validation (missing headers, empty items)
- Per-item upload failures (logged but processing continues)
- General exceptions (caught and returned as error)
The duplication logic correctly multiplies items and tracks metadata.
182-218: LGTM! Proper project scoping in list query. The query correctly filters by both `organization_id` and `project_id` (lines 204-205), ensuring users can only see evaluation runs within their project scope.
221-259: LGTM! Proper access control in get_by_id query. The query correctly scopes by `id`, `organization_id`, and `project_id` (lines 241-243), preventing cross-project data access.
262-309: LGTM! Consistent update pattern with automatic timestamp. The function provides selective field updates while always maintaining the `updated_at` timestamp (line 302), ensuring proper audit trail.

backend/app/api/routes/evaluation.py (3)
92-293: LGTM! Comprehensive validation and security checks. The endpoint implements multiple layers of validation:
- File extension and MIME type checks (lines 131-144)
- File size limits (1 MB max, lines 146-155)
- CSV header validation (lines 170-178)
- Empty file detection (lines 157-158)
Object store upload is optional (catches exceptions gracefully), while Langfuse upload is required (raises on failure). The 1 MB size limit makes reading the entire file into memory (line 161) acceptable.
296-336: LGTM! Proper pagination with enforced limits. The endpoint enforces a maximum limit of 100 (lines 308-309) to prevent excessive data retrieval, and properly scopes queries by organization and project.
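A minimal sketch of that clamping behavior (parameter names and the default page size are assumptions):

```python
def clamp_pagination(skip: int = 0, limit: int = 50, max_limit: int = 100) -> tuple[int, int]:
    """Normalize pagination inputs: non-negative offset, limit capped at max_limit."""
    skip = max(skip, 0)
    limit = min(max(limit, 1), max_limit)
    return skip, limit
```

Clamping rather than rejecting out-of-range values keeps the endpoint forgiving while still bounding query cost.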
568-627: LGTM! Consistent access control in list and get endpoints. Both endpoints properly scope queries by organization and project using `AuthContextDep`, ensuring users can only access their own evaluation runs.
Actionable comments posted: 2
♻️ Duplicate comments (6)
backend/app/api/routes/cron.py (1)
52-64: Raise HTTPException instead of returning HTTP 200 on error. This exception handler logs the error but returns a JSON payload with HTTP 200 status. Clients (including cron monitoring) will interpret this as success and won't alert on actual failures. You must raise an `HTTPException` with a 5xx status code so the HTTP response properly reflects the error condition. This issue was flagged in a previous review but remains unresolved.
Apply this diff to fix:
```diff
 except Exception as e:
     logger.error(
         f"[evaluation_cron_job] Error executing cron job: {e}",
         exc_info=True,
     )
-    return {
-        "status": "error",
-        "error": str(e),
-        "organizations_processed": 0,
-        "total_processed": 0,
-        "total_failed": 0,
-        "total_still_processing": 0,
-    }
+    raise HTTPException(
+        status_code=500,
+        detail=f"Cron job failed: {str(e)}",
+    )
```

backend/app/alembic/versions/6fe772038a5a_create_evaluation_run_table.py (2)
21-21: Add return type hint for upgrade. Please declare the signature as `def upgrade() -> None:` to satisfy the project's Python 3.11+ typing rule. As per coding guidelines.
232-232: Add return type hint for downgrade. Please update the signature to `def downgrade() -> None:` in line with the typing guideline. As per coding guidelines.

backend/app/core/batch/openai.py (1)
17-24: Add the missing `__init__` return annotation. Project guidelines require explicit return types, and the prior review already called this out. Please add `-> None` so the constructor stays aligned with our Python 3.11+ typing policy. Apply this diff:
```diff
-    def __init__(self, client: OpenAI):
+    def __init__(self, client: OpenAI) -> None:
```

backend/app/crud/evaluations/embeddings.py (1)
92-109: Don’t discard falsy-but-valid outputs.
`if not generated_output or not ground_truth` still skips legitimate zero/False answers, so whole embedding jobs can empty out again. We already called this out earlier; please tighten the guard to only drop truly missing or blank values and coerce non-strings to strings before building the payload. Apply this diff:
```diff
-    generated_output = result.get("generated_output", "")
-    ground_truth = result.get("ground_truth", "")
+    generated_output = result.get("generated_output")
+    ground_truth = result.get("ground_truth")
@@
-    if not generated_output or not ground_truth:
-        logger.warning(f"Skipping item {item_id} - empty output or ground_truth")
-        continue
+    if generated_output is None or ground_truth is None:
+        logger.warning(
+            f"Skipping item {item_id} - output or ground_truth is missing"
+        )
+        continue
+
+    if isinstance(generated_output, str):
+        if not generated_output.strip():
+            logger.warning(
+                f"Skipping item {item_id} - empty generated_output string"
+            )
+            continue
+    else:
+        generated_output = str(generated_output)
+
+    if isinstance(ground_truth, str):
+        if not ground_truth.strip():
+            logger.warning(
+                f"Skipping item {item_id} - empty ground_truth string"
+            )
+            continue
+    else:
+        ground_truth = str(ground_truth)
```

backend/app/crud/evaluations/processing.py (1)
732-737: Count embedding outcomes instead of lumping them into “still processing”.
`check_and_process_evaluation()` can return `"embeddings_completed"` and `"embeddings_failed"`. The current else branch drops both into `total_still_processing_count`, so finished embedding runs vanish from the processed/failed totals and the summary is wrong. Please bucket those actions explicitly.

```diff
-    if result["action"] == "processed":
+    action = result["action"]
+    if action in {"processed", "embeddings_completed"}:
         total_processed_count += 1
-    elif result["action"] == "failed":
+    elif action in {"failed", "embeddings_failed"}:
         total_failed_count += 1
     else:
         total_still_processing_count += 1
```
🧹 Nitpick comments (1)
backend/app/api/routes/cron.py (1)
21-23: Consider a more specific return type. The return type `dict` is vague. Per the coding guidelines for Python type hints, consider `dict[str, Any]` or define a TypedDict with the expected keys (status, organizations_processed, total_processed, total_failed, total_still_processing, error).

```diff
 def evaluation_cron_job(
     session: SessionDep,
-) -> dict:
+) -> dict[str, Any]:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
backend/app/alembic/versions/6fe772038a5a_create_evaluation_run_table.py (1 hunks)
backend/app/api/routes/cron.py (1 hunks)
backend/app/core/batch/__init__.py (1 hunks)
backend/app/core/batch/base.py (1 hunks)
backend/app/core/batch/openai.py (1 hunks)
backend/app/crud/batch_operations.py (1 hunks)
backend/app/crud/evaluations/batch.py (1 hunks)
backend/app/crud/evaluations/core.py (1 hunks)
backend/app/crud/evaluations/embeddings.py (1 hunks)
backend/app/crud/evaluations/processing.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/core/batch/__init__.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/alembic/versions/6fe772038a5a_create_evaluation_run_table.py, backend/app/crud/batch_operations.py, backend/app/core/batch/base.py, backend/app/crud/evaluations/core.py, backend/app/crud/evaluations/batch.py, backend/app/core/batch/openai.py, backend/app/api/routes/cron.py, backend/app/crud/evaluations/processing.py, backend/app/crud/evaluations/embeddings.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/batch_operations.py, backend/app/crud/evaluations/core.py, backend/app/crud/evaluations/batch.py, backend/app/crud/evaluations/processing.py, backend/app/crud/evaluations/embeddings.py
backend/app/core/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Place core functionality (config, DB session, security, exceptions, middleware) in backend/app/core/
Files:
backend/app/core/batch/base.py, backend/app/core/batch/openai.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/cron.py
🧠 Learnings (1)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to **/*.py : Use type hints in Python code (Python 3.11+ project)
Applied to files:
backend/app/alembic/versions/6fe772038a5a_create_evaluation_run_table.py
🧬 Code graph analysis (8)
backend/app/crud/batch_operations.py (5)
- backend/app/core/batch/base.py (4): BatchProvider (7-105), create_batch (11-32), get_batch_status (35-52), download_batch_results (55-72)
- backend/app/core/cloud/storage.py (1): get_cloud_storage (262-279)
- backend/app/core/storage_utils.py (1): upload_jsonl_to_object_store (86-152)
- backend/app/crud/batch_job.py (2): create_batch_job (13-56), update_batch_job (76-120)
- backend/app/models/batch_job.py (3): BatchJob (15-80), BatchJobCreate (83-97), BatchJobUpdate (100-109)

backend/app/core/batch/base.py (2)
- backend/app/core/batch/openai.py (5): create_batch (26-87), get_batch_status (89-141), download_batch_results (143-191), upload_file (193-223), download_file (225-254)
- backend/app/crud/batch_operations.py (1): download_batch_results (130-156)

backend/app/crud/evaluations/core.py (5)
- backend/app/models/evaluation.py (2): EvaluationRun (131-229), DatasetUploadResponse (24-43)
- backend/app/models/user.py (1): UserProjectOrg (58-59)
- backend/app/utils.py (1): get_langfuse_client (209-245)
- backend/app/core/langfuse/langfuse.py (1): flush (108-109)
- backend/app/api/routes/evaluation.py (1): list_evaluation_runs (573-591)

backend/app/crud/evaluations/batch.py (3)
- backend/app/core/batch/openai.py (1): OpenAIBatchProvider (14-254)
- backend/app/crud/batch_operations.py (1): start_batch_job (20-86)
- backend/app/models/evaluation.py (1): EvaluationRun (131-229)

backend/app/core/batch/openai.py (2)
- backend/app/core/batch/base.py (6): BatchProvider (7-105), create_batch (11-32), upload_file (75-89), get_batch_status (35-52), download_batch_results (55-72), download_file (92-105)
- backend/app/crud/batch_operations.py (1): download_batch_results (130-156)

backend/app/api/routes/cron.py (2)
- backend/app/api/permissions.py (2): Permission (10-15), require_permission (45-70)
- backend/app/crud/evaluations/cron.py (1): process_all_pending_evaluations_sync (142-154)

backend/app/crud/evaluations/processing.py (8)
- backend/app/core/batch/openai.py (2): OpenAIBatchProvider (14-254), download_batch_results (143-191)
- backend/app/crud/batch_job.py (1): get_batch_job (59-73)
- backend/app/crud/batch_operations.py (3): download_batch_results (130-156), upload_batch_results_to_object_store (205-232), poll_batch_status (89-127)
- backend/app/crud/evaluations/batch.py (1): fetch_dataset_items (24-59)
- backend/app/crud/evaluations/core.py (1): update_evaluation_run (261-308)
- backend/app/crud/evaluations/embeddings.py (3): calculate_average_similarity (255-331), parse_embedding_results (132-218), start_embedding_batch (334-434)
- backend/app/crud/evaluations/langfuse.py (2): create_langfuse_dataset_run (18-122), update_traces_with_cosine_scores (125-178)
- backend/app/utils.py (2): get_langfuse_client (209-245), get_openai_client (176-206)

backend/app/crud/evaluations/embeddings.py (3)
- backend/app/core/batch/openai.py (1): OpenAIBatchProvider (14-254)
- backend/app/crud/batch_operations.py (1): start_batch_job (20-86)
- backend/app/models/evaluation.py (1): EvaluationRun (131-229)
🔇 Additional comments (1)
backend/app/api/routes/cron.py (1)
13-13: LGTM! Router initialization is correct with appropriate tagging.
```python
    "batch_job",
    sa.Column("id", sa.Integer(), nullable=False),
    sa.Column(
        "provider",
        sa.String(),
        nullable=False,
        comment="LLM provider name (e.g., 'openai', 'anthropic')",
    ),
    sa.Column(
        "job_type",
        sa.String(),
        nullable=False,
        comment="Type of batch job (e.g., 'evaluation', 'classification', 'embedding')",
    ),
    sa.Column(
        "config",
        postgresql.JSONB(astext_type=sa.Text()),
        nullable=False,
        server_default=sa.text("'{}'::jsonb"),
        comment="Complete batch configuration",
    ),
    sa.Column(
        "provider_batch_id",
        sa.String(),
        nullable=True,
        comment="Provider's batch job ID",
    ),
    sa.Column(
        "provider_file_id",
        sa.String(),
        nullable=True,
        comment="Provider's input file ID",
    ),
    sa.Column(
        "provider_output_file_id",
        sa.String(),
        nullable=True,
        comment="Provider's output file ID",
    ),
    sa.Column(
        "provider_status",
        sa.String(),
        nullable=True,
        comment="Provider-specific status (e.g., OpenAI: validating, in_progress, completed, failed)",
    ),
    sa.Column(
        "raw_output_url",
        sa.String(),
        nullable=True,
        comment="S3 URL of raw batch output file",
    ),
    sa.Column(
        "total_items",
        sa.Integer(),
        nullable=False,
        server_default=sa.text("0"),
        comment="Total number of items in the batch",
    ),
    sa.Column(
        "error_message",
        sa.Text(),
        nullable=True,
        comment="Error message if batch failed",
    ),
    sa.Column("organization_id", sa.Integer(), nullable=False),
    sa.Column("project_id", sa.Integer(), nullable=False),
    sa.Column("inserted_at", sa.DateTime(), nullable=False),
    sa.Column("updated_at", sa.DateTime(), nullable=False),
    sa.ForeignKeyConstraint(
        ["organization_id"], ["organization.id"], ondelete="CASCADE"
    ),
    sa.ForeignKeyConstraint(["project_id"], ["project.id"], ondelete="CASCADE"),
    sa.PrimaryKeyConstraint("id"),
)
op.create_index(
    op.f("ix_batch_job_job_type"), "batch_job", ["job_type"], unique=False
)
op.create_index(
    op.f("ix_batch_job_organization_id"),
    "batch_job",
    ["organization_id"],
    unique=False,
)
op.create_index(
    op.f("ix_batch_job_project_id"), "batch_job", ["project_id"], unique=False
)
op.create_index(
    "idx_batch_job_status_org",
    "batch_job",
    ["provider_status", "organization_id"],
    unique=False,
)
op.create_index(
    "idx_batch_job_status_project",
    "batch_job",
    ["provider_status", "project_id"],
    unique=False,
)
```

```python
# Create evaluation_dataset table
op.create_table(
    "evaluation_dataset",
    sa.Column("id", sa.Integer(), nullable=False),
    sa.Column("name", sqlmodel.sql.sqltypes.AutoString(), nullable=False),
    sa.Column("description", sqlmodel.sql.sqltypes.AutoString(), nullable=True),
    sa.Column(
        "dataset_metadata",
        postgresql.JSONB(astext_type=sa.Text()),
        nullable=False,
        server_default=sa.text("'{}'::jsonb"),
    ),
    sa.Column(
        "object_store_url", sqlmodel.sql.sqltypes.AutoString(), nullable=True
    ),
    sa.Column(
        "langfuse_dataset_id",
        sqlmodel.sql.sqltypes.AutoString(),
        nullable=True,
    ),
    sa.Column("organization_id", sa.Integer(), nullable=False),
    sa.Column("project_id", sa.Integer(), nullable=False),
    sa.Column("inserted_at", sa.DateTime(), nullable=False),
    sa.Column("updated_at", sa.DateTime(), nullable=False),
    sa.ForeignKeyConstraint(
        ["organization_id"], ["organization.id"], ondelete="CASCADE"
    ),
    sa.ForeignKeyConstraint(["project_id"], ["project.id"], ondelete="CASCADE"),
    sa.PrimaryKeyConstraint("id"),
    sa.UniqueConstraint(
        "name",
        "organization_id",
        "project_id",
        name="uq_evaluation_dataset_name_org_project",
    ),
)
op.create_index(
    op.f("ix_evaluation_dataset_name"),
    "evaluation_dataset",
    ["name"],
    unique=False,
)
```

```python
# Create evaluation_run table with all columns and foreign key references
op.create_table(
    "evaluation_run",
    sa.Column("run_name", sqlmodel.sql.sqltypes.AutoString(), nullable=False),
    sa.Column("dataset_name", sqlmodel.sql.sqltypes.AutoString(), nullable=False),
    sa.Column("config", sa.JSON(), nullable=False),
    sa.Column("batch_job_id", sa.Integer(), nullable=True),
    sa.Column(
        "embedding_batch_job_id",
        sa.Integer(),
        nullable=True,
        comment="Reference to the batch_job for embedding-based similarity scoring",
    ),
    sa.Column("dataset_id", sa.Integer(), nullable=False),
    sa.Column("status", sqlmodel.sql.sqltypes.AutoString(), nullable=False),
    sa.Column(
        "object_store_url", sqlmodel.sql.sqltypes.AutoString(), nullable=True
    ),
    sa.Column("total_items", sa.Integer(), nullable=False),
    sa.Column("score", sa.JSON(), nullable=True),
    sa.Column("error_message", sa.Text(), nullable=True),
    sa.Column("organization_id", sa.Integer(), nullable=False),
    sa.Column("project_id", sa.Integer(), nullable=False),
    sa.Column("id", sa.Integer(), nullable=False),
    sa.Column("inserted_at", sa.DateTime(), nullable=False),
    sa.Column("updated_at", sa.DateTime(), nullable=False),
    sa.ForeignKeyConstraint(
        ["batch_job_id"],
        ["batch_job.id"],
        ondelete="SET NULL",
    ),
    sa.ForeignKeyConstraint(
        ["embedding_batch_job_id"],
        ["batch_job.id"],
        name="fk_evaluation_run_embedding_batch_job_id",
        ondelete="SET NULL",
    ),
    sa.ForeignKeyConstraint(
        ["dataset_id"],
        ["evaluation_dataset.id"],
        name="fk_evaluation_run_dataset_id",
        ondelete="CASCADE",
    ),
    sa.ForeignKeyConstraint(
        ["organization_id"], ["organization.id"], ondelete="CASCADE"
    ),
    sa.ForeignKeyConstraint(["project_id"], ["project.id"], ondelete="CASCADE"),
    sa.PrimaryKeyConstraint("id"),
)
```
Fix primary keys to autoincrement.
batch_job, evaluation_dataset, and evaluation_run declare id as plain Integer NOT NULL with a separate PrimaryKeyConstraint. Postgres will not create a sequence/default in this setup, so inserts coming from SQLModel (which send id=None) will fail with NULL value in column "id" violates not-null constraint. Mark the columns as primary keys (or add sa.Identity()) so the database autogenerates IDs.
```diff
-    sa.Column("id", sa.Integer(), nullable=False),
+    sa.Column("id", sa.Integer(), primary_key=True),
 ...
-    sa.Column("id", sa.Integer(), nullable=False),
+    sa.Column("id", sa.Integer(), primary_key=True),
 ...
-    sa.Column("id", sa.Integer(), nullable=False),
+    sa.Column("id", sa.Integer(), primary_key=True),
```

🤖 Prompt for AI Agents
In backend/app/alembic/versions/6fe772038a5a_create_evaluation_run_table.py
around lines 24 to 214, the id columns for batch_job, evaluation_dataset, and
evaluation_run are defined as plain Integer with a separate PrimaryKeyConstraint
which prevents Postgres from creating sequences and causes NULL id insert
failures; update each table definition to make the id column a real primary key
with an autogenerating identity (either mark the id column as primary_key=True
in the sa.Column definition or add sa.Identity()/server_default identity
expression) so Postgres will auto-generate ids on insert, and remove or keep the
separate PrimaryKeyConstraint accordingly to avoid duplicate/conflicting PK
declarations.
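As a sanity check on the suggested fix, the `sa.Identity()` variant can be sketched with plain SQLAlchemy. This is a minimal illustration (trimmed to two columns, not the full migration): compiling the table for the Postgres dialect shows the generated DDL includes an identity clause, so inserts with `id=None` succeed.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

metadata = sa.MetaData()

# Minimal sketch of the fix: with sa.Identity(), Postgres autogenerates ids.
batch_job = sa.Table(
    "batch_job",
    metadata,
    sa.Column("id", sa.Integer(), sa.Identity(), primary_key=True),
    sa.Column("provider", sa.String(), nullable=False),
)

# Compile the CREATE TABLE statement for the Postgres dialect to inspect the DDL.
ddl = str(CreateTable(batch_job).compile(dialect=postgresql.dialect()))
print(ddl)
```

The emitted DDL contains `GENERATED BY DEFAULT AS IDENTITY` on the `id` column, which is exactly the server-side default the review comment says the migration is missing.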
```python
import logging

from app.api.permissions import Permission, require_permission
from fastapi import APIRouter, Depends
from sqlmodel import Session

from app.api.deps import SessionDep, AuthContextDep
from app.crud.evaluations import process_all_pending_evaluations_sync
from app.models import User

logger = logging.getLogger(__name__)
```
🛠️ Refactor suggestion | 🟠 Major
Remove unused imports and add HTTPException.
Three imports are unused:
`Session` from sqlmodel (line 5), `AuthContextDep` (line 7), and `User` (line 9)
Additionally, HTTPException is missing from the FastAPI import (line 4) but is required to properly fix the exception handling at lines 52-64.
Apply this diff:
```diff
 import logging
 from app.api.permissions import Permission, require_permission
-from fastapi import APIRouter, Depends
-from sqlmodel import Session
+from fastapi import APIRouter, Depends, HTTPException
-from app.api.deps import SessionDep, AuthContextDep
+from app.api.deps import SessionDep
 from app.crud.evaluations import process_all_pending_evaluations_sync
-from app.models import User
```

🤖 Prompt for AI Agents
In backend/app/api/routes/cron.py around lines 1 to 11, remove the unused
imports: Session (from sqlmodel), AuthContextDep, and User; and add
HTTPException to the FastAPI import list so exception handling later (lines
~52-64) can raise HTTPException properly; update the import line to include
HTTPException and delete the three unused import lines to keep imports clean.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/python/invoke-cron.py (1)
148-149: Validate the response body status field to avoid masking failures. The code logs "Endpoint invoked successfully" based only on the HTTP 200 status code without checking the response body. If the API returns `{"status": "error", ...}`, the failure will be masked and monitoring will miss it. Apply this diff to validate the response body:
```diff
                     result = await self.invoke_endpoint(client)
-                    logger.info(f"Endpoint invoked successfully: {result}")
+
+                    # Check response body status field
+                    status = result.get("status")
+                    if status != "success":
+                        error_msg = result.get("message", "Unknown error")
+                        logger.error(f"Endpoint returned failure status: {status}, message: {error_msg}, full response: {result}")
+                        # Continue to next iteration rather than crash the cron
+                    else:
+                        logger.info(f"Endpoint invoked successfully: {result}")
```
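The check the reviewer is asking for boils down to a small predicate on the response body. A standalone sketch (the function name and message key are illustrative, not the script's actual API):

```python
def is_successful(result: dict) -> bool:
    """Treat an HTTP 200 as a success only if the body also says so."""
    if result.get("status") != "success":
        # Surface the application-level failure instead of masking it.
        print(f"failure: {result.get('message', 'Unknown error')}")
        return False
    return True

print(is_successful({"status": "success"}))                      # True
print(is_successful({"status": "error", "message": "timeout"}))  # False
```

This separates transport-level success (the 200 status code) from application-level success (the body's `status` field), which is the distinction the review comment hinges on.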
🧹 Nitpick comments (6)
scripts/python/invoke-cron.py (6)
18-19: Add type hints to module-level constants. Per coding guidelines, type hints should be used throughout the Python code.
Apply this diff:
```diff
-ENDPOINT = "/api/v1/cron/evaluations"  # Endpoint to invoke
-REQUEST_TIMEOUT = 30  # Timeout for requests in seconds
+ENDPOINT: str = "/api/v1/cron/evaluations"  # Endpoint to invoke
+REQUEST_TIMEOUT: int = 30  # Timeout for requests in seconds
```

As per coding guidelines.
32-51: Add return type hint to `__init__` method. Per coding guidelines, all methods should have type hints.
Apply this diff:
```diff
-    def __init__(self):
+    def __init__(self) -> None:
```

As per coding guidelines.
53-84: Add return type hint and use `logging.exception` in exception handlers. Per coding guidelines, type hints should be used. Additionally, `logging.exception` is preferred over `logging.error` in exception handlers as it automatically includes the traceback. Apply this diff:
```diff
     async def authenticate(self, client: httpx.AsyncClient) -> str:
         """Authenticate and get access token."""
         logger.info("Authenticating with API...")
         login_data = {
             "username": self.email,
             "password": self.password,
         }
         try:
             response = await client.post(
                 f"{self.base_url}/api/v1/login/access-token",
                 data=login_data,
                 timeout=REQUEST_TIMEOUT,
             )
             response.raise_for_status()
             data = response.json()
             self.access_token = data.get("access_token")
             if not self.access_token:
                 raise ValueError("No access token in response")
             logger.info("Authentication successful")
             return self.access_token
         except httpx.HTTPStatusError as e:
-            logger.error(f"Authentication failed with status {e.response.status_code}")
+            logger.exception(f"Authentication failed with status {e.response.status_code}")
             raise
         except Exception as e:
-            logger.error(f"Authentication error: {e}")
+            logger.exception(f"Authentication error: {e}")
             raise
```

As per coding guidelines.
86-129: Add return type hint and use `logging.exception` in exception handlers. Per coding guidelines, type hints should be used. Additionally, use `logging.exception` in exception handlers for automatic traceback inclusion. Apply this diff:
```diff
     async def invoke_endpoint(self, client: httpx.AsyncClient) -> dict:
         """Invoke the configured endpoint."""
         if not self.access_token:
             await self.authenticate(client)
         headers = {"Authorization": f"Bearer {self.access_token}"}
         # Debug: Log what we're sending
         logger.debug(f"Request URL: {self.base_url}{self.endpoint}")
         logger.debug(f"Request headers: {headers}")
         try:
             response = await client.get(
                 f"{self.base_url}{self.endpoint}",
                 headers=headers,
                 timeout=REQUEST_TIMEOUT,
             )
             # Debug: Log response headers and first part of body
             logger.debug(f"Response status: {response.status_code}")
             logger.debug(f"Response headers: {dict(response.headers)}")
             # If unauthorized, re-authenticate and retry once
             if response.status_code == 401:
                 logger.info("Token expired, re-authenticating...")
                 await self.authenticate(client)
                 headers = {"Authorization": f"Bearer {self.access_token}"}
                 response = await client.get(
                     f"{self.base_url}{self.endpoint}",
                     headers=headers,
                     timeout=REQUEST_TIMEOUT,
                 )
             response.raise_for_status()
             return response.json()
         except httpx.HTTPStatusError as e:
-            logger.error(
+            logger.exception(
                 f"Endpoint invocation failed with status {e.response.status_code}: {e.response.text}"
             )
             raise
         except Exception as e:
-            logger.error(f"Endpoint invocation error: {e}")
+            logger.exception(f"Endpoint invocation error: {e}")
             raise
```

As per coding guidelines.
131-168: Add return type hint and use `logging.exception` for error logging. Per coding guidelines, add return type hint. Also prefer `logging.exception` in exception handlers. Apply this diff:
```diff
-    async def run(self):
+    async def run(self) -> None:
         """Main loop to invoke endpoint periodically."""
         logger.info(f"Using API Base URL: {self.base_url}")
         logger.info(
             f"Starting cron job - invoking {self.endpoint} every {self.interval_minutes} minutes"
         )
         # Use async context manager to ensure proper cleanup
         async with httpx.AsyncClient() as client:
             # Authenticate once at startup
             await self.authenticate(client)
             while True:
                 try:
                     start_time = datetime.now()
                     logger.info(f"Invoking endpoint at {start_time}")
                     result = await self.invoke_endpoint(client)
                     logger.info(f"Endpoint invoked successfully: {result}")
                     # Calculate next invocation time
                     elapsed = (datetime.now() - start_time).total_seconds()
                     sleep_time = max(0, self.interval_seconds - elapsed)
                     if sleep_time > 0:
                         logger.info(
                             f"Sleeping for {sleep_time:.1f} seconds until next invocation"
                         )
                         await asyncio.sleep(sleep_time)
                 except KeyboardInterrupt:
                     logger.info("Shutting down gracefully...")
                     break
                 except Exception as e:
-                    logger.error(f"Error during invocation: {e}")
+                    logger.exception(f"Error during invocation: {e}")
                     # Wait before retrying on error
                     logger.info(f"Waiting {self.interval_seconds} seconds before retry")
                     await asyncio.sleep(self.interval_seconds)
```

As per coding guidelines.
171-189: Add return type hint and use `logging.exception` for fatal errors. Per coding guidelines, add return type hint. Also use `logging.exception` to capture the traceback for fatal errors. Apply this diff:
```diff
-def main():
+def main() -> None:
     """Entry point for the script."""
     # Load environment variables
     env_path = Path(__file__).parent.parent.parent / ".env"
     if env_path.exists():
         load_dotenv(env_path)
         logger.info(f"Loaded environment from {env_path}")
     else:
         logger.warning(f"No .env file found at {env_path}")
     try:
         invoker = EndpointInvoker()
         asyncio.run(invoker.run())
     except KeyboardInterrupt:
         logger.info("Interrupted by user")
         sys.exit(0)
     except Exception as e:
-        logger.error(f"Fatal error: {e}")
+        logger.exception(f"Fatal error: {e}")
         sys.exit(1)
```

As per coding guidelines.
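The repeated `logger.error` → `logger.exception` swaps above are easy to verify in isolation. A minimal, self-contained demonstration (logger name is arbitrary) that captures log output in memory and shows the traceback is appended automatically:

```python
import io
import logging

# Route log records into an in-memory buffer so the output can be inspected.
buf = io.StringIO()
logger = logging.getLogger("cron-demo")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.ERROR)

try:
    raise ValueError("boom")
except ValueError:
    # Unlike logger.error, logger.exception also records the active traceback.
    logger.exception("Error during invocation")

output = buf.getvalue()
print("Traceback (most recent call last):" in output)  # True
```

Calling `logger.error(...)` in the same handler would log only the message; `logger.exception(...)` must be called from inside an `except` block to pick up the traceback.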
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.env.example (1 hunks)
scripts/python/invoke-cron.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
.env.example
📄 CodeRabbit inference engine (CLAUDE.md)
Provide .env.example as the template for .env
Files:
.env.example
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
scripts/python/invoke-cron.py
🪛 Ruff (0.14.3)
scripts/python/invoke-cron.py
1-1: Shebang is present but file is not executable
(EXE001)
49-51: Avoid specifying long messages outside the exception class
(TRY003)
74-74: Abstract raise to an inner function
(TRY301)
74-74: Avoid specifying long messages outside the exception class
(TRY003)
77-77: Consider moving this statement to an else block
(TRY300)
80-80: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
83-83: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
123-125: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
128-128: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
164-164: Do not catch blind exception: Exception
(BLE001)
165-165: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
187-187: Do not catch blind exception: Exception
(BLE001)
188-188: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (1)
.env.example (1)
26-30: LGTM! Clear documentation for new configuration. The new environment variables are well-documented with sensible defaults. The comments clearly explain their purpose and default behavior, making it easy for developers to understand the configuration options.
@@ -0,0 +1,193 @@
#!/usr/bin/env python3
Make the file executable to match the shebang.
The shebang is present but the file lacks executable permissions.
Run this command to fix:
chmod +x scripts/python/invoke-cron.py
🧰 Tools
🪛 Ruff (0.14.3)
1-1: Shebang is present but file is not executable
(EXE001)
🤖 Prompt for AI Agents
In scripts/python/invoke-cron.py around line 1, the file contains a shebang but
is missing executable permissions; make the file executable by setting its
executable bit locally (so the shebang is honored), then commit the permission
change to the repo so the script runs as intended.
kartpop
left a comment
approved, let's merge in and release to staging; remnant issues can be taken up in the coming weeks and released with the unified API
Summary
Target issue is #417
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Notes
Here is the dataflow
Upload CSV → Langfuse Dataset (5x duplication)
Start Evaluation → Create EvaluationRun (pending)
Fetch Dataset → Build JSONL for OpenAI Batch API
Upload JSONL → OpenAI Files API
Create Batch → OpenAI Batch API (returns batch_id)
Update DB → BatchJob + EvaluationRun (processing)
Wait → OpenAI processes batch async (1-24 hours)
Celery Beat → Polls every 60 seconds
Check Status → Poll OpenAI batch status
When Response Batch Completed:
Build Embedding JSONL → [output, ground_truth] pairs per trace
Create Embedding Batch → OpenAI Embeddings API (returns batch_id)
Update DB → Link embedding_batch_job_id (status: processing)
Wait → OpenAI processes embeddings (1-24 hours)
Celery Beat → Polls embedding batch status
When Embedding Batch Completed:
User views results via API or Langfuse UI
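The scoring that happens once both batches complete boils down to averaging cosine similarities over the [output, ground_truth] embedding pairs. A pure-Python sketch (the real logic lives in `calculate_average_similarity` and may differ in detail):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# One [output, ground_truth] embedding pair per trace; the run-level
# score is the mean similarity across all traces.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical vectors  -> similarity 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal vectors -> similarity 0.0
]
average_score = sum(cosine_similarity(o, g) for o, g in pairs) / len(pairs)
print(average_score)  # 0.5
```

A score near 1.0 means the model outputs are semantically close to the ground truth; this per-trace value is what gets written back to Langfuse via `update_traces_with_cosine_scores`.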
Summary by CodeRabbit
New Features
Bug Fixes