Classification: Merging endpoints #391
Conversation
…ged from in progress to completed
Walkthrough
Adds async CSV fine-tune upload handling: validates and stores the CSV, creates a Document, spawns multiple FineTuning jobs per split ratio, and returns the document_id; refresh now maps the "cancelled" status and conditionally creates/queues a ModelEvaluation when jobs complete. Tests migrated to moto S3 and validate the auto-evaluation logic.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant API as FineTuning API
participant Storage as Cloud Storage
participant DB as Database
participant BG as Background Tasks
User->>API: POST /fine_tune_from_CSV (file, base_model, split_ratio, system_prompt)
API->>API: Validate CSV and parse split_ratio
API->>Storage: Upload CSV
Storage-->>API: file_key / signed URL
API->>DB: Create Document (with storage key)
DB-->>API: document_id
loop for each split ratio
API->>DB: Create FineTuningJob (pending, document_id, base_model, split)
end
API-->>User: { document_id, jobs[] }
Note over API,BG: Background processing handles provider interactions and job execution
```
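To make the upload flow concrete, here is a minimal FastAPI-style sketch of the first diagram. This is not the PR's implementation: storage and database calls are stubbed out, and the in-function comments mark which parts are placeholders.

```python
from uuid import uuid4

from fastapi import FastAPI, File, Form, HTTPException, UploadFile

app = FastAPI()


@app.post("/fine_tune_from_CSV")
async def fine_tune_from_csv(
    file: UploadFile = File(...),
    base_model: str = Form(...),
    split_ratio: str = Form(...),  # e.g. "0.5,0.7,0.9"
    system_prompt: str = Form(...),
) -> dict:
    # 1. Validate the CSV and parse the split ratios.
    if not (file.filename or "").lower().endswith(".csv"):
        raise HTTPException(status_code=400, detail="File must be a CSV file")
    try:
        ratios = [float(r.strip()) for r in split_ratio.split(",")]
    except ValueError:
        raise HTTPException(status_code=400, detail="Invalid split_ratio")

    # 2. Upload the CSV and create a Document row (stubbed: no real storage/DB here).
    document_id = uuid4()
    object_store_url = f"s3://example-bucket/{document_id}.csv"  # stand-in for storage.put(...)

    # 3. One pending fine-tuning job per split ratio (stubbed as plain dicts).
    jobs = [
        {
            "document_id": str(document_id),
            "base_model": base_model,
            "split_ratio": ratio,
            "system_prompt": system_prompt,
            "status": "pending",
            "object_store_url": object_store_url,
        }
        for ratio in ratios
    ]

    # 4. Background tasks would pick these jobs up; the endpoint returns immediately.
    return {"document_id": str(document_id), "jobs": jobs}
```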
```mermaid
sequenceDiagram
autonumber
actor User
participant API as FineTuning API
participant Provider as FT Provider
participant DB as Database
participant BG as Background Tasks
User->>API: POST /fine_tune/{id}/refresh
API->>Provider: Fetch job status
Provider-->>API: status (running/completed/failed/cancelled) + fine_tuned_model?
API->>DB: Update FineTuningJob status
alt status == completed and no active evaluation exists
API->>DB: Create ModelEvaluation (pending)
API->>BG: Queue run_model_evaluation(ModelEvaluation.id)
else status == completed and active evaluation exists
API-->>API: Skip creating evaluation
else not completed
API-->>API: No evaluation action
end
API-->>User: Updated job info
```
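The refresh branch above reduces to a small predicate. The sketch below is illustrative only (plain strings instead of the project's FineTuningStatus enum, and a boolean stand-in for the active-evaluation lookup); it mirrors the running-to-completed transition check that appears in the diff context later in this review.

```python
def should_queue_evaluation(
    previous_status: str, current_status: str, has_active_evaluation: bool
) -> bool:
    """Queue an auto-evaluation only when the job newly transitions from
    running to completed and no active evaluation already exists."""
    newly_completed = previous_status == "running" and current_status == "completed"
    return newly_completed and not has_active_evaluation


# The three branches of the diagram:
print(should_queue_evaluation("running", "completed", False))  # True  -> create + queue evaluation
print(should_queue_evaluation("running", "completed", True))   # False -> skip creating a duplicate
print(should_queue_evaluation("running", "failed", False))     # False -> no evaluation action
```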
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Codecov Report
❌ Patch coverage is
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/core/finetune/evaluation.py (1)
99-116: Possible infinite loop on latency timeout in generate_predictions.

When elapsed_time > max_latency, the code logs and continues without incrementing attempt, causing a tight infinite loop under chronic slowness. Increment the attempt counter and fail gracefully after the configured retries.
Apply this diff:
```diff
-            elapsed_time = time.time() - start_time
-            if elapsed_time > self.max_latency:
-                logger.warning(
-                    f"[generate_predictions] Timeout exceeded for prompt {idx}/{total_prompts}. Retrying..."
-                )
-                continue
+            elapsed_time = time.time() - start_time
+            if elapsed_time > self.max_latency:
+                attempt += 1
+                logger.warning(
+                    f"[generate_predictions] Timeout exceeded for prompt {idx}/{total_prompts} "
+                    f"({elapsed_time:.2f}s > {self.max_latency}s). Attempt {attempt}/{self.retries}."
+                )
+                if attempt == self.retries:
+                    predictions.append("timeout")
+                    logger.error(
+                        f"[generate_predictions] Maximum retries reached for prompt {idx}/{total_prompts} after timeouts. "
+                        "Appending 'timeout'."
+                    )
+                    break
+                else:
+                    # brief backoff to avoid hot loop
+                    time.sleep(1)
+                continue
```

Additionally, consider guarding normalize_prediction when allowed_labels is empty to avoid StopIteration (e.g., return the raw label or a sentinel).
Also applies to: 125-168
backend/app/api/routes/fine_tuning.py (2)
94-109: handle_openai_error is undefined — replace with str(e) to avoid NameError.

This will crash both the background processing and refresh paths. Use the stringified exception (consistent with the evaluation.py change).
```diff
-    except openai.OpenAIError as e:
-        error_msg = handle_openai_error(e)
+    except openai.OpenAIError as e:
+        error_msg = str(e)
         logger.error(
             f"[process_fine_tuning_job] Failed to upload to OpenAI: {error_msg} | "
             f"job_id={job_id}, project_id={project_id}|"
         )
@@
-    except openai.OpenAIError as e:
-        error_msg = handle_openai_error(e)
+    except openai.OpenAIError as e:
+        error_msg = str(e)
         logger.error(
             f"[process_fine_tuning_job] Failed to create OpenAI fine-tuning job: {error_msg} | "
             f"job_id={job_id}, project_id={project_id}|"
         )
         update_finetune_job(
@@
-    except openai.OpenAIError as e:
-        error_msg = handle_openai_error(e)
+    except openai.OpenAIError as e:
+        error_msg = str(e)
         logger.error(
             f"[Retrieve_fine_tune_status] Failed to retrieve OpenAI job | "
             f"provider_job_id={mask_string(job.provider_job_id)}, "
             f"error={error_msg}, fine_tuning_id={fine_tuning_id}, project_id={project_id}"
         )
```

Also applies to: 123-137, 312-325
161-176: Guard the update on failure when fine_tune is None.

If fetch_by_id fails before assignment, the except block will call update_finetune_job with job=None, masking the original error. Guard it.
```diff
     except Exception as e:
         error_msg = str(e)
         logger.error(
             f"[process_fine_tuning_job] Background job failure: {e} | "
             f"job_id={job_id}, project_id={project_id}|"
         )
-        update_finetune_job(
-            session=session,
-            job=fine_tune,
-            update=FineTuningUpdate(
-                status=FineTuningStatus.failed,
-                error_message="Error while processing the background job : "
-                + error_msg,
-            ),
-        )
+        if fine_tune is not None:
+            update_finetune_job(
+                session=session,
+                job=fine_tune,
+                update=FineTuningUpdate(
+                    status=FineTuningStatus.failed,
+                    error_message="Error while processing the background job : " + error_msg,
+                ),
+            )
+        else:
+            logger.error(
+                "[process_fine_tuning_job] Could not mark job failed because fine_tune was not loaded."
+            )
```
🧹 Nitpick comments (5)
backend/app/tests/api/routes/test_fine_tuning.py (3)
18-30: Remove unused helper create_file_mock.

It's not referenced; drop it to reduce noise.
```diff
-def create_file_mock(file_type):
-    counter = {"train": 0, "test": 0}
-
-    def _side_effect(file=None, purpose=None):
-        if purpose == "fine-tune":
-            if "train" in file.name:
-                counter["train"] += 1
-                return MagicMock(id=f"file_{counter['train']}")
-            elif "test" in file.name:
-                counter["test"] += 1
-                return MagicMock(id=f"file_{counter['test']}")
-
-    return _side_effect
```
59-62: Avoid writing temporary JSONL files in this test.

process_fine_tuning_job is patched; these files aren't used. Removing them speeds up tests and avoids filesystem writes.
```diff
-    for path in ["/tmp/train.jsonl", "/tmp/test.jsonl"]:
-        with open(path, "w") as f:
-            f.write('{"prompt": "test", "completion": "label"}')
+    # No-op: background processing is mocked; no local files needed
```
111-116: Compare status using the Enum, not a string.

The DB model uses FineTuningStatus; assert against the Enum to avoid brittle string comparisons.
```diff
-    assert (
-        job.status == "pending"
-    )  # Since background processing is mocked, status remains pending
+    assert job.status == FineTuningStatus.pending  # background job is mocked
```

backend/app/api/routes/fine_tuning.py (2)
326-340: Type alignment: mapped_status is an Enum; FineTuningUpdate.status is str.

Not a blocker, but consider keeping types consistent (use FineTuningStatus throughout). Also rename the annotation to Optional[FineTuningStatus] for clarity.
Also applies to: 341-346
213-216: CSV validation condition is permissive by design — confirm intended.

The current check accepts the file if either the extension OR the content-type indicates CSV. If you intended to require both, switch to "or" inside the negation. Otherwise, this is fine.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/app/api/routes/fine_tuning.py (7 hunks)
- backend/app/core/finetune/evaluation.py (1 hunk)
- backend/app/models/fine_tuning.py (1 hunk)
- backend/app/tests/api/routes/test_fine_tuning.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_fine_tuning.py (6)
- backend/app/tests/utils/test_data.py (1)
  - create_test_fine_tuning_jobs (133-160)
- backend/app/tests/utils/utils.py (1)
  - get_document (116-137)
- backend/app/models/fine_tuning.py (2)
  - Fine_Tuning (55-90)
  - FineTuningStatus (13-18)
- backend/app/models/model_evaluation.py (2)
  - ModelEvaluation (38-85)
  - ModelEvaluationStatus (14-18)
- backend/app/core/cloud/storage.py (5)
  - client (30-42)
  - put (120-122)
  - put (150-177)
  - get_signed_url (135-137)
  - get_signed_url (217-242)
- backend/app/tests/conftest.py (2)
  - db (24-41)
  - user_api_key_header (77-79)
backend/app/api/routes/fine_tuning.py (9)
- backend/app/models/fine_tuning.py (4)
  - FineTuningJobCreate (29-52)
  - FineTuningJobPublic (104-120)
  - FineTuningUpdate (93-101)
  - FineTuningStatus (13-18)
- backend/app/models/document.py (1)
  - Document (20-40)
- backend/app/models/model_evaluation.py (2)
  - ModelEvaluationBase (21-26)
  - ModelEvaluationStatus (14-18)
- backend/app/crud/model_evaluation.py (2)
  - create_model_evaluation (20-58)
  - fetch_active_model_evals (150-170)
- backend/app/core/finetune/preprocessing.py (1)
  - DataPreprocessor (17-157)
- backend/app/api/routes/model_evaluation.py (1)
  - run_model_evaluation (49-112)
- backend/app/utils.py (1)
  - get_openai_client (175-205)
- backend/app/core/cloud/storage.py (3)
  - get_cloud_storage (262-279)
  - put (120-122)
  - put (150-177)
- backend/app/crud/document.py (1)
  - DocumentCrud (14-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/fine_tuning.py (1)
13-19: Add “cancelled” status — looks good; verify downstream handling and persistence.

The enum addition is fine. Please double-check (a sketch follows the list below):
- OPENAI status mapping includes “cancelled” (it does in API routes).
- Any DB constraints/migrations aren’t needed (column appears to be a string enum in SQLModel, so likely no migration, but verify).
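For orientation, here is a minimal sketch of what the enum plus provider-status mapping could look like. The member names other than "cancelled" and the exact provider status strings are assumptions and should be checked against the actual models and the OPENAI_TO_INTERNAL_STATUS mapping in the API routes.

```python
from enum import Enum
from typing import Optional


class FineTuningStatus(str, Enum):
    # str-valued enum, so the column stays a plain string (no migration expected)
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"
    cancelled = "cancelled"


# Provider-status -> internal-status mapping; the keys here are assumptions about
# what the OpenAI client reports and must match the real mapping in the routes.
OPENAI_TO_INTERNAL_STATUS: dict[str, FineTuningStatus] = {
    "running": FineTuningStatus.running,
    "succeeded": FineTuningStatus.completed,
    "failed": FineTuningStatus.failed,
    "cancelled": FineTuningStatus.cancelled,
}

mapped: Optional[FineTuningStatus] = OPENAI_TO_INTERNAL_STATUS.get("cancelled")
assert mapped is FineTuningStatus.cancelled
```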
…ctTech4DevAI/ai-platform into feature/classification-unified-api
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/core/finetune/evaluation.py (1)
5-5: Consider using the built-in set type.

typing.Set is deprecated in Python 3.9+. Use the built-in set instead. Apply this diff:

```diff
-from typing import Set
+from typing import Any
```

And update line 38:

```diff
-        self.allowed_labels: Set[str] = set()
+        self.allowed_labels: set[str] = set()
```

Based on static analysis.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- backend/app/core/finetune/evaluation.py (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/finetune/evaluation.py (2)
- backend/app/core/cloud/storage.py (1)
  - AmazonCloudStorage (145-259)
- backend/app/utils.py (1)
  - handle_openai_error (208-222)
🪛 Ruff (0.13.3)
backend/app/core/finetune/evaluation.py
5-5: typing.Set is deprecated, use set instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/core/finetune/evaluation.py (4)
52-53: Log message formatting improvements enhance readability.

The string splitting across multiple lines improves code readability while maintaining semantic equivalence. These changes follow Python best practices for long strings.
Also applies to: 68-70, 73-74, 89-92, 96-97, 118-119, 125-126, 137-139, 152-153, 165-166, 172-174, 178-180, 185-187, 206-207, 216-217, 221-222, 232-233
198-199: UUID-based filename generation prevents collisions.

Using uuid.uuid4().hex to generate unique filenames is a good practice that prevents potential filename collisions in S3 storage.
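As a small illustration of the pattern (the prefix and extension below are placeholders, not the module's actual values):

```python
import uuid

# Collision-resistant object key; "eval_results/" and ".csv" are illustrative only.
object_key = f"eval_results/{uuid.uuid4().hex}.csv"
print(object_key)  # e.g. eval_results/3f2c9a1d5b6e4c7f8a9b0c1d2e3f4a5b.csv
```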
248-251: Multi-line docstring follows PEP 257 conventions.

Converting to a multi-line docstring improves readability and adheres to Python documentation best practices.
3-4: Import path update validated; no remaining imports of handle_openai_error from the old module. Addition of uuid import supports unique filename generation. Approving changes.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/app/api/routes/fine_tuning.py (2)
94-109: Undefined handle_openai_error and fragile exception type (openai.OpenAIError) — will raise at runtime

handle_openai_error is not defined/imported; these blocks will crash on exception. Also, with the current OpenAI Python client, OpenAIError may not exist; prefer catching Exception or openai.APIError variants.

Apply these diffs to make error handling robust and avoid NameError:
```diff
-    except openai.OpenAIError as e:
-        error_msg = handle_openai_error(e)
+    except Exception as e:
+        error_msg = getattr(e, "message", str(e))
         logger.error(
             f"[process_fine_tuning_job] Failed to upload to OpenAI: {error_msg} | "
             f"job_id={job_id}, project_id={project_id}|"
         )
```

```diff
-    except openai.OpenAIError as e:
-        error_msg = handle_openai_error(e)
+    except Exception as e:
+        error_msg = getattr(e, "message", str(e))
         logger.error(
             f"[process_fine_tuning_job] Failed to create OpenAI fine-tuning job: {error_msg} | "
             f"job_id={job_id}, project_id={project_id}|"
         )
```

```diff
-    except openai.OpenAIError as e:
-        error_msg = handle_openai_error(e)
+    except Exception as e:
+        error_msg = getattr(e, "message", str(e))
         logger.error(
             f"[Retrieve_fine_tune_status] Failed to retrieve OpenAI job | "
             f"provider_job_id={mask_string(job.provider_job_id)}, "
             f"error={error_msg}, fine_tuning_id={fine_tuning_id}, project_id={project_id}"
         )
         raise HTTPException(
             status_code=502, detail=f"OpenAI API error: {error_msg}"
         )
```

Also applies to: 123-138, 303-313
161-175: Guard the update on background failure when fine_tune is None

If fetch_by_id failed before assignment, update_finetune_job will be called with job=None.

```diff
-        update_finetune_job(
-            session=session,
-            job=fine_tune,
-            update=FineTuningUpdate(
-                status=FineTuningStatus.failed,
-                error_message="Error while processing the background job : "
-                + error_msg,
-            ),
-        )
+        if fine_tune is not None:
+            update_finetune_job(
+                session=session,
+                job=fine_tune,
+                update=FineTuningUpdate(
+                    status=FineTuningStatus.failed,
+                    error_message="Error while processing the background job : "
+                    + error_msg,
+                ),
+            )
```
🧹 Nitpick comments (6)
backend/app/api/routes/fine_tuning.py (6)
202-205: CSV validation is brittle; accept common CSV MIME types or require at least one signal

The current check rejects only when both the extension and the content-type are wrong; also, many clients send application/vnd.ms-excel for CSV. Prefer rejecting only when neither signal suggests CSV.
```diff
-    if not file.filename.lower().endswith(".csv") and file.content_type != "text/csv":
-        raise HTTPException(status_code=400, detail="File must be a CSV file")
+    content_type = (file.content_type or "").lower()
+    if not (file.filename.lower().endswith(".csv") or "csv" in content_type):
+        raise HTTPException(status_code=400, detail="File must be a CSV file")
```
213-217: S3 object key lacks filename/extension; prefer a clear, namespaced key

Storing at just the UUID loses context and the extension; use a stable prefix and keep .csv for easier inspection and tooling.
```diff
-    document_id = uuid4()
-    object_store_url = storage.put(file, Path(str(document_id)))
+    document_id = uuid4()
+    key = Path("datasets") / f"{document_id}.csv"
+    object_store_url = storage.put(file, key)
```
197-201: Deduplicate and validate split ratios before creating jobs

Avoid duplicate jobs when users pass repeated ratios, and enforce model-level validation only once.
```diff
-    try:
-        split_ratios = [float(r.strip()) for r in split_ratio.split(",")]
+    try:
+        split_ratios = [float(r.strip()) for r in split_ratio.split(",")]
+        # remove duplicates, preserve order
+        seen = set()
+        split_ratios = [r for r in split_ratios if (r not in seen and not seen.add(r))]
```
315-317: Type hint mismatch for mapped_status

The mapping returns FineTuningStatus, not str. Update the annotation.
```diff
-    mapped_status: Optional[str] = OPENAI_TO_INTERNAL_STATUS.get(
+    mapped_status: Optional[FineTuningStatus] = OPENAI_TO_INTERNAL_STATUS.get(
         getattr(openai_job, "status", None)
     )
```
77-83: Cleanup may not run if preprocessing fails before upload
preprocessor.cleanup() is only in the inner upload try/finally; failures in preprocessor.process() can bypass cleanup and leak temp artifacts.

Wrap creation, processing, and upload in a broader try/finally and always call cleanup() if the preprocessor was created:

- Initialize preprocessor = None before use.
- After creating it, use a surrounding try: ... finally: preprocessor.cleanup().

I can propose an exact patch if you prefer.
Also applies to: 110-114
280-282: Response payload: include counts for better UX (optional)

Consider adding created_count/total to the response for clients to act on.
```diff
-    return APIResponse.success_response(
-        {"message": message, "document_id": str(created_document.id), "jobs": job_infos}
-    )
+    return APIResponse.success_response(
+        {
+            "message": message,
+            "document_id": str(created_document.id),
+            "created_count": created_count,
+            "total": total,
+            "jobs": job_infos,
+        }
+    )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- backend/app/api/routes/fine_tuning.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/api/routes/fine_tuning.py (9)
- backend/app/models/fine_tuning.py (4)
  - FineTuningJobCreate (29-52)
  - FineTuningJobPublic (104-120)
  - FineTuningUpdate (93-101)
  - FineTuningStatus (13-18)
- backend/app/models/document.py (1)
  - Document (20-40)
- backend/app/models/model_evaluation.py (2)
  - ModelEvaluationBase (21-26)
  - ModelEvaluationStatus (14-18)
- backend/app/crud/model_evaluation.py (2)
  - create_model_evaluation (20-58)
  - fetch_active_model_evals (150-170)
- backend/app/core/finetune/preprocessing.py (1)
  - DataPreprocessor (17-157)
- backend/app/api/routes/model_evaluation.py (1)
  - run_model_evaluation (49-112)
- backend/app/utils.py (1)
  - get_openai_client (175-205)
- backend/app/core/cloud/storage.py (3)
  - get_cloud_storage (262-279)
  - put (120-122)
  - put (150-177)
- backend/app/crud/document.py (1)
  - DocumentCrud (14-135)
Referenced diff context (backend/app/api/routes/fine_tuning.py):

```python
# Check if status is changing from running to completed
is_newly_completed = (
    job.status == FineTuningStatus.running
    and update_payload.status == FineTuningStatus.completed
)
```
Race: auto-evaluation can be created twice under concurrent refresh calls
Two concurrent requests can both observe “no active evals” and create duplicates. Add a DB-level guard (unique constraint/partial index) or a transactional re-check/catch on insert.
- Option A (preferred): add a unique index on model_evaluation.fine_tuning_id where is_deleted=false and status != 'failed', then wrap create in try/except IntegrityError; on conflict, skip.
- Option B: within a session transaction, lock the fine-tuning row (SELECT ... FOR UPDATE), re-check active evals, then create.
I can provide a concrete migration + guarded create if desired; a rough sketch of Option A follows below.
Also applies to: 343-369
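A rough sketch of Option A, under stated assumptions: SQLModel/SQLAlchemy sessions, an Alembic migration, a PostgreSQL backend (partial indexes), and a ModelEvaluation model with fine_tuning_id, is_deleted, and status fields. The index name, field types, and helper name are illustrative, not the repository's actual code.

```python
import sqlalchemy as sa
from alembic import op
from sqlalchemy.exc import IntegrityError
from sqlmodel import Session

from app.models.model_evaluation import ModelEvaluation, ModelEvaluationStatus


def upgrade() -> None:
    # Migration sketch: allow at most one non-deleted, non-failed evaluation per job.
    op.create_index(
        "uq_model_evaluation_active_per_fine_tuning",  # illustrative index name
        "model_evaluation",
        ["fine_tuning_id"],
        unique=True,
        postgresql_where=sa.text("is_deleted = false AND status != 'failed'"),
    )


def create_evaluation_if_absent(session: Session, fine_tuning_id: int) -> ModelEvaluation | None:
    """Insert a pending evaluation, treating a unique-index conflict as 'already exists'."""
    evaluation = ModelEvaluation(
        fine_tuning_id=fine_tuning_id,
        status=ModelEvaluationStatus.pending,
    )
    session.add(evaluation)
    try:
        session.commit()
    except IntegrityError:
        session.rollback()  # a concurrent refresh already created one; skip quietly
        return None
    session.refresh(evaluation)
    return evaluation
```

The refresh path would then call create_evaluation_if_absent and only queue run_model_evaluation when it returns a row.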
🤖 Prompt for AI Agents
In backend/app/api/routes/fine_tuning.py around lines 330-335 (also applies to
343-369), two concurrent requests can both see “no active evals” and create
duplicate model_evaluation rows; implement a DB-level guard and guarded create.
Preferred fix: add a migration that creates a unique partial index on
model_evaluation(fine_tuning_id) WHERE is_deleted = false AND status !=
'failed', then wrap the create in a try/except catching IntegrityError and
skip/return if conflict occurs. Alternative: perform the check-and-create inside
a DB transaction that locks the fine_tuning row (SELECT ... FOR UPDATE),
re-check for active evals, then insert. Ensure sessions/transactions are used
consistently and surface a clear response when insert is skipped due to
integrity conflict.
…ctTech4DevAI/ai-platform into feature/classification-unified-api
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/tests/api/routes/test_fine_tuning.py (2)
42-53: Consider extracting S3 setup to a pytest fixture.

The S3 bucket setup logic is correct and handles regional differences properly. However, if additional tests need S3 mocking in the future, this setup would need to be duplicated.
Consider creating a reusable fixture:
```python
@pytest.fixture
def mock_s3_bucket():
    with mock_aws():
        s3 = boto3.client("s3", region_name=settings.AWS_DEFAULT_REGION)
        bucket_name = settings.AWS_S3_BUCKET_PREFIX
        if settings.AWS_DEFAULT_REGION == "us-east-1":
            s3.create_bucket(Bucket=bucket_name)
        else:
            s3.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={
                    "LocationConstraint": settings.AWS_DEFAULT_REGION
                },
            )
        yield s3
```
110-115: Use enum values instead of string literals for status checks.

Line 113 uses a string literal "pending" instead of the imported FineTuningStatus.pending enum value. This pattern appears throughout the test file (lines 146, 174, 203, 260, 338, 392, 409, 456).

Apply this diff to use the enum consistently:
```diff
     for job in jobs:
         db.refresh(job)
         assert (
-            job.status == "pending"
+            job.status == FineTuningStatus.pending
         )  # Since background processing is mocked, status remains pending
         assert job.split_ratio in [0.5, 0.7, 0.9]
```

Repeat this change for all similar assertions throughout the file (lines 146, 174, 203, 260, 338, 392, 409, 456).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- backend/app/tests/api/routes/test_fine_tuning.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/api/routes/test_fine_tuning.py
🧬 Code graph analysis (1)
backend/app/tests/api/routes/test_fine_tuning.py (6)
- backend/app/tests/utils/test_data.py (1)
  - create_test_fine_tuning_jobs (133-160)
- backend/app/tests/utils/utils.py (1)
  - get_document (116-137)
- backend/app/models/fine_tuning.py (2)
  - Fine_Tuning (55-90)
  - FineTuningStatus (13-18)
- backend/app/models/model_evaluation.py (2)
  - ModelEvaluation (38-85)
  - ModelEvaluationStatus (14-18)
- backend/app/core/cloud/storage.py (5)
  - client (30-42)
  - put (120-122)
  - put (150-177)
  - get_signed_url (135-137)
  - get_signed_url (217-242)
- backend/app/tests/conftest.py (2)
  - db (24-41)
  - user_api_key_header (77-79)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/api/routes/test_fine_tuning.py (9)
1-15: LGTM!

The imports are well-organized and appropriate for the moto-based S3 mocking migration. The addition of FineTuningStatus, ModelEvaluation, and ModelEvaluationStatus aligns with the new auto-evaluation testing requirements.
118-180: LGTM with minor note!

The retrieve tests properly validate status synchronization from OpenAI and error-message propagation. The updated provider_job_id naming convention ("ftjob-mock_job_123") is consistent throughout.

Note: the string-literal status assertions (lines 146, 174) should use enum values as suggested in the previous comment.
182-203: LGTM!

The fetch test properly verifies that multiple jobs can be retrieved by document ID and that their attributes are correctly populated.
206-212: LGTM!

The test class is well-structured with appropriate decorators and a clear docstring explaining its purpose. The class-level patches ensure all test methods have the necessary mocks.
276-351: LGTM with verification note!

This test properly validates that duplicate evaluations are not created when an active evaluation already exists. The logic correctly checks for existing evaluations before triggering a new one.
Note: The same foreign key field verification mentioned in the previous comment applies to line 346.
352-411: LGTM!

Excellent test coverage for non-completion status transitions. The test validates that evaluations are only triggered on completion, not for intermediate states (running) or failures. This prevents unnecessary evaluation attempts.
412-467: LGTM!

This test properly validates that refreshing an already completed job doesn't trigger new evaluations. The logic ensures evaluation is only triggered on the initial status transition to completed, preventing duplicate evaluation jobs.
213-275: No changes needed for ModelEvaluation foreign key
Verified that ModelEvaluationBase defines fine_tuning_id as a foreign key. The test's usage of model_eval.fine_tuning_id is correct.
73-77: No change required for the mock return type. The route casts the put result to str before persisting, and Document.object_store_url is defined as a string, so returning a string in the test accurately reflects its usage.

Likely an incorrect or invalid review comment.
Referenced diff context (backend/app/tests/api/routes/test_fine_tuning.py):

```diff
 for path in ["/tmp/train.jsonl", "/tmp/test.jsonl"]:
     with open(path, "w") as f:
-        f.write("{}")
+        f.write('{"prompt": "test", "completion": "label"}')

 mock_preprocessor = MagicMock()
 mock_preprocessor.process.return_value = {
     "train_jsonl_temp_filepath": "/tmp/train.jsonl",
     "train_csv_s3_object": "s3://bucket/train.csv",
     "test_csv_s3_object": "s3://bucket/test.csv",
 }
 mock_preprocessor.cleanup = MagicMock()
 mock_preprocessor_cls.return_value = mock_preprocessor

 mock_openai = MagicMock()
 mock_openai.files.create.side_effect = create_file_mock("fine-tune")
 mock_openai.fine_tuning.jobs.create.side_effect = [
     MagicMock(id=f"ft_mock_job_{i}", status="running") for i in range(1, 4)
 ]
 mock_get_openai_client.return_value = mock_openai

 body = {
     "document_id": str(document.id),
     "base_model": "gpt-4",
     "split_ratio": [0.5, 0.7, 0.9],
     "system_prompt": "you are a model able to classify",
 }

 with patch("app.api.routes.fine_tuning.Session") as SessionMock:
     SessionMock.return_value.__enter__.return_value = db
     SessionMock.return_value.__exit__.return_value = None

     response = client.post(
         "/api/v1/fine_tuning/fine_tune",
         json=body,
         headers=user_api_key_header,
     )
```
Add cleanup for temporary test files.
The test creates files in /tmp/ but doesn't clean them up after the test completes. This can lead to test pollution and potential conflicts in CI/CD environments.
Consider using pytest.fixture with cleanup or tempfile.TemporaryDirectory():
```diff
+from tempfile import TemporaryDirectory
+
 # Setup test files for preprocessing
-for path in ["/tmp/train.jsonl", "/tmp/test.jsonl"]:
-    with open(path, "w") as f:
-        f.write('{"prompt": "test", "completion": "label"}')
+with TemporaryDirectory() as tmpdir:
+    for fname in ["train.jsonl", "test.jsonl"]:
+        path = os.path.join(tmpdir, fname)
+        with open(path, "w") as f:
+            f.write('{"prompt": "test", "completion": "label"}')
```

Alternatively, use a pytest fixture with automatic cleanup:
```python
@pytest.fixture
def temp_jsonl_files():
    paths = ["/tmp/train.jsonl", "/tmp/test.jsonl"]
    for path in paths:
        with open(path, "w") as f:
            f.write('{"prompt": "test", "completion": "label"}')
    yield paths
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
```

🤖 Prompt for AI Agents
In backend/app/tests/api/routes/test_fine_tuning.py around lines 59 to 61 the
test writes files directly into /tmp without cleaning them up; change the test
to create temporary files and ensure they are removed after the test — either
replace the hardcoded /tmp paths with tempfile.NamedTemporaryFile or
tempfile.TemporaryDirectory to produce paths that are automatically cleaned, or
implement a pytest fixture that creates the files, yields their paths, and
deletes them in the teardown (remove each path with os.remove if exists). Ensure
any required imports (tempfile, os, pytest) are added and that the test uses the
fixture or temp file paths instead of hardcoded /tmp paths.
Summary
Issue: #362
Enhancing the fine-tuning workflow by adding automatic evaluation triggers and improving status handling.
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
- Run fastapi run --reload app/main.py or docker compose up in the repository root and test.

Summary by CodeRabbit
New Features
Tests