
Conversation

@AkhileshNegi
Collaborator

@AkhileshNegi AkhileshNegi commented Oct 6, 2025

Summary

Issue: #362
Enhancing the fine-tuning workflow by adding automatic evaluation triggers and improving status handling.

  • Fine-tuning API improvements: Added logic to automatically trigger evaluation when status changes from in_progress to completed
  • Status ENUM update: Added cancelled status to fine-tuning model
  • Test coverage: Significantly expanded test cases for fine-tuning API

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Summary by CodeRabbit

  • New Features

    • Upload a CSV to start fine-tuning via form inputs (base model, split ratios, system prompt).
    • Automatically creates multiple fine-tuning jobs per provided split ratios and links them to the uploaded dataset.
    • Retrieval endpoints return signed URLs for uploaded documents.
    • Fine-tune status now includes a “cancelled” state.
    • On job completion, an automatic model evaluation is queued if none exists.
  • Tests

    • Added tests covering CSV/cloud storage flows and automatic evaluation triggering.

@coderabbitai

coderabbitai bot commented Oct 6, 2025

Walkthrough

Adds async CSV fine-tune upload handling: validates and stores CSV, creates a Document, spawns multiple FineTuning jobs per split ratio, and returns document_id; refresh now maps "cancelled" status and conditionally creates/queues ModelEvaluation when jobs complete. Tests migrated to moto S3 and validate auto-evaluation logic.
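
For orientation, the refresh-side change amounts to roughly the following (a minimal sketch, not the exact route code; the helper names come from the modules listed under "Code graph analysis" below, while their signatures, the import path of update_finetune_job, and the provider status keys other than "cancelled" are assumptions):

from fastapi import BackgroundTasks
from sqlmodel import Session

from app.api.routes.model_evaluation import run_model_evaluation
from app.crud.model_evaluation import create_model_evaluation, fetch_active_model_evals
from app.crud.fine_tuning import update_finetune_job  # import path assumed
from app.models.fine_tuning import FineTuningStatus, FineTuningUpdate
from app.models.model_evaluation import ModelEvaluationBase

# Provider-status keys are assumptions; only the "cancelled" mapping is confirmed by this PR.
OPENAI_TO_INTERNAL_STATUS = {
    "running": FineTuningStatus.running,
    "succeeded": FineTuningStatus.completed,
    "failed": FineTuningStatus.failed,
    "cancelled": FineTuningStatus.cancelled,  # new mapping added here
}

def refresh_status_sketch(
    session: Session, job, openai_job, background_tasks: BackgroundTasks
):
    mapped_status = OPENAI_TO_INTERNAL_STATUS.get(getattr(openai_job, "status", None))
    # Evaluation is queued only on the transition running -> completed.
    is_newly_completed = (
        job.status == FineTuningStatus.running
        and mapped_status == FineTuningStatus.completed
    )
    update_finetune_job(
        session=session, job=job, update=FineTuningUpdate(status=mapped_status)
    )
    if is_newly_completed and not fetch_active_model_evals(session, job.id):
        evaluation = create_model_evaluation(
            session, ModelEvaluationBase(fine_tuning_id=job.id)
        )
        background_tasks.add_task(run_model_evaluation, evaluation.id)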

Changes

  • API routes: CSV upload, job creation, status refresh — backend/app/api/routes/fine_tuning.py
    Converts fine_tune_from_CSV to an async File/Form handler; validates CSV and parses split_ratio; uploads CSV to cloud storage and creates a Document; creates multiple FineTuning jobs (one per split); returns document_id; updates refresh_fine_tune_status to accept BackgroundTasks, map "cancelled" to FineTuningStatus.cancelled, and auto-create/queue ModelEvaluation when appropriate; removes route-local handle_openai_error.
  • Core evaluation: predictions & storage — backend/app/core/finetune/evaluation.py
    Minor refactors and formatting: imports uuid and uses it to generate unique prediction CSV filenames; switches handle_openai_error import to internal app.utils; adjusts logging/message formatting; preserves retry/timeouts and evaluation logic.
  • Model enum — backend/app/models/fine_tuning.py
    Adds FineTuningStatus.cancelled = "cancelled".
  • Tests: S3, mocks, auto-eval scenarios — backend/app/tests/api/routes/test_fine_tuning.py
    Reworks tests to use moto S3 and a real boto3 client, patches cloud storage/OpenAI/process_fine_tuning_job; asserts creation of pending jobs per split ratio; adds suites verifying auto-evaluation creation on completion, skipping when active evals exist, and ModelEvaluation record integrity.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant API as FineTuning API
  participant Storage as Cloud Storage
  participant DB as Database
  participant BG as Background Tasks

  User->>API: POST /fine_tune_from_CSV (file, base_model, split_ratio, system_prompt)
  API->>API: Validate CSV and parse split_ratio
  API->>Storage: Upload CSV
  Storage-->>API: file_key / signed URL
  API->>DB: Create Document (with storage key)
  DB-->>API: document_id
  loop for each split ratio
    API->>DB: Create FineTuningJob (pending, document_id, base_model, split)
  end
  API-->>User: { document_id, jobs[] }

  Note over API,BG: Background processing handles provider interactions and job execution
sequenceDiagram
  autonumber
  actor User
  participant API as FineTuning API
  participant Provider as FT Provider
  participant DB as Database
  participant BG as Background Tasks

  User->>API: POST /fine_tune/{id}/refresh
  API->>Provider: Fetch job status
  Provider-->>API: status (running/completed/failed/cancelled) + fine_tuned_model?
  API->>DB: Update FineTuningJob status
  alt status == completed and no active evaluation exists
    API->>DB: Create ModelEvaluation (pending)
    API->>BG: Queue run_model_evaluation(ModelEvaluation.id)
  else status == completed and active evaluation exists
    API-->>API: Skip creating evaluation
  else not completed
    API-->>API: No evaluation action
  end
  API-->>User: Updated job info

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Classification Feature #320: Direct overlap on fine-tuning routes, CSV/upload handling, and evaluation wiring — likely touches the same modules and tests.

Suggested reviewers

  • kartpop
  • avirajsingh7

Poem

I thump my paw on CSV plains,
I split the rows in tidy trains,
Buckets hum while jobs are queued,
When finished, evals are pursued,
Rabbity cheers — all tests green-tuned! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Title Check ⚠️ Warning — The current title "Classification: Merging endpoints" does not reflect the main changes in this pull request, which focus on enhancing the fine-tuning API with CSV support, automatic evaluation triggers, and a cancelled status; it misleadingly suggests classification endpoint merging rather than fine-tuning workflow improvements. Resolution: rename the pull request to concisely convey the main updates, for example: "Enhance fine-tuning API with CSV support, automatic evaluation triggers, and cancelled status."
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 44.44%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.


@AkhileshNegi AkhileshNegi self-assigned this Oct 6, 2025
@AkhileshNegi AkhileshNegi marked this pull request as ready for review October 6, 2025 04:59
@AkhileshNegi AkhileshNegi added the enhancement New feature or request label Oct 6, 2025
@codecov

codecov bot commented Oct 6, 2025

Codecov Report

❌ Patch coverage is 97.72727% with 4 lines in your changes missing coverage. Please review.

Files with missing lines:

  • backend/app/api/routes/fine_tuning.py — 89.65% patch coverage, 3 lines missing ⚠️
  • backend/app/tests/api/routes/test_fine_tuning.py — 99.29% patch coverage, 1 line missing ⚠️



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/core/finetune/evaluation.py (1)

99-116: Possible infinite loop on latency timeout in generate_predictions.

When elapsed_time > max_latency you log and continue without incrementing attempt, causing a tight infinite loop under chronic slowness. Increment attempts and fail gracefully after retries.

Apply this diff:

-                    elapsed_time = time.time() - start_time
-                    if elapsed_time > self.max_latency:
-                        logger.warning(
-                            f"[generate_predictions] Timeout exceeded for prompt {idx}/{total_prompts}. Retrying..."
-                        )
-                        continue
+                    elapsed_time = time.time() - start_time
+                    if elapsed_time > self.max_latency:
+                        attempt += 1
+                        logger.warning(
+                            f"[generate_predictions] Timeout exceeded for prompt {idx}/{total_prompts} "
+                            f"({elapsed_time:.2f}s > {self.max_latency}s). Attempt {attempt}/{self.retries}."
+                        )
+                        if attempt == self.retries:
+                            predictions.append("timeout")
+                            logger.error(
+                                f"[generate_predictions] Maximum retries reached for prompt {idx}/{total_prompts} after timeouts. "
+                                "Appending 'timeout'."
+                            )
+                            break
+                        else:
+                            # brief backoff to avoid hot loop
+                            time.sleep(1)
+                        continue

Additionally, consider guarding normalize_prediction when allowed_labels is empty to avoid StopIteration (e.g., return the raw label or a sentinel).

Also applies to: 125-168
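
For illustration, the guard this comment has in mind could look like the following (a sketch only; the actual normalize_prediction signature and label handling in evaluation.py may differ):

def normalize_prediction(self, raw_label: str) -> str:
    """Map raw model output onto one of self.allowed_labels, guarding the empty case."""
    if not self.allowed_labels:
        # No labels loaded: return the raw label (or a sentinel) instead of
        # letting next() raise StopIteration on an empty iterator.
        return raw_label
    cleaned = raw_label.strip().lower()
    return next(
        (label for label in self.allowed_labels if label.lower() == cleaned),
        "unknown",  # sentinel when nothing matches
    )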

backend/app/api/routes/fine_tuning.py (2)

94-109: handle_openai_error is undefined — replace with str(e) to avoid NameError.

This will crash both background processing and refresh paths. Use stringified exception (consistent with evaluation.py change).

-            except openai.OpenAIError as e:
-                error_msg = handle_openai_error(e)
+            except openai.OpenAIError as e:
+                error_msg = str(e)
                 logger.error(
                     f"[process_fine_tuning_job] Failed to upload to OpenAI: {error_msg} | "
                     f"job_id={job_id}, project_id={project_id}|"
                 )
@@
-            except openai.OpenAIError as e:
-                error_msg = handle_openai_error(e)
+            except openai.OpenAIError as e:
+                error_msg = str(e)
                 logger.error(
                     f"[process_fine_tuning_job] Failed to create OpenAI fine-tuning job: {error_msg} | "
                     f"job_id={job_id}, project_id={project_id}|"
                 )
                 update_finetune_job(
@@
-        except openai.OpenAIError as e:
-            error_msg = handle_openai_error(e)
+        except openai.OpenAIError as e:
+            error_msg = str(e)
             logger.error(
                 f"[Retrieve_fine_tune_status] Failed to retrieve OpenAI job | "
                 f"provider_job_id={mask_string(job.provider_job_id)}, "
                 f"error={error_msg}, fine_tuning_id={fine_tuning_id}, project_id={project_id}"
             )

Also applies to: 123-137, 312-325


161-176: Guard update on failure when fine_tune is None.

If fetch_by_id fails before assignment, the except block will call update_finetune_job with job=None, masking the original error. Guard it.

         except Exception as e:
             error_msg = str(e)
             logger.error(
                 f"[process_fine_tuning_job] Background job failure: {e} | "
                 f"job_id={job_id}, project_id={project_id}|"
             )
-            update_finetune_job(
-                session=session,
-                job=fine_tune,
-                update=FineTuningUpdate(
-                    status=FineTuningStatus.failed,
-                    error_message="Error while processing the background job : "
-                    + error_msg,
-                ),
-            )
+            if fine_tune is not None:
+                update_finetune_job(
+                    session=session,
+                    job=fine_tune,
+                    update=FineTuningUpdate(
+                        status=FineTuningStatus.failed,
+                        error_message="Error while processing the background job : " + error_msg,
+                    ),
+                )
+            else:
+                logger.error(
+                    "[process_fine_tuning_job] Could not mark job failed because fine_tune was not loaded."
+                )
🧹 Nitpick comments (5)
backend/app/tests/api/routes/test_fine_tuning.py (3)

18-30: Remove unused helper create_file_mock.

It’s not referenced; drop to reduce noise.

-def create_file_mock(file_type):
-    counter = {"train": 0, "test": 0}
-
-    def _side_effect(file=None, purpose=None):
-        if purpose == "fine-tune":
-            if "train" in file.name:
-                counter["train"] += 1
-                return MagicMock(id=f"file_{counter['train']}")
-            elif "test" in file.name:
-                counter["test"] += 1
-                return MagicMock(id=f"file_{counter['test']}")
-
-    return _side_effect

59-62: Avoid writing temporary JSONL files in this test.

process_fine_tuning_job is patched; these files aren’t used. Removing speeds up tests and avoids FS writes.

-        for path in ["/tmp/train.jsonl", "/tmp/test.jsonl"]:
-            with open(path, "w") as f:
-                f.write('{"prompt": "test", "completion": "label"}')
+        # No-op: background processing is mocked; no local files needed

111-116: Compare status using the Enum, not a string.

DB model uses FineTuningStatus; assert against the Enum to avoid brittle string comparisons.

-            assert (
-                job.status == "pending"
-            )  # Since background processing is mocked, status remains pending
+            assert job.status == FineTuningStatus.pending  # background job is mocked
backend/app/api/routes/fine_tuning.py (2)

326-340: Type alignment: mapped_status is an Enum; FineTuningUpdate.status is str.

Not a blocker, but consider keeping types consistent (use FineTuningStatus throughout). Also rename annotation to Optional[FineTuningStatus] for clarity.

Also applies to: 341-346


213-216: CSV validation condition is permissive by design — confirm intended.

Current check accepts if either extension OR content-type indicates CSV. If you intended both, switch to “or” inside the negation. Otherwise, this is fine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a41d99 and 724497b.

📒 Files selected for processing (4)
  • backend/app/api/routes/fine_tuning.py (7 hunks)
  • backend/app/core/finetune/evaluation.py (1 hunks)
  • backend/app/models/fine_tuning.py (1 hunks)
  • backend/app/tests/api/routes/test_fine_tuning.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_fine_tuning.py (6)
backend/app/tests/utils/test_data.py (1)
  • create_test_fine_tuning_jobs (133-160)
backend/app/tests/utils/utils.py (1)
  • get_document (116-137)
backend/app/models/fine_tuning.py (2)
  • Fine_Tuning (55-90)
  • FineTuningStatus (13-18)
backend/app/models/model_evaluation.py (2)
  • ModelEvaluation (38-85)
  • ModelEvaluationStatus (14-18)
backend/app/core/cloud/storage.py (5)
  • client (30-42)
  • put (120-122)
  • put (150-177)
  • get_signed_url (135-137)
  • get_signed_url (217-242)
backend/app/tests/conftest.py (2)
  • db (24-41)
  • user_api_key_header (77-79)
backend/app/api/routes/fine_tuning.py (9)
backend/app/models/fine_tuning.py (4)
  • FineTuningJobCreate (29-52)
  • FineTuningJobPublic (104-120)
  • FineTuningUpdate (93-101)
  • FineTuningStatus (13-18)
backend/app/models/document.py (1)
  • Document (20-40)
backend/app/models/model_evaluation.py (2)
  • ModelEvaluationBase (21-26)
  • ModelEvaluationStatus (14-18)
backend/app/crud/model_evaluation.py (2)
  • create_model_evaluation (20-58)
  • fetch_active_model_evals (150-170)
backend/app/core/finetune/preprocessing.py (1)
  • DataPreprocessor (17-157)
backend/app/api/routes/model_evaluation.py (1)
  • run_model_evaluation (49-112)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/core/cloud/storage.py (3)
  • get_cloud_storage (262-279)
  • put (120-122)
  • put (150-177)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/fine_tuning.py (1)

13-19: Add “cancelled” status — looks good; verify downstream handling and persistence.

Enum addition is fine. Please double-check:

  • OPENAI status mapping includes “cancelled” (it does in API routes).
  • Any DB constraints/migrations aren’t needed (column appears to be a string enum in SQLModel, so likely no migration, but verify).
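
For reference, the enum after this change is roughly as follows (member names other than cancelled are inferred from the statuses referenced elsewhere in this review):

from enum import Enum

class FineTuningStatus(str, Enum):
    pending = "pending"
    running = "running"
    completed = "completed"
    failed = "failed"
    cancelled = "cancelled"  # added in this PR

Because the enum subclasses str and the column stores plain strings, adding a member should not require a migration, consistent with the note above.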


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/app/core/finetune/evaluation.py (1)

5-5: Consider using built-in set type.

typing.Set is deprecated in Python 3.9+. Use the built-in set instead.

Apply this diff:

-from typing import Set
+from typing import Any

And update line 38:

-        self.allowed_labels: Set[str] = set()
+        self.allowed_labels: set[str] = set()

Based on static analysis.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 724497b and 1bbd0dd.

📒 Files selected for processing (1)
  • backend/app/core/finetune/evaluation.py (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/core/finetune/evaluation.py (2)
backend/app/core/cloud/storage.py (1)
  • AmazonCloudStorage (145-259)
backend/app/utils.py (1)
  • handle_openai_error (208-222)
🪛 Ruff (0.13.3)
backend/app/core/finetune/evaluation.py

5-5: typing.Set is deprecated, use set instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (4)
backend/app/core/finetune/evaluation.py (4)

52-53: Log message formatting improvements enhance readability.

The string splitting across multiple lines improves code readability while maintaining semantic equivalence. These changes follow Python best practices for long strings.

Also applies to: 68-70, 73-74, 89-92, 96-97, 118-119, 125-126, 137-139, 152-153, 165-166, 172-174, 178-180, 185-187, 206-207, 216-217, 221-222, 232-233


198-199: UUID-based filename generation prevents collisions.

Using uuid.uuid4().hex to generate unique filenames is a good practice that prevents potential filename collisions in S3 storage.


248-251: Multi-line docstring follows PEP 257 conventions.

Converting to a multi-line docstring improves readability and adheres to Python documentation best practices.


3-4: Import path update validated; no remaining imports of handle_openai_error from the old module. Addition of uuid import supports unique filename generation. Approving changes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/api/routes/fine_tuning.py (2)

94-109: Undefined handle_openai_error and fragile exception type (openai.OpenAIError) — will raise at runtime

handle_openai_error is not defined/imported; these blocks will crash on exception. Also, with the current OpenAI Python client, OpenAIError may not exist; prefer catching Exception or openai.APIError variants.

Apply these diffs to make error handling robust and avoid NameError:

-            except openai.OpenAIError as e:
-                error_msg = handle_openai_error(e)
+            except Exception as e:
+                error_msg = getattr(e, "message", str(e))
                 logger.error(
                     f"[process_fine_tuning_job] Failed to upload to OpenAI: {error_msg} | "
                     f"job_id={job_id}, project_id={project_id}|"
                 )
-            except openai.OpenAIError as e:
-                error_msg = handle_openai_error(e)
+            except Exception as e:
+                error_msg = getattr(e, "message", str(e))
                 logger.error(
                     f"[process_fine_tuning_job] Failed to create OpenAI fine-tuning job: {error_msg} | "
                     f"job_id={job_id}, project_id={project_id}|"
                 )
-        except openai.OpenAIError as e:
-            error_msg = handle_openai_error(e)
+        except Exception as e:
+            error_msg = getattr(e, "message", str(e))
             logger.error(
                 f"[Retrieve_fine_tune_status] Failed to retrieve OpenAI job | "
                 f"provider_job_id={mask_string(job.provider_job_id)}, "
                 f"error={error_msg}, fine_tuning_id={fine_tuning_id}, project_id={project_id}"
             )
             raise HTTPException(
                 status_code=502, detail=f"OpenAI API error: {error_msg}"
             )

Also applies to: 123-138, 303-313


161-175: Guard update on background failure when fine_tune is None

If fetch_by_id failed before assignment, update_finetune_job will be called with job=None.

-            update_finetune_job(
-                session=session,
-                job=fine_tune,
-                update=FineTuningUpdate(
-                    status=FineTuningStatus.failed,
-                    error_message="Error while processing the background job : "
-                    + error_msg,
-                ),
-            )
+            if fine_tune is not None:
+                update_finetune_job(
+                    session=session,
+                    job=fine_tune,
+                    update=FineTuningUpdate(
+                        status=FineTuningStatus.failed,
+                        error_message="Error while processing the background job : "
+                        + error_msg,
+                    ),
+                )
🧹 Nitpick comments (6)
backend/app/api/routes/fine_tuning.py (6)

202-205: CSV validation is brittle; accept common CSV MIME types or require at least one signal

Current check rejects only when both extension and content-type are wrong; also many clients send application/vnd.ms-excel for CSV. Prefer validating if neither suggests CSV.

-    if not file.filename.lower().endswith(".csv") and file.content_type != "text/csv":
-        raise HTTPException(status_code=400, detail="File must be a CSV file")
+    content_type = (file.content_type or "").lower()
+    if not (file.filename.lower().endswith(".csv") or "csv" in content_type):
+        raise HTTPException(status_code=400, detail="File must be a CSV file")

213-217: S3 object key lacks filename/extension; prefer a clear, namespaced key

Storing at just UUID loses context and extension; use a stable prefix and keep .csv for easier inspection and tooling.

-    document_id = uuid4()
-    object_store_url = storage.put(file, Path(str(document_id)))
+    document_id = uuid4()
+    key = Path("datasets") / f"{document_id}.csv"
+    object_store_url = storage.put(file, key)

197-201: Deduplicate and validate split ratios before creating jobs

Avoid duplicate jobs when users pass repeated ratios, and enforce model-level validation only once.

-    try:
-        split_ratios = [float(r.strip()) for r in split_ratio.split(",")]
+    try:
+        split_ratios = [float(r.strip()) for r in split_ratio.split(",")]
+        # remove duplicates, preserve order
+        seen = set()
+        split_ratios = [r for r in split_ratios if (r not in seen and not seen.add(r))]

315-317: Type hint mismatch for mapped_status

The mapping returns FineTuningStatus, not str. Update annotation.

-        mapped_status: Optional[str] = OPENAI_TO_INTERNAL_STATUS.get(
+        mapped_status: Optional[FineTuningStatus] = OPENAI_TO_INTERNAL_STATUS.get(
             getattr(openai_job, "status", None)
         )

77-83: Cleanup may not run if preprocessing fails before upload

preprocessor.cleanup() is only in the inner upload try/finally; failures in preprocessor.process() can bypass cleanup and leak temp artifacts.

Wrap creation, processing, and upload in a broader try/finally and always call cleanup() if preprocessor was created:

  • Initialize preprocessor = None before use.
  • After creating it, use a surrounding try: ... finally: preprocessor.cleanup().

I can propose an exact patch if you prefer.

Also applies to: 110-114
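
A minimal sketch of that broader try/finally (constructor arguments and the upload step are placeholders, not the route's exact code):

preprocessor = None
try:
    preprocessor = DataPreprocessor(...)  # args omitted; see app/core/finetune/preprocessing.py
    artifacts = preprocessor.process()
    upload_artifacts(artifacts)  # placeholder for the OpenAI / cloud-storage upload step
finally:
    if preprocessor is not None:
        preprocessor.cleanup()  # runs even if process() or the upload raises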


280-282: Response payload: include counts for better UX (optional)

Consider adding created_count/total to the response for clients to act on.

-    return APIResponse.success_response(
-        {"message": message, "document_id": str(created_document.id), "jobs": job_infos}
-    )
+    return APIResponse.success_response(
+        {
+            "message": message,
+            "document_id": str(created_document.id),
+            "created_count": created_count,
+            "total": total,
+            "jobs": job_infos,
+        }
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bbd0dd and c47b254.

📒 Files selected for processing (1)
  • backend/app/api/routes/fine_tuning.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/api/routes/fine_tuning.py (9)
backend/app/models/fine_tuning.py (4)
  • FineTuningJobCreate (29-52)
  • FineTuningJobPublic (104-120)
  • FineTuningUpdate (93-101)
  • FineTuningStatus (13-18)
backend/app/models/document.py (1)
  • Document (20-40)
backend/app/models/model_evaluation.py (2)
  • ModelEvaluationBase (21-26)
  • ModelEvaluationStatus (14-18)
backend/app/crud/model_evaluation.py (2)
  • create_model_evaluation (20-58)
  • fetch_active_model_evals (150-170)
backend/app/core/finetune/preprocessing.py (1)
  • DataPreprocessor (17-157)
backend/app/api/routes/model_evaluation.py (1)
  • run_model_evaluation (49-112)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/core/cloud/storage.py (3)
  • get_cloud_storage (262-279)
  • put (120-122)
  • put (150-177)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)

Comment on lines +330 to +335
# Check if status is changing from running to completed
is_newly_completed = (
job.status == FineTuningStatus.running
and update_payload.status == FineTuningStatus.completed
)


⚠️ Potential issue | 🟠 Major

Race: auto-evaluation can be created twice under concurrent refresh calls

Two concurrent requests can both observe “no active evals” and create duplicates. Add a DB-level guard (unique constraint/partial index) or a transactional re-check/catch on insert.

  • Option A (preferred): add a unique index on model_evaluation.fine_tuning_id where is_deleted=false and status != 'failed', then wrap create in try/except IntegrityError; on conflict, skip.
  • Option B: within a session transaction, lock the fine-tuning row (SELECT ... FOR UPDATE), re-check active evals, then create.

I can provide a concrete migration + guarded create if desired.

Also applies to: 343-369

🤖 Prompt for AI Agents
In backend/app/api/routes/fine_tuning.py around lines 330-335 (also applies to
343-369), two concurrent requests can both see “no active evals” and create
duplicate model_evaluation rows; implement a DB-level guard and guarded create.
Preferred fix: add a migration that creates a unique partial index on
model_evaluation(fine_tuning_id) WHERE is_deleted = false AND status !=
'failed', then wrap the create in a try/except catching IntegrityError and
skip/return if conflict occurs. Alternative: perform the check-and-create inside
a DB transaction that locks the fine_tuning row (SELECT ... FOR UPDATE),
re-check for active evals, then insert. Ensure sessions/transactions are used
consistently and surface a clear response when insert is skipped due to
integrity conflict.
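
A sketch of Option A, combining the partial unique index quoted above with an IntegrityError guard around the create (helper signatures are assumptions):

from sqlalchemy import text
from sqlalchemy.exc import IntegrityError

# Migration-side partial unique index (DDL as suggested above):
CREATE_ACTIVE_EVAL_INDEX = text(
    "CREATE UNIQUE INDEX IF NOT EXISTS uq_active_model_eval_per_fine_tuning "
    "ON model_evaluation (fine_tuning_id) "
    "WHERE is_deleted = false AND status != 'failed'"
)

def create_evaluation_if_absent(session, fine_tuning_id, background_tasks):
    """Guarded create: if a concurrent refresh already inserted an active eval, skip."""
    try:
        evaluation = create_model_evaluation(
            session, ModelEvaluationBase(fine_tuning_id=fine_tuning_id)
        )
        session.commit()
    except IntegrityError:
        session.rollback()  # the other request won the race; skip quietly
        return None
    background_tasks.add_task(run_model_evaluation, evaluation.id)
    return evaluation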


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/tests/api/routes/test_fine_tuning.py (2)

42-53: Consider extracting S3 setup to a pytest fixture.

The S3 bucket setup logic is correct and handles regional differences properly. However, if additional tests need S3 mocking in the future, this setup would need to be duplicated.

Consider creating a reusable fixture:

@pytest.fixture
def mock_s3_bucket():
    with mock_aws():
        s3 = boto3.client("s3", region_name=settings.AWS_DEFAULT_REGION)
        bucket_name = settings.AWS_S3_BUCKET_PREFIX
        if settings.AWS_DEFAULT_REGION == "us-east-1":
            s3.create_bucket(Bucket=bucket_name)
        else:
            s3.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={
                    "LocationConstraint": settings.AWS_DEFAULT_REGION
                },
            )
        yield s3
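
A test would then take the fixture as a parameter instead of repeating the setup, for example (illustrative only, not the repository's actual test):

def test_bucket_starts_empty(mock_s3_bucket):
    objects = mock_s3_bucket.list_objects_v2(Bucket=settings.AWS_S3_BUCKET_PREFIX)
    assert objects["KeyCount"] == 0  # moto-backed bucket exists and is empty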

110-115: Use enum values instead of string literals for status checks.

Line 113 uses a string literal "pending" instead of the imported FineTuningStatus.pending enum value. This pattern appears throughout the test file (lines 146, 174, 203, 260, 338, 392, 409, 456).

Apply this diff to use the enum consistently:

 for job in jobs:
     db.refresh(job)
     assert (
-        job.status == "pending"
+        job.status == FineTuningStatus.pending
     )  # Since background processing is mocked, status remains pending
     assert job.split_ratio in [0.5, 0.7, 0.9]

Repeat this change for all similar assertions throughout the file (lines 146, 174, 203, 260, 338, 392, 409, 456).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c04275b and 8801b39.

📒 Files selected for processing (1)
  • backend/app/tests/api/routes/test_fine_tuning.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/api/routes/test_fine_tuning.py
🧬 Code graph analysis (1)
backend/app/tests/api/routes/test_fine_tuning.py (6)
backend/app/tests/utils/test_data.py (1)
  • create_test_fine_tuning_jobs (133-160)
backend/app/tests/utils/utils.py (1)
  • get_document (116-137)
backend/app/models/fine_tuning.py (2)
  • Fine_Tuning (55-90)
  • FineTuningStatus (13-18)
backend/app/models/model_evaluation.py (2)
  • ModelEvaluation (38-85)
  • ModelEvaluationStatus (14-18)
backend/app/core/cloud/storage.py (5)
  • client (30-42)
  • put (120-122)
  • put (150-177)
  • get_signed_url (135-137)
  • get_signed_url (217-242)
backend/app/tests/conftest.py (2)
  • db (24-41)
  • user_api_key_header (77-79)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/api/routes/test_fine_tuning.py (9)

1-15: LGTM!

The imports are well-organized and appropriate for the moto-based S3 mocking migration. The addition of FineTuningStatus, ModelEvaluation, and ModelEvaluationStatus aligns with the new auto-evaluation testing requirements.


118-180: LGTM with minor note!

The retrieve tests properly validate status synchronization from OpenAI and error message propagation. The updated provider_job_id naming convention ("ftjob-mock_job_123") is consistent throughout.

Note: The string literal status assertions (lines 146, 174) should use enum values as suggested in the previous comment.


182-203: LGTM!

The fetch test properly verifies that multiple jobs can be retrieved by document ID and that their attributes are correctly populated.


206-212: LGTM!

The test class is well-structured with appropriate decorators and a clear docstring explaining its purpose. The class-level patches ensure all test methods have the necessary mocks.


276-351: LGTM with verification note!

This test properly validates that duplicate evaluations are not created when an active evaluation already exists. The logic correctly checks for existing evaluations before triggering a new one.

Note: The same foreign key field verification mentioned in the previous comment applies to line 346.


352-411: LGTM!

Excellent test coverage for non-completion status transitions. The test validates that evaluations are only triggered on completion, not for intermediate states (running) or failures. This prevents unnecessary evaluation attempts.


412-467: LGTM!

This test properly validates that refreshing an already completed job doesn't trigger new evaluations. The logic ensures evaluation is only triggered on the initial status transition to completed, preventing duplicate evaluation jobs.


213-275: No changes needed for ModelEvaluation foreign key
Verified that ModelEvaluationBase defines fine_tuning_id as a foreign key. The test's usage of model_eval.fine_tuning_id is correct.


73-77: No change required for mock return type. The route casts the put result to str before persisting and Document.object_store_url is defined as a string, so returning a string in the test accurately reflects its usage.

Likely an incorrect or invalid review comment.

Comment on lines 59 to +61

         for path in ["/tmp/train.jsonl", "/tmp/test.jsonl"]:
             with open(path, "w") as f:
-                f.write("{}")
-
-        mock_preprocessor = MagicMock()
-        mock_preprocessor.process.return_value = {
-            "train_jsonl_temp_filepath": "/tmp/train.jsonl",
-            "train_csv_s3_object": "s3://bucket/train.csv",
-            "test_csv_s3_object": "s3://bucket/test.csv",
-        }
-        mock_preprocessor.cleanup = MagicMock()
-        mock_preprocessor_cls.return_value = mock_preprocessor
-
-        mock_openai = MagicMock()
-        mock_openai.files.create.side_effect = create_file_mock("fine-tune")
-        mock_openai.fine_tuning.jobs.create.side_effect = [
-            MagicMock(id=f"ft_mock_job_{i}", status="running") for i in range(1, 4)
-        ]
-        mock_get_openai_client.return_value = mock_openai
-
-        body = {
-            "document_id": str(document.id),
-            "base_model": "gpt-4",
-            "split_ratio": [0.5, 0.7, 0.9],
-            "system_prompt": "you are a model able to classify",
-        }
-
-        with patch("app.api.routes.fine_tuning.Session") as SessionMock:
-            SessionMock.return_value.__enter__.return_value = db
-            SessionMock.return_value.__exit__.return_value = None
-
-            response = client.post(
-                "/api/v1/fine_tuning/fine_tune",
-                json=body,
-                headers=user_api_key_header,
-            )
+                f.write('{"prompt": "test", "completion": "label"}')

⚠️ Potential issue | 🟡 Minor

Add cleanup for temporary test files.

The test creates files in /tmp/ but doesn't clean them up after the test completes. This can lead to test pollution and potential conflicts in CI/CD environments.

Consider using pytest.fixture with cleanup or tempfile.TemporaryDirectory():

+from tempfile import TemporaryDirectory
+
 # Setup test files for preprocessing
-for path in ["/tmp/train.jsonl", "/tmp/test.jsonl"]:
-    with open(path, "w") as f:
-        f.write('{"prompt": "test", "completion": "label"}')
+with TemporaryDirectory() as tmpdir:
+    for fname in ["train.jsonl", "test.jsonl"]:
+        path = os.path.join(tmpdir, fname)
+        with open(path, "w") as f:
+            f.write('{"prompt": "test", "completion": "label"}')

Alternatively, use a pytest fixture with automatic cleanup:

@pytest.fixture
def temp_jsonl_files():
    paths = ["/tmp/train.jsonl", "/tmp/test.jsonl"]
    for path in paths:
        with open(path, "w") as f:
            f.write('{"prompt": "test", "completion": "label"}')
    yield paths
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
🤖 Prompt for AI Agents
In backend/app/tests/api/routes/test_fine_tuning.py around lines 59 to 61 the
test writes files directly into /tmp without cleaning them up; change the test
to create temporary files and ensure they are removed after the test — either
replace the hardcoded /tmp paths with tempfile.NamedTemporaryFile or
tempfile.TemporaryDirectory to produce paths that are automatically cleaned, or
implement a pytest fixture that creates the files, yields their paths, and
deletes them in the teardown (remove each path with os.remove if exists). Ensure
any required imports (tempfile, os, pytest) are added and that the test uses the
fixture or temp file paths instead of hardcoded /tmp paths.

@AkhileshNegi AkhileshNegi merged commit 4813d95 into main Oct 10, 2025
3 checks passed
@AkhileshNegi AkhileshNegi deleted the feature/classification-unified-api branch October 10, 2025 05:13

Labels

enhancement New feature or request ready-for-review


Development

Successfully merging this pull request may close these issues.

Classification: Unified API endpoint handling all finetuning - model evaluations functions internally

2 participants