Skip to content

Conversation

@nishika26
Copy link
Collaborator

@nishika26 nishika26 commented Oct 20, 2025

Summary

Target issue is #398
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Create job (create_collection.execute_job) :

  • Introduced with_assistant flag: when model & instructions are both present, create an Assistant backed by a Vector Store; otherwise only create a Vector Store.

  • We now close the DB session before making external API calls—and reopen a fresh session only to persist results—so we don’t hold connections or risk transaction locks. Same is being done during deletion of collection.

Delete job (delete_collection.execute_job) :

  • Detects service type from collection.llm_service_name (Assistant vs Vector Store) and chooses the correct CRUD (OpenAIAssistantCrud / OpenAIVectorStoreCrud) to use delete method of that specific CRUD class.

  • Calls local deletion via CollectionCrud.delete_by_id() once remote deletion succeeds.

CRUD refactors :

  • CollectionCrud.delete_by_id() sets deleted_at=now() and persists.

  • Document→Collection cascade delete now excludes already-deleted collections (deleted_at IS NULL).

API route :

  • /collections/create: determines with_assistant by checking both model and instructions. Returns metadata note when creating vector-store-only.

  • Validation: if only one of model or instructions is provided, responds 422 with a clear message.

Additional :

  • Removed ResponsePayload class because we are now returning collection job object in response of create and delete collections, we use the collection job object for the callback URL as well.

  • Rephrased the endpoint docs to remove the indication that user can use any specific endpoint for polling.

Summary by CodeRabbit

  • New Features

    • Collections can now be created with optional AI assistant support when both model and instructions are provided; vector-only collections remain available without assistant requirements.
    • Callback notifications now supported for collection deletion operations.
  • Bug Fixes

    • Enhanced validation to ensure assistant model and instructions are provided together or not at all.
  • Documentation

    • Clarified collection creation behavior distinguishing assistant-backed vs. vector-only workflows.
    • Updated deletion endpoint documentation regarding callback notifications.
    • Improved collection job status endpoint descriptions.

@coderabbitai
Copy link

coderabbitai bot commented Oct 20, 2025

Walkthrough

Refactors collection create/delete flows to remove payload-centric handling, add a derived with_assistant flag, move callbacks to operate on collection-job metadata, make AssistantOptions fields optional with validation, add CollectionCrud.delete_by_id, and update services, helpers, routes, docs, and tests to use JSON-serialized requests and job-driven signatures.

Changes

Cohort / File(s) Summary
API Route Handlers
backend/app/api/routes/collections.py
Derive with_assistant from request fields; import HTTPException; pass with_assistant and request.model_dump(mode="json") to start_job; remove ResponsePayload usage and dynamic payload/route lookup; return optional metadata for vector-only create.
Collection CRUD
backend/app/crud/collection/collection.py
Remove owner-enforcement in _update; create returns existing when present or creates and persists (optionally creating documents); add delete_by_id(UUID) soft-delete; filter document operations by deleted_at.is_(None).
Collection Models
backend/app/models/collection.py
Remove ResponsePayload; make AssistantOptions.model and .instructions optional; add @model_validator(mode="before") to require both-or-none and normalize whitespace; CreationRequest.extract_super_type uses cls.model_fields.
Create Service
backend/app/services/collections/create_collection.py
Replace payload param with with_assistant: bool and JSON-serialized request; conditionally create vector store and/or assistant; update job lifecycle, callbacks, logging, add backout/cleanup on failure; remove payload reconstruction.
Delete Service
backend/app/services/collections/delete_collection.py
Use request.collection_id for context and JSON-serialized request in scheduling; remove payload param; resolve llm service type and call appropriate CRUD delete (assistant vs vector store); update job lifecycle, callbacks, and error handling.
Service Helpers
backend/app/services/collections/helpers.py
CallbackHandler / WebHookCallback now accept collection_job; WebHookCallback.__call__ takes only response; success/fail build APIResponse including collection_job_id metadata; _backout accepts generic CRUD and llm_service_id; simplified logging.
Tests — API Routes
backend/app/tests/api/routes/collections/*.py
Switch to shared helpers (get_project, get_collection, get_collection_job); patch start_job and assert with_assistant True/False and JSON request; validate metadata for vector-only creates; add validation error test; use settings.API_V1_STR.
Tests — CRUD & Utils
backend/app/tests/crud/collections/collection/*.py, backend/app/tests/utils/collection.py
New/updated helpers: get_collection(db, project, ...), get_vector_store_collection(...), get_collection_job(...); tests build Collections via CRUD and adapt signatures to project parameter; add helper for delete setups.
Tests — Services & Helpers
backend/app/tests/services/collections/*.py, backend/app/tests/services/collections/test_helpers.py
Adapt to with_assistant and JSON request.model_dump(mode="json"); add assistant failure/backout and deletion remote/local path tests; introduce comprehensive helper tests (error extraction, batching, callbacks, _backout).
API Docs
backend/app/api/docs/collections/*.md
Clarify assistant creation only when both model+instructions present; document callback behavior on deletion and rephrase job-info wording.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API
    participant Scheduler
    participant Worker
    participant LLM_Svc
    participant DB

    Client->>API: POST /collections/create (request JSON)
    rect rgb(245,250,255)
    Note over API: compute with_assistant = (model && instructions)
    end
    API->>Scheduler: start_job(with_assistant, request_json, project_id, org_id)
    Scheduler->>DB: insert CollectionJob(status=PENDING)
    Scheduler->>Client: 202 Accepted (+ job_id, optional metadata when vector-only)

    rect rgb(240,255,240)
    Note over Worker: async execute_job(with_assistant, job_id)
    Worker->>DB: load job -> update PROCESSING
    alt with_assistant == true
        Worker->>LLM_Svc: create vector store
        Worker->>LLM_Svc: create assistant (model+instructions)
        Worker->>DB: create Collection linked to assistant
    else with_assistant == false
        Worker->>LLM_Svc: create vector store only
        Worker->>DB: create Collection (vector-store)
    end
    Worker->>DB: store documents, update job -> SUCCESSFUL
    Worker->>API: invoke callback with collection_job metadata
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • AkhileshNegi
  • kartpop
  • avirajsingh7

Poem

🐰 I hopped through code with nimble paws,
Jobs now carry JSON, not heavy claws.
When model and notes both come to play,
An assistant wakes; else vectors stay.
Callbacks hum — carrots for the day! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Collection : with assistant flag and additional enhancements" clearly references the primary change introduced in this pull request: the new with_assistant flag that determines whether to create an Assistant backed by a Vector Store or only a Vector Store. This flag is the main feature across the API routes, CRUD operations, services, and tests. A developer scanning git history would understand that this PR introduces assistant flag functionality. While the phrase "additional enhancements" is somewhat generic, it accurately refers to related improvements like session handling, deletion logic, and validation updates, without being misleading or as egregiously vague as generic terms like "misc updates."
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enhancement/collection_vector_store

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d19b51 and 344d77b.

📒 Files selected for processing (1)
  • backend/app/models/collection.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/models/collection.py (3)

6-6: LGTM: Correct Pydantic v2 import.

The import of model_validator is appropriate for Pydantic v2 and replaces the deprecated v1 root_validator.


61-78: LGTM: Fields correctly made optional with clear deprecation notices.

The model and instructions fields are now optional (Optional[str] with default=None), enabling the with_assistant flag logic. Deprecation notices clearly communicate the future direction. The validator below ensures these fields are provided together or not at all.


126-126: LGTM: Correct Pydantic v2 field iteration.

The change from cls.__fields__ to cls.model_fields.keys() is the correct Pydantic v2 API for iterating over model fields.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Oct 20, 2025

@nishika26 nishika26 marked this pull request as ready for review October 20, 2025 13:25
@nishika26 nishika26 changed the title first commit containing everything Collection : with assistant flag and additional enhancements Oct 20, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
backend/app/models/collection.py (1)

120-124: Pydantic v2 break: use model_fields instead of __fields__ in CreationRequest.extract_super_type.

This will fail on v2 (no __fields__). It’s used by execute_job, so collection creation can break at runtime.

Apply this robust v1/v2‑compatible diff:

-    def extract_super_type(self, cls: "CreationRequest"):
-        for field_name in cls.__fields__.keys():
-            field_value = getattr(self, field_name)
-            yield (field_name, field_value)
+    def extract_super_type(self, cls: type[SQLModel]):
+        # Pydantic v2: `model_fields`; v1 fallback: `__fields__`
+        fields = getattr(cls, "model_fields", None) or getattr(cls, "__fields__", {})
+        for field_name in fields.keys():
+            yield (field_name, getattr(self, field_name))
backend/app/services/collections/helpers.py (1)

48-67: Validate batch_size to avoid infinite loops and misuse.

Guard against zero/negative values.

 def batch_documents(
     document_crud: DocumentCrud, documents: List[UUID], batch_size: int
 ):
@@
-    docs_batches = []
+    if batch_size <= 0:
+        raise ValueError("batch_size must be a positive integer")
+    docs_batches = []
backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (1)

3-4: Remove unused OpenAI client and import.

Cleans up dead code and silences F841.

-from openai import OpenAI
 from openai_responses import OpenAIMock
@@
-        client = OpenAI(api_key="sk-test-key")
         collection = get_collection(db, project=project)

Also applies to: 18-19

backend/app/tests/api/routes/collections/test_collection_info.py (1)

26-33: Fix Enum-vs-string comparisons in assertions.

API returns strings; tests compare against Enum instances. Compare to .value (or cast) to avoid false failures.

-    assert data["status"] == CollectionJobStatus.PENDING
+    assert data["status"] == CollectionJobStatus.PENDING.value
@@
-    assert data["status"] == CollectionJobStatus.SUCCESSFUL
-    assert data["action_type"] == CollectionActionType.CREATE
+    assert data["status"] == CollectionJobStatus.SUCCESSFUL.value
+    assert data["action_type"] == CollectionActionType.CREATE.value
@@
-    assert data["status"] == CollectionJobStatus.FAILED
+    assert data["status"] == CollectionJobStatus.FAILED.value

Also applies to: 54-58, 84-85

backend/app/crud/collection/collection.py (1)

22-31: Ensure updated_at is bumped on updates.

set updated_at before commit so soft-deletes and other updates record a fresh timestamp.

 def _update(self, collection: Collection):
+    collection.updated_at = now()
     self.session.add(collection)
     self.session.commit()
     self.session.refresh(collection)
backend/app/services/collections/delete_collection.py (1)

23-52: Return type mismatch in start_job.

Signature says -> str but returns a UUID. Return a string (or change the annotation), to avoid downstream type surprises.

-    return collection_job_id
+    return str(collection_job_id)
backend/app/services/collections/create_collection.py (1)

39-70: Return type mismatch in start_job.

Signature is -> str but returns a UUID. Align the return.

-    return collection_job_id
+    return str(collection_job_id)
🧹 Nitpick comments (10)
backend/app/models/collection.py (2)

6-6: Migrate deprecated root_validator to v2 model_validator(mode="before").

Removes deprecation and aligns with v2 semantics. Behavior remains the same.

Based on learnings

-from pydantic import HttpUrl, root_validator
+from pydantic import HttpUrl, model_validator
@@
-    @root_validator(pre=True)
-    def _assistant_fields_all_or_none(cls, values):
+    @model_validator(mode="before")
+    def _assistant_fields_all_or_none(cls, values: dict):
         def norm(x):
             return x.strip() if isinstance(x, str) and x.strip() else None

Also applies to: 89-105


61-69: Small copy/consistency fixes in field descriptions.

  • “compatable” → “compatible”.
  • Consider clarifying “[To Be Deprecated]” timeline or linking to the replacement API.
-            "must be compatable with the assistants API; see the "
+            "must be compatible with the Assistants API; see the "

Also applies to: 71-78, 80-87

backend/app/services/collections/helpers.py (1)

71-73: Add type hints to callbacks and align _backout docstring with generic resource.

Improves readability and linting; no behavior change.

-from typing import List
+from typing import List, Any, Optional
@@
-class CallbackHandler:
-    def __init__(self, collection_job):
-        self.collection_job = collection_job
+class CallbackHandler:
+    def __init__(self, collection_job: Any):
+        self.collection_job = collection_job
@@
-class WebHookCallback(CallbackHandler):
-    def __init__(self, url: HttpUrl, collection_job):
+class WebHookCallback(CallbackHandler):
+    def __init__(self, url: HttpUrl, collection_job: Any):
         super().__init__(collection_job)
         self.url = url
@@
-    def __call__(self, response: APIResponse):
+    def __call__(self, response: APIResponse) -> None:
         logger.info(
             f"[WebHookCallback.call] Posting callback | {{'url': '{self.url}'}}"
         )
-        post_callback(self.url, response)
+        ok = post_callback(self.url, response)
+        if not ok:
+            logger.warning("[WebHookCallback.call] Callback POST failed")
@@
-def _backout(crud, llm_service_id: str):
-    """Best-effort cleanup: attempt to delete the assistant by ID"""
+def _backout(crud, llm_service_id: str):
+    """Best-effort cleanup: attempt to delete the remote resource by ID"""

Also applies to: 92-119

backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)

71-75: Remove unused APIKey lookup.

api_key isn’t used; delete dead code to keep tests lean.

-        stmt = select(APIKey).where(
-            APIKey.project_id == project.id, APIKey.is_deleted == False
-        )
-        api_key = db.exec(stmt).first()
backend/app/api/routes/collections.py (1)

55-58: Make with_assistant detection robust to “provided vs empty”.

Use Pydantic v2’s model_fields_set to detect fields provided in the body and keep truthiness checks explicit.

-    # True iff both model and instructions were provided in the request body
-    with_assistant = bool(
-        getattr(request, "model", None) and getattr(request, "instructions", None)
-    )
+    # True iff both fields were present in the payload and non-empty
+    provided = getattr(request, "model_fields_set", set())
+    with_assistant = (
+        {"model", "instructions"}.issubset(provided)
+        and bool(getattr(request, "model"))
+        and bool(getattr(request, "instructions"))
+    )

If you intend “present regardless of emptiness,” drop the last two bool(...) checks.

backend/app/crud/collection/collection.py (1)

1-6: Duplicate import.

Drop the second import logging.

-import logging
+import logging
backend/app/services/collections/delete_collection.py (3)

83-96: Brittle service detection by display name.

String-equality on llm_service_name is fragile. At minimum, normalize with casefold() and accept a small set of aliases; longer term, persist a canonical enum/boolean (e.g., service_type or with_assistant) on Collection.

-        service = (collection.llm_service_name or "").strip().lower()
-        is_vector = service == "openai vector store"
+        service = (collection.llm_service_name or "").casefold().strip()
+        is_vector = service in {"openai vector store", "vector_store", "vector-store"}

Please confirm all producers set llm_service_name consistently; otherwise, consider adding a service_type enum.


85-95: Unnecessary getattr fallbacks; simplify to the modeled field.

Collection exposes llm_service_id. The vector_store_id/assistant_id attributes aren’t part of the model here, so the fallbacks add noise.

-        llm_service_id = (
-            (
-                getattr(collection, "vector_store_id", None)
-                or getattr(collection, "llm_service_id", None)
-            )
-            if is_vector
-            else (
-                getattr(collection, "assistant_id", None)
-                or getattr(collection, "llm_service_id", None)
-            )
-        )
+        llm_service_id = getattr(collection, "llm_service_id", None)

54-62: Unused parameter task_instance.

Prefix with _ (or remove) to satisfy linters.

-def execute_job(
+def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
     collection_id: str,
-    task_instance,
+    _task_instance,
 ) -> None:
backend/app/services/collections/create_collection.py (1)

72-80: Unused parameter task_instance.

Rename to _task_instance (or drop if not required by the worker runner).

-def execute_job(
+def execute_job(
     request: dict,
     project_id: int,
     organization_id: int,
     task_id: str,
     job_id: str,
-    with_assistant: bool,
-    task_instance,
+    with_assistant: bool,
+    _task_instance,
 ) -> None:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b7b640 and 642e39c.

📒 Files selected for processing (16)
  • backend/app/api/routes/collections.py (3 hunks)
  • backend/app/crud/collection/collection.py (3 hunks)
  • backend/app/models/collection.py (2 hunks)
  • backend/app/services/collections/create_collection.py (3 hunks)
  • backend/app/services/collections/delete_collection.py (3 hunks)
  • backend/app/services/collections/helpers.py (2 hunks)
  • backend/app/tests/api/routes/collections/test_collection_info.py (3 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_create.py (2 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (2 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (1 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (7 hunks)
  • backend/app/tests/services/collections/test_delete_collection.py (6 hunks)
  • backend/app/tests/services/collections/test_helpers.py (1 hunks)
  • backend/app/tests/utils/collection.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/collection/collection.py
  • backend/app/tests/utils/collection.py
  • backend/app/tests/api/routes/collections/test_collection_info.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py
  • backend/app/tests/api/routes/collections/test_create_collections.py
  • backend/app/api/routes/collections.py
  • backend/app/services/collections/delete_collection.py
  • backend/app/models/collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py
  • backend/app/tests/services/collections/test_helpers.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
  • backend/app/tests/services/collections/test_delete_collection.py
  • backend/app/tests/crud/collections/collection/test_crud_collection_create.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/collection/collection.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/collections.py
backend/app/services/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement business logic services under backend/app/services/

Files:

  • backend/app/services/collections/delete_collection.py
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/helpers.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
🧬 Code graph analysis (15)
backend/app/crud/collection/collection.py (4)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/tests/utils/collection.py (5)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/models/collection_job.py (3)
  • CollectionActionType (18-20)
  • CollectionJob (35-64)
  • CollectionJobStatus (11-15)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
  • get_collection (16-33)
backend/app/tests/api/routes/collections/test_collection_info.py (4)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-48)
  • get_collection_job (77-98)
backend/app/models/collection_job.py (2)
  • CollectionActionType (18-20)
  • CollectionJobStatus (11-15)
backend/app/tests/conftest.py (3)
  • db (28-45)
  • client (65-68)
  • user_api_key_header (90-92)
backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-48)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/models/collection.py (1)
  • CreationRequest (115-123)
backend/app/tests/conftest.py (3)
  • client (65-68)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/api/routes/collections.py (3)
backend/app/services/collections/create_collection.py (1)
  • start_job (39-69)
backend/app/services/collections/delete_collection.py (1)
  • start_job (23-51)
backend/app/utils.py (2)
  • APIResponse (29-53)
  • success_response (36-39)
backend/app/services/collections/delete_collection.py (7)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIAssistantCrud (192-246)
  • OpenAIVectorStoreCrud (101-189)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionJobUpdate (74-79)
backend/app/models/collection.py (1)
  • DeletionRequest (126-127)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/crud/collection/collection_job.py (3)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • update (58-76)
backend/app/crud/collection/collection.py (4)
  • read_one (68-92)
  • CollectionCrud (17-154)
  • delete (112-120)
  • delete_by_id (105-109)
backend/app/services/collections/helpers.py (8)
  • SilentCallback (81-88)
  • WebHookCallback (91-118)
  • fail (74-75)
  • fail (82-84)
  • fail (105-113)
  • success (77-78)
  • success (86-88)
  • success (115-118)
backend/app/services/collections/create_collection.py (10)
backend/app/models/collection.py (4)
  • CreationRequest (115-123)
  • extract_super_type (120-123)
  • AssistantOptions (57-105)
  • Collection (13-36)
backend/app/crud/collection/collection_job.py (4)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
  • update (58-76)
  • create (78-96)
backend/app/crud/collection/collection.py (3)
  • read_one (68-92)
  • create (42-66)
  • CollectionCrud (17-154)
backend/app/models/collection_job.py (2)
  • CollectionJobUpdate (74-79)
  • CollectionJobStatus (11-15)
backend/app/utils.py (1)
  • get_openai_client (175-205)
backend/app/core/cloud/storage.py (1)
  • get_cloud_storage (262-279)
backend/app/crud/document.py (1)
  • DocumentCrud (14-135)
backend/app/services/collections/helpers.py (10)
  • batch_documents (48-67)
  • SilentCallback (81-88)
  • WebHookCallback (91-118)
  • success (77-78)
  • success (86-88)
  • success (115-118)
  • _backout (121-129)
  • fail (74-75)
  • fail (82-84)
  • fail (105-113)
backend/app/crud/rag/open_ai.py (2)
  • OpenAIVectorStoreCrud (101-189)
  • OpenAIAssistantCrud (192-246)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/services/collections/helpers.py (4)
backend/app/utils.py (3)
  • APIResponse (29-53)
  • failure_response (42-53)
  • success_response (36-39)
backend/app/core/util.py (1)
  • post_callback (29-39)
backend/app/crud/collection/collection.py (1)
  • delete (112-120)
backend/app/tests/services/collections/test_helpers.py (2)
  • delete (145-146)
  • delete (158-159)
backend/app/tests/services/collections/test_create_collection.py (6)
backend/app/models/collection.py (1)
  • CreationRequest (115-123)
backend/app/api/routes/collections.py (1)
  • create_collection (41-79)
backend/app/services/collections/create_collection.py (2)
  • start_job (39-69)
  • execute_job (72-238)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/collection.py (1)
  • get_collection_job (77-98)
backend/app/tests/conftest.py (1)
  • db (28-45)
backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (3)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
  • get_collection (16-33)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-48)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/services/collections/test_helpers.py (2)
backend/app/services/collections/helpers.py (11)
  • extract_error_message (19-45)
  • batch_documents (48-67)
  • SilentCallback (81-88)
  • success (77-78)
  • success (86-88)
  • success (115-118)
  • fail (74-75)
  • fail (82-84)
  • fail (105-113)
  • WebHookCallback (91-118)
  • _backout (121-129)
backend/app/utils.py (1)
  • APIResponse (29-53)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (5)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/collection.py (1)
  • get_collection (25-48)
backend/app/tests/services/collections/test_delete_collection.py (6)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/tests/utils/collection.py (2)
  • get_collection (25-48)
  • get_collection_job (77-98)
backend/app/services/collections/delete_collection.py (2)
  • start_job (23-51)
  • execute_job (54-194)
backend/app/crud/collection/collection.py (5)
  • _ (123-130)
  • _ (133-154)
  • delete (112-120)
  • delete_by_id (105-109)
  • read_one (68-92)
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (4)
backend/app/crud/collection/collection.py (1)
  • CollectionCrud (17-154)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/tests/utils/document.py (1)
  • project (65-66)
🪛 Ruff (0.14.0)
backend/app/tests/utils/collection.py

30-30: Unused function argument: model

(ARG001)


56-56: Unused function argument: model

(ARG001)

backend/app/tests/api/routes/collections/test_create_collections.py

5-5: Redefinition of unused patch from line 2

Remove definition: patch

(F811)

backend/app/services/collections/create_collection.py

79-79: Unused function argument: task_instance

(ARG001)

backend/app/tests/services/collections/test_create_collection.py

220-220: Local variable mock_client is assigned to but never used

Remove assignment to unused variable mock_client

(F841)

backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py

18-18: Local variable client is assigned to but never used

Remove assignment to unused variable client

(F841)

backend/app/tests/crud/collections/collection/test_crud_collection_delete.py

16-16: Redefinition of unused get_collection from line 13

(F811)

🔇 Additional comments (19)
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (1)

18-25: LGTM — direct Collection construction matches the model and CRUD flow.

backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)

14-29: The review comment is technically incorrect about the validation behavior.

The Collection model does not define assistant_id or vector_store_id fields, but this won't cause validation errors. By default, SQLModel/Pydantic v2 uses extra='ignore' behavior, so extra fields are silently ignored rather than forbidden. The Collection model has no custom model_config or Config class that would change this.

While passing undefined parameters is poor practice and suggests a code organization issue, it will not error under Pydantic v2 as claimed. The actual concern is code clarity: these extra parameters are silently dropped, making the helper's intent unclear.

The unused client = OpenAI(...) variable on line 19 is a valid minor concern but unrelated to the Collection model issue.

Likely an incorrect or invalid review comment.

backend/app/api/routes/collections.py (1)

69-79: Nice UX: metadata explains vector-store-only path.

Clear guidance in responses when with_assistant is False. LGTM.

backend/app/tests/services/collections/test_delete_collection.py (3)

16-21: start_job test aligns with new scheduling contract.

Asserts on kwargs (function_path, ids, payload via model_dump(mode="json")) look correct.

Also applies to: 61-73


98-137: execute_job success: verifies remote delete and local soft-delete.

Good use of patches; asserts on delete_by_id, task/status updates, and assistant id match are accurate.


160-205: execute_job failure: correct FAILED status and no local delete.

Mocked exception path and assertions look right, including error_message contents.

backend/app/tests/services/collections/test_helpers.py (5)

10-35: extract_error_message tests cover JSON, dict, fallback, truncation.

Concise and robust. LGTM.


41-66: batch_documents tests validate chunking and outputs.

Even/ragged/empty paths well covered.


86-91: SilentCallback no-ops confirmed.

Simple and sufficient.


93-135: WebHookCallback success/failure flows verified with metadata.

Good assertions on APIResponse shape and job id propagation.


140-161: _backout cleanup behavior validated.

Delete called once; OpenAIError swallowed as intended.

backend/app/crud/collection/collection.py (1)

140-143: Good: filter out already-deleted collections when deleting by Document.

This avoids double-deletes and keeps semantics consistent.

backend/app/tests/api/routes/collections/test_create_collections.py (4)

12-14: Helper for metadata is tidy.

Nicely handles both metadata and meta.


16-66: End-to-end assertions look solid.

Good coverage: enums, ids, trace, and with_assistant=True propagation to start_job.


68-110: Vector-only path checks are appropriate.

Validates metadata note and with_assistant=False flow.


112-137: Validation error test is precise.

Matches the model’s “both or none” requirement.

backend/app/services/collections/create_collection.py (3)

125-153: Resource creation flow looks correct.

Vector store created/updated; assistant created conditionally; IDs wired into Collection. Logs are informative.


154-195: Metrics/logging block is useful.

File types, sizes, and timing aid observability.


220-238: Backout and failure marking are robust.

Best-effort cleanup plus job status update and callback are handled.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/api/docs/collections/job_info.md (1)

9-9: Improve clarity of grammar on collection deletion response.

Line 9 uses awkward phrasing: "nothing will be returned as collection" is unclear.

Apply this diff to improve clarity:

-* If the job of delete collection was successful, we will get the status as successful and nothing will be returned as collection.
+* If the collection deletion job was successful, the status will be marked as successful and no collection object will be returned.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 642e39c and aa23f3d.

📒 Files selected for processing (3)
  • backend/app/api/docs/collections/create.md (2 hunks)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/api/docs/collections/job_info.md (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • backend/app/api/docs/collections/create.md
🧰 Additional context used
🪛 LanguageTool
backend/app/api/docs/collections/delete.md

[grammar] ~13-~13: Ensure spelling is correct
Context: ...deleted and marked as deleted. You will recieve a message in the callback_url about t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/crud/collection/collection.py (3)

1-5: Remove duplicate logging import.

The logging module is imported on both lines 1 and 5.

 import logging
 import functools as ft
 from uuid import UUID
 from typing import Optional
-import logging

22-30: Add return type hint to _update method.

Per the coding guidelines for Python 3.11+ projects, type hints should be used consistently.

As per coding guidelines.

-    def _update(self, collection: Collection):
+    def _update(self, collection: Collection) -> Collection:
         self.session.add(collection)
         self.session.commit()
         self.session.refresh(collection)

94-103: Add return type hint to read_all method.

Per the coding guidelines for Python 3.11+ projects, type hints should be used consistently.

As per coding guidelines.

-    def read_all(self):
+    def read_all(self) -> list[Collection]:
         statement = select(Collection).where(
             and_(
                 Collection.project_id == self.project_id,
♻️ Duplicate comments (2)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)

15-34: Org/project mismatch: helper ignores passed project_id when fetching project.

The helper calls get_project(db) which retrieves a random active project, but then uses the passed project_id parameter directly when constructing the Collection. This causes organization_id and project_id to reference mismatched entities when the caller passes a specific project_id.

Accept an explicit project parameter instead:

 def get_collection_for_delete(
-    db: Session, client=None, project_id: int = None
+    db: Session, project, client=None
 ) -> Collection:
-    project = get_project(db)
     if client is None:
         client = OpenAI(api_key="test_api_key")

     vector_store = client.vector_stores.create()
     assistant = client.beta.assistants.create(
         model="gpt-4o",
         tools=[{"type": "file_search"}],
         tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
     )

     return Collection(
         organization_id=project.organization_id,
-        project_id=project_id,
+        project_id=project.id,
         llm_service_id=assistant.id,
         llm_service_name="gpt-4o",
     )

Then update call sites:

-collection = get_collection_for_delete(db, client, project_id=project.id)
+collection = get_collection_for_delete(db, project, client)
backend/app/crud/collection/collection.py (1)

42-66: PK collision risk with soft-deleted rows remains unresolved.

The past review identified that read_one() filters out soft-deleted collections (line 73: Collection.deleted_at.is_(None)). If a soft-deleted collection with the same id exists, read_one() raises 404, and the subsequent session.add() at line 59 attempts an insert with a duplicate primary key, causing an IntegrityError.

The suggested fix from the previous review was not applied. Additionally, the existing = None initialization on line 45 is redundant.

Apply the corrected diff from the past review:

 def create(
     self, collection: Collection, documents: Optional[list[Document]] = None
 ) -> Collection:
-    existing = None
-    try:
-        existing = self.read_one(collection.id)
-    except HTTPException as e:
-        if e.status_code != 404:
-            raise
-
-    if existing is not None:
+    # First, see if any row exists for this id (deleted or not)
+    any_row = self.session.exec(
+        select(Collection).where(Collection.id == collection.id)
+    ).one_or_none()
+    if any_row is not None:
+        if any_row.deleted_at is None:
+            logger.warning(
+                "[CollectionCrud.create] Collection already present | {'collection_id': '%s'}",
+                collection.id,
+            )
+            return any_row
+        # id is already used by a soft-deleted row
+        raise HTTPException(status_code=409, detail="Collection id already exists (soft-deleted)")
-        logger.warning(
-            "[CollectionCrud.create] Collection already present | "
-            f"{{'collection_id': '{collection.id}'}}"
-        )
-        return existing
 
     self.session.add(collection)
     self.session.commit()
     self.session.refresh(collection)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa23f3d and 501b46a.

📒 Files selected for processing (7)
  • backend/app/api/docs/collections/delete.md (1 hunks)
  • backend/app/crud/collection/collection.py (3 hunks)
  • backend/app/models/collection.py (3 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (1 hunks)
  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (4 hunks)
  • backend/app/tests/services/collections/test_create_collection.py (7 hunks)
  • backend/app/tests/utils/collection.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/api/docs/collections/delete.md
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
  • backend/app/tests/services/collections/test_create_collection.py
  • backend/app/models/collection.py
  • backend/app/crud/collection/collection.py
  • backend/app/tests/api/routes/collections/test_create_collections.py
  • backend/app/tests/utils/collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/collection/collection.py
🧬 Code graph analysis (5)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (5)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/rag/open_ai.py (1)
  • OpenAIAssistantCrud (192-246)
backend/app/tests/utils/utils.py (1)
  • get_project (54-73)
backend/app/tests/utils/document.py (2)
  • DocumentStore (53-80)
  • project (65-66)
backend/app/tests/services/collections/test_create_collection.py (4)
backend/app/services/collections/create_collection.py (2)
  • start_job (39-69)
  • execute_job (72-238)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (90-117)
backend/app/tests/utils/collection.py (1)
  • get_collection_job (74-95)
backend/app/crud/collection/collection_job.py (4)
  • create (78-96)
  • update (58-76)
  • CollectionJobCrud (19-96)
  • read_one (24-45)
backend/app/crud/collection/collection.py (4)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/crud/document_collection.py (1)
  • DocumentCollectionCrud (8-48)
backend/app/core/util.py (1)
  • now (15-16)
backend/app/models/document_collection.py (1)
  • DocumentCollection (9-23)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/models/collection_job.py (2)
  • CollectionJobStatus (11-15)
  • CollectionActionType (18-20)
backend/app/tests/conftest.py (3)
  • client (65-68)
  • user_api_key_header (90-92)
  • user_api_key (102-104)
backend/app/tests/utils/collection.py (5)
backend/app/models/collection.py (1)
  • Collection (13-36)
backend/app/models/collection_job.py (3)
  • CollectionActionType (18-20)
  • CollectionJob (35-64)
  • CollectionJobStatus (11-15)
backend/app/crud/collection/collection.py (2)
  • CollectionCrud (17-154)
  • create (42-66)
backend/app/crud/collection/collection_job.py (2)
  • CollectionJobCrud (19-96)
  • create (78-96)
backend/app/tests/conftest.py (1)
  • db (28-45)
🔇 Additional comments (8)
backend/app/tests/api/routes/collections/test_create_collections.py (1)

1-136: LGTM!

The test suite comprehensively covers the new with_assistant flow, including:

  • Assistant-enabled path with proper callback verification
  • Vector-only path with metadata validation
  • Validation error for incomplete assistant options

The assertions correctly verify job status, metadata content, and all callback arguments.

backend/app/tests/services/collections/test_create_collection.py (1)

1-255: LGTM!

The service tests correctly verify the updated start_job and execute_job flows with with_assistant=True. The new failure test at lines 194-255 properly validates error handling and vector store cleanup on assistant creation failure.

backend/app/models/collection.py (1)

120-123: LGTM!

Correctly updated to use model_fields (Pydantic v2 API) instead of the deprecated __fields__.

backend/app/tests/utils/collection.py (3)

25-47: LGTM!

The get_collection helper correctly creates an assistant-backed Collection with all required fields matching the schema. The model parameter is properly used for llm_service_name.


50-71: LGTM!

The get_vector_store_collection helper correctly creates a vector-store-only Collection without assistant fields. All fields match the Collection schema.


74-95: LGTM!

The get_collection_job helper provides a clean, reusable way to seed CollectionJob test data with configurable fields. The enum value extraction handles both enum instances and string values properly.

backend/app/crud/collection/collection.py (2)

105-109: LGTM: Clean soft-delete implementation.

The new delete_by_id method correctly implements soft-delete by reading the collection (which verifies it exists and is not already deleted), setting deleted_at, and persisting via _update.


140-143: Good addition: filter excludes soft-deleted collections.

The updated where clause correctly filters out soft-deleted collections when processing document deletions, preventing redundant deletion attempts.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/models/collection.py (2)

61-69: Surface deprecations in OpenAPI schema (don’t just say “To Be Deprecated”).

Mark these fields as deprecated in the generated schema so client SDKs and docs flag them. Add json_schema_extra={"deprecated": True} to model, instructions, and temperature.

Apply:

-    model: Optional[str] = Field(
-        default=None,
+    model: Optional[str] = Field(
+        default=None,
         description=(
             "**[To Be Deprecated]**  "
             "OpenAI model to attach to this assistant. The model "
             "must be compatable with the assistants API; see the "
             "OpenAI [model documentation](https://platform.openai.com/docs/models/compare) for more."
-        ),
+        ),
+        json_schema_extra={"deprecated": True},
     )
@@
-    instructions: Optional[str] = Field(
-        default=None,
+    instructions: Optional[str] = Field(
+        default=None,
         description=(
             "**[To Be Deprecated]**  "
             "Assistant instruction. Sometimes referred to as the "
             '"system" prompt.'
-        ),
+        ),
+        json_schema_extra={"deprecated": True},
     )
@@
-    temperature: float = Field(
-        default=1e-6,
+    temperature: float = Field(
+        default=1e-6,
         description=(
             "**[To Be Deprecated]**  "
             "Model temperature. The default is slightly "
             "greater-than zero because it is [unknown how OpenAI "
             "handles zero](https://community.openai.com/t/clarifications-on-setting-temperature-0/886447/5)."
-        ),
+        ),
+        json_schema_extra={"deprecated": True},
     )

Based on learnings.

Also applies to: 71-78, 80-87


122-122: v2-compatible field access is correct; tighten typing and simplify iteration.

Switch the parameter type to a model class and add a return type; iterating dict keys explicitly isn’t needed.

Apply:

-from typing import Any, Optional
+from typing import Any, Optional, Iterator
@@
-    def extract_super_type(self, cls: "CreationRequest"):
-        for field_name in cls.model_fields.keys():
+    def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]:
+        for field_name in cls.model_fields:
             field_value = getattr(self, field_name)
             yield (field_name, field_value)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 501b46a and 1d19b51.

📒 Files selected for processing (1)
  • backend/app/models/collection.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/collection.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/collection.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/collection.py (1)

6-6: Good move to Pydantic v2 API.

Importing model_validator is correct for v2; no issues.

@nishika26 nishika26 linked an issue Oct 20, 2025 that may be closed by this pull request
@nishika26 nishika26 self-assigned this Oct 20, 2025
@nishika26 nishika26 added enhancement New feature or request ready-for-review labels Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Collection : enhancement of the module

1 participant