-
Couldn't load subscription status.
- Fork 5
Collection : with assistant flag and additional enhancements #412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughRefactors collection create/delete flows to remove payload-centric handling, add a derived Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API
participant Scheduler
participant Worker
participant LLM_Svc
participant DB
Client->>API: POST /collections/create (request JSON)
rect rgb(245,250,255)
Note over API: compute with_assistant = (model && instructions)
end
API->>Scheduler: start_job(with_assistant, request_json, project_id, org_id)
Scheduler->>DB: insert CollectionJob(status=PENDING)
Scheduler->>Client: 202 Accepted (+ job_id, optional metadata when vector-only)
rect rgb(240,255,240)
Note over Worker: async execute_job(with_assistant, job_id)
Worker->>DB: load job -> update PROCESSING
alt with_assistant == true
Worker->>LLM_Svc: create vector store
Worker->>LLM_Svc: create assistant (model+instructions)
Worker->>DB: create Collection linked to assistant
else with_assistant == false
Worker->>LLM_Svc: create vector store only
Worker->>DB: create Collection (vector-store)
end
Worker->>DB: store documents, update job -> SUCCESSFUL
Worker->>API: invoke callback with collection_job metadata
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (2)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
backend/app/models/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
backend/app/models/collection.py (1)
120-124: Pydantic v2 break: usemodel_fieldsinstead of__fields__inCreationRequest.extract_super_type.This will fail on v2 (no
__fields__). It’s used byexecute_job, so collection creation can break at runtime.Apply this robust v1/v2‑compatible diff:
- def extract_super_type(self, cls: "CreationRequest"): - for field_name in cls.__fields__.keys(): - field_value = getattr(self, field_name) - yield (field_name, field_value) + def extract_super_type(self, cls: type[SQLModel]): + # Pydantic v2: `model_fields`; v1 fallback: `__fields__` + fields = getattr(cls, "model_fields", None) or getattr(cls, "__fields__", {}) + for field_name in fields.keys(): + yield (field_name, getattr(self, field_name))backend/app/services/collections/helpers.py (1)
48-67: Validatebatch_sizeto avoid infinite loops and misuse.Guard against zero/negative values.
def batch_documents( document_crud: DocumentCrud, documents: List[UUID], batch_size: int ): @@ - docs_batches = [] + if batch_size <= 0: + raise ValueError("batch_size must be a positive integer") + docs_batches = []backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (1)
3-4: Remove unused OpenAI client and import.Cleans up dead code and silences F841.
-from openai import OpenAI from openai_responses import OpenAIMock @@ - client = OpenAI(api_key="sk-test-key") collection = get_collection(db, project=project)Also applies to: 18-19
backend/app/tests/api/routes/collections/test_collection_info.py (1)
26-33: Fix Enum-vs-string comparisons in assertions.API returns strings; tests compare against Enum instances. Compare to
.value(or cast) to avoid false failures.- assert data["status"] == CollectionJobStatus.PENDING + assert data["status"] == CollectionJobStatus.PENDING.value @@ - assert data["status"] == CollectionJobStatus.SUCCESSFUL - assert data["action_type"] == CollectionActionType.CREATE + assert data["status"] == CollectionJobStatus.SUCCESSFUL.value + assert data["action_type"] == CollectionActionType.CREATE.value @@ - assert data["status"] == CollectionJobStatus.FAILED + assert data["status"] == CollectionJobStatus.FAILED.valueAlso applies to: 54-58, 84-85
backend/app/crud/collection/collection.py (1)
22-31: Ensure updated_at is bumped on updates.set updated_at before commit so soft-deletes and other updates record a fresh timestamp.
def _update(self, collection: Collection): + collection.updated_at = now() self.session.add(collection) self.session.commit() self.session.refresh(collection)backend/app/services/collections/delete_collection.py (1)
23-52: Return type mismatch in start_job.Signature says
-> strbut returns a UUID. Return a string (or change the annotation), to avoid downstream type surprises.- return collection_job_id + return str(collection_job_id)backend/app/services/collections/create_collection.py (1)
39-70: Return type mismatch in start_job.Signature is
-> strbut returns a UUID. Align the return.- return collection_job_id + return str(collection_job_id)
🧹 Nitpick comments (10)
backend/app/models/collection.py (2)
6-6: Migrate deprecatedroot_validatorto v2model_validator(mode="before").Removes deprecation and aligns with v2 semantics. Behavior remains the same.
Based on learnings
-from pydantic import HttpUrl, root_validator +from pydantic import HttpUrl, model_validator @@ - @root_validator(pre=True) - def _assistant_fields_all_or_none(cls, values): + @model_validator(mode="before") + def _assistant_fields_all_or_none(cls, values: dict): def norm(x): return x.strip() if isinstance(x, str) and x.strip() else NoneAlso applies to: 89-105
61-69: Small copy/consistency fixes in field descriptions.
- “compatable” → “compatible”.
- Consider clarifying “[To Be Deprecated]” timeline or linking to the replacement API.
- "must be compatable with the assistants API; see the " + "must be compatible with the Assistants API; see the "Also applies to: 71-78, 80-87
backend/app/services/collections/helpers.py (1)
71-73: Add type hints to callbacks and align_backoutdocstring with generic resource.Improves readability and linting; no behavior change.
-from typing import List +from typing import List, Any, Optional @@ -class CallbackHandler: - def __init__(self, collection_job): - self.collection_job = collection_job +class CallbackHandler: + def __init__(self, collection_job: Any): + self.collection_job = collection_job @@ -class WebHookCallback(CallbackHandler): - def __init__(self, url: HttpUrl, collection_job): +class WebHookCallback(CallbackHandler): + def __init__(self, url: HttpUrl, collection_job: Any): super().__init__(collection_job) self.url = url @@ - def __call__(self, response: APIResponse): + def __call__(self, response: APIResponse) -> None: logger.info( f"[WebHookCallback.call] Posting callback | {{'url': '{self.url}'}}" ) - post_callback(self.url, response) + ok = post_callback(self.url, response) + if not ok: + logger.warning("[WebHookCallback.call] Callback POST failed") @@ -def _backout(crud, llm_service_id: str): - """Best-effort cleanup: attempt to delete the assistant by ID""" +def _backout(crud, llm_service_id: str): + """Best-effort cleanup: attempt to delete the remote resource by ID"""Also applies to: 92-119
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
71-75: Remove unused APIKey lookup.
api_keyisn’t used; delete dead code to keep tests lean.- stmt = select(APIKey).where( - APIKey.project_id == project.id, APIKey.is_deleted == False - ) - api_key = db.exec(stmt).first()backend/app/api/routes/collections.py (1)
55-58: Make with_assistant detection robust to “provided vs empty”.Use Pydantic v2’s
model_fields_setto detect fields provided in the body and keep truthiness checks explicit.- # True iff both model and instructions were provided in the request body - with_assistant = bool( - getattr(request, "model", None) and getattr(request, "instructions", None) - ) + # True iff both fields were present in the payload and non-empty + provided = getattr(request, "model_fields_set", set()) + with_assistant = ( + {"model", "instructions"}.issubset(provided) + and bool(getattr(request, "model")) + and bool(getattr(request, "instructions")) + )If you intend “present regardless of emptiness,” drop the last two
bool(...)checks.backend/app/crud/collection/collection.py (1)
1-6: Duplicate import.Drop the second
import logging.-import logging +import loggingbackend/app/services/collections/delete_collection.py (3)
83-96: Brittle service detection by display name.String-equality on
llm_service_nameis fragile. At minimum, normalize withcasefold()and accept a small set of aliases; longer term, persist a canonical enum/boolean (e.g.,service_typeorwith_assistant) on Collection.- service = (collection.llm_service_name or "").strip().lower() - is_vector = service == "openai vector store" + service = (collection.llm_service_name or "").casefold().strip() + is_vector = service in {"openai vector store", "vector_store", "vector-store"}Please confirm all producers set
llm_service_nameconsistently; otherwise, consider adding aservice_typeenum.
85-95: Unnecessary getattr fallbacks; simplify to the modeled field.
Collectionexposesllm_service_id. Thevector_store_id/assistant_idattributes aren’t part of the model here, so the fallbacks add noise.- llm_service_id = ( - ( - getattr(collection, "vector_store_id", None) - or getattr(collection, "llm_service_id", None) - ) - if is_vector - else ( - getattr(collection, "assistant_id", None) - or getattr(collection, "llm_service_id", None) - ) - ) + llm_service_id = getattr(collection, "llm_service_id", None)
54-62: Unused parametertask_instance.Prefix with
_(or remove) to satisfy linters.-def execute_job( +def execute_job( request: dict, project_id: int, organization_id: int, task_id: str, job_id: str, collection_id: str, - task_instance, + _task_instance, ) -> None:backend/app/services/collections/create_collection.py (1)
72-80: Unused parametertask_instance.Rename to
_task_instance(or drop if not required by the worker runner).-def execute_job( +def execute_job( request: dict, project_id: int, organization_id: int, task_id: str, job_id: str, - with_assistant: bool, - task_instance, + with_assistant: bool, + _task_instance, ) -> None:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
backend/app/api/routes/collections.py(3 hunks)backend/app/crud/collection/collection.py(3 hunks)backend/app/models/collection.py(2 hunks)backend/app/services/collections/create_collection.py(3 hunks)backend/app/services/collections/delete_collection.py(3 hunks)backend/app/services/collections/helpers.py(2 hunks)backend/app/tests/api/routes/collections/test_collection_info.py(3 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(1 hunks)backend/app/tests/crud/collections/collection/test_crud_collection_create.py(2 hunks)backend/app/tests/crud/collections/collection/test_crud_collection_delete.py(1 hunks)backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py(2 hunks)backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py(1 hunks)backend/app/tests/services/collections/test_create_collection.py(7 hunks)backend/app/tests/services/collections/test_delete_collection.py(6 hunks)backend/app/tests/services/collections/test_helpers.py(1 hunks)backend/app/tests/utils/collection.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/crud/collection/collection.pybackend/app/tests/utils/collection.pybackend/app/tests/api/routes/collections/test_collection_info.pybackend/app/tests/crud/collections/collection/test_crud_collection_read_all.pybackend/app/tests/api/routes/collections/test_create_collections.pybackend/app/api/routes/collections.pybackend/app/services/collections/delete_collection.pybackend/app/models/collection.pybackend/app/services/collections/create_collection.pybackend/app/services/collections/helpers.pybackend/app/tests/services/collections/test_create_collection.pybackend/app/tests/crud/collections/collection/test_crud_collection_read_one.pybackend/app/tests/services/collections/test_helpers.pybackend/app/tests/crud/collections/collection/test_crud_collection_delete.pybackend/app/tests/services/collections/test_delete_collection.pybackend/app/tests/crud/collections/collection/test_crud_collection_create.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/collection/collection.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/collections.py
backend/app/services/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement business logic services under backend/app/services/
Files:
backend/app/services/collections/delete_collection.pybackend/app/services/collections/create_collection.pybackend/app/services/collections/helpers.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/collection.py
🧬 Code graph analysis (15)
backend/app/crud/collection/collection.py (4)
backend/app/models/collection.py (1)
Collection(13-36)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/core/util.py (1)
now(15-16)backend/app/models/document_collection.py (1)
DocumentCollection(9-23)
backend/app/tests/utils/collection.py (5)
backend/app/models/collection.py (1)
Collection(13-36)backend/app/models/collection_job.py (3)
CollectionActionType(18-20)CollectionJob(35-64)CollectionJobStatus(11-15)backend/app/crud/collection/collection.py (2)
CollectionCrud(17-154)create(42-66)backend/app/crud/collection/collection_job.py (2)
CollectionJobCrud(19-96)create(78-96)backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
get_collection(16-33)
backend/app/tests/api/routes/collections/test_collection_info.py (4)
backend/app/tests/utils/utils.py (1)
get_project(54-73)backend/app/tests/utils/collection.py (2)
get_collection(25-48)get_collection_job(77-98)backend/app/models/collection_job.py (2)
CollectionActionType(18-20)CollectionJobStatus(11-15)backend/app/tests/conftest.py (3)
db(28-45)client(65-68)user_api_key_header(90-92)
backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)
backend/app/tests/utils/collection.py (1)
get_collection(25-48)
backend/app/tests/api/routes/collections/test_create_collections.py (3)
backend/app/models/collection_job.py (2)
CollectionJobStatus(11-15)CollectionActionType(18-20)backend/app/models/collection.py (1)
CreationRequest(115-123)backend/app/tests/conftest.py (3)
client(65-68)user_api_key_header(90-92)user_api_key(102-104)
backend/app/api/routes/collections.py (3)
backend/app/services/collections/create_collection.py (1)
start_job(39-69)backend/app/services/collections/delete_collection.py (1)
start_job(23-51)backend/app/utils.py (2)
APIResponse(29-53)success_response(36-39)
backend/app/services/collections/delete_collection.py (7)
backend/app/crud/rag/open_ai.py (2)
OpenAIAssistantCrud(192-246)OpenAIVectorStoreCrud(101-189)backend/app/models/collection_job.py (2)
CollectionJobStatus(11-15)CollectionJobUpdate(74-79)backend/app/models/collection.py (1)
DeletionRequest(126-127)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/crud/collection/collection_job.py (3)
CollectionJobCrud(19-96)read_one(24-45)update(58-76)backend/app/crud/collection/collection.py (4)
read_one(68-92)CollectionCrud(17-154)delete(112-120)delete_by_id(105-109)backend/app/services/collections/helpers.py (8)
SilentCallback(81-88)WebHookCallback(91-118)fail(74-75)fail(82-84)fail(105-113)success(77-78)success(86-88)success(115-118)
backend/app/services/collections/create_collection.py (10)
backend/app/models/collection.py (4)
CreationRequest(115-123)extract_super_type(120-123)AssistantOptions(57-105)Collection(13-36)backend/app/crud/collection/collection_job.py (4)
CollectionJobCrud(19-96)read_one(24-45)update(58-76)create(78-96)backend/app/crud/collection/collection.py (3)
read_one(68-92)create(42-66)CollectionCrud(17-154)backend/app/models/collection_job.py (2)
CollectionJobUpdate(74-79)CollectionJobStatus(11-15)backend/app/utils.py (1)
get_openai_client(175-205)backend/app/core/cloud/storage.py (1)
get_cloud_storage(262-279)backend/app/crud/document.py (1)
DocumentCrud(14-135)backend/app/services/collections/helpers.py (10)
batch_documents(48-67)SilentCallback(81-88)WebHookCallback(91-118)success(77-78)success(86-88)success(115-118)_backout(121-129)fail(74-75)fail(82-84)fail(105-113)backend/app/crud/rag/open_ai.py (2)
OpenAIVectorStoreCrud(101-189)OpenAIAssistantCrud(192-246)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)
backend/app/services/collections/helpers.py (4)
backend/app/utils.py (3)
APIResponse(29-53)failure_response(42-53)success_response(36-39)backend/app/core/util.py (1)
post_callback(29-39)backend/app/crud/collection/collection.py (1)
delete(112-120)backend/app/tests/services/collections/test_helpers.py (2)
delete(145-146)delete(158-159)
backend/app/tests/services/collections/test_create_collection.py (6)
backend/app/models/collection.py (1)
CreationRequest(115-123)backend/app/api/routes/collections.py (1)
create_collection(41-79)backend/app/services/collections/create_collection.py (2)
start_job(39-69)execute_job(72-238)backend/app/tests/utils/openai.py (1)
get_mock_openai_client_with_vector_store(90-117)backend/app/tests/utils/collection.py (1)
get_collection_job(77-98)backend/app/tests/conftest.py (1)
db(28-45)
backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py (3)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
get_collection(16-33)backend/app/tests/utils/collection.py (1)
get_collection(25-48)backend/app/tests/utils/utils.py (1)
get_project(54-73)
backend/app/tests/services/collections/test_helpers.py (2)
backend/app/services/collections/helpers.py (11)
extract_error_message(19-45)batch_documents(48-67)SilentCallback(81-88)success(77-78)success(86-88)success(115-118)fail(74-75)fail(82-84)fail(105-113)WebHookCallback(91-118)_backout(121-129)backend/app/utils.py (1)
APIResponse(29-53)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (5)
backend/app/crud/collection/collection.py (2)
CollectionCrud(17-154)create(42-66)backend/app/models/collection.py (1)
Collection(13-36)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/tests/utils/utils.py (1)
get_project(54-73)backend/app/tests/utils/collection.py (1)
get_collection(25-48)
backend/app/tests/services/collections/test_delete_collection.py (6)
backend/app/tests/utils/utils.py (1)
get_project(54-73)backend/app/crud/collection/collection_job.py (2)
CollectionJobCrud(19-96)read_one(24-45)backend/app/models/collection_job.py (2)
CollectionJobStatus(11-15)CollectionActionType(18-20)backend/app/tests/utils/collection.py (2)
get_collection(25-48)get_collection_job(77-98)backend/app/services/collections/delete_collection.py (2)
start_job(23-51)execute_job(54-194)backend/app/crud/collection/collection.py (5)
_(123-130)_(133-154)delete(112-120)delete_by_id(105-109)read_one(68-92)
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (4)
backend/app/crud/collection/collection.py (1)
CollectionCrud(17-154)backend/app/models/document_collection.py (1)
DocumentCollection(9-23)backend/app/models/collection.py (1)
Collection(13-36)backend/app/tests/utils/document.py (1)
project(65-66)
🪛 Ruff (0.14.0)
backend/app/tests/utils/collection.py
30-30: Unused function argument: model
(ARG001)
56-56: Unused function argument: model
(ARG001)
backend/app/tests/api/routes/collections/test_create_collections.py
5-5: Redefinition of unused patch from line 2
Remove definition: patch
(F811)
backend/app/services/collections/create_collection.py
79-79: Unused function argument: task_instance
(ARG001)
backend/app/tests/services/collections/test_create_collection.py
220-220: Local variable mock_client is assigned to but never used
Remove assignment to unused variable mock_client
(F841)
backend/app/tests/crud/collections/collection/test_crud_collection_read_one.py
18-18: Local variable client is assigned to but never used
Remove assignment to unused variable client
(F841)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
16-16: Redefinition of unused get_collection from line 13
(F811)
🔇 Additional comments (19)
backend/app/tests/crud/collections/collection/test_crud_collection_create.py (1)
18-25: LGTM — directCollectionconstruction matches the model and CRUD flow.backend/app/tests/crud/collections/collection/test_crud_collection_read_all.py (1)
14-29: The review comment is technically incorrect about the validation behavior.The Collection model does not define
assistant_idorvector_store_idfields, but this won't cause validation errors. By default, SQLModel/Pydantic v2 usesextra='ignore'behavior, so extra fields are silently ignored rather than forbidden. The Collection model has no custommodel_configorConfigclass that would change this.While passing undefined parameters is poor practice and suggests a code organization issue, it will not error under Pydantic v2 as claimed. The actual concern is code clarity: these extra parameters are silently dropped, making the helper's intent unclear.
The unused
client = OpenAI(...)variable on line 19 is a valid minor concern but unrelated to the Collection model issue.Likely an incorrect or invalid review comment.
backend/app/api/routes/collections.py (1)
69-79: Nice UX: metadata explains vector-store-only path.Clear guidance in responses when
with_assistantis False. LGTM.backend/app/tests/services/collections/test_delete_collection.py (3)
16-21: start_job test aligns with new scheduling contract.Asserts on kwargs (function_path, ids, payload via
model_dump(mode="json")) look correct.Also applies to: 61-73
98-137: execute_job success: verifies remote delete and local soft-delete.Good use of patches; asserts on
delete_by_id, task/status updates, and assistant id match are accurate.
160-205: execute_job failure: correct FAILED status and no local delete.Mocked exception path and assertions look right, including error_message contents.
backend/app/tests/services/collections/test_helpers.py (5)
10-35: extract_error_message tests cover JSON, dict, fallback, truncation.Concise and robust. LGTM.
41-66: batch_documents tests validate chunking and outputs.Even/ragged/empty paths well covered.
86-91: SilentCallback no-ops confirmed.Simple and sufficient.
93-135: WebHookCallback success/failure flows verified with metadata.Good assertions on APIResponse shape and job id propagation.
140-161: _backout cleanup behavior validated.Delete called once; OpenAIError swallowed as intended.
backend/app/crud/collection/collection.py (1)
140-143: Good: filter out already-deleted collections when deleting by Document.This avoids double-deletes and keeps semantics consistent.
backend/app/tests/api/routes/collections/test_create_collections.py (4)
12-14: Helper for metadata is tidy.Nicely handles both
metadataandmeta.
16-66: End-to-end assertions look solid.Good coverage: enums, ids, trace, and
with_assistant=Truepropagation tostart_job.
68-110: Vector-only path checks are appropriate.Validates metadata note and
with_assistant=Falseflow.
112-137: Validation error test is precise.Matches the model’s “both or none” requirement.
backend/app/services/collections/create_collection.py (3)
125-153: Resource creation flow looks correct.Vector store created/updated; assistant created conditionally; IDs wired into Collection. Logs are informative.
154-195: Metrics/logging block is useful.File types, sizes, and timing aid observability.
220-238: Backout and failure marking are robust.Best-effort cleanup plus job status update and callback are handled.
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py
Outdated
Show resolved
Hide resolved
backend/app/tests/services/collections/test_create_collection.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/api/docs/collections/job_info.md (1)
9-9: Improve clarity of grammar on collection deletion response.Line 9 uses awkward phrasing: "nothing will be returned as collection" is unclear.
Apply this diff to improve clarity:
-* If the job of delete collection was successful, we will get the status as successful and nothing will be returned as collection. +* If the collection deletion job was successful, the status will be marked as successful and no collection object will be returned.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/api/docs/collections/create.md(2 hunks)backend/app/api/docs/collections/delete.md(1 hunks)backend/app/api/docs/collections/job_info.md(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/app/api/docs/collections/create.md
🧰 Additional context used
🪛 LanguageTool
backend/app/api/docs/collections/delete.md
[grammar] ~13-~13: Ensure spelling is correct
Context: ...deleted and marked as deleted. You will recieve a message in the callback_url about t...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/crud/collection/collection.py (3)
1-5: Remove duplicateloggingimport.The
loggingmodule is imported on both lines 1 and 5.import logging import functools as ft from uuid import UUID from typing import Optional -import logging
22-30: Add return type hint to_updatemethod.Per the coding guidelines for Python 3.11+ projects, type hints should be used consistently.
As per coding guidelines.
- def _update(self, collection: Collection): + def _update(self, collection: Collection) -> Collection: self.session.add(collection) self.session.commit() self.session.refresh(collection)
94-103: Add return type hint toread_allmethod.Per the coding guidelines for Python 3.11+ projects, type hints should be used consistently.
As per coding guidelines.
- def read_all(self): + def read_all(self) -> list[Collection]: statement = select(Collection).where( and_( Collection.project_id == self.project_id,
♻️ Duplicate comments (2)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (1)
15-34: Org/project mismatch: helper ignores passedproject_idwhen fetching project.The helper calls
get_project(db)which retrieves a random active project, but then uses the passedproject_idparameter directly when constructing the Collection. This causesorganization_idandproject_idto reference mismatched entities when the caller passes a specificproject_id.Accept an explicit
projectparameter instead:def get_collection_for_delete( - db: Session, client=None, project_id: int = None + db: Session, project, client=None ) -> Collection: - project = get_project(db) if client is None: client = OpenAI(api_key="test_api_key") vector_store = client.vector_stores.create() assistant = client.beta.assistants.create( model="gpt-4o", tools=[{"type": "file_search"}], tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}, ) return Collection( organization_id=project.organization_id, - project_id=project_id, + project_id=project.id, llm_service_id=assistant.id, llm_service_name="gpt-4o", )Then update call sites:
-collection = get_collection_for_delete(db, client, project_id=project.id) +collection = get_collection_for_delete(db, project, client)backend/app/crud/collection/collection.py (1)
42-66: PK collision risk with soft-deleted rows remains unresolved.The past review identified that
read_one()filters out soft-deleted collections (line 73:Collection.deleted_at.is_(None)). If a soft-deleted collection with the sameidexists,read_one()raises 404, and the subsequentsession.add()at line 59 attempts an insert with a duplicate primary key, causing anIntegrityError.The suggested fix from the previous review was not applied. Additionally, the
existing = Noneinitialization on line 45 is redundant.Apply the corrected diff from the past review:
def create( self, collection: Collection, documents: Optional[list[Document]] = None ) -> Collection: - existing = None - try: - existing = self.read_one(collection.id) - except HTTPException as e: - if e.status_code != 404: - raise - - if existing is not None: + # First, see if any row exists for this id (deleted or not) + any_row = self.session.exec( + select(Collection).where(Collection.id == collection.id) + ).one_or_none() + if any_row is not None: + if any_row.deleted_at is None: + logger.warning( + "[CollectionCrud.create] Collection already present | {'collection_id': '%s'}", + collection.id, + ) + return any_row + # id is already used by a soft-deleted row + raise HTTPException(status_code=409, detail="Collection id already exists (soft-deleted)") - logger.warning( - "[CollectionCrud.create] Collection already present | " - f"{{'collection_id': '{collection.id}'}}" - ) - return existing self.session.add(collection) self.session.commit() self.session.refresh(collection)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
backend/app/api/docs/collections/delete.md(1 hunks)backend/app/crud/collection/collection.py(3 hunks)backend/app/models/collection.py(3 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(1 hunks)backend/app/tests/crud/collections/collection/test_crud_collection_delete.py(4 hunks)backend/app/tests/services/collections/test_create_collection.py(7 hunks)backend/app/tests/utils/collection.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/api/docs/collections/delete.md
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/tests/crud/collections/collection/test_crud_collection_delete.pybackend/app/tests/services/collections/test_create_collection.pybackend/app/models/collection.pybackend/app/crud/collection/collection.pybackend/app/tests/api/routes/collections/test_create_collections.pybackend/app/tests/utils/collection.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/collection.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/collection/collection.py
🧬 Code graph analysis (5)
backend/app/tests/crud/collections/collection/test_crud_collection_delete.py (5)
backend/app/crud/collection/collection.py (2)
CollectionCrud(17-154)create(42-66)backend/app/models/collection.py (1)
Collection(13-36)backend/app/crud/rag/open_ai.py (1)
OpenAIAssistantCrud(192-246)backend/app/tests/utils/utils.py (1)
get_project(54-73)backend/app/tests/utils/document.py (2)
DocumentStore(53-80)project(65-66)
backend/app/tests/services/collections/test_create_collection.py (4)
backend/app/services/collections/create_collection.py (2)
start_job(39-69)execute_job(72-238)backend/app/tests/utils/openai.py (1)
get_mock_openai_client_with_vector_store(90-117)backend/app/tests/utils/collection.py (1)
get_collection_job(74-95)backend/app/crud/collection/collection_job.py (4)
create(78-96)update(58-76)CollectionJobCrud(19-96)read_one(24-45)
backend/app/crud/collection/collection.py (4)
backend/app/models/collection.py (1)
Collection(13-36)backend/app/crud/document_collection.py (1)
DocumentCollectionCrud(8-48)backend/app/core/util.py (1)
now(15-16)backend/app/models/document_collection.py (1)
DocumentCollection(9-23)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/models/collection_job.py (2)
CollectionJobStatus(11-15)CollectionActionType(18-20)backend/app/tests/conftest.py (3)
client(65-68)user_api_key_header(90-92)user_api_key(102-104)
backend/app/tests/utils/collection.py (5)
backend/app/models/collection.py (1)
Collection(13-36)backend/app/models/collection_job.py (3)
CollectionActionType(18-20)CollectionJob(35-64)CollectionJobStatus(11-15)backend/app/crud/collection/collection.py (2)
CollectionCrud(17-154)create(42-66)backend/app/crud/collection/collection_job.py (2)
CollectionJobCrud(19-96)create(78-96)backend/app/tests/conftest.py (1)
db(28-45)
🔇 Additional comments (8)
backend/app/tests/api/routes/collections/test_create_collections.py (1)
1-136: LGTM!The test suite comprehensively covers the new
with_assistantflow, including:
- Assistant-enabled path with proper callback verification
- Vector-only path with metadata validation
- Validation error for incomplete assistant options
The assertions correctly verify job status, metadata content, and all callback arguments.
backend/app/tests/services/collections/test_create_collection.py (1)
1-255: LGTM!The service tests correctly verify the updated
start_jobandexecute_jobflows withwith_assistant=True. The new failure test at lines 194-255 properly validates error handling and vector store cleanup on assistant creation failure.backend/app/models/collection.py (1)
120-123: LGTM!Correctly updated to use
model_fields(Pydantic v2 API) instead of the deprecated__fields__.backend/app/tests/utils/collection.py (3)
25-47: LGTM!The
get_collectionhelper correctly creates an assistant-backed Collection with all required fields matching the schema. Themodelparameter is properly used forllm_service_name.
50-71: LGTM!The
get_vector_store_collectionhelper correctly creates a vector-store-only Collection without assistant fields. All fields match the Collection schema.
74-95: LGTM!The
get_collection_jobhelper provides a clean, reusable way to seed CollectionJob test data with configurable fields. The enum value extraction handles both enum instances and string values properly.backend/app/crud/collection/collection.py (2)
105-109: LGTM: Clean soft-delete implementation.The new
delete_by_idmethod correctly implements soft-delete by reading the collection (which verifies it exists and is not already deleted), settingdeleted_at, and persisting via_update.
140-143: Good addition: filter excludes soft-deleted collections.The updated
whereclause correctly filters out soft-deleted collections when processing document deletions, preventing redundant deletion attempts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/models/collection.py (2)
61-69: Surface deprecations in OpenAPI schema (don’t just say “To Be Deprecated”).Mark these fields as deprecated in the generated schema so client SDKs and docs flag them. Add json_schema_extra={"deprecated": True} to model, instructions, and temperature.
Apply:
- model: Optional[str] = Field( - default=None, + model: Optional[str] = Field( + default=None, description=( "**[To Be Deprecated]** " "OpenAI model to attach to this assistant. The model " "must be compatable with the assistants API; see the " "OpenAI [model documentation](https://platform.openai.com/docs/models/compare) for more." - ), + ), + json_schema_extra={"deprecated": True}, ) @@ - instructions: Optional[str] = Field( - default=None, + instructions: Optional[str] = Field( + default=None, description=( "**[To Be Deprecated]** " "Assistant instruction. Sometimes referred to as the " '"system" prompt.' - ), + ), + json_schema_extra={"deprecated": True}, ) @@ - temperature: float = Field( - default=1e-6, + temperature: float = Field( + default=1e-6, description=( "**[To Be Deprecated]** " "Model temperature. The default is slightly " "greater-than zero because it is [unknown how OpenAI " "handles zero](https://community.openai.com/t/clarifications-on-setting-temperature-0/886447/5)." - ), + ), + json_schema_extra={"deprecated": True}, )Based on learnings.
Also applies to: 71-78, 80-87
122-122: v2-compatible field access is correct; tighten typing and simplify iteration.Switch the parameter type to a model class and add a return type; iterating dict keys explicitly isn’t needed.
Apply:
-from typing import Any, Optional +from typing import Any, Optional, Iterator @@ - def extract_super_type(self, cls: "CreationRequest"): - for field_name in cls.model_fields.keys(): + def extract_super_type(self, cls: type[SQLModel]) -> Iterator[tuple[str, Any]]: + for field_name in cls.model_fields: field_value = getattr(self, field_name) yield (field_name, field_value)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/models/collection.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/models/collection.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/collection.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/collection.py (1)
6-6: Good move to Pydantic v2 API.Importing model_validator is correct for v2; no issues.
Summary
Target issue is #398
Explain the motivation for making this change. What existing problem does the pull request solve?
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Create job (create_collection.execute_job) :
Introduced with_assistant flag: when model & instructions are both present, create an Assistant backed by a Vector Store; otherwise only create a Vector Store.
We now close the DB session before making external API calls—and reopen a fresh session only to persist results—so we don’t hold connections or risk transaction locks. Same is being done during deletion of collection.
Delete job (delete_collection.execute_job) :
Detects service type from collection.llm_service_name (Assistant vs Vector Store) and chooses the correct CRUD (OpenAIAssistantCrud / OpenAIVectorStoreCrud) to use delete method of that specific CRUD class.
Calls local deletion via CollectionCrud.delete_by_id() once remote deletion succeeds.
CRUD refactors :
CollectionCrud.delete_by_id() sets deleted_at=now() and persists.
Document→Collection cascade delete now excludes already-deleted collections (deleted_at IS NULL).
API route :
/collections/create: determines with_assistant by checking both model and instructions. Returns metadata note when creating vector-store-only.
Validation: if only one of model or instructions is provided, responds 422 with a clear message.
Additional :
Removed ResponsePayload class because we are now returning collection job object in response of create and delete collections, we use the collection job object for the callback URL as well.
Rephrased the endpoint docs to remove the indication that user can use any specific endpoint for polling.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation