-
Couldn't load subscription status.
- Fork 5
Refactor Collection background tasks to Remove session dependency #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughWorkers for collection create/delete no longer accept external DB sessions; they open their own SQLModel Session(engine), perform CRUD and vector-store operations inside that session, handle errors within a session, and background task invocations/tests were updated accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as API Client
participant API as /collections endpoints
participant BT as BackgroundTasks
participant Worker as do_create_collection / do_delete_collection
participant DB as Session(engine)
participant CRUD as *Crud classes
participant VS as Vector Store (OpenAI)
Client->>API: POST/DELETE /collections
API->>BT: add_task(Worker, current_user, request, payload, client)
note right of BT: No Session passed
BT-->>Worker: invoke (async)
rect rgba(220,240,255,0.5)
Worker->>DB: with Session(engine)
DB-->>Worker: session
Worker->>CRUD: instantiate with session
Worker->>CRUD: read/write docs & collections
Worker->>VS: create/update vector store
Worker->>CRUD: update collection status & metadata
end
alt Error
Worker->>DB: with Session(engine)
Worker->>CRUD: set status=failed, store error_message
end
Worker-->>BT: complete
BT-->>API: background task finished
API-->>Client: 202 Accepted / 200 OK
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/collections/test_create_collections.py (1)
106-106: Align assertion with Enum storage.Production compares/assigns Enum members (not
.value). Assert against the Enum to avoid brittle coupling.- assert collection.status == CollectionStatus.successful.value + assert collection.status == CollectionStatus.successful
🧹 Nitpick comments (3)
backend/app/tests/api/routes/collections/test_create_collections.py (1)
7-7: Remove unused MagicMock import.
MagicMockisn’t used in this test file. Clean it up to satisfy Ruff F401.-from unittest.mock import patch, MagicMock +from unittest.mock import patchbackend/app/api/routes/collections.py (2)
234-240: Consider using a single session/transaction for read+write path.You open one session to read docs and a second to update the collection/links. While valid, a single session can reduce state drift and extra round-trips; wrap both phases in one
with Session(engine)and flush at boundaries if needed.
252-265: Redundant commit on collection update when docs exist.
DocumentCollectionCrud.create()commits and refreshes; since the mutatedcollectionis attached to the same session, its changes are flushed on that commit._update()right after is an extra commit. Keep_update()only when there are no docs.- if flat_docs: - DocumentCollectionCrud(session).create(collection, flat_docs) - - collection_crud._update(collection) + if flat_docs: + DocumentCollectionCrud(session).create(collection, flat_docs) + else: + collection_crud._update(collection)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
backend/app/api/routes/collections.py(9 hunks)backend/app/tests/api/routes/collections/test_create_collections.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/api/routes/collections.py (6)
backend/app/tests/conftest.py (1)
db(24-41)backend/app/models/user.py (1)
UserProjectOrg(68-69)backend/app/crud/document.py (4)
DocumentCrud(14-135)update(99-124)read_one(19-34)delete(126-135)backend/app/crud/collection.py (7)
create(63-82)CollectionCrud(17-146)read_one(84-93)_update(22-46)delete(107-115)_(128-146)__init__(18-20)backend/app/crud/document_collection.py (3)
create(12-23)DocumentCollectionCrud(8-48)__init__(9-10)backend/app/core/util.py (1)
now(13-14)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/tests/conftest.py (3)
client(52-55)db(24-41)user_api_key_header(77-79)backend/app/tests/utils/openai.py (1)
get_mock_openai_client_with_vector_store(54-81)
🪛 Ruff (0.12.2)
backend/app/tests/api/routes/collections/test_create_collections.py
7-7: unittest.mock.MagicMock imported but unused
Remove unused import: unittest.mock.MagicMock
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
68-69: Session context patching looks good; consider minor hardening.Returning the test
dbfromSession().__enter__is correct for the new worker pattern. If flakiness appears later, add__exit__to no-op to avoid accidental closes by context management.
46-66: Fix BackgroundTasks.add_task side_effect signature (needs self).When patching a bound method, the mock is invoked with
selfas the first arg. Your side_effect currently misses it and will raise a TypeError. Update to acceptself.- def run_task_immediately(task_func, *args, **kwargs): - task_func(*args, **kwargs) + def run_task_immediately(self, task_func, *args, **kwargs): + task_func(*args, **kwargs)Likely an incorrect or invalid review comment.
backend/app/api/routes/collections.py (7)
215-220: Good API shift: worker now independent of request session.Switching
do_create_collectionto takeUserProjectOrgand manage its ownSessionisolates transaction boundaries and prevents closed-session reuse.
281-290: Failure path is robust and scoped; nice.Capturing a concise error message and updating status/error_message in a fresh session avoids partial state and preserves diagnostics.
326-329: Background task invocation matches new worker signature.Dropping the session arg keeps the background job decoupled from the request lifecycle.
337-358: Delete worker mirrors create worker session pattern.Creating a local session for delete is consistent and avoids request-session leakage.
16-19: Imports reflect the refactor appropriately.Bringing in
Sessionandengine, plusUserProjectOrg, is necessary for in-worker session management.Also applies to: 29-29
301-335: Minor: route payload/time calculation is fine; keep as-is.No action needed. Route creates the DB record before scheduling the task; the worker picks it up via the key.
323-329: All background task call sites match the new worker signaturesA grep for
do_create_collection(anddo_delete_collection(confirms there are no stale invocations anywhere other than the updated calls in this file. Both the/createand/deleteroutes now invoke their workers with exactly(current_user, request, payload, client), matching the definitions:
do_create_collection(current_user, request, payload, client)at lines 323–329do_delete_collection(current_user, request, payload, client)at lines 391–393No further changes are needed here.
| collection_crud = CollectionCrud(session, current_user.id) | ||
| collection = collection_crud.read_one(request.collection_id) | ||
| assistant = OpenAIAssistantCrud(client) | ||
| data = collection_crud.delete(collection, assistant) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As call to openAI is tightly coupled with collection delete, session may be open untill collection deleted from openAI.
@kartpop @AkhileshNegi should we de-couple this from session?
Not doing this right now because this may require changes in document endpoint also. or better question can be do we require background job to delete a collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
if we are not doing this right now, at least put a TODO comment here in code in this PR itself so that we can revisit this later
| collection_crud = CollectionCrud(session, current_user.id) | ||
| collection = collection_crud.read_one(request.collection_id) | ||
| assistant = OpenAIAssistantCrud(client) | ||
| data = collection_crud.delete(collection, assistant) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
if we are not doing this right now, at least put a TODO comment here in code in this PR itself so that we can revisit this later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
backend/app/api/routes/collections.py (1)
340-343: Same TODO + whitespace—dedupe and trim.This duplicates the earlier TODO and has trailing whitespace (W291). Keep a single TODO above both workers and remove the duplicate.
Apply whitespace-only fix:
-# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload) -# into background job functions. When moving to Celery or another task queue, -# this can cause issues with serialization/deserialization. Instead, pass only +# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload) +# into background job functions. When moving to Celery or another task queue, +# this can cause issues with serialization/deserialization. Instead, pass only # primitive types (IDs, strings, etc.) and rehydrate objects inside the task.
🧹 Nitpick comments (3)
backend/app/api/routes/collections.py (3)
214-217: Trim trailing whitespace and consider acting on the TODO.
- There’s trailing whitespace here (Ruff W291) and it’s breaking CI.
- The TODO is valid: passing ORM/Pydantic objects to background tasks complicates future Celery adoption. Prefer passing primitives (user_id, org_id, project_id, payload_key, etc.) and rehydrate inside the task.
Apply the whitespace-only fix:
-# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload) -# into background job functions. When moving to Celery or another task queue, -# this can cause issues with serialization/deserialization. Instead, pass only +# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload) +# into background job functions. When moving to Celery or another task queue, +# this can cause issues with serialization/deserialization. Instead, pass only # primitive types (IDs, strings, etc.) and rehydrate objects inside the task.
237-243: Naming nit and readability.Minor: docs is a list of batches and flat_docs is the flattened list. Consider renaming docs -> doc_batches for clarity; no functional change.
255-268: Two commits inside one session (minor).DocumentCollectionCrud.create and collection_crud._update both commit. It’s acceptable, but if this path becomes hot, consider changing DocumentCollectionCrud.create to not commit and perform a single commit at the end.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
backend/app/api/routes/collections.py(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/api/routes/collections.py (7)
backend/app/tests/conftest.py (2)
db(24-41)client(52-55)backend/app/models/user.py (1)
UserProjectOrg(68-69)backend/app/crud/document.py (4)
DocumentCrud(14-135)update(99-124)read_one(19-34)delete(126-135)backend/app/core/cloud/storage.py (4)
get_file_size_kb(171-189)create(40-80)client(26-38)delete(191-206)backend/app/crud/rag/open_ai.py (6)
create(102-110)create(193-215)update(118-172)OpenAIAssistantCrud(192-246)delete(174-189)delete(217-246)backend/app/crud/collection.py (7)
create(63-82)CollectionCrud(17-146)read_one(84-93)_update(22-46)delete(107-115)__init__(18-20)_(128-146)backend/app/crud/document_collection.py (3)
create(12-23)DocumentCollectionCrud(8-48)__init__(9-10)
🪛 Ruff (0.12.2)
backend/app/api/routes/collections.py
214-214: Trailing whitespace
Remove trailing whitespace
(W291)
215-215: Trailing whitespace
Remove trailing whitespace
(W291)
216-216: Trailing whitespace
Remove trailing whitespace
(W291)
340-340: Trailing whitespace
Remove trailing whitespace
(W291)
341-341: Trailing whitespace
Remove trailing whitespace
(W291)
342-342: Trailing whitespace
Remove trailing whitespace
(W291)
360-360: Trailing whitespace
Remove trailing whitespace
(W291)
361-361: Trailing whitespace
Remove trailing whitespace
(W291)
🪛 GitHub Actions: AI Platform CI
backend/app/api/routes/collections.py
[error] 1-1: Trailing whitespace detected. Pre-commit hook 'trailing-whitespace' modified the file.
[error] 1-1: Black formatting changed the file. Pre-commit hook reformatted it; please re-run pre-commit and commit the changes.
🔇 Additional comments (6)
backend/app/api/routes/collections.py (6)
16-18: Good move: workers now own their DB session.Importing Session and engine (and switching to UserProjectOrg) aligns with the PR goal to remove session dependency from background workers.
Also applies to: 29-29
219-223: Signature change LGTM.Removing Session from do_create_collection and accepting UserProjectOrg is consistent with the refactor.
330-330: Call site updated correctly.Background task no longer passes a DB session to the worker. Matches the new signature.
345-349: Signature change LGTM (delete worker).Mirrors the create worker; removes session coupling.
403-403: Call site updated correctly (delete worker).No session passed; consistent with refactor.
1-1: Run formatters and hooks locally: ensurepre-commit,black, andruffare installed and on your PATH, then runpre-commit run -a ruff --fix backend/app/api/routes/collections.py black backend/app/api/routes/collections.pyand commit the formatting/fix outputs.
Summary
This PR refactors the create_collection and delete_collection background task execution to simplify dependencies and improve reliability in both production and tests.
Now each background task opens its own database Session when needed, ensuring clean transaction boundaries and avoiding reusing closed sessions.
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes
Please add here if any other information is required for the reviewer.
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
Note: No changes to user-facing API behavior.