Skip to content

Conversation

@avirajsingh7
Copy link
Collaborator

@avirajsingh7 avirajsingh7 commented Aug 27, 2025

Summary

This PR refactors the create_collection and delete_collection background task execution to simplify dependencies and improve reliability in both production and tests.
Now each background task opens its own database Session when needed, ensuring clean transaction boundaries and avoiding reusing closed sessions.

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features

    • Clearer status updates and error messages during collection creation and deletion.
  • Refactor

    • Background processing for collection create/delete now manages work more reliably and consistently.
  • Bug Fixes

    • Fewer edge-case failures by ensuring collection state and error info are recorded reliably.
  • Tests

    • Tests updated to better simulate background tasks and session behavior for more robust coverage.

Note: No changes to user-facing API behavior.

@coderabbitai
Copy link

coderabbitai bot commented Aug 27, 2025

Walkthrough

Workers for collection create/delete no longer accept external DB sessions; they open their own SQLModel Session(engine), perform CRUD and vector-store operations inside that session, handle errors within a session, and background task invocations/tests were updated accordingly.

Changes

Cohort / File(s) Summary
Collections route worker refactor
backend/app/api/routes/collections.py
Removed SessionDep parameter from do_create_collection and do_delete_collection; worker functions now accept current_user: UserProjectOrg, open with Session(engine) internally, instantiate CRUDs with that session, perform document/vector-store operations and collection status updates inside the session, and handle errors by updating status/error_message inside a session. BackgroundTasks calls updated to omit session. TODOs added about passing ORM objects into background tasks.
Tests updated to mock background tasks and sessions
backend/app/tests/api/routes/collections/test_create_collections.py
Test signature updated to include mock_session and mock_add_task; mocks BackgroundTasks.add_task to execute synchronously and mocks Session context manager to yield test DB; OpenAI client mock adjusted for vector-store usage; removed redundant mocks and simplified response parsing.

Sequence Diagram(s)

sequenceDiagram
    participant Client as API Client
    participant API as /collections endpoints
    participant BT as BackgroundTasks
    participant Worker as do_create_collection / do_delete_collection
    participant DB as Session(engine)
    participant CRUD as *Crud classes
    participant VS as Vector Store (OpenAI)

    Client->>API: POST/DELETE /collections
    API->>BT: add_task(Worker, current_user, request, payload, client)
    note right of BT: No Session passed
    BT-->>Worker: invoke (async)

    rect rgba(220,240,255,0.5)
    Worker->>DB: with Session(engine)
    DB-->>Worker: session
    Worker->>CRUD: instantiate with session
    Worker->>CRUD: read/write docs & collections
    Worker->>VS: create/update vector store
    Worker->>CRUD: update collection status & metadata
    end

    alt Error
      Worker->>DB: with Session(engine)
      Worker->>CRUD: set status=failed, store error_message
    end

    Worker-->>BT: complete
    BT-->>API: background task finished
    API-->>Client: 202 Accepted / 200 OK
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • AkhileshNegi
  • kartpop

Poem

A rabbit hops through sessions new,
I open my engine, sip the brew.
Background winds now carry the task,
No borrowed sessions, no need to ask.
If failure blooms, I mark and stew — then hop along anew. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hotfix/remove_session_from_collection_job

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbit in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbit in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbit gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbit read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbit help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbit ignore or @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbit summary or @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbit or @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@codecov
Copy link

codecov bot commented Aug 27, 2025

Codecov Report

❌ Patch coverage is 64.44444% with 16 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/api/routes/collections.py 55.55% 16 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/tests/api/routes/collections/test_create_collections.py (1)

106-106: Align assertion with Enum storage.

Production compares/assigns Enum members (not .value). Assert against the Enum to avoid brittle coupling.

-        assert collection.status == CollectionStatus.successful.value
+        assert collection.status == CollectionStatus.successful
🧹 Nitpick comments (3)
backend/app/tests/api/routes/collections/test_create_collections.py (1)

7-7: Remove unused MagicMock import.

MagicMock isn’t used in this test file. Clean it up to satisfy Ruff F401.

-from unittest.mock import patch, MagicMock
+from unittest.mock import patch
backend/app/api/routes/collections.py (2)

234-240: Consider using a single session/transaction for read+write path.

You open one session to read docs and a second to update the collection/links. While valid, a single session can reduce state drift and extra round-trips; wrap both phases in one with Session(engine) and flush at boundaries if needed.


252-265: Redundant commit on collection update when docs exist.

DocumentCollectionCrud.create() commits and refreshes; since the mutated collection is attached to the same session, its changes are flushed on that commit. _update() right after is an extra commit. Keep _update() only when there are no docs.

-            if flat_docs:
-                DocumentCollectionCrud(session).create(collection, flat_docs)
-
-            collection_crud._update(collection)
+            if flat_docs:
+                DocumentCollectionCrud(session).create(collection, flat_docs)
+            else:
+                collection_crud._update(collection)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f187606 and 3f4dff5.

📒 Files selected for processing (2)
  • backend/app/api/routes/collections.py (9 hunks)
  • backend/app/tests/api/routes/collections/test_create_collections.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
backend/app/api/routes/collections.py (6)
backend/app/tests/conftest.py (1)
  • db (24-41)
backend/app/models/user.py (1)
  • UserProjectOrg (68-69)
backend/app/crud/document.py (4)
  • DocumentCrud (14-135)
  • update (99-124)
  • read_one (19-34)
  • delete (126-135)
backend/app/crud/collection.py (7)
  • create (63-82)
  • CollectionCrud (17-146)
  • read_one (84-93)
  • _update (22-46)
  • delete (107-115)
  • _ (128-146)
  • __init__ (18-20)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • __init__ (9-10)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/tests/api/routes/collections/test_create_collections.py (2)
backend/app/tests/conftest.py (3)
  • client (52-55)
  • db (24-41)
  • user_api_key_header (77-79)
backend/app/tests/utils/openai.py (1)
  • get_mock_openai_client_with_vector_store (54-81)
🪛 Ruff (0.12.2)
backend/app/tests/api/routes/collections/test_create_collections.py

7-7: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (9)
backend/app/tests/api/routes/collections/test_create_collections.py (2)

68-69: Session context patching looks good; consider minor hardening.

Returning the test db from Session().__enter__ is correct for the new worker pattern. If flakiness appears later, add __exit__ to no-op to avoid accidental closes by context management.


46-66: Fix BackgroundTasks.add_task side_effect signature (needs self).

When patching a bound method, the mock is invoked with self as the first arg. Your side_effect currently misses it and will raise a TypeError. Update to accept self.

-        def run_task_immediately(task_func, *args, **kwargs):
-            task_func(*args, **kwargs)
+        def run_task_immediately(self, task_func, *args, **kwargs):
+            task_func(*args, **kwargs)

Likely an incorrect or invalid review comment.

backend/app/api/routes/collections.py (7)

215-220: Good API shift: worker now independent of request session.

Switching do_create_collection to take UserProjectOrg and manage its own Session isolates transaction boundaries and prevents closed-session reuse.


281-290: Failure path is robust and scoped; nice.

Capturing a concise error message and updating status/error_message in a fresh session avoids partial state and preserves diagnostics.


326-329: Background task invocation matches new worker signature.

Dropping the session arg keeps the background job decoupled from the request lifecycle.


337-358: Delete worker mirrors create worker session pattern.

Creating a local session for delete is consistent and avoids request-session leakage.


16-19: Imports reflect the refactor appropriately.

Bringing in Session and engine, plus UserProjectOrg, is necessary for in-worker session management.

Also applies to: 29-29


301-335: Minor: route payload/time calculation is fine; keep as-is.

No action needed. Route creates the DB record before scheduling the task; the worker picks it up via the key.


323-329: All background task call sites match the new worker signatures

A grep for do_create_collection( and do_delete_collection( confirms there are no stale invocations anywhere other than the updated calls in this file. Both the /create and /delete routes now invoke their workers with exactly (current_user, request, payload, client), matching the definitions:

  • do_create_collection(current_user, request, payload, client) at lines 323–329
  • do_delete_collection(current_user, request, payload, client) at lines 391–393

No further changes are needed here.

collection_crud = CollectionCrud(session, current_user.id)
collection = collection_crud.read_one(request.collection_id)
assistant = OpenAIAssistantCrud(client)
data = collection_crud.delete(collection, assistant)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As call to openAI is tightly coupled with collection delete, session may be open untill collection deleted from openAI.
@kartpop @AkhileshNegi should we de-couple this from session?
Not doing this right now because this may require changes in document endpoint also. or better question can be do we require background job to delete a collection?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

if we are not doing this right now, at least put a TODO comment here in code in this PR itself so that we can revisit this later

collection_crud = CollectionCrud(session, current_user.id)
collection = collection_crud.read_one(request.collection_id)
assistant = OpenAIAssistantCrud(client)
data = collection_crud.delete(collection, assistant)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

if we are not doing this right now, at least put a TODO comment here in code in this PR itself so that we can revisit this later

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
backend/app/api/routes/collections.py (1)

340-343: Same TODO + whitespace—dedupe and trim.

This duplicates the earlier TODO and has trailing whitespace (W291). Keep a single TODO above both workers and remove the duplicate.

Apply whitespace-only fix:

-# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload) 
-# into background job functions. When moving to Celery or another task queue, 
-# this can cause issues with serialization/deserialization. Instead, pass only 
+# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload)
+# into background job functions. When moving to Celery or another task queue,
+# this can cause issues with serialization/deserialization. Instead, pass only
 # primitive types (IDs, strings, etc.) and rehydrate objects inside the task.
🧹 Nitpick comments (3)
backend/app/api/routes/collections.py (3)

214-217: Trim trailing whitespace and consider acting on the TODO.

  • There’s trailing whitespace here (Ruff W291) and it’s breaking CI.
  • The TODO is valid: passing ORM/Pydantic objects to background tasks complicates future Celery adoption. Prefer passing primitives (user_id, org_id, project_id, payload_key, etc.) and rehydrate inside the task.

Apply the whitespace-only fix:

-# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload) 
-# into background job functions. When moving to Celery or another task queue, 
-# this can cause issues with serialization/deserialization. Instead, pass only 
+# TODO: Avoid passing ORM / object models (e.g., UserProjectOrg, ResponsePayload)
+# into background job functions. When moving to Celery or another task queue,
+# this can cause issues with serialization/deserialization. Instead, pass only
 # primitive types (IDs, strings, etc.) and rehydrate objects inside the task.

237-243: Naming nit and readability.

Minor: docs is a list of batches and flat_docs is the flattened list. Consider renaming docs -> doc_batches for clarity; no functional change.


255-268: Two commits inside one session (minor).

DocumentCollectionCrud.create and collection_crud._update both commit. It’s acceptable, but if this path becomes hot, consider changing DocumentCollectionCrud.create to not commit and perform a single commit at the end.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3f4dff5 and 01dd1bf.

📒 Files selected for processing (1)
  • backend/app/api/routes/collections.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/app/api/routes/collections.py (7)
backend/app/tests/conftest.py (2)
  • db (24-41)
  • client (52-55)
backend/app/models/user.py (1)
  • UserProjectOrg (68-69)
backend/app/crud/document.py (4)
  • DocumentCrud (14-135)
  • update (99-124)
  • read_one (19-34)
  • delete (126-135)
backend/app/core/cloud/storage.py (4)
  • get_file_size_kb (171-189)
  • create (40-80)
  • client (26-38)
  • delete (191-206)
backend/app/crud/rag/open_ai.py (6)
  • create (102-110)
  • create (193-215)
  • update (118-172)
  • OpenAIAssistantCrud (192-246)
  • delete (174-189)
  • delete (217-246)
backend/app/crud/collection.py (7)
  • create (63-82)
  • CollectionCrud (17-146)
  • read_one (84-93)
  • _update (22-46)
  • delete (107-115)
  • __init__ (18-20)
  • _ (128-146)
backend/app/crud/document_collection.py (3)
  • create (12-23)
  • DocumentCollectionCrud (8-48)
  • __init__ (9-10)
🪛 Ruff (0.12.2)
backend/app/api/routes/collections.py

214-214: Trailing whitespace

Remove trailing whitespace

(W291)


215-215: Trailing whitespace

Remove trailing whitespace

(W291)


216-216: Trailing whitespace

Remove trailing whitespace

(W291)


340-340: Trailing whitespace

Remove trailing whitespace

(W291)


341-341: Trailing whitespace

Remove trailing whitespace

(W291)


342-342: Trailing whitespace

Remove trailing whitespace

(W291)


360-360: Trailing whitespace

Remove trailing whitespace

(W291)


361-361: Trailing whitespace

Remove trailing whitespace

(W291)

🪛 GitHub Actions: AI Platform CI
backend/app/api/routes/collections.py

[error] 1-1: Trailing whitespace detected. Pre-commit hook 'trailing-whitespace' modified the file.


[error] 1-1: Black formatting changed the file. Pre-commit hook reformatted it; please re-run pre-commit and commit the changes.

🔇 Additional comments (6)
backend/app/api/routes/collections.py (6)

16-18: Good move: workers now own their DB session.

Importing Session and engine (and switching to UserProjectOrg) aligns with the PR goal to remove session dependency from background workers.

Also applies to: 29-29


219-223: Signature change LGTM.

Removing Session from do_create_collection and accepting UserProjectOrg is consistent with the refactor.


330-330: Call site updated correctly.

Background task no longer passes a DB session to the worker. Matches the new signature.


345-349: Signature change LGTM (delete worker).

Mirrors the create worker; removes session coupling.


403-403: Call site updated correctly (delete worker).

No session passed; consistent with refactor.


1-1: Run formatters and hooks locally: ensure pre-commit, black, and ruff are installed and on your PATH, then run

pre-commit run -a
ruff --fix backend/app/api/routes/collections.py
black backend/app/api/routes/collections.py

and commit the formatting/fix outputs.

@nishika26 nishika26 closed this Sep 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants