Skip to content

Conversation

@avirajsingh7
Copy link
Collaborator

@avirajsingh7 avirajsingh7 commented Jul 10, 2025

Summary

This PR adds support for ingesting OpenAI Assistants into the platform and refactors the openai_assistant table to support multiple vector stores per assistant via a new vector_store_ids field.

Target issue is #277 and #272

Key Changes

  1. New Route: /api/v1/assistant/{assistant_id}/ingest

  2. Model Changes: Assistant
    Replaces vector_store_id: str with vector_store_ids: List[str].
    Uses PostgreSQL ARRAY(String) for storage.

  3. Updated Alembic migration:
    Adds vector_store_ids column.
    Copies existing vector_store_id to vector_store_ids as a single-element array.
    Drops the vector_store_id column.

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Summary by CodeRabbit

  • New Features

    • Introduced an API endpoint to ingest assistants from OpenAI and store them in the platform.
    • Expanded support for assistants to handle multiple vector store IDs.
    • Added support for managing multiple projects and credentials in seed data.
    • Added utility to obtain OpenAI client credentials per organization and project.
  • Bug Fixes

    • Improved handling of assistants with no vector stores or missing fields.
  • Tests

    • Added comprehensive tests for assistant ingestion, CRUD operations, and utility functions.
    • Updated tests to reflect multiple vector store IDs support.
  • Database Migration

    • Migrated assistant data structure to support multiple vector store IDs.

@avirajsingh7 avirajsingh7 self-assigned this Jul 10, 2025
@avirajsingh7 avirajsingh7 added the enhancement New feature or request label Jul 10, 2025
@codecov
Copy link

codecov bot commented Jul 10, 2025

Codecov Report

Attention: Patch coverage is 92.34973% with 14 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
backend/app/crud/assistants.py 80.00% 9 Missing ⚠️
backend/app/utils.py 78.57% 3 Missing ⚠️
backend/app/tests/utils/utils.py 81.81% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

{
"type": "file_search",
"vector_store_ids": [assistant.vector_store_id],
"vector_store_ids": assistant.vector_store_ids,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just confirm did u test if it comes in the same format and the API works

Copy link
Collaborator Author

@avirajsingh7 avirajsingh7 Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkhileshNegi vector_store_id does not exist in db.
What do you mean by in same format?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in the im sarying that vector_store_ids key expects value to be list, do we get list here and the API is working

Comment on lines 74 to 86
vector_store_ids = []
if assistant.tool_resources and hasattr(assistant.tool_resources, "file_search"):
file_search = assistant.tool_resources.file_search
if file_search and hasattr(file_search, "vector_store_ids"):
vector_store_ids = file_search.vector_store_ids or []

max_num_results = 20
for tool in assistant.tools or []:
if tool.type == "file_search":
file_search = getattr(tool, "file_search", None)
if file_search and hasattr(file_search, "max_num_results"):
max_num_results = file_search.max_num_results
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if there is better way to get nested key rather than going nested if

@coderabbitai
Copy link

coderabbitai bot commented Jul 12, 2025

Walkthrough

The changes introduce support for associating multiple vector store IDs with assistants instead of a single ID. This involves schema migrations, model and API updates, new CRUD and route logic for ingesting assistants from OpenAI, and expanded seed and test data to accommodate multiple projects and assistants. New tests and utilities are added for the new ingestion flow.

Changes

File(s) Change Summary
backend/app/models/assistants.py, backend/app/seed_data/seed_data.py Changed assistant model and seed data to use vector_store_ids: List[str] instead of vector_store_id: str.
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py Added migration to replace vector_store_id with vector_store_ids (array of strings) in the database.
backend/app/api/routes/assistants.py Added new /assistant/{assistant_id}/ingest POST endpoint for assistant ingestion from OpenAI.
backend/app/crud/assistants.py Added fetch_assistant_from_openai and sync_assistant CRUD functions for assistant ingestion and storage.
backend/app/crud/init.py Exposed new assistant CRUD functions (fetch_assistant_from_openai, sync_assistant) in the public API.
backend/app/api/main.py Registered the new assistants route with the main FastAPI application.
backend/app/api/routes/responses.py Updated response processing to use plural vector_store_ids for file search tools.
backend/app/seed_data/seed_data.json Expanded seed data to include multiple projects, credentials, and assistants with multiple vector store IDs.
backend/app/tests/api/routes/test_responses.py Updated tests to use vector_store_ids and new assistant IDs.
backend/app/tests/api/routes/test_assistants.py Added tests for the new assistant ingestion endpoint, including success and error scenarios.
backend/app/tests/crud/test_assistants.py Added tests for assistant synchronization logic, covering various field and error cases.
backend/app/tests/utils/openai.py Added utility to mock OpenAI assistant objects for testing.
backend/app/tests/utils/utils.py Added utility to fetch projects by name or any active project for testing.
backend/app/utils.py Added get_openai_client utility to retrieve OpenAI client configured by organization and project credentials.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as Assistants Route
    participant CRUD as CRUD Layer
    participant OpenAI as OpenAI API
    participant DB as Database

    Client->>API: POST /assistant/{assistant_id}/ingest
    API->>CRUD: get_openai_client(session, org_id, proj_id)
    API->>CRUD: fetch_assistant_from_openai(assistant_id, client)
    OpenAI-->>API: OpenAI Assistant Data
    API->>CRUD: sync_assistant(session, org_id, proj_id, openai_assistant)
    CRUD->>DB: Store assistant with vector_store_ids
    DB-->>CRUD: Assistant record
    CRUD-->>API: Assistant record
    API-->>Client: Success response with assistant data
Loading

Suggested reviewers

  • AkhileshNegi

Poem

🐇 Hopping through code with a twitch of my nose,
Assistants now gather where the vector stores grow.
From OpenAI’s garden, new helpers I bring,
Syncing and storing, hear the data sing!
Projects are many, with keys shining bright,
Together we code through the day and the night. 🌿✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 091f78d and 07cc058.

📒 Files selected for processing (1)
  • backend/app/crud/assistants.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/crud/assistants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
backend/app/api/routes/responses.py (1)

128-128: LGTM: Correct migration to multiple vector store IDs.

The changes properly handle the migration from single vector_store_id to multiple vector_store_ids. The conditional check and direct list passing to the OpenAI API are correctly implemented.

Also applies to: 132-132

backend/app/crud/__init__.py (1)

8-10: Remove duplicate import of get_provider_credential.

The function get_provider_credential is imported twice - here on line 9 and again on line 38. Remove this duplicate import to avoid confusion.

-
-from .credentials import get_provider_credential
-
+
🧹 Nitpick comments (3)
backend/app/models/assistants.py (1)

15-17: Update type annotation to use built-in list.

The model change correctly implements multiple vector store ID support. However, consider using the built-in list type instead of List from typing for Python 3.9+ compatibility.

-    vector_store_ids: List[str] = Field(
+    vector_store_ids: list[str] = Field(
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py (1)

10-10: Remove unused import.

The sqlmodel.sql.sqltypes import is not used in this migration.

-import sqlmodel.sql.sqltypes
backend/app/crud/assistants.py (1)

74-81: Consider simplifying nested attribute checks.

The nested attribute checking could be more concise using getattr with default values.

-    vector_store_ids = []
-    if openai_assistant.tool_resources and hasattr(
-        openai_assistant.tool_resources, "file_search"
-    ):
-        file_search = openai_assistant.tool_resources.file_search
-        if file_search and hasattr(file_search, "vector_store_ids"):
-            vector_store_ids = file_search.vector_store_ids or []
+    vector_store_ids = []
+    tool_resources = getattr(openai_assistant, "tool_resources", None)
+    if tool_resources:
+        file_search = getattr(tool_resources, "file_search", None)
+        if file_search:
+            vector_store_ids = getattr(file_search, "vector_store_ids", []) or []
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b6c7fa and 7cb0a9d.

📒 Files selected for processing (9)
  • backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py (1 hunks)
  • backend/app/api/main.py (1 hunks)
  • backend/app/api/routes/assistant.py (1 hunks)
  • backend/app/api/routes/responses.py (1 hunks)
  • backend/app/crud/__init__.py (2 hunks)
  • backend/app/crud/assistants.py (2 hunks)
  • backend/app/models/assistants.py (2 hunks)
  • backend/app/tests/api/routes/test_responses.py (1 hunks)
  • backend/app/utils.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/crud/assistants.py (3)
backend/app/models/assistants.py (1)
  • Assistant (24-33)
backend/app/utils.py (3)
  • APIResponse (27-46)
  • handle_openai_error (168-172)
  • mask_string (156-165)
backend/app/api/routes/responses.py (1)
  • handle_openai_error (22-26)
🪛 Ruff (0.11.9)
backend/app/models/assistants.py

15-15: Use list instead of List for type annotation

Replace with list

(UP006)

backend/app/crud/__init__.py

9-9: .credentials.get_provider_credential imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


48-48: .assistants.get_assistant_by_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


49-49: .assistants.fetch_assistant_from_openai imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


50-50: .assistants.insert_assistant imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py

10-10: sqlmodel.sql.sqltypes imported but unused

Remove unused import: sqlmodel.sql.sqltypes

(F401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/utils.py (2)

13-13: LGTM: Clean import addition.

The OpenAI import is properly placed and necessary for the new error handling function.


168-173: LGTM: Robust error handling implementation.

The function properly extracts user-friendly error messages from OpenAI errors with appropriate fallback handling.

backend/app/api/main.py (1)

18-18: LGTM: Proper route integration.

The assistant routes are correctly imported and included following the established pattern in the codebase.

Also applies to: 24-24

backend/app/tests/api/routes/test_responses.py (1)

83-83: LGTM: Test updated to match schema migration.

The change from vector_store_id = None to vector_store_ids = [] correctly aligns with the migration to support multiple vector store IDs while maintaining the same test logic.

backend/app/models/assistants.py (1)

4-5: LGTM: Necessary imports for PostgreSQL array support.

The SQLAlchemy imports are correctly added to support the ARRAY column type for multiple vector store IDs.

backend/app/api/routes/assistant.py (1)

19-54: LGTM! Clean implementation following best practices.

The route implementation properly:

  • Uses dependency injection for database and user context
  • Leverages CRUD functions for business logic separation
  • Handles errors appropriately with meaningful status codes
  • Uses the centralized configure_openai utility as suggested in past reviews

@avirajsingh7 avirajsingh7 marked this pull request as draft July 13, 2025 09:48
@avirajsingh7 avirajsingh7 marked this pull request as ready for review July 13, 2025 11:47
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/app/crud/assistants.py (1)

47-52: Fix return type annotation.

The function is annotated to return APIResponse[Assistant] but actually returns Assistant directly.

 def insert_assistant(
     session: Session,
     organization_id: int,
     project_id: int,
     openai_assistant: OpenAIAssistant,
-) -> APIResponse[Assistant]:
+) -> Assistant:
🧹 Nitpick comments (5)
backend/app/tests/utils/utils.py (2)

4-4: Update type annotations to use modern Python syntax.

According to static analysis, typing.Type is deprecated. Use the built-in type instead.

-from typing import Optional, Type, TypeVar
+from typing import Optional, TypeVar

62-83: Improve boolean comparisons and type annotations.

The function logic is correct, but there are style improvements that align with Python best practices.

Apply these improvements:

-def get_project(session: Session, name: Optional[str] = None) -> Project:
+def get_project(session: Session, name: str | None = None) -> Project:
     """
     Retrieve an active project from the database.

     If a project name is provided, fetch the active project with that name.
     If no name is provided, fetch any random project.
     """
     if name:
         statement = (
             select(Project)
-            .where(Project.name == name, Project.is_active == True)
+            .where(Project.name == name, Project.is_active)
             .limit(1)
         )
     else:
-        statement = select(Project).where(Project.is_active == True).limit(1)
+        statement = select(Project).where(Project.is_active).limit(1)

     project = session.exec(statement).first()

     if not project:
         raise ValueError("No active projects found")

     return project
backend/app/tests/api/routes/test_assistants.py (3)

1-8: Remove unused imports.

Several imports are not used in the test file and should be removed to keep the codebase clean.

-import pytest
-from fastapi.testclient import TestClient
-from sqlmodel import Session
-from unittest.mock import MagicMock, patch
-from app.main import app
-from app.tests.utils.utils import random_email
-from app.core.security import get_password_hash
-from app.tests.utils.openai import mock_openai_assistant
+import pytest
+from fastapi.testclient import TestClient
+from sqlmodel import Session
+from unittest.mock import patch
+from app.main import app
+from app.tests.utils.openai import mock_openai_assistant

18-23: Fix inconsistent type hints and remove unused parameter.

The normal_user_api_key_header parameter has inconsistent type hints between functions (str vs dict), and the db parameter is unused.

 @patch("app.api.routes.assistants.fetch_assistant_from_openai")
 def test_ingest_assistant_success(
     mock_fetch_assistant,
-    db: Session,
-    normal_user_api_key_header: str,
+    normal_user_api_key_header: dict,
 ):

43-48: Remove unused parameter.

The db parameter is not used in this test function.

 @patch("app.api.routes.assistants.configure_openai")
 def test_ingest_assistant_openai_not_configured(
     mock_configure_openai,
-    db: Session,
     normal_user_api_key_header: dict,
 ):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cb0a9d and 77ac3f2.

📒 Files selected for processing (10)
  • backend/app/api/main.py (2 hunks)
  • backend/app/api/routes/assistants.py (1 hunks)
  • backend/app/crud/assistants.py (2 hunks)
  • backend/app/seed_data/seed_data.json (1 hunks)
  • backend/app/seed_data/seed_data.py (2 hunks)
  • backend/app/tests/api/routes/test_assistants.py (1 hunks)
  • backend/app/tests/api/routes/test_responses.py (2 hunks)
  • backend/app/tests/crud/test_assistants.py (1 hunks)
  • backend/app/tests/utils/openai.py (1 hunks)
  • backend/app/tests/utils/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/api/main.py
  • backend/app/tests/api/routes/test_responses.py
🧰 Additional context used
🧬 Code Graph Analysis (4)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
  • get_user_by_email (34-37)
backend/app/crud/api_key.py (1)
  • get_api_key_by_value (104-122)
backend/app/models/api_key.py (1)
  • APIKeyPublic (23-25)
backend/app/models/project.py (1)
  • Project (28-49)
backend/app/api/routes/assistants.py (6)
backend/app/api/deps.py (2)
  • get_db (33-35)
  • get_current_user_org_project (110-131)
backend/app/core/util.py (1)
  • configure_openai (74-93)
backend/app/crud/assistants.py (2)
  • fetch_assistant_from_openai (30-44)
  • insert_assistant (47-109)
backend/app/crud/credentials.py (1)
  • get_provider_credential (90-107)
backend/app/models/user.py (1)
  • UserProjectOrg (68-69)
backend/app/utils.py (2)
  • APIResponse (27-46)
  • success_response (34-37)
backend/app/tests/api/routes/test_assistants.py (4)
backend/app/tests/utils/utils.py (1)
  • random_email (28-29)
backend/app/core/security.py (1)
  • get_password_hash (93-103)
backend/app/tests/utils/openai.py (1)
  • mock_openai_assistant (10-39)
backend/app/tests/conftest.py (1)
  • db (27-41)
backend/app/tests/crud/test_assistants.py (4)
backend/app/tests/utils/openai.py (1)
  • mock_openai_assistant (10-39)
backend/app/tests/utils/utils.py (1)
  • get_project (62-83)
backend/app/crud/assistants.py (1)
  • insert_assistant (47-109)
backend/app/tests/conftest.py (1)
  • db (27-41)
🪛 Ruff (0.11.9)
backend/app/tests/utils/utils.py

4-4: typing.Type is deprecated, use type instead

(UP035)


62-62: Use X | Y for type annotations

Convert to X | Y

(UP007)


72-72: Avoid equality comparisons to True; use if Project.is_active: for truth checks

Replace with Project.is_active

(E712)


76-76: Avoid equality comparisons to True; use if Project.is_active: for truth checks

Replace with Project.is_active

(E712)

backend/app/tests/utils/openai.py

12-12: Use X | Y for type annotations

Convert to X | Y

(UP007)


12-12: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

backend/app/tests/api/routes/test_assistants.py

4-4: unittest.mock.MagicMock imported but unused

Remove unused import: unittest.mock.MagicMock

(F401)


6-6: app.tests.utils.utils.random_email imported but unused

Remove unused import: app.tests.utils.utils.random_email

(F401)


7-7: app.core.security.get_password_hash imported but unused

Remove unused import: app.core.security.get_password_hash

(F401)


21-21: Unused function argument: db

(ARG001)


46-46: Unused function argument: db

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (14)
backend/app/seed_data/seed_data.py (2)

60-60: LGTM! Field update aligns with schema migration.

The change from vector_store_id: str to vector_store_ids: list[str] correctly reflects the database schema migration to support multiple vector store IDs per assistant.


264-264: LGTM! Constructor argument updated correctly.

The assistant creation logic has been updated to use the new vector_store_ids field, maintaining consistency with the model changes.

backend/app/seed_data/seed_data.json (4)

15-20: LGTM! Added multi-project support.

The addition of the "Dalgo" project expands the seed data to support multiple projects, which aligns with the enhanced multi-project functionality introduced in this PR.


65-72: LGTM! Added credentials for the new project.

The credential structure for the Dalgo project is consistent with the existing Glific credentials, providing proper OpenAI API key configuration for both projects.


80-80: LGTM! Updated to support multiple vector store IDs.

The change from vector_store_id to vector_store_ids as an array correctly reflects the database schema migration and enables support for multiple vector stores per assistant.

Also applies to: 91-91


86-96: LGTM! Added project-specific assistant.

The Dalgo assistant entry is properly structured and maintains consistency with the Glific assistant while being project-specific.

backend/app/api/routes/assistants.py (4)

1-16: LGTM! Clean imports and router setup.

The imports are well-organized and the router configuration follows FastAPI conventions with appropriate prefix and tags.


19-28: LGTM! Proper route definition and dependency injection.

The route definition follows FastAPI best practices with:

  • Appropriate HTTP method (POST) for data creation
  • Clear path parameter with type annotation and description
  • Proper dependency injection for database session and user context
  • Correct status code for resource creation (201)

32-44: LGTM! Robust credential handling and error management.

The implementation properly:

  • Retrieves provider credentials with organization and project context
  • Configures the OpenAI client with proper error handling
  • Returns meaningful HTTP 400 error when OpenAI is not configured

46-54: LGTM! Clean assistant ingestion workflow.

The final workflow properly:

  • Fetches the assistant from OpenAI using the configured client
  • Inserts the assistant with proper organization and project association
  • Returns a success response with the stored assistant data
backend/app/tests/utils/openai.py (1)

15-39: LGTM! Comprehensive mock assistant creation.

The mock assistant creation is well-structured and includes all necessary fields to simulate a realistic OpenAI assistant object. The configurability of key parameters (assistant_id, vector_store_ids, max_num_results) makes it flexible for different test scenarios.

backend/app/crud/assistants.py (2)

30-44: LGTM - Well-implemented OpenAI integration.

The function properly handles OpenAI API errors with appropriate logging and error handling. The use of masked assistant IDs in logs is a good security practice.


74-89: LGTM - Robust vector store ID extraction logic.

The implementation correctly handles the migration from single vector store ID to multiple IDs, with proper null checks and fallback handling for both tool resources and tools.

backend/app/tests/crud/test_assistants.py (1)

10-116: Excellent comprehensive test coverage.

This test class provides thorough coverage of the insert_assistant function, including:

  • Successful insertion with proper field mapping
  • Duplicate prevention (409 error)
  • Validation of required fields (400 error for missing instructions)
  • Default value handling (name defaults to ID)
  • Edge cases for missing vector stores and tool resources

The tests properly verify the migration from single vector_store_id to multiple vector_store_ids and handle all the various data scenarios effectively.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
backend/app/crud/assistants.py (1)

46-51: Fix return type annotation mismatch.

The function is annotated to return APIResponse[Assistant] but actually returns Assistant directly on line 108.

 def insert_assistant(
     session: Session,
     organization_id: int,
     project_id: int,
     openai_assistant: OpenAIAssistant,
-) -> APIResponse[Assistant]:
+) -> Assistant:
🧹 Nitpick comments (2)
backend/app/crud/assistants.py (2)

73-79: Consider more robust vector store ID extraction.

The current implementation uses multiple hasattr checks which may be fragile if the OpenAI API response structure changes. Consider using a try-catch approach for more robust handling.

 vector_store_ids = []
-if openai_assistant.tool_resources and hasattr(
-    openai_assistant.tool_resources, "file_search"
-):
-    file_search = openai_assistant.tool_resources.file_search
-    if file_search and hasattr(file_search, "vector_store_ids"):
-        vector_store_ids = file_search.vector_store_ids or []
+try:
+    if (openai_assistant.tool_resources and 
+        openai_assistant.tool_resources.file_search and
+        openai_assistant.tool_resources.file_search.vector_store_ids):
+        vector_store_ids = openai_assistant.tool_resources.file_search.vector_store_ids
+except AttributeError:
+    vector_store_ids = []

81-87: Extract magic number to a constant.

The default value 20 for max_num_results should be defined as a module-level constant for better maintainability.

+DEFAULT_MAX_NUM_RESULTS = 20
+
 def insert_assistant(
     # ... existing parameters
-    max_num_results = 20
+    max_num_results = DEFAULT_MAX_NUM_RESULTS
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77ac3f2 and a0203f6.

📒 Files selected for processing (1)
  • backend/app/crud/assistants.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/assistants.py (3)

1-13: LGTM: Clean imports and setup.

The imports are well-organized and include all necessary dependencies for OpenAI integration and database operations.


30-43: LGTM: Robust error handling for OpenAI API calls.

The function properly handles OpenAI API errors with appropriate logging (using masked assistant ID for privacy) and converts them to HTTP exceptions with meaningful error messages.


89-108: LGTM: Proper database operations and logging.

The Assistant model creation, database operations, and success logging are implemented correctly. The mapping from OpenAI assistant to local model handles the new vector_store_ids field appropriately for the schema migration.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/tests/utils/utils.py (1)

4-4: Replace deprecated Type import with type.

The static analysis tool correctly identifies that typing.Type is deprecated in favor of the built-in type.

-from typing import Type, TypeVar
+from typing import TypeVar

And update line 57 accordingly:

-def get_non_existent_id(session: Session, model: Type[T]) -> int:
+def get_non_existent_id(session: Session, model: type[T]) -> int:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 233ab82 and 6dd98d7.

📒 Files selected for processing (3)
  • backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py (1 hunks)
  • backend/app/tests/api/routes/test_assistants.py (1 hunks)
  • backend/app/tests/utils/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/tests/api/routes/test_assistants.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
  • get_user_by_email (34-37)
backend/app/crud/api_key.py (1)
  • get_api_key_by_value (104-122)
backend/app/models/api_key.py (1)
  • APIKeyPublic (23-25)
backend/app/models/project.py (1)
  • Project (28-49)
🪛 Ruff (0.11.9)
backend/app/tests/utils/utils.py

4-4: typing.Type is deprecated, use type instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/tests/utils/utils.py (2)

12-13: LGTM! Import additions support new functionality.

The new imports for get_api_key_by_value, APIKeyPublic, and Project are properly organized and support the utility functions in this file.


62-84: LGTM! Well-implemented utility function with good error handling.

The get_project function is well-designed with:

  • Clear docstring explaining the behavior
  • Proper optional parameter handling
  • Appropriate database querying with filters
  • Good error handling with descriptive ValueError

The function supports the new assistant ingestion workflows mentioned in the PR objectives and provides flexibility for both named and random project retrieval.

backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py (3)

1-16: LGTM! Proper migration setup.

The migration header follows Alembic conventions correctly with appropriate imports for PostgreSQL ARRAY support.


19-35: LGTM! Safe and correct upgrade migration.

The upgrade function follows migration best practices:

  • Adds the new column as nullable first for safety
  • Correctly migrates existing non-NULL data to single-element arrays
  • Drops the old column as the final step

The SQL logic properly handles the data type conversion from string to array.


26-32: LGTM! Correct SQL migration logic.

The SQL migration logic correctly handles:

  • PostgreSQL ARRAY type conversion
  • Proper 1-based array indexing (vector_store_ids[1])
  • NULL value safety with appropriate WHERE clauses
  • Empty array validation with array_length(vector_store_ids, 1) > 0

The data migration preserves existing data in the upgrade direction and handles the array-to-single conversion appropriately in the downgrade.

Also applies to: 50-56

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/app/tests/utils/utils.py (1)

4-4: Update deprecated type annotation import.

The static analysis tool correctly identifies that typing.Type is deprecated in favor of the built-in type (available since Python 3.9).

Apply this diff to use the modern type annotation:

-from typing import Type, TypeVar
+from typing import TypeVar

And update the function signature on line 57:

-def get_non_existent_id(session: Session, model: Type[T]) -> int:
+def get_non_existent_id(session: Session, model: type[T]) -> int:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd98d7 and 63c3b26.

📒 Files selected for processing (2)
  • backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py (1 hunks)
  • backend/app/tests/utils/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
  • get_user_by_email (34-37)
backend/app/crud/api_key.py (1)
  • get_api_key_by_value (104-122)
backend/app/models/api_key.py (1)
  • APIKeyPublic (23-25)
backend/app/models/project.py (1)
  • Project (28-49)
🪛 Ruff (0.11.9)
backend/app/tests/utils/utils.py

4-4: typing.Type is deprecated, use type instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/tests/utils/utils.py (2)

12-13: LGTM! Import additions support new functionality.

The new imports for get_api_key_by_value and Project model are appropriate for the added utility function and align with the multi-project structure introduced in the codebase.


62-82: LGTM! Well-implemented utility function.

The get_project function is well-designed with:

  • Clear documentation explaining the behavior
  • Proper error handling with descriptive error message
  • Flexible parameter design (optional name filtering)
  • Consistent coding patterns with other utility functions
  • Correct SQLModel query syntax

This utility will be valuable for tests that need project context in the new multi-project structure.

@avirajsingh7 avirajsingh7 linked an issue Jul 14, 2025 that may be closed by this pull request
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/app/tests/utils/utils.py (2)

4-4: Address the deprecated typing.Type import.

The static analysis tool correctly identifies that typing.Type is deprecated in favor of the built-in type for Python 3.9+.

Apply this diff to use the modern import:

-from typing import Type, TypeVar
+from typing import TypeVar

And update the type annotation on line 61:

-def get_non_existent_id(session: Session, model: Type[T]) -> int:
+def get_non_existent_id(session: Session, model: type[T]) -> int:

66-85: Clarify the "random" behavior in the documentation.

The function implementation is solid and follows good practices with proper error handling. However, the docstring mentions "fetch any random project" which is misleading since the function returns the first project found by the database query, not a truly random one.

Apply this diff to improve the documentation accuracy:

-    If no name is provided, fetch any random project.
+    If no name is provided, fetch any active project.

The function correctly handles the optional name parameter, includes appropriate error handling, and aligns with the project's database patterns. The use case explained in past comments makes sense for test scenarios where you need an existing project reference.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63c3b26 and b1296a0.

📒 Files selected for processing (2)
  • backend/app/crud/__init__.py (3 hunks)
  • backend/app/tests/utils/utils.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
  • get_user_by_email (34-37)
backend/app/crud/api_key.py (1)
  • get_api_key_by_value (104-122)
backend/app/models/api_key.py (1)
  • APIKeyPublic (23-25)
backend/app/models/project.py (1)
  • Project (28-49)
🪛 Ruff (0.12.2)
backend/app/crud/__init__.py

49-49: .assistants.get_assistant_by_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


50-50: .assistants.fetch_assistant_from_openai imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


51-51: .assistants.insert_assistant imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

backend/app/tests/utils/utils.py

4-4: typing.Type is deprecated, use type instead

(UP035)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/tests/utils/utils.py (1)

12-13: LGTM! Import additions support new functionality.

The new imports for get_api_key_by_value and Project model are correctly added to support the new get_project utility function and are properly organized.

backend/app/crud/__init__.py (2)

8-8: LGTM: Code organization improvement.

The blank line addition improves readability by better grouping the imports.


48-52: Assistant CRUD imports are utilized across the codebase

The functions imported in backend/app/crud/__init__.py are in active use, including in API routes and tests:

  • backend/app/api/routes/assistants.py uses fetch_assistant_from_openai and insert_assistant
  • backend/app/api/routes/responses.py uses get_assistant_by_id
  • backend/app/tests/crud/test_assistants.py covers insert_assistant
  • backend/app/tests/api/routes/test_assistants.py mocks and verifies fetch_assistant_from_openai
  • backend/app/tests/api/routes/test_responses.py mocks and verifies get_assistant_by_id

No changes needed here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
backend/app/tests/api/routes/test_assistants.py (1)

4-4: Remove unused import.

The import from app.main import app is not used in the test file.

-from app.main import app
backend/app/crud/assistants.py (1)

65-65: Fix inconsistent log message function names.

The log messages reference [insert_assistant] but the actual function is named sync_assistant. This creates confusion when debugging.

-            f"[insert_assistant] Assistant with ID {assistant_id} already exists in the database."
+            f"[sync_assistant] Assistant with ID {assistant_id} already exists in the database."
-        f"[insert_assistant] Successfully ingested assistant with ID {mask_string(assistant_id)}."
+        f"[sync_assistant] Successfully ingested assistant with ID {mask_string(assistant_id)}."

Also applies to: 111-111

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1296a0 and af6a40b.

📒 Files selected for processing (6)
  • backend/app/api/routes/assistants.py (1 hunks)
  • backend/app/crud/__init__.py (2 hunks)
  • backend/app/crud/assistants.py (2 hunks)
  • backend/app/tests/api/routes/test_assistants.py (1 hunks)
  • backend/app/tests/crud/test_assistants.py (1 hunks)
  • backend/app/utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/api/routes/assistants.py
  • backend/app/tests/crud/test_assistants.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/app/utils.py (1)
backend/app/crud/credentials.py (1)
  • get_provider_credential (90-118)
backend/app/tests/api/routes/test_assistants.py (1)
backend/app/tests/utils/openai.py (1)
  • mock_openai_assistant (10-39)
🪛 Ruff (0.12.2)
backend/app/crud/__init__.py

48-48: .assistants.get_assistant_by_id imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


49-49: .assistants.fetch_assistant_from_openai imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


50-50: .assistants.sync_assistant imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

backend/app/tests/api/routes/test_assistants.py

4-4: app.main.app imported but unused

Remove unused import: app.main.app

(F401)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/crud/__init__.py (1)

47-51: LGTM! New assistant functions properly exposed.

The import additions correctly expose the new fetch_assistant_from_openai and sync_assistant functions as part of the public CRUD interface, supporting the new assistant ingestion functionality.

backend/app/utils.py (1)

168-192: Well-implemented utility function.

The get_openai_client function correctly:

  • Validates OpenAI credentials exist for the org/project
  • Uses appropriate HTTP status codes (400 for missing credentials, 500 for client creation failure)
  • Provides clear error messages
  • Follows the existing codebase patterns
backend/app/tests/api/routes/test_assistants.py (1)

13-32: Test implementation looks good.

The test properly mocks the fetch_assistant_from_openai function and validates the expected response structure. The use of the mock_openai_assistant utility keeps the test clean and maintainable.

backend/app/crud/assistants.py (3)

30-48: Well-implemented OpenAI assistant fetching.

The function properly handles OpenAI API errors:

  • Uses 404 for NotFoundError (correct)
  • Uses 502 for OpenAIError (appropriate for upstream service errors)
  • Logs errors with masked assistant IDs for security

78-84: Vector store IDs extraction logic is correct.

The safe attribute checking with hasattr and proper fallback to empty list handles various OpenAI assistant configurations correctly.


86-92: Max results extraction logic is well-implemented.

The loop through tools to find file_search configuration with proper fallback to default value of 20 is correct and handles edge cases appropriately.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/app/crud/assistants.py (1)

86-92: Consider documenting the assumption about file_search tool precedence.

The code assumes that the first file_search tool found contains the relevant max_num_results configuration. While this is likely correct for typical use cases, consider adding a comment to document this assumption.

 max_num_results = 20
+# Use max_num_results from the first file_search tool found
 for tool in openai_assistant.tools or []:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af6a40b and 091f78d.

📒 Files selected for processing (2)
  • backend/app/crud/assistants.py (2 hunks)
  • backend/app/tests/api/routes/test_assistants.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • backend/app/tests/api/routes/test_assistants.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/crud/assistants.py (1)
backend/app/utils.py (1)
  • mask_string (156-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/assistants.py (3)

30-49: LGTM! Well-implemented OpenAI API integration with proper error handling.

The function correctly handles OpenAI API exceptions and uses appropriate HTTP status codes. The use of masked logging for assistant IDs is good security practice.


78-84: Robust defensive programming for OpenAI API structure.

The nested hasattr checks provide good protection against potential changes in the OpenAI API structure. The fallback to empty list for vector_store_ids is appropriate.


94-113: Well-structured database operations with proper transaction handling.

The Assistant model creation, database persistence, and logging follow SQLModel best practices. The fallback logic for the name field (openai_assistant.name or openai_assistant.id) handles potential None values correctly.

@avirajsingh7 avirajsingh7 requested a review from nishika26 July 16, 2025 13:24
@AkhileshNegi AkhileshNegi merged commit b404bee into main Jul 16, 2025
3 checks passed
@AkhileshNegi AkhileshNegi deleted the feature/openai-assistant-syncc branch July 16, 2025 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate vector_store_id to vector_store_ids in openai_assistant Table OpenAI Assistants: Scrape Assistants

3 participants