-
Couldn't load subscription status.
- Fork 5
Add API to Ingest OpenAI Assistant and Migrate to Support Multiple Vector Store IDs #280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
| { | ||
| "type": "file_search", | ||
| "vector_store_ids": [assistant.vector_store_id], | ||
| "vector_store_ids": assistant.vector_store_ids, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just confirm did u test if it comes in the same format and the API works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AkhileshNegi vector_store_id does not exist in db.
What do you mean by in same format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in the im sarying that vector_store_ids key expects value to be list, do we get list here and the API is working
backend/app/api/routes/assistant.py
Outdated
| vector_store_ids = [] | ||
| if assistant.tool_resources and hasattr(assistant.tool_resources, "file_search"): | ||
| file_search = assistant.tool_resources.file_search | ||
| if file_search and hasattr(file_search, "vector_store_ids"): | ||
| vector_store_ids = file_search.vector_store_ids or [] | ||
|
|
||
| max_num_results = 20 | ||
| for tool in assistant.tools or []: | ||
| if tool.type == "file_search": | ||
| file_search = getattr(tool, "file_search", None) | ||
| if file_search and hasattr(file_search, "max_num_results"): | ||
| max_num_results = file_search.max_num_results | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if there is better way to get nested key rather than going nested if
WalkthroughThe changes introduce support for associating multiple vector store IDs with assistants instead of a single ID. This involves schema migrations, model and API updates, new CRUD and route logic for ingesting assistants from OpenAI, and expanded seed and test data to accommodate multiple projects and assistants. New tests and utilities are added for the new ingestion flow. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as Assistants Route
participant CRUD as CRUD Layer
participant OpenAI as OpenAI API
participant DB as Database
Client->>API: POST /assistant/{assistant_id}/ingest
API->>CRUD: get_openai_client(session, org_id, proj_id)
API->>CRUD: fetch_assistant_from_openai(assistant_id, client)
OpenAI-->>API: OpenAI Assistant Data
API->>CRUD: sync_assistant(session, org_id, proj_id, openai_assistant)
CRUD->>DB: Store assistant with vector_store_ids
DB-->>CRUD: Assistant record
CRUD-->>API: Assistant record
API-->>Client: Success response with assistant data
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
backend/app/api/routes/responses.py (1)
128-128: LGTM: Correct migration to multiple vector store IDs.The changes properly handle the migration from single
vector_store_idto multiplevector_store_ids. The conditional check and direct list passing to the OpenAI API are correctly implemented.Also applies to: 132-132
backend/app/crud/__init__.py (1)
8-10: Remove duplicate import ofget_provider_credential.The function
get_provider_credentialis imported twice - here on line 9 and again on line 38. Remove this duplicate import to avoid confusion.- -from .credentials import get_provider_credential - +
🧹 Nitpick comments (3)
backend/app/models/assistants.py (1)
15-17: Update type annotation to use built-inlist.The model change correctly implements multiple vector store ID support. However, consider using the built-in
listtype instead ofListfrom typing for Python 3.9+ compatibility.- vector_store_ids: List[str] = Field( + vector_store_ids: list[str] = Field(backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py (1)
10-10: Remove unused import.The
sqlmodel.sql.sqltypesimport is not used in this migration.-import sqlmodel.sql.sqltypesbackend/app/crud/assistants.py (1)
74-81: Consider simplifying nested attribute checks.The nested attribute checking could be more concise using getattr with default values.
- vector_store_ids = [] - if openai_assistant.tool_resources and hasattr( - openai_assistant.tool_resources, "file_search" - ): - file_search = openai_assistant.tool_resources.file_search - if file_search and hasattr(file_search, "vector_store_ids"): - vector_store_ids = file_search.vector_store_ids or [] + vector_store_ids = [] + tool_resources = getattr(openai_assistant, "tool_resources", None) + if tool_resources: + file_search = getattr(tool_resources, "file_search", None) + if file_search: + vector_store_ids = getattr(file_search, "vector_store_ids", []) or []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py(1 hunks)backend/app/api/main.py(1 hunks)backend/app/api/routes/assistant.py(1 hunks)backend/app/api/routes/responses.py(1 hunks)backend/app/crud/__init__.py(2 hunks)backend/app/crud/assistants.py(2 hunks)backend/app/models/assistants.py(2 hunks)backend/app/tests/api/routes/test_responses.py(1 hunks)backend/app/utils.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/crud/assistants.py (3)
backend/app/models/assistants.py (1)
Assistant(24-33)backend/app/utils.py (3)
APIResponse(27-46)handle_openai_error(168-172)mask_string(156-165)backend/app/api/routes/responses.py (1)
handle_openai_error(22-26)
🪛 Ruff (0.11.9)
backend/app/models/assistants.py
15-15: Use list instead of List for type annotation
Replace with list
(UP006)
backend/app/crud/__init__.py
9-9: .credentials.get_provider_credential imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
48-48: .assistants.get_assistant_by_id imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
49-49: .assistants.fetch_assistant_from_openai imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
50-50: .assistants.insert_assistant imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py
10-10: sqlmodel.sql.sqltypes imported but unused
Remove unused import: sqlmodel.sql.sqltypes
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/utils.py (2)
13-13: LGTM: Clean import addition.The OpenAI import is properly placed and necessary for the new error handling function.
168-173: LGTM: Robust error handling implementation.The function properly extracts user-friendly error messages from OpenAI errors with appropriate fallback handling.
backend/app/api/main.py (1)
18-18: LGTM: Proper route integration.The assistant routes are correctly imported and included following the established pattern in the codebase.
Also applies to: 24-24
backend/app/tests/api/routes/test_responses.py (1)
83-83: LGTM: Test updated to match schema migration.The change from
vector_store_id = Nonetovector_store_ids = []correctly aligns with the migration to support multiple vector store IDs while maintaining the same test logic.backend/app/models/assistants.py (1)
4-5: LGTM: Necessary imports for PostgreSQL array support.The SQLAlchemy imports are correctly added to support the ARRAY column type for multiple vector store IDs.
backend/app/api/routes/assistant.py (1)
19-54: LGTM! Clean implementation following best practices.The route implementation properly:
- Uses dependency injection for database and user context
- Leverages CRUD functions for business logic separation
- Handles errors appropriately with meaningful status codes
- Uses the centralized
configure_openaiutility as suggested in past reviews
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/app/crud/assistants.py (1)
47-52: Fix return type annotation.The function is annotated to return
APIResponse[Assistant]but actually returnsAssistantdirectly.def insert_assistant( session: Session, organization_id: int, project_id: int, openai_assistant: OpenAIAssistant, -) -> APIResponse[Assistant]: +) -> Assistant:
🧹 Nitpick comments (5)
backend/app/tests/utils/utils.py (2)
4-4: Update type annotations to use modern Python syntax.According to static analysis,
typing.Typeis deprecated. Use the built-intypeinstead.-from typing import Optional, Type, TypeVar +from typing import Optional, TypeVar
62-83: Improve boolean comparisons and type annotations.The function logic is correct, but there are style improvements that align with Python best practices.
Apply these improvements:
-def get_project(session: Session, name: Optional[str] = None) -> Project: +def get_project(session: Session, name: str | None = None) -> Project: """ Retrieve an active project from the database. If a project name is provided, fetch the active project with that name. If no name is provided, fetch any random project. """ if name: statement = ( select(Project) - .where(Project.name == name, Project.is_active == True) + .where(Project.name == name, Project.is_active) .limit(1) ) else: - statement = select(Project).where(Project.is_active == True).limit(1) + statement = select(Project).where(Project.is_active).limit(1) project = session.exec(statement).first() if not project: raise ValueError("No active projects found") return projectbackend/app/tests/api/routes/test_assistants.py (3)
1-8: Remove unused imports.Several imports are not used in the test file and should be removed to keep the codebase clean.
-import pytest -from fastapi.testclient import TestClient -from sqlmodel import Session -from unittest.mock import MagicMock, patch -from app.main import app -from app.tests.utils.utils import random_email -from app.core.security import get_password_hash -from app.tests.utils.openai import mock_openai_assistant +import pytest +from fastapi.testclient import TestClient +from sqlmodel import Session +from unittest.mock import patch +from app.main import app +from app.tests.utils.openai import mock_openai_assistant
18-23: Fix inconsistent type hints and remove unused parameter.The
normal_user_api_key_headerparameter has inconsistent type hints between functions (str vs dict), and thedbparameter is unused.@patch("app.api.routes.assistants.fetch_assistant_from_openai") def test_ingest_assistant_success( mock_fetch_assistant, - db: Session, - normal_user_api_key_header: str, + normal_user_api_key_header: dict, ):
43-48: Remove unused parameter.The
dbparameter is not used in this test function.@patch("app.api.routes.assistants.configure_openai") def test_ingest_assistant_openai_not_configured( mock_configure_openai, - db: Session, normal_user_api_key_header: dict, ):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
backend/app/api/main.py(2 hunks)backend/app/api/routes/assistants.py(1 hunks)backend/app/crud/assistants.py(2 hunks)backend/app/seed_data/seed_data.json(1 hunks)backend/app/seed_data/seed_data.py(2 hunks)backend/app/tests/api/routes/test_assistants.py(1 hunks)backend/app/tests/api/routes/test_responses.py(2 hunks)backend/app/tests/crud/test_assistants.py(1 hunks)backend/app/tests/utils/openai.py(1 hunks)backend/app/tests/utils/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/api/main.py
- backend/app/tests/api/routes/test_responses.py
🧰 Additional context used
🧬 Code Graph Analysis (4)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
get_user_by_email(34-37)backend/app/crud/api_key.py (1)
get_api_key_by_value(104-122)backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/models/project.py (1)
Project(28-49)
backend/app/api/routes/assistants.py (6)
backend/app/api/deps.py (2)
get_db(33-35)get_current_user_org_project(110-131)backend/app/core/util.py (1)
configure_openai(74-93)backend/app/crud/assistants.py (2)
fetch_assistant_from_openai(30-44)insert_assistant(47-109)backend/app/crud/credentials.py (1)
get_provider_credential(90-107)backend/app/models/user.py (1)
UserProjectOrg(68-69)backend/app/utils.py (2)
APIResponse(27-46)success_response(34-37)
backend/app/tests/api/routes/test_assistants.py (4)
backend/app/tests/utils/utils.py (1)
random_email(28-29)backend/app/core/security.py (1)
get_password_hash(93-103)backend/app/tests/utils/openai.py (1)
mock_openai_assistant(10-39)backend/app/tests/conftest.py (1)
db(27-41)
backend/app/tests/crud/test_assistants.py (4)
backend/app/tests/utils/openai.py (1)
mock_openai_assistant(10-39)backend/app/tests/utils/utils.py (1)
get_project(62-83)backend/app/crud/assistants.py (1)
insert_assistant(47-109)backend/app/tests/conftest.py (1)
db(27-41)
🪛 Ruff (0.11.9)
backend/app/tests/utils/utils.py
4-4: typing.Type is deprecated, use type instead
(UP035)
62-62: Use X | Y for type annotations
Convert to X | Y
(UP007)
72-72: Avoid equality comparisons to True; use if Project.is_active: for truth checks
Replace with Project.is_active
(E712)
76-76: Avoid equality comparisons to True; use if Project.is_active: for truth checks
Replace with Project.is_active
(E712)
backend/app/tests/utils/openai.py
12-12: Use X | Y for type annotations
Convert to X | Y
(UP007)
12-12: Do not use mutable data structures for argument defaults
Replace with None; initialize within function
(B006)
backend/app/tests/api/routes/test_assistants.py
4-4: unittest.mock.MagicMock imported but unused
Remove unused import: unittest.mock.MagicMock
(F401)
6-6: app.tests.utils.utils.random_email imported but unused
Remove unused import: app.tests.utils.utils.random_email
(F401)
7-7: app.core.security.get_password_hash imported but unused
Remove unused import: app.core.security.get_password_hash
(F401)
21-21: Unused function argument: db
(ARG001)
46-46: Unused function argument: db
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (14)
backend/app/seed_data/seed_data.py (2)
60-60: LGTM! Field update aligns with schema migration.The change from
vector_store_id: strtovector_store_ids: list[str]correctly reflects the database schema migration to support multiple vector store IDs per assistant.
264-264: LGTM! Constructor argument updated correctly.The assistant creation logic has been updated to use the new
vector_store_idsfield, maintaining consistency with the model changes.backend/app/seed_data/seed_data.json (4)
15-20: LGTM! Added multi-project support.The addition of the "Dalgo" project expands the seed data to support multiple projects, which aligns with the enhanced multi-project functionality introduced in this PR.
65-72: LGTM! Added credentials for the new project.The credential structure for the Dalgo project is consistent with the existing Glific credentials, providing proper OpenAI API key configuration for both projects.
80-80: LGTM! Updated to support multiple vector store IDs.The change from
vector_store_idtovector_store_idsas an array correctly reflects the database schema migration and enables support for multiple vector stores per assistant.Also applies to: 91-91
86-96: LGTM! Added project-specific assistant.The Dalgo assistant entry is properly structured and maintains consistency with the Glific assistant while being project-specific.
backend/app/api/routes/assistants.py (4)
1-16: LGTM! Clean imports and router setup.The imports are well-organized and the router configuration follows FastAPI conventions with appropriate prefix and tags.
19-28: LGTM! Proper route definition and dependency injection.The route definition follows FastAPI best practices with:
- Appropriate HTTP method (POST) for data creation
- Clear path parameter with type annotation and description
- Proper dependency injection for database session and user context
- Correct status code for resource creation (201)
32-44: LGTM! Robust credential handling and error management.The implementation properly:
- Retrieves provider credentials with organization and project context
- Configures the OpenAI client with proper error handling
- Returns meaningful HTTP 400 error when OpenAI is not configured
46-54: LGTM! Clean assistant ingestion workflow.The final workflow properly:
- Fetches the assistant from OpenAI using the configured client
- Inserts the assistant with proper organization and project association
- Returns a success response with the stored assistant data
backend/app/tests/utils/openai.py (1)
15-39: LGTM! Comprehensive mock assistant creation.The mock assistant creation is well-structured and includes all necessary fields to simulate a realistic OpenAI assistant object. The configurability of key parameters (assistant_id, vector_store_ids, max_num_results) makes it flexible for different test scenarios.
backend/app/crud/assistants.py (2)
30-44: LGTM - Well-implemented OpenAI integration.The function properly handles OpenAI API errors with appropriate logging and error handling. The use of masked assistant IDs in logs is a good security practice.
74-89: LGTM - Robust vector store ID extraction logic.The implementation correctly handles the migration from single vector store ID to multiple IDs, with proper null checks and fallback handling for both tool resources and tools.
backend/app/tests/crud/test_assistants.py (1)
10-116: Excellent comprehensive test coverage.This test class provides thorough coverage of the
insert_assistantfunction, including:
- Successful insertion with proper field mapping
- Duplicate prevention (409 error)
- Validation of required fields (400 error for missing instructions)
- Default value handling (name defaults to ID)
- Edge cases for missing vector stores and tool resources
The tests properly verify the migration from single
vector_store_idto multiplevector_store_idsand handle all the various data scenarios effectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
backend/app/crud/assistants.py (1)
46-51: Fix return type annotation mismatch.The function is annotated to return
APIResponse[Assistant]but actually returnsAssistantdirectly on line 108.def insert_assistant( session: Session, organization_id: int, project_id: int, openai_assistant: OpenAIAssistant, -) -> APIResponse[Assistant]: +) -> Assistant:
🧹 Nitpick comments (2)
backend/app/crud/assistants.py (2)
73-79: Consider more robust vector store ID extraction.The current implementation uses multiple
hasattrchecks which may be fragile if the OpenAI API response structure changes. Consider using a try-catch approach for more robust handling.vector_store_ids = [] -if openai_assistant.tool_resources and hasattr( - openai_assistant.tool_resources, "file_search" -): - file_search = openai_assistant.tool_resources.file_search - if file_search and hasattr(file_search, "vector_store_ids"): - vector_store_ids = file_search.vector_store_ids or [] +try: + if (openai_assistant.tool_resources and + openai_assistant.tool_resources.file_search and + openai_assistant.tool_resources.file_search.vector_store_ids): + vector_store_ids = openai_assistant.tool_resources.file_search.vector_store_ids +except AttributeError: + vector_store_ids = []
81-87: Extract magic number to a constant.The default value
20formax_num_resultsshould be defined as a module-level constant for better maintainability.+DEFAULT_MAX_NUM_RESULTS = 20 + def insert_assistant( # ... existing parameters - max_num_results = 20 + max_num_results = DEFAULT_MAX_NUM_RESULTS
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
backend/app/crud/assistants.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/assistants.py (3)
1-13: LGTM: Clean imports and setup.The imports are well-organized and include all necessary dependencies for OpenAI integration and database operations.
30-43: LGTM: Robust error handling for OpenAI API calls.The function properly handles OpenAI API errors with appropriate logging (using masked assistant ID for privacy) and converts them to HTTP exceptions with meaningful error messages.
89-108: LGTM: Proper database operations and logging.The Assistant model creation, database operations, and success logging are implemented correctly. The mapping from OpenAI assistant to local model handles the new
vector_store_idsfield appropriately for the schema migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/tests/utils/utils.py (1)
4-4: Replace deprecatedTypeimport withtype.The static analysis tool correctly identifies that
typing.Typeis deprecated in favor of the built-intype.-from typing import Type, TypeVar +from typing import TypeVarAnd update line 57 accordingly:
-def get_non_existent_id(session: Session, model: Type[T]) -> int: +def get_non_existent_id(session: Session, model: type[T]) -> int:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py(1 hunks)backend/app/tests/api/routes/test_assistants.py(1 hunks)backend/app/tests/utils/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/tests/api/routes/test_assistants.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
get_user_by_email(34-37)backend/app/crud/api_key.py (1)
get_api_key_by_value(104-122)backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/models/project.py (1)
Project(28-49)
🪛 Ruff (0.11.9)
backend/app/tests/utils/utils.py
4-4: typing.Type is deprecated, use type instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (5)
backend/app/tests/utils/utils.py (2)
12-13: LGTM! Import additions support new functionality.The new imports for
get_api_key_by_value,APIKeyPublic, andProjectare properly organized and support the utility functions in this file.
62-84: LGTM! Well-implemented utility function with good error handling.The
get_projectfunction is well-designed with:
- Clear docstring explaining the behavior
- Proper optional parameter handling
- Appropriate database querying with filters
- Good error handling with descriptive ValueError
The function supports the new assistant ingestion workflows mentioned in the PR objectives and provides flexibility for both named and random project retrieval.
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py (3)
1-16: LGTM! Proper migration setup.The migration header follows Alembic conventions correctly with appropriate imports for PostgreSQL ARRAY support.
19-35: LGTM! Safe and correct upgrade migration.The upgrade function follows migration best practices:
- Adds the new column as nullable first for safety
- Correctly migrates existing non-NULL data to single-element arrays
- Drops the old column as the final step
The SQL logic properly handles the data type conversion from string to array.
26-32: LGTM! Correct SQL migration logic.The SQL migration logic correctly handles:
- PostgreSQL ARRAY type conversion
- Proper 1-based array indexing (
vector_store_ids[1])- NULL value safety with appropriate WHERE clauses
- Empty array validation with
array_length(vector_store_ids, 1) > 0The data migration preserves existing data in the upgrade direction and handles the array-to-single conversion appropriately in the downgrade.
Also applies to: 50-56
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/tests/utils/utils.py (1)
4-4: Update deprecated type annotation import.The static analysis tool correctly identifies that
typing.Typeis deprecated in favor of the built-intype(available since Python 3.9).Apply this diff to use the modern type annotation:
-from typing import Type, TypeVar +from typing import TypeVarAnd update the function signature on line 57:
-def get_non_existent_id(session: Session, model: Type[T]) -> int: +def get_non_existent_id(session: Session, model: type[T]) -> int:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py(1 hunks)backend/app/tests/utils/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
get_user_by_email(34-37)backend/app/crud/api_key.py (1)
get_api_key_by_value(104-122)backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/models/project.py (1)
Project(28-49)
🪛 Ruff (0.11.9)
backend/app/tests/utils/utils.py
4-4: typing.Type is deprecated, use type instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (2)
backend/app/tests/utils/utils.py (2)
12-13: LGTM! Import additions support new functionality.The new imports for
get_api_key_by_valueandProjectmodel are appropriate for the added utility function and align with the multi-project structure introduced in the codebase.
62-82: LGTM! Well-implemented utility function.The
get_projectfunction is well-designed with:
- Clear documentation explaining the behavior
- Proper error handling with descriptive error message
- Flexible parameter design (optional name filtering)
- Consistent coding patterns with other utility functions
- Correct SQLModel query syntax
This utility will be valuable for tests that need project context in the new multi-project structure.
backend/app/alembic/versions/f2589428c1d0_change_vector_store_id_to_vector_store_ids.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/app/tests/utils/utils.py (2)
4-4: Address the deprecatedtyping.Typeimport.The static analysis tool correctly identifies that
typing.Typeis deprecated in favor of the built-intypefor Python 3.9+.Apply this diff to use the modern import:
-from typing import Type, TypeVar +from typing import TypeVarAnd update the type annotation on line 61:
-def get_non_existent_id(session: Session, model: Type[T]) -> int: +def get_non_existent_id(session: Session, model: type[T]) -> int:
66-85: Clarify the "random" behavior in the documentation.The function implementation is solid and follows good practices with proper error handling. However, the docstring mentions "fetch any random project" which is misleading since the function returns the first project found by the database query, not a truly random one.
Apply this diff to improve the documentation accuracy:
- If no name is provided, fetch any random project. + If no name is provided, fetch any active project.The function correctly handles the optional name parameter, includes appropriate error handling, and aligns with the project's database patterns. The use case explained in past comments makes sense for test scenarios where you need an existing project reference.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/crud/__init__.py(3 hunks)backend/app/tests/utils/utils.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/tests/utils/utils.py (4)
backend/app/crud/user.py (1)
get_user_by_email(34-37)backend/app/crud/api_key.py (1)
get_api_key_by_value(104-122)backend/app/models/api_key.py (1)
APIKeyPublic(23-25)backend/app/models/project.py (1)
Project(28-49)
🪛 Ruff (0.12.2)
backend/app/crud/__init__.py
49-49: .assistants.get_assistant_by_id imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
50-50: .assistants.fetch_assistant_from_openai imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
51-51: .assistants.insert_assistant imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/tests/utils/utils.py
4-4: typing.Type is deprecated, use type instead
(UP035)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/tests/utils/utils.py (1)
12-13: LGTM! Import additions support new functionality.The new imports for
get_api_key_by_valueandProjectmodel are correctly added to support the newget_projectutility function and are properly organized.backend/app/crud/__init__.py (2)
8-8: LGTM: Code organization improvement.The blank line addition improves readability by better grouping the imports.
48-52: Assistant CRUD imports are utilized across the codebaseThe functions imported in
backend/app/crud/__init__.pyare in active use, including in API routes and tests:
backend/app/api/routes/assistants.pyusesfetch_assistant_from_openaiandinsert_assistantbackend/app/api/routes/responses.pyusesget_assistant_by_idbackend/app/tests/crud/test_assistants.pycoversinsert_assistantbackend/app/tests/api/routes/test_assistants.pymocks and verifiesfetch_assistant_from_openaibackend/app/tests/api/routes/test_responses.pymocks and verifiesget_assistant_by_idNo changes needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
backend/app/tests/api/routes/test_assistants.py (1)
4-4: Remove unused import.The import
from app.main import appis not used in the test file.-from app.main import appbackend/app/crud/assistants.py (1)
65-65: Fix inconsistent log message function names.The log messages reference
[insert_assistant]but the actual function is namedsync_assistant. This creates confusion when debugging.- f"[insert_assistant] Assistant with ID {assistant_id} already exists in the database." + f"[sync_assistant] Assistant with ID {assistant_id} already exists in the database."- f"[insert_assistant] Successfully ingested assistant with ID {mask_string(assistant_id)}." + f"[sync_assistant] Successfully ingested assistant with ID {mask_string(assistant_id)}."Also applies to: 111-111
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backend/app/api/routes/assistants.py(1 hunks)backend/app/crud/__init__.py(2 hunks)backend/app/crud/assistants.py(2 hunks)backend/app/tests/api/routes/test_assistants.py(1 hunks)backend/app/tests/crud/test_assistants.py(1 hunks)backend/app/utils.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/api/routes/assistants.py
- backend/app/tests/crud/test_assistants.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/app/utils.py (1)
backend/app/crud/credentials.py (1)
get_provider_credential(90-118)
backend/app/tests/api/routes/test_assistants.py (1)
backend/app/tests/utils/openai.py (1)
mock_openai_assistant(10-39)
🪛 Ruff (0.12.2)
backend/app/crud/__init__.py
48-48: .assistants.get_assistant_by_id imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
49-49: .assistants.fetch_assistant_from_openai imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
50-50: .assistants.sync_assistant imported but unused; consider removing, adding to __all__, or using a redundant alias
(F401)
backend/app/tests/api/routes/test_assistants.py
4-4: app.main.app imported but unused
Remove unused import: app.main.app
(F401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (6)
backend/app/crud/__init__.py (1)
47-51: LGTM! New assistant functions properly exposed.The import additions correctly expose the new
fetch_assistant_from_openaiandsync_assistantfunctions as part of the public CRUD interface, supporting the new assistant ingestion functionality.backend/app/utils.py (1)
168-192: Well-implemented utility function.The
get_openai_clientfunction correctly:
- Validates OpenAI credentials exist for the org/project
- Uses appropriate HTTP status codes (400 for missing credentials, 500 for client creation failure)
- Provides clear error messages
- Follows the existing codebase patterns
backend/app/tests/api/routes/test_assistants.py (1)
13-32: Test implementation looks good.The test properly mocks the
fetch_assistant_from_openaifunction and validates the expected response structure. The use of themock_openai_assistantutility keeps the test clean and maintainable.backend/app/crud/assistants.py (3)
30-48: Well-implemented OpenAI assistant fetching.The function properly handles OpenAI API errors:
- Uses 404 for
NotFoundError(correct)- Uses 502 for
OpenAIError(appropriate for upstream service errors)- Logs errors with masked assistant IDs for security
78-84: Vector store IDs extraction logic is correct.The safe attribute checking with
hasattrand proper fallback to empty list handles various OpenAI assistant configurations correctly.
86-92: Max results extraction logic is well-implemented.The loop through tools to find file_search configuration with proper fallback to default value of 20 is correct and handles edge cases appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/app/crud/assistants.py (1)
86-92: Consider documenting the assumption about file_search tool precedence.The code assumes that the first
file_searchtool found contains the relevantmax_num_resultsconfiguration. While this is likely correct for typical use cases, consider adding a comment to document this assumption.max_num_results = 20 +# Use max_num_results from the first file_search tool found for tool in openai_assistant.tools or []:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/app/crud/assistants.py(2 hunks)backend/app/tests/api/routes/test_assistants.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/app/tests/api/routes/test_assistants.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/app/crud/assistants.py (1)
backend/app/utils.py (1)
mask_string(156-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/assistants.py (3)
30-49: LGTM! Well-implemented OpenAI API integration with proper error handling.The function correctly handles OpenAI API exceptions and uses appropriate HTTP status codes. The use of masked logging for assistant IDs is good security practice.
78-84: Robust defensive programming for OpenAI API structure.The nested
hasattrchecks provide good protection against potential changes in the OpenAI API structure. The fallback to empty list forvector_store_idsis appropriate.
94-113: Well-structured database operations with proper transaction handling.The Assistant model creation, database persistence, and logging follow SQLModel best practices. The fallback logic for the name field (
openai_assistant.name or openai_assistant.id) handles potential None values correctly.
Summary
This PR adds support for ingesting OpenAI Assistants into the platform and refactors the openai_assistant table to support multiple vector stores per assistant via a new vector_store_ids field.
Target issue is #277 and #272
Key Changes
New Route: /api/v1/assistant/{assistant_id}/ingest
Model Changes: Assistant
Replaces vector_store_id: str with vector_store_ids: List[str].
Uses PostgreSQL ARRAY(String) for storage.
Updated Alembic migration:
Adds vector_store_ids column.
Copies existing vector_store_id to vector_store_ids as a single-element array.
Drops the vector_store_id column.
Checklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Summary by CodeRabbit
New Features
Bug Fixes
Tests
Database Migration