-
Notifications
You must be signed in to change notification settings - Fork 5
Refactor Collection Creation for no delay #243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is 📢 Thoughts on this report? Let us know! |
backend/app/alembic/versions/75b5156a28fd_add_organization_id_project_id_status_.py
Outdated
Show resolved
Hide resolved
| "collection", "llm_service_id", existing_type=sa.VARCHAR(), nullable=True | ||
| ) | ||
| op.alter_column( | ||
| "collection", "llm_service_name", existing_type=sa.VARCHAR(), nullable=True | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these were already there in table why we need them again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are not being added again, they are getting altered from non-nullable to nullable columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok cool
| assistant = assistant_crud.create( | ||
| vector_store.id, **dict(request.extract_super_type(AssistantOptions)) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add log
backend/app/api/deps.py
Outdated
| ) | ||
|
|
||
|
|
||
| CurrentUserOrgproject = Annotated[UserProjectOrg, Depends(get_current_user_org_project)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While CurrentUserOrgProject is perfectly fine and clear, you could also consider a more concise alternative like CurrentUserContext or CurrentUserScope.
And UserProjectOrg as UserContext or UserScope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about using it this way but let's keep this for later
| api_key_headers: dict[str, str], | ||
| ): | ||
| user = get_user_from_api_key(db, api_key_headers) | ||
| collection = create_collection(db, user, status="processing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing test cases, make sure to delete any entries created in the database during the test.
We want to maintain a consistent and clean database state across all tests.
This issue is currently present throughout many of our test cases and needs to be addressed to ensure reliable and isolated testing.
May be you can refer this and use teardown function.
Ideally, we should seed necessary data at the start of the test (we can use seed_script), and any additional data created during the test should be deleted once it has been used.
backend/app/tests/utils/utils.py
Outdated
| return user.id | ||
|
|
||
|
|
||
| def get_real_api_key_headers(db: Session) -> dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, every time the get_real_api_key_headers function is called, it creates a new API key, project, and organization.
Instead, you can seed this data once at the beginning of the test session and reuse the same API key across test cases—similar to how normal_user_token and super_user_token are handled.
This approach reduces redundant setup, improves test performance, and ensures consistency across tests.
Summary
Target issue is #217
Refactored POST /collections/create to first insert the collection in the DB with status='processing' before triggering background tasks.
Async task now handles vector store + assistant creation and updates the collection with LLM details and status (success/failed).
Reused existing CollectionCrud and DocumentCollectionCrud methods for updates and associations.
NOTES
Logging few important parameters such as - INFO - Collection created: 425111c7-25a3-42d3-b373-c97202d9aad5 | Time: 7.514757871627808s | Files: 1 |Sizes:[137.57] KB |Types: ['txt']
To log file size(s) of files being uploaded, this logic was used to first calculate the file size and then the logic was added to core/cloud/storage.py