Collaborator

nishika26 commented Jun 19, 2025

Summary

Target issue is #217

  • Refactored POST /collections/create to first insert the collection in the DB with status='processing' before triggering background tasks.

  • Async task now handles vector store + assistant creation and updates the collection with LLM details and status (success/failed).

  • Reused existing CollectionCrud and DocumentCollectionCrud methods for updates and associations.
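
The flow described in the summary can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual code: the in-memory `COLLECTIONS` store, `build_llm_resources`, `run_creation`, and the placeholder LLM values are all hypothetical, and a thread stands in for the real background task mechanism. The status strings follow the summary above; the exact values in the codebase may differ.

```python
import threading
import uuid

# Hypothetical in-memory stand-in for the collection table.
COLLECTIONS: dict[str, dict] = {}


def build_llm_resources(documents: list[str]) -> dict:
    """Placeholder for the slow vector store + assistant creation."""
    if not documents:
        raise ValueError("no documents to index")
    return {"llm_service_id": "asst_example", "llm_service_name": "example-model"}


def run_creation(collection_id: str, documents: list[str]) -> None:
    """Background job: fill in the LLM details and set the final status."""
    record = COLLECTIONS[collection_id]
    try:
        record.update(build_llm_resources(documents))
        record["status"] = "success"
    except Exception:
        record["status"] = "failed"


def create_collection(documents: list[str]) -> tuple[str, threading.Thread]:
    """Insert the row first, then start the slow work in the background."""
    collection_id = str(uuid.uuid4())
    COLLECTIONS[collection_id] = {
        "status": "processing",
        "llm_service_id": None,
        "llm_service_name": None,
    }
    worker = threading.Thread(target=run_creation, args=(collection_id, documents))
    worker.start()
    # The caller gets a usable id immediately, before the slow work finishes.
    return collection_id, worker
```

Because the record exists before the worker starts, a client that polls with the returned id always finds a row, even if it is still marked as processing.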

NOTES

  • Logs a few important parameters, for example: INFO - Collection created: 425111c7-25a3-42d3-b373-c97202d9aad5 | Time: 7.514757871627808s | Files: 1 |Sizes:[137.57] KB |Types: ['txt']

  • To log the size(s) of the files being uploaded, the file size is calculated first; this logic was added to core/cloud/storage.py
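
One way to compute a file size without consuming the upload stream is to seek to the end and then restore the position. The sketch below is an assumption about how such a helper might look, not the code added to core/cloud/storage.py; `file_size_kb` and `log_collection_created` are hypothetical names, and the message format is only loosely modeled on the log line quoted above.

```python
import io
import logging
from typing import BinaryIO

logger = logging.getLogger(__name__)


def file_size_kb(stream: BinaryIO) -> float:
    """Return the stream size in KB without consuming it."""
    position = stream.tell()
    stream.seek(0, io.SEEK_END)
    size_bytes = stream.tell()
    stream.seek(position)  # restore so the upload still reads from the same place
    return round(size_bytes / 1024, 2)


def log_collection_created(
    collection_id: str, elapsed: float, files: dict[str, BinaryIO]
) -> str:
    """Log one summary line per created collection and return the message."""
    sizes = [file_size_kb(stream) for stream in files.values()]
    # Use the text after the last dot as the type; empty if there is no dot.
    types = [name.rsplit(".", 1)[-1] if "." in name else "" for name in files]
    message = (
        f"Collection created: {collection_id} | Time: {elapsed}s | "
        f"Files: {len(files)} | Sizes: {sizes} KB | Types: {types}"
    )
    logger.info(message)
    return message
```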

codecov bot commented Jun 19, 2025

nishika26 changed the title from "routes and deps" to "Refactor Collection Creation for no delay" Jun 19, 2025
nishika26 self-assigned this Jun 19, 2025
nishika26 moved this to In Progress in Dev Priorities Jun 19, 2025
nishika26 linked an issue Jun 19, 2025 that may be closed by this pull request
nishika26 marked this pull request as ready for review June 19, 2025 15:12
nishika26 requested review from AkhileshNegi and avirajsingh7 and removed request for AkhileshNegi June 19, 2025 17:58
AkhileshNegi removed the status in Dev Priorities Jun 20, 2025
Comment on lines 31 to 35
"collection", "llm_service_id", existing_type=sa.VARCHAR(), nullable=True
)
op.alter_column(
"collection", "llm_service_name", existing_type=sa.VARCHAR(), nullable=True
)
Collaborator

These columns were already in the table; why do we need them again?

Collaborator Author

They are not being added again; they are being altered from non-nullable to nullable columns.
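
The reason the columns need to be nullable follows from the new flow: the row is inserted before any LLM details exist. The sketch below illustrates this with an in-memory SQLite table as a stand-in; it is not the project's schema or its Alembic migration, and the placeholder values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE collection (
        id TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        llm_service_id TEXT,      -- nullable: unknown while processing
        llm_service_name TEXT     -- nullable: unknown while processing
    )
    """
)

# Step 1: the request handler inserts the row before the LLM resources exist.
# With NOT NULL columns this insert would be rejected.
conn.execute(
    "INSERT INTO collection (id, status) VALUES (?, ?)", ("c1", "processing")
)
pending = conn.execute(
    "SELECT status, llm_service_id FROM collection WHERE id = ?", ("c1",)
).fetchone()

# Step 2: the background task fills in the details once they are known.
conn.execute(
    "UPDATE collection SET status = ?, llm_service_id = ?, llm_service_name = ? "
    "WHERE id = ?",
    ("success", "asst_example", "example-model", "c1"),
)
done = conn.execute(
    "SELECT status, llm_service_id FROM collection WHERE id = ?", ("c1",)
).fetchone()
```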

Collaborator

ok cool

Comment on lines 212 to 214
assistant = assistant_crud.create(
vector_store.id, **dict(request.extract_super_type(AssistantOptions))
)
Collaborator

Add a log statement here.

)


CurrentUserOrgproject = Annotated[UserProjectOrg, Depends(get_current_user_org_project)]
Collaborator

While CurrentUserOrgProject is perfectly fine and clear, you could also consider a more concise alternative like CurrentUserContext or CurrentUserScope.

Likewise, UserProjectOrg could be renamed to UserContext or UserScope.

Collaborator Author

nishika26 Jun 20, 2025

You are right about using it this way, but let's keep this for later.

api_key_headers: dict[str, str],
):
user = get_user_from_api_key(db, api_key_headers)
collection = create_collection(db, user, status="processing")
Collaborator

When writing test cases, make sure to delete any entries created in the database during the test.
We want to maintain a consistent and clean database state across all tests.

This issue is currently present throughout many of our test cases and needs to be addressed to ensure reliable and isolated testing.

Maybe you can refer to this and use a teardown function.

Ideally, we should seed necessary data at the start of the test (we can use seed_script), and any additional data created during the test should be deleted once it has been used.
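
A minimal sketch of this seed-then-clean-up pattern, using `unittest` and an in-memory SQLite database from the standard library. The table, the row ids, and the `CollectionTests` class are hypothetical; the project's own tests would use its real session and CRUD helpers instead.

```python
import sqlite3
import unittest

# Hypothetical shared test database, seeded once, as a seed script would do.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE collection (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
DB.execute("INSERT INTO collection VALUES ('seeded', 'success')")
DB.commit()


class CollectionTests(unittest.TestCase):
    def setUp(self) -> None:
        # Track every row this test creates so tearDown can remove it.
        self.created_ids: list[str] = []

    def tearDown(self) -> None:
        # Delete only what this test created; seeded rows stay untouched.
        for collection_id in self.created_ids:
            DB.execute("DELETE FROM collection WHERE id = ?", (collection_id,))
        DB.commit()

    def test_processing_collection(self) -> None:
        DB.execute("INSERT INTO collection VALUES ('temp', 'processing')")
        self.created_ids.append("temp")
        row = DB.execute(
            "SELECT status FROM collection WHERE id = 'temp'"
        ).fetchone()
        self.assertEqual(row, ("processing",))


def run_suite() -> unittest.TestResult:
    """Run the test case programmatically and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CollectionTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

After the suite runs, only the seeded row remains, so the next test starts from the same database state.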

return user.id


def get_real_api_key_headers(db: Session) -> dict[str, str]:
Collaborator

Currently, every time the get_real_api_key_headers function is called, it creates a new API key, project, and organization.

Instead, you can seed this data once at the beginning of the test session and reuse the same API key across test cases, similar to how normal_user_token and super_user_token are handled.

This approach reduces redundant setup, improves test performance, and ensures consistency across tests.

AkhileshNegi merged commit 36650dd into main Jun 21, 2025
1 check passed
AkhileshNegi deleted the bug/response_delay branch June 21, 2025 05:02
Development

Successfully merging this pull request may close these issues.

resource key returned in the async api has a delay in creation
