@lpi-tn lpi-tn commented Jun 16, 2025

This pull request improves support for multi-lingual document collections in the Qdrant handler and its tests. It adds a constant for the multi-lingual collection code, changes the collection naming logic, and adds new test cases to validate these changes.

Enhancements for Multi-Lingual Support:

  • welearn_datastack/constants.py: Added a new constant QDRANT_MULTI_LINGUAL_CODE to represent multi-lingual collections.
  • welearn_datastack/modules/qdrant_handler.py: Updated the classify_documents_per_collection function to prioritize multi-lingual collection names using QDRANT_MULTI_LINGUAL_CODE, with a fallback to language-specific collection names if the multi-lingual collection does not exist.
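The lookup order described above can be sketched as follows. This is only an illustration, not the code from qdrant_handler.py: the value "mul", the helper names build_collection_name and pick_collection, and the "collection_<code>_<model>" naming scheme are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Hypothetical value; the actual constant lives in welearn_datastack/constants.py.
QDRANT_MULTI_LINGUAL_CODE = "mul"


def build_collection_name(lang_code: str, model_name: str) -> str:
    """Hypothetical naming scheme, used only for this illustration."""
    return f"collection_{lang_code}_{model_name}"


def pick_collection(
    existing: Iterable[str], lang: str, model_name: str
) -> Optional[str]:
    """Prefer the multi-lingual collection, then the language-specific one."""
    known = set(existing)
    for code in (QDRANT_MULTI_LINGUAL_CODE, lang):
        candidate = build_collection_name(code, model_name)
        if candidate in known:
            return candidate
    # Neither collection exists; the caller decides how to report it.
    return None
```

With this ordering, a French document lands in the multi-lingual collection whenever one exists for its embedding model, and in the French collection otherwise.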

Test Updates:

  • tests/qdrant_syncronizer/test_qdrant_handler.py:
    • Enhanced the FakeSlice class to accept a dynamic embedding_model_name parameter and added a unique id attribute for better traceability.
    • Added a new test case test_should_handle_multiple_slices_for_same_collection_with_multi_lingual_collection to verify the behavior of the multi-lingual collection handling logic.
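A rough sketch of what such a test could look like is given below. It is not the test from the repository: the FakeSlice fields other than embedding_model_name and id, the group_by_collection helper, the "mul" code, and the collection naming scheme are assumptions chosen to keep the example self-contained.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeSlice:
    """Illustrative stand-in; document_lang is an assumed field."""
    embedding_model_name: str
    document_lang: str = "en"
    # A fresh id per instance makes each slice traceable in assertions.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


def group_by_collection(slices, existing, multi_code="mul"):
    """Group slice ids per collection, preferring the multi-lingual one."""
    grouped = defaultdict(set)
    for s in slices:
        for code in (multi_code, s.document_lang):
            name = f"collection_{code}_{s.embedding_model_name}"
            if name in existing:
                grouped[name].add(s.id)
                break
    return dict(grouped)


def test_multiple_slices_share_multi_lingual_collection():
    slices = [FakeSlice("model_a", "en"), FakeSlice("model_a", "fr")]
    existing = {"collection_mul_model_a", "collection_en_model_a"}
    result = group_by_collection(slices, existing)
    # Both slices end up in the multi-lingual collection,
    # even though an English-specific collection also exists.
    assert result == {"collection_mul_model_a": {s.id for s in slices}}
```

Passing embedding_model_name per instance lets a single test mix slices for different models, and the per-instance id makes it possible to assert exactly which slices were assigned where.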

Dependency Adjustments:

@lpi-tn lpi-tn requested review from Copilot and sandragjacinto June 16, 2025 14:22

Copilot AI left a comment

Pull Request Overview

This PR enhances multi-lingual support in the Qdrant handler by introducing a new multi-lingual collection constant, updating collection selection logic, and adding test cases to validate the behavior.

  • Introduced QDRANT_MULTI_LINGUAL_CODE constant in constants.py
  • Updated classify_documents_per_collection to prioritize multi-lingual collections with a fallback to language-specific names
  • Added new test case and improved FakeSlice in the test suite

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File: Description
  • welearn_datastack/constants.py: Added QDRANT_MULTI_LINGUAL_CODE constant for multi-lingual support
  • welearn_datastack/modules/qdrant_handler.py: Updated collection naming logic in classify_documents_per_collection
  • tests/qdrant_syncronizer/test_qdrant_handler.py: Expanded FakeSlice and added test case to cover multi-lingual behavior

Comments suppressed due to low confidence (1)

welearn_datastack/modules/qdrant_handler.py:55

  • [nitpick] Consider using a consistent logging format throughout the function. Either use string placeholders with logger.error or f-strings consistently to improve readability and maintainability.
logger.error(f"Collection {collection_name} not found in Qdrant, {dslice.id} will be ignored",)
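For comparison, the placeholder variant of the same message could look like the sketch below. The logger name and the report_missing_collection wrapper are hypothetical and exist only to make the snippet runnable; the message text is taken from the line quoted above.

```python
import logging

logger = logging.getLogger("qdrant_handler_example")


def report_missing_collection(collection_name: str, slice_id: str) -> None:
    # Placeholder style: the logging module interpolates the arguments
    # lazily, only if the record is actually emitted.
    logger.error(
        "Collection %s not found in Qdrant, %s will be ignored",
        collection_name,
        slice_id,
    )
```

Either style produces the same message; the point of the nitpick is to pick one and use it throughout the function.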

@lpi-tn lpi-tn merged commit 0a308b8 into main Jun 16, 2025
7 checks passed
@lpi-tn lpi-tn deleted the Feature/adapt-qs-to-multi-linguage branch June 16, 2025 14:44