Skip to content

Conversation

@lpi-tn
Copy link
Collaborator

@lpi-tn lpi-tn commented Jun 30, 2025

This pull request introduces several changes to improve the handling of embedding models and their corresponding Qdrant collections in the qdrant_handler module and associated test files. The most significant updates include transitioning from language-specific collection names to embedding model-specific naming, refactoring the classify_documents_per_collection function for clarity and efficiency, and updating the test cases to align with these changes.

Updates to embedding model handling:

  • Modified the classify_documents_per_collection function in welearn_datastack/modules/qdrant_handler.py to use a mapping (model_name_collection_name) for resolving collections based on embedding model names, replacing the previous language-based logic. This simplifies the logic and reduces redundancy.

Test case updates for embedding model-specific collections:

  • Updated collection names in tests/qdrant_syncronizer/test_qdrant_handler.py and tests/qdrant_syncronizer/test_qdrant_syncronizer.py to follow the new embedding model-specific naming convention (e.g., collection_welearn_en_embmodelcollection_welearn_en_english-embmodel). [1] [2]
  • Refactored test cases in test_qdrant_handler.py to include the embedding model name as a parameter for FakeSlice instances, ensuring proper alignment with the new naming logic. [1] [2] [3]
  • Introduced an EmbeddingModel instance in test_qdrant_syncronizer.py to simulate embedding model metadata and updated DocumentSlice instances to reference this model. [1] [2]

Additional improvements:

  • Added defaultdict import in welearn_datastack/modules/qdrant_handler.py to simplify the initialization of the ret dictionary in classify_documents_per_collection.
  • Incorporated Mock and EmbeddingModel imports in test files to support the new embedding model logic. [1] [2]

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the collection mapping logic in the qdrant_handler module and updates the related test cases to use embedding model-specific naming instead of language‐based collection names. Key changes include:

  • Replacing manual dictionary initialization with a defaultdict(set) in classify_documents_per_collection.
  • Creating a mapping from embedding model title to collection name based on the fourth component of the collection name.
  • Updating test cases in qdrant_syncronizer and qdrant_handler to use the new naming convention.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
welearn_datastack/modules/qdrant_handler.py Refactored classification logic with new mapping.
tests/qdrant_syncronizer/test_qdrant_syncronizer.py Updated collection names and test setup.
tests/qdrant_syncronizer/test_qdrant_handler.py Updated FakeSlice usage and expected results.
Comments suppressed due to low confidence (2)

tests/qdrant_syncronizer/test_qdrant_handler.py:134

  • The expected collection name for the multilingual case ('collection_welearn_mul_mulembmodel') does not align with the new mapping using the embedding model title; either update the test expectation or adjust the classification function to implement the intended fallback behavior.
            "collection_welearn_mul_mulembmodel": {doc_id1},

welearn_datastack/modules/qdrant_handler.py:48

  • With the removal of the fallback logic, slices that do not match any key in the mapping are simply skipped; if multilingual fallback is intended (as suggested by test expectations), consider reintroducing logic to derive the collection name from the document's language.
        except KeyError:

lpi-tn and others added 2 commits June 30, 2025 14:42
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lpi-tn lpi-tn merged commit a1fa09a into main Jun 30, 2025
7 checks passed
@lpi-tn lpi-tn deleted the Fix/modify-select-model-for-slices branch June 30, 2025 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants