select model for inserting slices according the model id #52

lpi-tn · 2025-06-30T12:24:53Z

This pull request introduces several changes to improve the handling of embedding models and their corresponding Qdrant collections in the qdrant_handler module and associated test files. The most significant updates include transitioning from language-specific collection names to embedding model-specific naming, refactoring the classify_documents_per_collection function for clarity and efficiency, and updating the test cases to align with these changes.

Updates to embedding model handling:

Modified the classify_documents_per_collection function in welearn_datastack/modules/qdrant_handler.py to use a mapping (model_name_collection_name) for resolving collections based on embedding model names, replacing the previous language-based logic. This simplifies the logic and reduces redundancy.

Test case updates for embedding model-specific collections:

Updated collection names in tests/qdrant_syncronizer/test_qdrant_handler.py and tests/qdrant_syncronizer/test_qdrant_syncronizer.py to follow the new embedding model-specific naming convention (e.g., collection_welearn_en_embmodel → collection_welearn_en_english-embmodel). [1] [2]
Refactored test cases in test_qdrant_handler.py to include the embedding model name as a parameter for FakeSlice instances, ensuring proper alignment with the new naming logic. [1] [2] [3]
Introduced an EmbeddingModel instance in test_qdrant_syncronizer.py to simulate embedding model metadata and updated DocumentSlice instances to reference this model. [1] [2]

Additional improvements:

Added defaultdict import in welearn_datastack/modules/qdrant_handler.py to simplify the initialization of the ret dictionary in classify_documents_per_collection.
Incorporated Mock and EmbeddingModel imports in test files to support the new embedding model logic. [1] [2]

Copilot

Pull Request Overview

This PR refactors the collection mapping logic in the qdrant_handler module and updates the related test cases to use embedding model-specific naming instead of language‐based collection names. Key changes include:

Replacing manual dictionary initialization with a defaultdict(set) in classify_documents_per_collection.
Creating a mapping from embedding model title to collection name based on the fourth component of the collection name.
Updating test cases in qdrant_syncronizer and qdrant_handler to use the new naming convention.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
welearn_datastack/modules/qdrant_handler.py	Refactored classification logic with new mapping.
tests/qdrant_syncronizer/test_qdrant_syncronizer.py	Updated collection names and test setup.
tests/qdrant_syncronizer/test_qdrant_handler.py	Updated FakeSlice usage and expected results.

Comments suppressed due to low confidence (2)

tests/qdrant_syncronizer/test_qdrant_handler.py:134

The expected collection name for the multilingual case ('collection_welearn_mul_mulembmodel') does not align with the new mapping using the embedding model title; either update the test expectation or adjust the classification function to implement the intended fallback behavior.

            "collection_welearn_mul_mulembmodel": {doc_id1},

welearn_datastack/modules/qdrant_handler.py:48

With the removal of the fallback logic, slices that do not match any key in the mapping are simply skipped; if multilingual fallback is intended (as suggested by test expectations), consider reintroducing logic to derive the collection name from the document's language.

        except KeyError:

welearn_datastack/modules/qdrant_handler.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

select model for inserting slices according the model id

91dbb40

lpi-tn requested review from Copilot, jmsevin and sandragjacinto June 30, 2025 12:24

Copilot AI reviewed Jun 30, 2025

View reviewed changes

welearn_datastack/modules/qdrant_handler.py Outdated Show resolved Hide resolved

sandragjacinto reviewed Jun 30, 2025

View reviewed changes

welearn_datastack/modules/qdrant_handler.py Outdated Show resolved Hide resolved

sandragjacinto approved these changes Jun 30, 2025

View reviewed changes

lpi-tn and others added 2 commits June 30, 2025 14:42

Update welearn_datastack/modules/qdrant_handler.py

2089896

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

linter

0d67f60

lpi-tn merged commit a1fa09a into main Jun 30, 2025
7 checks passed

lpi-tn deleted the Fix/modify-select-model-for-slices branch June 30, 2025 13:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

select model for inserting slices according the model id #52

select model for inserting slices according the model id #52

Uh oh!

lpi-tn commented Jun 30, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

select model for inserting slices according the model id #52

select model for inserting slices according the model id #52

Uh oh!

Conversation

lpi-tn commented Jun 30, 2025

Updates to embedding model handling:

Test case updates for embedding model-specific collections:

Additional improvements:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants