fix: handle documents without embedding model in classification #72

lpi-tn · 2025-11-18T14:52:59Z

This pull request enhances the handling of document slices that lack an embedding model, ensuring they are properly classified and flagged for traceability in the Qdrant synchronization workflow. It updates both the core logic and the associated tests to account for this scenario.

Improvements to document classification and error handling:

Updated the classify_documents_per_collection function in qdrant_handler.py to classify slices without an embedding model under the None collection key and log an error for each such slice.
Modified the return type of classify_documents_per_collection to include None as a possible key, reflecting the new error-handling logic.
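
To make the classification change concrete, here is a minimal sketch of the behavior described above. It is not the code from qdrant_handler.py: the names classify_slices_sketch, DocumentSlice and EmbeddingModel are hypothetical stand-ins, and keying collections directly by the model title is a simplifying assumption. The AttributeError handling follows the approach summarized in the review overview.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class EmbeddingModel:
    """Hypothetical stand-in for the project's embedding model record."""
    title: str


@dataclass(frozen=True)
class DocumentSlice:
    """Hypothetical stand-in for a document slice."""
    id: str
    embedding_model: Optional[EmbeddingModel] = None


def classify_slices_sketch(
    slices: Iterable[DocumentSlice],
) -> Dict[Optional[str], Set[str]]:
    """Group slice ids per collection; slices without a model go under None.

    Assumption: the collection key is simply the embedding model title.
    """
    # The None key is always present, even when no slice ends up in it.
    per_collection: Dict[Optional[str], Set[str]] = {None: set()}
    for doc_slice in slices:
        try:
            key = doc_slice.embedding_model.title
        except AttributeError:
            # embedding_model is None, so there is no title to read.
            logger.error("Slice %s has no embedding model", doc_slice.id)
            per_collection[None].add(doc_slice.id)
            continue
        per_collection.setdefault(key, set()).add(doc_slice.id)
    return per_collection
```

The return annotation uses Optional[str] for the key, which mirrors the return-type change mentioned above: callers have to expect None alongside real collection names.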

Workflow enhancements for traceability:

In qdrant_syncronizer.py, added logic to flag and record document slices assigned to the None collection by creating a ProcessState entry for each, then removing them from further processing.
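
The flag-then-remove step could look roughly like the sketch below. It is only an illustration under stated assumptions: ProcessStateSketch is a hypothetical stand-in for the project's ProcessState model, the state title "NO_EMBEDDING_MODEL" is a placeholder, and persisting the entries to the database is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ProcessStateSketch:
    """Hypothetical stand-in for the project's ProcessState entry."""
    document_id: str
    title: str


def flag_and_drop_unclassified(
    per_collection: Dict[Optional[str], Set[str]],
    state_title: str = "NO_EMBEDDING_MODEL",  # placeholder, not the real value
) -> List[ProcessStateSketch]:
    """Build one state entry per slice under None, then drop the None key."""
    # pop() removes the None key so later steps only see real collections.
    unclassified = per_collection.pop(None, set())
    # sorted() keeps the output order deterministic for logging and tests.
    return [
        ProcessStateSketch(document_id=doc_id, title=state_title)
        for doc_id in sorted(unclassified)
    ]
```

Removing the None key in the same step that creates the entries means no later stage can try to sync these slices to a collection that does not exist.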

Test coverage improvements:

Added and updated tests in test_qdrant_handler.py to verify that slices without an embedding model are correctly assigned to the None collection and that the expected output includes this key in various scenarios.
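
As a rough illustration of what such tests check, the sketch below exercises a simplified stand-in function rather than the real classify_documents_per_collection. The helper classify_sketch, the SimpleNamespace fixtures and the use of unittest are all assumptions made to keep the example self-contained; the actual tests in test_qdrant_handler.py may be structured differently.

```python
import unittest
from types import SimpleNamespace
from typing import Dict, Iterable, Optional, Set


def classify_sketch(slices: Iterable) -> Dict[Optional[str], Set[str]]:
    """Simplified stand-in for the function under test (not project code)."""
    result: Dict[Optional[str], Set[str]] = {None: set()}
    for doc_slice in slices:
        try:
            key = doc_slice.embedding_model.title
        except AttributeError:
            key = None  # no embedding model attached to this slice
        result.setdefault(key, set()).add(doc_slice.id)
    return result


class ClassifyNoneKeyTest(unittest.TestCase):
    def test_slice_without_model_goes_under_none(self):
        with_model = SimpleNamespace(
            id="a", embedding_model=SimpleNamespace(title="model-x")
        )
        without_model = SimpleNamespace(id="b", embedding_model=None)
        result = classify_sketch([with_model, without_model])
        self.assertEqual(result, {None: {"b"}, "model-x": {"a"}})

    def test_none_key_present_even_when_empty(self):
        with_model = SimpleNamespace(
            id="a", embedding_model=SimpleNamespace(title="model-x")
        )
        result = classify_sketch([with_model])
        self.assertEqual(result, {None: set(), "model-x": {"a"}})
```

The second case reflects the update to the existing tests: the None key is expected in the output with an empty set even when every slice has a model.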

Copilot

Pull Request Overview

This PR adds error handling for document slices that lack an embedding model during Qdrant synchronization. These slices are now classified under a None collection key, logged as errors, and flagged in the database for traceability before being excluded from further processing.

Key changes:

Modified classify_documents_per_collection to catch AttributeError when accessing embedding_model.title and classify affected slices under the None key
Added workflow logic to create ProcessState entries for documents in the None collection before removing them from processing
Updated all existing tests to expect the None key in the returned dictionary

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
welearn_datastack/modules/qdrant_handler.py	Added try-except block to handle missing embedding models and classify them under `None` collection key
welearn_datastack/nodes_workflow/QdrantSyncronizer/qdrant_syncronizer.py	Added logic to flag documents without collections via `ProcessState` entries before removing them
tests/qdrant_syncronizer/test_qdrant_handler.py	Added new test for None collection handling and updated existing tests to include empty `None` set

welearn_datastack/modules/qdrant_handler.py

welearn_datastack/nodes_workflow/QdrantSyncronizer/qdrant_syncronizer.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix: handle documents without embedding model in classification

5c98cac

lpi-tn requested a review from Copilot November 18, 2025 14:52

Copilot AI reviewed Nov 18, 2025

welearn_datastack/modules/qdrant_handler.py (outdated, resolved)

welearn_datastack/nodes_workflow/QdrantSyncronizer/qdrant_syncronizer.py (resolved)

Update welearn_datastack/modules/qdrant_handler.py

a7c7191

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

sandragjacinto approved these changes Nov 18, 2025

lpi-tn merged commit 20f7051 into main Nov 18, 2025
7 checks passed

lpi-tn deleted the Fix/qdrant-syncronizer-no-embedding-model branch November 18, 2025 15:59


3 participants
