lpi-tn commented Nov 18, 2025

This pull request enhances the handling of document slices that lack an embedding model, ensuring they are properly classified and flagged for traceability in the Qdrant synchronization workflow. It updates both the core logic and the associated tests to account for this scenario.

Improvements to document classification and error handling:

  • Updated the classify_documents_per_collection function in qdrant_handler.py to classify slices without an embedding model under the None collection key and log an error for each such slice.
  • Modified the return type of classify_documents_per_collection to include None as a possible key, reflecting the new error-handling logic.
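
A minimal sketch of the classification behaviour described above, not the code from qdrant_handler.py. The DocumentSlice and EmbeddingModel dataclasses, the classify_slices name, and the use of the model title as the collection key are assumptions made for illustration. Only the None key, the per-slice error log, and the AttributeError raised when reading embedding_model.title are taken from this conversation.

```python
import logging
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class EmbeddingModel:
    """Hypothetical stand-in for the project's embedding model record."""
    title: str


@dataclass(frozen=True)
class DocumentSlice:
    """Hypothetical stand-in for the project's document slice record."""
    id: int
    embedding_model: Optional[EmbeddingModel] = None


def classify_slices(slices: Iterable[DocumentSlice]) -> Dict[Optional[str], Set[int]]:
    """Group slice ids per collection key; slices without a model go under None."""
    # The None key is always present, even when no slice is missing its model.
    classified: Dict[Optional[str], Set[int]] = {None: set()}
    for doc_slice in slices:
        try:
            # Assumption: the collection key is derived from the model title.
            key: Optional[str] = doc_slice.embedding_model.title
        except AttributeError:
            # embedding_model is None, so there is no title to read.
            logger.error("Slice %s has no embedding model", doc_slice.id)
            key = None
        classified.setdefault(key, set()).add(doc_slice.id)
    return classified
```

Keeping the None key in every result means callers can read it unconditionally, at the cost of having to remove it before iterating over real collections.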

Workflow enhancements for traceability:

  • In qdrant_syncronizer.py, added logic to flag and record document slices assigned to the None collection by creating a ProcessState entry for each, then removing them from further processing.
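
The flag-then-remove step can be sketched roughly as follows. This is an illustration under assumptions, not the code in qdrant_syncronizer.py: the ProcessState dataclass stands in for the real model, its fields and the "NO_EMBEDDING_MODEL" state title are hypothetical, and the sketch returns the new entries instead of persisting them to a database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ProcessState:
    """Hypothetical stand-in for the project's ProcessState model."""
    document_id: int
    title: str


def flag_slices_without_collection(
    classified: Dict[Optional[str], Set[int]],
    state_title: str = "NO_EMBEDDING_MODEL",  # hypothetical state name
) -> List[ProcessState]:
    """Create one ProcessState per id under the None key, then drop that key."""
    # pop() reads the orphaned ids and removes the None key in one step,
    # so the remaining dictionary only holds real collections.
    orphan_ids = classified.pop(None, set())
    return [
        ProcessState(document_id=doc_id, title=state_title)
        for doc_id in sorted(orphan_ids)
    ]
```

After the call, the dictionary passed in no longer contains the None key, so later steps that iterate over collections never see the unclassified slices.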

Test coverage improvements:

  • Added and updated tests in test_qdrant_handler.py to verify that slices without an embedding model are correctly assigned to the None collection and that the expected output includes this key in various scenarios.
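
For illustration, tests of this kind might look like the sketch below. It is not taken from test_qdrant_handler.py: classify_slices is a hypothetical stand-in for the function under test, the slices are built with SimpleNamespace instead of the project's models, and the logger name is made up for the example.

```python
import logging
import unittest
from types import SimpleNamespace

logger = logging.getLogger("qdrant_handler_sketch")  # hypothetical logger name


def classify_slices(slices):
    """Hypothetical stand-in for classify_documents_per_collection."""
    classified = {None: set()}
    for doc_slice in slices:
        try:
            key = doc_slice.embedding_model.title
        except AttributeError:
            # embedding_model is None: log and fall back to the None key.
            logger.error("Slice %s has no embedding model", doc_slice.id)
            key = None
        classified.setdefault(key, set()).add(doc_slice.id)
    return classified


class TestNoneCollection(unittest.TestCase):
    def test_slice_without_model_goes_to_none_key(self):
        slices = [
            SimpleNamespace(id=1, embedding_model=SimpleNamespace(title="model_a")),
            SimpleNamespace(id=2, embedding_model=None),
        ]
        # assertLogs fails the test if no ERROR record is emitted.
        with self.assertLogs("qdrant_handler_sketch", level="ERROR") as captured:
            result = classify_slices(slices)
        self.assertEqual(result, {None: {2}, "model_a": {1}})
        self.assertEqual(len(captured.records), 1)

    def test_none_key_present_even_when_empty(self):
        slices = [
            SimpleNamespace(id=1, embedding_model=SimpleNamespace(title="model_a")),
        ]
        self.assertEqual(classify_slices(slices), {None: set(), "model_a": {1}})
```

The second test mirrors the change to existing tests, which now expect an empty None set even when every slice has a model.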

lpi-tn requested a review from Copilot November 18, 2025 14:52

Copilot AI left a comment

Pull Request Overview

This PR adds error handling for document slices that lack an embedding model during Qdrant synchronization. These slices are now classified under a None collection key, logged as errors, and flagged in the database for traceability before being excluded from further processing.

Key changes:

  • Modified classify_documents_per_collection to catch AttributeError when accessing embedding_model.title and classify affected slices under the None key
  • Added workflow logic to create ProcessState entries for documents in the None collection before removing them from processing
  • Updated all existing tests to expect the None key in the returned dictionary

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| welearn_datastack/modules/qdrant_handler.py | Added try-except block to handle missing embedding models and classify them under None collection key |
| welearn_datastack/nodes_workflow/QdrantSyncronizer/qdrant_syncronizer.py | Added logic to flag documents without collections via ProcessState entries before removing them |
| tests/qdrant_syncronizer/test_qdrant_handler.py | Added new test for None collection handling and updated existing tests to include empty None set |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
lpi-tn merged commit 20f7051 into main Nov 18, 2025
7 checks passed
lpi-tn deleted the Fix/qdrant-syncronizer-no-embedding-model branch November 18, 2025 15:59