fix: handle documents without embedding model in classification #72
This pull request enhances the handling of document slices that lack an embedding model, ensuring they are properly classified and flagged for traceability in the Qdrant synchronization workflow. It updates both the core logic and the associated tests to account for this scenario.
Improvements to document classification and error handling:
- Updated the `classify_documents_per_collection` function in `qdrant_handler.py` to classify slices without an embedding model under the `None` collection key and to log an error for each such slice. [1] [2]
- Updated `classify_documents_per_collection` to include `None` as a possible key, reflecting the new error-handling logic.

Workflow enhancements for traceability:
- In `qdrant_syncronizer.py`, added logic to flag and record document slices assigned to the `None` collection by creating a `ProcessState` entry for each, then removing them from further processing.

Test coverage improvements:
- Updated `test_qdrant_handler.py` to verify that slices without an embedding model are correctly assigned to the `None` collection and that the expected output includes this key in various scenarios. [1] [2] [3] [4]
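The classify-then-flag flow described in this PR can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `EmbeddingModel`, `DocumentSlice`, the `collection_name` attribute, the `flag_unclassified_slices` helper, the `"error"` status value and the fields given to `ProcessState` are all hypothetical stand-ins. Only the function name `classify_documents_per_collection`, the `ProcessState` name and the use of `None` as a collection key come from the description above.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class EmbeddingModel:
    # Hypothetical stand-in for the project's embedding model record.
    name: str
    collection_name: str


@dataclass
class DocumentSlice:
    # Hypothetical slice shape: only the fields this sketch needs.
    slice_id: str
    embedding_model: Optional[EmbeddingModel]


@dataclass
class ProcessState:
    # Hypothetical traceability record; the real ProcessState model may differ.
    slice_id: str
    status: str
    message: str


def classify_documents_per_collection(
    slices: list[DocumentSlice],
) -> dict[Optional[str], list[DocumentSlice]]:
    """Group slices by target collection; slices without a model go under None."""
    groups: dict[Optional[str], list[DocumentSlice]] = defaultdict(list)
    for doc_slice in slices:
        if doc_slice.embedding_model is None:
            # Log one error per slice so each missing model is traceable.
            logger.error("Slice %s has no embedding model", doc_slice.slice_id)
            groups[None].append(doc_slice)
        else:
            groups[doc_slice.embedding_model.collection_name].append(doc_slice)
    return dict(groups)


def flag_unclassified_slices(
    groups: dict[Optional[str], list[DocumentSlice]],
    states: list[ProcessState],
) -> dict[Optional[str], list[DocumentSlice]]:
    """Hypothetical helper: record a ProcessState per None slice, then drop the key.

    Mutates `groups` in place (removes the None key) and returns it.
    """
    for doc_slice in groups.pop(None, []):
        states.append(
            ProcessState(
                slice_id=doc_slice.slice_id,
                status="error",
                message="No embedding model; slice skipped during Qdrant sync",
            )
        )
    return groups


if __name__ == "__main__":
    model = EmbeddingModel(name="example-model", collection_name="docs_example")
    slices = [DocumentSlice("a", model), DocumentSlice("b", None)]

    groups = classify_documents_per_collection(slices)  # logs an error for "b"
    print(list(groups))  # ['docs_example', None]

    states: list[ProcessState] = []
    remaining = flag_unclassified_slices(groups, states)
    print(list(remaining))  # ['docs_example']
    print(states[0].slice_id)  # b
```

Keeping unembedded slices under an explicit `None` key, instead of silently skipping them during classification, leaves them visible to the caller; that is what allows the synchronizer step to turn each one into a traceable record before excluding it from the Qdrant upsert.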