Description:
During the RAG (Retrieval-Augmented Generation) pipeline execution, multiple documents are being skipped or filtered out due to:
Empty embedding vectors (e.g., Documents 5310–5319).
Embedding size mismatch (e.g., VanillaChestLoot.java has an embedding of size 0, but 2048 was expected).
We expect to add targeted repair functions for these cases. :-)
Log Excerpt:
2025-07-08 23:38:47,328 - INFO - api.data_pipeline - data_pipeline.py:804 - Loaded 16050 documents from existing database
2025-07-08 23:38:47,328 - INFO - api.rag - rag.py:421 - Loaded 16050 documents for retrieval
2025-07-08 23:38:47,331 - WARNING - api.rag - rag.py:335 - Document 5310 has empty embedding vector, skipping
2025-07-08 23:38:47,332 - WARNING - api.rag - rag.py:335 - Document 5311 has empty embedding vector, skipping
2025-07-08 23:38:47,332 - WARNING - api.rag - rag.py:335 - Document 5312 has empty embedding vector, skipping
2025-07-08 23:38:47,332 - WARNING - api.rag - rag.py:335 - Document 5313 has empty embedding vector, skipping
2025-07-08 23:38:47,332 - WARNING - api.rag - rag.py:335 - Document 5314 has empty embedding vector, skipping
2025-07-08 23:38:47,332 - WARNING - api.rag - rag.py:335 - Document 5315 has empty embedding vector, skipping
2025-07-08 23:38:47,333 - WARNING - api.rag - rag.py:335 - Document 5316 has empty embedding vector, skipping
2025-07-08 23:38:47,333 - WARNING - api.rag - rag.py:335 - Document 5317 has empty embedding vector, skipping
2025-07-08 23:38:47,333 - WARNING - api.rag - rag.py:335 - Document 5318 has empty embedding vector, skipping
2025-07-08 23:38:47,333 - WARNING - api.rag - rag.py:335 - Document 5319 has empty embedding vector, skipping
2025-07-08 23:38:47,337 - INFO - api.rag - rag.py:350 - Target embedding size: 2048 (found in 16040 documents)
2025-07-08 23:38:47,339 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaChestLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,339 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaChestLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,339 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaChestLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,339 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaEntityLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,339 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaEntityLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,340 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaEntityLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,340 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaEntityLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,340 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaFishingLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,340 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaGiftLoot.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,340 - WARNING - api.rag - rag.py:377 - Filtering out document 'src\main\java\net\minecraft\data\loot\packs\VanillaLootTableProvider.java' due to embedding size mismatch: 0 != 2048
2025-07-08 23:38:47,344 - INFO - api.rag - rag.py:384 - Embedding validation complete: 16040/16050 documents have valid embeddings
2025-07-08 23:38:47,344 - WARNING - api.rag - rag.py:390 - Filtered out 10 documents due to embedding issues
2025-07-08 23:38:47,344 - INFO - api.rag - rag.py:429 - Using 16040 documents with valid embeddings for retrieval
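
A targeted repair pass along the lines requested above could look roughly like the sketch below. It is not the project's actual code: `Doc`, `find_target_size`, `split_by_validity`, `repair_embeddings`, and the `embed_fn` callable are all hypothetical names introduced for illustration. The sketch assumes, as the log suggests, that the target size is the most common non-empty embedding size, and that a broken document can be repaired by re-embedding its text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Doc:
    # Hypothetical stand-in for the pipeline's document type.
    path: str
    text: str
    vector: List[float] = field(default_factory=list)


def find_target_size(docs: Sequence[Doc]) -> int:
    """Return the most common non-empty embedding size, or 0 if there is none."""
    sizes = Counter(len(d.vector) for d in docs if d.vector)
    return sizes.most_common(1)[0][0] if sizes else 0


def split_by_validity(docs: Sequence[Doc], target: int) -> Tuple[List[Doc], List[Doc]]:
    """Partition documents into those matching the target size and the rest."""
    valid: List[Doc] = []
    broken: List[Doc] = []
    for d in docs:
        if target > 0 and len(d.vector) == target:
            valid.append(d)
        else:
            broken.append(d)
    return valid, broken


def repair_embeddings(
    docs: Sequence[Doc],
    embed_fn: Callable[[str], Sequence[float]],
) -> Tuple[List[Doc], List[Doc]]:
    """Re-embed documents with empty or mismatched vectors.

    embed_fn is an assumed callable that maps text to a vector. Returns
    (usable, still_broken). If no document has a non-empty vector, there
    is no target size to repair against and nothing is repaired.
    """
    target = find_target_size(docs)
    usable, broken = split_by_validity(docs, target)
    still_broken: List[Doc] = []
    for d in broken:
        vec = list(embed_fn(d.text))
        if target > 0 and len(vec) == target:
            d.vector = vec  # repair in place
            usable.append(d)
        else:
            still_broken.append(d)
    return usable, still_broken
```

With a pass like this, the ten documents reported as filtered out would be retried once before being dropped, and anything left in `still_broken` could be logged with its path for manual inspection.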