fix(rag): guard against None embeddings in LlamaIndex pipeline by kagura-agent · Pull Request #347 · HKUDS/DeepTutor

kagura-agent · 2026-04-20T05:13:02Z

Summary

Fixes #346: RAG queries crash with `TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'` when a stored embedding vector is `None`.

Root Cause

When an embedding provider returns {"embedding": null} for a chunk, two things go wrong:

1. `_extract_embeddings_from_response` uses `item.get("embedding", [])`, but `dict.get()` only returns the default when the key is absent, not when the value is explicitly `None`. So `None` passes through.
2. `CustomEmbedding._get_text_embeddings` trusts the result without validation, so `None` vectors are stored in the index and crash `np.dot` at query time.
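The `dict.get()` behavior described above can be checked in isolation. This is a minimal sketch, not code from the PR: the `item` dictionary is a hypothetical stand-in for one entry of a provider response, and the final multiplication only reproduces the kind of `TypeError` reported in #346 rather than the actual `np.dot` call.

```python
# Hypothetical response item from an embedding provider (illustration only).
item = {"embedding": None}

# dict.get() falls back to the default only when the key is absent.
with_default = item.get("embedding", [])  # key present, value None -> None
missing_key = {}.get("embedding", [])     # key absent -> []

# `or []` also replaces an explicit None (and any other falsy value).
with_or = item.get("embedding") or []

print(with_default, missing_key, with_or)  # None [] []

# Multiplying None by a float raises the same kind of TypeError as in #346.
try:
    None * 0.5
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Note that `or []` treats every falsy value as missing, so an empty list returned by the provider stays an empty list.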

Fix

Two-layer defense:

- Adapter layer (`openai_compatible.py`): Changed `item.get("embedding", [])` → `item.get("embedding") or []` so explicit `None` values are caught.
- Pipeline layer (`llamaindex.py`): Added post-embed validation in `_get_text_embeddings`. Any `None` vectors are replaced with zero vectors and logged as errors. This prevents silent storage corruption regardless of which adapter is used.
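The two layers might look roughly like the sketch below. It is an illustration under stated assumptions, not the code merged in this PR: `extract_embeddings`, `replace_none_vectors`, and `fallback_dim` are hypothetical names, the `{"data": [{"embedding": ...}]}` response shape is assumed, and the PR description does not say how the zero-vector dimension is chosen. Here it is inferred from the first valid vector in the batch.

```python
import logging
from typing import List, Optional, Sequence

logger = logging.getLogger(__name__)


def extract_embeddings(response: dict) -> List[List[float]]:
    """Adapter layer (hypothetical helper): map explicit nulls to empty lists."""
    return [item.get("embedding") or [] for item in response.get("data", [])]


def replace_none_vectors(
    vectors: Sequence[Optional[Sequence[float]]],
    fallback_dim: int,
) -> List[List[float]]:
    """Pipeline layer (hypothetical helper): swap None vectors for zero vectors.

    Assumption: the dimension comes from the first non-None vector in the
    batch, or from fallback_dim when every vector in the batch is None.
    """
    dim = next((len(v) for v in vectors if v is not None), fallback_dim)
    cleaned: List[List[float]] = []
    for index, vector in enumerate(vectors):
        if vector is None:
            # Log loudly so the upstream provider issue is not hidden.
            logger.error("Embedding %d is None; using a zero vector", index)
            cleaned.append([0.0] * dim)
        else:
            cleaned.append(list(vector))
    return cleaned


response = {"data": [{"embedding": None}, {"embedding": [0.1, 0.2]}]}
print(extract_embeddings(response))                          # [[], [0.1, 0.2]]
print(replace_none_vectors([[1.0, 2.0], None], fallback_dim=3))  # [[1.0, 2.0], [0.0, 0.0]]
```

Keeping the check in the pipeline as well as in the adapter means a different adapter that still returns `None` cannot write an unusable vector into the index.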

Testing

- Added a unit test for `None` embedding extraction in `test_extract_embeddings.py`
- All 22 extraction tests pass
- All 45 previously passing embedding/RAG tests still pass (the 10 pre-existing failures are caused by a missing async plugin and are unrelated to this change)

…#346)

When an embedding provider returns null for a chunk's embedding vector, the None value gets stored in the vector index and causes a TypeError in LlamaIndex's similarity computation (np.dot with NoneType).

Two-layer fix:

1. _extract_embeddings_from_response: use 'or []' instead of get(key, default) so explicit None values are caught (get() only uses the default when the key is absent, not when it's None).
2. CustomEmbedding._get_text_embeddings: validate the batch result and replace any None vectors with zero vectors, logging an error to surface the upstream issue.

Closes HKUDS#346

- New `assets/releases/ver1-2-1.md` covering #348 (per-stage chat token limits), #349 (Regenerate across CLI/WS/Web UI), the regenerate UI harmony polish, and bug fixes #347 / #345 / #352.
- README release-notes block updated to surface v1.2.1 above v1.2.0.

Made-with: Cursor

pancacake merged commit 509d3ec into HKUDS:dev Apr 20, 2026
2 of 4 checks passed

pancacake merged 1 commit into HKUDS:dev from kagura-agent:fix/embedding-none-guard
