Problem
Running the setup notebooks that build the ChromaDB vector store fails with:
```
ValueError: Expected metadata value to be a str, int, float or bool, got None which is a NoneType in add.
```
Triggered during `ChromaDBWrapperClient.add_chunks_to_collection(chunks)` → `collection.add(metadatas=[chunk.metadata ...])`.
Root cause
In the `SentenceSplitterChunkingStrategy.to_ragchunks(...)` helper, each `RAGChunk`'s metadata is built as:
```python
metadata={
    **node.metadata,
    'relative_path': self._extract_relative_path(node.metadata['file_path'])
}
```
Two sources of `None`:
- `_extract_relative_path(...)` explicitly returns `None` when the regex doesn't match the `input_dir` prefix.
- LlamaIndex `Document` / `Node` metadata for markdown files routinely carries `None` values for fields that can't be inferred (e.g. `creation_date`, `last_modified_date` depending on filesystem / loader version).
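For illustration, the first source can be sketched like this (an assumption: the actual regex and `input_dir` value in the helper may differ):

```python
import re
from typing import Optional

def extract_relative_path(file_path: str, input_dir: str = "opensearch-docs") -> Optional[str]:
    # Sketch: return the path portion after input_dir, or None when the prefix is absent.
    match = re.search(re.escape(input_dir) + r"[/\\](.+)$", file_path)
    return match.group(1) if match else None

extract_relative_path("/tmp/other/readme.md")  # no input_dir prefix -> None
```

Any node whose `file_path` lacks the expected prefix thus gets `relative_path: None` in its metadata.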
Chroma's `validate_metadata` rejects any `None` value, so the entire batch `add` fails.
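A rough mimic of that check (not Chroma's actual implementation, just the observed contract) shows why a single `None` sinks the whole batch:

```python
def validate_metadata(metadata: dict) -> None:
    # Mimic of the observed contract: every value must be str/int/float/bool.
    for key, value in metadata.items():
        if not isinstance(value, (str, int, float, bool)):
            raise ValueError(
                f"Expected metadata value to be a str, int, float or bool, "
                f"got {value} which is a {type(value).__name__} in add."
            )
```

Chroma validates all metadata dicts before inserting anything, so one bad chunk aborts the entire `add` call.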
Affected notebooks
Both files share the same `SentenceSplitterChunkingStrategy` + `ChromaDBWrapperClient` code path:
- `labs/module2/notebooks/1_setup.ipynb` (cell 6 builds the metadata; cells 9 and 11 call `add_chunks_to_collection`)
- `labs/module3/notebooks/1_setup.ipynb` (same code)
Downstream labs (`module2/2_prompt_chaining` .. `6_evaluator_optimizer`, `module3/2_agent_memory` .. `4_agent_retrieval`) all expect the persisted `data/chroma` store this setup notebook creates, so the failure blocks most of modules 2 and 3 for students running from a clean clone.
Proposed fix
One-line strip of `None` values before handing metadata to Chroma, inside `to_ragchunks`:
```python
def to_ragchunks(self, nodes: List[Node]) -> List[RAGChunk]:
    chunks = []
    for node in nodes:
        metadata = {
            **node.metadata,
            'relative_path': self._extract_relative_path(node.metadata['file_path'])
        }
        # Chroma rejects None metadata values; drop them.
        metadata = {k: v for k, v in metadata.items() if v is not None}
        chunks.append(RAGChunk(id=node.node_id, text=node.text, metadata=metadata))
    return chunks
```
Alternatively, a tiny helper `_clean_metadata(m: dict) -> dict` could be added to `labs_common` so both module2 and module3 setup notebooks import it instead of duplicating the filter.
How to reproduce
- Fresh clone + `uv sync`.
- Open `labs/module2/notebooks/1_setup.ipynb`.
- Run cells top-to-bottom (the `%%bash` cell clones OpenSearch docs).
- The cell that does `chroma_os_docs_collection.add_chunks_to_collection(chunks)` raises the `ValueError`.
Scope
This is a pre-existing LlamaIndex ↔ Chroma compatibility issue, unrelated to the legacy-model-ID cleanup tracked in #50 / #51. Filing separately so the fix can be scoped cleanly.