Summary
codanna documents index (with --collection X or --all) wipes chunks
belonging to every other configured collection in the same .codanna/index/.
Only one collection per index has searchable chunks at any time. --all does
not actually persist all collections: it indexes the collections one after
another, and each iteration deletes the previous one's chunks, so only the
last-iterated collection survives.
Environment
- codanna 0.9.19 (latest as of 2026-04-25)
- Linux 6.6.87.2-microsoft-standard-WSL2 (x86_64)
Reproduction
.codanna/settings.toml:
version = 1
index_path = ".codanna/index"
workspace_root = "/home/user/repo"
[documents]
enabled = true
[documents.collections.alpha]
paths = ["/path/to/alpha-docs/"]
patterns = ["**/*.md"]
[documents.collections.beta]
paths = ["/path/to/beta-docs/"]
patterns = ["**/*.md"]
Direct demonstration of the wipe:
$ codanna documents stats alpha
Chunks: 2236
$ codanna documents stats beta
Chunks: 0
$ codanna documents index --collection beta --no-progress
Indexing collection: beta
Files processed: 228
Chunks created: 6084
Chunks removed: 2236 ← removes alpha's chunks
$ codanna documents stats alpha
Chunks: 0 ← wiped
$ codanna documents stats beta
Chunks: 6084
--all exhibits the same pattern: each iteration wipes the previous, and only
the last-iterated collection retains chunks.
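The manual check above could be automated along these lines. This is a minimal sketch, not part of codanna: it assumes `codanna documents stats <coll>` writes a line beginning with `Chunks:` to stdout, as in the transcript, and the helper names (`parse_chunks`, `chunk_count`, `wiped_collections`) are hypothetical.

```python
import re
import subprocess


def parse_chunks(stats_output: str) -> int:
    """Return the integer following 'Chunks:' in stats output."""
    match = re.search(r"^\s*Chunks:\s*(\d+)", stats_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Chunks:' line found in stats output")
    return int(match.group(1))


def chunk_count(collection: str) -> int:
    """Run the stats command from the transcript and parse its chunk count.

    Assumption: the count is printed on stdout.
    """
    result = subprocess.run(
        ["codanna", "documents", "stats", collection],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_chunks(result.stdout)


def wiped_collections(before: dict, after: dict, target: str) -> list:
    """Names of non-target collections that had chunks before and none after."""
    return sorted(
        name
        for name, count in before.items()
        if name != target and count > 0 and after.get(name, 0) == 0
    )
```

To use it, record `chunk_count(name)` for each configured collection, run `codanna documents index --collection beta --no-progress`, record the counts again, and pass both dictionaries to `wiped_collections`. With the numbers from the transcript, the result names `alpha`.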
Expected
documents index --collection X should only operate on chunks tagged
collection_name == X. Chunks from other collections should be untouched.
documents index --all should leave every configured collection's chunks
intact.
Diagnostic data
After documents index --all, .codanna/index/documents/state.json:
- file_states contains entries for only the last-iterated collection
- collection_ids correctly registers all configured collection names
- next_chunk_id is much larger than the surviving chunk count, suggesting
  chunk IDs are allocated for every collection but only the most recently
  indexed collection's chunks are persisted
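A small script can pull these three values out of state.json for comparison. This is a sketch under assumptions: it relies only on the top-level keys named above (`file_states`, `collection_ids`, `next_chunk_id`) and assumes the first two are JSON objects or arrays. The function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_state(state: dict) -> dict:
    """Count entries under the keys this report refers to."""
    return {
        "file_states": len(state.get("file_states", ())),
        "collection_ids": len(state.get("collection_ids", ())),
        "next_chunk_id": state.get("next_chunk_id"),
    }


def load_state(index_dir: str = ".codanna/index") -> dict:
    """Read documents/state.json from the index directory."""
    path = Path(index_dir) / "documents" / "state.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

Running `print(summarize_state(load_state()))` after `documents index --all` shows whether `file_states` covers fewer files than the configured collections contain.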
The tantivy schema has a collection_name field, but the chunk-pruning step
appears to delete by global state rather than scoping to the target collection.
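The suspected difference can be illustrated with a toy model. This is not codanna's code, only a sketch of the hypothesis: `Chunk`, `prune_global`, `prune_scoped`, and the file names `a.md`, `b.md`, `x.md` are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    collection_name: str
    path: str


def prune_global(chunks, indexed_paths):
    """Suspected behavior: drop any chunk whose file was not seen in this run,
    regardless of which collection the chunk belongs to."""
    return [c for c in chunks if c.path in indexed_paths]


def prune_scoped(chunks, target, indexed_paths):
    """Expected behavior: only chunks of the target collection are candidates
    for removal; chunks from other collections are always kept."""
    return [
        c for c in chunks
        if c.collection_name != target or c.path in indexed_paths
    ]


# The index holds alpha's chunks; the current run indexes only beta's files.
existing = [
    Chunk(1, "alpha", "/path/to/alpha-docs/a.md"),
    Chunk(2, "alpha", "/path/to/alpha-docs/b.md"),
]
beta_paths = {"/path/to/beta-docs/x.md"}

print(len(prune_global(existing, beta_paths)))          # 0: alpha is wiped
print(len(prune_scoped(existing, "beta", beta_paths)))  # 2: alpha is untouched
```

If the real pruning step resembles `prune_global`, filtering deletions on the existing collection_name field would be enough to get the expected behavior.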
Secondary issue
documents stats <coll> reports the same Files: count for every
collection — it appears to show the total state.json.file_states size, not
the per-collection count. Only Chunks: reflects per-collection truth.
Workaround (verified)
Configure a single collection per index with all source paths combined:
[documents.collections.docs]
paths = [
"/path/to/alpha-docs/",
"/path/to/beta-docs/",
]
patterns = ["**/*.md"]
Empirically verified: 278 files across two source paths produce 8320 chunks in
one collection, all searchable, with Chunks removed: 0. This loses the ability
to scope searches by collection name, but keeps every chunk indexed until the
bug is fixed.