v0.4.0: Streaming EXISTS() filter — 2.5x faster than Qlik Sense
What's New
Streaming EXISTS() filter
Filter QVD rows while reading instead of after loading the entire file into memory. In the benchmark below, a filtered read of a large file finishes about 2.5x faster end to end (read, filter, write QVD) than Qlik Sense.
Benchmark: 1.7 GB QVD, 87.6M rows → filter 2 values, select 3 of 8 columns → 20.4M rows
| | Qlik Sense | qvdrs (streaming) |
|---|---|---|
| Read + filter | ~28s | 7.1s |
| Total (→ QVD) | ~28s | 11.4s |
| Total (→ Parquet) | — | 15.5s |
| Speedup | 1× | 2.5× |
New features
- QvdStreamReader::read_filtered(): streaming filter with column selection
- ExistsIndex::from_values(): build index from explicit value list
- QvdTable::subset_rows(): create sub-table from row indices
- QvdTable::filter_by_values(): filter by column values at symbol level
- CLI: qvd-cli filter --column --values --select
How it works
- Opens QVD file as stream — loads only symbol tables (unique values, small)
- Pre-computes which symbol indices match the filter
- Reads index table in 64K-row chunks
- For each row: decodes only the filter column (e.g., 5 bits)
- For matching rows: decodes only selected columns
- Non-matching rows are skipped — no memory allocated
Memory: holds only matched rows (~20M) instead of all rows (~87M).
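The steps above can be sketched in a few lines of Rust. This is a minimal illustration of the technique, not the qvdrs implementation. The names `ColumnLayout`, `decode`, `matching_symbols`, and this free-standing `read_filtered` are hypothetical, each packed row is assumed to fit in a `u64`, and an in-memory slice stands in for the file stream.

```rust
use std::collections::HashSet;

/// Hypothetical description of where one column lives inside a bit-packed row.
#[derive(Clone, Copy)]
struct ColumnLayout {
    bit_offset: u32, // position of the column's bits within the row
    bit_width: u32,  // bits used for the symbol index (assumed < 64)
}

/// Extract a single column's symbol index from a packed row.
fn decode(row: u64, col: ColumnLayout) -> usize {
    let mask = (1u64 << col.bit_width) - 1;
    ((row >> col.bit_offset) & mask) as usize
}

/// Pre-compute, once per read, which symbol indices of the filter column match.
fn matching_symbols(symbols: &[&str], wanted: &HashSet<&str>) -> Vec<bool> {
    symbols.iter().map(|s| wanted.contains(s)).collect()
}

/// Walk the index table chunk by chunk (chunk_rows must be > 0).
/// Every row has only its filter column decoded; the selected columns
/// are decoded for matching rows alone.
fn read_filtered(
    rows: &[u64],
    chunk_rows: usize,
    filter_col: ColumnLayout,
    matches: &[bool],
    selected: &[ColumnLayout],
) -> Vec<Vec<usize>> {
    let mut out = Vec::new();
    for chunk in rows.chunks(chunk_rows) {
        for &row in chunk {
            if !matches[decode(row, filter_col)] {
                continue; // non-matching row: nothing is allocated for it
            }
            out.push(selected.iter().map(|&c| decode(row, c)).collect());
        }
    }
    out
}
```

The result holds symbol indices for the selected columns of matching rows only. A caller would resolve them against the symbol tables afterwards, so the output grows with the number of matched rows rather than with the size of the file.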