
v0.4.0: Streaming EXISTS() filter — 2.5x faster than Qlik Sense

bintocher released this on 26 Mar 20:14

What's New

Streaming EXISTS() filter

Filter QVD rows while the file is being read, instead of after loading the entire file into memory. On large files, a filtered read is about 2.5× faster than Qlik Sense end to end (see the benchmark below).

Benchmark: 1.7 GB QVD with 87.6M rows; keep rows matching 2 filter values and select 3 of 8 columns, yielding 20.4M rows.

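|                         | Qlik Sense | qvdrs (streaming) |
|-------------------------|------------|-------------------|
| Read + filter           | ~28 s      | 7.1 s             |
| Total (→ QVD)           | ~28 s      | 11.4 s            |
| Total (→ Parquet)       |            | 15.5 s            |
| Speedup (total → QVD)   |            | 2.5×              |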
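As a quick check on the table, assuming each speedup is Qlik Sense's ~28 s divided by the qvdrs time in the same row (the read + filter ratio is derived here, not separately reported):

```math
\frac{28\,\text{s}}{11.4\,\text{s}} \approx 2.46 \;(\text{total} \to \text{QVD}),
\qquad
\frac{28\,\text{s}}{7.1\,\text{s}} \approx 3.94 \;(\text{read + filter})
```

Because the Qlik Sense timings are approximate (~28 s), only the first decimal place of these ratios is meaningful.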

New features

  • QvdStreamReader::read_filtered() — streaming filter with column selection
  • ExistsIndex::from_values() — build index from explicit value list
  • QvdTable::subset_rows() — create sub-table from row indices
  • QvdTable::filter_by_values() — filter by column values at symbol level
  • CLI: qvd-cli filter --column --values --select
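Filtering "at symbol level" can be illustrated with a small in-memory sketch: each distinct value is tested against the filter once, and rows are then matched by their integer symbol index instead of by comparing decoded values. This is not the qvdrs API; `Column`, `rows_matching` and `subset_rows` below are hypothetical stand-ins, assuming a column is stored as a symbol table plus one symbol index per row.

```rust
use std::collections::HashSet;

/// Hypothetical column: unique values plus one symbol index per row.
/// Illustrative only; this is not the qvdrs `QvdTable` type.
pub struct Column {
    pub symbols: Vec<String>,
    pub row_symbols: Vec<usize>,
}

/// Return the indices of rows whose value is in `wanted`.
/// Each unique value is looked up once; rows are matched by integer index.
pub fn rows_matching(col: &Column, wanted: &[&str]) -> Vec<usize> {
    let wanted: HashSet<&str> = wanted.iter().copied().collect();
    // One flag per symbol: does this unique value pass the filter?
    let mask: Vec<bool> = col
        .symbols
        .iter()
        .map(|s| wanted.contains(s.as_str()))
        .collect();
    col.row_symbols
        .iter()
        .enumerate()
        .filter(|&(_, &sym)| mask[sym])
        .map(|(row, _)| row)
        .collect()
}

/// Build a new column holding only the given rows; the symbol table is reused as-is.
pub fn subset_rows(col: &Column, rows: &[usize]) -> Column {
    Column {
        symbols: col.symbols.clone(),
        row_symbols: rows.iter().map(|&r| col.row_symbols[r]).collect(),
    }
}
```

For a column with symbols `["SE", "NO", "DK"]` and row indices `[0, 1, 2, 1, 0]`, filtering on `["NO", "DK"]` selects rows 1, 2 and 3. The cost of the value comparison depends on the number of unique values, not on the number of rows.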

How it works

  1. Opens the QVD file as a stream and loads only the symbol tables (the unique values of each column, which are small)
  2. Pre-computes which symbol indices match the filter
  3. Reads the index table in 64K-row chunks
  4. For each row, decodes only the filter column's bit field (e.g. 5 bits wide)
  5. For matching rows, decodes only the selected columns
  6. Skips non-matching rows without allocating any memory for them
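The six steps above can be sketched in Rust as follows. This is an illustrative sketch, not the qvdrs implementation: `ColumnBits`, `extract_bits`, `read_chunk` and `stream_filter` are hypothetical names, and the row layout is a simplified assumption (each row is `record_size` bytes; each column occupies `bit_width` bits starting at `bit_offset`, read least-significant bit first). It returns symbol indices for the selected columns rather than decoded values.

```rust
use std::collections::HashSet;
use std::io::{self, Read};

/// Hypothetical bit layout of one column inside a packed row
/// (assumed for illustration; not the qvdrs or QVD header types).
pub struct ColumnBits {
    pub bit_offset: usize,
    pub bit_width: usize,
}

/// Read `width` bits starting at bit `offset`, least-significant bit first.
fn extract_bits(row: &[u8], offset: usize, width: usize) -> usize {
    let mut value = 0usize;
    for i in 0..width {
        let bit = offset + i;
        if (row[bit / 8] >> (bit % 8)) & 1 == 1 {
            value |= 1 << i;
        }
    }
    value
}

/// Fill `buf` from `reader` as far as possible; returns bytes read (0 at EOF).
fn read_chunk<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..])? {
            0 => break,
            n => filled += n,
        }
    }
    Ok(filled)
}

/// Stream packed rows and keep only the selected columns of matching rows.
pub fn stream_filter<R: Read>(
    mut reader: R,
    record_size: usize,      // bytes per packed row
    chunk_rows: usize,       // rows per read, e.g. 65_536
    filter_symbols: &[&str], // symbol table of the filter column (step 1)
    wanted: &[&str],         // values to keep
    filter_col: &ColumnBits,
    selected: &[ColumnBits],
) -> io::Result<Vec<Vec<usize>>> {
    assert!(record_size > 0 && chunk_rows > 0);
    // Step 2: one flag per unique value, computed once.
    let wanted: HashSet<&str> = wanted.iter().copied().collect();
    let mask: Vec<bool> = filter_symbols.iter().map(|s| wanted.contains(s)).collect();

    let mut buf = vec![0u8; record_size * chunk_rows];
    let mut out: Vec<Vec<usize>> = Vec::new();
    loop {
        // Step 3: read the next chunk of packed rows into a reused buffer.
        let n = read_chunk(&mut reader, &mut buf)?;
        // A trailing partial record, if any, is ignored by chunks_exact.
        for row in buf[..n].chunks_exact(record_size) {
            // Step 4: decode only the filter column.
            let sym = extract_bits(row, filter_col.bit_offset, filter_col.bit_width);
            // Step 6: skip non-matching rows before allocating anything for them.
            if !mask.get(sym).copied().unwrap_or(false) {
                continue;
            }
            // Step 5: decode only the selected columns of a matching row.
            out.push(
                selected
                    .iter()
                    .map(|c| extract_bits(row, c.bit_offset, c.bit_width))
                    .collect(),
            );
        }
        if n < buf.len() {
            break; // end of input reached
        }
    }
    Ok(out)
}
```

Up front, only the flag vector (one `bool` per unique value) and the reusable chunk buffer are allocated; the output then grows with the number of matching rows, not with the file size. Testing a row costs one bit extraction and one array lookup, regardless of how long the underlying values are.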

Memory: holds only matched rows (~20M) instead of all rows (~87M).