v0.2.2

@SouravRoy-ETL SouravRoy-ETL released this 27 Apr 12:19
· 363 commits to main since this release

Performance

  • Stats-based row-group pruning for top-N. Pass 1 reads each RG's max (or min, for ASC) from the Parquet footer and sorts the row groups best-first; they are then decoded sequentially, with an early break once the heap converges. parquet10m_orderby_top10 drops from 69 ms to 15 ms, which now beats DuckDB (23 ms).
  • Direct string_t emit in PhysicalWindow. Vectorised emit no longer goes through Value::VARCHAR boxing for varchar columns. 10M-row varchar emit drops from 3 s to 300 ms.
  • Lazy column-major Python results. QueryResult now holds the C result alive until garbage collected. New methods:
    • fetchnumpy() returns a dict of numpy arrays. Numeric columns are read from the C buffer via np.frombuffer, then defensively copied.
    • fetchdf() builds a pandas DataFrame from fetchnumpy(), skipping row-tuple materialisation entirely.
      10M-row int conversion drops from ~4 s to ~50 ms.
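
For readers curious how the stats-based top-N pruning in the first item works, here is a minimal sketch in plain Python. It is not SlothDB's implementation: the function name `top_n_desc` and the `(max_stat, values)` tuples are hypothetical stand-ins for row groups and their footer statistics, and it covers only the descending case.

```python
import heapq

def top_n_desc(row_groups, n):
    """Return (top-n values in descending order, number of row groups decoded).

    row_groups: list of (max_stat, values) pairs. This is a hypothetical
    stand-in for Parquet row groups, where max_stat plays the role of the
    max statistic stored in the footer.
    """
    # Pass 1: order row groups best-first using only their statistics.
    ordered = sorted(row_groups, key=lambda rg: rg[0], reverse=True)

    heap = []      # min-heap holding the current top-n candidates
    decoded = 0
    for rg_max, values in ordered:
        # Early break: the heap is full and this group's max cannot beat the
        # smallest kept value. Groups are sorted by max, so none of the
        # remaining groups can either.
        if len(heap) == n and rg_max <= heap[0]:
            break
        decoded += 1
        for v in values:  # "decode" the row group
            if len(heap) < n:
                heapq.heappush(heap, v)
            elif v > heap[0]:
                heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True), decoded

# Made-up example data: four row groups with their max statistics.
groups = [
    (10, [1, 5, 10]),
    (100, [90, 95, 100]),
    (50, [40, 50, 45]),
    (99, [97, 98, 99]),
]
result, decoded = top_n_desc(groups, 3)
print(result, decoded)
```

In this example only two of the four row groups are decoded. For ASC the same idea applies with min statistics and a max-heap.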

Bench

13 win, 5 tie, 0 slow on the 18-query DuckDB suite (was 11 / 5 / 2 in 0.2.1).

Install

pip install --upgrade slothdb
npm install @slothdb/wasm@0.2.2

403 of 403 doctest tests passing on Windows, Linux, macOS.