This project https://github.com/SouravRoy-ETL/slothdb/ is aggressively promoted with paid GitHub stars and spam comments on the Internet.
Here is the verdict:
The names (binder, catalog, data_table, row_group, column_data, string_t, LogicalType, Vector, DataChunk, BoundExpression, ExpressionExecutor, ParallelResultCollector, "non-reserved keyword set matches DuckDB's"…) are 1:1 with DuckDB's. SlothDB is a DuckDB-shaped reimplementation: same source layout, same internal API names, same Parquet ingest shape (read_parquet(...), SELECT * REPLACE, epoch_ms, make_date).
Where the speed actually comes from
- A 14k-line physical_planner.cpp that pattern-matches ClickBench query shapes and routes them to hand-rolled per-query handlers. The execution directory literally has files named for ClickBench query numbers:
src/execution/q4_dict_histogram.cpp - Q4 SELECT AVG(UserID) FROM hits
src/execution/q6_string_dedup.cpp - Q6 COUNT(DISTINCT SearchPhrase)
src/execution/q10_agg.cpp - Q10 RegionID GROUP BY with mixed aggs
src/execution/q11_helper.cpp - Q11 MobilePhoneModel COUNT(DISTINCT)
src/execution/q21_helper.cpp - Q21 URL LIKE '%google%' COUNT(*)
src/execution/q26_helper.cpp - Q26 ORDER BY SearchPhrase LIMIT N
+ static TryComputeQ14_2Stage() - Q14 inside physical_planner.cpp
Each handler bakes in assumptions about this dataset:
- Q4 (q4_dict_histogram.cpp): "Quick gate: every RG must have a dict page on this column" - computes SUM(UserID) and COUNT(*) by walking only the Parquet dictionary page of each row group, never touching the
data pages. Works because hits.parquet is dictionary-encoded.
- Q10 (q10_agg.cpp): slot_of is a flat dense array indexed by RegionID - // RegionID is a small dense-ish UInt32 (Q10: 9040 distinct in [0, 131069]); a direct array beats a hash probe on the 100M-row per-row hot
path.
- Q21/Q22/Q23 (q21_helper.cpp and physical_planner.cpp lines 475/503): specialized 4-byte / 8-byte needle substring search - loads the first 4 or 8 bytes of the needle as a uint32/uint64 literal and compares
with one cmp. The comments explicitly say // Q22 "google" hot path and // Q23 ".google." 8-byte needle.
- Q26 (q26_helper.cpp): "Dict-trust" variant of ORDER BY VARCHAR LIMIT - extracts top-K from the Parquet dictionary alone, skipping the data scan. Comment: // For ClickBench hits.parquet SearchPhrase, dict has
no orphans - verified empirically. Toggled by SLOTH_Q26_TRUST_DICT.
- Q14: a dedicated TryComputeQ14_2Stage two-stage COUNT(DISTINCT) path.
- Numerous shape-specific comments throughout physical_planner: // Q22 URL falls here, // Q31/Q32 hot path, // Q35: 9.76 M groups, // Q27 ORDER BY EventTime, SearchPhrase LIMIT 10, etc.
-
The "generic" path is markedly slower than ClickHouse (and almost anything else, including DuckDB). Look at src/execution/parallel_executor.cpp::ParallelScanFilter - it builds rows with chunk.GetValue(col, i) per cell, then SetValue(col, result_count, ...), into a std::vector<std::vector> under a mutex. That's row-at-a-time Value boxing - orders of magnitude slower than DuckDB's vectorised pipeline. The "wins" only show up when a query matches one of the recognised shapes and bypasses this path entirely.
-
Optimizer is three passes. optimizer.cpp only does ConstantFolding, FilterPushdown, TopNOptimization. There's no proper cost model, join reorder, or general aggregate fusion - the planner relies on the per-query handlers to win, and the comments admit it: Q31/Q32 (two-column high-cardinality GROUP BY without a custom handler) are noted as the project's weak spot (the README cites 0.29× and 0.12× of DuckDB).
Summary
The architecture is a DuckDB-clone shell (parser/binder/planner/catalog/storage layered identically) with a flat, three-pass optimiser and a generic execution path that's slower than DuckDB, plus a 14k-line physical planner whose job is to recognise ClickBench query shapes and dispatch to per-query handlers that hard-code dataset assumptions (column cardinalities, dict-page completeness, specific needle bytes, ClickBench RG layout). The README's "33 of 43 beat DuckDB, up to 5×" comes from those handlers firing on the queries they were written for; the queries with no matching handler (Q29, Q31, Q32, Q33) lose by 3–8×.
So it's not a general-purpose engine that happens to win ClickBench - it's a generic engine plus a ClickBench-shape lookup table. The benchmark is essentially part of the codebase's input.
This project https://github.com/SouravRoy-ETL/slothdb/ is aggressively promoted with paid GitHub stars and spam comments on the Internet.
Here is the verdict:
The names (binder, catalog, data_table, row_group, column_data, string_t, LogicalType, Vector, DataChunk, BoundExpression, ExpressionExecutor, ParallelResultCollector, "non-reserved keyword set matches DuckDB's"…) are 1:1 with DuckDB's. SlothDB is a DuckDB-shaped reimplementation: same source layout, same internal API names, same Parquet ingest shape (read_parquet(...), SELECT * REPLACE, epoch_ms, make_date).
Where the speed actually comes from
Each handler bakes in assumptions about this dataset:
data pages. Works because hits.parquet is dictionary-encoded.
path.
with one cmp. The comments explicitly say // Q22 "google" hot path and // Q23 ".google." 8-byte needle.
no orphans - verified empirically. Toggled by SLOTH_Q26_TRUST_DICT.
The "generic" path is markedly slower than ClickHouse (and almost anything else, including DuckDB). Look at
src/execution/parallel_executor.cpp::ParallelScanFilter- it builds rows withchunk.GetValue(col, i)per cell, thenSetValue(col, result_count, ...), into a std::vector<std::vector> under a mutex. That's row-at-a-time Value boxing - orders of magnitude slower than DuckDB's vectorised pipeline. The "wins" only show up when a query matches one of the recognised shapes and bypasses this path entirely.Optimizer is three passes. optimizer.cpp only does ConstantFolding, FilterPushdown, TopNOptimization. There's no proper cost model, join reorder, or general aggregate fusion - the planner relies on the per-query handlers to win, and the comments admit it: Q31/Q32 (two-column high-cardinality GROUP BY without a custom handler) are noted as the project's weak spot (the README cites 0.29× and 0.12× of DuckDB).
Summary
The architecture is a DuckDB-clone shell (parser/binder/planner/catalog/storage layered identically) with a flat, three-pass optimiser and a generic execution path that's slower than DuckDB, plus a 14k-line physical planner whose job is to recognise ClickBench query shapes and dispatch to per-query handlers that hard-code dataset assumptions (column cardinalities, dict-page completeness, specific needle bytes, ClickBench RG layout). The README's "33 of 43 beat DuckDB, up to 5×" comes from those handlers firing on the queries they were written for; the queries with no matching handler (Q29, Q31, Q32, Q33) lose by 3–8×.
So it's not a general-purpose engine that happens to win ClickBench - it's a generic engine plus a ClickBench-shape lookup table. The benchmark is essentially part of the codebase's input.