Replace SHA-512 with vendored portable BLAKE3 for chunk addresses #667

Merged
timsehn merged 7 commits into master from perf/blake3-chunk-hash on May 6, 2026

timsehn (Collaborator) commented Apr 28, 2026

Summary

sha512_transform was the second-biggest CPU hotspot in the profile (after the now-fixed mutmap qsort). prollyHashCompute hashes every emitted chunk, so on write workloads hashing is a real fraction of total CPU. This PR replaces SHA-512 with portable BLAKE3.

Local sysbench result (the actual bench harness, 11-iter median)

File-backed writes (the parity target), master vs this branch:

| Test | Master (SHA-512) | BLAKE3 (this PR) | Δ |
| --- | --- | --- | --- |
| oltp_bulk_insert | 1.35 | 1.23 | −0.12 |
| oltp_insert | 1.69 | 1.38 | −0.31 |
| oltp_update_index | 1.70 | 1.62 | −0.08 |
| oltp_update_non_index | 1.57 | 1.48 | −0.09 |
| oltp_delete_insert | 1.55 | 1.39 | −0.16 |
| oltp_write_only | 1.64 | 1.36 | −0.28 |
| types_delete_insert | 1.14 | 1.14 | flat |
| oltp_read_write | 1.30 | 1.26 | −0.04 |
| Average | 1.49 | 1.36 | −0.13 (−9%) |

File-backed reads avg: 1.14 → 1.11.

This is the first PR to move the actual sysbench numbers meaningfully on the workload shape that matters (file-backed writes). oltp_insert and oltp_write_only each shaved about 0.3 off the multiplier (−0.31 and −0.28).
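
The averages and the −9% figure in the table can be re-derived from the per-test multipliers. A small Python check, with the values copied from the table (the variable and function names are only for illustration):

```python
# File-backed write multipliers copied from the table: (master, this branch).
multipliers = {
    "oltp_bulk_insert": (1.35, 1.23),
    "oltp_insert": (1.69, 1.38),
    "oltp_update_index": (1.70, 1.62),
    "oltp_update_non_index": (1.57, 1.48),
    "oltp_delete_insert": (1.55, 1.39),
    "oltp_write_only": (1.64, 1.36),
    "types_delete_insert": (1.14, 1.14),
    "oltp_read_write": (1.30, 1.26),
}

def average_multipliers(rows):
    """Return (master_avg, branch_avg, relative_change) for the given rows."""
    master_avg = sum(m for m, _ in rows.values()) / len(rows)
    branch_avg = sum(b for _, b in rows.values()) / len(rows)
    return master_avg, branch_avg, (branch_avg - master_avg) / master_avg

master_avg, branch_avg, rel = average_multipliers(multipliers)
print(f"master {master_avg:.4f}  branch {branch_avg:.4f}  change {rel:+.1%}")
```

The unrounded averages come out near 1.4925 and 1.3575, which is consistent with the 1.49 → 1.36 (about −9%) summary row.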

Microbench numbers

4 KB input, 200K iterations:

| Hash | ns/op | MB/s | Ratio vs SHA-512 |
| --- | --- | --- | --- |
| SHA-512 (existing) | 13,380 | 306 | 1.00× |
| BLAKE3 portable (this PR) | 5,094 | 804 | 2.63× |
| BLAKE3 with SIMD (future) | 2,441 | 1,678 | 5.48× |

Hashing was ~30% of write CPU. Replacing it with something 2.63× faster cuts the hashing share to ~12% of the original CPU budget, so total work drops ~18% on a hash-bound workload (this matches the 200K-row local timing).
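
The estimate above is an Amdahl's-law style calculation. A minimal sketch of the arithmetic, assuming the ~30% hashing share and the 2.63× microbench ratio quoted here (both are measurements from this PR, not constants; `remaining_work` is a name made up for this example):

```python
def remaining_work(hash_fraction, speedup):
    """Amdahl-style estimate: fraction of the original CPU time left after
    speeding up only the hashing portion by `speedup`."""
    return (1.0 - hash_fraction) + hash_fraction / speedup

hash_fraction = 0.30  # "~30% of write CPU" (an estimate from the profile)
speedup = 2.63        # portable BLAKE3 vs SHA-512 from the microbench

new_hash_share = hash_fraction / speedup  # share of the ORIGINAL total
total = remaining_work(hash_fraction, speedup)
print(f"hashing: {hash_fraction:.0%} -> {new_hash_share:.1%} of original CPU")
print(f"total work: {1 - total:.1%} lower")
```

With these inputs the hashing share lands a little above 11% and the total drop a little above 18%, in line with the rounded ~12% and ~18% figures.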

Why portable, not SIMD

  • WASM / iOS / Android: no SSE/AVX/NEON intrinsics in the portable path, so the same source compiles unchanged on every target.
  • SIMD is a future PR — can be added behind compile-time flags without changing the on-disk format.
  • Even portable beats SHA-512 by 2.63×.

Why BLAKE3 specifically

  • 256-bit digest, truncated to 20 bytes (same truncation pattern the previous SHA-512 path used). Safe because BLAKE3's output is itself uniform.
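  • Faster than BLAKE2b and faster than SHA-512, with the same collision properties as either for content addressing: at 20 bytes, the truncated output length (160 bits) is the limiting factor for all three.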
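
The truncation pattern described in the bullets can be sketched in Python. BLAKE3 is not in Python's standard library, so BLAKE2b stands in for it here; `chunk_address` is a hypothetical helper, and keeping the leading 20 bytes of the digest is an assumption about how the truncation is done:

```python
import hashlib

ADDRESS_BYTES = 20  # chunk address length stated in the text

def chunk_address(data: bytes, algo: str = "blake2b") -> bytes:
    """Hash `data` and keep the first 20 bytes of the digest.

    `algo` is any hashlib name; BLAKE2b is only a stand-in for BLAKE3.
    """
    return hashlib.new(algo, data).digest()[:ADDRESS_BYTES]

chunk = b"example chunk bytes"
old_style = chunk_address(chunk, "sha512")   # the previous SHA-512 pattern
new_style = chunk_address(chunk, "blake2b")  # stand-in for the BLAKE3 path
print(old_style.hex())
print(new_style.hex())
```

The two addresses differ for the same input bytes, which is why swapping the hash function changes every chunk address even though the address length stays at 20 bytes.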

Vendored layout

ext/blake3/
  blake3.h                       (verbatim upstream)
  blake3_impl.h                  (SIMD decls stripped; portable only)
  blake3.c                       (verbatim upstream)
  blake3_portable.c              (verbatim upstream)
  blake3_dispatch_portable.c     (~30-line doltlite shim replacing
                                  upstream blake3_dispatch.c — no
                                  runtime CPU feature detection)

Upstream BLAKE3 1.8.5 (CC0/Apache-2.0 dual licensed; compatible with doltlite's Apache-2.0).

⚠️ BREAKING — on-disk format

CHUNK_STORE_VERSION 10 → 11. Chunk content addresses change (different hash function), so commit hashes for the same logical data are not the same. Existing 0.10.x DBs fail to open with the version-mismatch log message the version-10 bump introduced.
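
A rough sketch of what a version gate of this kind looks like. The 8-byte header layout, the `MAGIC` value, and the function names are all invented for illustration; the real manifest format and csReadManifest logic may differ:

```python
import struct

CHUNK_STORE_VERSION = 11  # value after this PR, per the text
MAGIC = b"DLTE"           # hypothetical magic bytes, for illustration only

class NotADatabaseError(Exception):
    """Stand-in for SQLite's SQLITE_NOTADB result code."""

def read_manifest_version(header: bytes) -> int:
    """Parse a made-up 8-byte header: 4-byte magic + big-endian u32 version."""
    if len(header) < 8 or header[:4] != MAGIC:
        raise NotADatabaseError("bad magic")
    (version,) = struct.unpack(">I", header[4:8])
    if version != CHUNK_STORE_VERSION:
        raise NotADatabaseError(
            f"format mismatch: file is v{version}, expected v{CHUNK_STORE_VERSION}"
        )
    return version

def try_open(header: bytes) -> str:
    """Return a short status string instead of propagating the error."""
    try:
        return f"opened v{read_manifest_version(header)}"
    except NotADatabaseError as exc:
        return f"rejected: {exc}"

print(try_open(MAGIC + struct.pack(">I", 11)))
print(try_open(MAGIC + struct.pack(">I", 10)))
```

Rejecting on any version mismatch, rather than attempting to read old files, fits a change where every chunk address is recomputed with a different hash.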

Followups

  • The dead sha512_hash / sha512_transform functions are left in place for now; clang's DCE strips them from the binary. Removal is cleanup.
  • BLAKE3 SIMD paths (SSE2/SSE4.1/AVX2/AVX-512/NEON) — could nearly double the win, behind a compile-time flag.

Verification

  • vc_oracle_merge_test: 41/41
  • vc_oracle_branch_test: 30/30
  • vc_oracle_diff_test: 41/41
  • chunk_distribution_test: 7/7
  • 10K + 2K branch + merge end-to-end OK
  • Local sysbench harness: file-backed write average 1.49 → 1.36

Test plan

  • CI passes (build + asan + oracle suite + crash recovery + WASM)
  • CI sysbench run reflects the local result

🤖 Generated with Claude Code

Profile showed sha512_transform as the second-biggest CPU hotspot
on write workloads (after the now-fixed mutmap qsort). prollyHashCompute
hashes every emitted chunk and is hot on every commit.

Microbench on a 4 KB input (representative chunk size):

  SHA-512 (existing):                    13,380 ns/op     306 MB/s
  vendored BLAKE3 (portable):             5,094 ns/op     804 MB/s   2.63x
  libblake3 (SIMD-accelerated):           2,441 ns/op   1,678 MB/s   5.48x
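
The MB/s and ratio columns follow directly from the ns/op figures. A short check, assuming a 4,096-byte input and decimal megabytes (1 MB = 10^6 bytes), which is the convention that reproduces the numbers above:

```python
CHUNK_BYTES = 4096  # microbench input size from the text

def throughput_mb_per_s(ns_per_op, nbytes=CHUNK_BYTES):
    """Convert ns/op for one nbytes-sized hash into MB/s (1 MB = 10**6 bytes)."""
    return nbytes / ns_per_op * 1e9 / 1e6

ns_per_op = {"SHA-512": 13380, "BLAKE3 portable": 5094, "BLAKE3 SIMD": 2441}
baseline = ns_per_op["SHA-512"]
for name, ns in ns_per_op.items():
    print(f"{name:16s} {throughput_mb_per_s(ns):7.0f} MB/s  {baseline / ns:.2f}x")
```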

End-to-end timing on a 200K-row insert workload (10 runs):

  master (SHA-512):  median 1.170s  mean 1.182s
  branch (BLAKE3):   median 0.965s  mean 0.984s
  Δ:                 17.5% faster (median), 16.8% (mean)

Math reconciles: hashing was ~30% of write CPU; replacing it with
something 2.6x faster cuts that to ~12%, total work drops by ~18%.

Why portable, not SIMD:
  - WASM/iOS/Android targets: no SSE2/AVX/NEON intrinsics in the
    portable version, so a single source set compiles unchanged on
    every target. SIMD paths are a follow-up: they can be added
    behind compile-time flags without changing the format.
  - Even the portable path beats SHA-512 by 2.6x. The SIMD ceiling
    would be another 2x on top.

Why BLAKE3 specifically:
  - 256-bit digest (we truncate to 20 bytes, same as SHA-512 was
    truncated). Truncation is collision-safe because BLAKE3's output
    is itself a uniform digest.
  - Faster than SHA-512, faster than BLAKE2b, comparable security
    properties to either for content addressing.

What's vendored:
  - ext/blake3/blake3.h          — public API (verbatim from upstream)
  - ext/blake3/blake3_impl.h     — internal types/macros, with the
                                   SIMD declarations stripped (we
                                   only use the portable path)
  - ext/blake3/blake3.c          — high-level hasher (verbatim)
  - ext/blake3/blake3_portable.c — portable BLAKE3 round (verbatim)
  - ext/blake3/blake3_dispatch_portable.c — minimal dispatch shim
                                  that calls the portable functions
                                  directly. Replaces upstream
                                  blake3_dispatch.c, which does
                                  runtime CPU feature detection.
                                  ~30 lines.

Upstream is ~1.8 (CC0/Apache-2.0 dual licensed), suitable for
vendoring under doltlite's Apache-2.0.

BREAKING — on-disk format:

  CHUNK_STORE_VERSION 10 -> 11. Chunk content addresses change
  (different hash function over the same bytes), so commit hashes
  for the same logical data are not the same. Existing 0.10.x DBs
  fail to open with the SQLITE_NOTADB + format-mismatch log message
  the version-10 bump introduced.

The dead sha512_hash and sha512_transform functions are left in
place for now; clang's DCE strips them. Removal is a follow-up
cleanup.

Verified:
  - vc_oracle_merge_test:  41/41
  - vc_oracle_branch_test: 30/30
  - vc_oracle_diff_test:   41/41
  - chunk_distribution_test: 7/7
  - 10K + 2K branch + merge end-to-end OK

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented Apr 28, 2026

Sysbench-Style Benchmark: Doltlite vs SQLite

In-Memory

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 48,000 54,016 1.13
oltp_range_select 32,000 40,992 1.28
oltp_sum_range 12,992 16,032 1.23
oltp_order_range 5,024 6,016 1.20
oltp_distinct_range 6,944 8,000 1.15
oltp_index_scan 5,984 8,992 1.50
select_random_points 22,016 29,024 1.32
select_random_ranges 8,000 9,024 1.13
covering_index_scan 11,008 21,984 2.00
groupby_scan 51,968 59,968 1.15
index_join 5,984 8,992 1.50
index_join_scan 2,976 4,992 1.68
types_table_scan 11,008 12,992 1.18
table_scan 116,000 132,992 1.15
oltp_read_only 246,048 276,992 1.13
Average 1.31

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 25,984 36,000 1.39
oltp_insert 19,008 26,016 1.37
oltp_update_index 45,024 84,000 1.87
oltp_update_non_index 32,000 56,032 1.75
oltp_delete_insert 43,008 67,008 1.56
oltp_write_only 18,016 31,008 1.72
types_delete_insert 24,000 32,000 1.33
oltp_read_write 123,040 173,024 1.41
Average 1.55

File-Backed

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 136,000 88,992 0.65
oltp_range_select 42,976 44,000 1.02
oltp_sum_range 20,992 21,024 1.00
oltp_order_range 6,016 6,976 1.16
oltp_distinct_range 7,008 8,960 1.28
oltp_index_scan 15,008 12,992 0.87
select_random_points 32,992 35,008 1.06
select_random_ranges 16,992 13,024 0.77
covering_index_scan 21,024 26,976 1.28
groupby_scan 54,016 60,992 1.13
index_join 10,976 12,000 1.09
index_join_scan 4,000 5,984 1.50
types_table_scan 12,000 13,984 1.17
table_scan 116,992 134,976 1.15
oltp_read_only 330,048 331,008 1.00
Average 1.08

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 31,008 38,016 1.23
oltp_insert 22,016 28,000 1.27
oltp_update_index 48,000 88,992 1.85
oltp_update_non_index 35,968 59,008 1.64
oltp_delete_insert 44,960 70,976 1.58
oltp_write_only 21,984 33,984 1.55
types_delete_insert 25,984 32,032 1.23
oltp_read_write 132,000 175,008 1.33
Average 1.46

10000 rows, single CLI invocation per test, workload-only timing via SQL timestamps.

Performance Ceiling Check (2x)

All tests within ceilings.
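
For reference, the Multiplier column is the Doltlite time divided by the SQLite time, and the ceiling check compares it to a fixed limit. A minimal sketch using a few rows from the file-backed write table; treating the gate as a simple per-test comparison is an assumption, and the function names are made up:

```python
CEILING = 2.0  # the "2x" gate named in the heading

# (sqlite_us, doltlite_us) pairs copied from the file-backed write table.
timings = {
    "oltp_bulk_insert": (31008, 38016),
    "oltp_insert": (22016, 28000),
    "oltp_update_index": (48000, 88992),
    "oltp_update_non_index": (35968, 59008),
}

def multiplier(sqlite_us, doltlite_us):
    """Ratio of Doltlite time to SQLite time; 1.0 means parity."""
    return doltlite_us / sqlite_us

def over_ceiling(rows, ceiling=CEILING):
    """Return the names of tests whose multiplier exceeds the ceiling."""
    return [name for name, (s, d) in rows.items() if multiplier(s, d) > ceiling]

for name, (s, d) in timings.items():
    print(f"{name:24s} {multiplier(s, d):.2f}")
print("over ceiling:", over_ceiling(timings))
```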

Tim and others added 5 commits April 28, 2026 15:51

Compliance + correctness scaffolding for the vendored BLAKE3 portable
reference. Splits into:

  - ext/blake3/LICENSE — dual Apache 2.0 (with LLVM exception) and
    CC0 1.0 license texts, the form upstream BLAKE3 ships under.
    Required by Apache 2.0 §4(a) for redistribution. Includes
    upstream copyright notice (Jack O'Connor and Samuel Neves, 2019)
    and version pointer (1.8.5).

  - ext/blake3/README.md — provenance: which upstream commit, what
    files we vendored verbatim, what we modified (blake3_impl.h SIMD
    decls stripped) and what we wrote ourselves
    (blake3_dispatch_portable.c). Required by Apache 2.0 §4(b).

  - SPDX-License-Identifier headers on all five files in ext/blake3/.
    Upstream-derived files carry "Apache-2.0 WITH LLVM-exception OR
    CC0-1.0"; the doltlite-original dispatch shim carries plain
    Apache-2.0 with our own copyright.

  - test/blake3_kat_test.sh — Known-Answer Test against the vendored
    portable implementation. Six vectors:
      * full 32-byte BLAKE3 of empty input (canonical spec vector)
      * prollyHashCompute (20-byte truncation) of empty, "abc",
        1024 B (1 chunk), 4096 B (4 chunks), 16384 B (16 chunks).
        The chunk-multiple cases exercise BLAKE3's tree-mode where
        compress_subtree_wide kicks in.
    Reference values were computed against upstream libblake3
    (the SIMD-accelerated build that's authoritative-by-construction)
    and cross-checked against the canonical spec vector. All 6 pass
    locally.

  - .github/workflows/test.yml — wires the KAT test into the
    build-and-test job so a future change accidentally breaking the
    hash output (e.g. reckless modification of blake3_portable.c)
    fails CI loudly.

  - LICENSE.md — new section under "non-public-domain code" listing
    the BLAKE3 vendor and pointing at ext/blake3/LICENSE.

No code changes to the BLAKE3 implementation itself — files are
identical to the previous commit, just with SPDX headers prepended.
The KAT confirms the vendored bytes still hash correctly.
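
As an illustration of the Known-Answer Test idea, here is a minimal Python harness. It is not the project's test/blake3_kat_test.sh: SHA-256 from the standard library stands in for BLAKE3, and the vectors are the well-known SHA-256 digests of the empty string and "abc", plus a 20-byte truncation to mirror the truncated-address case:

```python
import hashlib

# Each vector: (label, input bytes, digest function, expected hex).
# SHA-256 is a stand-in here; the real test targets the vendored BLAKE3.
KAT_VECTORS = [
    (
        "sha256 empty input",
        b"",
        lambda data: hashlib.sha256(data).digest(),
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    ),
    (
        "sha256 empty input, 20-byte truncation",
        b"",
        lambda data: hashlib.sha256(data).digest()[:20],
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4",
    ),
    (
        "sha256 of 'abc'",
        b"abc",
        lambda data: hashlib.sha256(data).digest(),
        "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    ),
]

def run_kat(vectors):
    """Return a list of (label, passed) so a caller can fail CI on any miss."""
    return [(label, fn(data).hex() == expected)
            for label, data, fn, expected in vectors]

results = run_kat(KAT_VECTORS)
for label, ok in results:
    print(("PASS" if ok else "FAIL"), label)
```

Pinning expected bytes this way is what lets CI catch an accidental change to the hash output, since any edit that alters a digest flips a vector to FAIL.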

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Now that prollyHashCompute uses BLAKE3, the ~90-line in-tree
SHA-512 implementation (sha512_rotr, K512 table, sha512_transform,
sha512_hash, etc.) is unreachable. clang's DCE was already dropping
it from the binary; remove from source so the next reader doesn't
have to wonder which hash is in use.

Net change: -90 lines from prolly_hash.c. No behavior change —
verified via the KAT (6/6 still pass) and the merge oracle (41/41).

Linux's gcc/clang doesn't auto-link libm; macOS does. The KAT
harness links prolly_hash.o which references prollyWeibullCheck →
expm1, so the link was failing on Linux CI with:

  undefined reference to 'expm1'

Add -lm. Local build is unchanged (macOS already finds it).

github-actions Bot commented May 5, 2026

Sysbench-Style Benchmark (TEXT PK): Doltlite vs SQLite

Companion to the classic Sysbench-Style Benchmark. Every workload here
runs against tables with a 32-char hex TEXT PRIMARY KEY (UUID-shaped).

File-backed sections gated at 3× — in-memory
and autocommit reads are reporting-only.

In-Memory

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 57,024 64,992 1.14
oltp_range_select 35,008 44,032 1.26
oltp_sum_range 22,016 20,992 0.95
oltp_order_range 4,992 6,016 1.21
oltp_distinct_range 6,016 8,000 1.33
oltp_index_scan 7,008 10,016 1.43
select_random_points 28,000 32,032 1.14
select_random_ranges 9,952 12,000 1.21
covering_index_scan 12,000 27,008 2.25
groupby_scan 40,992 48,992 1.20
index_join 10,016 14,016 1.40
index_join_scan 2,976 5,024 1.69
types_table_scan 11,008 13,024 1.18
table_scan 11,008 12,992 1.18
oltp_read_only 240,000 291,968 1.22
Average 1.32

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 3,008 4,000 1.33
oltp_insert 23,008 32,032 1.39
oltp_update_index 55,008 85,024 1.55
oltp_update_non_index 42,016 55,008 1.31
oltp_delete_insert 43,008 68,000 1.58
oltp_write_only 20,000 30,976 1.55
types_delete_insert 24,992 31,008 1.24
oltp_read_write 151,008 229,024 1.52
Average 1.43

File-Backed

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 183,040 103,040 0.56
oltp_range_select 43,008 49,024 1.14
oltp_sum_range 24,000 24,992 1.04
oltp_order_range 5,984 7,008 1.17
oltp_distinct_range 8,000 9,024 1.13
oltp_index_scan 16,032 12,992 0.81
select_random_points 44,992 40,992 0.91
select_random_ranges 29,024 15,968 0.55
covering_index_scan 21,024 30,016 1.43
groupby_scan 40,992 48,992 1.20
index_join 14,016 17,024 1.21
index_join_scan 3,968 4,992 1.26
types_table_scan 11,968 13,984 1.17
table_scan 12,000 13,984 1.17
oltp_read_only 368,032 408,032 1.11
Average 1.06

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 6,016 5,984 0.99
oltp_insert 26,048 36,000 1.38
oltp_update_index 56,992 91,008 1.60
oltp_update_non_index 45,024 58,976 1.31
oltp_delete_insert 44,992 71,968 1.60
oltp_write_only 20,992 32,000 1.52
types_delete_insert 29,952 32,032 1.07
oltp_read_write 167,968 196,000 1.17
Average 1.33

File-Backed (autocommit)

Each statement runs as its own transaction — exposes per-commit
fixed costs that the wrapped-in-BEGIN/COMMIT tests amortize away.

Reads

Reads have no commit cost; these are the same SQL files as the
File-Backed Reads section, included here for symmetry and to
catch any per-statement overhead doltlite pays on the read path.

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 136,992 104,000 0.76
oltp_range_select 43,968 48,032 1.09
oltp_sum_range 32,992 36,960 1.12
oltp_order_range 9,024 10,016 1.11
oltp_distinct_range 8,000 9,952 1.24
oltp_index_scan 15,008 16,000 1.07
select_random_points 40,992 44,000 1.07
select_random_ranges 30,016 22,976 0.77
covering_index_scan 20,992 35,968 1.71
groupby_scan 43,040 48,992 1.14
index_join 16,992 21,952 1.29
index_join_scan 4,032 7,008 1.74
types_table_scan 12,992 14,016 1.08
table_scan 14,016 16,032 1.14
oltp_read_only 391,040 433,024 1.11
Average 1.16

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert_ac 161,952 132,960 0.82
oltp_insert_ac 149,984 138,016 0.92
oltp_update_index_ac 157,024 150,016 0.96
oltp_update_non_index_ac 137,024 135,008 0.99
oltp_delete_insert_ac 150,016 153,984 1.03
oltp_write_only_ac 145,024 146,016 1.01
types_delete_insert_ac 128,000 134,976 1.05
oltp_read_write_ac 154,976 160,000 1.03
Average 0.98

1000 rows, single CLI invocation per test, workload-only timing via SQL timestamps.

Performance Ceiling Check (3x)

All tests within ceilings.

Sysbench-Style Benchmark (BLOB PK): Doltlite vs SQLite

Companion to the classic Sysbench-Style Benchmark. Every workload here
runs against tables with a 16-byte big-endian BLOB PRIMARY KEY.

File-backed sections gated at 3× — in-memory
and autocommit reads are reporting-only.

In-Memory

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 56,000 64,960 1.16
oltp_range_select 35,008 44,992 1.29
oltp_sum_range 14,016 20,000 1.43
oltp_order_range 5,984 7,008 1.17
oltp_distinct_range 7,008 8,000 1.14
oltp_index_scan 7,008 9,984 1.42
select_random_points 28,000 37,024 1.32
select_random_ranges 9,024 12,000 1.33
covering_index_scan 12,992 25,984 2.00
groupby_scan 40,000 48,000 1.20
index_join 10,016 13,984 1.40
index_join_scan 2,976 5,024 1.69
types_table_scan 11,040 12,992 1.18
table_scan 10,976 13,984 1.27
oltp_read_only 271,968 276,000 1.01
Average 1.33

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 3,968 4,992 1.26
oltp_insert 24,032 32,032 1.33
oltp_update_index 53,984 84,000 1.56
oltp_update_non_index 41,984 56,992 1.36
oltp_delete_insert 42,016 69,024 1.64
oltp_write_only 20,000 30,016 1.50
types_delete_insert 24,992 31,040 1.24
oltp_read_write 136,992 194,016 1.42
Average 1.41

File-Backed

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 141,984 126,976 0.89
oltp_range_select 44,000 48,992 1.11
oltp_sum_range 25,984 25,024 0.96
oltp_order_range 6,016 7,008 1.16
oltp_distinct_range 7,968 8,000 1.00
oltp_index_scan 15,008 12,992 0.87
select_random_points 50,016 40,992 0.82
select_random_ranges 18,976 16,032 0.84
covering_index_scan 20,992 30,016 1.43
groupby_scan 42,016 48,000 1.14
index_join 12,992 18,048 1.39
index_join_scan 3,008 5,984 1.99
types_table_scan 12,992 13,984 1.08
table_scan 12,000 14,016 1.17
oltp_read_only 350,048 404,992 1.16
Average 1.13

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 5,984 5,984 1.00
oltp_insert 25,024 35,008 1.40
oltp_update_index 54,976 84,960 1.55
oltp_update_non_index 43,008 56,992 1.33
oltp_delete_insert 44,032 68,992 1.57
oltp_write_only 21,024 31,008 1.47
types_delete_insert 28,992 32,000 1.10
oltp_read_write 159,008 197,024 1.24
Average 1.33

File-Backed (autocommit)

Each statement runs as its own transaction — exposes per-commit
fixed costs that the wrapped-in-BEGIN/COMMIT tests amortize away.

Reads

Reads have no commit cost; these are the same SQL files as the
File-Backed Reads section, included here for symmetry and to
catch any per-statement overhead doltlite pays on the read path.

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 154,016 103,008 0.67
oltp_range_select 44,000 60,000 1.36
oltp_sum_range 24,960 34,016 1.36
oltp_order_range 6,976 7,008 1.00
oltp_distinct_range 8,992 8,992 1.00
oltp_index_scan 15,008 12,992 0.87
select_random_points 36,000 39,968 1.11
select_random_ranges 18,048 15,968 0.88
covering_index_scan 20,992 30,016 1.43
groupby_scan 41,056 48,992 1.19
index_join 13,024 16,032 1.23
index_join_scan 3,968 4,992 1.26
types_table_scan 14,016 13,984 1.00
table_scan 12,000 14,016 1.17
oltp_read_only 357,024 428,000 1.20
Average 1.12

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert_ac 134,016 122,976 0.92
oltp_insert_ac 144,000 137,984 0.96
oltp_update_index_ac 147,008 157,056 1.07
oltp_update_non_index_ac 146,016 136,992 0.94
oltp_delete_insert_ac 135,968 158,016 1.16
oltp_write_only_ac 144,992 150,976 1.04
types_delete_insert_ac 131,008 132,032 1.01
oltp_read_write_ac 154,016 161,024 1.05
Average 1.02

1000 rows, single CLI invocation per test, workload-only timing via SQL timestamps.

Performance Ceiling Check (3x)

All tests within ceilings.

Sysbench-Style Benchmark (composite PK): Doltlite vs SQLite

Companion to the classic Sysbench-Style Benchmark. Every workload here
runs against tables with a 2-column INTEGER PRIMARY KEY(a, b) WITHOUT ROWID.

File-backed sections gated at 3× — in-memory
and autocommit reads are reporting-only.

In-Memory

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 59,968 78,016 1.30
oltp_range_select 43,008 56,000 1.30
oltp_sum_range 20,992 28,960 1.38
oltp_order_range 5,984 7,008 1.17
oltp_distinct_range 7,968 8,992 1.13
oltp_index_scan 4,960 4,992 1.01
select_random_points 46,976 55,008 1.17
select_random_ranges 12,000 15,008 1.25
covering_index_scan 12,992 25,024 1.93
groupby_scan 45,984 54,016 1.17
index_join 12,032 17,024 1.41
index_join_scan 2,976 6,016 2.02
types_table_scan 11,040 12,992 1.18
table_scan 11,008 14,016 1.27
oltp_read_only 260,992 384,032 1.47
Average 1.34

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 4,000 4,992 1.25
oltp_insert 27,008 30,976 1.15
oltp_update_index 59,008 83,968 1.42
oltp_update_non_index 50,976 58,016 1.14
oltp_delete_insert 44,960 68,000 1.51
oltp_write_only 20,992 30,016 1.43
types_delete_insert 24,032 31,968 1.33
oltp_read_write 155,008 248,992 1.61
Average 1.35

File-Backed

Reads

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 163,040 145,024 0.89
oltp_range_select 51,040 59,040 1.16
oltp_sum_range 51,008 32,992 0.65
oltp_order_range 7,008 8,000 1.14
oltp_distinct_range 12,032 9,024 0.75
oltp_index_scan 12,000 7,008 0.58
select_random_points 56,992 71,968 1.26
select_random_ranges 22,016 35,008 1.59
covering_index_scan 20,992 28,000 1.33
groupby_scan 45,984 55,008 1.20
index_join 17,024 19,008 1.12
index_join_scan 4,000 6,016 1.50
types_table_scan 12,992 13,984 1.08
table_scan 12,000 15,008 1.25
oltp_read_only 403,040 444,992 1.10
Average 1.11

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert 5,984 6,016 1.01
oltp_insert 26,976 36,992 1.37
oltp_update_index 61,984 85,984 1.39
oltp_update_non_index 52,000 64,960 1.25
oltp_delete_insert 45,984 70,016 1.52
oltp_write_only 22,016 31,968 1.45
types_delete_insert 24,992 32,032 1.28
oltp_read_write 157,024 220,992 1.41
Average 1.33

File-Backed (autocommit)

Each statement runs as its own transaction — exposes per-commit
fixed costs that the wrapped-in-BEGIN/COMMIT tests amortize away.

Reads

Reads have no commit cost; these are the same SQL files as the
File-Backed Reads section, included here for symmetry and to
catch any per-statement overhead doltlite pays on the read path.

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 140,992 108,992 0.77
oltp_range_select 52,032 88,992 1.71
oltp_sum_range 32,000 32,992 1.03
oltp_order_range 7,008 8,000 1.14
oltp_distinct_range 8,000 9,952 1.24
oltp_index_scan 12,992 7,936 0.61
select_random_points 54,976 60,992 1.11
select_random_ranges 21,952 20,000 0.91
covering_index_scan 21,024 27,040 1.29
groupby_scan 46,016 55,008 1.20
index_join 16,992 20,000 1.18
index_join_scan 4,000 5,984 1.50
types_table_scan 12,000 13,024 1.09
table_scan 12,000 14,016 1.17
oltp_read_only 435,008 408,000 0.94
Average 1.13

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert_ac 138,944 137,984 0.99
oltp_insert_ac 142,976 140,000 0.98
oltp_update_index_ac 144,000 154,016 1.07
oltp_update_non_index_ac 127,968 131,008 1.02
oltp_delete_insert_ac 138,016 141,984 1.03
oltp_write_only_ac 135,008 143,008 1.06
types_delete_insert_ac 132,032 134,976 1.02
oltp_read_write_ac 154,976 163,008 1.05
Average 1.03

1000 rows, single CLI invocation per test, workload-only timing via SQL timestamps.

Performance Ceiling Check (3x)

All tests within ceilings.

Sysbench-Style Benchmark (autocommit): Doltlite vs SQLite

Moved out of the classic benchmark job so per-commit costs report separately.

File-Backed (autocommit)

Each statement runs as its own transaction — exposes per-commit
fixed costs that the wrapped-in-BEGIN/COMMIT tests amortize away.

Reads

Reads have no commit cost; these are the same SQL files as the
File-Backed Reads section, included here for symmetry and to
catch any per-statement overhead doltlite pays on the read path.

Test SQLite (us) Doltlite (us) Multiplier
oltp_point_select 130,976 107,040 0.82
oltp_range_select 40,992 45,984 1.12
oltp_sum_range 21,024 21,024 1.00
oltp_order_range 6,016 7,008 1.16
oltp_distinct_range 7,008 8,960 1.28
oltp_index_scan 15,008 12,992 0.87
select_random_points 33,024 48,000 1.45
select_random_ranges 17,024 13,984 0.82
covering_index_scan 20,992 26,016 1.24
groupby_scan 52,032 61,984 1.19
index_join 10,976 12,000 1.09
index_join_scan 3,968 6,016 1.52
types_table_scan 12,000 13,984 1.17
table_scan 119,008 135,008 1.13
oltp_read_only 351,008 353,984 1.01
Average 1.12

Writes

Test SQLite (us) Doltlite (us) Multiplier
oltp_bulk_insert_ac 131,008 124,992 0.95
oltp_insert_ac 139,008 146,016 1.05
oltp_update_index_ac 142,976 164,000 1.15
oltp_update_non_index_ac 131,008 129,056 0.99
oltp_delete_insert_ac 140,000 148,992 1.06
oltp_write_only_ac 144,992 143,040 0.99
types_delete_insert_ac 128,992 125,024 0.97
oltp_read_write_ac 148,000 166,976 1.13
Average 1.04

10000 rows, single CLI invocation per test, workload-only timing via SQL timestamps.

The hand-crafted manifest in Guard 22 (sparse >2GiB open test) hard-
coded VERSION=10. The BLAKE3 chunk-hash change bumped CHUNK_STORE_VERSION
to 11, so the test built a v10 manifest that this branch's csReadManifest
correctly rejects with SQLITE_NOTADB ("file is not a database").

Parse CHUNK_STORE_VERSION out of src/chunk_store.h at test time and pass
it to the perl manifest writer. The test now tracks format-version bumps
without further edits.
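
A sketch of the "parse the version at test time" approach in Python. The actual test does this in shell and passes the value to perl; the `#define CHUNK_STORE_VERSION <n>` form and the sample header text below are assumptions about how src/chunk_store.h declares the constant:

```python
import re

def parse_chunk_store_version(header_text: str) -> int:
    """Extract the integer from a `#define CHUNK_STORE_VERSION <n>` line."""
    match = re.search(
        r"^\s*#\s*define\s+CHUNK_STORE_VERSION\s+(\d+)\b",
        header_text,
        flags=re.MULTILINE,
    )
    if match is None:
        raise ValueError("CHUNK_STORE_VERSION not found")
    return int(match.group(1))

# Hypothetical header excerpt; the real src/chunk_store.h may look different.
sample_header = """
#ifndef CHUNK_STORE_H
#define CHUNK_STORE_H
#define CHUNK_STORE_VERSION 11
#endif
"""
print(parse_chunk_store_version(sample_header))
```

Failing loudly when the define is missing keeps the test from silently building a manifest with a stale or empty version.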

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timsehn merged commit 52d3c07 into master on May 6, 2026
8 checks passed
mannyrivera2010 pushed a commit to mannyrivera2010/doltlite that referenced this pull request May 7, 2026
Vendors the upstream 1.8.5 SIMD source set (SSE2/SSE4.1/AVX2/AVX-512
on x86, NEON on aarch64) and replaces the portable-only shim with
upstream's runtime CPU-feature dispatcher. main.mk picks which SIMD
.c files to compile by inspecting `$(B.cc) -dumpmachine`, so wasm32
cross-compiles still produce a portable-only binary.

Followup from dolthub#667. Microbench on Apple M-series shows
NEON at 2.2x portable (770 MB/s -> 1700 MB/s for 16 KB inputs);
x86 should see a similar lift via AVX-512 / AVX2.

Per-file -msse2/-msse4.1/-mavx2/-mavx512f -mavx512vl flags are
scoped to the matching .c files; the rest of the tree builds at the
baseline ISA. Runtime dispatch never calls a backend the CPU
doesn't advertise.
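
The source-selection rule can be pictured as a mapping from the compiler's target triple to a file list with per-file flags. This Python sketch is illustrative only: the real logic lives in main.mk, the file names are upstream BLAKE3's C source names (which of them were actually vendored is an assumption), and the architecture matching shown is a guess:

```python
PORTABLE = ["blake3.c", "blake3_portable.c", "blake3_dispatch.c"]

# Per-file flags as listed in the commit message.
X86_SIMD = {
    "blake3_sse2.c": ["-msse2"],
    "blake3_sse41.c": ["-msse4.1"],
    "blake3_avx2.c": ["-mavx2"],
    "blake3_avx512.c": ["-mavx512f", "-mavx512vl"],
}
# The commit message lists no NEON flag; assumed none is needed on aarch64.
ARM_SIMD = {"blake3_neon.c": []}

def blake3_sources(target_triple: str) -> dict:
    """Map a `cc -dumpmachine` triple to {source file: extra compiler flags}."""
    arch = target_triple.split("-", 1)[0]
    sources = {name: [] for name in PORTABLE}
    if arch in ("x86_64", "amd64"):
        sources.update(X86_SIMD)
    elif arch in ("aarch64", "arm64"):
        sources.update(ARM_SIMD)
    return sources  # anything else (e.g. wasm32) stays portable-only

for triple in ("x86_64-linux-gnu", "arm64-apple-darwin", "wasm32-unknown-emscripten"):
    print(triple, sorted(blake3_sources(triple)))
```

Scoping each `-m` flag to its own file is what keeps the rest of the tree at the baseline ISA, so only the runtime-dispatched backends contain the wider instructions.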

Test plan
- bash test/blake3_kat_test.sh -> 6/6 KAT vectors pass on the NEON
  build, byte-identical to portable
- 10K-row insert + branch + merge smoke test produces correct row
  count and commit count
- Representative oracle suites (branch/diff/merge/log/commit_ancestors)
  pass with no regressions vs master

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timsehn deleted the perf/blake3-chunk-hash branch on May 12, 2026 23:01