Benchmarks

Petrus Pradella edited this page Jun 26, 2026 · 3 revisions

What this page covers: a like-for-like comparison of every backend running the same workload (bulk + single-op writes, point reads, indexed queries, full scans, offset vs keyset pagination, count/versions, a concurrent read/write phase, and a heap-footprint sample) at two dataset sizes, plus a per-backend recommendation and the caveats you must read before trusting any number.

⚠️ Read the caveats first. These are numbers from one machine, with the server backends on Docker over localhost. Each value is the median of 3 measured iterations (after 1 warm-up), which removes most cold-start noise — but treat them as relative guidance, not absolute SLAs.

How it's measured

The numbers come from the opt-in benchmarkSuite() in AbstractStorageStressTest (tag benchmark), which times every phase with System.nanoTime() on a fresh storage per iteration and reports the median. Run it per backend:

```powershell
# one backend, default config (sizes 1000,10000 · 3 iterations · 1 warm-up · concurrency 8)
.\gradlew :core:test -PrunBenchmark --tests "*H2StorageStressTest.benchmarkSuite"
```

Tunables (system properties, example values shown): -Dbench.sizes=1000,10000,100000; -Dbench.iterations=5; -Dbench.warmups=2; -Dbench.concurrency=16. Each run also writes core/build/benchmarks/<backend>.csv.
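
The measurement loop described above (fresh storage per iteration, untimed warm-ups, System.nanoTime() around each phase, median of the timed runs) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual AbstractStorageStressTest code: the class MedianBench and its methods are hypothetical, a HashMap stands in for a storage backend, and only the bench.iterations and bench.warmups properties are read.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of a warm-up + median-of-N timing loop; names are
// illustrative and this is not the actual AbstractStorageStressTest code.
public final class MedianBench {

    /** Median of the samples (truncated mean of the two middle values for an even count). */
    static long median(long[] samples) {
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Converts an op count and elapsed nanoseconds into ops/second. */
    static double opsPerSecond(int ops, long nanos) {
        return ops * 1_000_000_000.0 / nanos;
    }

    /**
     * Runs untimed warm-ups, then times each iteration against a freshly built
     * storage (built outside the timed window) and returns the median duration.
     */
    static <S> long medianNanos(int warmups, int iterations, Supplier<S> freshStorage, Consumer<S> phase) {
        for (int i = 0; i < warmups; i++) {
            phase.accept(freshStorage.get()); // discarded: lets JIT compilation and class loading settle
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            S storage = freshStorage.get();
            long start = System.nanoTime();
            phase.accept(storage);
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Same property names as the tunables above; defaults mirror the default config.
        int iterations = Integer.getInteger("bench.iterations", 3);
        int warmups = Integer.getInteger("bench.warmups", 1);
        int records = 10_000;

        Supplier<Map<Integer, String>> fresh = HashMap::new; // stand-in for a storage backend
        long nanos = medianNanos(warmups, iterations, fresh, storage -> {
            for (int k = 0; k < records; k++) {
                storage.put(k, "player-" + k);
            }
        });
        System.out.printf("median %.2f ms, %.0f ops/s%n", nanos / 1e6, opsPerSecond(records, nanos));
    }
}
```

Building the storage before starting the clock keeps setup cost out of each sample; a median of a few runs damps outliers but, as the caveats note, says nothing about the tail.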

💡 100k+ is opt-in via -Dbench.sizes. It's fine on the fast backends, but the file backends do O(N) full scans per query and rewrite-on-write, so 100k there is very slow — don't run it casually.


Throughput @ 10k records (ops/second, higher is better)

| Backend | Bulk insert | Single save | Bulk update | Single delete | Find by id | Concurrent r/w (8 threads) |
|---|--:|--:|--:|--:|--:|--:|
| InMemory | 390,552 | 330,524 | 310,800 | 635,526 | 866,476 | 475,975 |
| H2 (embedded) | 105,043 | 26,509 | 47,354 | 36,459 | 52,005 | 98,201 |
| MongoDB | 15,733 | 1,335 | 15,601 | 491 | 1,541 | 8,803 |
| PostgreSQL | 1,758 | 583 | 1,738 | 589 | 1,862 | 3,608 |
| MariaDB | 795 | 1,591 | 815 | 826 | 1,687 | 6,767 |
| LocalFile | 1,836 | 1,232 | 1,959 | 4,576 | 7,637 | 3,893 |
| GroupedFile | 1,933 | 1,318 | 1,833 | 3,057 | 8,620 | 3,223 |

Bulk insert writes all 10k records through saveAll in batches of 1,000. Single save and single delete are timed over a 200-op sample; find by id, bulk update, and concurrent r/w over 1,000-, 1,000-, and 2,000-op samples respectively. Concurrent r/w partitions keys per thread (no write conflicts), so it measures throughput under load and connection pooling, not conflict handling.
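
The two ways the workload splits keys (saveAll batches of 1,000, and disjoint per-thread ranges for the concurrent phase) can both be expressed with one chunking helper. A minimal sketch under assumptions: a ConcurrentHashMap replaces a real backend, the even-key-write / odd-key-read mix is invented, and PartitionedPhase, chunks and runConcurrent are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: chunking keys into saveAll-style batches and into
// disjoint per-thread ranges. A ConcurrentHashMap stands in for a backend.
public final class PartitionedPhase {

    /** Splits [0, total) into consecutive half-open ranges {from, to} of at most `size` keys. */
    static List<int[]> chunks(int total, int size) {
        List<int[]> out = new ArrayList<>();
        for (int from = 0; from < total; from += size) {
            out.add(new int[] {from, Math.min(from + size, total)});
        }
        return out;
    }

    /** Runs `ops` operations on `threads` workers; each worker touches only its own key range. */
    static long runConcurrent(Map<Integer, String> storage, int ops, int threads) throws Exception {
        int perThread = (ops + threads - 1) / threads; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int[] range : chunks(ops, perThread)) {
                results.add(pool.submit(() -> {
                    int done = 0;
                    for (int k = range[0]; k < range[1]; k++) {
                        if (k % 2 == 0) {
                            storage.put(k, "v" + k); // write: no other worker owns key k
                        } else {
                            storage.get(k); // read (the key may be absent)
                        }
                        done++;
                    }
                    return done;
                }));
            }
            long total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // also rethrows any worker failure
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(chunks(10_000, 1_000).size() + " saveAll batches of 1,000");
        Map<Integer, String> storage = new ConcurrentHashMap<>();
        long start = System.nanoTime();
        long done = runConcurrent(storage, 2_000, 8);
        long nanos = System.nanoTime() - start;
        System.out.printf("%d ops, %.0f ops/s%n", done, done * 1e9 / nanos);
    }
}
```

Because no two workers ever share a key, a phase built this way cannot trigger optimistic-lock conflicts, which is the limitation the caveats call out.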

Latency @ 10k records (milliseconds, lower is better)

| Backend | count() | findMany (1k) | versions (1k) | Indexed query | Full scan (all) | Offset page (deep) | Keyset page |
|---|--:|--:|--:|--:|--:|--:|--:|
| InMemory | 0.01 | 1.16 | 0.12 | 8.4 | 12.3 | 17.8 | 20.3 |
| H2 (embedded) | 0.33 | 5.39 | 4.22 | 8.9 | 10.0 | 4.4 | 5.7 |
| MongoDB | 4.06 | 10.7 | 5.67 | 45.8 | 69.5 | 14.6 | 9.7 |
| PostgreSQL | 1.69 | 4.49 | 2.71 | 12.2 | 20.2 | 10.7 | 2.9 |
| MariaDB | 2.76 | 6.75 | 4.86 | 16.8 | 24.7 | 34.7 | 5.0 |
| LocalFile | 7.78 | 20.8 | 92.8 | 660 | 498 | 548 | 454 |
| GroupedFile | 1,103 | 21.5 | 76.8 | 1,007 | 1,055 | 963 | 994 |

What the numbers say

  • Keyset pagination beats offset on the indexed servers, by a wide margin on SQL. Deep offset paging scans and discards the prefix; keyset (queryAfter) seeks past it. MariaDB: 34.7 ms → 5.0 ms (~7×), PostgreSQL: 10.7 ms → 2.9 ms (~3.7×), MongoDB: 14.6 ms → 9.7 ms (~1.5×). Embedded H2 went the other way at this depth (offset 4.4 ms vs keyset 5.7 ms). On the scan-based backends (InMemory, LocalFile, GroupedFile) the two stay within ~20% of each other: there's no index to seek, so both scan. Use keyset for deep pages on the SQL servers and MongoDB.
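  • Batch your writes (mostly). On PostgreSQL and MongoDB every single-op write pays its own commit/round-trip: PostgreSQL does 1,758 ops/s bulk but only 583 single saves/s (~3×); MongoDB 15,733 bulk vs 1,335 single (~12×). Embedded H2 still gains ~4× from batching (105,043 vs 26,509), while InMemory barely changes (390,552 vs 330,524). MariaDB is the exception in this run: its bulk insert (795 ops/s) was slower than single saves (1,591 ops/s), so measure its saveAll path rather than assume batching helps there. Elsewhere, saveAll is not just a convenience; it's the main write-throughput lever.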
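  • Concurrency and pooling lift the server backends. MariaDB does 1,687 single-threaded find/s but 6,767 mixed r/w ops/s with 8 threads (~4×), as the HikariCP pool (capped at 5 here) overlaps the round-trips; PostgreSQL roughly doubles (1,862 → 3,608) and MongoDB gains ~5.7× (1,541 → 8,803). The embedded backends stay fast in absolute terms: InMemory goes from 866k find/s to 476k mixed r/w ops/s, and H2 nearly doubles (52k → ~98k). The file backends' mixed r/w figure falls below their single-threaded find rate (LocalFile 7,637 → 3,893; GroupedFile 8,620 → 3,223).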
  • File backends: great by key, terrible by query. Find by id is fast (LocalFile 7,637, GroupedFile 8,620 ops/s; a direct file read), but every query is a full scan: ~0.45–0.66 s on LocalFile and ~1 s on GroupedFile at 10k. Neither has a real index.
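  • count() is not always cheap. It's near-instant on InMemory and H2 (0.01 and 0.33 ms) and takes a few milliseconds on the server backends (1.7–4.1 ms) and on LocalFile (7.8 ms; it counts files), but GroupedFile pays ~1.1 s because it must parse the whole group. Know your backend before polling counts.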
  • MongoDB bulk writes are the server highlight (~15.7k insert and update ops/s), but its single delete is slow (491 ops/s) — prefer batched writes there.
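
The offset-vs-keyset difference from the first bullet can be sketched against an ordered index. Assumptions: a TreeMap stands in for a sorted database index, NavigableMap.tailMap plays the role of a keyset seek such as queryAfter, and the Paging class and its methods are hypothetical. The visited count shows how much of the index each style walks for the same deep page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch contrasting offset and keyset paging over an ordered
// index (a TreeMap here). Not the library's actual implementation.
public final class Paging {

    /** A page of keys plus how many index entries were walked to produce it. */
    record Page(List<Integer> keys, int visited) {}

    /** Offset paging: walk from the start, discard the first `offset` entries, keep `limit`. */
    static Page offsetPage(NavigableMap<Integer, String> index, int offset, int limit) {
        List<Integer> keys = new ArrayList<>();
        int visited = 0;
        for (Integer key : index.keySet()) {
            if (keys.size() == limit) break;
            visited++;
            if (visited > offset) keys.add(key); // the prefix is read, then thrown away
        }
        return new Page(keys, visited);
    }

    /** Keyset paging: seek past the last key of the previous page (null = first page). */
    static Page keysetPage(NavigableMap<Integer, String> index, Integer afterKey, int limit) {
        NavigableMap<Integer, String> rest = afterKey == null ? index : index.tailMap(afterKey, false);
        List<Integer> keys = new ArrayList<>();
        for (Integer key : rest.keySet()) {
            if (keys.size() == limit) break;
            keys.add(key);
        }
        // The O(log n) seek inside tailMap is not counted; only entries walked afterwards are.
        return new Page(keys, keys.size());
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int k = 0; k < 10_000; k++) index.put(k, "player-" + k);
        Page byOffset = offsetPage(index, 9_000, 50);
        Page byKeyset = keysetPage(index, 8_999, 50);
        System.out.println("offset visited " + byOffset.visited() + ", keyset visited " + byKeyset.visited()
                + ", same keys: " + byOffset.keys().equals(byKeyset.keys()));
    }
}
```

For a 10k-key index, fetching 50 keys at offset 9,000 walks 9,050 entries with offset paging and 50 with keyset paging; both return keys 9,000–9,049. On a scan-only backend there is no tailMap-style seek, which is why the two styles converge there.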

Scaling 1k → 10k

The full-scan vs indexed gap is the headline. Indexed-query latency (ms) as the dataset grows 10×:

| Backend | Query @ 1k (ms) | Query @ 10k (ms) | Growth |
|---|--:|--:|---|
| MariaDB | 2.4 | 16.8 | ~7× |
| PostgreSQL | 2.3 | 12.2 | ~5× |
| H2 | 2.0 | 8.9 | ~4× |
| LocalFile | 61.9 | 660 | ~11× (linear scan) |
| GroupedFile | 97.0 | 1,007 | ~10× (linear scan) |

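File backends grow linearly with the data (~10–11× for 10× the records, an O(N) scan); the SQL backends grow sub-linearly (~4–7×), because their cost is an index seek plus the size of the result rather than the size of the collection. The absolute gap therefore widens with every record: file backends are for small or key-addressed collections.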
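
To make the scan-vs-index distinction concrete, the sketch below counts how many records a full scan examines versus how many ids an index lookup touches for the same query. It is a hypothetical model, not the file backends' code: the Player record, its invented level field, and the in-memory TreeMap index are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a scan-based query vs an indexed lookup.
// "level" is an invented field; this is not the file backends' code.
public final class ScanVsIndex {

    record Player(int id, int level) {}

    /** Full scan: every record is examined, whatever the result size. Returns the examined count. */
    static int scanExamined(List<Player> players, int level, List<Integer> hits) {
        int examined = 0;
        for (Player p : players) {
            examined++;
            if (p.level() == level) hits.add(p.id());
        }
        return examined;
    }

    /** Secondary index level -> ids (a real store maintains it at write time). */
    static Map<Integer, List<Integer>> buildIndex(List<Player> players) {
        Map<Integer, List<Integer>> index = new TreeMap<>();
        for (Player p : players) {
            index.computeIfAbsent(p.level(), k -> new ArrayList<>()).add(p.id());
        }
        return index;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000}) {
            List<Player> players = new ArrayList<>();
            for (int i = 0; i < n; i++) players.add(new Player(i, i % 100)); // 100 distinct levels
            List<Integer> hits = new ArrayList<>();
            int examined = scanExamined(players, 42, hits);
            List<Integer> fromIndex = buildIndex(players).getOrDefault(42, List.of());
            System.out.printf("n=%,d: scan examined %,d records; index touched %,d ids (%d hits)%n",
                    n, examined, fromIndex.size(), hits.size());
        }
    }
}
```

The scan's work tracks N; the lookup's work tracks the number of matches, which is the "index + result size" cost the table reflects.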

Memory (heap delta after loading 10k)

Only the in-process backends hold the data in the JVM heap: InMemory ≈ 10.4 MB, H2 ≈ 10.1 MB for 10k TestPlayers. LocalFile/GroupedFile keep ~2 MB transient buffers; the server backends store off-heap/remote (≈ 0 heap). Strong-referenced caches (the manager layer) add to this — see Cache Policies & Freshness. (Heap sampling uses a GC hint; treat it as indicative.)
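
A heap-delta sample like the one above can be approximated with Runtime: read used heap (totalMemory() minus freeMemory()) after a System.gc() hint, load the data, and read it again. A minimal sketch under assumptions: HeapDelta is a hypothetical name, it loads synthetic strings into a HashMap rather than TestPlayers, and because the JVM may ignore the GC hint the result is indicative only.

```java
import java.lang.ref.Reference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical heap-delta sample; System.gc() is only a hint, so treat the
// number as indicative, not exact.
public final class HeapDelta {

    /** Used heap in bytes, sampled right after a GC request. */
    static long usedHeapBytes() {
        System.gc(); // a request, not a guarantee
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        Map<Integer, String> store = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            store.put(i, "player-" + i); // synthetic records, not TestPlayers
        }
        long after = usedHeapBytes();
        System.out.printf("heap delta ~%.1f MB for %,d records%n", (after - before) / 1e6, store.size());
        Reference.reachabilityFence(store); // keep the data reachable until after the second sample
    }
}
```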


Pick-a-backend cheat sheet

| Backend | Reach for it when… | Avoid when… |
|---|---|---|
| InMemory | tests, ephemeral caches, tiny hot sets | you need persistence |
| H2 (embedded) | single-process apps, dev, small/medium persistent data with no server | multiple processes share the data, or you outgrow one machine |
| MongoDB | large, write-heavy, document data; multiple instances; bulk ingestion | you do many tiny single deletes, or need SQL semantics |
| PostgreSQL | the relational default: balanced writes/reads, keyset paging shines, full feature set (tx, optimistic locking, LISTEN/NOTIFY change feed) | you can't run a server |
| MySQL / MariaDB | ubiquity / existing infra; read- and concurrency-heavy via pooling | bulk-insert throughput is critical (slowest here), or you need the push change feed (not in v1) |
| LocalFile | human-readable, hand-editable per-entity files; small, key-addressed collections | the collection is large or queried (full-scan reads) |
| GroupedFile | small datasets grouped one file per key | large data, frequent count(), or query-heavy use (full scans + parse) |

See Choosing a Backend for the feature-by-feature (non-performance) comparison.


Caveats — please read

  1. One machine, one config. Windows 11, JDK 25, default pool sizes (SQL pool max 5), default Docker limits. The ranking and orders of magnitude are the takeaway, not the milliseconds.
  2. Server backends ran on Docker over localhost. MariaDB / PostgreSQL / MongoDB pay a loopback round-trip per call; a co-located prod DB might pay less, a remote one more.
  3. Median of 3 (after 1 warm-up). Better than a single cold run, but it's a median, not p95/p99 — it won't show tail latency or GC spikes.
  4. Single-op samples are small (200–1000 ops) to keep file backends tractable; the concurrent phase partitions keys per thread, so it does not exercise write-conflict handling.
  5. Reproduce it yourself: -PrunBenchmark (see How it's measured) and compare the CSVs — absolute numbers are environment-specific.

Possible further work

The benchmark now covers warm-up + median, per-op latency, concurrency, multiple sizes, pagination (offset vs keyset), count/versions/findMany, a memory sample, and CSV export. Natural next steps:

  • Percentiles (p95/p99), not just the median, to surface tail latency.
  • A contended workload (many threads hitting the same keys) to measure optimistic-lock conflict and retry cost — the current concurrent phase deliberately avoids conflicts.
  • Remote (non-localhost) servers and 100k+ as routine sizes, to model production latency.
  • JSON export alongside CSV, for dashboards/regression tracking over time.
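
Percentile reporting, the first item above, could use the nearest-rank definition over per-op latencies. This is a hypothetical sketch, not existing benchmark code; the Percentiles class is an invented name and the latency sample is synthetic, chosen so the slow tail stays hidden at p50 and p95 and only shows up at p99.

```java
import java.util.Arrays;

// Hypothetical nearest-rank percentile helper; not part of the benchmark suite.
public final class Percentiles {

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || !(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Synthetic latencies (ms): 980 fast ops at 1 ms, 20 slow ones at 50 ms.
        long[] latencies = new long[1_000];
        Arrays.fill(latencies, 1);
        for (int i = 0; i < 20; i++) {
            latencies[i] = 50;
        }
        System.out.println("p50=" + percentile(latencies, 50) + " p95=" + percentile(latencies, 95)
                + " p99=" + percentile(latencies, 99));
    }
}
```

For this sample it prints p50=1 p95=1 p99=50: a median-only report would miss the 2% of slow operations entirely.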
