Benchmarks

What this page covers: a like-for-like comparison of every backend running the same workload (bulk + single-op writes, point reads, indexed queries, full scans, offset vs keyset pagination, count/versions, a concurrent read/write phase, and a heap-footprint sample) at two dataset sizes, plus a per-backend recommendation and the caveats you must read before trusting any number.

⚠️ Read the caveats first. These are numbers from one machine, with the server backends on Docker over localhost. Each value is the median of 3 measured iterations (after 1 warm-up), which removes most cold-start noise — but treat them as relative guidance, not absolute SLAs.

How it's measured

The numbers come from the opt-in benchmarkSuite() in AbstractStorageStressTest (tag benchmark), which times every phase with System.nanoTime() on a fresh storage per iteration and reports the median. Run it per backend:

# one backend, default config (sizes 1000,10000 · 3 iterations · 1 warm-up · concurrency 8)
.\gradlew :core:test -PrunBenchmark --tests "*H2StorageStressTest.benchmarkSuite"

Tunables (system properties): -Dbench.sizes=1000,10000,100000, -Dbench.iterations=5, -Dbench.warmups=2, -Dbench.concurrency=16. Each run also writes core/build/benchmarks/<backend>.csv.

💡 100k+ is opt-in via -Dbench.sizes. It's fine on the fast backends, but the file backends do O(N) full scans per query and rewrite-on-write, so 100k there is very slow — don't run it casually.
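
The loop described above (untimed warm-ups, then the median of N runs timed with System.nanoTime()) can be sketched roughly as follows. This is a minimal illustration, not the code in AbstractStorageStressTest: the class MedianTimer and its methods are hypothetical names, and a real harness would also create a fresh storage before each timed run. Only the bench.iterations and bench.warmups property names come from this page.

```java
import java.util.Arrays;

/** Hypothetical harness sketch; not the project's AbstractStorageStressTest. */
final class MedianTimer {

    /** Median of the samples; for an even count, the mean of the two middle values. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Runs the phase `warmups` times untimed, then returns the median of `iterations` timed runs (ns). */
    static long medianNanos(Runnable phase, int warmups, int iterations) {
        for (int i = 0; i < warmups; i++) {
            phase.run(); // warm-up: result discarded so JIT and caches can settle
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            phase.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Property names as listed under Tunables; defaults mirror the documented config.
        int iterations = Integer.getInteger("bench.iterations", 3);
        int warmups = Integer.getInteger("bench.warmups", 1);
        long nanos = medianNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) throw new IllegalStateException(); // uses the result so the loop is not dead code
        }, warmups, iterations);
        System.out.printf("median: %.3f ms%n", nanos / 1_000_000.0);
    }
}
```

With 3 iterations the median is simply the middle run, which is why a single slow outlier does not move the reported number.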

Throughput @ 10k records (ops/second, higher is better)

Backend	Bulk insert	Single `save`	Bulk update	`delete`	`find` by id	Concurrent r/w (8 threads)
InMemory	390,552	330,524	310,800	635,526	866,476	475,975
H2 (embedded)	105,043	26,509	47,354	36,459	52,005	98,201
MongoDB	15,733	1,335	15,601	491	1,541	8,803
PostgreSQL	1,758	583	1,738	589	1,862	3,608
MariaDB	795	1,591	815	826	1,687	6,767
LocalFile	1,836	1,232	1,959	4,576	7,637	3,893
GroupedFile	1,933	1,318	1,833	3,057	8,620	3,223

Bulk insert = full 10k via saveAll batches of 1000. Single save/delete are a 200-op sample; find/update/concurrent are 1000/1000/2000-op samples. Concurrent r/w partitions keys per thread (no write conflicts) — it measures throughput under load + pooling, not conflict handling.
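
As a rough sketch of how the bulk-insert figure is derived (batches of 1000 fed to a saveAll-style call, then operations divided by elapsed time), assuming nothing about the real storage API: BulkInsertSketch and its methods are hypothetical names, and the saveAll parameter is a plain Consumer standing in for the actual storage call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of batching plus the ops/second calculation. */
final class BulkInsertSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            out.add(items.subList(from, Math.min(from + batchSize, items.size())));
        }
        return out;
    }

    /** Converts an operation count and elapsed nanoseconds into ops/second. */
    static double opsPerSecond(long ops, long elapsedNanos) {
        return ops * 1_000_000_000.0 / elapsedNanos;
    }

    /** Feeds every batch to saveAll (a stand-in for the storage call) and returns ops/second. */
    static <T> double timeBulkInsert(List<T> items, int batchSize, Consumer<List<T>> saveAll) {
        long start = System.nanoTime();
        for (List<T> batch : batches(items, batchSize)) {
            saveAll.accept(batch);
        }
        return opsPerSecond(items.size(), System.nanoTime() - start);
    }
}
```

Note that throughput is counted per record, not per batch: 10k records written in ten batches still count as 10,000 operations.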

Latency @ 10k records (milliseconds, lower is better)

Backend	`count()`	`findMany` (1k)	`versions` (1k)	Indexed query	Full scan (`all`)	Offset page (deep)	Keyset page
InMemory	0.01	1.16	0.12	8.4	12.3	17.8	20.3
H2 (embedded)	0.33	5.39	4.22	8.9	10.0	4.4	5.7
MongoDB	4.06	10.7	5.67	45.8	69.5	14.6	9.7
PostgreSQL	1.69	4.49	2.71	12.2	20.2	10.7	2.9
MariaDB	2.76	6.75	4.86	16.8	24.7	34.7	5.0
LocalFile	7.78	20.8	92.8	660	498	548	454
GroupedFile	1,103	21.5	76.8	1,007	1,055	963	994

What the numbers say

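Keyset pagination beats offset on the indexed servers, by a lot. Deep offset paging scans and discards the prefix; keyset (queryAfter) seeks. MariaDB: 34.7 ms → 5.0 ms (~7×), PostgreSQL: 10.7 ms → 2.9 ms (~3.7×), MongoDB: 14.6 ms → 9.7 ms (~1.5×). Embedded H2 is the exception at this size: its offset page (4.4 ms) is slightly faster than its keyset page (5.7 ms). On the scan-based backends (InMemory, LocalFile, GroupedFile) the two are roughly equal (for example 17.8 vs 20.3 ms on InMemory), because there is no index to seek and both scan. Use keyset for deep pages on the SQL servers and Mongo.
Batch your writes. On durable/networked backends, per-op writes pay a per-op commit/round-trip: PostgreSQL does 1,758 ops/s bulk but only 583 single save/s; Mongo 15,733 bulk vs 1,335 single. InMemory barely cares (390,552 vs 330,524), though H2 still gains about 4× from batching (105,043 vs 26,509). MariaDB is the outlier in this run: its single save (1,591 ops/s) was faster than its bulk insert (795 ops/s). Everywhere else, saveAll is not just convenience; it's the difference.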
Concurrency + pooling lifts the SQL servers. MariaDB does 1,687 single-threaded find/s but 6,767 ops/s under 8 concurrent threads (the HikariCP pool, capped at 5 here, parallelises the round-trips). Embedded scales hugely (InMemory 866k → still 476k mixed r/w; H2 ~98k).
File backends: great by key, terrible by query. find by id is fast (LocalFile 7,637, GroupedFile 8,620 ops/s — a direct file read), but every query is a full scan: LocalFile ~0.5–0.7 s, GroupedFile ~1 s at 10k. They have no real index.
count() is not always cheap. It's ~instant on InMemory/SQL and fast on LocalFile (counts files), but GroupedFile pays ~1.1 s — it must parse the whole group. Know your backend before polling counts.
MongoDB bulk writes are the server highlight (~15.7k insert and update ops/s), but its single delete is slow (491 ops/s) — prefer batched writes there.
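
To make the offset vs keyset point concrete, here is a small in-memory sketch over a sorted map. It is an illustration only and assumes nothing about how queryAfter is implemented in the library; PagingSketch, offsetPage and keysetPage are hypothetical names, and the TreeMap stands in for a database index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical sketch: the same page fetched by offset and by keyset. */
final class PagingSketch {

    /** Offset paging: walks and discards `offset` entries before collecting the page. */
    static List<Integer> offsetPage(NavigableMap<Integer, String> index, int offset, int limit) {
        List<Integer> page = new ArrayList<>();
        int seen = 0;
        for (Integer key : index.keySet()) {
            if (seen++ < offset) continue;   // O(offset) work that is thrown away
            if (page.size() == limit) break;
            page.add(key);
        }
        return page;
    }

    /** Keyset paging: seeks directly past the last key of the previous page. */
    static List<Integer> keysetPage(NavigableMap<Integer, String> index, Integer afterKey, int limit) {
        NavigableMap<Integer, String> tail =
                afterKey == null ? index : index.tailMap(afterKey, false); // O(log n) seek, exclusive
        List<Integer> page = new ArrayList<>();
        for (Integer key : tail.keySet()) {
            if (page.size() == limit) break;
            page.add(key);
        }
        return page;
    }
}
```

Both methods return the same page; the difference is that the offset version touches every earlier entry first, while the keyset version only needs the last key the caller already saw. A backend with no ordered index cannot do the seek, which is consistent with the scan-based backends showing no gain.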

Scaling 1k → 10k

The full-scan vs indexed gap is the headline. Indexed-query latency (ms) as the dataset grows 10×:

Backend	Query @ 1k (ms)	Query @ 10k (ms)	Growth
MariaDB	2.4	16.8	~7×
PostgreSQL	2.3	12.2	~5×
H2	2.0	8.9	~4×
LocalFile	61.9	660	~11× (linear scan)
GroupedFile	97.0	1,007	~10× (linear scan)

File backends grow linearly with the data (O(N) scan); SQL grows sub-linearly (index + result size). The gap widens with every record — file backends are for small or key-addressed collections.
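
The growth column is just the ratio of the two latencies. One way to read "linear" vs "sub-linear" from it is to fit an exponent k in latency ∝ N^k across the 10× size step, as in the sketch below. The class ScalingSketch is a hypothetical helper, the latencies are copied from the table above, and a two-point fit is only a rough indication.

```java
/** Hypothetical helper: growth factor and scaling exponent from two latency samples. */
final class ScalingSketch {

    /** How many times slower the query got when the dataset grew. */
    static double growth(double latencySmallMs, double latencyLargeMs) {
        return latencyLargeMs / latencySmallMs;
    }

    /** Exponent k in latency ∝ N^k: about 1 means linear, below 1 means sub-linear. */
    static double exponent(double latencySmallMs, double latencyLargeMs, double sizeRatio) {
        return Math.log(growth(latencySmallMs, latencyLargeMs)) / Math.log(sizeRatio);
    }

    public static void main(String[] args) {
        String[] backends = {"MariaDB", "PostgreSQL", "H2", "LocalFile", "GroupedFile"};
        double[] at1k = {2.4, 2.3, 2.0, 61.9, 97.0};     // ms, from the table above
        double[] at10k = {16.8, 12.2, 8.9, 660, 1007};   // ms, from the table above
        for (int i = 0; i < backends.length; i++) {
            System.out.printf("%-12s growth %.1fx  exponent %.2f%n",
                    backends[i], growth(at1k[i], at10k[i]), exponent(at1k[i], at10k[i], 10));
        }
    }
}
```

On these numbers the file backends land at an exponent of about 1, while H2 comes out around 0.65, which matches the linear vs sub-linear reading above.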

Memory (heap delta after loading 10k)

Only the in-process backends hold the data in the JVM heap: InMemory ≈ 10.4 MB, H2 ≈ 10.1 MB for 10k TestPlayers. LocalFile/GroupedFile keep ~2 MB transient buffers; the server backends store off-heap/remote (≈ 0 heap). Strong-referenced caches (the manager layer) add to this — see Cache Policies & Freshness. (Heap sampling uses a GC hint; treat it as indicative.)
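
A heap-delta sample of this kind can be approximated with Runtime, as sketched below. This is an assumption about the general technique, not the benchmark's actual code: HeapDeltaSketch and its methods are hypothetical names. System.gc() is only a hint to the JVM, which is why the figures above are indicative rather than exact.

```java
import java.lang.ref.Reference;
import java.util.function.Supplier;

/** Hypothetical sketch of a heap-delta sample around a load phase. */
final class HeapDeltaSketch {

    /** Used heap after asking for a collection; the JVM is free to ignore the hint. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Used-heap difference across `load`, keeping the loaded data reachable while sampling. */
    static long heapDeltaBytes(Supplier<?> load) {
        long before = usedHeapBytes();
        Object retained = load.get();            // the data whose footprint is being sampled
        long after = usedHeapBytes();
        Reference.reachabilityFence(retained);   // stop the JIT from treating it as dead early
        return after - before;
    }

    public static void main(String[] args) {
        long delta = heapDeltaBytes(() -> new byte[10 * 1024 * 1024]);
        System.out.printf("heap delta: %.1f MB%n", delta / (1024.0 * 1024.0));
    }
}
```

Keeping a strong reference to the loaded data matters: if it became unreachable before the second sample, the collector could reclaim it and the delta would read close to zero.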

Pick-a-backend cheat sheet

Backend	Reach for it when…	Avoid when…
InMemory	tests, ephemeral caches, tiny hot sets	you need persistence
H2 (embedded)	single-process apps, dev, small/medium persistent data with no server	multiple processes share the data, or you outgrow one machine
MongoDB	large, write-heavy, document data; multiple instances; bulk ingestion	you do many tiny single deletes, or need SQL semantics
PostgreSQL	the relational default — balanced writes/reads, keyset paging shines, full feature set (tx, optimistic locking, `LISTEN/NOTIFY` change feed)	you can't run a server
MySQL / MariaDB	ubiquity / existing infra; read- and concurrency-heavy via pooling	bulk-insert throughput is critical (slowest here), or you need the push change-feed (not in v1)
LocalFile	human-readable, hand-editable per-entity files; small, key-addressed collections	the collection is large or queried (full-scan reads)
GroupedFile	small datasets grouped one-file-per-key	large data, frequent `count()`, or query-heavy use (full scans + parse)

See Choosing a Backend for the feature-by-feature (non-performance) comparison.

Caveats — please read

One machine, one config. Windows 11, JDK 25, default pool sizes (SQL pool max 5), default Docker limits. The ranking and orders of magnitude are the takeaway, not the milliseconds.
Server backends ran on Docker over localhost. MariaDB / PostgreSQL / MongoDB pay a loopback round-trip per call; a co-located prod DB might pay less, a remote one more.
Median of 3 (after 1 warm-up). Better than a single cold run, but it's a median, not p95/p99 — it won't show tail latency or GC spikes.
Single-op samples are small (200–1000 ops) to keep file backends tractable; the concurrent phase partitions keys per thread, so it does not exercise write-conflict handling.
Reproduce it yourself: -PrunBenchmark (see How it's measured) and compare the CSVs — absolute numbers are environment-specific.
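
For readers wondering what "partitions keys per thread" means in practice, here is a minimal sketch under stated assumptions: each thread owns the keys congruent to its index, so no two threads ever write the same key. PartitionedLoadSketch is a hypothetical name and the Map stands in for a storage backend; the real concurrent phase may split keys differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a conflict-free concurrent read/write phase. */
final class PartitionedLoadSketch {

    /** Runs one write and one read per key across `threads` workers; returns ops/second. */
    static double run(Map<Integer, Integer> store, int keys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                final int thread = t;
                futures.add(pool.submit(() -> {
                    // Thread t owns keys t, t + threads, t + 2*threads, ...
                    for (int key = thread; key < keys; key += threads) {
                        store.put(key, key);   // write a key only this thread touches
                        store.get(key);        // read it back
                    }
                }));
            }
            for (Future<?> f : futures) f.get(); // wait for every worker
            long elapsed = System.nanoTime() - start;
            return 2.0 * keys * 1_000_000_000.0 / elapsed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("concurrent phase failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the key sets are disjoint, a run like this measures throughput under parallel load and never triggers an optimistic-lock retry, which is exactly the limitation named above.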

Possible further work

The benchmark now covers warm-up + median, per-op latency, concurrency, multiple sizes, pagination (offset vs keyset), count/versions/findMany, a memory sample, and CSV export. Natural next steps:

Percentiles (p95/p99), not just the median, to surface tail latency.
A contended workload (many threads hitting the same keys) to measure optimistic-lock conflict and retry cost — the current concurrent phase deliberately avoids conflicts.
Remote (non-localhost) servers and 100k+ as routine sizes, to model production latency.
JSON export alongside CSV, for dashboards/regression tracking over time.
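
As an illustration of the first item, a nearest-rank percentile over the per-op samples could look like the sketch below. PercentileSketch is a hypothetical name and nearest-rank is only one of several common percentile definitions; nothing like this exists in the benchmark yet.

```java
import java.util.Arrays;

/** Hypothetical sketch: nearest-rank percentiles over latency samples. */
final class PercentileSketch {

    /** Smallest sample such that at least p percent of all samples are less than or equal to it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Nine fast operations and one slow one: the median hides the spike, p99 does not.
        long[] micros = {10, 10, 10, 10, 10, 10, 10, 10, 10, 500};
        System.out.println("p50 = " + percentile(micros, 50) + " us");
        System.out.println("p99 = " + percentile(micros, 99) + " us");
    }
}
```

Percentiles need many samples to mean anything, so this would apply to per-operation timings within a phase rather than to the 3 per-phase iterations.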
