Benchmarks

Petrus Pradella edited this page Jun 26, 2026 · 3 revisions

What this page covers: a like-for-like throughput comparison of every backend running the same 10,000-record stress scenario (bulk insert → full CRUD → the complete index/query matrix → bulk update → deletes), plus a per-backend recommendation and the caveats you must read before trusting any of these numbers.

⚠️ Read the caveats first. These are single-run, no-warmup numbers from one machine, with the server backends on Docker over localhost. Treat them as relative guidance ("Postgres out-inserted MariaDB ~2× here", "file backends full-scan queries"), not as absolute SLAs.

The scenario is the shared AbstractStorageStressTest — one cumulative test per backend, run with:

.\gradlew :core:test --tests "*StorageStressTest"   # all backends (needs Docker for the server ones)

Results (10,000 records)

Each backend ran the identical scenario. Times are wall-clock milliseconds straight from the suite's report; the throughput columns are computed from those raw times (see caveat 5 on the suite's printed ops/s).

Write throughput

| Backend | Insert 10k | Bulk saveAll (10,334) | Update 1,001 | Delete 101 | Total run |
| --- | --- | --- | --- | --- | --- |
| InMemory | 178 ms · ~56,000/s | 89 ms · ~116,000/s | 9 ms | 3 ms | 0.66 s |
| H2 (embedded) | 318 ms · ~31,000/s | 247 ms · ~42,000/s | 20 ms | 16 ms | 1.04 s |
| MongoDB | 956 ms · ~10,500/s | 890 ms · ~11,600/s | 110 ms | 288 ms | 3.11 s |
| PostgreSQL | 5,612 ms · ~1,780/s | 5,530 ms · ~1,870/s | 545 ms | 181 ms | 12.30 s |
| MariaDB | 12,365 ms · ~810/s | 12,267 ms · ~840/s | 1,185 ms | 140 ms | 26.46 s |
| LocalFile | 5,904 ms · ~1,690/s | 12,226 ms · ~850/s | 2,181 ms | 1,006 ms | 35.88 s |
| GroupedFile | 5,431 ms · ~1,840/s | 23,661 ms · ~440/s | 5,063 ms | 5,982 ms | 69.49 s |

Read / query latency

The suite runs 27 indexed queries (score / boolean / world / compound-AND / timestamp ranges). Lower is better.

| Backend | Avg per query | Notes |
| --- | --- | --- |
| InMemory | 5.3 ms | in-memory Map index |
| H2 (embedded) | 5.4 ms | real B-tree index |
| PostgreSQL | 6.1 ms | real B-tree index |
| MariaDB | 7.7 ms | real B-tree index |
| MongoDB | 18.0 ms | native index; localhost round-trip per query |
| LocalFile | 500.6 ms | ⚠️ full scan: deserializes every file, every query |
| GroupedFile | 1,004.3 ms | ⚠️ full scan: parses the whole group, every query |

What the numbers say

  • Embedded backends win on raw speed. InMemory and H2 have no network hop and (InMemory) no fsync, so on bulk insert they finish roughly 3–5× faster than MongoDB and more than an order of magnitude faster than the SQL servers. H2 is the fastest persistent option here.
  • MongoDB is the strongest server backend for writes. Its bulkWrite pushed ~10–12k records/s — far ahead of the SQL servers in this run — while keeping indexed queries fast.
  • PostgreSQL out-wrote MariaDB here (~2× on bulk insert: 5.6 s vs 12.4 s), though a gap of that size sits close to the single-run noise margin described in the caveats. Both have fast indexed reads; the gap is on write throughput.
  • Real indexes matter enormously for queries. SQL and Mongo answer indexed queries in single-digit to ~18 ms because they maintain B-tree _idx_ columns/fields. The file backends have no real index — every query is a full scan (500 ms / 1,000 ms per query at 10k), and that gap grows linearly with the dataset.
  • GroupedFile is the slowest at scale. Every write and delete rewrites the whole group file, so saveAll (23.7 s) and the 101 deletes (~59 ms each) dominate its run. It's built for small grouped datasets, not 10k-row churn.
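
The full-scan versus indexed-query difference described above can be illustrated with a small, self-contained sketch. Everything here is hypothetical: the `Player` record, the `score` field and the class name are made up for illustration, and a `TreeMap` stands in for the B-tree a real database maintains. It is not how any of the backends are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustration only: hypothetical types, not the library's storage code.
final class ScanVersusIndex {

    record Player(int id, int score) {}

    /** Full scan: inspects every record on every query, so cost is O(N). */
    static List<Integer> scan(List<Player> all, int minScore, int maxScore) {
        List<Integer> ids = new ArrayList<>();
        for (Player p : all) {
            if (p.score() >= minScore && p.score() <= maxScore) {
                ids.add(p.id());
            }
        }
        return ids;
    }

    /** Builds a sorted index (score -> ids) once, at write time. */
    static NavigableMap<Integer, List<Integer>> buildIndex(List<Player> all) {
        NavigableMap<Integer, List<Integer>> index = new TreeMap<>();
        for (Player p : all) {
            index.computeIfAbsent(p.score(), k -> new ArrayList<>()).add(p.id());
        }
        return index;
    }

    /** Indexed range query: O(log N) to find the range, plus the matches. */
    static List<Integer> indexed(NavigableMap<Integer, List<Integer>> index,
                                 int minScore, int maxScore) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> bucket : index.subMap(minScore, true, maxScore, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }
}
```

Both methods return the same ids (in a different order); the difference is how much data each query has to touch, which is why the scan cost grows with the dataset while the indexed lookup stays nearly flat.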

Pick-a-backend cheat sheet

| Backend | Reach for it when… | Avoid when… |
| --- | --- | --- |
| InMemory | tests, ephemeral caches, tiny hot sets | you need persistence |
| H2 (embedded) | single-process apps, dev, small/medium persistent data with no server | multiple processes share the data, or you outgrow one machine |
| MongoDB | large, write-heavy, document-shaped data; multiple instances | you need SQL/relational semantics (tx/change-streams need a replica set) |
| PostgreSQL | the relational server default: strong writes and indexed reads, full feature set (tx, optimistic locking, LISTEN/NOTIFY change feed) | you can't run a server |
| MySQL / MariaDB | ubiquity / existing infra, moderate volumes | bulk-write throughput is critical (slower here) or you need the push change-feed (not in v1) |
| LocalFile | human-readable, hand-editable per-entity files; small config-like collections | the collection is large or queried (full-scan reads) |
| GroupedFile | small datasets where one file per group is convenient | write-heavy or large data (every write rewrites the group) |

See Choosing a Backend for the feature-by-feature (non-performance) comparison.


Caveats — please read

  1. Single run, no JVM warmup. No JIT warm-up or repeated iterations; cold-start noise inflates the timings of the fast backends most, because their phases are short. Don't compare two backends that are within ~2× of each other here.
  2. Server backends ran on Docker over localhost. MariaDB / PostgreSQL / MongoDB pay a loopback round-trip per call that a co-located production DB might not, and a remote one would pay more.
  3. One machine, one config. Windows 11, JDK 25, default pool sizes, default Docker resource limits. Absolute numbers are environment-specific — the ranking and orders of magnitude are the takeaway, not the milliseconds.
  4. 10k records. File-backend query cost is O(N) (full scan); the SQL/Mongo advantage on reads widens at larger datasets and narrows at tiny ones.
  5. The suite's printed ops/s is clamped and misleading for fast backends. It computes records / max(1.0, seconds), so any sub-second phase reports ops/s == record count (e.g. a 178 ms insert prints "10,000 ops/s"). The throughput figures on this page are recomputed from the raw millisecond timings instead.
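
To make caveat 5 concrete, here is a minimal sketch of the clamped formula next to the recomputation used for the tables on this page. The class and method names are hypothetical; only the formula `records / max(1.0, seconds)` comes from the description above, and this is not the suite's actual source.

```java
// Illustration only: reproduces the formula described in caveat 5.
final class OpsPerSecond {

    /** The clamped formula: records / max(1.0, seconds). */
    static double clamped(long records, long elapsedMillis) {
        double seconds = elapsedMillis / 1000.0;
        return records / Math.max(1.0, seconds);
    }

    /** Unclamped throughput, recomputed from the raw millisecond timing. */
    static double recomputed(long records, long elapsedMillis) {
        return records * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // InMemory insert: 10,000 records in 178 ms (sub-second, so the clamp applies).
        System.out.printf("InMemory   clamped=%.0f recomputed=%.0f%n",
                clamped(10_000, 178), recomputed(10_000, 178));
        // PostgreSQL insert: 10,000 records in 5,612 ms (over 1 s, so both formulas agree).
        System.out.printf("PostgreSQL clamped=%.0f recomputed=%.0f%n",
                clamped(10_000, 5_612), recomputed(10_000, 5_612));
    }
}
```

For any phase shorter than one second the clamped value collapses to the record count, which is why only the slower backends print a meaningful ops/s in the suite's own report.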

Worth adding to the benchmark (future work)

While compiling this page, a few gaps in AbstractStorageStressTest stood out — candidates to make the benchmark more trustworthy and more representative:

  1. Fix the ops/s clamp — use System.nanoTime() and stop flooring the denominator at 1 s, so fast backends report real throughput instead of the record count.
  2. Warm-up + multiple iterations, reporting the median, to remove JIT/cold-cache noise.
  3. Per-record latency for single save() / find() (not just batch saveAll) — exposes the transaction-per-op and per-file-fsync cost that batching hides.
  4. Concurrent workload (parallel readers/writers) to exercise the async API, connection pooling and the virtual-thread executor — closer to real server load.
  5. Multiple dataset sizes (1k / 10k / 100k) to show the O(N) full-scan vs O(log N) index curve instead of asserting it.
  6. find / findMany throughput in the summary (currently only a phase-percentage line).
  7. Pagination, count(), and versions() (the cache-sync poll) throughput — all are real hot paths not covered today.
  8. Memory footprint sampling for InMemory and the manager cache.
  9. Machine-readable export (CSV/JSON) so runs can be diffed for regressions over time.
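
Items 1 and 2 could be combined along the following lines. This is a rough sketch under stated assumptions, not a patch for AbstractStorageStressTest: `MedianTimer`, its method names, and the warm-up and iteration counts are all hypothetical, and the workload in `main` is a stand-in for a real phase of the scenario.

```java
import java.util.Arrays;

// Hypothetical helper: warm up, time several iterations with nanoTime, report the median.
final class MedianTimer {

    /** Runs {@code warmups} untimed rounds, then returns the median of {@code iterations} timed rounds. */
    static long medianNanos(Runnable phase, int warmups, int iterations) {
        if (iterations < 1) {
            throw new IllegalArgumentException("iterations must be >= 1");
        }
        for (int i = 0; i < warmups; i++) {
            phase.run(); // untimed: gives the JIT and caches a chance to settle
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            phase.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    /** Median of the samples; averages the two middle values for even counts. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Throughput without the 1 s clamp; the max() only guards against division by zero. */
    static double opsPerSecond(long records, long elapsedNanos) {
        return records * 1e9 / Math.max(1L, elapsedNanos);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would invoke one phase of the stress scenario.
        Runnable phase = () -> {
            long sum = 0;
            for (int i = 0; i < 10_000; i++) sum += i;
            if (sum < 0) throw new IllegalStateException("unreachable");
        };
        long nanos = medianNanos(phase, 5, 11);
        System.out.printf("median: %d ns, ~%.0f ops/s%n", nanos, opsPerSecond(10_000, nanos));
    }
}
```

A hand-rolled timer like this still leaves room for JIT artifacts such as dead-code elimination; a dedicated harness such as JMH is the usual choice when the numbers need to be defensible.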
