
Benchmarks

Petrus Pradella edited this page Jun 26, 2026 · 3 revisions

What this page covers: a like-for-like throughput comparison of every backend running the same 10,000-record stress scenario (bulk insert → full CRUD → the complete index/query matrix → bulk update → deletes), plus a per-backend recommendation and the caveats you must read before trusting any of these numbers.

⚠️ Read the caveats first. These are single-run, no-warmup numbers from one machine, with the server backends on Docker over localhost. Treat them as relative guidance ("Postgres out-inserted MariaDB ~2× here", "file backends full-scan queries"), not as absolute SLAs.

The scenario is the shared AbstractStorageStressTest — one cumulative test per backend, run with:

.\gradlew :core:test --tests "*StorageStressTest"   # all backends (needs Docker for the server ones)

Results (10,000 records)

Each backend ran the identical scenario. Both the millisecond times and the ops/second throughput come straight from the suite's report, which times each phase with System.nanoTime().
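
As a rough illustration of how a phase time turns into the ops/second figures reported here, the sketch below times a block with System.nanoTime() and divides the operation count by the elapsed time. It is not the suite's code: `PhaseTimer`, `timeNanos` and `opsPerSecond` are hypothetical names, and the in-memory map only stands in for a real backend.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the project: shows the arithmetic only.
final class PhaseTimer {

    /** Runs one phase and returns its elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return System.nanoTime() - start;
    }

    /** Operations per second for {@code ops} operations that took {@code elapsedNanos}. */
    static double opsPerSecond(long ops, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            // Too fast to measure; report that rather than clamping to a floor.
            return Double.POSITIVE_INFINITY;
        }
        return ops * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Stand-in workload: 10,000 inserts into a plain HashMap.
        Map<Integer, String> store = new HashMap<>();
        long nanos = timeNanos(() -> {
            for (int i = 0; i < 10_000; i++) {
                store.put(i, "record-" + i);
            }
        });
        System.out.printf("insert 10k: %.1f ms, %.0f ops/s%n",
                nanos / 1e6, opsPerSecond(10_000, nanos));

        // 10,000 inserts in 162 ms works out to about 61,728/s,
        // which a report would round to ~62,000/s.
        System.out.printf("%.0f%n", opsPerSecond(10_000, 162_000_000L));
    }
}
```

Working in nanoseconds and dividing as a double keeps sub-millisecond phases meaningful, which matters for the fastest backends.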

Write throughput

| Backend | Insert 10k | Bulk saveAll (10,334) | Update 1,001 | Delete 101 | Total run |
|---|---|---|---|---|---|
| InMemory | 162 ms · ~62,000/s | 80 ms · ~129,000/s | 9 ms | 2 ms | 0.64 s |
| H2 (embedded) | 297 ms · ~34,000/s | 243 ms · ~42,000/s | 20 ms | 12 ms | 0.97 s |
| MongoDB | 814 ms · ~12,000/s | 896 ms · ~11,500/s | 96 ms | 246 ms | 2.83 s |
| PostgreSQL | 5,558 ms · ~1,800/s | 5,554 ms · ~1,860/s | 558 ms | 179 ms | 12.26 s |
| MariaDB | 12,362 ms · ~810/s | 12,366 ms · ~840/s | 1,185 ms | 143 ms | 26.53 s |
| LocalFile | 5,165 ms · ~1,940/s | 12,165 ms · ~850/s | 2,162 ms | 979 ms | 34.17 s |
| GroupedFile | 5,497 ms · ~1,820/s | 22,746 ms · ~450/s | 4,806 ms | 5,674 ms | 68.39 s |

Read / query latency

The suite runs 27 indexed queries (score / boolean / world / compound-AND / timestamp ranges). Lower is better.

| Backend | Avg per query | Notes |
|---|---|---|
| H2 (embedded) | 5.1 ms | real B-tree index |
| InMemory | 5.8 ms | in-memory Map index |
| PostgreSQL | 5.9 ms | real B-tree index |
| MariaDB | 7.1 ms | real B-tree index |
| MongoDB | 16.5 ms | native index; localhost round-trip per query |
| LocalFile | 472 ms | ⚠️ full scan: deserializes every file on every query |
| GroupedFile | 1,013 ms | ⚠️ full scan: parses the whole group on every query |

What the numbers say

  • Embedded backends win on raw speed. InMemory and H2 have no network and (InMemory) no fsync, so they're an order of magnitude ahead. H2 is the fastest persistent option here.
  • MongoDB is the strongest server backend for writes. Its bulkWrite pushed ~11.5–12k records/s, far ahead of the SQL servers in this run, while keeping indexed queries fast.
  • PostgreSQL out-wrote MariaDB here by roughly 2.2× on bulk insert (5.6 s vs 12.4 s), just outside the ~2× noise band described in the caveats. Both have fast indexed reads; the gap is on write throughput.
  • Real indexes matter enormously for queries. SQL and Mongo answer indexed queries in roughly 5–17 ms on average because they maintain B-tree _idx_ columns/fields. The file backends have no real index, so every query is a full scan (~470 ms per query for LocalFile and ~1,010 ms for GroupedFile at 10k), and that cost grows linearly with the dataset.
  • GroupedFile is the slowest at scale. Every write and delete rewrites the whole group file, so saveAll (22.7 s) and the 101 deletes (5.7 s in total, ~56 ms each) dominate its run. It's built for small grouped datasets, not 10k-row churn.
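
To make the full-scan versus index contrast concrete, here is a minimal sketch of the two access patterns for a score-range query. It does not reflect how any backend is implemented: `ScanVsIndex` and its methods are hypothetical, and java.util.TreeMap (a red-black tree) merely stands in for a database B-tree, since both locate the start of a range in O(log N) instead of touching all N records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration: a record's id is its position in the scores array.
final class ScanVsIndex {

    /** Full scan: inspects every record, so the cost is O(N) per query. */
    static List<Integer> scanByScoreRange(int[] scores, int min, int max) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < scores.length; id++) {
            if (scores[id] >= min && scores[id] <= max) {
                ids.add(id);
            }
        }
        return ids;
    }

    /** Builds a sorted index (score -> ids) once, at write time. */
    static NavigableMap<Integer, List<Integer>> buildIndex(int[] scores) {
        NavigableMap<Integer, List<Integer>> index = new TreeMap<>();
        for (int id = 0; id < scores.length; id++) {
            index.computeIfAbsent(scores[id], k -> new ArrayList<>()).add(id);
        }
        return index;
    }

    /** Indexed lookup: O(log N) to find the range, then only the matching buckets. */
    static List<Integer> indexedByScoreRange(
            NavigableMap<Integer, List<Integer>> index, int min, int max) {
        List<Integer> ids = new ArrayList<>();
        for (List<Integer> bucket : index.subMap(min, true, max, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] scores = new int[10_000];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = i % 100; // each score 0..99 appears 100 times
        }
        NavigableMap<Integer, List<Integer>> index = buildIndex(scores);
        System.out.println(scanByScoreRange(scores, 10, 12).size());    // 300
        System.out.println(indexedByScoreRange(index, 10, 12).size());  // 300
    }
}
```

Both methods return the same ids; the difference is how much data each one has to read to get there. The file backends additionally pay to deserialize each record during the scan.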

Pick-a-backend cheat sheet

| Backend | Reach for it when… | Avoid when… |
|---|---|---|
| InMemory | tests, ephemeral caches, tiny hot sets | you need persistence |
| H2 (embedded) | single-process apps, dev, small/medium persistent data with no server | multiple processes share the data, or you outgrow one machine |
| MongoDB | large, write-heavy, document-shaped data; multiple instances | you need SQL/relational semantics (tx/change-streams need a replica set) |
| PostgreSQL | the relational server default: strong writes and indexed reads, full feature set (tx, optimistic locking, LISTEN/NOTIFY change feed) | you can't run a server |
| MySQL / MariaDB | ubiquity / existing infra, moderate volumes | bulk-write throughput is critical (slower here) or you need the push change-feed (not in v1) |
| LocalFile | human-readable, hand-editable per-entity files; small config-like collections | the collection is large or queried (full-scan reads) |
| GroupedFile | small datasets where one file per group is convenient | write-heavy or large data (every write rewrites the group) |

See Choosing a Backend for the feature-by-feature (non-performance) comparison.


Caveats — please read

  1. Single run, no JVM warmup. There was no JIT warm-up and no repeated iterations, so cold-start noise distorts the fast backends most: a fixed startup cost is a large share of a run that lasts under a second. Don't rank two backends that are within ~2× of each other here.
  2. Server backends ran on Docker over localhost. MariaDB / PostgreSQL / MongoDB pay a loopback round-trip per call that a co-located production DB might not, and a remote one would pay more.
  3. One machine, one config. Windows 11, JDK 25, default pool sizes, default Docker resource limits. Absolute numbers are environment-specific — the ranking and orders of magnitude are the takeaway, not the milliseconds.
  4. 10k records. File-backend query cost is O(N) (full scan); the SQL/Mongo advantage on reads widens at larger datasets and narrows at tiny ones.

Worth adding to the benchmark (future work)

The earlier ops/s clamp (a 1-second floor on phase time) has since been fixed: the suite now times every phase with System.nanoTime() and reports throughput directly, which is why the numbers above are taken straight from its report. Remaining candidates to make the benchmark more trustworthy and representative:

  1. Warm-up + multiple iterations, reporting the median, to remove JIT/cold-cache noise.
  2. Per-record latency for single save() / find() (not just batch saveAll) — exposes the transaction-per-op and per-file-fsync cost that batching hides.
  3. Concurrent workload (parallel readers/writers) to exercise the async API, connection pooling and the virtual-thread executor — closer to real server load.
  4. Multiple dataset sizes (1k / 10k / 100k) to show the O(N) full-scan vs O(log N) index curve instead of asserting it.
  5. find / findMany throughput in the summary (currently only a phase-percentage line).
  6. Pagination, count(), and versions() (the cache-sync poll) throughput — all are real hot paths not covered today.
  7. Memory footprint sampling for InMemory and the manager cache.
  8. Machine-readable export (CSV/JSON) so runs can be diffed for regressions over time.
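
Items 1 and 8 could look something like the sketch below: run a phase a few times untimed, time several more runs, keep the median, and emit one CSV row per phase. This is only an outline under assumptions. `MedianBench`, its methods and the CSV column order are hypothetical, and a dedicated harness such as JMH would handle JIT effects more rigorously than a hand-rolled loop.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of the project.
final class MedianBench {

    /** Runs {@code warmups} untimed passes, then returns the median of {@code iterations} timed passes. */
    static long medianNanos(Runnable phase, int warmups, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            phase.run(); // lets the JIT compile the hot path before measuring
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            phase.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    /** Median of the samples; for an even count, the mean of the two middle values (integer division). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** One CSV row: backend,phase,ops,medianNanos (an assumed column layout). */
    static String csvRow(String backend, String phase, long ops, long medianNanos) {
        return String.format(Locale.ROOT, "%s,%s,%d,%d", backend, phase, ops, medianNanos);
    }

    public static void main(String[] args) {
        long[] sink = new long[1]; // keeps the loop from being optimized away entirely
        long nanos = medianNanos(() -> {
            for (int i = 0; i < 10_000; i++) {
                sink[0] += i;
            }
        }, 5, 11);
        System.out.println("backend,phase,ops,median_nanos");
        System.out.println(csvRow("Example", "sum-loop", 10_000, nanos));
    }
}
```

The median is used instead of the mean so that a single GC pause or scheduling hiccup does not skew the reported figure. Each timed pass would also need to start from the same dataset state (for example, an empty store before an insert phase) for the samples to be comparable.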
