Benchmark suite for table unions #694

recursion-ninja · 2025-04-28T12:20:58Z

Benchmarks for table unions.

Overview

Here is a summary of the table union benchmark. Note that all function calls are made through the Database.LSMTree.Simple API.

Phase 1: Setup

The benchmark will set up an initial set of tables to be unioned together during "Phase 2." The number of tables created is user-specified via the --tableCount command line option, with a default value of 10 tables.

Every generated table has the same size, which is user-specified via the --initial-size command line option, with a default value of 1_000_000 entries. Each created table has $initialSize$ insert operations performed on it before being written out to disk as a snapshot. The $initialSize$ inserted entries in each table are randomly selected from the following range:

$$\left[\quad 0,\quad 2 * initialSize \quad\right)$$
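As a rough illustration of the key distribution, the sketch below draws `n` keys from the half-open range [0, 2 * n). It only depends on `base`, so it uses a small linear congruential generator. The generator, the seed, and the names `lcgNext` and `sampleKeys` are assumptions made for this example; they are not taken from the benchmark code.

```haskell
import Data.Bits (shiftR)
import Data.List (unfoldr)
import Data.Word (Word64)

-- | One step of a 64-bit linear congruential generator (Knuth's MMIX
-- constants). This is a stand-in for whatever RNG the benchmark uses.
lcgNext :: Word64 -> Word64
lcgNext s = s * 6364136223846793005 + 1442695040888963407

-- | Draw @n@ keys from the half-open range [0, 2 * n).
-- Hypothetical helper: keys are drawn independently in this sketch,
-- so the same key may be drawn more than once.
sampleKeys :: Word64 -> Int -> [Word64]
sampleKeys seed n = take n (unfoldr step seed)
  where
    bound = 2 * fromIntegral n
    step s =
      let s' = lcgNext s
      in  Just ((s' `shiftR` 33) `mod` bound, s')

main :: IO ()
main = do
  let keys = sampleKeys 42 10
  print keys
  -- Every key lies below 2 * 10, so this prints True.
  print (all (< 20) keys)
```

Because the range is twice the number of insertions, a lookup for a uniformly chosen key in the range can miss as well as hit a given table.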

Additionally, the directory in which to isolate the benchmark environment is specified via the --bench-dir command line option, with a default of _union_wp16. The table snapshots are saved here along with the benchmarked measurements from Phase 2.

Phase 2: Measurement

When generating measurements for the table unions, the benchmark will reload the snapshots of the tables generated in Phase 1 from disk. Subsequently, the tables will be "incrementally unioned" together.

Once the tables have been loaded and the union initiated, a series of "lookup batches" will be performed. A lookup batch involves performing a large number of key lookups on the incrementally unioned table and then calculating the number of lookup operations per second. The measurement series consists of 200 batch lookups.
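The per-batch throughput calculation can be sketched as follows. This is a minimal example, assuming the elapsed time is taken from the monotonic clock in `GHC.Clock`. The helper names `timed` and `lookupsPerSecond` are hypothetical, and the workload in `main` is a stand-in rather than real table lookups.

```haskell
import Control.Exception (evaluate)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- | Convert a batch size and an elapsed time in nanoseconds into
-- lookups per second. An elapsed time of 0 is treated as 1 ns to
-- avoid dividing by zero.
lookupsPerSecond :: Int -> Word64 -> Double
lookupsPerSecond batchSize elapsedNs =
  fromIntegral batchSize * 1e9 / fromIntegral (max 1 elapsedNs)

-- | Run an action and return its result together with the elapsed
-- wall-clock time in nanoseconds.
timed :: IO a -> IO (a, Word64)
timed action = do
  start  <- getMonotonicTimeNSec
  result <- action
  end    <- getMonotonicTimeNSec
  pure (result, end - start)

main :: IO ()
main = do
  let batchSize = 100000 :: Int
  -- Stand-in workload; the real benchmark would perform key lookups here.
  (_, ns) <- timed (evaluate (length (filter even [1 .. batchSize])))
  putStrLn ("lookups/s: " ++ show (lookupsPerSecond batchSize ns))
```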

First, 50 batch lookups are performed without supplying any credits to the unioned table. This establishes a baseline performance picture. Indices [-50, 0] measure lookups to the unioned table with 100% of the debt remaining.

Subsequently, 100 more batch lookups are performed. Before each of these 100 batch lookups, a fixed number of credits is supplied to the incremental union table. The number of credits supplied remains constant between batch lookups for the entire series of measurements. The series of measurements allows reasoning about table performance over time as the table's debt decreases (at a uniform rate). The number of credits supplied before each lookup batch is 1% of the total starting debt. After 100 steps, 100% of the debt will be paid off. Indices [1, 100] measure lookups to the unioned table as the remaining debt decreases.

Finally, 50 concluding batch lookups are performed. Since no debt remains, no credits are supplied. Instead, these measurements create a "post-payoff" performance picture. Indices [101, 150] measure lookups to the unioned table with 0% of the debt remaining.
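The credit schedule described above can be sketched as pure arithmetic over the step index. The names `Phase`, `phaseOf`, `creditsPerStep` and `remainingDebt` are hypothetical. Rounding the per-step credits up, so that 100 steps always cover the whole debt, is an assumption of this sketch and not something stated in the description.

```haskell
-- | The three phases of the measurement series.
data Phase = BeforePayoff | DuringPayoff | AfterPayoff
  deriving (Eq, Show)

-- | Classify a step index using the index ranges from the description.
phaseOf :: Int -> Phase
phaseOf step
  | step <= 0   = BeforePayoff
  | step <= 100 = DuringPayoff
  | otherwise   = AfterPayoff

-- | Credits supplied before each payoff batch: 1% of the starting
-- debt, rounded up (assumption) so the debt is fully paid by step 100.
creditsPerStep :: Int -> Int
creditsPerStep totalDebt = (totalDebt + 99) `div` 100

-- | Debt remaining when the batch at the given step index is measured.
remainingDebt :: Int -> Int -> Int
remainingDebt totalDebt step =
  max 0 (totalDebt - paidSteps * creditsPerStep totalDebt)
  where
    -- Credits are only supplied during steps 1 to 100.
    paidSteps = max 0 (min 100 step)

main :: IO ()
main = do
  let debt = 1000
  -- Print the phase and remaining debt at a few representative steps.
  mapM_
    (\s -> print (s, phaseOf s, remainingDebt debt s))
    [-50, 0, 1, 50, 100, 101, 150]
```

With a starting debt of 1000, each payoff step supplies 10 credits, so the remaining debt is 1000 up to index 0, 500 at index 50, and 0 from index 100 onwards.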

The general benchmark format is as follows:

do
  measurements <- LSM.withSession (rootDir gopts) $ \session ->
    withLatencyHandle $ \h -> do
      -- Reload the snapshots written during Phase 1
      tables <- forM [ 1 .. tableCount gopts ] $ \_ -> do
        LSM.openTableFromSnapshot <...>

      LSM.withIncrementalUnions tables $ \table -> do
        -- Before payoff picture: no credits supplied
        forM [-50 .. 0] $ \step -> do
          measureLookups $ table <...>

        -- During payoff picture: supply credits before each batch
        forM [1 .. 100] $ \step -> do
          LSM.supplyUnionCredits table credits
          measureLookups $ table <...>

        -- After payoff picture: no debt remains
        forM [101 .. 150] $ \step -> do
          measureLookups $ table <...>

  outputResults $ analyze measurements

An informative performance plot of the benchmark measurements is generated and placed in the benchmark's directory.

recursion-ninja · 2025-04-28T12:24:53Z

Surprisingly, the performance gets worse when the union table debt reaches 0 (see the red arrow)!

bench/macro/lsm-tree-bench-union.hs

recursion-ninja · 2025-05-04T20:20:52Z

The benchmarks have been reworked. No inserts, deletes, or updates occur during the measurements; only lookups. The output plot has been re-rendered for clarity. The axes are now labeled with units and depict the aggregated lookup time of each batch. Per the suggestion of @dcoutts, a pre- and post-payoff performance picture is generated alongside the performance as credits are supplied and the debt is repaid.

jorisdral

The output looks nice, but I have some suggestions for changes. I've also pushed a few commits to the branch that remove features that I don't think the benchmark needs.

lsm-tree.cabal

bench/macro/lsm-tree-bench-union.hs

scripts/test-cabal-docspec.sh

cabal.project.release

bench/macro/lsm-tree-bench-unions.hs

recursion-ninja · 2025-05-23T15:44:47Z

> The computation is there still

Strange, that's not what I'm seeing on my machine. I pushed changes again. Maybe that will remove it.

So that haddock will render the module header. Apparently if a benchmark suite has no `other-modules`, then `cabal haddock --haddock-benchmark` won't bother rendering haddocks. With the main unions benchmark code now in a different module from the `Main` module, we get proper haddocks.

The major changes include:

* Measure the performance of all lookups batches together, instead of each batch separately
* Use gnuplot instead of charts

There are some cascading changes but the spirit of the benchmark stays the same.

jorisdral

I've taken the liberty of applying changes; see the last commit message. I'm approving my own changes here (shame!), but it's not code in public libraries, so I'm not letting it go through another review round by someone else. Comments are still welcome, but then I'll apply them in a follow-up PR.

Thanks @recursion-ninja for the work.

recursion-ninja requested review from dcoutts, jorisdral, mheinzel and wenkokke as code owners April 28, 2025 12:20

jorisdral reviewed Apr 28, 2025

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from f5e1248 to 0d8238e Compare April 28, 2025 13:27

recursion-ninja marked this pull request as draft April 28, 2025 14:16

recursion-ninja commented Apr 28, 2025

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from dc2cc99 to 6617a65 Compare May 4, 2025 20:05

recursion-ninja marked this pull request as ready for review May 4, 2025 21:20

recursion-ninja changed the title ~~WIP: Benchmark suite for table unions~~ Benchmark suite for table unions May 4, 2025

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch 4 times, most recently from b704902 to 625c1c3 Compare May 12, 2025 13:18

recursion-ninja enabled auto-merge May 12, 2025 13:20

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from fb1a253 to f0ed570 Compare May 16, 2025 18:38

jorisdral force-pushed the recursion-ninja/benchmark-union-merge branch from 60c6365 to 71f5fc1 Compare May 19, 2025 10:39

jorisdral requested changes May 19, 2025

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch 2 times, most recently from 8c162e3 to 092fdd1 Compare May 22, 2025 22:55

recursion-ninja requested a review from jorisdral May 22, 2025 22:56

jorisdral reviewed May 23, 2025

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from 092fdd1 to 8e42e98 Compare May 23, 2025 15:43

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch 3 times, most recently from d0a90c2 to 2c4719c Compare May 26, 2025 17:15

recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch 2 times, most recently from d236827 to 798fb2e Compare June 2, 2025 13:07

Adding benchmark for table unions

f5f2a19

jorisdral force-pushed the recursion-ninja/benchmark-union-merge branch from 798fb2e to f5f2a19 Compare June 16, 2025 07:37

jorisdral force-pushed the recursion-ninja/benchmark-union-merge branch from 1e19d48 to 85af54b Compare June 17, 2025 09:26

Refactor the unions benchmark

fbda941

jorisdral force-pushed the recursion-ninja/benchmark-union-merge branch from 85af54b to fbda941 Compare June 17, 2025 09:53

jorisdral approved these changes Jun 17, 2025

recursion-ninja added this pull request to the merge queue Jun 17, 2025

Merged via the queue into main with commit 4d1876e Jun 17, 2025
30 checks passed

recursion-ninja deleted the recursion-ninja/benchmark-union-merge branch June 17, 2025 11:08
