
Conversation


@dcoutts commented on Jun 17, 2025

This now gets real parallel speedups on the WP8 benchmark in pipelined
mode.

On my laptop, we get:

  • non-pipelined mode: 86.5k
  • before: pipelined mode (2 cores): 92.2k
  • after: pipelined mode (2 cores): 120.0k

In part this is because pipelined mode on 1 core is a regression (70.1k): it has to do strictly more work, and it avoids doing any batching, which normally improves performance.

The crucial thing is minimising the batching of merge work, so that we get better parallel work balance. To do this we expose a new MergeBatchSize in the TableConfig and allow overriding it in the TableConfigOverride.
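A minimal sketch of what exposing and overriding such a setting could look like. The record fields, constructors, and the default here are illustrative assumptions, not the real lsm-tree API:

```haskell
{-# LANGUAGE NumericUnderscores #-}

-- Hypothetical names for illustration only; the real lsm-tree types differ.
-- A merge batch size of 1 does merge work in the smallest possible steps,
-- which spreads the work evenly and improves parallel balance.
newtype MergeBatchSize = MergeBatchSize Int
  deriving (Show, Eq)

data TableConfig = TableConfig
  { writeBufferSize :: Int
  , mergeBatchSize  :: MergeBatchSize
  } deriving Show

-- Overrides are partial: Nothing means "keep the table's existing setting".
newtype TableConfigOverride = TableConfigOverride
  { overrideMergeBatchSize :: Maybe MergeBatchSize
  }

applyOverride :: TableConfigOverride -> TableConfig -> TableConfig
applyOverride o cfg =
  case overrideMergeBatchSize o of
    Nothing -> cfg
    Just n  -> cfg { mergeBatchSize = n }

main :: IO ()
main = do
  -- Previously the batch size was effectively tied to the write buffer
  -- size; here we override it down to 1 for the pipelined benchmark.
  let base  = TableConfig { writeBufferSize = 20_000
                          , mergeBatchSize  = MergeBatchSize 20_000 }
      tuned = applyOverride (TableConfigOverride (Just (MergeBatchSize 1))) base
  print (mergeBatchSize tuned)  -- MergeBatchSize 1
```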


@jorisdral left a comment


Looks good, no comments on the general idea. Just some smaller comments to resolve.

Comment on lines +429 to +439
-- TODO: the thresholds for doing merge work should be different for each level,
-- and ideally all-pairs co-prime.

Should the thresholds also have some relationship with the size of update batches? Even if the thresholds are co-prime, a large enough update batch could hit all thresholds at the same time.

@dcoutts (author) replied:

That's true, of course. But the update batch size is only known dynamically, and it can change.

@dcoutts (author) replied:

Yes, ideally doing a big batch of updates would not re-synchronise the counters relative to their thresholds.
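A toy illustration (not library code) of the point being made in this thread: with all-pairs co-prime per-level thresholds, single-step updates rarely trigger several levels at once, but a large update batch crosses multiple thresholds in one step and re-synchronises the levels anyway.

```haskell
-- A level does merge work each time its update counter crosses a multiple
-- of its threshold.
triggersAt :: Int -> Int -> Bool
triggersAt threshold n = n `mod` threshold == 0

-- Applying updates one at a time with co-prime thresholds 3 and 5, both
-- levels trigger together only at multiples of 15 (their lcm).
bothTrigger :: [Int]
bothTrigger = [ n | n <- [1 .. 30], triggersAt 3 n, triggersAt 5 n ]

-- But a single batch of 15 updates advances the counter straight from 0
-- to 15, crossing multiples of *both* thresholds in one step, so both
-- levels do their merge work at the same moment after all.
crossedInBatch :: Int -> Int -> Int -> Int
crossedInBatch threshold from to = to `div` threshold - from `div` threshold

main :: IO ()
main = do
  print bothTrigger                                     -- [15,30]
  print (crossedInBatch 3 0 15, crossedInBatch 5 0 15)  -- (5,3)
```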

@dcoutts force-pushed the dcoutts/wp8-bench-pipelined-3 branch 2 times, most recently from 25a5a16 to a5ee0ba on June 19, 2025 10:38
@dcoutts requested a review from jorisdral on June 19, 2025 10:38
@dcoutts force-pushed the dcoutts/wp8-bench-pipelined-3 branch from a5ee0ba to f2f42c3 on June 19, 2025 12:53
As a result, we keep around golden files for old snapshot versions as newer ones
are created. The upside of this is that we could test the backwards compatibility
of the snapshot codec: run versioned decoders on the golden files for older
snapshot versions and check that there are no errors. I've not implemented such
a test yet, but this would be one step in that direction.
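A sketch of what such a backwards-compatibility test could look like. All names here are hypothetical, and the in-memory "golden" payloads are stand-ins for the real per-version golden files and the real snapshot codec:

```haskell
data SnapshotVersion = V0 | V1
  deriving (Show, Eq, Enum, Bounded)

-- In the real test these bytes would be read from the per-version golden
-- files kept in the repository; here they are placeholder payloads.
goldenPayload :: SnapshotVersion -> String
goldenPayload V0 = "snapshot-v0-bytes"
goldenPayload V1 = "snapshot-v1-bytes"

-- Stand-in for the real versioned decoder: Left signals a decode error.
decodeSnapshot :: SnapshotVersion -> String -> Either String ()
decodeSnapshot _ _ = Right ()

-- Run the versioned decoder on every old version's golden payload and
-- check that there are no errors.
checkAll :: Either String ()
checkAll =
  mapM_ (\v -> decodeSnapshot v (goldenPayload v)) [minBound .. maxBound]

main :: IO ()
main = case checkAll of
  Left err -> fail err
  Right () -> putStrLn "all versions decode: ok"
```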

@jorisdral left a comment


LGTM!

I have a slightly altered version of this branch on jdral/wp8-bench-pipelined-3 that you could take a look at. The change I make there is to keep the golden files for both V0 and V1 around, so that at some point we could test backwards compatibility of versioned decoders; see the second commit. Since our implementation only encodes in the current snapshot version, we'd need those golden files to check backwards compatibility. If you agree with this change, then we could port those commits to this branch.

This would provide a minimal diff between your branch and mine:

git diff origin/dcoutts/wp8-bench-pipelined-3 origin/jdral/wp8-bench-pipelined-3

dcoutts added 4 commits June 26, 2025 15:10
But don't yet actually change the serialisation format. This is partly
just to demonstrate to ourselves how to do it, so there's a pattern to
follow in future.

Doing this highlights that we cannot generally match on the version, and
should only do so in places where the format is actually different
between versions. Otherwise we would have to duplicate too much code.

Previously it was hard-coded to be the same as the write buffer size.

Document what it means as a new tunable parameter.

Setting this low (1) is important for getting good parallel work balance
on the pipelined WP8 benchmark. It is a crucial change that makes the
pipelined version actually improve performance. Previously it would only
get about a 5 to 10% improvement.

And add MergeBatchSize to TableConfigOverride.
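The first commit above describes a versioned-decoder pattern: match on the snapshot version only at the spots where the formats actually differ, and share everything else. A self-contained sketch of that pattern, using toy types rather than the real codec:

```haskell
data SnapshotVersion = V0 | V1
  deriving (Show, Eq)

data Snapshot = Snapshot
  { snapName  :: String
  , snapExtra :: Maybe Int  -- a field that only exists from V1 onwards
  } deriving (Show, Eq)

-- A toy token-stream decoder standing in for the real binary decoder.
type Decoder a = [String] -> Either String (a, [String])

decodeString :: Decoder String
decodeString (t:ts) = Right (t, ts)
decodeString []     = Left "unexpected end of input"

-- The decoder is shared across versions; only the one field whose format
-- differs consults the version, so no code paths are duplicated.
decodeSnapshot :: SnapshotVersion -> Decoder Snapshot
decodeSnapshot ver ts0 = do
  (name, ts1)  <- decodeString ts0
  (extra, ts2) <- case ver of
    V0 -> Right (Nothing, ts1)              -- field absent in the old format
    V1 -> do (e, ts) <- decodeString ts1    -- field present from V1
             Right (Just (length e), ts)
  Right (Snapshot name extra, ts2)

main :: IO ()
main = do
  print (decodeSnapshot V0 ["wb"])
  print (decodeSnapshot V1 ["wb", "extra"])
```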
@jorisdral force-pushed the dcoutts/wp8-bench-pipelined-3 branch from f2f42c3 to 09f06ae on June 26, 2025 13:26
@jorisdral

RE my last comment, I've now incorporated the changes I had on the jdral/wp8-bench-pipelined-3 branch into this PR with approval from @dcoutts. Now I'll go ahead with merging this PR.

@jorisdral enabled auto-merge on June 26, 2025 13:55
@jorisdral added this pull request to the merge queue on Jun 26, 2025
Merged via the queue into main with commit d87f9fa on Jun 26, 2025
30 checks passed
@jorisdral deleted the dcoutts/wp8-bench-pipelined-3 branch on June 26, 2025 15:08