refactor(pageserver): better k-merge implementation for tiered compaction #7760
There is a reproducer for the assertion in #7758, but it doesn't reproduce it all the time. Could you try running that test a couple of times (maybe 10 times) to see if the error is really gone? As for the next function, it doesn't return an |
Before a layer is loaded, the layer's key_lsn() is the (key, lsn) from the layer's starting key. But after loading, it's the (key, lsn) of the first key-value pair that's actually in the layer. That's not necessarily the same thing! In particular, with L0 layers, the start of the layer's key range is always 0, but the first key-value pair almost certainly has a higher key. Consider the following simplified example. I'm leaving out the LSNs, as they are not important for this. Imagine you're merging two layers: Layer A: key range 0-10. It contains a single key, 8. The code as written would incorrectly return keys 8, 5. This function is clearly trickier than it seems at first, so maybe we should move the code to a separate source file, and also add some unit tests with a simple mock layer implementation, with examples like the above.
Well, at least it's easier to write complex logic in an async fn than to manually maintain the state machine for an async stream 😛 Will add some test cases tomorrow.
yup the old code beautifully handles this case: if it finds an unloaded layer as the first one, it only means that there is the potential to yield the first element from it. It puts the just loaded layer back into the heap, and now the order depends on the first actual element. |
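The re-queue behaviour described above can be sketched with a small mock, along the lines of the unit tests suggested earlier. This is only an illustration under stated assumptions, not the pageserver code: MockLayer and k_merge are hypothetical names, keys are plain u32 values, LSNs are left out, and "loading" a layer just means reading its first real key. The one invariant it relies on is that a layer's range start is never greater than any key stored in it.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical mock layer (not a pageserver type): `start` is the start of
/// the layer's key range, `keys` are the sorted keys actually stored in it.
struct MockLayer {
    start: u32,
    keys: Vec<u32>,
}

/// Lazily k-merge the layers' keys in ascending order.
///
/// Heap entries are `(sort key, layer index, position)`. A position of `None`
/// means the layer is not loaded yet and is ordered by its range start only.
fn k_merge(layers: &[MockLayer]) -> Vec<u32> {
    let mut heap = BinaryHeap::new();
    for (idx, layer) in layers.iter().enumerate() {
        // Unloaded layers enter the heap under the start of their key range.
        heap.push(Reverse((layer.start, idx, None::<usize>)));
    }

    let mut out = Vec::new();
    while let Some(Reverse((key, idx, pos))) = heap.pop() {
        let layer = &layers[idx];
        match pos {
            None => {
                // "Load" the layer, but do NOT yield anything yet: put it back
                // into the heap under its first actual key, so that the order
                // is decided by real keys rather than by the range start.
                if let Some(&first) = layer.keys.first() {
                    heap.push(Reverse((first, idx, Some(0))));
                }
            }
            Some(p) => {
                // Loaded layer at the top of the heap: its current key is the
                // smallest remaining one, so it is safe to yield it.
                out.push(key);
                if let Some(&next) = layer.keys.get(p + 1) {
                    heap.push(Reverse((next, idx, Some(p + 1))));
                }
            }
        }
    }
    out
}
```

With a layer A covering a range starting at 0 that holds only key 8, and an assumed second layer whose range starts at 5 and holds only key 5 (the second layer is made up for this sketch), k_merge yields 5 and then 8. Yielding directly from a layer right after loading it, instead of re-queuing it, would produce 8 and then 5.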
…tion Signed-off-by: Alex Chi Z <chi@neon.tech>
I've been looking at the previous k-merge algorithm and couldn't find anything wrong. Putting this PR on hold to focus on solving other issues first.
Problem
The original k-merge implementation might be buggy and is hard to reason about. This pull request rewrites the k-merge implementation to make it easier to read and to potentially avoid the bug where keys are returned out of order.
ref #7703
Summary of changes
Adds LayerIterator and MergeIterator. They provide key_lsn, next, and is_end APIs.
Checklist before requesting a review
Checklist before merging