[persist] Lower some thresholds for persist in CI #32598
Conversation
For the record, the full CI history: https://buildkite.com/materialize/nightly/builds?branch=bkirwi%3Aci-tune The first couple of runs included the CI tuning but not the bugfix, rebased on both a recent and an older release (to check that it wasn't finding a recently introduced bug). The next two runs include the bugfix, and the failures have disappeared.
| "16", | ||
| "1000", | ||
| ] | ||
| self.flags_with_values["persist_compaction_memory_bound_bytes"] = [ |
Above we limit the size of blobs to 1 MiB, 16 MiB, and 128 MiB. Should those limits be reflected here too? Assuming that we're tuning these to fit a very limited number of Parts/blobs in memory at a given time during compaction.
What do you mean by "reflected"? Do you think we should add some smaller sizes here as well?
Yeah, sorry, I wasn't totally clear here. It seems like the values here for persist_compaction_memory_bound_bytes try to align closely with 1, 4, and 8 blobs, given the default value for persist_blob_target_size. I'm just wondering if it would be helpful to set smaller values here, so that when the target blob size is, say, 16 MiB, we still fit only 1, 4, and 8 blobs in the compaction memory bound.
Thinking this through a bit more: adding 64 MiB (67108864) here might be interesting, because even when the target blob size is small it's still an aggressive memory bound on compaction, and it can exercise the case where a single blob is larger than our entire bound. wdyt?
Done, thanks!
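For anyone skimming the thread, here is a minimal sketch of the tuning being discussed. The exact values and the standalone `flags_with_values` table are assumptions for illustration, not the PR's actual diff:

```python
MiB = 1024 * 1024

# Hypothetical stand-in for the workload's flag table; in the real framework
# this lives on a class as self.flags_with_values.
flags_with_values: dict[str, list[str]] = {}

# Candidate blob target sizes (values assumed for illustration).
flags_with_values["persist_blob_target_size"] = [
    str(1 * MiB),
    str(16 * MiB),
    str(128 * MiB),
]

# Candidate compaction memory bounds. 64 MiB stays aggressive even when the
# target blob size is small, and it is smaller than a single 128 MiB blob, so
# it exercises the case where one blob alone exceeds the whole bound.
flags_with_values["persist_compaction_memory_bound_bytes"] = [
    str(64 * MiB),
    str(128 * MiB),
    str(512 * MiB),
    str(1024 * MiB),
]
```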
We'd like to do randomized testing, but that could make this assertion fire very easily. We already have code that reports when the limit isn't high enough while still making progress, so let's lean on that instead.
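A rough sketch of that pattern, with hypothetical names (the real persist code is Rust and shaped differently): rather than asserting that the configured bound is sufficient, report the shortfall and keep going.

```python
def plan_compaction(memory_needed: int, memory_bound: int, report) -> None:
    """Sketch only: hard-asserting memory_needed <= memory_bound would fire
    constantly under randomized tuning, so record the shortfall instead and
    let compaction proceed with whatever progress it can make."""
    if memory_needed > memory_bound:
        report(
            f"compaction memory bound too low: need {memory_needed}, "
            f"have {memory_bound}; proceeding anyway"
        )
    # ...build a plan that stays within memory_bound, e.g. fewer parts per step.

plan_compaction(memory_needed=256 * 1024 * 1024,
                memory_bound=64 * 1024 * 1024,
                report=print)
```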
It seems that, when compaction is tuned to generate multiple runs more frequently, it's possible to see the retraction of the data before its insertion in this loop. Consolidating means that we'll get a reasonable snapshot of the data even when timestamps have been advanced.
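A minimal sketch (not the persist implementation) of what consolidating buys us here: once an insertion and its retraction have been advanced to the same timestamp, summing diffs per (data, time) makes the pair cancel, so the snapshot still reflects the net data.

```python
from collections import defaultdict

def consolidate(updates):
    """Sum diffs per (data, time) and drop entries whose net diff is zero."""
    totals = defaultdict(int)
    for data, time, diff in updates:
        totals[(data, time)] += diff
    return sorted(
        (data, time, diff)
        for (data, time), diff in totals.items()
        if diff != 0
    )

# With timestamps advanced, a retraction may be read "before" the matching
# insertion; after consolidation the pair cancels and only net rows remain.
updates = [("row-a", 5, -1), ("row-a", 5, +1), ("row-b", 5, +1)]
assert consolidate(updates) == [("row-b", 5, 1)]
```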
def- left a comment:
No complaints from QA!
`txn-wal` bug, which could be triggered when data had undergone compaction but wasn't globally consolidated. (Which can happen when memory limits are too small to compact all parts in one go.)

Motivation
Turns out we're missing some test coverage in this area!
Tips for reviewer
I'd love a review from a `txn-wal` expert, since I'm not that familiar with the code there.

Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.