lgalloc disk usage limiter #32602
bkirwi
left a comment
(Sorry, started commenting on this before I realized it was still in draft! Submitting since it's already typed out, but feel free to ignore.)
82f4a0b to 1bde917
    warn!(
        disk_usage,
        disk_limit, "lgalloc disk utilization exceeded configured limits",
    );
rustfmt showing some bad taste here...
Tested this in my staging env and confirmed:
8ad653d to 724049f
Rebased and retriggered nightly: https://buildkite.com/materialize/nightly/builds/12336 (since Hetzner is still struggling with aarch64 availability)
Also use information from less important jobs
This commit adds a limiter task that periodically confirms that lgalloc's current disk usage is below some configured limit and terminates the process otherwise. The disk limit is calculated by applying a configurable factor to the process' memory limit. There is also a burst option, allowing the process to use more than the configured disk limit for a time. The default limits are configured to match the current production reality: The disk limit is twice the memory limit and bursting is disabled.
Signed-off-by: Moritz Hoffmann <mh@materialize.com>
These flags can potentially make tests more unstable, by reducing their available disk size, so randomly changing them seems like a recipe for flaky tests.
This commit adds metrics reporting the lgalloc limiter's current disk limit, disk usage, and burst budget. Having these will be useful in production, where debug logging is usually switched off.
antiguru
left a comment
Looks good, thank you!
Hm... Some feature benchmark regressions. I think we need to move the lgalloc limit checker into a separate thread so the I/O it's doing doesn't interfere with other tokio tasks.
Did you have a chance to look at the metric to see how long its invocations took? We should have enough tokio tasks to not interfere, but who knows!
Just checked in my staging env and it seems like the max invocation time is always 512us... which is also the smallest bucket in the histogram :D The benchmark regression has also gone away after a retry, so seems fine to merge.
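For readers wondering why the reported maximum is pinned at 512us: a bucketed histogram only records which bucket an observation fell into, so every invocation at or below the smallest bound reads back as that bound. Here is a minimal sketch of the effect. Only the 512us smallest bucket comes from the comment above; the doubling bucket layout and all names are assumptions for illustration, not the actual metric definition.

```rust
/// Upper bounds of the buckets, in microseconds.
/// Assumption: only the 512us bound is taken from the discussion; the rest are made up.
const BUCKET_BOUNDS_US: [u64; 6] = [512, 1024, 2048, 4096, 8192, 16384];

/// A toy histogram: one counter per bucket plus a final overflow slot.
struct Histogram {
    counts: [u64; 7],
}

impl Histogram {
    fn new() -> Self {
        Histogram { counts: [0; 7] }
    }

    /// Record an observation in the first bucket whose bound covers it.
    fn observe(&mut self, micros: u64) {
        let idx = BUCKET_BOUNDS_US
            .iter()
            .position(|&bound| micros <= bound)
            .unwrap_or(BUCKET_BOUNDS_US.len());
        self.counts[idx] += 1;
    }

    /// The tightest "max" the histogram can report: the bound of the highest
    /// non-empty bucket. Returns None when empty or when the overflow slot is hit.
    fn max_upper_bound_us(&self) -> Option<u64> {
        let idx = self.counts.iter().rposition(|&count| count > 0)?;
        BUCKET_BOUNDS_US.get(idx).copied()
    }
}
```

Observing 3us, 40us, and 512us all lands in the first bucket, so the reported maximum is 512us in each case. The real invocation time could be far below that.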
This PR adds a limiter task that periodically confirms that lgalloc's current disk usage is below some configured limit and terminates the process otherwise.
The disk limit is calculated by applying a configurable factor to the process' memory limit. There is also a burst option, allowing the process to use more than the configured disk limit for a time.
The default limits are configured to match the current production reality: The disk limit is twice the memory limit and bursting is disabled.
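As a rough illustration of the check described above, here is a minimal sketch of the limit calculation and one way a burst budget could work. This is not the code in this PR: the type and field names are hypothetical, and modelling the burst budget in byte-seconds (drained while over the limit, refilled while under it) is an assumption. The sketch returns a verdict instead of terminating the process so that it can be exercised in isolation.

```rust
use std::time::Duration;

/// Hypothetical configuration; the names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct LimiterConfig {
    /// Memory limit of the process, in bytes.
    memory_limit_bytes: u64,
    /// Disk limit = memory limit * usage_factor (2.0 mirrors the stated default).
    usage_factor: f64,
    /// Assumed burst model: budget = disk limit * burst_factor byte-seconds; 0.0 disables bursting.
    burst_factor: f64,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Usage is at or below the disk limit.
    WithinLimit,
    /// Usage is above the limit, but burst budget remains.
    Bursting,
    /// Over the limit with no budget left; a real limiter would terminate here.
    Exceeded,
}

struct Limiter {
    disk_limit: u64,
    burst_budget_max: f64,
    burst_budget: f64,
}

impl Limiter {
    fn new(config: LimiterConfig) -> Self {
        let disk_limit = (config.memory_limit_bytes as f64 * config.usage_factor) as u64;
        let burst_budget_max = disk_limit as f64 * config.burst_factor;
        Limiter {
            disk_limit,
            burst_budget_max,
            burst_budget: burst_budget_max, // start with a full budget
        }
    }

    /// One periodic check; `elapsed` is the time since the previous check.
    fn check(&mut self, disk_usage: u64, elapsed: Duration) -> Verdict {
        let secs = elapsed.as_secs_f64();
        if disk_usage <= self.disk_limit {
            // Under the limit: refill the budget with the unused headroom.
            let headroom = (self.disk_limit - disk_usage) as f64;
            self.burst_budget = (self.burst_budget + headroom * secs).min(self.burst_budget_max);
            Verdict::WithinLimit
        } else {
            // Over the limit: pay for the excess out of the budget.
            let excess = (disk_usage - self.disk_limit) as f64;
            self.burst_budget -= excess * secs;
            if self.burst_budget <= 0.0 {
                self.burst_budget = 0.0;
                Verdict::Exceeded
            } else {
                Verdict::Bursting
            }
        }
    }
}
```

With a 1 GiB memory limit, `usage_factor: 2.0`, and `burst_factor: 0.0`, the disk limit is 2 GiB and any check that sees usage above it returns `Verdict::Exceeded` right away, which corresponds to the "bursting is disabled" default.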
Motivation
Closes MaterializeInc/database-issues/issues/9306
Tips for reviewer
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.