admission: adjust token computation during WAL failover #120135
Conversation
I'll add tests if this looks ok.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @jbowens)
Looks good to me so far
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @sumeerbhola)
pkg/util/admission/io_load_listener.go
line 785 at r1 (raw file):
// once the primary is considered healthy, say within 10s. So a disk stall
// in the primary that lasts 30s will cause WAL failover for ~40s, and a
// disk stall for 1s will cause failover for ~11s. The latter (11s) is short
In the former case, are we worried that an extended failover will cause a backlog of flushes that will then hit L0 all at the same time once we fail back?
TFTR! I will ping when tests are ready.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @jbowens)
pkg/util/admission/io_load_listener.go
line 785 at r1 (raw file):
Previously, aadityasondhi (Aaditya Sondhi) wrote…
In the former case, are we worried that an extended failover will cause a backlog of flushes that will then hit L0 all at the same time once we failback?
If we have a backlog of immutable memtables, they are all flushed in one flush, so they will add a single sub-level.
During WAL failover, possibly caused by a disk stall in the primary location, flushes and compactions can also be stalled. This can cause admission control to compute artificially low token counts for compaction bandwidth out of L0 (if L0 has an elevated score), and for flush tokens (which are meant to prevent memtable write stalls).

The solution outlined here detects WAL failover by looking at increases in the Pebble metric WAL.Failover.SecondaryWriteDuration. If an increase happened in the last 15s interval (the token computation interval), the current flush and compaction bytes are ignored for the purpose of smoothing, and therefore ignored for computing tokens. For regular work, the previous smoothed compaction tokens continue to be used, and flush tokens are unlimited. For elastic work, the tokens are reduced to near zero. An alternative is to allow unlimited tokens during the stall, but it runs the risk of over-admitting. This alternative can be enabled by changing the cluster setting admission.wal.failover.unlimited_tokens.enabled to true.

Informs cockroachdb/pebble#3230
Informs CRDB-35401
Epic: none

Release note (ops change): The cluster setting admission.wal.failover.unlimited_tokens.enabled can be set to true to grant unlimited admission tokens during WAL failover. This should not be changed without consulting the admission control team, since the default, which preserves the token counts from the preceding non-WAL-failover interval, is expected to be safer.
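The detection-and-adjustment idea above can be sketched as follows. This is a minimal illustration, not the actual CockroachDB implementation: the type and function names are invented, the smoothing is a simple average stand-in for the real token estimator, and the cumulative secondary-write-duration value stands in for Pebble's WAL.Failover.SecondaryWriteDuration metric.

```go
package main

import "fmt"

// walFailoverState tracks the last-seen cumulative time spent writing to
// the secondary WAL location (a stand-in for Pebble's
// WAL.Failover.SecondaryWriteDuration) and the compaction tokens smoothed
// over healthy intervals. All names here are illustrative.
type walFailoverState struct {
	prevSecondaryWriteDuration float64 // seconds, cumulative
	smoothedCompactionTokens   int64   // carried over from healthy intervals
}

// adjustTokens runs once per 15s token computation interval. If the
// cumulative secondary-write duration increased, the store was in WAL
// failover during the interval, so the interval's compaction bytes are
// ignored: regular work keeps the previous smoothed compaction tokens and
// gets unlimited flush tokens, while elastic work is throttled to near
// zero. unlimitedTokens mirrors the effect of setting
// admission.wal.failover.unlimited_tokens.enabled to true.
func (s *walFailoverState) adjustTokens(
	curSecondaryWriteDuration float64,
	intervalCompactionTokens int64,
	unlimitedTokens bool,
) (regularCompactionTokens, elasticTokens int64, unlimitedFlush bool) {
	inFailover := curSecondaryWriteDuration > s.prevSecondaryWriteDuration
	s.prevSecondaryWriteDuration = curSecondaryWriteDuration
	if !inFailover {
		// Healthy interval: fold the observed bytes into the smoothed
		// estimate (simple averaging, for illustration only).
		s.smoothedCompactionTokens =
			(s.smoothedCompactionTokens + intervalCompactionTokens) / 2
		return s.smoothedCompactionTokens, s.smoothedCompactionTokens, false
	}
	const unlimited = int64(1) << 40
	if unlimitedTokens {
		// The alternative behavior gated behind the cluster setting.
		return unlimited, unlimited, true
	}
	// Default: preserve prior smoothed tokens for regular work, give
	// unlimited flush tokens, and near-zero tokens for elastic work.
	return s.smoothedCompactionTokens, 1, true
}

func main() {
	s := &walFailoverState{}
	// A healthy interval: 1000 compaction bytes observed get smoothed in.
	reg, _, _ := s.adjustTokens(0, 1000, false)
	fmt.Println("healthy: regular tokens =", reg)
	// SecondaryWriteDuration grew, so this interval's artificially low
	// byte count (50) is ignored rather than smoothed in.
	reg, elastic, unlimitedFlush := s.adjustTokens(2.5, 50, false)
	fmt.Println("failover: regular =", reg,
		"elastic =", elastic, "unlimitedFlush =", unlimitedFlush)
}
```

Note how the failover interval leaves smoothedCompactionTokens untouched, so once failover ends, token computation resumes from the last healthy estimate instead of from stall-depressed numbers.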
test is ready
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @jbowens)
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens)
bors r=aadityasondhi
Build failed (retrying...)
Build succeeded