
admission: adjust token computation during WAL failover #120135

Merged: 1 commit merged into cockroachdb:master on Mar 14, 2024

Conversation

sumeerbhola (Collaborator):

During WAL failover, possibly caused by a disk stall in the primary location, flushes and compactions can also be stalled. This can cause admission control to compute artificially low compaction tokens out of L0 (if L0 has an elevated score) and artificially low flush tokens (which are meant to prevent memtable write stalls).

The solution outlined here detects WAL failover by looking at increases in the pebble metric WAL.Failover.SecondaryWriteDuration. If an increase happened in the last 15s interval (the token computation interval), the current flush and compaction bytes are ignored for the purpose of smoothing and therefore ignored for computing tokens. For regular work, the previous smoothed compaction tokens continue to be used, and flush tokens are unlimited. For elastic work, the tokens are reduced to near zero. An alternative is to allow unlimited tokens during the stall, but it runs the risk of over-admitting. We allow this alternative to be configured by changing the cluster setting
admission.wal.failover.unlimited_tokens.enabled to true.

Informs cockroachdb/pebble#3230

Informs CRDB-35401

Epic: none

Release note (ops change): The cluster setting
admission.wal.failover.unlimited_tokens.enabled can be set to true to cause unlimited admission tokens during WAL failover. This should not be changed without consulting the admission control team, since the default, which preserves the token counts from the preceding non-WAL-failover interval, is expected to be safer.

@sumeerbhola sumeerbhola requested a review from a team as a code owner March 8, 2024 15:52
@cockroach-teamcity (Member):

This change is Reviewable

@sumeerbhola (Collaborator, Author) left a comment:

I'll add tests if this looks ok.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @jbowens)

@aadityasondhi (Collaborator) left a comment:

Looks good to me so far

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)

@aadityasondhi (Collaborator) left a comment:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @sumeerbhola)


pkg/util/admission/io_load_listener.go line 785 at r1 (raw file):

	// once the primary is considered healthy, say within 10s. So a disk stall
	// in the primary that lasts 30s, will cause WAL failover for ~40s, and a
	// disk stall for 1s will cause failover for ~11s. The latter (11s) is short

In the former case, are we worried that an extended failover will cause a backlog of flushes that will then hit L0 all at the same time once we failback?

@sumeerbhola (Collaborator, Author) left a comment:

TFTR! I will ping when tests are ready.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @jbowens)


pkg/util/admission/io_load_listener.go line 785 at r1 (raw file):

Previously, aadityasondhi (Aaditya Sondhi) wrote…

In the former case, are we worried that an extended failover will cause a backlog of flushes that will then hit L0 all at the same time once we failback?

If we have a backlog of immutable memtables they are all flushed in one flush, so will add a single sub-level.

@sumeerbhola (Collaborator, Author) left a comment:

test is ready

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi and @jbowens)

@aadityasondhi (Collaborator) left a comment:

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jbowens)

@sumeerbhola (Collaborator, Author):

bors r=aadityasondhi


craig bot commented Mar 14, 2024

Build failed (retrying...):


craig bot commented Mar 14, 2024

@craig craig bot merged commit db58abe into cockroachdb:master Mar 14, 2024
19 checks passed