Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

requestbatcher,intentresolver: default to no inflight backpressure #109899

Merged
merged 1 commit into from Sep 6, 2023

Conversation

sumeerbhola
Copy link
Collaborator

For point and ranged intent resolution, the concurrency is already controlled by the number of goroutines that are waiting to resolve intents. The one exception is when there are a huge number of point intents being resolved by a single waiter (which should be very rare since if those intents are from a single txn, they should be ranged intents), but in that case we have already paid the memory cost of buffering these intents, so we may as well send the intent resolution RPCs.

In case we have overlooked something, inflight backpressure can be restored via a cluster setting.

This change is a prerequisite simplification for doing admission control on intent resolution. We don't want an inflight RPC limit to be reached due to admission control being active on a node/store, starving out intent resolution requests for other stores. There is also the async task limit of IntentResolver.sem, but many cases that hit that limit fallback to synchronous waiting (allowSyncProcessing=true), so that limit is not considered harmful.

Informs #97108

Epic: CRDB-25458

Release note (ops change): A cluster setting is introduced (kv.intent_resolver.batcher.in_flight_backpressure_limit.enabled), that defaults to false, and decides whether an in-flight RPC limit is enforced on intent resolution RPCs.

@sumeerbhola sumeerbhola requested review from nvanbenschoten and a team September 1, 2023 16:08
@sumeerbhola sumeerbhola requested a review from a team as a code owner September 1, 2023 16:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context on this change, see the comment thread in https://docs.google.com/document/d/1N2v\_LdkJFzEVxft6qDQFvaGs1q\_HiCnW9BSLji1rd98/edit?disco=AAAA4Tv79DQ

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: except for the question about whether IntentResolverTestingKnobs.InFlightBackpressureLimit should still exist.

Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/intentresolver/intent_resolver.go line 137 at r1 (raw file):

	MaxIntentResolutionBatchIdle time.Duration

	Settings *cluster.Settings

nit: consider grouping this with the other dependency fields. Maybe right below Stopper.


pkg/kv/kvserver/intentresolver/intent_resolver.go line 260 at r1 (raw file):

	inFlightLimit := inFlightLimitProvider{
		settings:                         c.Settings,
		testingInFlightBackpressureLimit: c.TestingKnobs.InFlightBackpressureLimit,

Does it make sense to keep this testing knob if the mechanism is effectively removed? Are tests that use it fooling themself into thinking this backpressure still exists? Are tests that use it needed anymore?


pkg/kv/kvserver/intentresolver/intent_resolver.go line 1118 at r1 (raw file):

// (defaultTaskLimit), and the workload goroutines. Each waiter produces work
// for a single range, and that work can typically be batched into a single
// RPC (since requestbatchere.Config.MaxMsgsPerBatch is quite generous). In

s/requestbatchere/requestbatcher/


pkg/internal/client/requestbatcher/batcher_test.go line 221 at r1 (raw file):

	canReply <- struct{}{}
	canReply <- struct{}{}
	// Now consume all the responses on sendChan

nit: punctuation missing in a few of these comments.

For point and ranged intent resolution, the concurrency is already
controlled by the number of goroutines that are waiting to resolve
intents. The one exception is when there are a huge number of point
intents being resolved by a single waiter (which should be very rare
since if those intents are from a single txn, they should be ranged
intents), but in that case we have already paid the memory cost of
buffering these intents, so we may as well send the intent resolution
RPCs.

In case we have overlooked something, inflight backpressure can be
restored via a cluster setting.

This change is a prerequisite simplification for doing admission control
on intent resolution. We don't want an inflight RPC limit to be reached
due to admission control being active on a node/store, starving out
intent resolution requests for other stores. There is also the async
task limit of IntentResolver.sem, but many cases that hit that limit
fallback to synchronous waiting (allowSyncProcessing=true), so that
limit is not considered harmful.

Informs cockroachdb#97108

Epic: CRDB-25458

Release note (ops change): A cluster setting is introduced
(kv.intent_resolver.batcher.in_flight_backpressure_limit.enabled),
that defaults to false, and decides whether an in-flight RPC limit is
enforced on intent resolution RPCs.
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/intentresolver/intent_resolver.go line 137 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider grouping this with the other dependency fields. Maybe right below Stopper.

Done


pkg/kv/kvserver/intentresolver/intent_resolver.go line 260 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does it make sense to keep this testing knob if the mechanism is effectively removed? Are tests that use it fooling themself into thinking this backpressure still exists? Are tests that use it needed anymore?

We still need that mechanism to be tested as long as it is reasonably possible for someone to turn this back on. And there is the txn record GC path that has an inflight backpressure limit by default.

For the 2 tests that use this, I've added:

// TODO(sumeer): this test clogs up batched intent resolution via an inflight  
// backpressure limit, which by default in no longer limited. But an inflight  
// backpressure limit does exist for GC of txn records. This test should  
// continue to exist until we have production experience with no inflight  
// backpressure for intent resolution. And after that we should create an  
// equivalent test for inflight backpressure for GC of txn records and remove  
// this test.

pkg/kv/kvserver/intentresolver/intent_resolver.go line 1118 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/requestbatchere/requestbatcher/

Done


pkg/internal/client/requestbatcher/batcher_test.go line 221 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: punctuation missing in a few of these comments.

Done

@sumeerbhola
Copy link
Collaborator Author

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Sep 6, 2023

Build succeeded:

@craig craig bot merged commit f75f543 into cockroachdb:master Sep 6, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants