requestbatcher,intentresolver: default to no inflight backpressure #109899
Conversation
For more context on this change, see the comment thread in https://docs.google.com/document/d/1N2v_LdkJFzEVxft6qDQFvaGs1q_HiCnW9BSLji1rd98/edit?disco=AAAA4Tv79DQ
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
993b793 to a6e9796
except for the question about whether IntentResolverTestingKnobs.InFlightBackpressureLimit should still exist.
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/intentresolver/intent_resolver.go
line 137 at r1 (raw file):
MaxIntentResolutionBatchIdle time.Duration
Settings *cluster.Settings
nit: consider grouping this with the other dependency fields. Maybe right below Stopper.
pkg/kv/kvserver/intentresolver/intent_resolver.go
line 260 at r1 (raw file):
inFlightLimit := inFlightLimitProvider{
    settings: c.Settings,
    testingInFlightBackpressureLimit: c.TestingKnobs.InFlightBackpressureLimit,
Does it make sense to keep this testing knob if the mechanism is effectively removed? Are tests that use it fooling themselves into thinking this backpressure still exists? Are tests that use it needed anymore?
pkg/kv/kvserver/intentresolver/intent_resolver.go
line 1118 at r1 (raw file):
// (defaultTaskLimit), and the workload goroutines. Each waiter produces work
// for a single range, and that work can typically be batched into a single
// RPC (since requestbatchere.Config.MaxMsgsPerBatch is quite generous). In
s/requestbatchere/requestbatcher/
pkg/internal/client/requestbatcher/batcher_test.go
line 221 at r1 (raw file):
canReply <- struct{}{}
canReply <- struct{}{}
// Now consume all the responses on sendChan
nit: punctuation missing in a few of these comments.
a6e9796 to 849f94c
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/intentresolver/intent_resolver.go
line 137 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider grouping this with the other dependency fields. Maybe right below Stopper.
Done
pkg/kv/kvserver/intentresolver/intent_resolver.go
line 260 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does it make sense to keep this testing knob if the mechanism is effectively removed? Are tests that use it fooling themselves into thinking this backpressure still exists? Are tests that use it needed anymore?
We still need that mechanism to be tested as long as it is reasonably possible for someone to turn this back on. And there is the txn record GC path that has an inflight backpressure limit by default.
For the 2 tests that use this, I've added:
// TODO(sumeer): this test clogs up batched intent resolution via an inflight
// backpressure limit, which by default is no longer limited. But an inflight
// backpressure limit does exist for GC of txn records. This test should
// continue to exist until we have production experience with no inflight
// backpressure for intent resolution. And after that we should create an
// equivalent test for inflight backpressure for GC of txn records and remove
// this test.
pkg/kv/kvserver/intentresolver/intent_resolver.go
line 1118 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/requestbatchere/requestbatcher/
Done
pkg/internal/client/requestbatcher/batcher_test.go
line 221 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: punctuation missing in a few of these comments.
Done
bors r=nvanbenschoten
Build succeeded.
For point and ranged intent resolution, the concurrency is already controlled by the number of goroutines that are waiting to resolve intents. The one exception is when there are a huge number of point intents being resolved by a single waiter (which should be very rare since if those intents are from a single txn, they should be ranged intents), but in that case we have already paid the memory cost of buffering these intents, so we may as well send the intent resolution RPCs.
In case we have overlooked something, inflight backpressure can be restored via a cluster setting.
This change is a prerequisite simplification for doing admission control on intent resolution. We don't want an inflight RPC limit to be reached due to admission control being active on a node/store, starving out intent resolution requests for other stores. There is also the async task limit of IntentResolver.sem, but many cases that hit that limit fall back to synchronous waiting (allowSyncProcessing=true), so that limit is not considered harmful.
Informs #97108
Epic: CRDB-25458
Release note (ops change): A cluster setting, kv.intent_resolver.batcher.in_flight_backpressure_limit.enabled, is introduced; it defaults to false and controls whether an in-flight RPC limit is enforced on intent resolution RPCs.