*: integrate admission control for SQL=>KV and KV=>SQL #65614
Conversation
Very cool! What happened to the throughput (queries per second) in that experiment?
I didn't run with one setting for the whole experiment, but eyeballing the log, it looks like it dropped from ~6300 ops/s to ~4990 ops/s. This drop percentage is much more than the cpu decrease -- I don't have a good answer for it. It is possible that there are background activities that were falling behind earlier and are now consuming more cpu time, so the cpu decrease is not as extreme.
Posting some comments on the util part.
Reviewed 5 of 22 files at r1, 2 of 8 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)
pkg/util/admission/granter.go, line 25 at r1 (raw file):
var KVSlotAdjusterOverloadMultiplier = settings.RegisterIntSetting(
[nit] maybe OverloadThreshold, and use "the number of runnable goroutines per CPU" in the description
pkg/util/admission/granter.go, line 437 at r1 (raw file):
cpuLoadListener CPULoadListener
numProcs int
[nit] could use a comment explaining how it is set and how it is used.
Things would be more clear if we just initialized this to GOMAXPROCS when we create the GrantCoordinator. We don't expect this to change dynamically, and tolerating that here adds unnecessary complication.
pkg/util/admission/granter.go, line 443 at r1 (raw file):
// grantChainActive indicates whether a grant chain is active. If active,
// grantChainNum is the number of that chain. If !active, grantChainNum
"number of that chain" doesn't help much, what does the number represent?
Also, given that it is used in interfaces, maybe it would be a good idea to make it a separate type and document it there.
pkg/util/admission/granter.go, line 792 at r1 (raw file):
// grantChainActive=false.
func (coord *GrantCoordinator) tryGrant() {
// grantChainNum is the already active grant chain, or the one we will
[nit] this comment is confusing (we are not using grantChainNum here)
pkg/util/admission/granter.go, line 829 at r1 (raw file):
grantBurstCount++
if startingChain && grantBurstCount == grantBurstLimit {
log.Infof(context.Background(), "granter: grant chain started at index %d", coord.grantChainIndex)
Aren't these going to be super frequent logs? Maybe we want to put them behind a debug flag? (we can leave it to true for now, I realize this is all disabled by default)
pkg/util/admission/granter.go, line 844 at r1 (raw file):
grantBurstCount)
}
break OuterLoop
[nit] might be cleaner to do coord.grantChainNum++ in the if above, and just return here.
pkg/util/admission/work_queue.go, line 42 at r1 (raw file):
false).WithPublic()
var admissionControlEnabledSettings [numWorkKinds]*settings.BoolSetting
[nit]
.. = {
KVWork: KVAdmissionControlEnabled,
SQLKVResponseWork: SQLKVResponseWorkAdmissionControlEnabled,
}
pkg/util/admission/work_queue.go, line 495 at r1 (raw file):
priority WorkPriority
createTime int64
ch chan uint64
[nit] document what goes in the channel
pkg/util/goschedstats/runnable.go, line 83 at r1 (raw file):
var callbackInfo struct {
mu syncutil.Mutex
cb RunnableCountCallback
Consider storing the callback in an atomic.Value instead of using a mutex.
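The atomic.Value approach suggested here could look roughly like this. This is a minimal sketch under assumptions: the callback signature and helper names are illustrative, not the actual goschedstats code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RunnableCountCallback is invoked with a runnable-goroutine sample.
// The signature here is an illustrative stand-in.
type RunnableCountCallback func(numRunnable int, numProcs int)

// cbHolder stores the callback. atomic.Value makes the read on the
// frequent sampling path lock-free, at the cost of supporting only a
// single, overwrite-style registration.
var cbHolder atomic.Value

func setRunnableCountCallback(cb RunnableCountCallback) {
	cbHolder.Store(cb)
}

// sample is what the scheduler-stats sampling loop would call periodically.
func sample(numRunnable, numProcs int) {
	if cb, ok := cbHolder.Load().(RunnableCountCallback); ok {
		cb(numRunnable, numProcs)
	}
}

func main() {
	var last int
	setRunnableCountCallback(func(numRunnable, numProcs int) { last = numRunnable })
	sample(96, 8)
	fmt.Println(last) // 96
}
```

Note that the version that eventually landed needed multiple registered callbacks plus unregistration, for which a slice guarded by a mutex is the simpler fit.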
pkg/util/goschedstats/runnable.go, line 94 at r1 (raw file):
// a smoothed value (say over 1s) should be done only after thorough load
// testing under adversarial load conditions (specifically, we need to react
// quickly to large drops in runnable due to blocking on IO, so that we don't
[not for this change] We may be able to do some kind of smart smoothing to make this signal react faster to decreases and slower to increases.
Perhaps an exponentially weighted harmonic mean instead of arithmetic mean (the harmonic mean is biased towards the smaller values). This can have some theoretical footing if we think as our real signal to be not the number of goroutines but the inverse of that - i.e. what fraction of a CPU does a running goroutine get.
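As a rough illustration of that idea (purely a sketch, not code from this PR): smoothing the inverse of the runnable count yields an exponentially weighted harmonic mean, which chases drops in the signal quickly while damping increases.

```go
package main

import "fmt"

// ewHarmonic maintains an exponentially weighted harmonic mean by smoothing
// the inverse of the signal (1/runnable ~= the fraction of a CPU each
// running goroutine gets). The harmonic mean is biased toward small values,
// so drops in the signal show up faster than increases.
type ewHarmonic struct {
	inv         float64 // smoothed 1/x
	alpha       float64 // weight of the newest sample
	initialized bool
}

func (e *ewHarmonic) add(x float64) float64 {
	if !e.initialized {
		e.inv, e.initialized = 1/x, true
	} else {
		e.inv = e.alpha*(1/x) + (1-e.alpha)*e.inv
	}
	return 1 / e.inv
}

func main() {
	e := &ewHarmonic{alpha: 0.5}
	e.add(100)                     // sustained high runnable count
	fmt.Printf("%.1f\n", e.add(2)) // prints 3.9; a sudden drop pulls the mean down quickly
}
```

An arithmetic EWMA with the same alpha would report 51 after the drop, so the harmonic variant reacts far faster to the IO-blocking case the comment above worries about.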
pkg/util/goschedstats/runnable.go, line 97 at r1 (raw file):
// waste cpu -- a workload that fluctuates rapidly between being IO bound and
// cpu bound could stress the usage of a smoothed signal).
func RegisterRunnableCountCallback(cb RunnableCountCallback) {
[nit] I would rename to SetRunnableCountCallback to make it more clear that it "overwrites" any previously set callback.
pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):
strings.Contains(stack, "goroutine in C code") ||
strings.Contains(stack, "runtime.CPUProfile") ||
// The admission control package creates a small number of goroutines
Won't these add up when a binary is running a lot of unit tests? (e.g. logictests)
Reviewed 8 of 22 files at r1, 3 of 8 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)
pkg/kv/db.go, line 264 at r2 (raw file):
// SQL code that all uses kv.DB.
// TODO(sumeer): find a home for this in the SQL layer.
SQLKVResponseAdmissionQ *admission.WorkQueue
Maybe this belongs in DBContext?
pkg/kv/txn.go, line 140 at r3 (raw file):
txn.admissionHeader = roachpb.AdmissionHeader{
Priority: int32(admission.NormalPri),
CreateTime: timeutil.Now().UnixNano(),
Would it make sense to get the time from inside the admission control package, and only if we actually have to queue the operation?
pkg/server/node.go, line 925 at r2 (raw file):
}
var enabled bool
if n.admissionQ != nil {
[nit] This if should be around all the new code.
pkg/server/server.go, line 435 at r3 (raw file):
registry.AddMetricStruct(metrics[i])
}
goschedstats.RegisterRunnableCountCallback(gcoord.CPULoad)
This will break for unit tests that set up multi-node clusters, or unit tests that run in parallel with each other.
pkg/util/admission/work_queue.go, line 188 at r2 (raw file):
// waitingWork.ch.
// Admit is called when requesting admission for some work.
[nit] Explain the enabled flag and the expectation that AdmittedWorkDone cannot be called if enabled=false (and it must be called if enabled=true for specific work types).
TFTR. I addressed some of the comments. I'll come back to the rest soon.
I'm unclear about the chart catalog (TestChartCatalogMetrics is failing because the admission metrics are not registered there) -- do we really need every metric in there? I don't want to add an admin UI page for the admission metrics yet. And the Organization field is not very clear to me -- seems like each slice is a hierarchy, but the "runnable goroutines" metric you recently added shows up under "runtime", while here it is organized as Organization: [][]string{{Process, "Server", "go"}}.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)
pkg/kv/txn.go, line 140 at r3 (raw file):
Previously, RaduBerinde wrote…
Would it make sense to get the time from inside the admission control package, and only if we actually have to queue the operation?
We want the CreateTime to be assigned at txn creation time, so that earlier txns get preference in getting their work scheduled (within the same tenant and priority).
pkg/server/server.go, line 435 at r3 (raw file):
Previously, RaduBerinde wrote…
This will break for unit tests that set up multi-node clusters, or unit tests that run in parallel with each other.
I've changed this to allow registration of multiple callbacks, and added unregistration.
pkg/util/admission/granter.go, line 829 at r1 (raw file):
Previously, RaduBerinde wrote…
Aren't these going to be super frequent logs? Maybe we want to put them behind a debug flag? (we can leave it to true for now, I realize this is all disabled by default)
forgot to remove these temporary ones. I've done so now.
pkg/util/goschedstats/runnable.go, line 83 at r1 (raw file):
Previously, RaduBerinde wrote…
Consider storing the callback in an atomic.Value instead of using a mutex.
this is now a slice.
pkg/util/goschedstats/runnable.go, line 97 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] I would rename to SetRunnableCountCallback to make it more clear that it "overwrites" any previously set callback.
It no longer overwrites.
pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):
Previously, RaduBerinde wrote…
Won't these add up when a binary is running a lot of unit tests? (e.g. logictests)
I suppose yes, if all the tests in a package are run using the same go runtime -- is that true?
I've switched to using the stopper that is passed to NewServer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything needs to be in chart_catalog.go, so it can be selected for a custom chart in the UI. Not very sure of the correct organization. The charts that actually show up in the UI by default are defined elsewhere (under ui/).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)
pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):
Previously, sumeerbhola wrote…
I suppose yes, if all the tests in a package are run using the same go runtime -- is that true?
I've switched to using the stopper that is passed to NewServer.
Yes, they all run from testing.M.Run().
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, and @RaduBerinde)
pkg/kv/db.go, line 264 at r2 (raw file):
Previously, RaduBerinde wrote…
Maybe this belongs in DBContext?
kv.DBContext? That just shifts things around in KV land. I guess you are saying that until we find a proper home for this, DBContext is more principled than directly in DB, yes?
pkg/server/node.go, line 925 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] This if should be around all the new code.
Done
pkg/util/admission/granter.go, line 25 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] maybe OverloadThreshold, and use "the number of runnable goroutines per CPU" in the description
Done
pkg/util/admission/granter.go, line 437 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] could use a comment explaining how it is set and how it is used.
Things would be more clear if we just initialized this to GOMAXPROCS when we create the GrantCoordinator. We don't expect this to change dynamically, and tolerating that here adds unnecessary complication.
I've added a comment. I don't see the complexity argument -- this is rather trivial and we've built the rest to handle changes in GOMAXPROCS in case we do vertical scaling in cloud, or customers do it for their deployments.
pkg/util/admission/granter.go, line 443 at r1 (raw file):
"number of that chain" doesn't help much, what does the number represent?
Also, given that it is used in interfaces, maybe it would be a good idea to make it a separate type and document it there.
It's an ID. I've renamed to grantChainID everywhere and also added a grantChainID type and the noGrantChain value.
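For illustration only, a defined ID type plus sentinel in the spirit of this change might look like the following (a hypothetical sketch; the real definitions live in pkg/util/admission):

```go
package main

import "fmt"

// grantChainID identifies a grant chain. Giving the ID its own defined type
// lets the interfaces that pass it around document its meaning in one place.
type grantChainID uint64

// noGrantChain means the grant is not part of any chain.
const noGrantChain grantChainID = 0

// continueGrantChain shows how a sentinel check reads with the typed ID.
func continueGrantChain(id grantChainID) string {
	if id == noGrantChain {
		return "no chain to continue"
	}
	return fmt.Sprintf("continuing chain %d", id)
}

func main() {
	fmt.Println(continueGrantChain(noGrantChain)) // no chain to continue
	fmt.Println(continueGrantChain(5))            // continuing chain 5
}
```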
pkg/util/admission/granter.go, line 792 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] this comment is confusing (we are not using grantChainNum here)
Removed
pkg/util/admission/granter.go, line 844 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] might be cleaner to do coord.grantChainNum++ in the if above, and just return here.
I think it is less bug-prone this way since the "no grant chain" cases are all handled in one place. I've added a code comment above OuterLoop and a code comment below stating the invariant.
pkg/util/admission/work_queue.go, line 42 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit]
.. = {
KVWork: KVAdmissionControlEnabled,
SQLKVResponseWork: SQLKVResponseWorkAdmissionControlEnabled,
}
Done
pkg/util/admission/work_queue.go, line 495 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] document what goes in the channel
Done
pkg/util/admission/work_queue.go, line 188 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] Explain the enabled flag and the expectation that AdmittedWorkDone cannot be called if enabled=false (and it must be called if enabled=true for specific work types).
Done. And updated the usage comments here and in doc.go
pkg/util/goschedstats/runnable.go, line 94 at r1 (raw file):
Previously, RaduBerinde wrote…
[not for this change] We may be able to do some kind of smart smoothing to make this signal react faster to decreases and slower to increases.
Perhaps an exponentially weighted harmonic mean instead of arithmetic mean (the harmonic mean is biased towards the smaller values). This can have some theoretical footing if we think as our real signal to be not the number of goroutines but the inverse of that - i.e. what fraction of a CPU does a running goroutine get.
Ack
pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):
Previously, RaduBerinde wrote…
Yes, they all run from testing.M.Run().
Thanks
This is looking really great!
Reviewed 3 of 22 files at r1, 2 of 8 files at r2, 2 of 4 files at r3, 15 of 15 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, and @sumeerbhola)
pkg/kv/db.go, line 264 at r2 (raw file):
Previously, sumeerbhola wrote…
kv.DBContext
? That just shifts things around in KV land.
I guess you are saying that until we find a proper home for this,DBContext
is more principled than directly inDB
, yes?
Not sure, I just noticed that DBContext contains the stopper which is the closest thing to this conceptually, but maybe that wasn't a very principled choice either.
pkg/server/server.go, line 436 at r4 (raw file):
}
cbID := goschedstats.RegisterRunnableCountCallback(gcoord.CPULoad)
gcoord.RunCallbackOnClose(func() {
Seems odd that we Register here but relegate the Unregister to gcoord. Why not just set up the closer directly?
stopper.AddCloser(stop.CloserFn(func() {
goschedstats.UnregisterRunnableCountCallback(cbID)
}))
pkg/util/admission/granter.go, line 714 at r4 (raw file):
coord.granters[SQLSQLResponseWork].(*tokenGranter).refillBurstTokens()
if coord.grantChainActive {
coord.grantChainID++
[nit] This could use an explanation. We are basically "expiring" any outstanding grant chains, right? Well, we're actually setting them up to be expired when we start another grant chain? Maybe point in a comment to the logic in continueGrantChain too, as they are logically coupled.
pkg/util/admission/granter.go, line 793 at r4 (raw file):
coord.mu.Lock()
defer coord.mu.Unlock()
if coord.grantChainID > grantChainID && coord.grantChainActive {
[nit] this could use a comment, it's not immediately clear why we're comparing IDs like that. I guess that coord.grantChainID will never be less than grantChainID, so it would be more clear if we did if coord.grantChainActive && coord.grantChainID != grantChainID?
pkg/util/admission/granter.go, line 818 at r4 (raw file):
grantBurstLimit := coord.numProcs
multiplier := 1
if coord.settings != nil {
It seems odd that we tolerate having no settings and the behavior when there are no settings is questionable - wouldn't we want the default value of 8 here instead of 1?
It would be cleaner to always require settings (we can use MakeTestingClusterSettings() in tests).
pkg/util/admission/granter.go, line 853 at r4 (raw file):
case grantFailLocal:
localDone = true
}
[nit] we should probably panic here for other values, or we may end up with an infinite loop if there's a bug (e.g. if we add another result value in the future)
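The panic-on-unknown-value pattern being suggested can be sketched as follows. This is a hedged illustration: grantFailLocal appears in the diff above, but the other grantResult values and the handler are hypothetical stand-ins.

```go
package main

import "fmt"

type grantResult int

// grantFailLocal is from the code under review; the other values are
// illustrative stand-ins.
const (
	grantSuccess grantResult = iota
	grantFailDueToSharedResource
	grantFailLocal
)

// dispatch handles every known result and panics on anything else, so a
// newly added grantResult value fails loudly in testing instead of, say,
// silently spinning in an outer retry loop.
func dispatch(res grantResult) string {
	switch res {
	case grantSuccess:
		return "granted"
	case grantFailDueToSharedResource:
		return "wait for shared resource"
	case grantFailLocal:
		return "local done"
	default:
		panic(fmt.Sprintf("unknown grantResult %d", res))
	}
}

func main() {
	fmt.Println(dispatch(grantFailLocal)) // local done
}
```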
Very cool!
It'd be interesting to understand the overhead of this system when the slots are turned way up (such that nothing is being throttled whatsoever but the data structures are being used) vs. master.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @sumeerbhola)
I agree. The tpcc roachtest which finds the max # warehouses is a good one to try.
- Add roachpb.AdmissionHeader used for KV admission control.
- Initialize this for NewTxnWithSteppingEnabled that is used by SQL. Other paths will bypass KV admission control.
- Node.Batch does server-side admission control for KV.
- sql.tableWriterBase and row.txnKVFetcher do KV=>SQL response admission control since these are the shared read and write paths.
- Cluster settings to enable admission control.
- RegisterRunnableCountCallback is used to register for unsmoothed load information, as used in earlier experiments.
- The grant chaining in the admission package was causing the runnable count to be too low. Grant batching is introduced, that goes along with chaining. Additionally, we also immediately terminate and start a new grant chain, since the lag in asking for termination and waiting for it to happen was also preventing runnable counts from getting to a higher value.
These changes were evaluated using kv50/enc=false/nodes=1/conc=8192, which causes CPU overload in the absence of admission control: CPU > 96%, runnable goroutines per cpu > 120, node heartbeat latency > 1s, clock offset +- 100ms. After turning on admission control, runnable goroutines drop to < 40, cpu utilization to 90-93%, node heartbeat latency to ~200ms, and clock offset to +- 1ms. Queuing is transferred to the WorkQueue, with p50 queueing latency for KV work at 200ms and KV=>SQL response work at 1.1s.
Since the CPU utilization decreases, there is an increase in SQL execution latency since we are leaving more unused CPU. Without further increasing the runnable count, which would defeat the purpose of admission control, we cannot get to a higher utilization. It is likely that this CPU utilization decrease is more pronounced because of the small units of work in the KV workload.
Release note (ops change): Node-level admission control that considers the cpu resource is introduced for KV request processing, and response processing (in SQL) for KV responses. This admission control can be enabled using admission.kv.enabled and admission.sql_kv_response.enabled.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @RaduBerinde)
pkg/kv/db.go, line 264 at r2 (raw file):
Previously, RaduBerinde wrote…
Not sure, I just noticed that DBContext contains the stopper which is the closest thing to this conceptually, but maybe that wasn't a very principled choice either.
I've let this be for now.
pkg/server/server.go, line 436 at r4 (raw file):
Previously, RaduBerinde wrote…
Seems odd that we Register here but relegate the Unregister to gcoord. Why not just set up the closer directly?
stopper.AddCloser(stop.CloserFn(func() {
goschedstats.UnregisterRunnableCountCallback(cbID)
}))
I looked for something like CloserFn and didn't notice it, and didn't want to define a new type just for this. Thanks for the pointer.
Go is sometimes so crazily wonderful -- treating a function as a type that can have a method!
And then I remember it is a garbage collected language.
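The pattern being admired, a function type that satisfies an interface, in a self-contained sketch (Closer here is a stand-in for CockroachDB's stop.Closer interface):

```go
package main

import "fmt"

// Closer is anything with cleanup to run at shutdown (stand-in for stop.Closer).
type Closer interface {
	Close()
}

// CloserFn adapts a plain func() to Closer: a method defined directly on a
// named function type, so no wrapper struct is needed.
type CloserFn func()

// Close implements Closer by calling the function itself.
func (f CloserFn) Close() { f() }

func main() {
	var closers []Closer
	closers = append(closers, CloserFn(func() { fmt.Println("unregistering callback") }))
	for _, c := range closers {
		c.Close()
	}
}
```

This is the same adapter idea as http.HandlerFunc in the standard library.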
pkg/util/admission/granter.go, line 714 at r4 (raw file):
We are basically "expiring" any outstanding grant chains, right? Well, we're actually setting them up to be expired when we start another grant chain? Maybe point in a comment to the logic in continueGrantChain too, as they are logically coupled.
We are allowed to start a new grant chain immediately. And yes we are setting the existing one(s) to expire.
I've moved this to a terminateGrantChain method with a comment since the same logic was used in 2 places. And added a comment in continueGrantChain.
pkg/util/admission/granter.go, line 793 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] this could use a comment, it's not immediately clear why we're comparing IDs like that. I guess that coord.grantChainID will never be less than grantChainID, so it would be more clear if we did if coord.grantChainActive && coord.grantChainID != grantChainID?
This was a bug: it is now simply coord.grantChainID != grantChainID.
pkg/util/admission/granter.go, line 818 at r4 (raw file):
Previously, RaduBerinde wrote…
It seems odd that we tolerate having no settings and the behavior when there are no settings is questionable - wouldn't we want the default value of 8 here instead of 1?
It would be cleaner to always require settings (we can use MakeTestingClusterSettings() in tests).
Done. I was using 1 for the test.
pkg/util/admission/granter.go, line 853 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] we should probably panic here for other values, or we may end up with an infinite loop if there's a bug (e.g. if we add another result value in the future)
Done
bors r+
Build failed (retrying...):
Build failed (retrying...):
Reviewed 8 of 8 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)
Build failed:
I think the build failures have been fixed by #65745
bors r+
Build failed (retrying...):
Build failed (retrying...):
Build succeeded:
@sumeerbhola out of curiosity, why did you decide to include the
I thought I may be polluting
Screenshots with admission control turned on at 14:57