
*: integrate admission control for SQL=>KV and KV=>SQL #65614

Merged (1 commit into cockroachdb:master on May 27, 2021)

Conversation

sumeerbhola (Collaborator)

  • Add roachpb.AdmissionHeader used for KV admission control.
  • Initialize this for NewTxnWithSteppingEnabled, which is used
    by SQL (see the sketch after this list). Other paths will
    bypass KV admission control.
  • Node.Batch does server-side admission control for KV.
  • sql.tableWriterBase and row.txnKVFetcher do KV=>SQL
    response admission control since these are the shared
    read and write paths.
  • Cluster settings to enable admission control.
  • RegisterRunnableCountCallback is used to register for
    unsmoothed load information, as used in earlier
    experiments.
  • The grant chaining in the admission package was causing
    the runnable count to be too low. Grant batching, which
    goes along with chaining, is introduced. Additionally, we
    now immediately terminate and start a new grant chain,
    since the lag between asking for termination and waiting
    for it to happen was also preventing runnable counts from
    reaching a higher value.
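For illustration, a minimal sketch of the admission metadata that SQL-issued
txns carry (the Priority and CreateTime values mirror the pkg/kv/txn.go
excerpt discussed in the review below; the Source field and its value are
assumptions based on the later discussion of AdmissionHeader_Source):

    // import "github.com/cockroachdb/cockroach/pkg/util/timeutil"
    txn.admissionHeader = roachpb.AdmissionHeader{
        // Within a tenant, work is ordered by priority, then CreateTime.
        Priority:   int32(admission.NormalPri),
        // Assigned at txn creation so earlier txns get preference.
        CreateTime: timeutil.Now().UnixNano(),
        // Hypothetical: marks SQL-originated work; other paths bypass
        // KV admission control.
        Source:     roachpb.AdmissionHeader_FROM_SQL,
    }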

These changes were evaluated using kv50/enc=false/nodes=1/conc=8192
which causes CPU overload in the absence of admission control:
CPU > 96%, runnable goroutines per CPU > 120, node heartbeat
latency > 1s, clock offset +- 100ms.
After turning on admission control, runnable goroutines
drop to < 40, CPU utilization to 90-93%, node heartbeat
latency to ~200ms, and clock offset to +- 1ms.
Queuing is transferred to the WorkQueue, with p50 queueing
latency for KV work at 200ms and KV=>SQL response work
at 1.1s.
Because the CPU utilization decreases, SQL execution
latency increases, since we are leaving more CPU unused.
Without further increasing the runnable count, which
would defeat the purpose of admission control, we cannot
get to a higher utilization. It is likely that this CPU
utilization decrease is more pronounced because of the
small units of work in the KV workload.

Release note (ops change): Node-level admission control that
considers the CPU resource is introduced for KV request
processing, and for response processing (in SQL) of KV responses.
This admission control can be enabled using the cluster settings
admission.kv.enabled and admission.sql_kv_response.enabled.

Screenshots with admission control turned on at 14:57

[screenshots omitted]

@cockroach-teamcity (Member)

This change is Reviewable

@RaduBerinde (Member)

Very cool! What happened to the throughput (queries per second) in that experiment?

@sumeerbhola (Collaborator, Author)

What happened to the throughput (queries per second) in that experiment?

I didn't run with one setting for the whole experiment, but eyeballing the log, it looks like it dropped from ~6300 ops/s to ~4990 ops/s. This percentage drop is much larger than the CPU decrease -- I don't have a good answer for it. It is possible that there are background activities that were falling behind earlier and are now consuming more CPU time, so the CPU decrease is not as extreme.
Somewhat related is the question of why CPU utilization drops at all. I've added a TODO in kvSlotAdjuster.CPULoad:

	// TODO(sumeer): despite the additive decrease and high multiplier value,
	// the metric showed some drops from 40 slots to 1 slot on a kv50 overload
	// workload. It was not accompanied by a drop in runnable count per proc,
	// so it suggests that the drop in slots should not be causing cpu
	// under-utilization, but one cannot be sure. Experiment with a smoothed
	// signal or other ways to prevent a fast drop.

@RaduBerinde (Member) left a comment

Posting some comments on the util part.

Reviewed 5 of 22 files at r1, 2 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)


pkg/util/admission/granter.go, line 25 at r1 (raw file):

)

var KVSlotAdjusterOverloadMultiplier = settings.RegisterIntSetting(

[nit] maybe OverloadThreshold, and use "the number of runnable goroutines per CPU" in the description


pkg/util/admission/granter.go, line 437 at r1 (raw file):

	cpuLoadListener      CPULoadListener

	numProcs int

[nit] could use a comment explaining how it is set and how it is used.

Things would be more clear if we just initialized this to GOMAXPROCS when we create the GrantCoordinator. We don't expect this to change dynamically, and tolerating that here adds unnecessary complication.
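
A minimal sketch of that suggestion, assuming the GrantCoordinator is built in
a single constructor (only runtime.GOMAXPROCS is from the standard library; the
struct and field names are from this diff):

    // import "runtime"
    coord := &GrantCoordinator{
        // GOMAXPROCS(0) reports the current value without changing it.
        numProcs: runtime.GOMAXPROCS(0),
    }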


pkg/util/admission/granter.go, line 443 at r1 (raw file):

	// grantChainActive indicates whether a grant chain is active. If active,
	// grantChainNum is the number of that chain. If !active, grantChainNum

"number of that chain" doesn't help much, what does the number represent?

Also, given that it is used in interfaces, maybe it would be a good idea to make it a separate type and document it there.


pkg/util/admission/granter.go, line 792 at r1 (raw file):

// grantChainActive=false.
func (coord *GrantCoordinator) tryGrant() {
	// grantChainNum is the already active grant chain, or the one we will

[nit] this comment is confusing (we are not using grantChainNum here)


pkg/util/admission/granter.go, line 829 at r1 (raw file):

					grantBurstCount++
					if startingChain && grantBurstCount == grantBurstLimit {
						log.Infof(context.Background(), "granter: grant chain started at index %d", coord.grantChainIndex)

Aren't these going to be super frequent logs? Maybe we want to put them behind a debug flag? (we can leave it to true for now, I realize this is all disabled by default)


pkg/util/admission/granter.go, line 844 at r1 (raw file):

						grantBurstCount)
				}
				break OuterLoop

[nit] might be cleaner to do coord.grantChainNum++ in the if above, and just return here.


pkg/util/admission/work_queue.go, line 42 at r1 (raw file):

	false).WithPublic()

var admissionControlEnabledSettings [numWorkKinds]*settings.BoolSetting

[nit]

    var admissionControlEnabledSettings = [numWorkKinds]*settings.BoolSetting{
        KVWork:            KVAdmissionControlEnabled,
        SQLKVResponseWork: SQLKVResponseWorkAdmissionControlEnabled,
    }

pkg/util/admission/work_queue.go, line 495 at r1 (raw file):

	priority   WorkPriority
	createTime int64
	ch         chan uint64

[nit] document what goes in the channel


pkg/util/goschedstats/runnable.go, line 83 at r1 (raw file):

var callbackInfo struct {
	mu syncutil.Mutex
	cb RunnableCountCallback

Consider storing the callback in an atomic.Value instead of using a mutex.
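
A minimal sketch of that alternative, assuming RunnableCountCallback is a plain
function type taking the runnable count and the number of procs (as noted
further down, the change ultimately keeps a mutex and a slice of callbacks
instead):

    // import "sync/atomic"
    var storedCallback atomic.Value // holds a RunnableCountCallback

    func setRunnableCountCallback(cb RunnableCountCallback) {
        storedCallback.Store(cb)
    }

    func invokeRunnableCountCallback(numRunnable, numProcs int) {
        // Load returns nil until a callback has been stored.
        if cb, ok := storedCallback.Load().(RunnableCountCallback); ok {
            cb(numRunnable, numProcs)
        }
    }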


pkg/util/goschedstats/runnable.go, line 94 at r1 (raw file):

// a smoothed value (say over 1s) should be done only after thorough load
// testing under adversarial load conditions (specifically, we need to react
// quickly to large drops in runnable due to blocking on IO, so that we don't

[not for this change] We may be able to do some kind of smart smoothing to make this signal react faster to decreases and slower to increases.

Perhaps an exponentially weighted harmonic mean instead of an arithmetic mean (the harmonic mean is biased towards smaller values). This has some theoretical footing if we think of our real signal as being not the number of goroutines but its inverse - i.e. what fraction of a CPU a running goroutine gets.
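
Not needed for this change, but as a sketch of what such smoothing could look
like (pure arithmetic; the function name and alpha parameter are hypothetical):

    // ewhm maintains an exponentially weighted harmonic mean of the runnable
    // count. Averaging reciprocals biases the estimate toward small samples,
    // so drops (e.g. goroutines blocking on IO) are reflected faster than
    // increases.
    func ewhm(prevSmoothed, sample, alpha float64) float64 {
        if prevSmoothed <= 0 || sample <= 0 {
            return sample
        }
        invMean := alpha/sample + (1-alpha)/prevSmoothed
        return 1 / invMean
    }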


pkg/util/goschedstats/runnable.go, line 97 at r1 (raw file):

// waste cpu -- a workload that fluctuates rapidly between being IO bound and
// cpu bound could stress the usage of a smoothed signal).
func RegisterRunnableCountCallback(cb RunnableCountCallback) {

[nit] I would rename to SetRunnableCountCallback to make it more clear that it "overwrites" any previously set callback.


pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):

			strings.Contains(stack, "goroutine in C code") ||
			strings.Contains(stack, "runtime.CPUProfile") ||
			// The admission control package creates a small number of goroutines

Won't these add up when a binary is running a lot of unit tests? (e.g. logictests)

@RaduBerinde (Member) left a comment

Reviewed 8 of 22 files at r1, 3 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)


pkg/kv/db.go, line 264 at r2 (raw file):

	// SQL code that all uses kv.DB.
	// TODO(sumeer): find a home for this in the SQL layer.
	SQLKVResponseAdmissionQ *admission.WorkQueue

Maybe this belongs in DBContext ?


pkg/kv/txn.go, line 140 at r3 (raw file):

	txn.admissionHeader = roachpb.AdmissionHeader{
		Priority:   int32(admission.NormalPri),
		CreateTime: timeutil.Now().UnixNano(),

Would it make sense to get the time from inside the admission control package, and only if we actually have to queue the operation?


pkg/server/node.go, line 925 at r2 (raw file):

	}
	var enabled bool
	if n.admissionQ != nil {

[nit] This if should be around all the new code.


pkg/server/server.go, line 435 at r3 (raw file):

		registry.AddMetricStruct(metrics[i])
	}
	goschedstats.RegisterRunnableCountCallback(gcoord.CPULoad)

This will break for unit tests that set up multi-node clusters, or unit tests that run in parallel with each other.


pkg/util/admission/work_queue.go, line 188 at r2 (raw file):

// waitingWork.ch.

// Admit is called when requesting admission for some work.

[nit] Explain the enabled flag and the expectation that AdmittedWorkDone cannot be called if enabled=false (and it must be called if enabled=true for specific work types)
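
A hypothetical caller-side sketch of that contract (the Admit signature,
workInfo shape, and tenantID handling are assumptions; only the pairing of the
enabled return value with AdmittedWorkDone is what the comment above asks to
document):

    // Ask for admission before doing the work.
    enabled, err := admissionQ.Admit(ctx, workInfo)
    if err != nil {
        return err
    }
    // ... do the KV work ...
    if enabled {
        // Must be called only when enabled=true, and must be called for
        // work kinds (such as KV) that occupy a slot until completion.
        admissionQ.AdmittedWorkDone(tenantID)
    }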

@sumeerbhola (Collaborator, Author) left a comment

TFTR. I addressed some of the comments. I'll come back to the rest soon.

I'm unclear about the chart catalog (TestChartCatalogMetrics is failing because the admission metrics are not registered there) -- do we really need every metric in there? I don't want to add an admin UI page for the admission metrics yet. And the Organization field is not very clear to me -- it seems like each slice is a hierarchy, but the "runnable goroutines" metric you recently added shows up under "runtime", while here it is organized as Organization: [][]string{{Process, "Server", "go"}}.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)


pkg/kv/txn.go, line 140 at r3 (raw file):

Previously, RaduBerinde wrote…

Would it make sense to get the time from inside the admission control package, and only if we actually have to queue the operation?

We want the CreateTime to be assigned at txn creation time, so that earlier txns get preference in getting their work scheduled (within the same tenant and priority).
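
As a sketch of that ordering (a hypothetical comparator; the real queue also
has to deal with fairness across tenants):

    // Within a tenant and work kind, prefer higher priority, then older
    // CreateTime, so earlier txns are admitted first.
    func less(a, b *waitingWork) bool {
        if a.priority != b.priority {
            return a.priority > b.priority
        }
        return a.createTime < b.createTime
    }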


pkg/server/server.go, line 435 at r3 (raw file):

Previously, RaduBerinde wrote…

This will break for unit tests that set up multi-node clusters, or unit tests that run in parallel with each other.

I've changed this to allow registration of multiple callbacks, and added unregistration.
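
A minimal sketch of what ID-based registration with unregistration could look
like (shapes assumed; only the Register/Unregister names and the returned ID
are taken from the updated diff):

    // Multiple callbacks, each with an ID, so tests that start several
    // servers in one process can register and later unregister their own.
    type registeredCallback struct {
        id int64
        cb RunnableCountCallback
    }

    var callbackInfo struct {
        mu     syncutil.Mutex
        nextID int64
        cbs    []registeredCallback
    }

    func RegisterRunnableCountCallback(cb RunnableCountCallback) int64 {
        callbackInfo.mu.Lock()
        defer callbackInfo.mu.Unlock()
        id := callbackInfo.nextID
        callbackInfo.nextID++
        callbackInfo.cbs = append(callbackInfo.cbs, registeredCallback{id: id, cb: cb})
        return id
    }

    func UnregisterRunnableCountCallback(id int64) {
        callbackInfo.mu.Lock()
        defer callbackInfo.mu.Unlock()
        for i := range callbackInfo.cbs {
            if callbackInfo.cbs[i].id == id {
                callbackInfo.cbs = append(callbackInfo.cbs[:i], callbackInfo.cbs[i+1:]...)
                return
            }
        }
    }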


pkg/util/admission/granter.go, line 829 at r1 (raw file):

Previously, RaduBerinde wrote…

Aren't these going to be super frequent logs? Maybe we want to put them behind a debug flag? (we can leave it to true for now, I realize this is all disabled by default)

I forgot to remove these temporary log statements. I've done so now.


pkg/util/goschedstats/runnable.go, line 83 at r1 (raw file):

Previously, RaduBerinde wrote…

Consider storing the callback in anatomic.Value instead of using a mutex.

this is now a slice.


pkg/util/goschedstats/runnable.go, line 97 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] I would rename to SetRunnableCountCallback to make it more clear that it "overwrites" any previously set callback.

it now no longer overwrites


pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):

Previously, RaduBerinde wrote…

Won't these add up when a binary is running a lot of unit tests? (e.g. logictests)

I suppose yes, if all the tests in a package are run using the same go runtime -- is that true?
I've switched to using the stopper that is passed to NewServer.

@RaduBerinde (Member) left a comment

I think everything needs to be in chart_catalog.go, so it can be selected for a custom chart in the UI. Not very sure of the correct organization

The charts that actually show up in the UI by default are defined elsewhere (under ui/).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, @RaduBerinde, and @sumeerbhola)


pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):

Previously, sumeerbhola wrote…

I suppose yes, if all the tests in a package are run using the same go runtime -- is that true?
I've switched to using the stopper that is passed to NewServer.

Yes, they all run from testing.M.Run().

@sumeerbhola (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, and @RaduBerinde)


pkg/kv/db.go, line 264 at r2 (raw file):

Previously, RaduBerinde wrote…

Maybe this belongs in DBContext ?

kv.DBContext? That just shifts things around in KV land.
I guess you are saying that until we find a proper home for this, DBContext is more principled than directly in DB, yes?


pkg/server/node.go, line 925 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] This if should be around all the new code.

Done


pkg/util/admission/granter.go, line 25 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] maybe OverloadThreshold, and use "the number of runnable goroutines per CPU" in the description

Done


pkg/util/admission/granter.go, line 437 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] could use a comment explaining how it is set and how it is used.

Things would be more clear if we just initialized this to GOMAXPROCS when we create the GrantCoordinator. We don't expect this to change dynamically, and tolerating that here adds unnecessary complication.

I've added a comment. I don't see the complexity argument -- this is rather trivial and we've built the rest to handle changes in GOMAXPROCS in case we do vertical scaling in cloud, or customers do it for their deployments.


pkg/util/admission/granter.go, line 443 at r1 (raw file):

"number of that chain" doesn't help much, what does the number represent?
Also, given that it is used in interfaces, maybe it would be a good idea to make it a separate type and document it there.

It's an ID. I've renamed to grantChainID everywhere and also added a grantChainID type and the noGrantChain value.
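
For example, a sketch of such a typed ID (the underlying type and the zero
sentinel are assumptions):

    // grantChainID identifies a grant chain; a new ID is used whenever a new
    // chain starts, so stale continuations can be detected and dropped.
    type grantChainID uint64

    // noGrantChain means no grant chain is in progress.
    const noGrantChain grantChainID = 0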


pkg/util/admission/granter.go, line 792 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] this comment is confusing (we are not using grantChainNum here)

Removed


pkg/util/admission/granter.go, line 844 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] might be cleaner to do coord.grantChainNum++ in the if above, and just return here.

I think it is less bug-prone this way since the "no grant chain" cases are all handled in one place. I've added a code comment above OuterLoop and a code comment below stating the invariant.


pkg/util/admission/work_queue.go, line 42 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit]

    var admissionControlEnabledSettings = [numWorkKinds]*settings.BoolSetting{
        KVWork:            KVAdmissionControlEnabled,
        SQLKVResponseWork: SQLKVResponseWorkAdmissionControlEnabled,
    }

Done


pkg/util/admission/work_queue.go, line 495 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] document what goes in the channel

Done


pkg/util/admission/work_queue.go, line 188 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] Explain the enabled flag and the expectation that AdmittedWorkDone cannot be called if enabled=false (and it must be called if enabled=true for specific work types)

Done. And updated the usage comments here and in doc.go


pkg/util/goschedstats/runnable.go, line 94 at r1 (raw file):

Previously, RaduBerinde wrote…

[not for this change] We may be able to do some kind of smart smoothing to make this signal react faster to decreases and slower to increases.

Perhaps an exponentially weighted harmonic mean instead of an arithmetic mean (the harmonic mean is biased towards smaller values). This has some theoretical footing if we think of our real signal as being not the number of goroutines but its inverse - i.e. what fraction of a CPU a running goroutine gets.

Ack


pkg/util/leaktest/leaktest.go, line 74 at r2 (raw file):

Previously, RaduBerinde wrote…

Yes, they all run from testting.M.Run().

Thanks

@RaduBerinde (Member) left a comment

This is looking really great!

Reviewed 3 of 22 files at r1, 2 of 8 files at r2, 2 of 4 files at r3, 15 of 15 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @cucaroach, and @sumeerbhola)


pkg/kv/db.go, line 264 at r2 (raw file):

Previously, sumeerbhola wrote…

kv.DBContext? That just shifts things around in KV land.
I guess you are saying that until we find a proper home for this, DBContext is more principled than directly in DB, yes?

Not sure, I just noticed that DBContext contains the stopper which is the closest thing to this conceptually, but maybe that wasn't a very principled choice either.


pkg/server/server.go, line 436 at r4 (raw file):

	}
	cbID := goschedstats.RegisterRunnableCountCallback(gcoord.CPULoad)
	gcoord.RunCallbackOnClose(func() {

Seems odd that we Register here but relegate the Unregister to gcoord. Why not just set up the closer directly?

stopper.AddCloser(stop.CloserFn(func() {
  goschedstats.UnregisterRunnableCountCallback(cbID)
}))

pkg/util/admission/granter.go, line 714 at r4 (raw file):

	coord.granters[SQLSQLResponseWork].(*tokenGranter).refillBurstTokens()
	if coord.grantChainActive {
		coord.grantChainID++

[nit] This could use an explanation. We are basically "expiring" any outstanding grant chains, right? Well, we're actually setting them up to be expired when we start another grant chain? Maybe point in a comment to the logic in continueGrantChain too, as they are logically coupled.


pkg/util/admission/granter.go, line 793 at r4 (raw file):

	coord.mu.Lock()
	defer coord.mu.Unlock()
	if coord.grantChainID > grantChainID && coord.grantChainActive {

[nit] this could use a comment, it's not immediately clear why we're comparing IDs like that. I guess that coord.grantChainID will never be less than grantChainID, so it would be clearer if we did `if coord.grantChainActive && coord.grantChainID != grantChainID`?


pkg/util/admission/granter.go, line 818 at r4 (raw file):

	grantBurstLimit := coord.numProcs
	multiplier := 1
	if coord.settings != nil {

It seems odd that we tolerate having no settings and the behavior when there are no settings is questionable - wouldn't we want the default value of 8 here instead of 1?

It would be cleaner to always require settings (we can use MakeTestingClusterSettings() in tests).


pkg/util/admission/granter.go, line 853 at r4 (raw file):

			case grantFailLocal:
				localDone = true
			}

[nit] we should probably panic here for other values, or we may end up with an infinite loop if there's a bug (e.g. if we add another result value in the future)

@ajwerner (Contributor) left a comment

Very cool!

It'd be interesting to understand the overhead of this system when the slots are turned way up (such that nothing is being throttled whatsoever but the data structures are being used) vs. master.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @sumeerbhola)

@RaduBerinde (Member)

It'd be interesting to understand the overhead of this system when the slots are turned way up (such that nothing is being throttled whatsoever but the data structures are being used) vs. master.

I agree. The tpcc roachtest which finds the max # warehouses is a good one to try.

@sumeerbhola (Collaborator, Author)

It'd be interesting to understand the overhead of this system when the slots are turned way up (such that nothing is being throttled whatsoever but the data structures are being used) vs. master.

I haven't looked carefully, or tried to optimize, but it was ~2% when running on my laptop, in an overload situation where throttling was happening.
[screenshot omitted]

@sumeerbhola (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach and @RaduBerinde)


pkg/kv/db.go, line 264 at r2 (raw file):

Previously, RaduBerinde wrote…

Not sure, I just noticed that DBContext contains the stopper which is the closest thing to this conceptually, but maybe that wasn't a very principled choice either.

I've let this be for now.


pkg/server/server.go, line 436 at r4 (raw file):

Previously, RaduBerinde wrote…

Seems odd that we Register here but relegate the Unregister on to gcoord. Why not just set up the closer directly?

stopper.AddCloser(stop.CloserFn(func() {
  goschedstats.UnregisterRunnableCountCallback(cbID)
}))

I looked for something like CloserFn and didn't notice it, and didn't want to define a new type just for this. Thanks for the pointer.
Go is sometimes so crazily wonderful -- treating a function as a type that can have a method!
And then I remember it is a garbage collected language.


pkg/util/admission/granter.go, line 714 at r4 (raw file):

We are basically "expiring" any outstanding grant chains, right? Well, we're actually setting them up to be expired when we start another grant chain? Maybe point in a comment to the logic in continueGrantChain too, as they are logically coupled.

We are allowed to start a new grant chain immediately. And yes we are setting the existing one(s) to expire.
I've moved this to a terminateGrantChain method with a comment since the same logic was used in 2 places. And added a comment in continueGrantChain.


pkg/util/admission/granter.go, line 793 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] this could use a comment, it's not immediately clear why we're comparing IDs like that. I guess that coord.grantChainID will never be less than grantChainID, so it would be more clear if we did if coord.grantChainActive && coord.grantChainID != grantChainID ?

This was a bug: it is now simply `coord.grantChainID != grantChainID`.


pkg/util/admission/granter.go, line 818 at r4 (raw file):

Previously, RaduBerinde wrote…

It seems odd that we tolerate having no settings and the behavior when there are no settings is questionable - wouldn't we want the default value of 8 here instead of 1?

It would be cleaner to always require settings (we can use MakeTestingClusterSettings() in tests).

Done. I was using 1 for the test.


pkg/util/admission/granter.go, line 853 at r4 (raw file):

Previously, RaduBerinde wrote…

[nit] we should probably panic here for other values, or we may end up with an infinite loop if there's a bug (e.g. if we add another result value in the future)

Done

@sumeerbhola (Collaborator, Author)

bors r+

@craig (bot) commented May 26, 2021

Build failed (retrying...):

@craig (bot) commented May 26, 2021

Build failed (retrying...):

@RaduBerinde (Member) left a comment

:lgtm:

Reviewed 8 of 8 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)

@craig (bot) commented May 26, 2021

Build failed:

@sumeerbhola (Collaborator, Author)

I think the build failures have been fixed by #65745

@sumeerbhola (Collaborator, Author)

bors r+

@craig (bot) commented May 27, 2021

Build failed (retrying...):

@craig (bot) commented May 27, 2021

Build failed (retrying...):

@craig (bot) commented May 27, 2021

Build succeeded:

@craig (bot) merged commit ec58395 into cockroachdb:master on May 27, 2021
@nvanbenschoten (Member)

@sumeerbhola out of curiosity, why did you decide to include the AdmissionHeader next to the Header in BatchRequest and RangeFeedRequest, instead of nesting AdmissionHeader inside of the more general Header? I'm concerned that there are places in KV where we copy a Header from one BatchRequest struct to another and that pattern will now lose admission control state.

@sumeerbhola (Collaborator, Author)

why did you decide to include the AdmissionHeader next to the Header in BatchRequest and RangeFeedRequest, instead of nesting AdmissionHeader inside of the more general Header?

I thought I might be polluting Header by including AdmissionHeader in it, since I didn't know the full extent of where Header is used. And transparent propagation of the AdmissionHeader can also be problematic if the AdmissionHeader_Source field is not changed (resulting in deadlock). We can totally revisit this.
