
base,kvserver,server: configuration of provisioned bandwidth for a store #86063

Merged (1 commit, Aug 25, 2022)

Conversation

@sumeerbhola (Collaborator) commented on Aug 12, 2022:

A previous PR #85722 added support for disk bandwidth as a bottleneck
resource in the admission control package. To utilize this, admission
control needs to be provided the provisioned bandwidth and the observed
read and write bytes. This PR adds configuration support for this via
the StoreSpec (which is set via the --store flag). The StoreSpec now has
an optional ProvisionedRateSpec that contains the name of the disk
corresponding to the store and an optional provisioned bandwidth,
specified as
provisioned-rate=name=<disk-name>[:bandwidth=<bandwidth-bytes>].

The disk-name is used to map the DiskStats, retrieved via the existing
code in status.GetDiskCounters, to the correct Store. These DiskStats
contain the read and write bytes. The optional bandwidth is used to
override the provisioned bandwidth set via the new cluster setting
kv.store.admission.provisioned_bandwidth.
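
For illustration, a store spec using this syntax might look like the
following (the path, disk name, and bandwidth value are hypothetical):

--store=path=/mnt/data1,provisioned-rate=name=nvme1n1:bandwidth=125000000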

Fixes #82898

Release note (ops change): A disk bandwidth constraint can now be
used to control admission of elastic writes. This requires configuration
for each store, via the --store flag, which now accepts an optional
provisioned-rate field. The provisioned-rate field, if specified,
needs to provide a disk-name for the store and optionally a disk
bandwidth. If the disk bandwidth is not provided, the cluster setting
kv.store.admission.provisioned_bandwidth will be used. The cluster
setting defaults to 0 (which means that the disk bandwidth constraint
is disabled). If the effective disk bandwidth (i.e., the cluster
setting value, possibly overridden by the per-store bandwidth) is 0,
there is no disk bandwidth constraint. Additionally, the admission
control cluster setting
admission.disk_bandwidth_tokens.elastic.enabled (defaults to true)
can be used to turn off enforcement even when all the other
configuration has been set up. Turning off enforcement will still
output all the relevant information about disk bandwidth usage, so
it can be used to observe part of the mechanism in action.
To summarize: to enable this for a cluster with homogeneous disks,
provide a disk-name in the provisioned-rate field in the store spec,
and set the kv.store.admission.provisioned_bandwidth cluster setting
to the bandwidth limit. To only get information about disk bandwidth
usage by elastic traffic (currently via logs, not metrics), do the
above but also set admission.disk_bandwidth_tokens.elastic.enabled
to false.
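
As a concrete illustration (the bandwidth value is hypothetical, and
the humanized byte string assumes the byte-size form of the setting):

SET CLUSTER SETTING kv.store.admission.provisioned_bandwidth = '125MiB';
-- To only observe disk bandwidth usage without enforcement, additionally:
SET CLUSTER SETTING admission.disk_bandwidth_tokens.elastic.enabled = false;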

Release justification: Low risk, high benefit change that allows
an operator to enable new functionality (disabled by default).

@sumeerbhola requested reviews from tbg, irfansharif, and several teams (as code owners) on August 12, 2022.

@sumeerbhola (Collaborator, Author) left a comment:

Lacks any tests, since I want to get an opinion first on the approach.


@irfansharif (Contributor) left a comment:

Didn't look over the code too closely (will do once it's ready with tests), but at a high level this LGTM. We should assume that most deployments are single store and use homogeneous setups, so gear the release notes/help text to emphasize the use of the cluster setting and only use the per-store flag if there's non-homogeneity. Users with multi-store setups but homogeneity in provisioned bandwidth can still get away with using just the cluster setting. Another thing I'd spell out in the help text is what happens if this limit is too high or too low (you've already mentioned that this is optional, but it's worth spelling out the benefits of opting in). Since heterogeneity in provisioned bandwidth is rare (there are likely other problems in such deployments, and I don't recall any incidents where I've observed such heterogeneity), and this is all optional, I'd also be completely ok ripping out the per-store flags and just having the cluster setting. It's simpler and less error-prone. We can see how useful the per-store flag is over the next release once this is out in the wild.

type ProvisionedRateSpec struct {
	// DiskName is the name of the disk observed by the code in disk_counters.go
	// when retrieving stats for this store.
	DiskName string
irfansharif (Contributor) commented:

We don't need this DiskName field, I think. Looking at the top-level StoreSpec, we already have a Path to uniquely identify the disk device we're using. Let's continue using that; it's a required field unless using type=mem, which this doesn't apply to anyway. I'm looking at this test for ex:

{"/mnt/hda1", "", StoreSpec{Path: "/mnt/hda1"}},

This also makes the flag syntax easier (we can drop the name argument).

@@ -273,6 +348,10 @@ var fractionRegex = regexp.MustCompile(`^([-]?([0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0-
// - 20% -> 20% of the available space
// - 0.2 -> 20% of the available space
// - attrs=xxx:yyy:zzz A colon separated list of optional attributes.
// - provisioned-rate=name=<disk-name>[:bandwidth=<bandwidth-bytes>] The
irfansharif (Contributor) commented:

Ignoring the name thing as discussed above, the syntax this gives us is: cockroach start ... --store=<path>,provisioned-rate=:bandwidth=12500000. Could this instead be cockroach start ... --store=<path>,bandwidth=119MB/s? Similar to the allowable formats for size, but with the (optional) /s suffix. If we later need to learn about provisioned IOPS, we can introduce an iops=42/s thing.


// ProvisionedBandwidthForAdmissionControl set a value of the provisioned
// bandwidth for each store in the cluster.
var ProvisionedBandwidthForAdmissionControl = settings.RegisterIntSetting(
irfansharif (Contributor) commented:

Use RegisterByteSizeSetting instead? Or better yet, add a RegisterByteRateSetting (shouldn't be difficult) to get to SET CLUSTER SETTING kv.store.admission.provisioned_bandwidth = '125 MiB/s'. There are other cluster settings that should've had this but default to the ByteSize variant (server.consistency_check.max_rate for ex.), so I'll defer to you if you want to improve the status quo.

@sumeerbhola (Collaborator, Author) left a comment:


pkg/base/store_spec.go line 168 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

We don't need this DiskName field, I think. Looking at the top-level StoreSpec, we already have a Path to uniquely identify the disk device we're using. Let's continue using that; it's a required field unless using type=mem, which this doesn't apply to anyway. I'm looking at this test for ex:

{"/mnt/hda1", "", StoreSpec{Path: "/mnt/hda1"}},

This also makes the flag syntax easier (we can drop the name argument).

I don't see a way to avoid the name mapping configuration. The stats that we get from disk_counters.go have names like nvme1n1, nvme2n1 (on EBS -- I have not looked at what we see when using PD), so there needs to be a way to map that name to a store. There can also be disks that have nothing to do with any of the stores, and those stats need to be ignored.
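
For illustration, the kind of name-based filtering described here might look like the following sketch (the helper and the name field are hypothetical, not the actual code in this PR):

// Hypothetical sketch: pick out the DiskStats entry whose device name
// matches the disk-name configured for a store.
func statsForDisk(all []diskStats, diskName string) (diskStats, bool) {
	for _, s := range all {
		if s.name == diskName { // the name field is assumed for illustration
			return s, true
		}
	}
	// Devices unrelated to any store (e.g. the boot disk) simply never match.
	return diskStats{}, false
}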

@irfansharif (Contributor) left a comment:


pkg/base/store_spec.go line 168 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't see a way to avoid the name mapping configuration. The stats that we get from disk_counters.go have names like nvme1n1, nvme2n1 (on EBS -- I have not looked at what we see when using PD), so there needs to be a way to map that name to a store. There can also be disks that have nothing to do with any of the stores, and those stats need to be ignored.

Oh, I missed that. Sorry. For a given file path, you can retrieve the corresponding device name (the same as seen by disk_counters.go) with df -P <path>. I don't know offhand what system calls that makes underneath, or whether there's an importable Go library to retrieve just that, but it should exist, or we can write such a library.

Sources: https://stackoverflow.com/questions/13403866/how-to-get-a-device-partition-name-of-a-file. That refers to st_dev, which I think corresponds to https://go.dev/src/syscall/ztypes_linux_amd64.go#L102.
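
As a rough illustration of that idea, a Linux-only sketch (an assumption-laden illustration, not code from this PR) could stat the store path and resolve st_dev to a kernel device name via /sys:

package main

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// deviceNameForPath resolves the block device backing path, similar in
// spirit to what `df -P <path>` reports (Linux only).
func deviceNameForPath(path string) (string, error) {
	var st unix.Stat_t
	if err := unix.Stat(path, &st); err != nil {
		return "", err
	}
	// /sys/dev/block/<major>:<minor> is a symlink whose target's basename
	// is the kernel device name (e.g. nvme1n1 or sdb1).
	link := fmt.Sprintf("/sys/dev/block/%d:%d",
		unix.Major(uint64(st.Dev)), unix.Minor(uint64(st.Dev)))
	target, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	return filepath.Base(target), nil
}

func main() {
	name, err := deviceNameForPath(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(name)
}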

@irfansharif (Contributor) left a comment:


pkg/base/store_spec.go line 168 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Oh, I missed that. Sorry. For a given file path, you can retrieve the corresponding device name (the same as seen by disk_counters.go) with df -P <path>. I don't know offhand what system calls that makes underneath, or whether there's an importable Go library to retrieve just that, but it should exist, or we can write such a library.

Sources: https://stackoverflow.com/questions/13403866/how-to-get-a-device-partition-name-of-a-file. That refers to st_dev, which I think corresponds to https://go.dev/src/syscall/ztypes_linux_amd64.go#L102.

I'm also ok just not doing this work in favor of the cluster setting. Asking users to specify the right name is a bit more work on their end, and more steps to get wrong, if it's something we can retrieve programmatically given we have a file path to consult.

@sumeerbhola (Collaborator, Author) left a comment:


pkg/base/store_spec.go line 168 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I'm also ok just not doing this work in favor of the cluster setting. Asking users to specify the right name is a bit more work on their end, and more steps to get wrong, if it's something we can retrieve programmatically given we have a file path to consult.

I don't see a good platform-independent library for this. There is some discussion on https://groups.google.com/g/golang-nuts/c/mu8XMmRXMOk, which is Linux-specific. There is also https://github.com/gyuho/linux-inspect, also Linux-specific (and I don't know whether we want such a dependency anyway).
The name in df does match what disk_counters.go is producing, though the latter contains many entries not in the former, e.g. on EBS I see nvme0n1 and nvme0n1p1, and on PD I see sda, sda1, and sda14, which don't appear in df (sdb is the device for /mnt/data1 on PD).

I can take a dependency on that library and make the name optional. Thoughts?

@irfansharif (Contributor) left a comment:


pkg/base/store_spec.go line 168 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't see a good platform-independent library for this. There is some discussion on https://groups.google.com/g/golang-nuts/c/mu8XMmRXMOk, which is Linux-specific. There is also https://github.com/gyuho/linux-inspect, also Linux-specific (and I don't know whether we want such a dependency anyway).
The name in df does match what disk_counters.go is producing, though the latter contains many entries not in the former, e.g. on EBS I see nvme0n1 and nvme0n1p1, and on PD I see sda, sda1, and sda14, which don't appear in df (sdb is the device for /mnt/data1 on PD).

I can take a dependency on that library and make the name optional. Thoughts?

I would prefer just copying some code over instead of taking on a dependency (that library hasn't seen commits in five years, so we might as well understand what we're copying over). Realistically we'd be maintaining this dependency, so might as well copy it into the tree. We'd also have to make this dependency Linux-specific and provide fallback behaviour for unsupported platforms. Do you want to defer this work for now by only using the cluster setting? (I ask because I don't feel this is adding much given the headache.)

@sumeerbhola (Collaborator, Author) commented:

Previously, irfansharif wrote…

We'd also have to make this dependency Linux-specific and provide fallback behaviour for unsupported platforms. Do you want to defer this work for now by only using the cluster setting? (I ask because I don't feel this is adding much given the headache.)

Yes, I would prefer not taking this on now.
But I can't use only the cluster setting, since we need a name mapping. Are you saying specify the device name also in a cluster setting, since it is likely the same across all nodes, and it would work for single store nodes?

@irfansharif (Contributor) commented:

I'm saying something simpler: instead of introducing the diskname->storeID mapping at all, simply use the summed disk counters:

func getSummedDiskCounters(ctx context.Context) (diskStats, error) {
	diskCounters, err := getDiskCounters(ctx)
	if err != nil {
		return diskStats{}, err
	}
	return sumDiskCounters(diskCounters), nil
}

And perhaps disable this functionality for multi-store nodes until we do the legwork for diskname -> store ID accounting. Does that sound reasonable? This is all entirely opinion on my part.

@sumeerbhola (Collaborator, Author) commented:

Previously, irfansharif wrote…

disable this functionality for multi-store nodes until we do the legwork for diskname -> store ID accounting.

Multi-store nodes were one of the important production cases where disk bandwidth was scarce, so I definitely do not want to do that. How about we stick with this current configuration schema, and add a TODO that we want to eventually eliminate/reduce the need to provide the name. All the stuff in StoreSpec added here is optional, so this should not introduce configuration churn.

@irfansharif (Contributor) commented:

SGTM.

@sumeerbhola (Collaborator, Author) left a comment:

TFTR!
Added tests



pkg/base/store_spec.go line 168 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I would prefer just copying some code over instead of taking on a dependency (that library hasn't seen commits in five years, so we might as well understand what we're copying over). Realistically we'd be maintaining this dependency, so might as well copy it into the tree. We'd also have to make this dependency Linux-specific and provide fallback behaviour for unsupported platforms. Do you want to defer this work for now by only using the cluster setting? (I ask because I don't feel this is adding much given the headache.)

Based on our discussion, I have added a long TODO comment here about eliminating the DiskName field and making ProvisionedRateSpec even more optional.


pkg/base/store_spec.go line 351 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Ignoring the name thing as discussed above, the syntax this gives us is: cockroach start ... --store=<path>,provisioned-rate=:bandwidth=12500000. Could this instead be cockroach start ... --store=<path>,bandwidth=119MB/s? Similar to the allowable formats for size, but with the (optional) /s suffix. If we later need to learn about provisioned IOPS, we can introduce an iops=42/s thing.

I've kept this nested inside provisioned-rate for now, since we do need the disk name and this is all for purposes of rate control by the system. IOPS fits in that naming scheme, which is why it isn't called provisioned-bandwidth.


pkg/kv/kvserver/store.go line 3964 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Use RegisterByteSizeSetting instead? Or better yet, add a RegisterByteRateSetting (shouldn't be difficult) to get to SET CLUSTER SETTING kv.store.admission.provisioned_bandwidth = '125 MiB/s'. There are other cluster settings that should've had this but default to the ByteSize variant (server.consistency_check.max_rate for ex.), so I'll defer to you if you want to improve the status quo.

Switched to RegisterByteSizeSetting
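
For reference, a byte-size registration of this shape might look roughly like the following (a sketch; the settings class, wording, and validation are assumptions, not the exact code merged here):

// Rough sketch of a byte-size cluster setting registration; the class and
// description are illustrative assumptions.
var ProvisionedBandwidthForAdmissionControl = settings.RegisterByteSizeSetting(
	settings.SystemOnly,
	"kv.store.admission.provisioned_bandwidth",
	"provisioned bandwidth for each store, in bytes/s; 0 disables the disk "+
		"bandwidth constraint",
	0, // default: disk bandwidth constraint disabled
)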

@sumeerbhola force-pushed the disk_bandwidth3 branch 2 times, most recently from 904e80d to c895c48 on August 22, 2022 02:33.
@irfansharif (Contributor) left a comment:

Reviewed 2 of 11 files at r2, all commit messages.

@sumeerbhola (Collaborator, Author) commented:

TFTR!

@sumeerbhola (Collaborator, Author) commented:

bors r=irfansharif

@craig (craig bot) commented on Aug 24, 2022:

This PR was included in a batch that was canceled; it will be automatically retried.

@craig (craig bot) commented on Aug 24, 2022:

Build failed (retrying...):

@craig (craig bot) commented on Aug 25, 2022:

Build succeeded:

Development

Successfully merging this pull request may close these issues.

admission: disk bandwidth as a bottleneck resource
3 participants