roachtest: add tpccbench style workload to measure throughput while sustaining an avg latency threshold. by aayushshah15 · Pull Request #42203 · cockroachdb/cockroach

aayushshah15 · 2019-11-05T20:53:48Z

This PR adds a new benchmarking workload that can be used to measure max
throughput on the kv workload (under various distributions) that be
sustained while maintaining an avg latency lower than a configured
threshold.

This tool was primarily written in order to demonstrate the performance
characteristics of hash sharded indexes under write-heavy sequential
worklaods, which would, in the absence of a hash sharded index, cause
a hotspot on a single range (and consequently, a single node).

Release note: None

cockroach-teamcity · 2019-11-05T20:53:57Z

This change is

aayushshah15 · 2019-11-05T20:55:56Z

@ajwerner, once you feel that the methodology is correct, I can spin up a few runs of the benchmark for the schema variations we're interested in (sharded/unsharded with/without secondary index).

ajwerner

@ajwerner, once you feel that the methodology is correct, I can spin up a few runs of the benchmark for the schema variations we're interested in (sharded/unsharded with/without secondary index).

I encourage you to try it out with uniform runs and see if the data it produces line up with data in https://roachperf.crdb.dev/

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

ajwerner

My first pass is going to mean a bunch of refactoring so here it is

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @ajwerner)

pkg/cmd/roachtest/kvbench.go, line 193 at r1 (raw file):

			initCmd := strings.Builder{}
			initCmd.WriteString("CREATE TABLE kv (k BIGINT NOT NULL, v BYTES NOT NULL")

This stuff feels like it should live in workload. This is the workload init stuff. While the search itself should live here, the workload setup and run should live in workload. We want the sharded sequential kv workload to be just like any other workload.

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @ajwerner)

pkg/cmd/roachtest/kvbench.go, line 193 at r1 (raw file):

Previously, ajwerner wrote…

This stuff feels like it should live in workload. This is the workload init stuff. While the search itself should live here, the workload setup and run should live in workload. We want the sharded sequential kv workload to be just like any other workload.

Yes, I agree with this.

pkg/cmd/roachtest/kvbench.go, line 303 at r1 (raw file):

		if failErr == nil {
			ttycolor.Stdout(ttycolor.Green)
			t.l.Printf(`--- PASS: kv workload maintained a p50 latency of %0.1fms`+

p50 != avg

aayushshah15

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, and @nvanbenschoten)

pkg/cmd/roachtest/kvbench.go, line 193 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yes, I agree with this.

Moved the initialization logic into workload.

pkg/cmd/roachtest/kvbench.go, line 303 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

p50 != avg

Fixed.

nvb

Reviewed 3 of 3 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @ajwerner)

pkg/cmd/roachtest/kvbench.go, line 39 at r2 (raw file):

type kvBenchSpec struct {
	Nodes                  int

Separate these fields into a group of fields that go into the name and a group of fields that don't. That will make it easier to understand which fields are configuration options and which aren't. This is important because we can't have duplicate test names.

pkg/cmd/roachtest/kvbench.go, line 51 at r2 (raw file):

	// splitting mechanism should achieve something similar to this, but that would take
	// too long for it to be feasibly relied upon for the purposes of this benchmark.
	ShouldPreSplitTable bool

Why is this a separate config flag? It's always the same as NumShards > 0.

pkg/cmd/roachtest/kvbench.go, line 51 at r2 (raw file):

	// splitting mechanism should achieve something similar to this, but that would take
	// too long for it to be feasibly relied upon for the purposes of this benchmark.
	ShouldPreSplitTable bool

Why do we need to presplit when using a hash index? Isn't load-based splitting sufficient once hash indexing gives it clear split points?

pkg/cmd/roachtest/kvbench.go, line 54 at r2 (raw file):

	SecondaryIndex      bool
	RampDurationSecs    int
	LoadDurationSecs    int

Do these two need to be configurable?

pkg/cmd/roachtest/kvbench.go, line 56 at r2 (raw file):

	LoadDurationSecs    int
	MinVersion          string
	Tags                []string

If we don't need Tags or MinVersion, let's remove them.

pkg/cmd/roachtest/kvbench.go, line 61 at r2 (raw file):

func registerKVBenchSpec(r *testRegistry, b kvBenchSpec) {
	nameParts := []string{
		"kvbench",

We should make it clear in the name that this is running kv0.

pkg/cmd/roachtest/kvbench.go, line 64 at r2 (raw file):

		fmt.Sprintf("nodes=%d", b.Nodes),
		fmt.Sprintf("cpu=%d", b.CPUs),
		fmt.Sprintf("shards=%d", b.NumShards),

Don't include this if the table isn't sharded. This isn't really a core property of the workload, it's more of an auxiliary option that will be useful for testing out hash indexes.

pkg/cmd/roachtest/kvbench.go, line 79 at r2 (raw file):

	}

	switch b.SecondaryIndex {

if ... else ...

And we prefer to keep the name short. How about something like if b.SecondaryIndex { nameParts = append(nameParts, "2nd_idx") }

pkg/cmd/roachtest/kvbench.go, line 234 at r2 (raw file):

	// sustaining an avg latency of less than `LatencyThresholdMs`.
	// TODO(aayush): `avg` here is just an arbitrary statistic, should I make the
	// concerned statistic a config option?

I wouldn't. Let's keep the configs to a minimum wherever possible.

pkg/cmd/roachtest/kvbench.go, line 239 at r2 (raw file):

		t.Fatal(errors.Wrapf(err, `failed to create temp results dir`))
	}
	s := search.NewLineSearcher(100, 10000000, b.EstimatedMaxThroughput, initStepSize, precision)

Use /* name_of_arg */ for the inline values, per our style guide.

pkg/cmd/roachtest/kvbench.go, line 253 at r2 (raw file):

			db := c.Conn(ctx, 1)

			initCmd := strings.Builder{}

var initCmd strings.Builder

pkg/cmd/roachtest/kvbench.go, line 258 at r2 (raw file):

			// load-based splitting, which can significantly change the underlying layout
			// of the table and affect benchmark results.
			fmt.Fprintf(&initCmd, `./workload init kv --drop --num-shards=%d {pgurl%s}`,

Since we're not maintaining any state across runs, I think we can do better. Let's just c.Wipe before each iteration.

pkg/cmd/roachtest/kvbench.go, line 271 at r2 (raw file):

				// but failing seems like a better idea since this is almost certainly a
				// configuration error.
				panic(`ShouldPreSplitTable only applies to sharded` +

Return an error instead of panic-ing.

pkg/cmd/roachtest/kvbench.go, line 274 at r2 (raw file):

					`kv schemas (ie NumShards > 0)`)
			}
			splitCmd := strings.Builder{}

var splitCmd strings.Builder

pkg/cmd/roachtest/kvbench.go, line 307 at r2 (raw file):

			// would lead to the `searchPredicate` always returning true as long as the
			// latency at this throughput ceiling was below the configured threshold.
			const loadConcurrency int = 128

nit: no need for int

pkg/cmd/roachtest/kvbench.go, line 310 at r2 (raw file):

			fmt.Fprintf(&workloadCmd,
				`./workload run kv --ramp=%ds --duration=%ds {pgurl%s}`+

Pass --read-percent=0 to be clear that this is kv0.

pkg/cmd/roachtest/kvbench.go, line 320 at r2 (raw file):

				workloadCmd.WriteString(` --zipfian`)
			case random:
				// Nothing, since random is the default.

Might as well pass the flag so that we can change the default without breaking this.

pkg/cmd/roachtest/kvbench.go, line 372 at r2 (raw file):

}

type result struct {

This is being leaked into the entire roachtest packages scope. Because of that, it's probably not appropriately named.

pkg/cmd/roachtest/kvbench.go, line 380 at r2 (raw file):

// so this could definitely be cleaner and better abstracted.
func newResultFromSnapshots(maxrate int, snapshots map[string][]histogram.SnapshotTick) *result {
	var start time.Time

nit: var start, end time.Time

pkg/workload/kv/kv.go, line 186 at r2 (raw file):

		}
		checkConstraint := strings.Builder{}
		checkConstraint.WriteString(`(shard in (`)

nit: IN

pkg/workload/kv/kv.go, line 193 at r2 (raw file):

			fmt.Fprintf(&checkConstraint, "%d", i)
		}
		checkConstraint.WriteString("))")

nit: move the second parentheses into the template.

pkg/workload/kv/kv.go, line 194 at r2 (raw file):

		}
		checkConstraint.WriteString("))")
		table.Schema = fmt.Sprintf(schema, w.shards, checkConstraint.String())

nit: no need for .String()

pkg/workload/kv/kv.go, line 256 at r2 (raw file):

		// are available. See
		// https://github.com/cockroachdb/cockroach/issues/39340#issuecomment-535338071
		// for details.

Add a TODO here to fix this once that's addressed.

aayushshah15

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, and @nvanbenschoten)

pkg/cmd/roachtest/kvbench.go, line 39 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Separate these fields into a group of fields that go into the name and a group of fields that don't. That will make it easier to understand which fields are configuration options and which aren't. This is important because we can't have duplicate test names.

Done.

pkg/cmd/roachtest/kvbench.go, line 51 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why is this a separate config flag? It's always the same as NumShards > 0.

I thought it could be useful to demonstrate / measure how long load-based splitting takes to achieve a "good" set of splits. Maybe this is too vague. Removed for now.

pkg/cmd/roachtest/kvbench.go, line 51 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we need to presplit when using a hash index? Isn't load-based splitting sufficient once hash indexing gives it clear split points?

Relying on load-based splitting made the benchmarks require a way longer ramp duration to achieve an even load across nodes. Let me know if you'd prefer not splitting in the benchmark.

pkg/cmd/roachtest/kvbench.go, line 54 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do these two need to be configurable?

In some sense, having these configurable is sort of a footgun. While I was playing with this stuff, I noticed that if these durations are too short, the unsharded/unsplit schema (with secondary index) performed better than the sharded/split schema (also with secondary index); since we go through the single-range fast path until the size of the data exceeds our 64mb threshold. Removed.

pkg/cmd/roachtest/kvbench.go, line 56 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we don't need Tags or MinVersion, let's remove them.

Done.

pkg/cmd/roachtest/kvbench.go, line 61 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should make it clear in the name that this is running kv0.

Done.

pkg/cmd/roachtest/kvbench.go, line 64 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Don't include this if the table isn't sharded. This isn't really a core property of the workload, it's more of an auxiliary option that will be useful for testing out hash indexes.

Done.

pkg/cmd/roachtest/kvbench.go, line 79 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

if ... else ...

And we prefer to keep the name short. How about something like if b.SecondaryIndex { nameParts = append(nameParts, "2nd_idx") }

Done.

pkg/cmd/roachtest/kvbench.go, line 234 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I wouldn't. Let's keep the configs to a minimum wherever possible.

Done.

pkg/cmd/roachtest/kvbench.go, line 239 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Use /* name_of_arg */ for the inline values, per our style guide.

Done.

pkg/cmd/roachtest/kvbench.go, line 253 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

var initCmd strings.Builder

Done.

pkg/cmd/roachtest/kvbench.go, line 258 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Since we're not maintaining any state across runs, I think we can do better. Let's just c.Wipe before each iteration.

Done.

pkg/cmd/roachtest/kvbench.go, line 271 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Return an error instead of panic-ing.

Removed the ShouldPreSplitTable option.

pkg/cmd/roachtest/kvbench.go, line 274 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

var splitCmd strings.Builder

Done.

pkg/cmd/roachtest/kvbench.go, line 307 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: no need for int

Done.

pkg/cmd/roachtest/kvbench.go, line 310 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Pass --read-percent=0 to be clear that this is kv0.

Done.

pkg/cmd/roachtest/kvbench.go, line 320 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Might as well pass the flag so that we can change the default without breaking this.

Done.

pkg/cmd/roachtest/kvbench.go, line 372 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is being leaked into the entire roachtest packages scope. Because of that, it's probably not appropriately named.

Renamed.

pkg/cmd/roachtest/kvbench.go, line 380 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: var start, end time.Time

Done.

pkg/workload/kv/kv.go, line 186 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: IN

Done.

pkg/workload/kv/kv.go, line 193 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move the second parentheses into the template.

Done.

pkg/workload/kv/kv.go, line 256 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add a TODO here to fix this once that's addressed.

Done.

aayushshah15 · 2019-12-17T20:55:13Z

How do we get this merged in before I leave?

nvb

this is great! Mind posting a few perf numbers on this PR before merging though? What are the results of these new benchmarks?

Reviewed 2 of 2 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @ajwerner)

pkg/cmd/roachtest/kvbench.go, line 220 at r3 (raw file):

		// affect benchmark results.
		c.Wipe(ctx, roachNodes)
		c.Stop(ctx, roachNodes)

I think Wipe also Stops the cluster, so we shouldn't need this.

pkg/cmd/roachtest/kvbench.go, line 274 at r3 (raw file):

			const loadConcurrency = 192
			const ramp = 300
			const duration = 300

nit: make these two typed constants (time.Duration) and use .Second() when passing to the cmd.

ajwerner

too

Sorry for the delay, this fell off my radar.

Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @aayushshah15)

pkg/workload/kv/kv.go, line 42 at r3 (raw file):

		INDEX (v)
	)`
	// TODO(aayush): Change this to use the "easier" hash sharded index syntax once that

feel free to make this ajwerner

pkg/workload/kv/kv.go, line 254 at r3 (raw file):

		}
	} else {
		// TODO(aayush): We're currently manually plumbing down the computed shard column

ditto.

sustaining an avg latency threshold. This PR adds a new benchmarking workload that can be used to measure max throughput on the `kv` workload (under various distributions) that be sustained while maintaining an avg latency lower than a configured threshold. This tool was primarily written in order to demonstrate the performance characteristics of hash sharded indexes under write-heavy sequential worklaods, which would, in the absence of a hash sharded index, cause a hotspot on a single range (and consequently, a single node). Release note: None

aayushshah15 · 2019-12-20T18:25:20Z

TFTRs! The perf numbers on my RFC were all generated with kvbench. Here you go: https://github.com/cockroachdb/cockroach/pull/42924/files?short_path=8778ea5#diff-8778ea5fd9afd71f6a0a0818d734f6a3.

aayushshah15 · 2019-12-20T18:26:06Z

bors r+

aayushshah15

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @nvanbenschoten)

pkg/cmd/roachtest/kvbench.go, line 220 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think Wipe also Stops the cluster, so we shouldn't need this.

Done.

pkg/cmd/roachtest/kvbench.go, line 274 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: make these two typed constants (time.Duration) and use .Second() when passing to the cmd.

Done.

pkg/workload/kv/kv.go, line 42 at r3 (raw file):

Previously, ajwerner wrote…

feel free to make this ajwerner

Done.

pkg/workload/kv/kv.go, line 254 at r3 (raw file):

Previously, ajwerner wrote…

ditto.

Done.

42203: roachtest: add tpccbench style workload to measure throughput while sustaining an avg latency threshold. r=aayushshah15 a=aayushshah15 This PR adds a new benchmarking workload that can be used to measure max throughput on the `kv` workload (under various distributions) that be sustained while maintaining an avg latency lower than a configured threshold. This tool was primarily written in order to demonstrate the performance characteristics of hash sharded indexes under write-heavy sequential worklaods, which would, in the absence of a hash sharded index, cause a hotspot on a single range (and consequently, a single node). Release note: None 43380: changefeedccl: transfer todos from aayush to dan/ajwerner r=aayushshah15 a=aayushshah15 Transferring my TODOs to dan/ajwerner before I leave. Release note: None Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>

craig · 2019-12-20T19:19:33Z

Build succeeded

GitHub CI (Cockroach)

aayushshah15 assigned ajwerner Nov 5, 2019

aayushshah15 assigned aayushshah15 and unassigned ajwerner Nov 5, 2019

aayushshah15 requested a review from ajwerner November 5, 2019 20:54

ajwerner reviewed Nov 5, 2019

View reviewed changes

ajwerner suggested changes Nov 5, 2019

View reviewed changes

nvb reviewed Nov 6, 2019

View reviewed changes

aayushshah15 force-pushed the kv_bench branch 5 times, most recently from 895f8e8 to 6c675cb Compare November 7, 2019 00:07

aayushshah15 commented Nov 7, 2019

View reviewed changes

aayushshah15 force-pushed the kv_bench branch 3 times, most recently from 09d0a97 to 51e7af9 Compare November 13, 2019 16:39

nvb reviewed Nov 18, 2019

View reviewed changes

aayushshah15 force-pushed the kv_bench branch 2 times, most recently from 6ba0e81 to fc9450a Compare November 26, 2019 03:27

aayushshah15 commented Nov 26, 2019

View reviewed changes

aayushshah15 force-pushed the kv_bench branch 2 times, most recently from 56877a7 to a0f945f Compare November 27, 2019 22:38

aayushshah15 requested review from ajwerner and nvb December 17, 2019 20:54

nvb approved these changes Dec 17, 2019

View reviewed changes

ajwerner approved these changes Dec 20, 2019

View reviewed changes

aayushshah15 force-pushed the kv_bench branch from a0f945f to 7026018 Compare December 20, 2019 18:25

aayushshah15 commented Dec 20, 2019

View reviewed changes

craig Bot merged commit 7026018 into cockroachdb:master Dec 20, 2019

Conversation

aayushshah15 commented Nov 5, 2019

Uh oh!

cockroach-teamcity commented Nov 5, 2019

Uh oh!

aayushshah15 commented Nov 5, 2019

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

aayushshah15 left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

aayushshah15 left a comment

Choose a reason for hiding this comment

Uh oh!

aayushshah15 commented Dec 17, 2019

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

ajwerner left a comment

Choose a reason for hiding this comment

Uh oh!

aayushshah15 commented Dec 20, 2019

Uh oh!

aayushshah15 commented Dec 20, 2019

Uh oh!

aayushshah15 left a comment

Choose a reason for hiding this comment

Uh oh!

craig Bot commented Dec 20, 2019

Build succeeded

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants