tpccbench: various improvements to chaos and non-partitioned tests #31275

Merged
merged 6 commits into from Oct 16, 2018

4 participants
@nvanbenschoten
Member

nvanbenschoten commented Oct 11, 2018

Closes #31234.
Closes #31156.

cc. @awoods187

@nvanbenschoten nvanbenschoten requested a review from m-schneider Oct 11, 2018

@cockroach-teamcity

cockroach-teamcity Oct 11, 2018

Member

This change is Reviewable

@nvanbenschoten

nvanbenschoten Oct 11, 2018

Member

@awoods187 I'm running tpccbench/nodes=7/cpu=16/chaos and tpccbench/nodes=3/cpu=16 now on cockroach-v2.1.0-beta.20181008. I'll let you know what the results are.

@m-schneider

Reviewed 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/tpcc.go, line 204 at r5 (raw file):

	switch d {
	case singleZone:
		return []string{"us-central1-a"}

I usually ran tests in 1-b, when I was tracking down variability, can't really comment on 1-a performance.


pkg/cmd/roachtest/tpcc.go, line 701 at r5 (raw file):

			LoadWarehouses: 2000,
			EstimatedMax:   1000,

Did you set it about 300 below on purpose so that it runs a bit longer?

@a-robinson

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/tpcc.go, line 495 at r2 (raw file):

		c.Install(ctx, loadNodes, "haproxy")
		c.Put(ctx, cockroach, "./cockroach", loadNodes)
		c.Run(ctx, loadNodes, fmt.Sprintf("./cockroach gen haproxy --insecure --host %s",

So was this just failing previously?


pkg/cmd/roachtest/tpcc.go, line 537 at r2 (raw file):

					Target:       roachNodes.randNode,
					Stopper:      loadDone,
					DrainAndQuit: true,

Is the thought just that there's a bug in the test logic that implements DrainAndQuit?


pkg/cmd/roachtest/tpcc.go, line 419 at r4 (raw file):

	// Run 1/10th of the expected load in the
	// desired distribution by dropping the worker count. This should allow
	// for load-based rebalancing to help distribute load.

We may want to just run --ramp=<rebalanceWait> --duration=1s instead of dropping the worker count so much. Load-based rebalancing ignores differences of less than 100qps per store, and for low warehouse counts will we even see differences that large if we do this?


pkg/cmd/roachtest/tpcc.go, line 426 at r4 (raw file):

		"./workload run tpcc --warehouses=%d --workers=%d --split --scatter "+
		"--duration=%d --tolerate-errors %s {pgurl%s}",
		b.LoadWarehouses, b.LoadWarehouses, rebalanceWait, partArgs, roachNodes)

I'm not sure rebalanceWait is really going to be long enough, but I guess we'll find out.

nvanbenschoten added some commits Oct 10, 2018

tpccbench: run clusters in us-central1 by default
We've found that this region is less noisy.

Release note: None

tpccbench: upload cockroach to load gen nodes for chaos
The load nodes need the cockroach binary to generate their haproxy
config.

Release note: None

tpccbench: fix chaos tests
Closes #31156.

The drain and quit chaos was not working well. For now, this switches
back to the hard chaos, which worked in the past and still works.

Release note: None

tpccbench: remove ulimit commands
These are no longer needed now that roachprod sets the
open file descriptor limit when creating clusters.

Release note: None

@nvanbenschoten

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/cmd/roachtest/tpcc.go, line 495 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

So was this just failing previously?

Yes. I added a new tpccbench/nodes=12/cpu=4/chaos/partition test that will run nightly.


pkg/cmd/roachtest/tpcc.go, line 537 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Is the thought just that there's a bug in the test logic that implements DrainAndQuit?

I didn't track down exactly what was going wrong, but I don't think this ever really worked. We can explore this more when we revisit chaos. For now, this is reverting to the approach that we did most of our testing with.


pkg/cmd/roachtest/tpcc.go, line 419 at r4 (raw file):

Previously, a-robinson (Alex Robinson) wrote…
	// Run 1/10th of the expected load in the
	// desired distribution by dropping the worker count. This should allow
	// for load-based rebalancing to help distribute load.

We may want to just run --ramp=<rebalanceWait> --duration=1s instead of dropping the worker count so much. Load-based rebalancing ignores differences of less than 100qps per store, and for low warehouse counts will we even see differences that large if we do this?

Good idea. Done.


pkg/cmd/roachtest/tpcc.go, line 426 at r4 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

I'm not sure rebalanceWait is really going to be long enough, but I guess we'll find out.

Bumped it by 50%. Even with the old wait period, we were able to hit TPC-C 10k without partitioning.


pkg/cmd/roachtest/tpcc.go, line 204 at r5 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

I usually ran tests in 1-b, when I was tracking down variability, can't really comment on 1-a performance.

Done.


pkg/cmd/roachtest/tpcc.go, line 701 at r5 (raw file):

Previously, m-schneider (Masha Schneider) wrote…

Did you set it about 300 below on purpose so that it runs a bit longer?

No, it just never got updated.

@a-robinson

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/cmd/roachtest/tpcc.go, line 426 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Bumped it by 50%. Even with the old wait period, we were able to hit TPC-C 10k without partitioning.

🎉🎉🎉


pkg/cmd/roachtest/tpcc.go, line 184 at r11 (raw file):

	})
	registerTPCCBenchSpec(r, tpccBenchSpec{
		Nodes:      9,

You know better than me, but isn't 9 nodes a lot for CI?

@nvanbenschoten

TFTRs!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/cmd/roachtest/tpcc.go, line 701 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No, it just never got updated.

Done.


pkg/cmd/roachtest/tpcc.go, line 184 at r11 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

You know better than me, but isn't 9 nodes a lot for CI?

Not if they're 4-core machines. This should be fine.
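
The arithmetic behind that answer, as a small sketch. The `clusterSpec` type and its methods are hypothetical, and the sketch assumes the 9-node spec uses 4 CPUs per node, as the reply implies. Load generator nodes are not counted.

```go
package main

import "fmt"

// clusterSpec is a hypothetical summary of a tpccbench cluster shape.
type clusterSpec struct {
	Nodes int
	CPUs  int // vCPUs per node
}

// totalCPUs returns the number of vCPUs across all CockroachDB nodes.
func (s clusterSpec) totalCPUs() int { return s.Nodes * s.CPUs }

// name formats the spec the way test names are written in this thread.
func (s clusterSpec) name() string {
	return fmt.Sprintf("tpccbench/nodes=%d/cpu=%d", s.Nodes, s.CPUs)
}

func main() {
	specs := []clusterSpec{
		{Nodes: 9, CPUs: 4},  // assumed shape of the new 9-node test
		{Nodes: 3, CPUs: 16}, // shape mentioned earlier in the thread
	}
	for _, s := range specs {
		fmt.Printf("%s: %d vCPUs\n", s.name(), s.totalCPUs())
	}
}
```

Under these assumptions the 9-node cluster totals 36 vCPUs, fewer than the 48 vCPUs of the 3-node, 16-CPU configuration.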

craig bot pushed a commit that referenced this pull request Oct 16, 2018

Merge #31275
31275: tpccbench: various improvements to chaos and non-partitioned tests r=nvanbenschoten a=nvanbenschoten

Closes #31234.
Closes #31156.

cc. @awoods187 

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig

craig bot commented Oct 16, 2018

Build succeeded

@craig craig bot merged commit cca2d0f into cockroachdb:master Oct 16, 2018

3 checks passed

GitHub CI (Cockroach): TeamCity build finished
bors: Build succeeded
license/cla: Contributor License Agreement is signed.

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten:nvanbenschoten/fixChaos branch Oct 16, 2018
