Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign uptpccbench: various improvements to chaos and non-partitioned tests #31275
Conversation
nvanbenschoten
requested a review
from
m-schneider
Oct 11, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
nvanbenschoten
Oct 11, 2018
Member
@awoods187 I'm running tpccbench/nodes=7/cpu=16/chaos and tpccbench/nodes=3/cpu=16 now on cockroach-v2.1.0-beta.20181008. I'll let you know what the results are.
|
@awoods187 I'm running |
m-schneider
reviewed
Oct 11, 2018
Reviewed 1 of 1 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/cmd/roachtest/tpcc.go, line 204 at r5 (raw file):
switch d { case singleZone: return []string{"us-central1-a"}
I usually ran tests in 1-b, when I was tracking down variability, can't really comment on 1-a performance.
pkg/cmd/roachtest/tpcc.go, line 701 at r5 (raw file):
LoadWarehouses: 2000, EstimatedMax: 1000,
Did you set it about 300 below on purpose so that it runs a bit longer?
a-robinson
requested changes
Oct 11, 2018
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/cmd/roachtest/tpcc.go, line 495 at r2 (raw file):
c.Install(ctx, loadNodes, "haproxy") c.Put(ctx, cockroach, "./cockroach", loadNodes) c.Run(ctx, loadNodes, fmt.Sprintf("./cockroach gen haproxy --insecure --host %s",
So was this just failing previously?
pkg/cmd/roachtest/tpcc.go, line 537 at r2 (raw file):
Target: roachNodes.randNode, Stopper: loadDone, DrainAndQuit: true,
Is the thought just that there's a bug in the test logic that implements DrainAndQuit?
pkg/cmd/roachtest/tpcc.go, line 419 at r4 (raw file):
// Run 1/10th of the expected load in the // desired distribution by dropping the worker count. This should allow // for load-based rebalancing to help distribute load.
We may want to just run --ramp=<rebalanceWait> --duration=1s instead of dropping the worker count so much. Load-based rebalancing ignores differences of less than 100qps per store, and for low warehouse counts will we even see differences that large if we do this?
pkg/cmd/roachtest/tpcc.go, line 426 at r4 (raw file):
"./workload run tpcc --warehouses=%d --workers=%d --split --scatter "+ "--duration=%d --tolerate-errors %s {pgurl%s}", b.LoadWarehouses, b.LoadWarehouses, rebalanceWait, partArgs, roachNodes)
I'm not sure rebalanceWait is really going to be long enough, but I guess we'll find out.
nvanbenschoten
added some commits
Oct 10, 2018
nvanbenschoten
reviewed
Oct 15, 2018
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/cmd/roachtest/tpcc.go, line 495 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
So was this just failing previously?
Yes. I added a new tpccbench/nodes=12/cpu=4/chaos/partition test that will run nightly.
pkg/cmd/roachtest/tpcc.go, line 537 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
Is the thought just that there's a bug in the test logic that implements
DrainAndQuit?
I didn't track down exactly what was going wrong, but I don't think this ever really worked. We can explore this more when we revist chaos. For now, this is reverting to the approach that we did most of our testing with.
pkg/cmd/roachtest/tpcc.go, line 419 at r4 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
// Run 1/10th of the expected load in the // desired distribution by dropping the worker count. This should allow // for load-based rebalancing to help distribute load.We may want to just run
--ramp=<rebalanceWait> --duration=1sinstead of dropping the worker count so much. Load-based rebalancing ignores differences of less than 100qps per store, and for low warehouse counts will we even see differences that large if we do this?
Good idea. Done.
pkg/cmd/roachtest/tpcc.go, line 426 at r4 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
I'm not sure
rebalanceWaitis really going to be long enough, but I guess we'll find out.
Bumped it by 50%. Even with the old wait period, we were able to hit TPC-C 10k without partitioning.
pkg/cmd/roachtest/tpcc.go, line 204 at r5 (raw file):
Previously, m-schneider (Masha Schneider) wrote…
I usually ran tests in 1-b, when I was tracking down variability, can't really comment on 1-a performance.
Done.
pkg/cmd/roachtest/tpcc.go, line 701 at r5 (raw file):
Previously, m-schneider (Masha Schneider) wrote…
Did you set it about 300 below on purpose so that it runs a bit longer?
No, it just never got updated.
a-robinson
approved these changes
Oct 15, 2018
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/cmd/roachtest/tpcc.go, line 426 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Bumped it by 50%. Even with the old wait period, we were able to hit TPC-C 10k without partitioning.
pkg/cmd/roachtest/tpcc.go, line 184 at r11 (raw file):
}) registerTPCCBenchSpec(r, tpccBenchSpec{ Nodes: 9,
You know better than me, but isn't 9 nodes a lot for CI?
nvanbenschoten
dismissed
m-schneider’s
stale review
Oct 16, 2018
Dismissed.
nvanbenschoten
reviewed
Oct 16, 2018
TFTRs!
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/cmd/roachtest/tpcc.go, line 701 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No, it just never got updated.
Done.
pkg/cmd/roachtest/tpcc.go, line 184 at r11 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
You know better than me, but isn't 9 nodes a lot for CI?
Not if they're 4-core machines. This should be fine.
bot
pushed a commit
that referenced
this pull request
Oct 16, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
craig
bot
commented
Oct 16, 2018
Build succeeded |
nvanbenschoten commentedOct 11, 2018
Closes #31234.
Closes #31156.
cc. @awoods187