tpccbench: use HAProxy when running chaos benchmarks #26075
craig[bot] merged 1 commit into cockroachdb:master
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable
Loadgen's tpcc implementation pins workers to individual driver connections, and it looks like workload's tpcc does the same. Is there a round-robin policy I'm missing? If so, I would expect it to look more like kv. And regardless of whether there is or not, shouldn't a load balancer be able to make better decisions than a simple round-robin policy because it can remember which nodes are down and avoid sending them each 1/nth of the traffic only to have all of that traffic fail? My intuition is that this should result in fewer failed operations.
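To make the intuition above concrete, here is a minimal sketch in Go of a blind round-robin policy versus one that skips nodes already known to be down. The `node` type and `dispatch` function are hypothetical and exist only for this illustration; they are not taken from workload, loadgen, or HAProxy, and the sketch ignores the time a real balancer needs to notice a failure.

```go
package main

import "fmt"

// node is a hypothetical stand-in for one cockroach node.
type node struct {
	name string
	up   bool
}

// dispatch routes requests over nodes in round-robin order. When healthAware
// is false it sends traffic to every node in turn, including down ones. When
// healthAware is true it skips nodes marked down. It returns the number of
// requests served per node and the number of requests that failed.
func dispatch(nodes []node, requests int, healthAware bool) (map[string]int, int) {
	served := make(map[string]int)
	failed, next := 0, 0
	for i := 0; i < requests; i++ {
		var picked *node
		// Try each node at most once per request.
		for tries := 0; tries < len(nodes); tries++ {
			c := &nodes[next%len(nodes)]
			next++
			if c.up || !healthAware {
				picked = c
				break
			}
		}
		if picked == nil || !picked.up {
			failed++ // landed on a down node, or no node was available
			continue
		}
		served[picked.name]++
	}
	return served, failed
}

func main() {
	nodes := []node{{"n1", true}, {"n2", false}, {"n3", true}}

	served, failed := dispatch(nodes, 300, false)
	fmt.Println("blind round-robin:", served, "failed:", failed)

	served, failed = dispatch(nodes, 300, true)
	fmt.Println("health-aware:     ", served, "failed:", failed)
}
```

With one of three nodes down, the blind policy loses the third of the traffic that lands on the dead node, while the health-aware policy spreads all of it over the two survivors.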
Ah, I keep forgetting that tpcc pins workers to connections (surprising given that I added that code). If we removed that pinning (or made it optional), the builtin round-robin driver should be essentially the same as using haproxy. Why use one over the other? Convenience. The round-robin driver was initially added to avoid having to set up haproxy when doing perf experiments.
I'm pretty sure we configure haproxy with a round-robin load balancing policy. We can certainly investigate more complex load balancing, though I'm not sure if we'll be able to achieve it with haproxy given that it doesn't know about the pgwire protocol. Regardless, this PR is fine to merge given how small it is. I'm mostly talking to spread my concern about placing too much faith in haproxy (especially the out-of-the-box configuration). We might have to put effort into either improving that configuration or providing our own load balancing. To give you a concrete example of a concern: workers open connections to haproxy, which in turn opens a long-lived connection to a cockroach node. If a new cockroach node comes online (or restarts after being killed), haproxy will not utilize this node until a worker opens a new connection.
Or switching from haproxy to something that does understand the postgres protocol (like pgbouncer or pgpool).
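The long-lived connection concern raised above can be illustrated with a toy model in Go. The `proxy` type below is hypothetical: it only mimics a TCP-level balancer that chooses a backend once, when a connection is opened, and it makes no claim about how haproxy is implemented internally.

```go
package main

import "fmt"

// proxy is a toy TCP-level balancer. It picks a backend when a connection is
// opened and never moves that connection afterwards.
type proxy struct {
	backends []string // nodes currently considered healthy
	next     int      // round-robin cursor
}

// open assigns a new client connection to the next backend in rotation and
// returns that backend's name.
func (p *proxy) open() string {
	b := p.backends[p.next%len(p.backends)]
	p.next++
	return b
}

// load counts how many connections are attached to each backend.
func load(conns []string) map[string]int {
	m := make(map[string]int)
	for _, c := range conns {
		m[c]++
	}
	return m
}

func main() {
	// n3 is down when the workers start, so only n1 and n2 get connections.
	p := &proxy{backends: []string{"n1", "n2"}}
	conns := make([]string, 6)
	for i := range conns {
		conns[i] = p.open()
	}
	fmt.Println("n3 down:     ", load(conns))

	// n3 restarts and is added back, but existing connections stay put.
	p.backends = append(p.backends, "n3")
	fmt.Println("n3 back up:  ", load(conns))

	// Only when workers reopen their connections does n3 receive load.
	for i := range conns {
		conns[i] = p.open()
	}
	fmt.Println("after reopen:", load(conns))
}
```

In a real Go client, one way to force that reopening is to bound connection lifetime, for example with `SetConnMaxLifetime` from `database/sql`. Whether that suits tpcc's pinned workers is not something this thread settles.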
This change adjusts tpccbench to run an HAProxy server on the load generator when running chaos benchmarking. HAProxy is then configured to point at the cluster and the load generator is pointed at HAProxy. This allows the load generator to fail over when nodes go down without any extra client-side logic.
It also has the nice effect of being the first roachtest to use HAProxy, a tool that we recommend our users use.
This still needs some tweaking. For instance, I still need to tune the haproxy.cfg to support a larger number of concurrent requests and to fail over faster.
Release note: None
ee9e337 to 2f6a3a9
That's a great point. HAProxy has a
Have we ever tried running either of these against Cockroach?
I ran some trivial tests with pgbouncer a while back and it seemed to work. @asubiotto has tried pgpool; it looks like there were issues in older versions of pgpool but they may be fixed now (#13009 (comment))
bors r+
26075: tpccbench: use HAProxy when running chaos benchmarks r=nvanbenschoten a=nvanbenschoten This change adjusts tpccbench to run an HAProxy server on the load generator when running chaos benchmarking. HAProxy is then configured to point at the cluster and the load generator is pointed at HAProxy. This allows the load generator to fail-over when nodes go down without any extra client-side logic. It also has the nice effect of being the first roachtest to use HAProxy, a tool that we recommend our users use. This still needs some tweaking. For instance, I still need to tune the haproxy.cfg to support a larger number of concurrent requests and to fail over faster. First commit is #26073. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded
Whatever we decide here, I already like the dogfooding aspect of this. We have |