Resiliency tests #271

Closed
6 tasks
sirupsen opened this issue Feb 14, 2015 · 8 comments

Comments

@sirupsen
Contributor

It'd be awesome if Sarama had resiliency tests with Toxiproxy. I'm currently working on a Go client (https://github.com/Shopify/toxiproxy/blob/master/client/client.go), which should make it fairly easy. It does add an extra dependency to the tests, but I definitely think it's worth it. It'd make me trust Sarama a lot more.

Tell me how I can make your life easier in terms of Toxiproxy for testing. :-) I'll be working on adding Toxics to the client this weekend, but it supports up/down currently. I'll write a README for the Go client later; it shouldn't be too hard to figure out, though.
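For readers unfamiliar with the idea, here is a rough sketch of what a fault-injecting proxy does in a test: the client dials the proxy instead of the real server, and the proxy adds latency or goes "down". This is not Toxiproxy's API. It uses only the Go standard library, and `startDelayProxy`, `startEcho`, `roundTrip`, and `pipe` are hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// pipe copies src to dst, sleeping before every chunk it forwards.
func pipe(dst, src net.Conn, latency time.Duration) {
	buf := make([]byte, 32*1024)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			time.Sleep(latency)
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return
			}
		}
		if err != nil {
			return
		}
	}
}

// startDelayProxy (hypothetical) relays TCP to upstream, adding latency each way.
func startDelayProxy(upstream string, latency time.Duration) (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			client, err := ln.Accept()
			if err != nil {
				return // listener closed: the proxy is "down"
			}
			go func() {
				defer client.Close()
				server, err := net.Dial("tcp", upstream)
				if err != nil {
					return
				}
				go func() {
					pipe(server, client, latency) // client -> upstream
					server.Close()                // unblocks the read below
				}()
				pipe(client, server, latency) // upstream -> client
			}()
		}
	}()
	return ln, nil
}

// startEcho serves one throwaway echo connection, playing the upstream.
func startEcho() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		io.Copy(c, c)
	}()
	return ln.Addr().String(), nil
}

// roundTrip sends msg through addr and times the echoed reply.
func roundTrip(addr, msg string) (string, time.Duration, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", 0, err
	}
	defer conn.Close()
	start := time.Now()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", 0, err
	}
	reply := make([]byte, len(msg))
	_, err = io.ReadFull(conn, reply)
	return string(reply), time.Since(start), err
}

func main() {
	upstream, err := startEcho()
	if err != nil {
		panic(err)
	}
	proxy, err := startDelayProxy(upstream, 50*time.Millisecond)
	if err != nil {
		panic(err)
	}
	fmt.Println(roundTrip(proxy.Addr().String(), "ping"))
}
```

Closing the returned listener stands in for taking the proxy down: new dials to its address should then fail. A real tool like Toxiproxy exposes this kind of control over an API so tests can flip it at runtime.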

(incomplete) to-do list:

@eapache @wvanbergen @drdee @yagnik

@wvanbergen
Contributor

I mainly want to pair with you to set up some initial tests that use toxiproxy to get a feel for how to approach this.

Can we also influence internal cluster communication with it, i.e. connections outside the scope of the go client?


@sirupsen
Contributor Author

@wvanbergen depends, is it possible to change the listening port for Kafka?

@wvanbergen
Contributor

Yeah, but I am not quite sure how it works, because of how it advertises itself in Zookeeper. Might require some experimentation.

@sirupsen
Contributor Author

oh yes... that may be a bit complicated then

@eapache
Contributor

eapache commented Feb 15, 2015

I'd prefer we do the toxiproxy tests against mock brokers, as that should be much easier to manage. We're testing Sarama here, not kafka :)

@sirupsen
Contributor Author

@eapache Sarama is a client of ZK and Kafka though, how can you be sure the failover etc. logic is implemented correctly unless you test against the real thing?

@eapache
Contributor

eapache commented Feb 15, 2015

Kafka's broker behaviour is very well-defined, and (in many cases) much easier to simulate with mockbrokers than to trigger in a real cluster (e.g. three-way network partition during failover). Testing against that behaviour is all we need to verify Sarama - if the cluster behaves differently, that's a kafka bug.

For example, in TestProducerFailureRetry we use mock brokers to test what happens when leadership moves (e.g. due to a rebalance) and the broker in question returns NotLeaderForPartition. That test could be easily copied+tweaked to test what happens when a broker crashes (disappears) and the connection sends a TCP reset instead. We could also throw toxiproxy in the middle to test what happens when latency is present. Doing a similar test against a live broker cluster is much more complicated.
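The "broker disappears and the connection sends a TCP reset" case can be sketched with the standard library alone, no Kafka or Toxiproxy needed. This is a minimal illustration under assumptions, not Sarama's mock broker: `startResettingServer` and `readAfterReset` are hypothetical helpers, and the fake server speaks no Kafka protocol at all. It relies on `SetLinger(0)`, which makes `Close` abort the connection rather than shut it down gracefully.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// startResettingServer listens on a random loopback port, accepts one
// connection, and aborts it with a TCP RST instead of a graceful FIN.
// It is a hypothetical stand-in for a broker that crashes mid-connection.
func startResettingServer() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go func() {
		defer ln.Close()
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		// A zero linger timeout discards unsent data and makes Close send RST.
		if tcp, ok := conn.(*net.TCPConn); ok {
			tcp.SetLinger(0)
		}
		conn.Close()
	}()
	return ln.Addr().String(), nil
}

// readAfterReset dials addr, waits for data, and returns the error the
// client observes once the server has aborted the connection.
func readAfterReset(addr string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	buf := make([]byte, 1)
	_, err = conn.Read(buf)
	return err
}

func main() {
	addr, err := startResettingServer()
	if err != nil {
		panic(err)
	}
	err = readAfterReset(addr)
	fmt.Println("client saw error:", err)
	fmt.Println("is connection reset:", errors.Is(err, syscall.ECONNRESET))
}
```

On Linux the client typically sees a "connection reset by peer" error here; the exact error may differ by platform, so a test built on this idea should probably assert only that the client surfaces an error and recovers.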

Which isn't to say that we shouldn't try to be resilient to the kafka cluster behaving outside of spec, but I'm much more comfortable with "best effort" in that scenario (especially since the brokers themselves have been so reliable for us).

Tangential point: sarama does not currently talk to zk at all.

@eapache
Contributor

eapache commented Feb 21, 2016

Significant chunks of this have ended up done over the past year, and the rest have been effectively forgotten. The main toxiproxy tests (#444) still haven't been merged because they exposed an upstream kafka bug which hasn't been fixed yet, but our test suite is in a much better spot now regardless.

@eapache eapache closed this as completed Feb 21, 2016