Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: functional-tester: decoupling stresser, fault injector and consistency checker #6446

Closed
mitake opened this issue Sep 16, 2016 · 9 comments · Fixed by #6506
Closed

RFC: functional-tester: decoupling stresser, fault injector and consistency checker #6446

mitake opened this issue Sep 16, 2016 · 9 comments · Fixed by #6506

Comments

@mitake
Copy link
Contributor

mitake commented Sep 16, 2016

Hi @gyuho,

I'm trying to integrate an external fuzzing and fault injection tester (https://github.com/osrg/namazu) to the functional-tester of etcd. The motivation is that fuzzing the well understood workload will increase a chance of finding bugs introduced by newer commits.

For doing it, I tried to decouple the stresser, fault injector and consistency checker for clean integration. Currently these components are tightly integrated and work in a single etcd-tester process. But namazu wants to fuzz the stresser and etcd cluster only. Fault injection is done by namazu and consistency checker shouldn't be fuzzed. So I'm working on some changes of etcd-tester like below:

  • add an option for disabling the fault injection (including failpoints) of functional-tester
  • add an option for only doing consistency check
  • add an option for only doing stresser
  • add an option for not terminating agents after exit of tester

And I noticed that the changes are introducing complexity to the source of the functional-tester :( So I'm considering that creating individual binaries for each functionality would be better than adding the options (e.g. etcd-stresser, etcd-checker). I'm glad if I can hear opinions.

(My very rought wip branch is here: https://github.com/mitake/etcd/tree/namazu)

@mitake
Copy link
Contributor Author

mitake commented Sep 16, 2016

/cc @AkihiroSuda

@xiang90
Copy link
Contributor

xiang90 commented Sep 16, 2016

@mitake I think this is a good idea. We want to make the functionality boundary clear a while ago, but have not got time to do it.

@heyitsanthony also has some opinions. I can give this a closer look later this week.

@gyuho
Copy link
Contributor

gyuho commented Sep 16, 2016

@mitake It's already possible to bypass consistency check (only do stressing with fault injection https://github.com/coreos/etcd/blob/master/tools/functional-tester/etcd-tester/main.go#L49). If you can help decouple more of those, it would be great.

First we need to do compaction more often, before we do the decoupling. Currently, we do compaction every round. If we stop fault injection, we write more keys per round, which would take tester more time to finish compaction.

@heyitsanthony
Copy link
Contributor

@mitake I peeked at your branch and have some suggestions about the general architecture for integrating this.

I agree with your observation about the complexity going off the rails-- the on/off booleans seem like too much special casing. It might be worthwhile to have general stresser/checker/fault injection interfaces with several implementations. The Stresser interface is already there (although possibly not the best interface) but checking/fault interfaces will need to be untangled from the tester. I can try to take a crack at this, but not sure when I'll be able to get around to it.

The tester would be called like etcd-tester --stresser=none --faults=none ... and the etcd-tester will instantiate a dummy stresser:

// nopStresser implements Stresser that does nothing
type nopStresser struct {}
func (s *nopStresser) Stress() error { return nil }
func (s *nopStresser) Cancel() {}
func (s *nopStresser) Report() (int, int) { return 0, 0}

I'm not familiar with the namazu architecture, but would it be possible to have stresser/fault injector implementations that etcd-tester can call to orchestrate namazu (or some general stresser/injector that can kick off a script that would activate/deactivate namazu)?

@mitake
Copy link
Contributor Author

mitake commented Sep 19, 2016

@heyitsanthony the generalized functionalities seem to be good and it will work well with namazu. Can I work on it if you are busy?

@heyitsanthony
Copy link
Contributor

@mitake sure, have at it.

mitake added a commit to mitake/etcd that referenced this issue Sep 22, 2016
This commit decouples stresser from the tester of
functional-tester. For doing it, this commit adds a new option
--stresser to etcd-tester. The option accepts two types of stresser:
"default" and "nop". If the option is "default", etcd-tester stresses
its etcd cluster with the existing stresser. If the option is "nop",
etcd-tester does nothing for stressing.

Partially fixes etcd-io#6446
mitake added a commit to mitake/etcd that referenced this issue Sep 23, 2016
This commit decouples stresser from the tester of
functional-tester. For doing it, this commit adds a new option
--stresser to etcd-tester. The option accepts two types of stresser:
"default" and "nop". If the option is "default", etcd-tester stresses
its etcd cluster with the existing stresser. If the option is "nop",
etcd-tester does nothing for stressing.

Partially fixes etcd-io#6446
@heyitsanthony
Copy link
Contributor

@gyuho @mitake should this be closed or is there still some work to do on fault injection and agent termination?

@gyuho
Copy link
Contributor

gyuho commented Sep 23, 2016

@mitake Do you need still more changes to run namazu with etcd functional-tester?
I think most of the items you mentioned is now doable?

Please feel free to leave this open.

Thanks.

@mitake
Copy link
Contributor Author

mitake commented Sep 24, 2016

@heyitsanthony @gyuho I still want to introduce some changes (as Anthony mentioned, decoupling fault injection and agent termination) but they will be straightforward. Closing this issue is ok. I'll open new PRs later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

4 participants