tests: design for a network simulator debugging tool #27

Closed
andybons opened this issue Jun 4, 2014 · 16 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-question A question rather than an issue. No code/spec/doc change needed. help wanted Help is requested / needed by the one who filed the issue to fix it. O-community Originated from the community

Comments

@andybons
Contributor

andybons commented Jun 4, 2014

Megastore has a pseudo-random test framework capable of exploring the space of all possible orderings and delays of communications between simulated nodes or threads, and deterministically reproduces the same behavior given the same seed.
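
As a rough illustration of the idea (not Megastore's actual framework, whose internals are not described here), a minimal Go sketch might pick the delivery order of in-flight messages from a seeded generator. The `Message` type and `Simulate` function are hypothetical names introduced only for this example; the point is that the same seed always reproduces the same ordering.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Message is a hypothetical in-flight message between two simulated nodes.
type Message struct {
	From, To int
	Payload  string
}

// Simulate delivers every message in pending in a pseudo-random order
// chosen by seed and returns that delivery order. The same seed always
// yields the same order, so a failing run can be replayed exactly.
func Simulate(seed int64, pending []Message) []Message {
	rng := rand.New(rand.NewSource(seed))
	// Copy the input so the caller's slice is not mutated.
	queue := append([]Message(nil), pending...)
	delivered := make([]Message, 0, len(queue))
	for len(queue) > 0 {
		i := rng.Intn(len(queue)) // choose which in-flight message arrives next
		delivered = append(delivered, queue[i])
		queue = append(queue[:i], queue[i+1:]...)
	}
	return delivered
}

func main() {
	msgs := []Message{
		{From: 1, To: 2, Payload: "prepare"},
		{From: 2, To: 1, Payload: "promise"},
		{From: 1, To: 3, Payload: "prepare"},
	}
	// Different seeds may explore different orderings; rerunning a seed
	// replays its ordering.
	for _, seed := range []int64{1, 2} {
		fmt.Println(seed, Simulate(seed, msgs))
	}
}
```

A real tool would also need to model delays, drops, and node state; this sketch covers only reordering.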

This would be unbelievably useful.

/cc @toddlipcon since he’s had experience designing something like this for HDFS HA.

@strangemonad
Contributor

have a look at Jepsen:
https://github.com/aphyr/jepsen

Kyle’s done a great job with simulating failures and network partitions for many DBs

shawn

On Jun 4, 2014, at 4:48 PM, Andrew Bonventre notifications@github.com wrote:

https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQJMWithFaults.java#L202


@strangemonad
Contributor

have a look at Jepsen:
https://github.com/aphyr/jepsen

and the series of blog posts on breaking serializability in distributed DBs:
http://aphyr.com/tags/jepsen

@andybons
Contributor Author

andybons commented Jun 4, 2014

Sweet.

@spencerkimball
Member

Wow, Jepsen seems great. We should obviously use it. Thanks Shawn!

@jonasschneider

For randomized testing, instead of the "attack-based" approach of Jepsen, there's also this: https://github.com/stripe-ctf/octopus -- @zenazn might be interested.

@spencerkimball
Member

Cool, octopus sounds helpful.

@xiang90
Contributor

xiang90 commented Nov 28, 2014

We are exploring something similar for etcd.
Personally, I do not think Jepsen is good enough for this kind of test.

@xiang90
Contributor

xiang90 commented Feb 1, 2015

@andybons @spencerkimball We just started a pseudo-random test framework inside raft.

The first step is to randomly probe the possible testing space with a seed, so that we can reproduce any problem we find.
The second step is to record the probed testing space and find a way to end the testing once we are sure that we have searched all possible sequences for the internal state machine.
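
The two steps above can be sketched roughly as follows. This is not the etcd implementation; it is a toy model that assumes the "testing space" is simply every ordering of `n` events, so full coverage means having seen all `n!` permutations. `Explore` and `factorial` are hypothetical helpers introduced only for this illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// factorial returns n!, the number of distinct orderings of n events.
func factorial(n int) int {
	f := 1
	for i := 2; i <= n; i++ {
		f *= i
	}
	return f
}

// Explore probes orderings of n events with a seeded generator (step one),
// records every ordering it has seen, and stops once all orderings have been
// visited or maxRuns is reached (step two). It returns the number of runs
// used and whether coverage was complete.
func Explore(seed int64, n, maxRuns int) (runs int, complete bool) {
	rng := rand.New(rand.NewSource(seed))
	seen := make(map[string]bool) // orderings probed so far
	total := factorial(n)
	for runs < maxRuns && len(seen) < total {
		order := rng.Perm(n) // one pseudo-random event ordering
		seen[fmt.Sprint(order)] = true
		runs++
	}
	return runs, len(seen) == total
}

func main() {
	runs, complete := Explore(1, 4, 100000)
	fmt.Printf("runs=%d complete=%v\n", runs, complete)
}
```

For a real state machine the space is far larger than a handful of permutations, so the termination check would need a much more compact notion of "already explored" than a set of strings.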

etcd-io/etcd#2203

@andybons
Contributor Author

andybons commented Feb 2, 2015

@xiang90 👍

@bdarnell
Contributor

Making a note of this for later: https://github.com/ahorn/linearizability-checker is another tool for analyzing Jepsen logs. It's similar to but more efficient than Aphyr's Knossos tool.

@knz
Contributor

knz commented Feb 12, 2016

Ok this is really a superset of what I was looking at with #4036. Let me study these options and then I'll see what I can propose.

@AkihiroSuda

We also have a similar tool that increases the non-determinism of thread scheduling (and so on).
It can easily be used with existing unit tests and Jepsen tests.

http://osrg.github.io/earthquake/post/fosdem2016/

@petermattis petermattis added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Feb 14, 2016
@petermattis petermattis modified the milestone: Beta Feb 14, 2016
@petermattis petermattis removed this from the Beta milestone Mar 8, 2016
@petermattis petermattis changed the title Design for a network simulator debugging tool tests: design for a network simulator debugging tool Mar 31, 2016
@knz
Contributor

knz commented Jul 5, 2016

I'm not sure this is happening in this way any more. Not planning to work on this soon.

@knz knz removed their assignment Jul 5, 2016
@knz
Contributor

knz commented Jul 5, 2016

cc @knz for later

@bdarnell
Contributor

We've been using Jepsen for a long time now and have no specific plans to build anything else in this area, so there's no point in keeping this issue open.

@jordanlewis jordanlewis added C-question A question rather than an issue. No code/spec/doc change needed. O-community Originated from the community and removed O-deprecated-community-questions labels Apr 24, 2018
pav-kv pushed a commit to pav-kv/cockroach that referenced this issue Mar 5, 2024
…tools/mod/golang.org/x/net-0.7.0

build(deps): bump golang.org/x/net from 0.4.0 to 0.7.0 in /tools/mod