Node locks up after partitions #2273
While trying to fix a bug in the tests for #2152, I managed to lock up Dgraph into a state where every request to one node (n1) timed out, even though the process was still running on all nodes. Here are the complete logs and data files from all five nodes.
This occurs on
Dgraph version : v1.0.4
and involves a series of network partitions; it looks as if the final partition healing might have left n1 in a state where it believed the leader was... possibly a node which was not the leader?
To reproduce this, run with Jepsen e31b29d1a5302766c2c83454eeed9124ef9820f5:
This appears to be a semi-rare fault; I've only seen it once so far.