Predicate moves can take down Dgraph indefinitely #2397
Comments
Any suggestions on how quickly it can be reproduced, and with which test?
I haven't figured out a fast path to reproduce this yet, but the original case was
And it took about 4 hours to appear. My guess is that the bank test is more likely to be interesting, since there are more predicates. I'm also going to explore reducing the time between nemeses, and longer test times, so we spend more time migrating.
Okay. I'll try a long run on my end as well.
Ah... this is a stupid one. I could see this in Dgraph's own integration tests. I have pushed a potential fix to master: 7f21b82. This could be killing off many Dgraph alphas, which is why the entire cluster is failing. Testing the fix right now.
Commit 7205275 also helps fix this one. Ran it on my computer, with the command above. All tests passed. Closing this. Will send you a binary to test as well at your end.
In the build @manishrjain provided for testing on 2018-05-16 (sorry, don't have a SHA!), with predicate moves roughly every 10 seconds, Dgraph went down: 20180516T194304.000-0500.zip
In particular, it looks like a predicate move caused a single node's Alpha to crash with the following message:
Oddly, the cluster was never able to recover from this state, even though five Zero nodes and four Alpha nodes were running and connected. All Alpha nodes logged
... indefinitely, and Zero nodes logged the same thing:
Every request either timed out (on n3), or returned UNAVAILABLE. Attempts to perform further predicate moves all timed out as well, though Zero would answer /state queries: