Inserts can be lost during a predicate move #2338
Comments
You can now reproduce this problem significantly faster in Jepsen eb796cfcc204c592545965968bd28ad1e6b2eff0 by using the move-tablet nemesis, which shuffles tablets around every 15 seconds or so.
With predicate moves, we can get dgraph to lose 99% of acknowledged inserts in 60 seconds:
@aphyr You were probably using an old nightly build. Did you try with
Ah, good catch, thank you. Unfortunately we're still broken in b8c0908. :-(
Some of the issues are fixed in #2339
- The reason for bug #2338 was a race condition between a mutation and a predicate move. Zero was not checking whether a predicate was being moved before allowing a commit. Thus, a mutation could get proposed in a group, then a move starts, and the mutation gets committed by Zero (after the move starts).
- This change fixes this issue by ensuring that Zero checks whether a predicate is being moved before allowing a commit.
- Any pending transactions are also cancelled once the move starts, so this would only happen as part of a race condition and not afterward.

Mechanism:
- Send the real keys back to Zero, as part of Transaction Context.
- Zero uses these keys to parse the predicate, and checks if that predicate is currently moving. If so, it aborts the transaction.
- Also, check for `_predicate_` being moved. For some reason, if we don't consider this predicate, we could still lose data.
- Before doing a mutation in Dgraph alpha, check if that tablet can be written to.
- Loop until all transactions corresponding to the predicate move are aborted. Only then start the move.

Tangential changes:
- Update the port number for bank integration test.
- Remove the separate key value or clean channel. Make it run as part of the main Node.Run loop.
- Add a max function.
- Small refactoring here and there.
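
To make the commit-time check concrete, here is a minimal Go sketch of the idea described in the mechanism list: before allowing a commit, look up the predicate of every key in the transaction and abort if any of them is being moved. This is not Dgraph's actual code. The `moveGuard` type, its methods, and the `predicate|uid` key layout are all hypothetical names chosen for illustration (real Dgraph keys are binary encoded), and treating `_predicate_` as always checked is only one possible reading of the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// moveGuard is a hypothetical stand-in for the state Zero keeps about
// predicates that are currently being moved between groups.
type moveGuard struct {
	mu     sync.Mutex
	moving map[string]bool
}

func newMoveGuard() *moveGuard {
	return &moveGuard{moving: make(map[string]bool)}
}

// StartMove marks a predicate as being moved.
func (g *moveGuard) StartMove(pred string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.moving[pred] = true
}

// FinishMove clears the mark once the move is done.
func (g *moveGuard) FinishMove(pred string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.moving, pred)
}

// predicateOf extracts the predicate from a key. The "predicate|uid"
// layout is an assumption made for this sketch only.
func predicateOf(key string) string {
	return strings.SplitN(key, "|", 2)[0]
}

// CanCommit returns an error if the transaction touches a predicate that
// is being moved, and nil otherwise.
func (g *moveGuard) CanCommit(keys []string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	// Assumption: _predicate_ is checked for every transaction. The commit
	// message only says it must be considered, not how.
	if g.moving["_predicate_"] {
		return fmt.Errorf("abort: predicate %q is being moved", "_predicate_")
	}
	for _, k := range keys {
		if pred := predicateOf(k); g.moving[pred] {
			return fmt.Errorf("abort: predicate %q is being moved", pred)
		}
	}
	return nil
}

func main() {
	g := newMoveGuard()
	keys := []string{"name|0x1", "age|0x1"}

	fmt.Println("before move:", g.CanCommit(keys))
	g.StartMove("name")
	fmt.Println("during move:", g.CanCommit(keys))
	g.FinishMove("name")
	fmt.Println("after move:", g.CanCommit(keys))
}
```

The sketch covers only the commit-side check. The other half of the fix, looping until all transactions on the predicate are aborted before the move starts, is not modelled here.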
Can confirm that after half an hour of running this test, no violation was found with the above commit.
With server-side ordering, @upsert schemas, no crashes or network faults, roughly 10 inserts/sec, and no updates or deletes, Dgraph can occasionally (once every five hours or so) lose successfully inserted records: 20180412T161038.000-0500.zip
The lost records occur during a predicate move, which suggests this issue might be related to #2321. This occurs with
and can be reproduced with Jepsen 23329ead4c4e3d8352234658026d09792f15c406 via