ZOOKEEPER-2623: Fix database corruption caused by quorum check #1988
Ping @eolivelli @tisonkun @symat @maoling @cnauroth for review.
Individual `OpCode.check` causes:

* Connection loss due to null stat in `SetDataResponse`.
* Database corruption since `SerializeUtils.deserializeTxn` does not support `OpCode.check`.

This commit makes `OpCode.check` a pure read operation and returns nothing to match `OpResult.CheckResult` in `MultiResponse`.
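For readers unfamiliar with the distinction, here is a minimal, self-contained sketch of what "pure read operation" means in this context. It is a toy model, not ZooKeeper code: the class `CheckAsReadSketch`, its `Code` enum, and the in-memory `versions` map and `txnLog` list are hypothetical stand-ins for the data tree and the transaction log. The only point it illustrates is that a standalone version check compares versions and returns a status without appending anything to the log, so there is no record that would later need to be deserialized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not ZooKeeper internals): a version check served as a read. */
class CheckAsReadSketch {
    enum Code { OK, NO_NODE, BAD_VERSION }

    // Hypothetical stand-ins for the data tree and the on-disk transaction log.
    private final Map<String, Integer> versions = new HashMap<>();
    private final List<String> txnLog = new ArrayList<>();

    /** Write path: mutates state and appends a log record. */
    void create(String path) {
        versions.put(path, 0);
        txnLog.add("create " + path);
    }

    /** Write path: bumps the version and appends a log record if the check passes. */
    Code setData(String path, int expectedVersion) {
        Code code = check(path, expectedVersion);
        if (code != Code.OK) {
            return code;
        }
        versions.merge(path, 1, Integer::sum);
        txnLog.add("setData " + path);
        return Code.OK;
    }

    /** Read path: compares versions only (-1 matches any version); never touches the log. */
    Code check(String path, int expectedVersion) {
        Integer actual = versions.get(path);
        if (actual == null) {
            return Code.NO_NODE;
        }
        if (expectedVersion != -1 && expectedVersion != actual) {
            return Code.BAD_VERSION;
        }
        return Code.OK;
    }

    int logSize() {
        return txnLog.size();
    }

    public static void main(String[] args) {
        CheckAsReadSketch sketch = new CheckAsReadSketch();
        sketch.create("/a");
        System.out.println(sketch.check("/a", 0));   // status only, no log record
        System.out.println(sketch.check("/a", 7));   // mismatch, still no log record
        System.out.println(sketch.setData("/a", 0)); // a real write, logged
        System.out.println("log records: " + sketch.logSize());
    }
}
```

In this model only `create` and `setData` grow the log; any number of `check` calls leaves it unchanged, which is the property the commit description above aims for.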
b691f3d to ba95e21
Request for review @anmolnar @breed @eolivelli @symat @tisonkun. It causes not only connection loss but also database corruption. Depending on the deployment, it could be a vulnerability.
+1
Thanks for your contribution! LGTM.
To be clear, this op is normally processed within a multi, which is a write op that goes through the commit process.
Merging...
All tests are refactored to fail before apache#1988 and to survive more than 100 local runs without failure after apache#1988.
Looks like an old and nasty bug. Congrats @kezhuw for fixing it. Shouldn't we backport it to
I was rethinking this while investigating ZOOKEEPER-4750. I think it might be more appropriate to throw error
I am in favor of backporting this anyway, as it could corrupt the on-disk database.
This PR makes CI unstable; I have opened #2067 to fix it. Also, I wonder whether UNIMPLEMENTED is more appropriate. I sent https://lists.apache.org/thread/vl816jfrklvqz29coz5qnwpom9q41pcg to discuss this.