Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-2623: Fix database corruption caused by quorum check #1988

Conversation

kezhuw
Copy link
Member

@kezhuw kezhuw commented Mar 8, 2023

Individual OpCode.check causes:

  • Connection loss due to null stat in SetDataResponse.
  • Database corruption since SerializeUtils.deserializeTxn does not support OpCode.check.

This commit makes OpCode.check a pure read operation and returns nothing to match OpResult.CheckResult in MultiResponse.

@kezhuw
Copy link
Member Author

kezhuw commented May 8, 2023

Ping @eolivelli @tisonkun @symat @maoling @cnauroth for review.

Individual `OpCode.check` causes:
* Connection loss due to null stat in `SetDataResponse`.
* Database corruption since `SerializeUtils.deserializeTxn` does not
  support `OpCode.check`.

This commit makes `OpCode.check` a pure read operation and returns
nothing to match `OpResult.CheckResult` in `MultiResponse`.
@kezhuw kezhuw force-pushed the ZOOKEEPER-4680-fix-corrupted-database-from-quorum-check branch from b691f3d to ba95e21 Compare May 11, 2023 18:00
@kezhuw kezhuw changed the title ZOOKEEPER-4680: Fix database corruption caused by quorum check ZOOKEEPER-2623: Fix database corruption caused by quorum check May 11, 2023
@kezhuw kezhuw closed this May 11, 2023
@kezhuw kezhuw reopened this May 11, 2023
@kezhuw
Copy link
Member Author

kezhuw commented Aug 7, 2023

Request for review @anmolnar @breed @eolivelli @symat @tisonkun.

It causes not only connection loss but also database corruption. Depends on deployment, it could be vulnerable.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! LGTM.

To be clear, this op is normally processed within multi which is a write op to go through the commit process.

@tisonkun
Copy link
Member

Merging...

@tisonkun tisonkun merged commit b31f776 into apache:master Sep 10, 2023
14 checks passed
rahulrane50 pushed a commit to rahulrane50/zookeeper that referenced this pull request Sep 15, 2023
kezhuw added a commit to kezhuw/zookeeper that referenced this pull request Sep 19, 2023
All tests are refactored to fail before apache#1988 and resist more than 100
runs locally without failure after apache#1988.
@kezhuw kezhuw deleted the ZOOKEEPER-4680-fix-corrupted-database-from-quorum-check branch September 24, 2023 15:08
@anmolnar
Copy link
Contributor

Looks like an old and nasty bug. Congrats @kezhuw for fixing it. Shouldn't we backport it to branch-3.9 too?
Maybe branch-3.8 as well?

@kezhuw
Copy link
Member Author

kezhuw commented Oct 12, 2023

I was rethinking this in investigating ZOOKEEPER-4750. I think it might be more appropriate to throw error UNIMPLEMENTED for OpCode.check at the very first. OpCode.check was not designed to function individually and we have OpCode.exists already. Also, it might be distractive in metrics/logging(e.g. ZOOKEEPER-4750), as admins have to distinguish individual OpCode.check or transactional OpCode.check embedded inside OpCode.multi. I am sorry for the trouble. Any thoughts on this @tisonkun @anmolnar @eolivelli ?

@kezhuw
Copy link
Member Author

kezhuw commented Oct 12, 2023

I am positive to backport this anyway as it could corrupt disk database.

@kezhuw
Copy link
Member Author

kezhuw commented Oct 17, 2023

This pr make CI unstable, I have opened #2067 to solve it.

Also, I wonder whether UNIMPLEMENTED is more appropriate. I sent https://lists.apache.org/thread/vl816jfrklvqz29coz5qnwpom9q41pcg for this.

cc @tisonkun @anmolnar @eolivelli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants