Skip to content

Conversation

@lvfangmin
Copy link
Contributor

Please check the Jira for more details of the bug.

Need more opinion on this fix to see if there is other corner cases or side effect we didn't handle properly.

@lvfangmin lvfangmin changed the title [ZOOKEEPER-3306] Fixing node may not accessible issue due the the inconsistent ACL reference map after SNAP sync [ZOOKEEPER-3306] Fixing node not accessible issue due the the inconsistent ACL reference map after SNAP sync Mar 11, 2019
@lvfangmin lvfangmin changed the title [ZOOKEEPER-3306] Fixing node not accessible issue due the the inconsistent ACL reference map after SNAP sync ZOOKEEPER-3306: Fixing node not accessible issue due the the inconsistent ACL reference map after SNAP sync Mar 11, 2019
@anmolnar
Copy link
Contributor

@lvfangmin Huh, this is not easy. Another edge case captured by FB. :)
The fix seems fine to me, but not sure about the implications. You essentially touch the ACL to prevent garbage collecting it by incrementing the ref counter, right?

@lvfangmin lvfangmin changed the title ZOOKEEPER-3306: Fixing node not accessible issue due the the inconsistent ACL reference map after SNAP sync ZOOKEEPER-3306: Fixing node not accessible issue due the inconsistent ACL reference map after SNAP sync Mar 21, 2019
@lvfangmin
Copy link
Contributor Author

@anmolnar currently, we cannot tell if the ACL reference is already added or not during fuzzy snapshot sync and replaying a txn for a already exist node, this code change may count twice, but it's not a big deal since this is a rare case only happen during SNAP syncing, and we'll clean the ACL map if there is no node using it when deserializing the DataTree from disk.

@lvfangmin
Copy link
Contributor Author

We've released this fix onto prod for more than 2 weeks, haven't hit this corner case so far, but it says this is a really corner case which we may not even happen in 2 weeks before this fix.

But this should be safe enough to merge in master. @anmolnar would you like to take another look?

@anmolnar
Copy link
Contributor

anmolnar commented Apr 5, 2019

@lvfangmin Where is the point in the code where we clear the ACL cache eventually?

@lvfangmin
Copy link
Contributor Author

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 lgtm

@asfgit asfgit closed this in 46b2018 Apr 29, 2019
@anmolnar
Copy link
Contributor

Committed to master. Thanks @lvfangmin !

RokLenarcic pushed a commit to RokLenarcic/zookeeper that referenced this pull request Sep 3, 2022
… ACL reference map after SNAP sync

Please check the Jira for more details of the bug.

Need more opinion on this fix to see if there is other corner cases or side effect we didn't handle properly.

Author: Fangmin Lyu <fangmin@apache.org>

Reviewers: andor@apache.org

Closes apache#848 from lvfangmin/ZOOKEEPER-3306
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants