Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-3531: Synchronization on ACLCache cause cluster to hang whe… #1077

Closed
wants to merge 5 commits into from

Conversation

@mcfatealan
Copy link
Contributor

mcfatealan commented Sep 4, 2019

…n network/disk issues happen during datatree serialization

mcfatealan added 2 commits Sep 4, 2019
…n network/disk issues happen during datatree serialization
@maoling

This comment has been minimized.

Copy link
Contributor

maoling commented Sep 4, 2019

Loooooooooking!, this weekend(at the latest:D).

Copy link
Contributor

lvfangmin left a comment

Thanks @mcfatealan for reporting this issue and creating a PR.

The deserialize function might stall due to the same reason, we should probably make the same change.

Also we should think about the potential correctness issue here. Since we serialize ACL first before serializing data tree, with the this change, we may have node reference to a ACL id which is not exist in the fuzzy snapshot.

We may create a new ACL ID and update the count when replaying the createNode txn with the fix made in ZOOKEEPER-3306, but we should double check.

@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 6, 2019

Hi @lvfangmin thanks for pointing out the potential risks here!
I totally agree that we should always be careful about potential correctness issue when dealing with changes to synchronization. Meanwhile, in this case (plz correct me if I'm wrong here) I think the proposed fix is rather safe that: 1) it does not break existing fixes in ZOOKEEPER-3306 2) it does not introduce new problems.

1)

The race condition mentioned in the ZOOKEEPER-3306 exists before the fix mentioned in this thread. We carefully inspect the issue you mentioned in ZOOKEEPER-3306. And the fix introduced in ZOOKEEPER-3306 is not to prevent this race condition to avoid the fuzzy snapshot, but to add the missing ACL during replay. New transactions are always possible between serializeACL and serializeNodes, and our fix would not create a new race condition here.

After the fix we can still pass the test testACLCreatedDuringFuzzySnapshotSync.

2)

Essentially the proposed fix is finer-grained synchronization. The original codes look like:
v1

synchronized public void serialize(OutputArchive oa) throws IOException {
    writeACL(longKeyMap);
}

which is equilvalent to:
v1.b=v1

public void serialize(OutputArchive oa) throws IOException {
    synchronized (this) {
        clonedLongKeyMap = copyACL(longKeyMap);
        writeACL(clonedLongKeyMap);
    }
}

This is how codes look like after our fix:
v2

public void serialize(OutputArchive oa) throws IOException {
    synchronized (this) {
        clonedLongKeyMap = copyACL(longKeyMap);
    }
    writeACL(clonedLongKeyMap);
}

Because clonedLongKeyMap is a local, immutable data structure, it would not create new correctness issues even we move it out of synchronization blocks. Suppose right now we have a thread t2 coming in when t1 is doing writeACL, it is still not gonna changing the in-memory state or on-disk records. The only difference this change made is when writeACL gets stuck, it would get stuck outside the synchronization block so it would not block other access to ACLCache.

@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 6, 2019

@lvfangmin @maoling Thanks so much for your thoughts and any feedbacks are appreciated. 👍

@lvfangmin

This comment has been minimized.

Copy link
Contributor

lvfangmin commented Sep 10, 2019

Thanks @mcfatealan for double checking on the correctness issue, it seems correct to me, but let's have more people review on this.

Btw, let's change the deserialize function to avoid similar stall.

@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 10, 2019

@lvfangmin sure, I'll submit a patch for deserialize shortly after

mcfatealan added 2 commits Sep 10, 2019
@mcfatealan mcfatealan force-pushed the mcfatealan:ZOOKEEPER-3531 branch from 265f048 to 951dd5c Sep 10, 2019
@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 10, 2019

@lvfangmin @maoling deserialize is refactored following a similar approach (move I/O out of sync blocks), the regression test is also attached. Any comments are appreciated :)

Copy link
Contributor

maoling left a comment

  • Look reasonable and great. need to backport to branch-3.5 ?
  • All the place where deserialize/serialize is inside of synchronized block have the potential risk of hanging. I'am searching for other places
  • BTW, @mcfatealan, any possibility that you can share/recommend us some classic/emerging papers about chaos engineering/fault injection for distributed system?
clear();
int i = ia.readInt("map");

LinkedHashMap<Long, List<ACL>> deserializedMap = new LinkedHashMap<Long, List<ACL>>();
// keep read operations out of synchronization block

This comment has been minimized.

Copy link
@maoling

maoling Sep 23, 2019

Contributor

Map<Long, List> clonedDeserializedMap = new LinkedHashMap<>();

{
clonedLongKeyMap = new HashMap<Long, List<ACL>>(longKeyMap);
}
oa.writeInt(clonedLongKeyMap.size(), "map");

This comment has been minimized.

Copy link
@maoling

maoling Sep 23, 2019

Contributor
    synchronized (this) {
        clonedLongKeyMap = new HashMap<>(longKeyMap);
    }
@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 23, 2019

@maoling thx a lot for the reviewing work! I've pushed a new commit to address two places of lengthy codes you point out.

As for fault injections for distributed systems, I haven't done a comprehensive survey yet. But there are some related papers off top of my head, some of them claim direct supports on ZooKeeper:
PREFAIL: A Programmable Tool for Multiple-Failure Injection [OOPSLA'11]
Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions[FAST'17]
FATE and DESTINI: A Framework for Cloud Recovery Testing [NSDI'11]

Hope it helps :) I'll update the list if I could remember more

@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 23, 2019

@maoling I came across two other related ones today:
Lineage-driven Fault Injection [SIGMOD '15]
On Fault Resilience of OpenStack [SoCC '13]

Actually right now we are working on a systematic fault injector and runtime checker framework using program analysis techniques for catching these type of failures. We have a short paper HotOS '19 describing our approach and some preliminary results and plan to release the tool to the open-source community when it is mature. We could let you know when it's available if you are interested.
(sorry for a little self-promotion here 😸 ) Any feedback will be highly appreciated :)

Copy link
Contributor

lvfangmin left a comment

+1

Thanks @mcfatealan for reporting and fixing the issue.

@mcfatealan

This comment has been minimized.

Copy link
Contributor Author

mcfatealan commented Sep 30, 2019

@maoling @lvfangmin My pleasure, thanks for the reviewing work!

@lvfangmin

This comment has been minimized.

Copy link
Contributor

lvfangmin commented Oct 8, 2019

We have two +1 on this, will merge it today.

@lvfangmin

This comment has been minimized.

Copy link
Contributor

lvfangmin commented Oct 8, 2019

@eolivelli @hanm @anmolnar Wanna to take a look at this as well? I'll wait for you guys to review before merging.

Copy link
Contributor

anmolnar left a comment

+1 LGTM.

@asfgit asfgit closed this in f4c7b69 Oct 9, 2019
@anmolnar

This comment has been minimized.

Copy link
Contributor

anmolnar commented Oct 9, 2019

Merged to master branch. Thanks @mcfatealan !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.