
[bug] Infinite Loop Observed for ResourceLockImpl after Zookeeper Session Expired #17759

Closed
michaeljmarshall opened this issue Sep 21, 2022 · 4 comments · Fixed by #17922
Labels
area/broker area/metadata type/bug

Comments

@michaeljmarshall
Member

michaeljmarshall commented Sep 21, 2022

Observations

I haven't been able to reproduce this issue, but in one of my test environments, I see this behavior often. It seems pretty clear that there is a potential for an infinite loop in the ResourceLockImpl class. I specifically see it taking place for the bookie's lock on its znode. Here are the observed events. The bookie's logs are below.

  1. The bookie starts out connected as the owner of an ephemeral node like /ledgers/available/bookie-0:3181.
  2. Then the bookie's zk session expires. (This might be due to being idle for a while since it is a test cluster.)
  3. Then we see this log line: "Metadata store session has been re-established. Revalidating all the existing locks." That log line is associated with revalidating locks, and given the behavior we observe, the revalidation appears to find the lock still existing in ZooKeeper (otherwise we wouldn't get this loop). Here is the code path that we don't see executed (we never see its log line):
    if (!optGetResult.isPresent()) {
        // The lock just disappeared, try to acquire it again
        // Reset the expectation on the version
        setVersion(-1L);
        return acquireWithNoRevalidation(newValue)
                .thenRun(() -> log.info("Successfully re-acquired missing lock at {}", path));
    }
  4. The next nuance took me a while to find. Given that we fell through step 3, I couldn't see why we weren't getting into this block (again, we don't see that log line). It's because the serde.deserialize implementation for BookieServiceInfo deserializes every payload to null, so the stored value can never compare equal to the new value (a minimal illustration follows this list). This definitely seems like a bug in itself, and I think fixing it would fix this issue for the rack awareness case but not for the general case:
    @Override
    public BookieServiceInfo deserialize(String path, byte[] content, Stat stat) throws IOException {
        return null;
    }
  5. Then we delete the znode, because we determined that we created it but the new value is different from the old value (see step 4). We delete the current znode and then recreate it asynchronously:
    return store.delete(path, Optional.of(res.getStat().getVersion()))
            .thenRun(() ->
                    // Reset the expectation that the key is not there anymore
                    setVersion(-1L)
            )
            .thenCompose(__ -> acquireWithNoRevalidation(newValue))
  6. At some point in the middle of step 5, we get a notification that the znode was deleted; that triggers the lock invalidation logic, which puts us into a loop of deleting and recreating the znode.
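
For illustration, here is a minimal, self-contained sketch (not the actual ResourceLockImpl code) of why the check in step 4 can never succeed when deserialize() returns null, which pushes every revalidation down the delete-and-recreate path of step 5:

    // Illustration only: if deserialize() always returns null,
    // newValue.equals(existingValue) is always false, so the "keep the lock
    // as-is" branch is never taken and delete-and-recreate runs every time.
    public class NullDeserializeSketch {

        // Stand-in for the broken serde whose deserialize() returns null.
        static Object brokenDeserialize(byte[] content) {
            return null;
        }

        public static void main(String[] args) {
            String newValue = "bookie-0:3181";
            Object existingValue = brokenDeserialize(newValue.getBytes());

            // equals(null) is false by contract, so the stored value always looks "changed".
            if (newValue.equals(existingValue)) {
                System.out.println("values match, keep the existing lock");
            } else {
                System.out.println("values differ, delete and recreate the znode"); // always taken
            }
        }
    }

The bookie's logs: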
2022-07-29T19:43:33,222+0000 [main-SendThread(zookeeper-0:2181)] WARN  org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1002e8eca9c0019 has expired
2022-07-29T19:43:33,222+0000 [main-SendThread(zookeeper-0:2181)] WARN  org.apache.zookeeper.ClientCnxn - Session 0x1002e8eca9c0019 for server zookeeper-0/10.249.212.6:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionExpiredException: Unable to reconnect to ZooKeeper service, session 0x1002e8eca9c0019 has expired
        at org.apache.zookeeper.ClientCnxn$SendThread.onConnected(ClientCnxn.java:1430) ~[org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
        at org.apache.zookeeper.ClientCnxnSocket.readConnectResult(ClientCnxnSocket.java:154) ~[org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:86) ~[org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) [org.apache.zookeeper-zookeeper-3.8.0.jar:3.8.0]
2022-07-29T19:44:16,903+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.impl.ZKSessionWatcher - Got ZK session watch event: WatchedEvent state:SyncConnected type:None path:null
2022-07-29T19:44:16,904+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.impl.ZKSessionWatcher - ZooKeeper client reconnection with server quorum. Current status: SessionLost
2022-07-29T19:44:16,904+0000 [metadata-store-coordination-service-5-1] INFO  org.apache.pulsar.metadata.coordination.impl.LockManagerImpl - Metadata store connection has been re-established. Revalidating locks that were pending.
2022-07-29T19:44:16,984+0000 [metadata-store-coordination-service-5-1] INFO  org.apache.pulsar.metadata.coordination.impl.LockManagerImpl - Metadata store session has been re-established. Revalidating all the existing locks.
2022-07-29T19:44:17,409+0000 [metadata-store-3-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated
2022-07-29T19:44:17,411+0000 [metadata-store-3-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated
2022-07-29T19:44:17,476+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /ledgers/available/bookkeeper-0:3181
2022-07-29T19:44:17,477+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Successfully re-acquired lock at /ledgers/available/bookkeeper-0:3181
2022-07-29T19:44:17,575+0000 [metadata-store-3-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated
2022-07-29T19:44:17,576+0000 [metadata-store-3-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated
2022-07-29T19:44:17,582+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Acquired resource lock on /ledgers/available/bookkeeper-0:3181
2022-07-29T19:44:17,582+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Successfully re-acquired lock at /ledgers/available/bookkeeper-0:3181
2022-07-29T19:44:17,582+0000 [main-EventThread] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Successfully revalidated the lock on /ledgers/available/bookkeeper-0:3181
2022-07-29T19:44:17,592+0000 [metadata-store-3-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated
2022-07-29T19:44:17,592+0000 [metadata-store-3-1] INFO  org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated

Potential solutions

  • Instead of deleting the znode, why not just use a store.put to replace it? We already confirmed that we own it. This might have bad consequences for other use cases; it's worth remembering that this is a generic lock class. However, given this code handling "bad version" updates, it seems like this might be the right path:
    if (ex.getCause() instanceof BadVersionException) {
        log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
        state = State.Released;
        expiredFuture.complete(null);
    }
  • Try to find a way to ignore the ZK notification that the node was deleted. We could do this by adding state to ResourceLockImpl indicating that the lock is being revalidated via the edge case that requires deletion, so that the node-deleted notification can be ignored (a sketch of this idea follows the list).
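
A minimal sketch of the second idea, with illustrative names (revalidatingAfterDeletion, onNodeDeleted, and deleteAndRecreate are hypothetical, not actual ResourceLockImpl members). In the real asynchronous code the flag would have to be cleared only after the recreate completes, and races with genuine external deletions would need more care; this only shows the shape of the guard:

    import java.util.concurrent.atomic.AtomicBoolean;

    // Illustrative guard flag, not the actual ResourceLockImpl implementation.
    public class RevalidationGuardSketch {

        private final AtomicBoolean revalidatingAfterDeletion = new AtomicBoolean(false);

        // Called when the metadata store reports that the lock znode was deleted.
        void onNodeDeleted(String path) {
            if (revalidatingAfterDeletion.get()) {
                // The deletion was triggered by our own delete-and-recreate step,
                // so do not invalidate the lock or schedule another revalidation.
                System.out.println("Ignoring self-inflicted deletion of " + path);
                return;
            }
            System.out.println("Lock on " + path + " was invalidated");
            // ... trigger revalidation here ...
        }

        // Wraps the delete-and-recreate step from observation 5 with the flag.
        void deleteAndRecreate(String path) {
            revalidatingAfterDeletion.set(true);
            try {
                // store.delete(...) and acquireWithNoRevalidation(...) would run here;
                // simulate the node-deleted notification arriving mid-revalidation.
                onNodeDeleted(path);
            } finally {
                revalidatingAfterDeletion.set(false);
            }
        }

        public static void main(String[] args) {
            RevalidationGuardSketch sketch = new RevalidationGuardSketch();
            sketch.deleteAndRecreate("/ledgers/available/bookie-0:3181"); // ignored
            sketch.onNodeDeleted("/ledgers/available/bookie-0:3181");     // genuine external deletion
        }
    }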

I didn't have time to test these ideas, but after writing up this issue, I think one or both of them could be the right direction.

Open questions

  • One of the details I couldn't account for is the number of updates. We see "Lock on resource /ledgers/available/bookie-0:3181 was invalidated" logged twice in a row, and later in the log (not shared here) I saw it three times in a row. This led me to test against ZK to see how persistent watches behave (a small test sketch follows this list), and it seems to me that we shouldn't be getting extra notifications. It could have to do with the way callbacks are handled in ResourceLockImpl#revalidate.
  • Why does the bookie think the data is already in ZK (see step 3 above)? This seems to indicate that the client gave up on the connection before the server, given that the znode is ephemeral. I don't have the relevant ZK logs to validate against my specific bk logs.
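
For reference, this is the kind of minimal persistent-watch check I mean, assuming a local ZooKeeper at localhost:2181 (the path name is arbitrary). With AddWatchMode.PERSISTENT, each change to the node should produce exactly one event:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.AddWatchMode;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class PersistentWatchCheck {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            String path = "/persistent-watch-check";

            // One persistent watch on the path; unlike one-shot watches, it does not
            // need to be re-registered after each event.
            zk.addWatch(path,
                    event -> System.out.println("event: " + event.getType() + " on " + event.getPath()),
                    AddWatchMode.PERSISTENT);

            zk.create(path, "v1".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            zk.setData(path, "v2".getBytes(), -1);
            zk.delete(path, -1);   // expect exactly three events: created, changed, deleted

            Thread.sleep(1000);    // give the event thread time to print
            zk.close();
        }
    }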
@michaeljmarshall michaeljmarshall added the type/bug label Sep 21, 2022
@michaeljmarshall michaeljmarshall changed the title [bug] Infinite Loop Observed for ResourceLockImpl after Zookeeper Connection Expired [bug] Infinite Loop Observed for ResourceLockImpl after Zookeeper Session Expired Sep 21, 2022
@mattisonchao
Member

mattisonchao commented Sep 21, 2022

2022-07-29T19:44:17,409+0000 [metadata-store-3-1] INFO org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated
2022-07-29T19:44:17,411+0000 [metadata-store-3-1] INFO org.apache.pulsar.metadata.coordination.impl.ResourceLockImpl - Lock on resource /ledgers/available/bookkeeper-0:3181 was invalidated

Looks like we have two locks on the same path in the same broker.
Do we have a heap dump to help confirm the lock state?

@eolivelli
Contributor

I added an implementation of the SerDe in this PR:
#17762

I found the problem addressed by #17762 while trying to understand this bug.
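
For context, here is a minimal round-trip sketch of the property a working SerDe needs: deserialize must reconstruct a value that compares equal to what was serialized, instead of returning null. This is illustrative only, not the actual BookieServiceInfo serde from #17762:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Illustrative round-trip serde for a plain String value; the real serde
    // works on BookieServiceInfo, but the equality property is the same.
    public class RoundTripSerdeSketch {

        static byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        static String deserialize(byte[] content) throws IOException {
            // Reconstruct the value instead of returning null, so the lock
            // revalidation's equality check can succeed.
            return new String(content, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) throws IOException {
            String original = "bookie-0:3181";
            String roundTripped = deserialize(serialize(original));
            System.out.println(original.equals(roundTripped)); // true
        }
    }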

@mattisonchao
Member

Thanks for your fix. @eolivelli

Given that we fell through step 3, I couldn't see why we weren't getting into this block (again, we don't see that log line). It's because the serde.deserialize implementation for BookieServiceInfo deserializes all objects to null. (This definitely seems like a bug, and I think fixing this would fix the bug for the rack awareness case but not for the general case.)

@mattisonchao
Member

mattisonchao commented Sep 26, 2022

After #17762 was merged, I think we have solved this problem, but it still carries some risk for the future.
@michaeljmarshall WDYT?

Given that we fell through step 3, I couldn't see why we weren't getting into this block (again, we don't see that log line). It's because the serde.deserialize implementation for BookieServiceInfo deserializes all objects to null. (This definitely seems like a bug, and I think fixing this would fix the bug for the rack awareness case but not for the general case.)
