Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][metadata] Set revalidateAfterReconnection true for certain failures #17664

Merged

Conversation

michaeljmarshall
Copy link
Member

@michaeljmarshall michaeljmarshall commented Sep 15, 2022

Motivation

In reading through the ResourceLockImpl, I noticed that revalidateAfterReconnection is always false. This seems like an oversight. It was introduced in #11886.

Modifications

  • Set revalidateAfterReconnection = true; when there is a failure during lock validation.

Verifying this change

I am not sure that this change will actually affect anything. In recently looking through the reconnection lo logs, I noticed the following two get logged together:

2022-07-29T19:44:16,904+0000 [metadata-store-coordination-service-5-1] INFO  org.apache.pulsar.metadata.coordination.impl.LockManagerImpl - Metadata store connection has been re-established. Revalidating locks that were pending.
2022-07-29T19:44:16,984+0000 [metadata-store-coordination-service-5-1] INFO  org.apache.pulsar.metadata.coordination.impl.LockManagerImpl - Metadata store session has been re-established. Revalidating all the existing locks.

These logs relate to the following:

if (se == SessionEvent.SessionReestablished) {
log.info("Metadata store session has been re-established. Revalidating all the existing locks.");
for (ResourceLockImpl<T> lock : locks.values()) {
futures.add(lock.revalidate(lock.getValue()));
}
} else if (se == SessionEvent.Reconnected) {
log.info("Metadata store connection has been re-established. Revalidating locks that were pending.");
for (ResourceLockImpl<T> lock : locks.values()) {
futures.add(lock.revalidateIfNeededAfterReconnection());
}
}

Note that the changes in this PR will affect how lock.revalidateIfNeededAfterReconnection() will run, but if the lock is going to be revalidated in a subsequent call because there is always another ZK notification that triggers the lock.revalidate(lock.getValue()), then I am not sure that this PR is important. I'll continue looking into this.

Documentation

  • doc-not-needed

@michaeljmarshall michaeljmarshall added type/bug The PR fixed a bug or issue reported a bug doc-not-needed Your PR changes do not impact docs area/metadata labels Sep 15, 2022
@michaeljmarshall michaeljmarshall added this to the 2.12.0 milestone Sep 15, 2022
@michaeljmarshall michaeljmarshall self-assigned this Sep 15, 2022
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! It was definitely a mistake :/

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! An additional test would be nice too.

@michaeljmarshall michaeljmarshall marked this pull request as ready for review September 15, 2022 05:38
@michaeljmarshall
Copy link
Member Author

Thanks for confirming!

@michaeljmarshall
Copy link
Member Author

Nice catch! An additional test would be nice too.

I can look at adding a test tomorrow.

@mattisonchao
Copy link
Member

mattisonchao commented Sep 15, 2022

I think this bug may cause forget to invalidate ownedBundlesCache, and make the same bundle have two owners in the cache.

@@ -196,6 +196,7 @@ synchronized void lockWasInvalidated() {
// We failed to revalidate the lock due to connectivity issue
Copy link
Member

@mattisonchao mattisonchao Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, I think we also need to handle LockBusyException here that throws by doRevalidate method, which can avoid forgetting to complete expiredFuture.

@eolivelli eolivelli merged commit c40c7ee into apache:master Sep 18, 2022
@Technoboy- Technoboy- modified the milestones: 2.12.0, 2.11.0 Sep 19, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2022
codelipenghui pushed a commit that referenced this pull request Sep 20, 2022
codelipenghui pushed a commit that referenced this pull request Sep 20, 2022
@michaeljmarshall michaeljmarshall deleted the fix-revalidate-after-reconnection branch September 28, 2022 17:49
aloyszhang pushed a commit to aloyszhang/pulsar that referenced this pull request Jan 9, 2023
…est !188)

Squash merge branch 'release_2.8.1.4_fix_metadata' into 'release-2.8.1.4'
Fixes #<xyz>

### Motivation

chery pick了三个PR:

PIP-45: Handle session events and invalidations from single thread (apache#12184)
[fix][metadata] Set revalidateAfterReconnection true for certain failures apache#17664
[fix][metadata] Cleanup state when lock revalidation gets LockBusyException apache#17700

TAPD: --story=881235137
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants