Fix update ledger list to znode version mismatch failed, ledger not delete #12015

hangc0276
Contributor

Motivation

When ZooKeeper throws the exception "Failed to update ledger list. z-node version mismatch. Closing managed ledger" while the ZNode list is being updated, the newly created ledger is not cleaned up. As a result, the new ledger is never indexed in the topic's managedLedger list and cannot be cleaned up by topic retention. In addition, the number of ZNodes in ZooKeeper keeps growing if the z-node version mismatch exception keeps being thrown. The relevant log lines are as follows:

10:44:29.017 [main-EventThread] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [test/test/persistent/test_v1-partition-4] Created new ledger 67311140
10:44:29.018 [bookkeeper-ml-workers-OrderedExecutor-2-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [test/test/persistent/test_v1-partition-4] Failed to update ledger list. z-node version mismatch. Closing managed ledger
10:44:29.018 [bookkeeper-ml-workers-OrderedExecutor-2-0] INFO  org.apache.pulsar.broker.service.Producer - Disconnecting producer: Producer{topic=PersistentTopic{topic=persistent://test/test/test_v1-partition-4}, client=/10.1.2.3:38938, producerName=pulsar-101-1123, producerId=20}

Modification

  1. When updating the ZNode list fails, delete the newly created ledger from the broker cache and BookKeeper, regardless of whether the exception is a BadVersionException.
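
As a rough illustration of the modification above, here is a minimal Java sketch. It is not the actual ManagedLedgerImpl code: the class `LedgerCleanupSketch`, the `FakeBookKeeper` stand-in, the exception classes declared here, and the `closed` flag are all hypothetical names introduced for the example. Only the idea comes from the description: on any failure to persist the ledger list, remove the new ledger from the broker-side list and delete it from BookKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class LedgerCleanupSketch {
    /** Hypothetical stand-ins for metadata-store failures. */
    static class MetaStoreException extends Exception {
        MetaStoreException(String msg) { super(msg); }
    }

    static class BadVersionException extends MetaStoreException {
        BadVersionException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for a BookKeeper client; it only records deleted ledger ids. */
    static class FakeBookKeeper {
        final List<Long> deleted = new ArrayList<>();
        void deleteLedger(long ledgerId) { deleted.add(ledgerId); }
    }

    final TreeMap<Long, String> ledgers = new TreeMap<>(); // broker-side ledger list (assumed shape)
    final FakeBookKeeper bookKeeper = new FakeBookKeeper();
    boolean closed = false;

    /** A new ledger was created and tentatively added to the broker-side list. */
    void onLedgerCreated(long ledgerId) {
        ledgers.put(ledgerId, "open");
    }

    /** Persisting the updated ledger list to the metadata store failed. */
    void onLedgerListUpdateFailed(long ledgerId, MetaStoreException e) {
        // Clean up for every failure type, so the new ledger is never left orphaned.
        ledgers.remove(ledgerId);
        bookKeeper.deleteLedger(ledgerId);
        if (e instanceof BadVersionException) {
            // A version mismatch additionally closes the managed ledger, as in the log above.
            closed = true;
        }
    }

    public static void main(String[] args) {
        LedgerCleanupSketch ml = new LedgerCleanupSketch();
        ml.onLedgerCreated(67311140L);
        ml.onLedgerListUpdateFailed(67311140L, new BadVersionException("z-node version mismatch"));
        System.out.println("ledgers=" + ml.ledgers + " deleted=" + ml.bookKeeper.deleted + " closed=" + ml.closed);
    }
}
```

In this sketch the cleanup is unconditional and only the decision to close depends on the exception type, which mirrors the "regardless of whether the exception is a BadVersionException" wording.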

@hangc0276 hangc0276 self-assigned this Sep 12, 2021
@hangc0276 hangc0276 added area/broker type/bug The PR fixed a bug or issue reported a bug release/2.7.4 release/2.8.2 labels Sep 12, 2021
@hangc0276 hangc0276 added this to the 2.9.0 milestone Sep 12, 2021
@eolivelli
Contributor

Good catch

I am not sure that this applies to 2.7.x.

return;
}
}

synchronized (ManagedLedgerImpl.this) {
Contributor

Could we add the exception check here? We could record the lastLedgerCreationFailureTimestamp.

Contributor Author

@gaoran10 I have added recording of lastLedgerCreationFailureTimestamp to the exception block. We can't merge the exception block here because of the metadataMutex unlock.

@@ -1466,6 +1452,20 @@ public void operationFailed(MetaStoreException e) {

metadataMutex.unlock();

if (e instanceof BadVersionException) {
Contributor

Do we need to put this logic before releasing the lock? Now we changed the previous behavior. I'm not sure if it will cause other thread safety issues

Contributor

+1

Contributor Author

Addressed.
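
One possible reading of the review thread above, sketched under assumptions: the cleanup and the lastLedgerCreationFailureTimestamp update run while the mutex is still held, and the mutex is released last. Here `ReentrantLock` is only a stand-in for the real metadataMutex type, and the class name, the `events` list, and the method names are hypothetical. This is an illustration of the ordering concern, not the merged change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class CleanupBeforeUnlockSketch {
    // Stand-in for metadataMutex; the real type in the broker may differ.
    final ReentrantLock metadataMutex = new ReentrantLock();
    final List<Long> ledgers = new ArrayList<>();
    final List<String> events = new ArrayList<>(); // records the order of steps for inspection
    long lastLedgerCreationFailureTimestamp = 0L;

    /** Take the mutex and tentatively add the new ledger before persisting the list. */
    void beginLedgerListUpdate(long ledgerId) {
        metadataMutex.lock();
        ledgers.add(ledgerId);
    }

    /** Failure callback: undo the tentative state first, release the mutex last. */
    void operationFailed(long ledgerId, long nowMillis) {
        try {
            ledgers.remove(Long.valueOf(ledgerId)); // remove by value, not by index
            lastLedgerCreationFailureTimestamp = nowMillis;
            events.add("cleanup");
        } finally {
            metadataMutex.unlock();
            events.add("unlock");
        }
    }

    public static void main(String[] args) {
        CleanupBeforeUnlockSketch s = new CleanupBeforeUnlockSketch();
        s.beginLedgerListUpdate(67311140L);
        s.operationFailed(67311140L, System.currentTimeMillis());
        System.out.println(s.events + " locked=" + s.metadataMutex.isLocked());
    }
}
```

Keeping the cleanup inside the locked region means no other thread can observe the ledger list with the orphaned entry still present after the mutex is free, which is the thread-safety concern raised in the comment.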

@Anonymitaet
Member

@hangc0276 Thanks for your contribution. For this PR, do we need to update docs?

(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@eolivelli eolivelli added the doc-not-needed Your PR changes do not impact docs label Sep 14, 2021
@eolivelli
Contributor

@Anonymitaet I added "no-need-docs", thanks

@eolivelli
Contributor

@hangc0276 can you please respond to the comments?
It would be good to see this patch in 2.9.0.

@hangc0276 hangc0276 force-pushed the chenhang/fix_znode_version_mismatch_ledger_not_delete_bug branch from 2a9b2a1 to bd74c58 Compare November 21, 2021 09:29
@codelipenghui codelipenghui merged commit e7b0e3d into apache:master Nov 29, 2021
codelipenghui pushed a commit that referenced this pull request Nov 29, 2021
(cherry picked from commit e7b0e3d)
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Nov 29, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request Nov 29, 2021
codelipenghui pushed a commit that referenced this pull request Dec 11, 2021
(cherry picked from commit e7b0e3d)
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label Dec 11, 2021
fxbing pushed a commit to fxbing/pulsar that referenced this pull request Dec 19, 2021
codelipenghui pushed a commit that referenced this pull request Dec 21, 2021
(cherry picked from commit e7b0e3d)
@codelipenghui codelipenghui added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Dec 21, 2021
Labels
area/broker cherry-picked/branch-2.7 Archived: 2.7 is end of life cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.7.4 release/2.8.2 release/2.9.2 type/bug The PR fixed a bug or issue reported a bug
7 participants