Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in concurrent schema deletion #11606

Conversation

wuzhanpeng
Copy link
Contributor

This PR may fixed #11605

Motivation

Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs.

Modifications

  1. Add existence checks before schema deletion in AbstractTopic#deleteSchema.
  2. Add existence checks before actually performing schema storage deletion in BookkeeperSchemaStorage#deleteSchema.
  3. Ignore NoNodeException in BookkeeperSchemaStorage#deleteSchema.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

Documentation

This PR does not need to update the docs.

@wuzhanpeng
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@wuzhanpeng
Copy link
Contributor Author

/pulsarbot run-failure-checks

@Anonymitaet Anonymitaet added the doc-not-needed Your PR changes do not impact docs label Aug 10, 2021
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there already a test case in which we try to delete the Schema but the ledger on BookKeeper has already been deleted ? if not, can you please add it ?

@@ -372,6 +374,21 @@ public String getReplicatorPrefix() {
String id = TopicName.get(base).getSchemaName();
SchemaRegistryService schemaRegistryService = brokerService.pulsar().getSchemaRegistryService();
return schemaRegistryService.getSchema(id)
.exceptionally(t -> {
if (t.getCause() != null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition looks a little bit hacky.

  1. we are only checking t.getCause(), IMHO we should have some utility method that traverses the full exception chain, looking for a SchemaException
  2. a "non recoverable" SchemaException is not enough to say that this is a "Schema does not exist" exception, can we introduce some specific SchemaException subclass or method like isRecoverable() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your reply!

For question 1, the condition is inspired by SchemaRegistryServiceImpl#trimDeletedSchemaAndGetList(java.lang.String), that is, those schemas with unrecoverable exceptions would be regarded as deleted schemas. Therefore here I continue to use the design concept.

For question 2, I agree that unrecoverable SchemaException is not equivant to "Schema does not exist". However in schema storage deletion scenario, an unrecoverable exception, which includes NoSuchLedgerException and NoSuchEntryException, is enough to say that the schema has already been deleted by others and we can quit the process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your clarification

What about creating an utility method and reduce code duplication?

return (forceFully ? CompletableFuture.completedFuture(null) : getSchema(schemaId))
return (forceFully ? CompletableFuture.completedFuture(null) : getSchema(schemaId).exceptionally(t -> {
if (t.getCause() != null
&& (t.getCause() instanceof SchemaException)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, at least we should have one utility method to detect this case

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

thank you

@wuzhanpeng
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@congbobo184 congbobo184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! could you please add some tests for it?

@wuzhanpeng
Copy link
Contributor Author

LGTM! could you please add some tests for it?

Sorry for the late. I supplemented a test case. PTAL~ @congbobo184

@wuzhanpeng
Copy link
Contributor Author

/pulsarbot run-failure-checks

1 similar comment
@wuzhanpeng
Copy link
Contributor Author

/pulsarbot run-failure-checks

Copy link
Contributor

@congbobo184 congbobo184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sijie sijie merged commit 43ded59 into apache:master Aug 17, 2021
codelipenghui pushed a commit that referenced this pull request Sep 9, 2021
This PR fixes #11605

### Motivation

Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs.

### Modifications

1. Add existence checks before schema deletion in `AbstractTopic#deleteSchema`.
2. Add existence checks before actually performing schema storage deletion in `BookkeeperSchemaStorage#deleteSchema`.
3. Ignore `NoNodeException` in `BookkeeperSchemaStorage#deleteSchema`.

(cherry picked from commit 43ded59)
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Sep 9, 2021
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Oct 27, 2021
This PR fixes apache#11605

### Motivation

Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs.

### Modifications

1. Add existence checks before schema deletion in `AbstractTopic#deleteSchema`.
2. Add existence checks before actually performing schema storage deletion in `BookkeeperSchemaStorage#deleteSchema`.
3. Ignore `NoNodeException` in `BookkeeperSchemaStorage#deleteSchema`.

(cherry picked from commit 43ded59)
(cherry picked from commit 0873c09)
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
This PR fixes apache#11605

### Motivation

Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs.

### Modifications

1. Add existence checks before schema deletion in `AbstractTopic#deleteSchema`.
2. Add existence checks before actually performing schema storage deletion in `BookkeeperSchemaStorage#deleteSchema`.
3. Ignore `NoNodeException` in `BookkeeperSchemaStorage#deleteSchema`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition in concurrent schema deletion
6 participants