New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in concurrent schema deletion #11606
Fix race condition in concurrent schema deletion #11606
Conversation
/pulsarbot run-failure-checks |
1 similar comment
/pulsarbot run-failure-checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there already a test case in which we try to delete the Schema but the ledger on BookKeeper has already been deleted ? if not, can you please add it ?
@@ -372,6 +374,21 @@ public String getReplicatorPrefix() { | |||
String id = TopicName.get(base).getSchemaName(); | |||
SchemaRegistryService schemaRegistryService = brokerService.pulsar().getSchemaRegistryService(); | |||
return schemaRegistryService.getSchema(id) | |||
.exceptionally(t -> { | |||
if (t.getCause() != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition looks a little bit hacky.
- we are only checking
t.getCause()
, IMHO we should have some utility method that traverses the full exception chain, looking for a SchemaException - a "non recoverable" SchemaException is not enough to say that this is a "Schema does not exist" exception, can we introduce some specific SchemaException subclass or method like
isRecoverable()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your reply!
For question 1, the condition is inspired by SchemaRegistryServiceImpl#trimDeletedSchemaAndGetList(java.lang.String)
, that is, those schemas with unrecoverable exceptions would be regarded as deleted schemas. Therefore here I continue to use the design concept.
For question 2, I agree that unrecoverable SchemaException is not equivant to "Schema does not exist". However in schema storage deletion scenario, an unrecoverable exception, which includes NoSuchLedgerException
and NoSuchEntryException
, is enough to say that the schema has already been deleted by others and we can quit the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your clarification
What about creating an utility method and reduce code duplication?
return (forceFully ? CompletableFuture.completedFuture(null) : getSchema(schemaId)) | ||
return (forceFully ? CompletableFuture.completedFuture(null) : getSchema(schemaId).exceptionally(t -> { | ||
if (t.getCause() != null | ||
&& (t.getCause() instanceof SchemaException) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, at least we should have one utility method to detect this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
thank you
/pulsarbot run-failure-checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! could you please add some tests for it?
…tion_in_concurrent_schema_deletion
…tion_in_concurrent_schema_deletion
Sorry for the late. I supplemented a test case. PTAL~ @congbobo184 |
/pulsarbot run-failure-checks |
1 similar comment
/pulsarbot run-failure-checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This PR fixes #11605 ### Motivation Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs. ### Modifications 1. Add existence checks before schema deletion in `AbstractTopic#deleteSchema`. 2. Add existence checks before actually performing schema storage deletion in `BookkeeperSchemaStorage#deleteSchema`. 3. Ignore `NoNodeException` in `BookkeeperSchemaStorage#deleteSchema`. (cherry picked from commit 43ded59)
This PR fixes apache#11605 ### Motivation Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs. ### Modifications 1. Add existence checks before schema deletion in `AbstractTopic#deleteSchema`. 2. Add existence checks before actually performing schema storage deletion in `BookkeeperSchemaStorage#deleteSchema`. 3. Ignore `NoNodeException` in `BookkeeperSchemaStorage#deleteSchema`. (cherry picked from commit 43ded59) (cherry picked from commit 0873c09)
This PR fixes apache#11605 ### Motivation Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs. ### Modifications 1. Add existence checks before schema deletion in `AbstractTopic#deleteSchema`. 2. Add existence checks before actually performing schema storage deletion in `BookkeeperSchemaStorage#deleteSchema`. 3. Ignore `NoNodeException` in `BookkeeperSchemaStorage#deleteSchema`.
This PR may fixed #11605
Motivation
Concurrently deleting topics with the same schema may cause race condition in broker side. If we do not handle these scenarios correctly we will get unexpected exceptions in broker logs.
Modifications
AbstractTopic#deleteSchema
.BookkeeperSchemaStorage#deleteSchema
.NoNodeException
inBookkeeperSchemaStorage#deleteSchema
.Does this pull request potentially affect one of the following parts:
If
yes
was chosen, please highlight the changesDocumentation
This PR does not need to update the docs.