Fix delete namespace with 'Cannot delete non empty bundle' issue. #13337

Closed
Technoboy- wants to merge 5 commits into apache:master from Technoboy-:avoid-system-topic-check

Conversation

@Technoboy-
Contributor

@Technoboy- Technoboy- commented Dec 15, 2021

Fix #10263.

Motivation

When a namespace contains no user-created topics, deleting it should succeed. Currently, however, a system topic may still exist, and its reader/producer can auto-create the system topic again, which may cause the namespace deletion to fail.

Modifications

  • Close system topic readers.

Documentation

Need to update docs?

  • no-need-doc

@Technoboy- Technoboy- self-assigned this Dec 15, 2021
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 15, 2021
@Technoboy-
Contributor Author

/pulsarbot run-failure-checks

Contributor

@Jason918 Jason918 left a comment

LGTM

@shibd
Member

shibd commented Dec 16, 2021

@Technoboy- Maybe we should not skip the system topic check here. I tested it, and there may be problems.

  1. Deleting a namespace may return 500
➜  bin git:(master) ✗ ./pulsar-admin namespaces delete sample/ns1
2021-12-16T16:51:07,550+0800 [AsyncHttpClient-7-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/namespaces/sample/ns1?force=false] Failed to perform http delete request: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
HTTP 500 Internal Server Error
  2. The namespace can still be found after deleting it
➜  bin git:(master) ✗ ./pulsar-admin topics list sample/ns1
persistent://sample/ns1/__change_events
  3. The broker may print error logs
2021-12-16T16:49:04,065+0800 [pulsar-io-18-23] WARN  org.apache.pulsar.broker.web.PulsarWebResource - Policies not found for sample/ns1 namespace
2021-12-16T16:49:04,065+0800 [pulsar-io-18-23] WARN  org.apache.pulsar.broker.service.ServerCnx - Failed to get Partitioned Metadata [/127.0.0.1:56739] persistent://sample/ns1/__change_events: Policies not found for sample/ns1 namespace
org.apache.pulsar.broker.web.RestException: Policies not found for sample/ns1 namespace
	at org.apache.pulsar.broker.web.PulsarWebResource.lambda$checkLocalOrGetPeerReplicationCluster$12(PulsarWebResource.java:774) ~[classes/:?]
	at java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731) ~[?:?]
	at java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108) ~[?:?]
	at org.apache.pulsar.broker.web.PulsarWebResource.checkLocalOrGetPeerReplicationCluster(PulsarWebResource.java:745) ~[classes/:?]

I debugged the code.

In fact, there is an attempt to delete the system topic when deleting the namespace:

topicOptional.ifPresent(systemTopic -> futures.add(systemTopic.deleteForcefully()));

But internal consumers and producers will reconnect, which causes the topic to be auto-created again.

Should we shut down these internal consumers (and producers) first, to ensure that the system topic is deleted correctly before deleting the bundle?

Or should we redirect directly to the deletenamespacebundleforcefully method? See #10263 (comment).

For your reference. If there is a problem, we can continue to discuss it. Thanks.

@Technoboy-
Contributor Author

Thanks. I will re-check it and respond later.

Contributor

@codelipenghui codelipenghui left a comment

When deleting a namespace, we need to close the reader of the system topic first, then delete the system topic, and finally delete the namespace.

@codelipenghui codelipenghui added this to the 2.10.0 milestone Dec 17, 2021
@Technoboy- Technoboy- force-pushed the avoid-system-topic-check branch from a13b81a to e28e977 Compare December 17, 2021 09:22
@Technoboy- Technoboy- changed the title Add system topic check when delete namespace bundle. Fix delete namespace with 'Cannot delete non empty bundle' issue. Dec 17, 2021
@Technoboy-
Contributor Author

Hi @shibd,
Sorry for the late response.
The root issue is that, when deleting a namespace, deleteNamespaceBundleAsync may execute before systemTopic.deleteForcefully(). deleteNamespaceBundleAsync checks the namespace bundle, finds that the system topic reader owns the bundle, and then throws 'Cannot delete non empty bundle'.
If the two requests execute in order (systemTopic.deleteForcefully() -> deleteNamespaceBundleAsync), systemTopic.deleteForcefully() will close the reader first and then delete the topic. So

List<String> topics = pulsar().getNamespaceService().getListOfPersistentTopics(namespaceName)
will not contain the topic, and no exception is thrown.
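
To make the ordering argument concrete, here is a minimal sketch in plain Java. It is a toy model, not broker code: the class BundleDeleteOrdering and its methods deleteSystemTopic and deleteBundle are hypothetical stand-ins for systemTopic.deleteForcefully() and deleteNamespaceBundleAsync, and the topic set stands in for the list of persistent topics. It only illustrates that the bundle check fails when it runs before the topic deletion and passes when the two steps are chained.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Toy model only: these names are illustrative assumptions, not Pulsar APIs.
class BundleDeleteOrdering {
    // Stand-in for the topics currently listed under the namespace.
    final Set<String> topics = new HashSet<>();

    BundleDeleteOrdering() {
        topics.add("persistent://sample/ns1/__change_events");
    }

    // Stand-in for the system topic deletion: removes the topic from the listing.
    CompletableFuture<Void> deleteSystemTopic() {
        topics.clear();
        return CompletableFuture.completedFuture(null);
    }

    // Stand-in for the bundle deletion: fails if any topic is still listed.
    CompletableFuture<Void> deleteBundle() {
        CompletableFuture<Void> result = new CompletableFuture<>();
        if (topics.isEmpty()) {
            result.complete(null);
        } else {
            result.completeExceptionally(
                    new IllegalStateException("Cannot delete non empty bundle"));
        }
        return result;
    }

    // Bundle check runs first: it still sees the system topic and fails.
    static boolean bundleFirstFails() {
        BundleDeleteOrdering ns = new BundleDeleteOrdering();
        CompletableFuture<Void> bundle = ns.deleteBundle();
        ns.deleteSystemTopic();
        return bundle.isCompletedExceptionally();
    }

    // Topic deletion is chained before the bundle check: the check passes.
    static boolean topicFirstSucceeds() {
        BundleDeleteOrdering ns = new BundleDeleteOrdering();
        CompletableFuture<Void> done =
                ns.deleteSystemTopic().thenCompose(ignore -> ns.deleteBundle());
        return done.isDone() && !done.isCompletedExceptionally();
    }
}
```

Chaining with thenCompose, rather than collecting both futures into one list and waiting on all of them, is what enforces the order in this sketch.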

@shibd
Member

shibd commented Dec 18, 2021

@Technoboy- Yes, out-of-order execution is one of the problems.

But the systemTopic.deleteForcefully() method will not close the reader. In the source code, the following two pieces of code will trigger re-creation of the system topic.

  1. When the client is disconnected, it will reconnect, resulting in the topic being auto-created.

List<CompletableFuture<Void>> futures = Lists.newArrayList();
replicators.forEach((cluster, replicator) -> futures.add(replicator.disconnect()));
producers.values().forEach(producer -> futures.add(producer.disconnect()));
subscriptions.forEach((s, sub) -> futures.add(sub.disconnect()));

  2. The code will create a TopicPoliciesSystemTopicClient, resulting in the topic being auto-created.

I tested your code, but I still can't delete the namespace correctly. I think it may be related to the above two reasons.
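
The reconnect behavior described above can be sketched with a small toy model. Everything here is an assumption made for illustration: Broker, Reader, connect, deleteForcefully and onDisconnected are hypothetical names, and the model only captures two ideas from this thread, namely that connecting to a missing topic auto-creates it and that a disconnected (but not closed) reader reconnects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: illustrative stand-ins, not Pulsar classes.
class ReconnectRecreatesTopic {
    static final String TOPIC = "persistent://sample/ns1/__change_events";

    // Toy broker with topic auto-creation enabled.
    static class Broker {
        final Set<String> topics = new HashSet<>();
        final Set<Reader> readers = new HashSet<>();

        void connect(Reader reader, String topic) {
            topics.add(topic);      // connecting auto-creates a missing topic
            readers.add(reader);
        }

        // Force delete: disconnect attached readers and drop the topic.
        void deleteForcefully(String topic) {
            List<Reader> disconnected = new ArrayList<>(readers);
            readers.clear();
            topics.remove(topic);
            for (Reader reader : disconnected) {
                reader.onDisconnected(); // readers that are still open reconnect
            }
        }
    }

    static class Reader {
        private final Broker broker;
        private boolean closed;

        Reader(Broker broker) {
            this.broker = broker;
            broker.connect(this, TOPIC);
        }

        void close() {
            closed = true;
            broker.readers.remove(this);
        }

        void onDisconnected() {
            if (!closed) {
                broker.connect(this, TOPIC);
            }
        }
    }

    // Returns whether the topic exists again after the force delete.
    static boolean topicExistsAfterDelete(boolean closeReaderFirst) {
        Broker broker = new Broker();
        Reader reader = new Reader(broker);
        if (closeReaderFirst) {
            reader.close();
        }
        broker.deleteForcefully(TOPIC);
        return broker.topics.contains(TOPIC);
    }
}
```

In this model the topic comes back unless the reader is closed before the delete, which matches the order suggested earlier in the review: close the reader, delete the system topic, then delete the namespace.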

PTAL.

@Technoboy-
Contributor Author

Yes, indeed. I will update it later.

@Technoboy- Technoboy- force-pushed the avoid-system-topic-check branch from 036d6c5 to 17f2125 Compare December 21, 2021 12:52
@Technoboy-
Contributor Author

/pulsarbot run-failure-checks

@Technoboy- Technoboy- force-pushed the avoid-system-topic-check branch from 9b2280d to 6af0293 Compare December 29, 2021 13:25
@mattisonchao
Member

/pulsarbot rerun-failure-checks

@codelipenghui
Contributor

@shibd Could you please help review this PR again?

@shibd
Member

shibd commented Jan 12, 2022

Hmm, I am running in standalone mode. I tested the latest code and there are still some problems.

  1. Deleting a namespace may return 500
➜  bin git:(master) ✗ ./pulsar-admin namespaces delete sample/ns1
2021-12-16T16:51:07,550+0800 [AsyncHttpClient-7-1] WARN  org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/namespaces/sample/ns1?force=false] Failed to perform http delete request: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
HTTP 500 Internal Server Error
  2. The namespace can still be found after deleting it
➜  bin git:(master) ✗ ./pulsar-admin topics list sample/ns1
persistent://sample/ns1/__change_events
  3. The broker prints error logs
2022-01-12T19:53:57,495+0800 [bookkeeper-ml-scheduler-OrderedScheduler-0-0] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics under namespace sample/ns1/0x00000000_0x40000000
java.util.concurrent.CompletionException: java.lang.NullPointerException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:791) ~[?:?]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.72.Final.jar:4.1.72.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.NullPointerException
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1386) ~[classes/:?]
	at org.apache.pulsar.broker.service.persistent.SystemTopic.checkReplication(SystemTopic.java:62) ~[classes/:?]
	at org.apache.pulsar.broker.service.BrokerService$2.lambda$openLedgerComplete$0(BrokerService.java:1377) ~[classes/:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?]
	... 11 more
2022-01-12T19:53:57,495+0800 [bookkeeper-ml-scheduler-OrderedScheduler-0-0] WARN  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:53755][persistent://sample/ns1/__change_events][reader-76d8cfb6b0] Failed to create consumer: consumerId=0, null
java.util.concurrent.CompletionException: java.lang.NullPointerException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1081) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:791) ~[?:?]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.72.Final.jar:4.1.72.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.lang.NullPointerException
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.checkReplication(PersistentTopic.java:1386) ~[classes/:?]
	at org.apache.pulsar.broker.service.persistent.SystemTopic.checkReplication(SystemTopic.java:62) ~[classes/:?]
	at org.apache.pulsar.broker.service.BrokerService$2.lambda$openLedgerComplete$0(BrokerService.java:1377) ~[classes/:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?]
	... 11 more

I thought about it: terminating the topic may not solve the problem, because terminating a topic does not prevent it from being created again.

Currently, after a topic is terminated, only producer connections are rejected. Consumers can still connect and consume, so the topic will still be auto-created.
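
A minimal sketch of why termination alone does not help, under the behavior described in this comment. TerminatedTopicModel and its methods are hypothetical names for a toy model, not Pulsar APIs: a terminated topic rejects producers but accepts consumers, and any connection to a missing topic auto-creates it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: illustrative names, not Pulsar APIs.
class TerminatedTopicModel {
    // Maps each existing topic to its "terminated" flag.
    private final Map<String, Boolean> terminatedByTopic = new HashMap<>();

    // Looks up the flag, auto-creating the topic (not terminated) if missing.
    private boolean isTerminatedOrCreate(String topic) {
        return terminatedByTopic.computeIfAbsent(topic, t -> false);
    }

    void terminate(String topic) {
        terminatedByTopic.put(topic, true);
    }

    void delete(String topic) {
        terminatedByTopic.remove(topic);
    }

    boolean exists(String topic) {
        return terminatedByTopic.containsKey(topic);
    }

    // Producers are rejected once the topic is terminated.
    boolean connectProducer(String topic) {
        return !isTerminatedOrCreate(topic);
    }

    // Consumers are always accepted, and connecting auto-creates the topic.
    boolean connectConsumer(String topic) {
        isTerminatedOrCreate(topic);
        return true;
    }
}
```

In this model, a consumer that reconnects after terminate followed by delete brings the topic back, so the bundle is non-empty again when the namespace deletion checks it.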

@shibd
Member

shibd commented Jan 12, 2022

Screenshot of supplementary test:

image

image

@gaoran10
Contributor

Hi @Technoboy-, could you take a look at @shibd's comments?

@Technoboy-
Contributor Author

Yes, we may need some other solution for it.

@michaeljmarshall
Member

Removing release label and milestone since the PR is closed. Please re-add if we need to open the PR.

@michaeljmarshall michaeljmarshall removed this from the 2.10.0 milestone Feb 11, 2022
@Technoboy- Technoboy- deleted the avoid-system-topic-check branch August 10, 2022 05:51
Labels

area/broker doc-not-needed Your PR changes do not impact docs

Development

Successfully merging this pull request may close these issues.

Cannot delete non empty bundle

7 participants