Skip topics with remote replication producers in topic inactivity check #16622
Question:
|
@mattisonchao
This PR doesn't introduce a breaking change.
No. |
|
@lhotari |
- currently there's a problem that the replicators get shut down in the inactivity check - this is an improved fix for the issue described in apache#11382
…call - the test is flaky and this change could help resolve the issue
Force-pushed from 65e59c8 to 76b6eae.
|
@merlimat @codelipenghui Please review |
// For global namespace, close repl producers first.
// Once all repl producers are closed, we can delete the topic,
// provided no remote producers connected to the broker.
From this comment, it seems we have to close the repl producer first. Otherwise, in a two-way replication scenario, we can reach a "deadlock" in which the topic is never deleted, even if it has no backlog.
I'm not sure whether this is the behaviour you expect or whether I'm missing some replicator close logic.
Suppose we have two clusters, cluster A and cluster B, with two-way replication enabled. Cluster A then holds a remote producer from B, and cluster B holds a remote producer from A.
With the previous logic, when cluster A's replicator has no more backlog, we close it, so cluster B no longer has remote producer A. In the next round of the GC check, if cluster B also has no backlog, cluster B's replicator is closed and cluster A loses cluster B's remote producer as well. In the round after that, we clean up the topic on both clusters.
With this PR's change, however, two clusters with two-way replication keep holding each other's remote producers. The topic is then stuck in a circular dependency, and the GC check will never delete it until one of the replicators is closed.
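To make the scenario above concrete, here is a minimal sketch of the two behaviours as a toy simulation. It is not Pulsar code: `GeoTopicGcSimulation`, `Cluster`, `gcCheck`, and `roundsUntilDeleted` are hypothetical names, both topics are assumed to have no backlog and no local clients, and each GC round simply checks cluster A and then cluster B.

```java
/**
 * Toy model of the inactive-topic GC check on two clusters with two-way
 * replication. All names here are hypothetical; this is not broker code.
 */
final class GeoTopicGcSimulation {

    /** One cluster's view of the replicated topic. */
    static final class Cluster {
        final String name;
        Cluster peer;
        boolean replicatorOpen = true; // local replicator producing to the peer
        boolean topicExists = true;

        Cluster(String name) {
            this.name = name;
        }

        /** The peer's open replicator shows up here as a remote producer. */
        boolean hasRemoteProducer() {
            return peer.topicExists && peer.replicatorOpen;
        }
    }

    /** One inactivity check on one cluster (backlog is assumed to be empty). */
    static void gcCheck(Cluster c, boolean skipWhenRemoteProducerPresent) {
        if (!c.topicExists) {
            return;
        }
        if (skipWhenRemoteProducerPresent && c.hasRemoteProducer()) {
            return; // behaviour under discussion: skip the topic entirely
        }
        if (c.replicatorOpen) {
            c.replicatorOpen = false; // close repl producers first
            return;                   // deletion is left to a later round
        }
        if (!c.hasRemoteProducer()) {
            c.topicExists = false;    // no remote producers left: delete
        }
    }

    /** Returns the round in which both topics are gone, or -1 if never. */
    static int roundsUntilDeleted(boolean skipWhenRemoteProducerPresent, int maxRounds) {
        Cluster a = new Cluster("A");
        Cluster b = new Cluster("B");
        a.peer = b;
        b.peer = a;
        for (int round = 1; round <= maxRounds; round++) {
            for (Cluster c : new Cluster[] {a, b}) {
                gcCheck(c, skipWhenRemoteProducerPresent);
            }
            if (!a.topicExists && !b.topicExists) {
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("close replicator first: " + roundsUntilDeleted(false, 10));
        System.out.println("skip on remote producer: " + roundsUntilDeleted(true, 10));
    }
}
```

In this simplified model the "close replicator first" variant deletes both topics after a few rounds, while the "skip" variant returns -1 because each side keeps seeing the other's remote producer. The exact number of rounds depends on how the checks interleave across clusters, so only the difference between terminating and never terminating should be read from it.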
+1
I have the same concern.
Do we have a test covering the deletion of inactive geo-replicated topics?
The CI passes.
I can add a test to cover this behaviour if we need it.
+1, @mattisonchao
codelipenghui left a comment
It would be better to define clearly how we expect the inactive topic check to work with geo-replicated topics.
I think this change will cause the inactive topic check to always skip topics that have geo-replication enabled.
|
The PR had no activity for 30 days; marked with the Stale label. |
|
@lhotari Please add the following content to your PR description and select a checkbox: |
Motivation
Modifications
Additional context
An alternative solution is to disable delete-while-inactive for namespaces that are replicated.