Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] [broker] Replication stopped due to unload topic failed #21947

Merged
merged 2 commits into from
Jan 25, 2024

Conversation

poorbarcode
Copy link
Contributor

Motivation

Steps to reproduce the issue

  • Enable replication.
  • Send 10 messages to the local cluster then close the producer.
  • Call pulsar-admin topics unload <topic> and get an error due to the internal producer of the replicator close failing.
  • The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

Root cause

Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode self-assigned this Jan 22, 2024
@poorbarcode poorbarcode added release/2.10.6 category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost release/3.0.3 release/2.11.4 release/3.1.3 labels Jan 22, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 22, 2024
@poorbarcode poorbarcode added this to the 3.3.0 milestone Jan 22, 2024
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b14fcb4) 73.60% compared to head (75937fc) 73.65%.
Report is 3 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21947      +/-   ##
============================================
+ Coverage     73.60%   73.65%   +0.04%     
- Complexity    32411    32437      +26     
============================================
  Files          1861     1861              
  Lines        138674   138673       -1     
  Branches      15182    15182              
============================================
+ Hits         102070   102138      +68     
+ Misses        28702    28649      -53     
+ Partials       7902     7886      -16     
Flag Coverage Δ
inttests 24.13% <0.00%> (-0.07%) ⬇️
systests 23.69% <0.00%> (-0.03%) ⬇️
unittests 72.93% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ache/pulsar/broker/service/AbstractReplicator.java 80.00% <100.00%> (+3.96%) ⬆️

... and 76 files with indirect coverage changes

@poorbarcode poorbarcode merged commit 200c5f4 into apache:master Jan 25, 2024
65 of 68 checks passed
@Technoboy- Technoboy- modified the milestones: 3.3.0, 3.2.0 Jan 27, 2024
Technoboy- pushed a commit that referenced this pull request Jan 27, 2024
### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing. 
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.
Technoboy- pushed a commit that referenced this pull request Jan 31, 2024
### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing. 
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.
Technoboy- pushed a commit that referenced this pull request Feb 5, 2024
### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing. 
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.
Technoboy- pushed a commit that referenced this pull request Feb 5, 2024
### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing. 
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.
nodece pushed a commit to nodece/pulsar that referenced this pull request Feb 23, 2024
…#21947)

### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing. 
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.
nodece pushed a commit to nodece/pulsar that referenced this pull request Feb 23, 2024
poorbarcode added a commit that referenced this pull request Feb 28, 2024
### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing.
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.

(cherry picked from commit 49f6a9f)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 1, 2024
…#21947)

### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing.
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.

(cherry picked from commit 291bbb5)
mukesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Mar 6, 2024
…#21947)

### Motivation

**Steps to reproduce the issue**
- Enable replication.
- Send `10` messages to the local cluster then close the producer.
- Call `pulsar-admin topics unload <topic>` and get an error due to the internal producer of the replicator close failing.
- The topic closed failed, so we assumed the topic could work as expected, but the replication stopped.

**Root cause**
- `pulsar-admin topics unload <topic>`  will wait for the clients(including `consumers & producers & replicators`) to close successfully, and it will fail if clients can not be closed successfully.
- `replicator.producer` close failed causing the Admin API to fail, but there is a scheduled task that will retry to close `replicator.producer` which causes replication to stop. see https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractReplicator.java#L209

### Modifications

since the "replicator.producer.closeAsync()" will retry after it fails, the topic unload should be successful.

(cherry picked from commit 291bbb5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants