[Bug] Producer synchronous retries can cause retry sendAsync future to never complete

### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.


### Read release policy

- [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.


### User environment

- master

### Issue Description

There is a reentrancy bug in the Pulsar producer send path where `pendingMessages.clear()` can be executed after a retry message has already been added to `pendingMessages`. This results in the retry send’s CompletableFuture never being completed.

This can occur when a retry `sendAsync` is triggered synchronously from within a `handle` callback of a failed send, while the thread failing that send still holds the producer mutex.

This happens in the [failPendingMessages](https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L2315) method, which usually runs on the timer thread.
Because [pendingMessages.clear()](https://github.com/apache/pulsar/blob/d630394cdd02792b2dbc3a55443637a5d593a137/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java#L2327) is called after `completeExceptionally`, retry logic like the code below first adds `retryMessage` to `pendingMessages`, and the subsequent `clear()` then removes it before it is ever sent or completed.

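```java
CompletableFuture<MessageId> firstSend = producer.sendAsync(message);

// handle (not handleAsync) runs the callback synchronously on the thread
// that completes firstSend, i.e. inside failPendingMessages.
CompletableFuture<MessageId> retrySend =
        firstSend.handle((msgId, ex) -> {
            assertNotNull(ex, "First send must timeout");
            assertTrue(ex instanceof PulsarClientException.TimeoutException);
            // Reentrant send: enqueued before pendingMessages.clear() runs.
            return producer.sendAsync(retryMessage);
        }).thenCompose(f -> f);
```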
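
The ordering problem can be modelled without Pulsar. The following is a minimal sketch, not the actual `ProducerImpl` code: `ReentrantClearDemo`, `failPendingBuggy`, `failPendingDrainFirst` and `pendingAfterRetry` are hypothetical names, and a plain `ArrayDeque` stands in for `pendingMessages`. It shows that a retry enqueued from a synchronous callback is dropped when the queue is cleared after the futures are completed, and that detaching the queue contents first is one possible ordering that keeps the retry tracked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Toy stand-in for the producer; none of these names are Pulsar internals.
final class ReentrantClearDemo {
    private final Queue<CompletableFuture<String>> pending = new ArrayDeque<>();

    // Enqueue a send and hand back its future (never completed in this toy).
    synchronized CompletableFuture<String> sendAsync(String msg) {
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    // Ordering described in the issue: complete futures first, clear afterwards.
    synchronized void failPendingBuggy(Exception cause) {
        // Iterate over a copy so reentrant adds do not disturb the loop.
        List<CompletableFuture<String>> snapshot = new ArrayList<>(pending);
        // Synchronous callbacks run here and may call sendAsync reentrantly.
        snapshot.forEach(f -> f.completeExceptionally(cause));
        // Also drops anything a callback enqueued a moment ago.
        pending.clear();
    }

    // Assumed alternative ordering: detach the entries, then complete them.
    synchronized void failPendingDrainFirst(Exception cause) {
        List<CompletableFuture<String>> snapshot = new ArrayList<>(pending);
        pending.clear();
        snapshot.forEach(f -> f.completeExceptionally(cause));
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Returns how many sends are still tracked after a synchronous retry.
    static int pendingAfterRetry(boolean buggy) {
        ReentrantClearDemo p = new ReentrantClearDemo();
        CompletableFuture<String> first = p.sendAsync("m1");
        // handle runs on the completing thread, so the retry is reentrant.
        first.handle((id, ex) -> p.sendAsync("retry"));
        RuntimeException timeout = new RuntimeException("simulated timeout");
        if (buggy) {
            p.failPendingBuggy(timeout);
        } else {
            p.failPendingDrainFirst(timeout);
        }
        return p.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("buggy ordering, tracked sends: " + pendingAfterRetry(true));
        System.out.println("drain-first ordering, tracked sends: " + pendingAfterRetry(false));
    }
}
```

With the buggy ordering the retry is no longer tracked, so nothing will ever complete its future. The drain-first variant is only an illustration of why the ordering matters; it is not claimed to be the fix the project will adopt.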

### Error messages

```text

```

### Reproducing the issue

Set a low send timeout on the producer (for example via `ProducerBuilder.sendTimeout`) and retry synchronously from the failed send's callback, as in the example above.

### Additional information

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

[Bug] Producer synchronous retries can cause retry sendAsync future to never complete #25201
