Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed NPE when closing batch during a reconnection #4427

Merged
merged 1 commit into from
Jun 1, 2019

Conversation

merlimat
Copy link
Contributor

Motivation

There is a race condition when closing a batch and the producer is getting disconnected. This has been seen in some of the tests run and can be reproduced by running this particular test many times in a loop.

2019-05-31\T\16:32:49.647 [ERROR] testProducerReconnect(org.apache.pulsar.client.api.ClientErrorsTest)  Time elapsed: 0.034 s  <<< FAILURE!
org.apache.pulsar.client.api.PulsarClientException: java.lang.NullPointerException
	at org.apache.pulsar.client.impl.ProducerImpl.batchMessageAndSend(ProducerImpl.java:1354)
	at org.apache.pulsar.client.impl.ProducerImpl.triggerFlush(ProducerImpl.java:1282)
	at org.apache.pulsar.client.impl.ProducerBase.send(ProducerBase.java:82)
	at org.apache.pulsar.client.impl.TypedMessageBuilderImpl.send(TypedMessageBuilderImpl.java:61)
	at org.apache.pulsar.client.impl.ProducerBase.send(ProducerBase.java:52)
	at org.apache.pulsar.client.api.ClientErrorsTest.testProducerReconnect(ClientErrorsTest.java:600)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:124)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:54)
	at org.testng.internal.InvokeMethodRunnable.run(InvokeMethodRunnable.java:44)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
	at org.apache.pulsar.client.impl.ProducerImpl.batchMessageAndSend(ProducerImpl.java:1329)
	... 17 more

The problem is that we're checking isConnected() and then using the cnx() which can return null if the producer is disconnected. Rather, we should first cache the reference to the ClientCnx instance and then double check if we're connected.

The same pattern is already applied for the non-batching scenario:

ClientCnx cnx = cnx();
if (isConnected()) {
// If we do have a connection, the message is sent immediately, otherwise we'll try again once a
// new
// connection is established
cmd.retain();
cnx.ctx().channel().eventLoop().execute(WriteInEventLoopCallback.create(this, cnx, op));

@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label May 31, 2019
@merlimat merlimat added this to the 2.4.0 milestone May 31, 2019
@merlimat merlimat self-assigned this May 31, 2019
@merlimat
Copy link
Contributor Author

run java8 tests

2 similar comments
@merlimat
Copy link
Contributor Author

run java8 tests

@merlimat
Copy link
Contributor Author

run java8 tests

@merlimat merlimat merged commit 2a5b772 into apache:master Jun 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants