Skip to content

[FlakyTest]Try to fix flaky test ReplicatorTest.testReplicatorOnPartitionedTopic.#9659

Merged
codelipenghui merged 4 commits intoapache:masterfrom
MarvinCai:zxc/replicator-test-fix
Feb 25, 2021
Merged

[FlakyTest]Try to fix flaky test ReplicatorTest.testReplicatorOnPartitionedTopic.#9659
codelipenghui merged 4 commits intoapache:masterfrom
MarvinCai:zxc/replicator-test-fix

Conversation

@MarvinCai
Copy link
Contributor

@MarvinCai MarvinCai commented Feb 21, 2021

Fixes #9457

Motivation

Not sure why state is not cleaned up but exception is indicating metadata of namespace the test trying to create already exist so use a different namespace name for different test run should able to fix it.

Modifications

Append System.nanoTime to the namespace used to avoid being affect by previous state. This seems working fine for other test cases in the same test.

Try to make the test stable by append System.nanoTime to namespace it creates to avoid "Namespace already exists" exception.
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually that error is due to the test retry policy and it hides the real error.

This fix cannot be enough to fix the flakyness.

Do you have the real error in the test? That is the logs of the first run that failed?

@MarvinCai
Copy link
Contributor Author

Usually that error is due to the test retry policy and it hides the real error.

This fix cannot be enough to fix the flakyness.

Do you have the real error in the test? That is the logs of the first run that failed?

@eolivelli Sample stack trace is

org.apache.pulsar.client.admin.PulsarAdminException$ConflictException: Namespace already exists
	at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:219)
	at org.apache.pulsar.client.admin.internal.BaseResource$1.failed(BaseResource.java:129)
	at org.glassfish.jersey.client.JerseyInvocation$1.failed(JerseyInvocation.java:839)
	at org.glassfish.jersey.client.JerseyInvocation$1.completed(JerseyInvocation.java:820)
	at org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:229)
	at org.glassfish.jersey.client.ClientRuntime.access$200(ClientRuntime.java:62)
	at org.glassfish.jersey.client.ClientRuntime$2.lambda$response$0(ClientRuntime.java:173)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:288)
	at org.glassfish.jersey.client.ClientRuntime$2.response(ClientRuntime.java:173)
	at org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$apply$1(AsyncHttpConnector.java:210)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector.lambda$retryOperation$3(AsyncHttpConnector.java:251)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at org.asynchttpclient.netty.NettyResponseFuture.loadContent(NettyResponseFuture.java:222)
	at org.asynchttpclient.netty.NettyResponseFuture.done(NettyResponseFuture.java:257)
	at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.finishUpdate(AsyncHttpClientHandler.java:241)
	at org.asynchttpclient.netty.handler.HttpHandler.handleChunk(HttpHandler.java:114)
	at org.asynchttpclient.netty.handler.HttpHandler.handleRead(HttpHandler.java:143)
	at org.asynchttpclient.netty.handler.AsyncHttpClientHandler.channelRead(AsyncHttpClientHandler.java:78)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
Caused by: javax.ws.rs.ClientErrorException: HTTP 409 Conflict
	at org.glassfish.jersey.client.JerseyInvocation.createExceptionForFamily(JerseyInvocation.java:938)
	at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:921)
	at org.glassfish.jersey.client.JerseyInvocation.access$500(JerseyInvocation.java:77)
	... 54 more

Make the namespace name unique for different execution/retry should able to get rid of this exception.

@sijie sijie added this to the 2.8.0 milestone Feb 22, 2021
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that if we want to go this way we should have a more consistent way of generating unique topic names.

Can we create a general purpose function that creates an unique topic name? IIRC there is something like that somewhere else is some suite

log.info("--- Starting ReplicatorTest::{} --- ", methodName);

final String namespace = "pulsar/partitionedNs-" + isPartitionedTopic;
final String namespace = "pulsar/partitionedNs-" + isPartitionedTopic + "-" + System.nanoTime();;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1)I searched the code, but I didn’t see a namespace with the same name in other places
2)There are two ";"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed the duplicate ";"
It could be namespace left from other test run, the zookeeper used for test will have a temp dir to store data with same name. It could be really hard to debug in github action VM but make namespace unique across different test run should solve the issue for us.

@eolivelli
Copy link
Contributor

We could use something like
BrokerTestBase#newTopicName

making it a static Utility method (and possibly extracting it to a separate file

https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerTestBase.java#L93

@MarvinCai
Copy link
Contributor Author

@eolivelli Updated per you comment.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using the new method in BrokerTestBase, or better to remove the newTopicName in BrokerTestBase ?

@MarvinCai MarvinCai requested a review from 315157973 February 25, 2021 02:14
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

2 similar comments
@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@MarvinCai
Copy link
Contributor Author

/pulsarbot run-failure-checks

@codelipenghui codelipenghui merged commit 738ec6a into apache:master Feb 25, 2021
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came late to the party.

btw LGTM

thank you @MarvinCai

eolivelli pushed a commit to datastax/pulsar that referenced this pull request May 21, 2021
…tionedTopic. (apache#9659)

Fixes apache#9457

Not sure why state is not cleaned up but exception is indicating metadata of namespace the test trying to create already exist so use a different namespace name for different test run should able to fix it.

Append System.nanoTime to the namespace used to avoid being affect by previous state. This seems working fine for other test cases in the same test.

(cherry picked from commit 738ec6a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky-test:ReplicatorTest.testReplicatorOnPartitionedTopic

5 participants