Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pulsar-io Sinks & Sources tests fail (flake?) with "Failed to open ledger" #14936

Closed
dlg99 opened this issue Mar 29, 2022 · 3 comments · Fixed by #14962
Closed

pulsar-io Sinks & Sources tests fail (flake?) with "Failed to open ledger" #14936

dlg99 opened this issue Mar 29, 2022 · 3 comments · Fixed by #14962
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@dlg99
Copy link
Contributor

dlg99 commented Mar 29, 2022

Describe the bug

"CI - Integration - Pulsar-IO Sinks and Sources / pulsar-io" job fails on multiple PRs and fail on re-runs.
Failed integration test seems to not be affected by the changes in the PR per se.

Most commonly I see testRabbitMQSink failures, but sometimes testKinesis or testKafka.

Example PRs: #14922 #14912 #14916

To Reproduce

Rerun "CI - Integration - Pulsar-IO Sinks and Sources / pulsar-io" on PRs, the job fails quite repeatedly.

Expected behavior

The job pass.

Log

2022-03-29T14:25:38,055+0000 INFO  [main] o.a.p.t.MockitoCleanupListener@40 - Cleaning up Mockito's ThreadSafeMockingProgress.MOCKING_PROGRESS_PROVIDER thread local state. {}
Error:  Tests run: 12, Failures: 3, Errors: 0, Skipped: 6, Time elapsed: 866.072 s <<< FAILURE! - in TestSuite
Error:  testKafkaSink(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)  Time elapsed: 132.423 s  <<< FAILURE!
org.apache.pulsar.client.api.PulsarClientException: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger","reqId":1253756777559247167, "remote":"localhost/127.0.0.1:49251", "local":"/127.0.0.1:48208"}
	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:1074)
	at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:88)
	at org.apache.pulsar.tests.integration.io.sinks.SinkTester.produceMessage(SinkTester.java:118)
	at org.apache.pulsar.tests.integration.io.PulsarIOTestRunner.produceMessagesToInputTopic(PulsarIOTestRunner.java:93)
	at org.apache.pulsar.tests.integration.io.sinks.PulsarIOSinkRunner.runSinkTester(PulsarIOSinkRunner.java:116)
	at org.apache.pulsar.tests.integration.io.PulsarIOTestBase.testSink(PulsarIOTestBase.java:50)
	at org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest.testKafkaSink(PulsarSinksTest.java:34)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger","reqId":1253756777559247167, "remote":"localhost/127.0.0.1:49251", "local":"/127.0.0.1:48208"}
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:86)
	... 17 more
Caused by: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger caused by org.apache.pulsar.broker.service.schema.exceptions.SchemaException: Error while recovering ledger -  ledger=5 - operation=Failed to open ledger","reqId":1253756777559247167, "remote":"localhost/127.0.0.1:49251", "local":"/127.0.0.1:48208"}
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(ClientCnx.java:1196)
	at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:726)
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:178)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:327)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:299)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more

Error:  testKinesis(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)  Time elapsed: 0 s  <<< FAILURE!
org.apache.pulsar.tests.FailFastNotifier$FailFastSkipException: Skipped after failure since testFailFast system property is set.
	at org.apache.pulsar.tests.FailFastNotifier.beforeInvocation(FailFastNotifier.java:88)

Error:  testRabbitMQSink(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)  Time elapsed: 0 s  <<< FAILURE!
org.apache.pulsar.tests.FailFastNotifier$FailFastSkipException: Skipped after failure since testFailFast system property is set.
	at org.apache.pulsar.tests.FailFastNotifier.beforeInvocation(FailFastNotifier.java:88)

[INFO] 
[INFO] Results:
[INFO] 
Error:  Failures: 
Error:  org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest.testKafkaSink(org.apache.pulsar.tests.integration.io.sinks.PulsarSinksTest)
[INFO]   Run 1: PASS
Error:    Run 2: PulsarSinksTest.testKafkaSink:34->PulsarIOTestBase.testSink:50 » PulsarClient ...
[INFO] 
Error:    PulsarSinksTest.testKinesis » FailFastSkip Skipped after failure since testFai...
Error:    PulsarSinksTest.testRabbitMQSink » FailFastSkip Skipped after failure since te...
[INFO] 
Error:  Tests run: 9, Failures: 3, Errors: 0, Skipped: 3
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Pulsar :: Tests 2.10.0-SNAPSHOT:
[INFO] 
[INFO] Apache Pulsar :: Tests ............................. SUCCESS [  2.587 s]
[INFO] Apache Pulsar :: Tests :: Integration .............. FAILURE [14:35 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  14:38 min
[INFO] Finished at: 2022-03-29T14:25:38Z
[INFO] ------------------------------------------------------------------------
Error:  Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M3:test (default-test) on project integration: There are test failures.
Error:  
Error:  Please refer to /home/runner/work/pulsar/pulsar/tests/integration/target/surefire-reports for the individual test results.
Error:  Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
Error:  -> [Help 1]
Error:  
Error:  To see the full stack trace of the errors, re-run Maven with the -e switch.
Error:  Re-run Maven using the -X switch to enable full debug logging.
Error:  
Error:  For more information about the errors and possible solutions, please read the following articles:
Error:  [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:  
Error:  After correcting the problems, you can resume the build with the command
Error:    mvn <args> -rf :integration
Error: Process completed with exit code 1.
6s

@dlg99 dlg99 added the type/bug The PR fixed a bug or issue reported a bug label Mar 29, 2022
@eolivelli
Copy link
Contributor

In integration tests we run the bookies and the brokers in docker containers.
maybe there is some leak ?
cc @lhotari @nicoloboschi

@nicoloboschi
Copy link
Contributor

Maybe related to the recent pull #14796
btw the containers shutdown is handled in the same way for every tester

@lhotari
Copy link
Member

lhotari commented Mar 31, 2022

Will be fixed by #14962

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants