Migrate multi node testkit to Netty 4. #486
...-testkit/src/main/mima-filters/1.0.1.backwards.excludes/migrate-to-netty4.backwards.excludes
22cc854 to ad512eb
lgtm - pekko-multi-node-testkit is documented to be 'API may change'
private[pekko] trait RemoteConnection {
  def channel: Channel
  def shutdown(): Unit
}
A new helper trait is added.
Thanks @pjfanning for the review. Waiting for another LGTM before merging it.
@He-Pin So I am getting failures when running I ran the test with JDK 11 + export JAVA_8_HOME=/usr/lib/jvm/java-8-openjdk/ and also ran a
@mdedetrich these are my outputs and I don't have the errors in your gist, not sure why. I just rebased on the current main.
I'm sorry, but I have to block the PR, at least for now, because on my system it's causing a regression in cluster/MultiJvm/test.
I did an additional run with the latest rebase and it's still failing (see https://gist.github.com/mdedetrich/5f68a3e3faa9cf8ee220547618fb89e2); however, if I check out current main, the tests pass fine (see https://gist.github.com/mdedetrich/73ceedc04f51fda4381e9a3af7ec7bb4).
I had a deeper look, and it appears that the core issue is that it's unable to establish a connection, i.e.
[info] [JVM-3-MultiDcMultiJvmNode3] [ERROR] [08/01/2023 09:16:40.161] [MultiDc-pekko.actor.internal-dispatcher-3] [pekko://MultiDc/system/TestConductorClient] Connection refused: localhost/127.0.0.1:4711
[info] [JVM-3-MultiDcMultiJvmNode3] org.apache.pekko.actor.ActorInitializationException: pekko://MultiDc/system/TestConductorClient: exception during creation, root cause message: [Connection refused]
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ActorInitializationException$.apply(Actor.scala:206)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:679)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ActorCell.invokeAll$1(ActorCell.scala:523)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ActorCell.systemInvoke(ActorCell.scala:545)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:305)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:240)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
[info] [JVM-3-MultiDcMultiJvmNode3] Caused by: java.lang.reflect.InvocationTargetException
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.util.Reflect$.instantiate(Reflect.scala:82)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ArgsReflectConstructor.produce(IndirectActorProducer.scala:111)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.Props.newActor(Props.scala:236)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ActorCell.newActor(ActorCell.scala:626)
[info] [JVM-3-MultiDcMultiJvmNode3] at org.apache.pekko.actor.ActorCell.create(ActorCell.scala:653)
[info] [JVM-3-MultiDcMultiJvmNode3] ... 10 more
[info] [JVM-3-MultiDcMultiJvmNode3] Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:4711
[info] [JVM-3-MultiDcMultiJvmNode3] Caused by: java.net.ConnectException: Connection refused
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[info] [JVM-3-MultiDcMultiJvmNode3] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[info] [JVM-3-MultiDcMultiJvmNode3] at java.base/java.lang.Thread.run(Thread.java:829)
I think netty/netty#6865 is related and you got pinged in that.
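For anyone trying to narrow this down, the `java.net.ConnectException: Connection refused` at the bottom of the trace is what a plain TCP connect reports when nothing is listening on the target address and port. The following is only a minimal sketch using `java.net` directly, not the testkit's actual code; `ConnectionRefusedSketch`, `tryConnect` and `freePort` are hypothetical names made up for illustration.

```scala
import java.net.{ConnectException, InetAddress, InetSocketAddress, ServerSocket, Socket}

// Hypothetical helper, not part of Pekko or Netty.
object ConnectionRefusedSketch {

  /** Try a TCP connect; Right(()) if accepted, Left(message) if the connection is refused. */
  def tryConnect(host: InetAddress, port: Int, timeoutMs: Int = 1000): Either[String, Unit] = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      Right(())
    } catch {
      case e: ConnectException => Left(e.getMessage)
    } finally socket.close()
  }

  /** Bind an ephemeral loopback port, then release it so that nothing listens on it. */
  def freePort(): Int = {
    val server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress)
    try server.getLocalPort
    finally server.close()
  }

  def main(args: Array[String]): Unit = {
    val loopback = InetAddress.getLoopbackAddress
    val server = new ServerSocket(0, 50, loopback)
    try println(s"listener up:   ${tryConnect(loopback, server.getLocalPort)}")
    finally server.close()
    println(s"listener gone: ${tryConnect(loopback, freePort())}")
  }
}
```

Pointing `tryConnect` at `127.0.0.1` and port 4711 while the failing test is running would show whether the conductor is reachable on loopback at all.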
@pjfanning @raboof @jrudolph If you have time do you mind checking out this PR's branch and running
I'm seeing lots of test failures - similar to what @mdedetrich reported. I think we need to go back to the drawing board and start by getting the multi-jvm tests up and running in CI and sorting out the logging. The tests are reporting lots of slf4j binding issues - so we need to ensure that logback or something like it is set up when running the JVMs used in these tests.
IIRC, making these tests run in our current CI is problematic because of high flakiness caused by the noisy-neighbour problem and the VMs being very weak for the cluster tests; there is a reason why the tests were never enabled in CI for Akka. At least with Akka, before Lightbend would make a release or merge a PR like this, they would test on machines like we are. Although this is another topic, this is one of the problems that dedicated hardware was meant to solve. We can try temporarily enabling the tests only on this PR to see that it won't break
multi-node-testkit/src/main/scala/org/apache/pekko/remote/testconductor/RemoteConnection.scala
This comment was marked as resolved.
This comment was marked as resolved.
I am getting issues that I think are related to #486 (comment), see https://gist.github.com/mdedetrich/e4f0cec36d405bd0ba21921849d8a07a
cd1c74a makes org.apache.pekko.cluster.SplitBrainQuarantine work on my laptop - it was failing before this.
@mdedetrich @pjfanning Weird, I didn't see this on Windows, and only saw it when trying to run
@He-Pin So I just re-ran the tests against the latest state of the branch and I got the following: https://gist.github.com/mdedetrich/d91a5d9e1731a20806ce6e04e58aa9ca. Running
@mdedetrich Maybe it binds on a different IP (not loopback) and tries to connect with I mean, the
Yes, this definitely seems related, although I wonder why it's only occurring when updating to Netty 4. I guess Netty 4 may have changed its loopback/resolution defaults?
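To make the bind-address hypothesis concrete, here is a rough sketch with plain `java.net` sockets, so it says nothing about what Netty 3 or Netty 4 actually do by default. It assumes Scala 2.13+ (for `scala.jdk.CollectionConverters`) and a machine with at least one non-loopback IPv4 address; `BindAddressSketch`, `canConnect` and `nonLoopbackAddress` are hypothetical names for illustration only. A server socket bound to one specific address is only reachable through that address, so a client dialing loopback would be expected to get `Connection refused`.

```scala
import java.net.{ConnectException, Inet4Address, InetAddress, InetSocketAddress, NetworkInterface, ServerSocket, Socket}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of Pekko or Netty.
object BindAddressSketch {

  /** True if a TCP connection to host:port is accepted, false if it is refused. */
  def canConnect(host: InetAddress, port: Int): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), 1000)
      true
    } catch {
      case _: ConnectException => false
    } finally socket.close()
  }

  /** First IPv4 address on an interface that is up and not loopback, if there is one. */
  def nonLoopbackAddress(): Option[InetAddress] =
    NetworkInterface.getNetworkInterfaces.asScala
      .filter(i => i.isUp && !i.isLoopback)
      .flatMap(_.getInetAddresses.asScala)
      .collectFirst { case a: Inet4Address => a }

  def main(args: Array[String]): Unit = {
    val loopback = InetAddress.getLoopbackAddress
    nonLoopbackAddress() match {
      case None =>
        println("no non-loopback IPv4 address found; nothing to compare")
      case Some(external) =>
        // Bound to one specific address only, not to the wildcard address.
        val server = new ServerSocket(0, 50, external)
        try {
          val port = server.getLocalPort
          println(s"connect via $external: ${canConnect(external, port)}")
          println(s"connect via $loopback: ${canConnect(loopback, port)}")
        } finally server.close()
    }
  }
}
```

Binding with `new ServerSocket(port)` instead uses the wildcard address, which accepts connections on loopback as well; comparing the two on the failing Linux box might help confirm or rule out this theory.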
0f525d8 to aec80d4
@He-Pin Here are the results of https://gist.github.com/mdedetrich/5c0acdba4cfb4bd180395e273033b880 at commit 0f525d8
37709b3 to f826ddc
@He-Pin Here are the results of https://gist.github.com/mdedetrich/1f483dc433b32be6c9199c55c7de0b02 at commit aec80d4
@He-Pin Here are the results for https://gist.github.com/mdedetrich/e1cf848f28a03861df1be3dd318e8c64 at commit f826ddc
I can reproduce it 100% of the time on a Linux box; still looking for the root cause.
Update: I will try to allocate some time this weekend for this.
d868d85 to 41b6152
Signed-off-by: He-Pin <hepin1989@gmail.com>
Closing for now; will reopen after I have tested on a Linux box.
refs: #462
I verified it locally with
cluster/MultiJvm/test