HADOOP-19881. Fix "port in use" detection in TestFrameDecoder #8472

Merged
pan3793 merged 2 commits into apache:trunk from pan3793:HADOOP-19881
May 8, 2026

Conversation

@pan3793
Member

pan3793 commented May 6, 2026

Description of PR

  1. BindException was not caught — main cause of the failure shown in the log.
    Netty's ChannelFuture#sync() uses PlatformDependent.throwException to "sneaky-throw" the original cause. When the OS returns "Address already in use", the cause is java.net.BindException (which extends IOException), not ChannelException. The original catch (InterruptedException | ChannelException e) missed it entirely, so the exception propagated out of the loop and failed the test on the first attempt - exactly matching the stack trace in the failure log.
Error:  Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.224 s <<< FAILURE! -- in org.apache.hadoop.oncrpc.TestFrameDecoder
Error:  org.apache.hadoop.oncrpc.TestFrameDecoder.testFrames -- Time elapsed: 0.013 s <<< ERROR!
java.net.BindException: Address already in use
	at java.base/sun.nio.ch.Net.bind0(Native Method)
	at java.base/sun.nio.ch.Net.bind(Net.java:567)
	at java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:337)
	at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:294)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:561)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1281)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:600)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:579)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:922)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:259)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:384)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:840)
	Suppressed: java.lang.RuntimeException: Rethrowing promise failure cause
		at io.netty.util.concurrent.DefaultPromise.rethrowIfFailed(DefaultPromise.java:686)
		at io.netty.util.concurrent.DefaultPromise.sync(DefaultPromise.java:420)
		at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:119)
		at io.netty.channel.DefaultChannelPromise.sync(DefaultChannelPromise.java:30)
		at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:88)
		at org.apache.hadoop.oncrpc.TestFrameDecoder.startRpcServer(TestFrameDecoder.java:237)
		at org.apache.hadoop.oncrpc.TestFrameDecoder.testFrames(TestFrameDecoder.java:177)
		at java.base/java.lang.reflect.Method.invoke(Method.java:569)
		at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
		at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
  2. The port increment could be zero.
    rand.nextInt(20) returns a value in [0, 20), so serverPort += rand.nextInt(20) could add zero and pick the same busy port again on retry. Changed to 1 + rand.nextInt(20) so the port is always bumped by at least one.

  3. InterruptedException was lumped with "port in use".
    An external thread interrupt should not trigger a port-bump retry. It is now split into its own handler that restores the interrupt flag and propagates the exception.
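
Item 1 hinges on a checked exception escaping a method that does not declare it. A minimal, Netty-free sketch of that effect follows. The class `SneakyThrowDemo` and its methods are hypothetical stand-ins, and the generic-erasure trick shown is one common way to sneaky-throw, not necessarily the exact path Netty takes internally.

```java
import java.net.BindException;

/** Hypothetical demo; none of these names come from the patch or from Netty. */
class SneakyThrowDemo {

  /** Rethrows any Throwable without declaring it, relying on generic type erasure. */
  @SuppressWarnings("unchecked")
  static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
    throw (E) t; // the cast is erased at runtime, so no ClassCastException occurs
  }

  /** Stand-in for a sync()-style call: declares no checked exception, yet throws one. */
  static void fakeSync() {
    SneakyThrowDemo.<RuntimeException>sneakyThrow(
        new BindException("Address already in use"));
  }

  /** Reports which handler ends up seeing the exception thrown by fakeSync(). */
  static String caughtBy() {
    try {
      fakeSync();
      return "nothing thrown";
    } catch (RuntimeException e) {
      // A handler for an unchecked type (ChannelException is a RuntimeException)
      // would sit here. BindException is checked, so it skips this handler.
      return "unchecked handler";
    } catch (Exception e) {
      return "broad handler: " + e.getClass().getName();
    }
  }

  public static void main(String[] args) {
    System.out.println(caughtBy());
  }
}
```

Because `fakeSync()` declares no checked exceptions, the compiler would reject a direct `catch (BindException e)` around it, so a broader handler that inspects the exception type is one way to reach it.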

Contains content generated by: Claude Opus 4.7

How was this patch tested?

Ran the test for dozens of rounds:

./mvnw test -pl hadoop-common-project/hadoop-common -am -Dtest=TestFrameDecoder
...
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.oncrpc.TestFrameDecoder
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.036 s -- in org.apache.hadoop.oncrpc.TestFrameDecoder
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (HADOOP-19881)?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling

If an AI tool was used:

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 0s | | Docker mode activated. |
| -1 ❌ | docker | 5m 8s | | Docker failed to build run-specific yetus/hadoop:tp-13463}. |

| Subsystem | Report/Notes |
| --- | --- |
| GITHUB PR | #8472 |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8472/1/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

Copilot AI left a comment

Pull request overview

This PR updates the TestFrameDecoder test helper that starts a Netty-based RPC server, aiming to make “port already in use” retries reliable and to avoid incorrectly treating thread interrupts as retriable bind failures.

Changes:

  • Catch broader bind failures during server startup and retry only when the failure looks like a port-in-use condition.
  • Ensure each retry actually changes the port (+ 1 + rand.nextInt(20)), avoiding a zero increment.
  • Handle InterruptedException separately by restoring the interrupt flag and rethrowing.
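
The three changes listed above can be sketched as one retry loop. This is an illustration under stated assumptions, not the patched test code: `PortRetryDemo`, `Starter`, `startWithRetry`, and the `maxAttempts` parameter are hypothetical names, and the starter here declares its exceptions, whereas in the real test the `BindException` arrives undeclared from Netty.

```java
import java.io.IOException;
import java.net.BindException;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of a bind-retry loop; names are not taken from the patch. */
class PortRetryDemo {

  /** Stand-in for the server startup call. */
  @FunctionalInterface
  interface Starter {
    void start(int port) throws IOException, InterruptedException;
  }

  /** Returns the port on which the starter finally succeeded. */
  static int startWithRetry(Starter starter, int startPort, int maxAttempts, Random rand)
      throws IOException, InterruptedException {
    int port = startPort;
    for (int attempt = 1; ; attempt++) {
      try {
        starter.start(port);
        return port;
      } catch (InterruptedException e) {
        // An interrupt is not a bind failure: restore the flag and propagate.
        Thread.currentThread().interrupt();
        throw e;
      } catch (BindException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up after the assumed attempt limit
        }
        // nextInt(20) is in [0, 20), so adding 1 guarantees a different port.
        port += 1 + rand.nextInt(20);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Set<Integer> busy = Set.of(5000); // pretend port 5000 is already taken
    int bound = startWithRetry(port -> {
      if (busy.contains(port)) {
        throw new BindException("Address already in use");
      }
    }, 5000, 5, new Random());
    System.out.println("started on port " + bound);
  }
}
```

Other `IOException` subtypes are left to propagate untouched, so only a bind failure triggers a port bump.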

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.io.IOException;
Member Author

adopted

Comment on lines +266 to +274
   * bound by another process. Handles {@link ChannelException} thrown by
   * Netty as well as {@link BindException} that may be sneaky-thrown from
   * {@code ChannelFuture#sync()}.
   */
  private static boolean isPortInUse(Throwable t) {
    Throwable cursor = t;
    while (cursor != null) {
      if (cursor instanceof BindException || cursor instanceof ChannelException) {
        return true;
Member Author

Partially adopted: the check now matches only BindException, but it does not inspect the exception message. The message text is not part of the API contract and might change in future JDK releases.
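
A minimal sketch of what a type-only check could look like, assuming the cause-chain walk from the reviewed snippet is kept. The final merged method is not shown in this conversation, so the class name `PortInUseCheck` and the exact shape below are assumptions.

```java
import java.io.IOException;
import java.net.BindException;

/** Hypothetical sketch; the merged helper may differ. */
class PortInUseCheck {

  /** True if t, or any exception in its cause chain, is a BindException. */
  static boolean isPortInUse(Throwable t) {
    for (Throwable cursor = t; cursor != null; cursor = cursor.getCause()) {
      // Match on type only; getMessage() is deliberately never consulted.
      if (cursor instanceof BindException) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // A wrapped BindException is still detected through the cause chain.
    System.out.println(isPortInUse(new RuntimeException(new BindException())));
    // A matching message on a different exception type is not enough.
    System.out.println(isPortInUse(new IOException("Address already in use")));
  }
}
```

Matching on the type keeps the check independent of locale and JDK wording, at the cost of also retrying on other bind failures that surface as `BindException`.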

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 9m 56s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 45m 10s | | trunk passed |
| +1 💚 | compile | 16m 4s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | compile | 16m 21s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | checkstyle | 1m 30s | | trunk passed |
| +1 💚 | mvnsite | 1m 56s | | trunk passed |
| +1 💚 | javadoc | 1m 29s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 23s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 3m 2s | | trunk passed |
| +1 💚 | shadedclient | 30m 45s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 1m 11s | | the patch passed |
| +1 💚 | compile | 15m 14s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javac | 15m 14s | | the patch passed |
| +1 💚 | compile | 16m 30s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | javac | 16m 30s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 27s | | the patch passed |
| +1 💚 | mvnsite | 1m 54s | | the patch passed |
| +1 💚 | javadoc | 1m 24s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 26s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 3m 17s | | the patch passed |
| +1 💚 | shadedclient | 30m 57s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 22m 28s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 12s | | The patch does not generate ASF License warnings. |
| | | 226m 2s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8472/2/artifact/out/Dockerfile |
| GITHUB PR | #8472 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux f94a6eaaefd1 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 8aa2aac |
| Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8472/2/testReport/ |
| Max. process+thread count | 3151 (vs. ulimit of 10000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8472/2/console |
| versions | git=2.43.0 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

@pan3793
Member Author

pan3793 commented May 8, 2026

@slfan1989 I addressed the Copilot review comments, and Yetus now reports +1 overall. Do you have any further comments?

@slfan1989
Contributor

LGTM

@pan3793 pan3793 merged commit 8b0ffef into apache:trunk May 8, 2026
6 checks passed
pan3793 added a commit that referenced this pull request May 8, 2026
Reviewed-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793
Member Author

pan3793 commented May 8, 2026

thanks, merged to trunk/branch-3.5
