HADOOP-19881. Fix "port in use" detection in TestFrameDecoder#8472
HADOOP-19881. Fix "port in use" detection in TestFrameDecoder#8472pan3793 merged 2 commits intoapache:trunkfrom
Conversation
|
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Pull request overview
This PR updates the TestFrameDecoder test helper that starts a Netty-based RPC server, aiming to make “port already in use” retries reliable and to avoid incorrectly treating thread interrupts as retriable bind failures.
Changes:
- Catch broader bind failures during server startup and retry only when the failure looks like a port-in-use condition.
- Ensure each retry actually changes the port (
+ 1 + rand.nextInt(20)), avoiding a zero increment. - Handle
InterruptedExceptionseparately by restoring the interrupt flag and rethrowing.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import static org.junit.jupiter.api.Assertions.assertFalse; | ||
| import static org.junit.jupiter.api.Assertions.assertTrue; | ||
|
|
||
| import java.io.IOException; |
| * bound by another process. Handles {@link ChannelException} thrown by | ||
| * Netty as well as {@link BindException} that may be sneaky-thrown from | ||
| * {@code ChannelFuture#sync()}. | ||
| */ | ||
| private static boolean isPortInUse(Throwable t) { | ||
| Throwable cursor = t; | ||
| while (cursor != null) { | ||
| if (cursor instanceof BindException || cursor instanceof ChannelException) { | ||
| return true; |
There was a problem hiding this comment.
partially adopted. only match BindException, but do not check the message - this is not something in api contract, it might change in future JDK releases.
|
🎊 +1 overall
This message was automatically generated. |
|
@slfan1989 I addressed the Copilot review comments, and Yetus says good. Do you have further comments? |
|
LGTM |
Reviewed-by: Shilun Fan <slfan1989@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
|
thanks, merged to trunk/branch-3.5 |
Description of PR
BindExceptionwas not caught — main cause of the failure shown in the log.Netty's
ChannelFuture#sync()usesPlatformDependent.throwExceptionto "sneaky-throw" the original cause. When the OS returns "Address already in use", the cause isjava.net.BindException(which extendsIOException), notChannelException. The originalcatch (InterruptedException | ChannelException e)missed it entirely, so the exception propagated out of the loop and failed the test on the first attempt - exactly matching the stack trace in the failure log.The port increment could be zero.
serverPort += rand.nextInt(20)returns[0, 20), so on retry the same busy port could be picked again. Changed to1 + rand.nextInt(20)so the port is always bumped.InterruptedExceptionwas lumped with "port in use".An external thread interrupt should not trigger a port-bump retry. Split into its own handler that restores the interrupt flag and propagates.
Contains content generated by: Claude Opus 4.7
How was this patch tested?
Run dozens of rounds
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?AI Tooling
If an AI tool was used:
where is the name of the AI tool used.
https://www.apache.org/legal/generative-tooling.html