Skip to content

Fix Netty 4.2 shutdown and TLS compatibility gaps#17989

Open
xiangfu0 wants to merge 3 commits intoapache:masterfrom
xiangfu0:codex/debug-query-routing-flake
Open

Fix Netty 4.2 shutdown and TLS compatibility gaps#17989
xiangfu0 wants to merge 3 commits intoapache:masterfrom
xiangfu0:codex/debug-query-routing-flake

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Mar 26, 2026

Summary

  • keep QueryServer shutdown deterministic under Netty 4.2 by rejecting late child channels, draining tracked channels, and awaiting event-loop termination
  • close broker-side plain and TLS ServerChannels during QueryRouter shutdown so broker-to-server sockets do not outlive the broker
  • plumb peer host/port and endpointIdentificationAlgorithm through the Netty and AsyncHttpClient TLS paths used by the broker, Java client, JDBC client, and admin client

How to reproduce the issue

  • check out master after PR Bump Netty from 4.1.132 to 4.2.11 with compatibility fixes #17980 but before this PR
  • run ./mvnw -pl pinot-core -Dtest=QueryRoutingTest test repeatedly and observe intermittent failures in testValidResponse, testServerDown, or testSkipUnavailableServer
  • configure broker/server or Java/JDBC/admin HTTPS clients with endpointIdentificationAlgorithm=HTTPS
  • observe that the affected transports created SSLEngines without the peer host/port, so explicit hostname verification and SNI could not be applied consistently
  • stop a broker with active server channels and observe that broker-side Netty channel pools relied on pre-4.2 shutdown behavior and were not fully drained or awaited

Testing

  • ./mvnw -pl pinot-core,pinot-clients/pinot-java-client,pinot-clients/pinot-jdbc-client spotless:apply
  • ./mvnw -pl pinot-core,pinot-clients/pinot-java-client,pinot-clients/pinot-jdbc-client checkstyle:check
  • ./mvnw -pl pinot-core,pinot-clients/pinot-java-client,pinot-clients/pinot-jdbc-client license:format
  • ./mvnw -pl pinot-core,pinot-clients/pinot-java-client,pinot-clients/pinot-jdbc-client license:check
  • ./mvnw -pl pinot-core,pinot-clients/pinot-java-client,pinot-clients/pinot-jdbc-client -am -Dtest=ServerChannelsTest,QueryRoutingTest,SslContextProviderTest test -Dsurefire.failIfNoSpecifiedTests=false -rf :pinot-core

@xiangfu0 xiangfu0 added the flaky-test Tracks a test that intermittently fails label Mar 26, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a QueryServer shutdown race introduced/exposed after the Netty 4.2 upgrade by ensuring shutdown state is set early, newly accepted channels are rejected during teardown, and event loop groups are fully terminated before returning.

Changes:

  • Add a shutdown flag to reject/close newly accepted child channels during teardown.
  • Drain/close tracked client channels during shutdown with a bounded timeout.
  • Block on boss/worker EventLoopGroup termination (syncUninterruptibly) so peers observe shutdown promptly.

Comment on lines +197 to +201
for (SocketChannel ch : channels) {
long remainingMs = Math.max(1L, deadlineMs - System.currentTimeMillis());
if (!ch.close().awaitUninterruptibly(remainingMs)) {
LOGGER.warn("Timed out waiting for client channel to close during shutdown: {}", ch);
return;
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closeAcceptedChannels() returns immediately on the first channel that doesn't close within the remaining deadline. That stops attempting to close any other tracked channels and also skips the final "{} client channels" warning, which can leave additional client connections open and undermine the goal of draining channels before event loop shutdown. Consider initiating close() on all current channels first, then awaiting their closeFuture() up to the shared deadline, and avoid the early return (e.g., log per-channel timeouts but continue and emit the summary warning at the end).

Suggested change
for (SocketChannel ch : channels) {
long remainingMs = Math.max(1L, deadlineMs - System.currentTimeMillis());
if (!ch.close().awaitUninterruptibly(remainingMs)) {
LOGGER.warn("Timed out waiting for client channel to close during shutdown: {}", ch);
return;
// Initiate close on all channels first so they can progress in parallel.
for (SocketChannel ch : channels) {
ch.close();
}
// Now wait for each channel to close, bounded by the shared deadline.
for (SocketChannel ch : channels) {
long remainingMs = Math.max(1L, deadlineMs - System.currentTimeMillis());
if (!ch.closeFuture().awaitUninterruptibly(remainingMs)) {
LOGGER.warn("Timed out waiting for client channel to close during shutdown: {}", ch);

Copilot uses AI. Check for mistakes.
@xiangfu0 xiangfu0 force-pushed the codex/debug-query-routing-flake branch from 1ab8903 to 2ac1a11 Compare March 26, 2026 21:51
@xiangfu0 xiangfu0 changed the title Fix QueryServer shutdown race after Netty 4.2 Fix Netty 4.2 shutdown and TLS compatibility gaps Mar 26, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 62.77372% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.30%. Comparing base (ecaf0f8) to head (e739040).

Files with missing lines Patch % Lines
...rg/apache/pinot/core/transport/ServerChannels.java 66.66% 7 Missing and 2 partials ⚠️
...ient/JsonAsyncHttpPinotClientTransportFactory.java 22.22% 6 Missing and 1 partial ⚠️
...org/apache/pinot/client/utils/ConnectionUtils.java 37.50% 4 Missing and 1 partial ⚠️
...nt/controller/PinotControllerTransportFactory.java 37.50% 4 Missing and 1 partial ⚠️
...a/org/apache/pinot/core/transport/QueryServer.java 81.48% 2 Missing and 3 partials ⚠️
...main/java/org/apache/pinot/client/BrokerCache.java 20.00% 3 Missing and 1 partial ⚠️
...apache/pinot/client/admin/PinotAdminTransport.java 25.00% 3 Missing ⚠️
...main/java/org/apache/pinot/client/PinotDriver.java 0.00% 3 Missing ⚠️
...inot/client/JsonAsyncHttpPinotClientTransport.java 71.42% 2 Missing ⚠️
...ot/client/controller/PinotControllerTransport.java 71.42% 2 Missing ⚠️
... and 4 more
Additional details and impacted files
@@            Coverage Diff             @@
##             master   #17989    +/-   ##
==========================================
  Coverage     63.30%   63.30%            
  Complexity     1543     1543            
==========================================
  Files          3200     3201     +1     
  Lines        194169   194278   +109     
  Branches      29915    29930    +15     
==========================================
+ Hits         122914   122994    +80     
- Misses        61610    61626    +16     
- Partials       9645     9658    +13     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.27% <62.77%> (+<0.01%) ⬆️
java-21 63.28% <62.77%> (+0.01%) ⬆️
temurin 63.30% <62.77%> (+<0.01%) ⬆️
unittests 63.30% <62.77%> (+<0.01%) ⬆️
unittests1 55.54% <77.63%> (+<0.01%) ⬆️
unittests2 34.20% <27.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 force-pushed the codex/debug-query-routing-flake branch 2 times, most recently from a25f26e to de30465 Compare March 27, 2026 01:11
@xiangfu0 xiangfu0 requested a review from Copilot March 27, 2026 01:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Comment on lines 55 to +60
builder.setSslEngineFactory((config, peerHost, peerPort) -> {
SSLEngine engine = sslContext.createSSLEngine(peerHost, peerPort);
engine.setUseClientMode(true);
SSLParameters sslParameters = engine.getSSLParameters();
sslParameters.setEndpointIdentificationAlgorithm(endpointIdentificationAlgorithm);
engine.setSSLParameters(sslParameters);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endpointIdentificationAlgorithm is set on SSLParameters unconditionally. Several call sites pass an empty string to mean "disabled"; in JSSE an empty-but-non-null algorithm can be treated as an unknown endpoint ID algorithm and fail the TLS handshake. Treat null/blank as "not set" (skip calling setEndpointIdentificationAlgorithm) and consider having the 3-arg configure(...) delegate pass null rather than "" to preserve prior behavior.

Copilot uses AI. Check for mistakes.
}
io.netty.channel.ChannelFuture closeFuture = channel.close();
if (closeFuture != null) {
closeFuture.syncUninterruptibly();
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closeChannel(...) now blocks with syncUninterruptibly() on the channel close future. This method is invoked from DirectOOMHandler.exceptionCaught(...) (runs on the Netty event-loop thread), so blocking here can stall the event loop or deadlock shutdown. Prefer non-blocking close (listener) and only await close completion from non-event-loop threads (e.g., in ServerChannels.shutDown() with an inEventLoop() guard).

Suggested change
closeFuture.syncUninterruptibly();
closeFuture.addListener(future -> {
if (!future.isSuccess()) {
LOGGER.warn("Failed to close channel for server routing instance: {}", _serverRoutingInstance,
future.cause());
}
});

Copilot uses AI. Check for mistakes.
Comment on lines +198 to +203
long remainingMs = Math.max(1L, deadlineMs - System.currentTimeMillis());
if (!ch.close().awaitUninterruptibly(remainingMs)) {
LOGGER.warn("Timed out waiting for client channel to close during shutdown: {}", ch);
return;
}
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first channel that times out, the method logs and returns, which stops attempting to close any remaining accepted channels. For deterministic shutdown it’s better to continue closing the rest and then emit a final warning if any channels are still open (e.g., track a timeout flag rather than early-returning).

Copilot uses AI. Check for mistakes.
Comment on lines +96 to +98
_endpointIdentificationAlgorithm =
org.apache.pinot.client.utils.DriverUtils.getTlsConfigFromJDBCProps(properties)
.getEndpointIdentificationAlgorithm();
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file uses the fully-qualified org.apache.pinot.client.utils.DriverUtils inline, while other JDBC classes import it directly (e.g., pinot-clients/pinot-jdbc-client/src/main/java/org/apache/pinot/client/PinotDriver.java:41). Consider adding an import and using DriverUtils to keep usage consistent and improve readability.

Copilot uses AI. Check for mistakes.
@xiangfu0 xiangfu0 requested review from Copilot and removed request for Jackie-Jiang and yashmayya March 27, 2026 02:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.

Comment on lines +89 to +93
if (endpointIdentificationAlgorithm != null && !endpointIdentificationAlgorithm.isBlank()) {
SSLParameters sslParameters = sslHandler.engine().getSSLParameters();
sslParameters.setEndpointIdentificationAlgorithm(endpointIdentificationAlgorithm);
sslHandler.engine().setSSLParameters(sslParameters);
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configureClientTlsHandler only sets the endpoint identification algorithm when the config value is non-blank; it never explicitly clears/disables it. If the underlying SSLEngine already has an endpoint identification algorithm set (e.g., via TLS stack defaults), then configuring TlsConfig.endpointIdentificationAlgorithm as blank/null will not disable hostname verification as intended. Consider always applying SSLParameters here and setting setEndpointIdentificationAlgorithm(null) when the config is blank/null so the behavior is deterministic.

Suggested change
if (endpointIdentificationAlgorithm != null && !endpointIdentificationAlgorithm.isBlank()) {
SSLParameters sslParameters = sslHandler.engine().getSSLParameters();
sslParameters.setEndpointIdentificationAlgorithm(endpointIdentificationAlgorithm);
sslHandler.engine().setSSLParameters(sslParameters);
}
SSLParameters sslParameters = sslHandler.engine().getSSLParameters();
if (endpointIdentificationAlgorithm == null || endpointIdentificationAlgorithm.isBlank()) {
// Clear any existing endpoint identification algorithm so behavior matches the Pinot config.
sslParameters.setEndpointIdentificationAlgorithm(null);
} else {
sslParameters.setEndpointIdentificationAlgorithm(endpointIdentificationAlgorithm);
}
sslHandler.engine().setSSLParameters(sslParameters);

Copilot uses AI. Check for mistakes.
Comment on lines +225 to +228
setSilentShutdown();
}
io.netty.channel.ChannelFuture closeFuture = channel.close();
if (closeFuture != null) {
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using the fully-qualified io.netty.channel.ChannelFuture type inline here; the rest of the file uses imports for Netty types. Import ChannelFuture at the top for consistency/readability.

Copilot uses AI. Check for mistakes.
Comment on lines 56 to 64
public PinotControllerTransport(Map<String, String> headers, String scheme, @Nullable SSLContext sslContext,
ConnectionTimeouts connectionTimeouts, TlsProtocols tlsProtocols, @Nullable String appId) {
ConnectionTimeouts connectionTimeouts, TlsProtocols tlsProtocols, @Nullable String appId,
String endpointIdentificationAlgorithm) {
_headers = headers;
_scheme = scheme;

DefaultAsyncHttpClientConfig.Builder builder = Dsl.config();
if (sslContext != null) {
builder.setSslContext(new JdkSslContext(sslContext, true, ClientAuth.OPTIONAL));
}
SSL_CONTEXT_PROVIDER.configure(builder, sslContext, tlsProtocols, endpointIdentificationAlgorithm);

Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public PinotControllerTransport constructor signature changed by adding endpointIdentificationAlgorithm, which is a source/binary breaking change for any downstream code instantiating this transport directly. Consider adding an overloaded constructor with the previous signature that delegates to this one with a default (e.g., empty string / null) to preserve compatibility.

Copilot uses AI. Check for mistakes.
Comment on lines 76 to +86
public JsonAsyncHttpPinotClientTransport(Map<String, String> headers, String scheme, String extraOptionString,
boolean useMultistageEngine, @Nullable SSLContext sslContext, ConnectionTimeouts connectionTimeouts,
TlsProtocols tlsProtocols, @Nullable String appId) {
TlsProtocols tlsProtocols, @Nullable String appId, String endpointIdentificationAlgorithm) {
_brokerReadTimeout = connectionTimeouts.getReadTimeoutMs();
_headers = headers;
_scheme = scheme;
_extraOptionStr = StringUtils.isEmpty(extraOptionString) ? DEFAULT_EXTRA_QUERY_OPTION_STRING : extraOptionString;
_useMultistageEngine = useMultistageEngine;

Builder builder = Dsl.config();
SSL_CONTEXT_PROVIDER.configure(builder, sslContext, tlsProtocols);
SSL_CONTEXT_PROVIDER.configure(builder, sslContext, tlsProtocols, endpointIdentificationAlgorithm);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public constructor JsonAsyncHttpPinotClientTransport(Map, String, ...) now requires an extra endpointIdentificationAlgorithm parameter, which is a breaking API change for callers who instantiate this transport directly. Consider adding an overload matching the prior signature that delegates to this constructor with a default value so existing client code continues to compile/run unchanged.

Copilot uses AI. Check for mistakes.
@xiangfu0 xiangfu0 force-pushed the codex/debug-query-routing-flake branch from 20a6a15 to d3089a1 Compare March 27, 2026 03:19
@xiangfu0 xiangfu0 requested a review from Copilot March 27, 2026 03:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

SSLContext sslContext = SSLContext.getDefault();
builder.setSslContext(new io.netty.handler.ssl.JdkSslContext(sslContext, true,
io.netty.handler.ssl.ClientAuth.OPTIONAL));
TlsConfig tlsConfig = ConnectionUtils.getTlsConfigFromProperties(properties);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectionUtils.getSSLContextFromProperties() calls TlsUtils.installDefaultSSLSocketFactory(...), which mutates JVM-global SSL state (default HttpsURLConnection socket factory + static TlsUtils SSLContext). Introducing that side effect into PinotAdminTransport construction can interfere with other HTTPS clients in the same process; consider creating a dedicated SSLContext for AsyncHttpClient without installing it as the JVM default (or clearly documenting this global side effect).

Suggested change
TlsConfig tlsConfig = ConnectionUtils.getTlsConfigFromProperties(properties);
TlsConfig tlsConfig = ConnectionUtils.getTlsConfigFromProperties(properties);
// NOTE: ConnectionUtils.getSSLContextFromProperties(...) installs a JVM-global default SSLSocketFactory
// and updates the static TlsUtils SSLContext. Constructing PinotAdminTransport with HTTPS enabled
// therefore mutates process-wide SSL/TLS state and may affect other HTTPS clients in the same JVM.
// This behavior is currently relied upon by existing Pinot components; callers should be aware that
// using HTTPS here has global side effects on SSL configuration.

Copilot uses AI. Check for mistakes.
Comment on lines +39 to +50
PinotControllerTransport transport =
new PinotControllerTransport(null, "https", sslContext, ConnectionTimeouts.create(1000, 1000, 1000),
TlsProtocols.defaultProtocols(false), null);

Field httpClientField = PinotControllerTransport.class.getDeclaredField("_httpClient");
httpClientField.setAccessible(true);
AsyncHttpClient httpClient = (AsyncHttpClient) httpClientField.get(transport);

assertNull(httpClient.getConfig().getSslEngineFactory().newSslEngine(httpClient.getConfig(), "localhost", 443)
.getSSLParameters().getEndpointIdentificationAlgorithm());

transport.close();
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transport.close() is called only at the end of the test; if an assertion fails earlier, the AsyncHttpClient may not be closed and can leak threads/resources. Wrap the assertion section in a try/finally to guarantee transport.close() executes.

Suggested change
PinotControllerTransport transport =
new PinotControllerTransport(null, "https", sslContext, ConnectionTimeouts.create(1000, 1000, 1000),
TlsProtocols.defaultProtocols(false), null);
Field httpClientField = PinotControllerTransport.class.getDeclaredField("_httpClient");
httpClientField.setAccessible(true);
AsyncHttpClient httpClient = (AsyncHttpClient) httpClientField.get(transport);
assertNull(httpClient.getConfig().getSslEngineFactory().newSslEngine(httpClient.getConfig(), "localhost", 443)
.getSSLParameters().getEndpointIdentificationAlgorithm());
transport.close();
PinotControllerTransport transport = null;
try {
transport =
new PinotControllerTransport(null, "https", sslContext, ConnectionTimeouts.create(1000, 1000, 1000),
TlsProtocols.defaultProtocols(false), null);
Field httpClientField = PinotControllerTransport.class.getDeclaredField("_httpClient");
httpClientField.setAccessible(true);
AsyncHttpClient httpClient = (AsyncHttpClient) httpClientField.get(transport);
assertNull(httpClient.getConfig().getSslEngineFactory().newSslEngine(httpClient.getConfig(), "localhost", 443)
.getSSLParameters().getEndpointIdentificationAlgorithm());
} finally {
if (transport != null) {
transport.close();
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +124 to +146
SocketChannel slowChannel = mock(SocketChannel.class);
SocketChannel fastChannel = mock(SocketChannel.class);
ChannelFuture slowCloseFuture = mock(ChannelFuture.class);
ChannelFuture fastCloseFuture = mock(ChannelFuture.class);

org.mockito.InOrder inOrder = inOrder(slowChannel, fastChannel, slowCloseFuture, fastCloseFuture);
when(slowChannel.close()).thenReturn(slowCloseFuture);
when(fastChannel.close()).thenReturn(fastCloseFuture);
when(slowChannel.closeFuture()).thenReturn(slowCloseFuture);
when(fastChannel.closeFuture()).thenReturn(fastCloseFuture);
when(slowCloseFuture.awaitUninterruptibly(anyLong())).thenReturn(false);
when(fastCloseFuture.awaitUninterruptibly(anyLong())).thenReturn(true);

server.closeAcceptedChannelSnapshot(new SocketChannel[]{slowChannel, fastChannel},
System.currentTimeMillis() + 1_000L);

inOrder.verify(slowChannel).close();
inOrder.verify(fastChannel).close();
inOrder.verify(slowChannel).closeFuture();
inOrder.verify(slowCloseFuture).awaitUninterruptibly(anyLong());
inOrder.verify(fastChannel).closeFuture();
inOrder.verify(fastCloseFuture).awaitUninterruptibly(anyLong());
verify(fastChannel).close();
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testCloseAcceptedChannelSnapshotClosesRemainingChannelsAfterTimeout() creates a QueryServer (which allocates Netty event loop threads in the constructor) but never calls server.shutDown(). This can leak threads and make the test suite hang/flaky; add a try/finally (or similar) to ensure shutdown even though the server is never started.

Suggested change
SocketChannel slowChannel = mock(SocketChannel.class);
SocketChannel fastChannel = mock(SocketChannel.class);
ChannelFuture slowCloseFuture = mock(ChannelFuture.class);
ChannelFuture fastCloseFuture = mock(ChannelFuture.class);
org.mockito.InOrder inOrder = inOrder(slowChannel, fastChannel, slowCloseFuture, fastCloseFuture);
when(slowChannel.close()).thenReturn(slowCloseFuture);
when(fastChannel.close()).thenReturn(fastCloseFuture);
when(slowChannel.closeFuture()).thenReturn(slowCloseFuture);
when(fastChannel.closeFuture()).thenReturn(fastCloseFuture);
when(slowCloseFuture.awaitUninterruptibly(anyLong())).thenReturn(false);
when(fastCloseFuture.awaitUninterruptibly(anyLong())).thenReturn(true);
server.closeAcceptedChannelSnapshot(new SocketChannel[]{slowChannel, fastChannel},
System.currentTimeMillis() + 1_000L);
inOrder.verify(slowChannel).close();
inOrder.verify(fastChannel).close();
inOrder.verify(slowChannel).closeFuture();
inOrder.verify(slowCloseFuture).awaitUninterruptibly(anyLong());
inOrder.verify(fastChannel).closeFuture();
inOrder.verify(fastCloseFuture).awaitUninterruptibly(anyLong());
verify(fastChannel).close();
try {
SocketChannel slowChannel = mock(SocketChannel.class);
SocketChannel fastChannel = mock(SocketChannel.class);
ChannelFuture slowCloseFuture = mock(ChannelFuture.class);
ChannelFuture fastCloseFuture = mock(ChannelFuture.class);
org.mockito.InOrder inOrder = inOrder(slowChannel, fastChannel, slowCloseFuture, fastCloseFuture);
when(slowChannel.close()).thenReturn(slowCloseFuture);
when(fastChannel.close()).thenReturn(fastCloseFuture);
when(slowChannel.closeFuture()).thenReturn(slowCloseFuture);
when(fastChannel.closeFuture()).thenReturn(fastCloseFuture);
when(slowCloseFuture.awaitUninterruptibly(anyLong())).thenReturn(false);
when(fastCloseFuture.awaitUninterruptibly(anyLong())).thenReturn(true);
server.closeAcceptedChannelSnapshot(new SocketChannel[]{slowChannel, fastChannel},
System.currentTimeMillis() + 1_000L);
inOrder.verify(slowChannel).close();
inOrder.verify(fastChannel).close();
inOrder.verify(slowChannel).closeFuture();
inOrder.verify(slowCloseFuture).awaitUninterruptibly(anyLong());
inOrder.verify(fastChannel).closeFuture();
inOrder.verify(fastCloseFuture).awaitUninterruptibly(anyLong());
verify(fastChannel).close();
} finally {
server.shutDown();
}

Copilot uses AI. Check for mistakes.

Assert.assertEquals(a, b);
Assert.assertEquals(a.hashCode(), b.hashCode());

Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equalsAndHashCodeShouldIncludeAllowedProtocolsAndEndpointIdentificationAlgorithm() no longer verifies that changes to allowedProtocols affect equals()/hashCode(). The test name claims both fields are covered, but the assertions only exercise endpointIdentificationAlgorithm; add an assertion that mutating allowedProtocols makes the objects unequal (in addition to the endpoint algorithm check).

Suggested change
// Changing allowedProtocols should affect equals/hashCode
b.setAllowedProtocols(new String[]{"TLSv1.3"});
Assert.assertNotEquals(a, b);
// Changing endpointIdentificationAlgorithm should also affect equals/hashCode

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +56
* @param tlsProtocols configured TLS protocol list
* @param endpointIdentificationAlgorithm endpoint identification algorithm for hostname verification
* @return the same builder for chaining
*/
default DefaultAsyncHttpClientConfig.Builder configure(DefaultAsyncHttpClientConfig.Builder builder,
@Nullable SSLContext sslContext, TlsProtocols tlsProtocols, @Nullable String endpointIdentificationAlgorithm) {
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new configure(..., TlsProtocols tlsProtocols, @Nullable String endpointIdentificationAlgorithm) overload is being called with tlsProtocols = null (e.g., admin transport). To avoid an unclear contract and potential NPEs in other SslContextProvider implementations, either annotate tlsProtocols as @Nullable (and document that null means “don’t configure protocols”), or require a non-null value at call sites.

Suggested change
* @param tlsProtocols configured TLS protocol list
* @param endpointIdentificationAlgorithm endpoint identification algorithm for hostname verification
* @return the same builder for chaining
*/
default DefaultAsyncHttpClientConfig.Builder configure(DefaultAsyncHttpClientConfig.Builder builder,
@Nullable SSLContext sslContext, TlsProtocols tlsProtocols, @Nullable String endpointIdentificationAlgorithm) {
* @param tlsProtocols configured TLS protocol list; {@code null} means protocols should not be configured
* @param endpointIdentificationAlgorithm endpoint identification algorithm for hostname verification
* @return the same builder for chaining
*/
default DefaultAsyncHttpClientConfig.Builder configure(DefaultAsyncHttpClientConfig.Builder builder,
@Nullable SSLContext sslContext, @Nullable TlsProtocols tlsProtocols,
@Nullable String endpointIdentificationAlgorithm) {
if (tlsProtocols == null) {
// Do not configure protocols when none are provided.
return builder;
}

Copilot uses AI. Check for mistakes.
@xiangfu0 xiangfu0 requested a review from Copilot March 27, 2026 04:20
@xiangfu0 xiangfu0 review requested due to automatic review settings March 27, 2026 05:16
@xiangfu0 xiangfu0 force-pushed the codex/debug-query-routing-flake branch from 90a2dc0 to fec8d01 Compare March 27, 2026 06:31
@xiangfu0 xiangfu0 requested review from Copilot March 27, 2026 06:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

Comment on lines +94 to +98
// NOTE: ConnectionUtils.getSSLContextFromProperties(...) installs a JVM-global default SSLSocketFactory
// and updates the shared TlsUtils SSLContext. PinotAdminTransport has historically inherited that process-
// wide side effect from the shared helper, so callers enabling HTTPS here should treat the TLS config as
// affecting other HttpsURLConnection-based clients in the same JVM.
SSLContext sslContext = ConnectionUtils.getSSLContextFromProperties(properties);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PinotAdminTransport now calls ConnectionUtils.getSSLContextFromProperties(properties), which installs a JVM-global default SSLSocketFactory and updates the shared TlsUtils SSLContext. This is a behavioral change from the previous approach (SSLContext.getDefault()) and can unintentionally affect other HTTPS clients in the same JVM. Consider constructing an SSLContext for this AsyncHttpClient instance without mutating global defaults (e.g., build from the extracted TlsConfig/SSLFactory directly), or keep the prior per-client SSLContext behavior; if the global side effect is intentional, the comment should be updated to reflect that it is newly introduced here and document the impact explicitly.

Suggested change
// NOTE: ConnectionUtils.getSSLContextFromProperties(...) installs a JVM-global default SSLSocketFactory
// and updates the shared TlsUtils SSLContext. PinotAdminTransport has historically inherited that process-
// wide side effect from the shared helper, so callers enabling HTTPS here should treat the TLS config as
// affecting other HttpsURLConnection-based clients in the same JVM.
SSLContext sslContext = ConnectionUtils.getSSLContextFromProperties(properties);
// Use the JVM default SSLContext to avoid mutating global TLS configuration here; TlsConfig is still
// consulted for endpoint identification behavior via the SslContextProvider.
SSLContext sslContext = SSLContext.getDefault();

Copilot uses AI. Check for mistakes.
@xiangfu0 xiangfu0 force-pushed the codex/debug-query-routing-flake branch from 623a964 to b9d11b3 Compare March 27, 2026 08:44
Address the Netty 4.2 regression and follow-up compatibility issues on the PR branch in a single commit.\n\nThis squashes the previous PR stack into one change that:\n- fixes the QueryServer shutdown race and accepted-channel draining\n- resolves remaining Netty 4.2 compatibility issues in transport code\n- preserves TLS endpoint identification compatibility across client transports\n- incorporates the Copilot follow-up fixes and related test cleanup\n- documents the PinotAdminTransport TLS side effects
@xiangfu0 xiangfu0 force-pushed the codex/debug-query-routing-flake branch from b9d11b3 to e739040 Compare March 29, 2026 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flaky-test Tracks a test that intermittently fails

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants