[grid] Close pre-handshake race in WebSocket proxy#17435
Conversation
Frames received from the upstream WebSocket before the client-side handshake completes used to traverse MessageOutboundConverter via the fallback consumer. After onUpgradeComplete strips the converter, an in-flight frame on the upstream listener thread could land in a pipeline that no longer had its Message-layer handlers and silently drop. Move the channel reference and a pre-handshake buffer into the listener: pre-handshake frames are queued, the buffer drains under a lock when the channel is handed over, and post-handshake calls take the lock-free fast path.
Review Summary by QodoBuffer pre-handshake WebSocket frames to prevent race condition
WalkthroughsDescription• Buffer pre-handshake WebSocket frames in DirectForwardingListener to prevent silent drops • Drain buffered frames in order when client channel becomes available post-handshake • Replace fallback Message consumer with lock-protected frame queue for deterministic ordering • Add unit test validating pre-handshake frame buffering and post-handshake fast path Diagramflowchart LR
A["Pre-handshake frames arrive"] -->|enqueue| B["DirectForwardingListener buffer"]
C["onUpgradeComplete fires"] -->|drain in order| B
B -->|write to channel| D["Client receives frames"]
E["Post-handshake frames"] -->|volatile read| F["Fast path direct write"]
F --> D
File Changes1. java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java
|
Code Review by Qodo
1.
|
Address two issues raised on SeleniumHQ#17435: - Pre-handshake onClose/onError used to discard the close details and publish the channel for normal forwarding when the handshake later landed, leaving the client open indefinitely. Record the code/reason and, on onUpgrade, drain any frames the upstream had queued and then write the close frame followed by closing the channel. - The pending deque had no cap, so a stalled handshake while the upstream keeps producing could grow it without bound. Cap at 128 frames; on overflow drop the buffer, latch a 1009 terminal state, and close the upstream so the channel sees a clean close once the upgrade fires.
|
Pushed
The Ruby Windows test failure is unrelated — |
|
Persistent review updated to latest commit bf52afc |
If pre-handshake onClose kept the buffered ref-counted WebSocketFrames "for a future onUpgrade drain" and the client-side handshake never landed (client disconnected mid-handshake, or the upgrade itself failed and triggered the upstream close), those frames were never released — a Netty buffer leak. Make onClose symmetric with onError: drop the buffer on pre-handshake close, while still recording the code/reason so onUpgrade can surface a clean close to the client if the handshake does fire later. The trade-off is small: a client that just received the 101 response cannot usefully consume frames if we are about to close them immediately afterward.
|
Good catch — pushed
Made Existing test updated to reflect the new behaviour (only the close frame on the wire after pre-handshake close + upgrade). |
|
Persistent review updated to latest commit 0cd57dd |
Summary
Frames received from the upstream WebSocket before the client-side handshake completes used to travel through
MessageOutboundConvertervia a fallbackConsumer<Message>. AfteronUpgradeCompleterewires the pipeline (removing the Message-layer handlers), a frame in flight on the upstream listener thread could land in a pipeline that no longer had its converter and silently drop.Move the client
Channelreference and a pre-handshake buffer intoDirectForwardingListeneritself:onText/onBinarycalls queue aWebSocketFrameonto a per-listener deque under a lock.FrameProxyConsumer.onUpgradeCompleteinvokeslistener.onUpgrade(channel), which drains the buffer in arrival order and then publishes the channel asvolatile.onClose/onErrorpre-handshake release buffered frames and latch aclosedflag so any in-flight call doesn't leak ref-counted frames.A focused
EmbeddedChannel-based test pins the buffer-then-drain order; the existing mediumProxyWebsocketTestcontinues to cover the end-to-end Router→Node path.Test plan
🤖 Generated with Claude Code