[grid] Apply the WebSocket frame fast path on the Node#17545
Conversation
Review Summary by QodoApply WebSocket frame fast path on Node side
WalkthroughsDescription• Implement WebSocket frame fast path on Node side to bypass Message layer • Add DirectForwardingListener for direct frame forwarding to client channel • Buffer pre-handshake frames and drain on upgrade completion • Add overflow protection with 128-frame limit and clean close on buffer overflow • Preserve session heartbeats and connection release semantics Diagramflowchart LR
A["Upstream WebSocket"] -->|onText/onBinary| B["DirectForwardingListener"]
B -->|pre-handshake| C["Frame Buffer"]
B -->|post-handshake| D["WebSocketFrameProxy"]
C -->|onUpgrade| D
D -->|direct write| E["Client Channel"]
B -->|onClose/onError| F["Clean Close Frame"]
F -->|write| E
File Changes1. java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java
|
Code Review by Qodo
1. UTF-8 truncation can expand
|
40168a7 to
94b1257
Compare
|
Persistent review updated to latest commit 94b1257 |
|
Addressed in 94b1257: added a The Router-side |
| private static String truncateCloseReason(String reason) { | ||
| if (reason == null) { | ||
| return ""; | ||
| } | ||
| byte[] bytes = reason.getBytes(UTF_8); | ||
| if (bytes.length <= 123) { | ||
| return reason; | ||
| } | ||
| byte[] truncated = Arrays.copyOf(bytes, 123); | ||
| Arrays.fill(truncated, 120, 123, (byte) '.'); | ||
| return new String(truncated, UTF_8); | ||
| } |
There was a problem hiding this comment.
1. Utf-8 truncation can expand 🐞 Bug ≡ Correctness
DirectForwardingListener.truncateCloseReason truncates a UTF-8 byte array to 123 bytes and then builds a String; if the cut lands mid multi-byte sequence, Java will decode with U+FFFD replacement characters that can re-encode to more than 123 bytes and still break CloseWebSocketFrame encoding.
Agent Prompt
## Issue description
`truncateCloseReason` truncates the UTF-8 *bytes* and then decodes them back into a `String`. If the truncation splits a multi-byte UTF-8 codepoint, decoding will insert a U+FFFD replacement character (3 bytes in UTF-8), which can make the resulting string exceed the 123-byte WebSocket close-reason limit when it is later encoded by Netty.
This defeats the intended hardening: a non-ASCII close reason near the boundary can still cause `CloseWebSocketFrame` encoding/validation failures and prevent a clean close/teardown.
## Issue Context
`onClose` uses `truncateCloseReason(reason)` to produce `safeReason`, then constructs `new CloseWebSocketFrame(code, safeReason)`.
## Fix Focus Areas
- java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[524-598]
## Implementation notes
- Replace the byte-truncate-then-decode approach with a truncation that guarantees the *final* UTF-8 encoded size is `<= 123`.
- Example approach: iterate over the input string’s code points (or progressively shorten a substring) until `candidate.getBytes(UTF_8).length <= 123`, then return that candidate (optionally appending "..." while staying within the limit).
- Prefer a linear-time approach using a `CharsetEncoder` writing into a 123-byte `ByteBuffer`, ensuring you never end on an incomplete sequence.
- Add a unit test that uses multi-byte characters (e.g., repeated `é` or another 2-byte UTF-8 character) so truncation is forced to cut mid-sequence, and assert the resulting reason’s UTF-8 byte length is `<= 123`.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
94b1257 to
72a9bcc
Compare
|
Persistent review updated to latest commit 72a9bcc |
|
Pushed 72a9bcc — the bot is right, the byte-truncate-then-decode pattern I copied from Replaced the helper with a The existing test was replaced with one that builds a 200-byte reason out of The follow-up I owe for the Router-side listener (in |
The Router has had a direct frame-forwarding path between the Netty pipeline and the upstream JDK WebSocket since db9b07a (2026-03-11, "[grid] Router WebSocket handle dropped close frames, idle disconnects, high-latency proxying", SeleniumHQ#17197). Once the client-side handshake completes, an inbound WebSocketFrameProxy forwards each Netty WebSocketFrame straight to the upstream WebSocket, and the outbound DirectForwardingListener writes upstream replies directly to the client channel. Together those removed the per-frame Message allocation and the executor hop in WebSocketMessageHandler on the Router side. The Node still did the full round-trip through MessageInboundConverter, WebSocketMessageHandler, the registered Consumer<Message>, and MessageOutboundConverter in both directions for every frame. Each frame allocated a TextMessage or BinaryMessage and hopped onto the channel executor on delivery. For a busy CDP or VNC session that is measurable allocation and executor-queue pressure on the Node. Apply the same PostUpgradeHook pattern on the Node side: the consumer returned from ProxyNodeWebsockets installs a WebSocketFrameProxy after the handshake so inbound frames forward straight to the browser-side WebSocket, and a DirectForwardingListener writes outbound frames directly to the client channel. Frames received before the handshake are buffered in arrival order and drained on handover, so a frame cannot land in a pipeline that has already had its Message-layer handlers removed. The hardening that the Router-side listener picked up in 8d8cf64 (2026-05-14, "[grid] Close pre-handshake race in WebSocket proxy", SeleniumHQ#17435) is mirrored on the Node listener: the pre-handshake buffer is capped at 128 frames with a 1009 close recorded on overflow; the close code and reason are recorded on pre-handshake close or error so a late onUpgrade can write a clean close frame to the client and tear the channel down rather than leaving it open; and the buffer is released on close so ref-counted frames cannot leak if the handshake never completes. Close-frame reasons coming from the upstream are now truncated to the 123-byte UTF-8 cap that RFC 6455 §5.5.1 imposes. The truncation uses a CharsetEncoder writing into a 120-byte buffer so it stops at a clean character boundary on overflow — a naive byte-truncate-then- decode could split a multi-byte sequence, produce a U+FFFD replacement on decode, and re-encode back over 123 bytes, breaking the close frame. The helper lives as a public static on WebSocketFrameProxy because both DirectForwardingListener classes already depend on that class. The Router-side listener that landed in SeleniumHQ#17435 had the same unchecked path; apply the helper there too so both proxies share the same safe behaviour. The Node-specific behaviour is preserved: - Session-activity heartbeats (sessionConsumer.accept(sessionId)) fire per frame, both pre- and post-handshake. - The connectionReleased CAS still guards a single node.releaseConnection call across the close and error paths, including the overflow path introduced here. - VNC sessions still install a no-op heartbeat consumer so VNC traffic does not mark the session as recently active. The existing ProxyNodeWebsocketsTest continues to exercise the slot accounting, including the regression from SeleniumHQ#17197 where onError without a follow-on onClose used to leak the slot. New unit tests in NodeDirectForwardingListenerTest pin the per-frame heartbeat, the buffer-then-drain ordering, the surface-and-teardown behaviour on a pre-handshake close, the overflow path's clean release of the session slot, and the safe truncation of an overlong upstream close reason that contains multi-byte UTF-8 characters. The shared helper has a focused unit test alongside it in WebSocketFrameProxyTest.
72a9bcc to
ed555f3
Compare
|
Persistent review updated to latest commit ed555f3 |
|
Pushed ed555f3 — extended the scope of the PR per Simon's note: extracted the safe truncation to a public static `WebSocketFrameProxy.truncateCloseReason` and applied it to the Router-side `DirectForwardingListener` in the same change. Both listeners now share the same helper, which uses `CharsetEncoder` into a fixed 120-byte buffer so it cannot leave a partial multi-byte sequence behind. A focused unit test next to the helper covers the multi-byte case directly; the existing Node integration test continues to exercise the call site. The Router side had the same bug as the Node side — it landed unchanged in #17435 because the original `WebSocketUpgradeHandler.exceptionCaught` pattern was unsafe and we copied it. Fixing both at once means there is no transient window with one side safe and the other still broken. PR title and description still describe this as the Node-side fast path PR. Happy to add a follow-up sentence to the description noting the Router-side truncation if you'd like, or split into two commits within the PR — but since you said "make the fix at the same time" I assumed one squash commit was preferred. |
Summary
The Router has had a direct frame-forwarding path between the Netty pipeline and the upstream JDK WebSocket since db9b07a (#17197). Once the client-side handshake completes, an inbound
WebSocketFrameProxyforwards each NettyWebSocketFramestraight to the upstream WebSocket, and the outboundDirectForwardingListenerwrites upstream replies directly to the client channel. Together those removed the per-frameMessageallocation and the executor hop inWebSocketMessageHandleron the Router side.The Node still did the full round-trip through
MessageInboundConverter,WebSocketMessageHandler, the registeredConsumer<Message>, andMessageOutboundConverterin both directions for every frame. Each frame allocated aTextMessageorBinaryMessageand hopped onto the channel executor on delivery. For a busy CDP or VNC session that is measurable allocation and executor-queue pressure on the Node.Apply the same
PostUpgradeHookpattern on the Node side: the consumer returned fromProxyNodeWebsocketsinstalls aWebSocketFrameProxyafter the handshake so inbound frames forward straight to the browser-side WebSocket, and aDirectForwardingListenerwrites outbound frames directly to the client channel. Frames received before the handshake are buffered in arrival order and drained on handover, so a frame cannot land in a pipeline that has already had its Message-layer handlers removed.The hardening that the Router-side listener picked up in 8d8cf64 (#17435) is mirrored on the Node listener: the pre-handshake buffer is capped at 128 frames with a 1009 close recorded on overflow; the close code and reason are recorded on pre-handshake close or error so a late
onUpgradecan write a clean close frame to the client and tear the channel down rather than leaving it open; and the buffer is released on close so ref-counted frames cannot leak if the handshake never completes.Close-frame reasons coming from the upstream are now truncated to the 123-byte UTF-8 cap that RFC 6455 §5.5.1 imposes. The truncation uses a
CharsetEncoderwriting into a 120-byte buffer so it stops at a clean character boundary on overflow — a naive byte-truncate-then-decode could split a multi-byte sequence, produce aU+FFFDreplacement on decode, and re-encode back over 123 bytes, breaking the close frame. The helper lives as a public static onWebSocketFrameProxybecause bothDirectForwardingListenerclasses already depend on that class. The Router-side listener that landed in #17435 had the same unchecked path; apply the helper there too so both proxies share the same safe behaviour.The Node-specific behaviour is preserved:
sessionConsumer.accept(sessionId)) fire per frame, both pre- and post-handshake.connectionReleasedCAS still guards a singlenode.releaseConnectioncall across the close and error paths, including the overflow path introduced here.🤖 Generated with Claude Code