[grid] Apply the WebSocket frame fast path on the Node #17545

Open
shs96c wants to merge 1 commit into SeleniumHQ:trunk from shs96c:grid-node-frame-fast-path

Conversation

@shs96c
Member

@shs96c shs96c commented May 22, 2026

Summary

The Router has had a direct frame-forwarding path between the Netty pipeline and the upstream JDK WebSocket since db9b07a (#17197). Once the client-side handshake completes, an inbound WebSocketFrameProxy forwards each Netty WebSocketFrame straight to the upstream WebSocket, and the outbound DirectForwardingListener writes upstream replies directly to the client channel. Together those removed the per-frame Message allocation and the executor hop in WebSocketMessageHandler on the Router side.

The Node still did the full round-trip through MessageInboundConverter, WebSocketMessageHandler, the registered Consumer<Message>, and MessageOutboundConverter in both directions for every frame. Each frame allocated a TextMessage or BinaryMessage and hopped onto the channel executor on delivery. For a busy CDP or VNC session that is measurable allocation and executor-queue pressure on the Node.

Apply the same PostUpgradeHook pattern on the Node side: the consumer returned from ProxyNodeWebsockets installs a WebSocketFrameProxy after the handshake so inbound frames forward straight to the browser-side WebSocket, and a DirectForwardingListener writes outbound frames directly to the client channel. Frames received before the handshake are buffered in arrival order and drained on handover, so a frame cannot land in a pipeline that has already had its Message-layer handlers removed.

The hardening that the Router-side listener picked up in 8d8cf64 (#17435) is mirrored on the Node listener: the pre-handshake buffer is capped at 128 frames with a 1009 close recorded on overflow; the close code and reason are recorded on pre-handshake close or error so a late onUpgrade can write a clean close frame to the client and tear the channel down rather than leaving it open; and the buffer is released on close so ref-counted frames cannot leak if the handshake never completes.
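
The buffering rules above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this PR: `PreHandshakeBuffer` and its methods are hypothetical names, a `String` stands in for a ref-counted Netty frame, and a `Consumer<String>` stands in for the client channel. Only the 128-frame cap and the 1009 close code come from the description.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical stand-in for the listener's pre-handshake state. */
final class PreHandshakeBuffer {
  static final int MAX_FRAMES = 128;        // cap quoted in the PR description
  static final int MESSAGE_TOO_BIG = 1009;  // close code recorded on overflow

  private final Queue<String> frames = new ArrayDeque<>();
  private Consumer<String> clientChannel;   // null until the handshake completes
  private boolean closed;
  private int closeCode = -1;

  /** Buffers a frame before the handshake; forwards it directly afterwards. */
  synchronized void onFrame(String frame) {
    if (closed) {
      return;
    }
    if (clientChannel != null) {
      clientChannel.accept(frame);
      return;
    }
    if (frames.size() >= MAX_FRAMES) {
      close(MESSAGE_TOO_BIG);
      return;
    }
    frames.add(frame);
  }

  /** Records the first close code and drops anything still buffered. */
  synchronized void close(int code) {
    if (closed) {
      return;
    }
    closed = true;
    closeCode = code;
    frames.clear(); // real ref-counted frames would be released here
  }

  /**
   * Called once the handshake completes. Returns the recorded close code if the
   * upstream already closed, or -1 after draining the buffer in arrival order.
   */
  synchronized int onUpgrade(Consumer<String> channel) {
    if (closed) {
      return closeCode;
    }
    String frame;
    while ((frame = frames.poll()) != null) {
      channel.accept(frame);
    }
    clientChannel = channel;
    return -1;
  }
}
```

Holding one lock across the drain and the switch to direct forwarding is what keeps a late frame from overtaking the buffered ones in this sketch; the real listener may achieve the same ordering differently.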

Close-frame reasons coming from the upstream are now truncated to the 123-byte UTF-8 cap that RFC 6455 §5.5.1 imposes. The truncation uses a CharsetEncoder writing into a 120-byte buffer so it stops at a clean character boundary on overflow — a naive byte-truncate-then-decode could split a multi-byte sequence, produce a U+FFFD replacement on decode, and re-encode back over 123 bytes, breaking the close frame. The helper lives as a public static on WebSocketFrameProxy because both DirectForwardingListener classes already depend on that class. The Router-side listener that landed in #17435 had the same unchecked path; apply the helper there too so both proxies share the same safe behaviour.
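
A minimal sketch of the character-boundary truncation described above, for illustration only. `CloseReasonSketch` is a hypothetical class, not `WebSocketFrameProxy.truncateCloseReason` itself, and the details (two bounded encode passes, `REPLACE` for malformed input) are assumptions rather than the PR's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CloseReasonSketch {
  static final int MAX_REASON_BYTES = 123; // close-reason cap quoted in the description
  static final String MARKER = "...";      // three ASCII bytes

  private static CharsetEncoder newEncoder() {
    return StandardCharsets.UTF_8
        .newEncoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
  }

  static String truncate(String reason) {
    if (reason == null) {
      return "";
    }
    // First pass: does the whole reason fit in 123 bytes?
    ByteBuffer probe = ByteBuffer.allocate(MAX_REASON_BYTES);
    if (newEncoder().encode(CharBuffer.wrap(reason), probe, true).isUnderflow()) {
      return reason;
    }
    // Second pass: encode into 120 bytes. On overflow the encoder stops before
    // the first character that does not fit, so in.position() is always a
    // clean character boundary in the original string.
    CharBuffer in = CharBuffer.wrap(reason);
    ByteBuffer head = ByteBuffer.allocate(MAX_REASON_BYTES - MARKER.length());
    newEncoder().encode(in, head, true);
    return reason.substring(0, in.position()) + MARKER;
  }
}
```

Cutting the original string at `in.position()` avoids decoding bytes back into a `String`, which is the step where a replacement character could otherwise appear.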

The Node-specific behaviour is preserved:

  • Session-activity heartbeats (sessionConsumer.accept(sessionId)) fire per frame, both pre- and post-handshake.
  • The connectionReleased CAS still guards a single node.releaseConnection call across the close and error paths, including the overflow path introduced here.
  • VNC sessions still install a no-op heartbeat consumer so VNC traffic does not mark the session as recently active.
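
The single-release guarantee in the second bullet can be illustrated with a compare-and-set guard. This is a sketch under assumptions: `ReleaseOnceSketch` is a hypothetical class, and the `Runnable` stands in for the `node.releaseConnection` call; only the idea of a `connectionReleased` CAS comes from the description.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ReleaseOnceSketch {
  private final AtomicBoolean connectionReleased = new AtomicBoolean(false);
  private final Runnable releaseConnection; // stand-in for node.releaseConnection(...)

  ReleaseOnceSketch(Runnable releaseConnection) {
    this.releaseConnection = releaseConnection;
  }

  void onClose() {
    releaseOnce();
  }

  void onError() {
    releaseOnce();
  }

  void onOverflow() {
    releaseOnce();
  }

  // Only the first caller flips false -> true; later callers see true and skip.
  private void releaseOnce() {
    if (connectionReleased.compareAndSet(false, true)) {
      releaseConnection.run();
    }
  }
}
```

Whichever terminal path runs first performs the release, so an `onError` followed by an `onClose` (or an overflow followed by either) cannot release the slot twice.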

🤖 Generated with Claude Code

@selenium-ci selenium-ci added the B-grid (Everything grid and server related), C-java (Java Bindings), and B-build (Includes scripting, bazel and CI integrations) labels May 22, 2026
@shs96c shs96c marked this pull request as ready for review May 22, 2026 14:38
@qodo-code-review
Contributor

Review Summary by Qodo

Apply WebSocket frame fast path on Node side

✨ Enhancement


Walkthroughs

Description
• Implement WebSocket frame fast path on Node side to bypass Message layer
• Add DirectForwardingListener for direct frame forwarding to client channel
• Buffer pre-handshake frames and drain on upgrade completion
• Add overflow protection with 128-frame limit and clean close on buffer overflow
• Preserve session heartbeats and connection release semantics
Diagram
flowchart LR
  A["Upstream WebSocket"] -->|onText/onBinary| B["DirectForwardingListener"]
  B -->|pre-handshake| C["Frame Buffer"]
  B -->|post-handshake direct write| E["Client Channel"]
  C -->|drain on onUpgrade| E
  E -->|inbound frames| D["WebSocketFrameProxy"]
  D -->|forward| A
  B -->|onClose/onError| F["Clean Close Frame"]
  F -->|write| E


File Changes

1. java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java ✨ Enhancement +278/-93

Implement frame fast path with buffering and pipeline rewiring

• Refactored ForwardingListener into DirectForwardingListener with frame buffering and direct
 channel writes
• Added FrameProxyConsumer class implementing Consumer and PostUpgradeHook for pipeline
 rewiring
• Implemented pre-handshake frame buffering with 128-frame overflow protection and 1009 close code
• Added onUpgrade callback to install WebSocketFrameProxy and remove Message-layer handlers
• Preserved session heartbeat firing and connection release semantics across handshake boundary
• Added close code/reason recording for pre-handshake terminal states

java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java


2. java/test/org/openqa/selenium/grid/node/NodeDirectForwardingListenerTest.java 🧪 Tests +267/-0

Add DirectForwardingListener unit tests

• New test file with comprehensive coverage of DirectForwardingListener behavior
• Test pre-handshake frame buffering and in-order draining on upgrade
• Test pre-handshake close handling with proper close frame surfacing
• Test buffer overflow protection with 128-frame limit and clean close
• Verify session heartbeat firing and connection release on overflow

java/test/org/openqa/selenium/grid/node/NodeDirectForwardingListenerTest.java


3. java/src/org/openqa/selenium/grid/node/BUILD.bazel Dependencies +4/-0

Add Netty dependencies for frame fast path

• Added dependency on //java/src/org/openqa/selenium/netty/server for PostUpgradeHook and
 WebSocketFrameProxy
• Added Netty artifact dependencies: netty-buffer, netty-codec-http, netty-transport

java/src/org/openqa/selenium/grid/node/BUILD.bazel


4. java/test/org/openqa/selenium/grid/node/BUILD.bazel Dependencies +2/-0

Add Netty test dependencies

• Added Netty artifact dependencies: netty-codec-http, netty-transport for test support

java/test/org/openqa/selenium/grid/node/BUILD.bazel



@qodo-code-review
Contributor

qodo-code-review Bot commented May 22, 2026

Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (0)



Action required

1. UTF-8 truncation can expand 🐞 Bug ≡ Correctness
Description
DirectForwardingListener.truncateCloseReason truncates a UTF-8 byte array to 123 bytes and then
builds a String; if the cut lands mid multi-byte sequence, Java will decode with U+FFFD replacement
characters that can re-encode to more than 123 bytes and still break CloseWebSocketFrame encoding.
Code

java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[R587-598]

Evidence
onClose relies on truncateCloseReason to ensure the close reason is within the RFC 6455 123-byte
UTF-8 limit before constructing a Netty CloseWebSocketFrame. However, truncateCloseReason
truncates raw UTF-8 bytes and decodes them with new String(truncated, UTF_8); if truncation splits
a multi-byte code point, the decoded string will contain replacement characters, which can re-encode
to more than 123 bytes and break the close-frame encoding again.

java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[524-548]
java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[587-598]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`truncateCloseReason` truncates the UTF-8 *bytes* and then decodes them back into a `String`. If the truncation splits a multi-byte UTF-8 codepoint, decoding will insert a U+FFFD replacement character (3 bytes in UTF-8), which can make the resulting string exceed the 123-byte WebSocket close-reason limit when it is later encoded by Netty.

This defeats the intended hardening: a non-ASCII close reason near the boundary can still cause `CloseWebSocketFrame` encoding/validation failures and prevent a clean close/teardown.

## Issue Context
`onClose` uses `truncateCloseReason(reason)` to produce `safeReason`, then constructs `new CloseWebSocketFrame(code, safeReason)`.

## Fix Focus Areas
- java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[524-598]

## Implementation notes
- Replace the byte-truncate-then-decode approach with a truncation that guarantees the *final* UTF-8 encoded size is `<= 123`.
 - Example approach: iterate over the input string’s code points (or progressively shorten a substring) until `candidate.getBytes(UTF_8).length <= 123`, then return that candidate (optionally appending "..." while staying within the limit).
 - Prefer a linear-time approach using a `CharsetEncoder` writing into a 123-byte `ByteBuffer`, ensuring you never end on an incomplete sequence.
- Add a unit test that uses multi-byte characters (e.g., repeated `é` or another 2-byte UTF-8 character) so truncation is forced to cut mid-sequence, and assert the resulting reason’s UTF-8 byte length is `<= 123`.




Remediation recommended

2. Close reason allocates bytes 🐞 Bug ⛨ Security ⭐ New
Description
WebSocketFrameProxy.truncateCloseReason uses reason.getBytes(UTF_8) to check if truncation is
needed, which allocates a byte[] for the entire (potentially untrusted) close reason. A peer that
sends an excessively large close reason can cause large allocations/GC pressure (or OOM) in the
close/error path before truncation is applied.
Code

java/src/org/openqa/selenium/netty/server/WebSocketFrameProxy.java[R190-193]

Evidence
The truncation helper explicitly calls reason.getBytes(UTF_8) for the length check, which
allocates a full byte array. The close reason value is passed directly from upstream WebSocket
onClose callbacks into Node/Router listeners and then into truncateCloseReason, so the input can
be larger than 123 bytes (the exact case the helper is designed to handle).

java/src/org/openqa/selenium/netty/server/WebSocketFrameProxy.java[186-199]
java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[521-546]
java/src/org/openqa/selenium/remote/http/jdk/JdkHttpClient.java[227-235]
java/src/org/openqa/selenium/grid/router/ProxyWebsocketsIntoGrid.java[332-357]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`WebSocketFrameProxy.truncateCloseReason` currently does a UTF-8 byte-length check via `reason.getBytes(UTF_8)`, which allocates an array proportional to the *full* close reason size. This defeats the purpose of safe truncation when the input is extremely large.

## Issue Context
Close reasons originate from upstream WebSocket peers (e.g., via `java.net.http.WebSocket.Listener.onClose`) and are not size-capped before reaching `truncateCloseReason`. The helper should avoid any work proportional to the full input size.

## Fix Focus Areas
- java/src/org/openqa/selenium/netty/server/WebSocketFrameProxy.java[186-199]

## Recommended change
Replace the `reason.getBytes(UTF_8).length <= 123` check with a bounded encode attempt using `CharsetEncoder` into a small buffer (e.g., 124 bytes) and detect overflow / incomplete consumption. For example:
- Allocate a `ByteBuffer` of size 124.
- Use `encoder.encode(CharBuffer.wrap(reason), buf, true)` and verify the encoder did not overflow, fully consumed the input, and wrote at most 123 bytes.
- If it fits, return `reason` without allocating a full byte array; otherwise perform the existing truncation into 120 bytes and append `...`.

Keep the existing behavior and tests; consider adding a test that passes an extremely large reason and asserts the method does not allocate proportional memory (if feasible) or at least does not call `getBytes`.
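
A minimal sketch of the bounded check recommended above, offered as an illustration rather than the eventual fix. `BoundedFitSketch.fitsInUtf8` is a hypothetical helper. It assumes a buffer of exactly the limit (123 bytes) rather than 124, so that the encoder's overflow result alone answers the question; with a 124-byte buffer the written byte count would also need checking.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class BoundedFitSketch {
  /**
   * Returns true if {@code s} encodes to at most {@code maxBytes} bytes of UTF-8.
   * The encoder stops as soon as the small output buffer is full, so the work
   * and the allocation are bounded by maxBytes, not by the input length.
   */
  static boolean fitsInUtf8(String s, int maxBytes) {
    CharsetEncoder encoder =
        StandardCharsets.UTF_8
            .newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    ByteBuffer out = ByteBuffer.allocate(maxBytes);
    // CharBuffer.wrap(String) is a view over the string; nothing is copied.
    // UNDERFLOW means all input was consumed before the buffer filled up.
    return encoder.encode(CharBuffer.wrap(s), out, true).isUnderflow();
  }
}
```

A call such as `fitsInUtf8(reason, 123)` could then replace a `reason.getBytes(UTF_8).length <= 123` comparison without materialising the full byte array.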



3. Overlong close reason 🐞 Bug ☼ Reliability
Description
DirectForwardingListener constructs Netty CloseWebSocketFrame using the upstream-provided close
reason without enforcing the 123-byte UTF-8 limit. If the reason exceeds that limit, close-frame
construction/encoding can fail and prevent the intended clean close/teardown behavior.
Code

java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[R455-457]

Evidence
The new fast-path code writes CloseWebSocketFrame with the raw upstream reason / recorded
closeReason, while WebSocketUpgradeHandler explicitly documents and enforces the 123-byte UTF-8
limit for close reasons, implying this constraint must be respected to avoid protocol/encoding
failures.

java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[439-458]
java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[521-543]
java/src/org/openqa/selenium/netty/server/WebSocketUpgradeHandler.java[245-256]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`ProxyNodeWebsockets.DirectForwardingListener` writes `CloseWebSocketFrame(code, reason)` using `reason` received from the upstream WebSocket (and also the recorded pre-handshake reason). WebSocket close reason strings are constrained to **max 123 bytes UTF-8**, and the codebase already truncates close reasons to this limit in `WebSocketUpgradeHandler.exceptionCaught`. Without truncation here, an overlong upstream reason can break the close path.

## Issue Context
- Affects both:
 - pre-handshake terminal close surfaced during `onUpgrade(Channel)`
 - post-handshake upstream `onClose(int code, String reason)`

## Fix Focus Areas
- java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[439-458]
- java/src/org/openqa/selenium/grid/node/ProxyNodeWebsockets.java[521-543]

## Implementation notes
- Add a small helper (preferably shared/reused, or copied from `WebSocketUpgradeHandler`) to truncate a `String` to <= 123 bytes when encoded as UTF-8.
- Apply it when:
 - recording `closeReason` (optional, but keeps stored state safe)
 - constructing `CloseWebSocketFrame` in `onUpgrade` and `onClose`.
- Keep behavior identical for short reasons.


Previous review results

Review updated until commit ed555f3

Results up to commit 40168a7


🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)


Remediation recommended
1. Overlong close reason 🐞 Bug ☼ Reliability

Results up to commit 94b1257


🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)


Action required
1. UTF-8 truncation can expand 🐞 Bug ≡ Correctness

@shs96c shs96c force-pushed the grid-node-frame-fast-path branch from 40168a7 to 94b1257 Compare May 22, 2026 14:51
@qodo-code-review
Contributor

qodo-code-review Bot commented May 22, 2026

Persistent review updated to latest commit 94b1257

@shs96c
Member Author

shs96c commented May 22, 2026

Addressed in 94b1257: added a truncateCloseReason helper mirroring the byte-level truncation already used by WebSocketUpgradeHandler.exceptionCaught, and applied it at both the storage site (the recorded closeReason field) and the post-handshake close-frame write. The pre-handshake onUpgrade path reads the already-truncated field. A focused test asserts that an overlong upstream reason produces a close frame whose reasonText is within the 123-byte UTF-8 limit.

The Router-side DirectForwardingListener introduced in #17435 has the identical issue (the bot just didn't flag it then). I'll send a follow-up PR for that file applying the same helper once this lands.

Comment on lines +587 to +598
private static String truncateCloseReason(String reason) {
  if (reason == null) {
    return "";
  }
  byte[] bytes = reason.getBytes(UTF_8);
  if (bytes.length <= 123) {
    return reason;
  }
  byte[] truncated = Arrays.copyOf(bytes, 123);
  Arrays.fill(truncated, 120, 123, (byte) '.');
  return new String(truncated, UTF_8);
}
Contributor


Action required

1. UTF-8 truncation can expand 🐞 Bug ≡ Correctness

DirectForwardingListener.truncateCloseReason truncates a UTF-8 byte array to 123 bytes and then
builds a String; if the cut lands mid multi-byte sequence, Java will decode with U+FFFD replacement
characters that can re-encode to more than 123 bytes and still break CloseWebSocketFrame encoding.

@shs96c shs96c force-pushed the grid-node-frame-fast-path branch from 94b1257 to 72a9bcc Compare May 22, 2026 15:04
@qodo-code-review
Contributor

qodo-code-review Bot commented May 22, 2026

Persistent review updated to latest commit 72a9bcc

@shs96c
Member Author

shs96c commented May 22, 2026

Pushed 72a9bcc — the bot is right, the byte-truncate-then-decode pattern I copied from WebSocketUpgradeHandler is unsafe for non-ASCII reasons. If the cut lands mid multi-byte sequence, Java's UTF-8 decoder inserts U+FFFD (3 bytes when re-encoded), which can push the string back over 123 bytes when Netty re-encodes it for the wire.
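
To make the failure mode concrete, here is a small self-contained reproduction built around the byte-level helper quoted in the review comment above. The class name `NaiveTruncationDemo` and the input are illustrative assumptions; the leading `a` shifts the two-byte characters to odd offsets so that byte 120 falls in the middle of one of them.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import java.util.Arrays;

final class NaiveTruncationDemo {
  /** The byte-truncate-then-decode helper under review, reproduced to show the problem. */
  static String naiveTruncate(String reason) {
    if (reason == null) {
      return "";
    }
    byte[] bytes = reason.getBytes(UTF_8);
    if (bytes.length <= 123) {
      return reason;
    }
    byte[] truncated = Arrays.copyOf(bytes, 123);
    Arrays.fill(truncated, 120, 123, (byte) '.');
    return new String(truncated, UTF_8);
  }

  public static void main(String[] args) {
    // 'a' followed by 100 copies of U+00E9 (two bytes each): 201 bytes of UTF-8.
    StringBuilder reason = new StringBuilder("a");
    for (int i = 0; i < 100; i++) {
      reason.append('\u00e9');
    }
    String out = naiveTruncate(reason.toString());
    // Byte 119 is a lone lead byte, which decodes to U+FFFD (3 bytes re-encoded):
    // 1 + 59 * 2 + 3 + 3 = 125 bytes, over the 123-byte cap.
    System.out.println(out.getBytes(UTF_8).length); // prints 125
    System.out.println(out.indexOf('\uFFFD') >= 0); // prints true
  }
}
```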

Replaced the helper with a CharsetEncoder writing into a 120-byte ByteBuffer. encode() stops at a clean character boundary on overflow, so we never leave a partial sequence behind. Appending the trailing "..." (three ASCII bytes) preserves the truncation marker and keeps the final UTF-8 encoding at or below 123 bytes regardless of the input.

The existing test was replaced with one that builds a 200-byte reason out of 'é' (a two-byte UTF-8 character), so the truncation is forced to cut where a naïve byte-level approach would split a codepoint. The test asserts both that the encoded length is ≤ 123 bytes and that the result contains no replacement character.

The follow-up I owe for the Router-side listener (in ProxyWebsocketsIntoGrid, on trunk via #17435) will reuse the same safe helper.

The Router has had a direct frame-forwarding path between the Netty
pipeline and the upstream JDK WebSocket since db9b07a (2026-03-11,
"[grid] Router WebSocket handle dropped close frames, idle disconnects,
high-latency proxying", SeleniumHQ#17197). Once the client-side handshake
completes, an inbound WebSocketFrameProxy forwards each Netty
WebSocketFrame straight to the upstream WebSocket, and the outbound
DirectForwardingListener writes upstream replies directly to the
client channel. Together those removed the per-frame Message
allocation and the executor hop in WebSocketMessageHandler on the
Router side.

The Node still did the full round-trip through MessageInboundConverter,
WebSocketMessageHandler, the registered Consumer<Message>, and
MessageOutboundConverter in both directions for every frame. Each
frame allocated a TextMessage or BinaryMessage and hopped onto the
channel executor on delivery. For a busy CDP or VNC session that is
measurable allocation and executor-queue pressure on the Node.

Apply the same PostUpgradeHook pattern on the Node side: the consumer
returned from ProxyNodeWebsockets installs a WebSocketFrameProxy after
the handshake so inbound frames forward straight to the browser-side
WebSocket, and a DirectForwardingListener writes outbound frames
directly to the client channel. Frames received before the handshake
are buffered in arrival order and drained on handover, so a frame
cannot land in a pipeline that has already had its Message-layer
handlers removed.

The hardening that the Router-side listener picked up in 8d8cf64
(2026-05-14, "[grid] Close pre-handshake race in WebSocket proxy",
SeleniumHQ#17435) is mirrored on the Node listener: the pre-handshake buffer is
capped at 128 frames with a 1009 close recorded on overflow; the
close code and reason are recorded on pre-handshake close or error so
a late onUpgrade can write a clean close frame to the client and tear
the channel down rather than leaving it open; and the buffer is
released on close so ref-counted frames cannot leak if the handshake
never completes.

Close-frame reasons coming from the upstream are now truncated to the
123-byte UTF-8 cap that RFC 6455 §5.5.1 imposes. The truncation uses
a CharsetEncoder writing into a 120-byte buffer so it stops at a
clean character boundary on overflow — a naive byte-truncate-then-
decode could split a multi-byte sequence, produce a U+FFFD replacement
on decode, and re-encode back over 123 bytes, breaking the close
frame. The helper lives as a public static on WebSocketFrameProxy
because both DirectForwardingListener classes already depend on that
class. The Router-side listener that landed in SeleniumHQ#17435 had the same
unchecked path; apply the helper there too so both proxies share the
same safe behaviour.

The Node-specific behaviour is preserved:

- Session-activity heartbeats (sessionConsumer.accept(sessionId)) fire
  per frame, both pre- and post-handshake.
- The connectionReleased CAS still guards a single
  node.releaseConnection call across the close and error paths,
  including the overflow path introduced here.
- VNC sessions still install a no-op heartbeat consumer so VNC
  traffic does not mark the session as recently active.

The existing ProxyNodeWebsocketsTest continues to exercise the slot
accounting, including the regression from SeleniumHQ#17197 where onError without
a follow-on onClose used to leak the slot. New unit tests in
NodeDirectForwardingListenerTest pin the per-frame heartbeat, the
buffer-then-drain ordering, the surface-and-teardown behaviour on a
pre-handshake close, the overflow path's clean release of the session
slot, and the safe truncation of an overlong upstream close reason
that contains multi-byte UTF-8 characters. The shared helper has a
focused unit test alongside it in WebSocketFrameProxyTest.
@shs96c shs96c force-pushed the grid-node-frame-fast-path branch from 72a9bcc to ed555f3 Compare May 22, 2026 15:16
@qodo-code-review
Contributor

qodo-code-review Bot commented May 22, 2026

Persistent review updated to latest commit ed555f3

@shs96c
Member Author

shs96c commented May 22, 2026

Pushed ed555f3 — extended the scope of the PR per Simon's note: extracted the safe truncation to a public static `WebSocketFrameProxy.truncateCloseReason` and applied it to the Router-side `DirectForwardingListener` in the same change. Both listeners now share the same helper, which uses `CharsetEncoder` into a fixed 120-byte buffer so it cannot leave a partial multi-byte sequence behind. A focused unit test next to the helper covers the multi-byte case directly; the existing Node integration test continues to exercise the call site.

The Router side had the same bug as the Node side — it landed unchanged in #17435 because the original `WebSocketUpgradeHandler.exceptionCaught` pattern was unsafe and we copied it. Fixing both at once means there is no transient window with one side safe and the other still broken.

PR title and description still describe this as the Node-side fast path PR. Happy to add a follow-up sentence to the description noting the Router-side truncation if you'd like, or split into two commits within the PR — but since you said "make the fix at the same time" I assumed one squash commit was preferred.

