fix: proxy WebSocket upgrades through DevMode tunnel#18
Conversation
Go's httputil.ReverseProxy does not relay WebSocket upgrades — it silently drops the 101 Switching Protocols handshake. Additionally, the TLS config preferred HTTP/2 via ALPN (NextProtos: h2, http/1.1), which strips the hop-by-hop Connection/Upgrade headers and does not support Hijack() on its ResponseWriter. Together these two issues cause WebSocket connections through the DevMode public tunnel to open successfully (TLS + TCP handshake) but never deliver any frames to the local Hub server. Changes: - Remove h2 from NextProtos so all inbound connections use HTTP/1.1, preserving WebSocket upgrade headers and enabling Hijack() - Add proxyWebSocket() handler that detects upgrade requests, hijacks the inbound connection, dials the local backend, writes the HTTP/1.1 upgrade request, and bridges bytes bidirectionally - Add isWebSocketUpgrade() helper that checks both standard HTTP/1.1 headers and Sec-WebSocket-Version for cross-protocol robustness Fixes agentuity/sdk#1382
📝 WalkthroughWalkthroughRemoved HTTP/2 from TLS ALPN and added explicit WebSocket upgrade detection with a new proxy path that performs a raw TCP HTTP/1.1 Upgrade to the local backend, hijacks the client connection, and relays bidirectional streams; also adjusted localPort mapping and updated module versions. Changes
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/stack/stack.go`:
- Around line 250-254: When hijack (hj.Hijack()) fails, the code currently
closes backendConn, logs and returns without completing the HTTP response;
change this to write a 5xx HTTP response (e.g., 502 Bad Gateway) before
returning. Specifically, after backendConn.Close() and before returning in the
error branch of clientConn, clientBuf, err := hj.Hijack(), call the
ResponseWriter (hj) to write a 5xx status and a brief body (so the client sees
an HTTP error), then log the error via logger.Error and return; keep references
to hj, backendConn and logger in the updated branch so the response is sent and
resources are closed.
- Around line 560-571: The isWebSocketUpgrade function currently treats any
request with Sec-WebSocket-Version as an upgrade signal; restrict this so
Sec-WebSocket-Version only triggers an upgrade when the request is actually an
HTTP/2-over-TLS connection. Modify the Sec-WebSocket-Version check in
isWebSocketUpgrade to require r.ProtoMajor == 2 and r.TLS != nil (i.e., only
consider it an upgrade when on HTTP/2 over TLS) before returning true, leaving
the existing HTTP/1.1 Connection/Upgrade checks unchanged.
- Around line 205-222: The WebSocket handler currently copies all inbound
headers into outReq (the loop over r.Header) which causes hop-by-hop headers and
X-Forwarded-/Forwarded headers to be forwarded; modify the header-sanitization
in that block so it first skips/removes the standard hop-by-hop headers
(Connection, Proxy-Connection, Keep-Alive, TE, Trailer, Transfer-Encoding,
Upgrade) and any X-Forwarded-* or Forwarded headers when copying from r.Header
into outReq.Header, then explicitly set only the required WebSocket headers on
outReq.Header (e.g., Connection: Upgrade and Upgrade: websocket) and set
outReq.Host only when appropriate (remove redundant X-Forwarded-Host restoration
if you keep X-Forwarded-Host omitted), matching httputil.ReverseProxy’s
behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 869f43b0-fb2d-465a-9c99-acbf6c39563b
📒 Files selected for processing (1)
internal/stack/stack.go
jhaynie
left a comment
There was a problem hiding this comment.
LGTM . you should be able to build it and copy it in to your $HOME/.config/agentuity and replace the current version to test -- if you haven't already done that. does this fix the issue?
…ure, tighten WS detection - Sanitize hop-by-hop and X-Forwarded-* headers in the WebSocket proxy to match httputil.ReverseProxy behavior (RFC 9110 §7.6.1) - Return HTTP 500 when Hijack() fails instead of silently dropping the connection - Remove overly broad Sec-WebSocket-Version fallback in isWebSocketUpgrade — since TLS is HTTP/1.1 only, we only need the standard Connection: Upgrade + Upgrade: websocket check with proper multi-token Connection header parsing
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/stack/stack.go`:
- Around line 312-319: The comment block describing WebSocket upgrade handling
is outdated; update it to reflect that isWebSocketUpgrade now only checks
HTTP/1.1 Connection/Upgrade headers (it no longer inspects HTTP/2 or
Sec-WebSocket-Version), and keep the note that Go's httputil.ReverseProxy does
not handle WebSocket hijacking/bridging so upgrades require special handling;
reference the isWebSocketUpgrade function and the surrounding WebSocket upgrade
handling code when making the comment edits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e0c13188-82d0-4333-80b3-150b8db564f4
📒 Files selected for processing (1)
internal/stack/stack.go
📜 Review details
🔇 Additional comments (3)
internal/stack/stack.go (3)
110-116: Reasonable protocol narrowing.Pinning this listener to a single HTTP mode makes the new tunnel path much easier to reason about.
562-577: Nice tightening of the upgrade predicate.Requiring
Upgrade: websocketand parsing the comma-separatedConnectiontokens removes the earlier false-positive path cleanly.
179-285: Reconsider the custom WebSocket implementation against native ReverseProxy support.The premise that
httputil.ReverseProxydoes not support101 Switching Protocolsresponses is outdated. Go'sReverseProxyhas natively supported WebSocket upgrades since Go 1.12 through itshandleUpgradeResponsemechanism, which hijacks the frontend connection and proxies bidirectional data. While hop-by-hop headers (includingUpgrade) are stripped by default, they can be restored via theDirectorcallback.Even with
NextProtospinned tohttp/1.1, relying on the standard library's upgrade handling would align with Go's idiomatic patterns. Evaluate whether this custom bridge offers meaningful advantages over the stdlib approach, or if it can be safely removed and replaced with appropriateDirectorconfiguration.
| // WebSocket upgrade requests need special handling. | ||
| // Go's httputil.ReverseProxy does NOT relay WebSocket | ||
| // upgrades — it cannot hijack and bridge the bidirectional | ||
| // stream. Additionally, when the inbound side is HTTP/2 | ||
| // (negotiated via ALPN), the hop-by-hop Connection/Upgrade | ||
| // headers are stripped. We detect WebSocket requests via | ||
| // both the HTTP/1.1 Upgrade header and the HTTP/2 | ||
| // :protocol pseudo-header / Sec-WebSocket-Version signal. |
There was a problem hiding this comment.
Update this comment to match the current code.
It still describes HTTP/2 and Sec-WebSocket-Version detection, but isWebSocketUpgrade now only checks the HTTP/1.1 Connection/Upgrade headers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/stack/stack.go` around lines 312 - 319, The comment block describing
WebSocket upgrade handling is outdated; update it to reflect that
isWebSocketUpgrade now only checks HTTP/1.1 Connection/Upgrade headers (it no
longer inspects HTTP/2 or Sec-WebSocket-Version), and keep the note that Go's
httputil.ReverseProxy does not handle WebSocket hijacking/bridging so upgrades
require special handling; reference the isWebSocketUpgrade function and the
surrounding WebSocket upgrade handling code when making the comment edits.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/stack/stack.go (3)
304-322: Consider logging write errors during buffer flush.The
Writecalls on lines 310 and 320 ignore errors. While subsequentio.Copywould likely fail too, logging these errors would aid debugging if data is silently dropped during the handoff.♻️ Optional improvement
if backendBuf.Buffered() > 0 { buffered := make([]byte, backendBuf.Buffered()) n, _ := backendBuf.Read(buffered) if n > 0 { logger.Debug("WS proxy: flushing %d buffered backend bytes to client", n) - clientConn.Write(buffered[:n]) + if _, err := clientConn.Write(buffered[:n]); err != nil { + logger.Error("WS proxy: failed to flush backend buffer: %v", err) + } } } // Flush any buffered data from the hijacked reader to the backend. if clientBuf != nil && clientBuf.Reader.Buffered() > 0 { buffered := make([]byte, clientBuf.Reader.Buffered()) n, _ := clientBuf.Read(buffered) if n > 0 { logger.Debug("WS proxy: flushing %d buffered client bytes to backend", n) - backendConn.Write(buffered[:n]) + if _, err := backendConn.Write(buffered[:n]); err != nil { + logger.Error("WS proxy: failed to flush client buffer: %v", err) + } } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/stack/stack.go` around lines 304 - 322, The flush code ignores errors from clientConn.Write and backendConn.Write; update the two write calls in the block that reads from backendBuf and clientBuf (referencing backendBuf.Buffered, clientBuf.Reader.Buffered, backendConn, clientConn) to capture the (n, err) returned by Write and, if err != nil, log the error with logger.Error (include context like "WS proxy: flushing X buffered backend/client bytes to client/backend" and the err) so write failures are visible for debugging; do not change control flow besides logging (i.e., keep attempting the subsequent io.Copy).
232-240: UseDialContextto respect request context cancellation.
net.DialTimeoutignores the request context. If the client disconnects during the dial, the connection attempt continues unnecessarily. Using anet.DialerwithDialContextrespects context cancellation.♻️ Proposed fix
// Dial the local backend via raw TCP so we can relay the full // HTTP/1.1 upgrade handshake + subsequent WebSocket frames. backendAddr := fmt.Sprintf("127.0.0.1:%d", urls.LocalPort) - backendConn, err := net.DialTimeout("tcp", backendAddr, 5*time.Second) + dialer := &net.Dialer{Timeout: 5 * time.Second} + backendConn, err := dialer.DialContext(r.Context(), "tcp", backendAddr) if err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/stack/stack.go` around lines 232 - 240, Replace the call to net.DialTimeout with a context-aware dial using a net.Dialer and DialContext so the request context cancellation is respected: create a net.Dialer (setting Timeout if you want the same 5s behavior) and call dialer.DialContext(ctx, "tcp", backendAddr) rather than net.DialTimeout; use the existing request context (r.Context()) and propagate/handle the returned error the same way (logging via logger.Error and returning a Bad Gateway) while assigning the resulting connection to backendConn as before.
252-269: Consider closing response body explicitly in error path to satisfy linter.The
bodycloselinter flags this becausebackendResp.Bodyis not explicitly closed. WhilebackendConn.Close()implicitly releases resources, adding an explicit close in the non-101 path would silence the linter and make the intent clearer.♻️ Optional fix for linter compliance
if backendResp.StatusCode != http.StatusSwitchingProtocols { + backendResp.Body.Close() backendConn.Close() logger.Error("WS proxy: backend did not upgrade, got %d", backendResp.StatusCode) http.Error(w, "Bad Gateway", http.StatusBadGateway) return }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/stack/stack.go` around lines 252 - 269, The linter warns because backendResp.Body isn't explicitly closed when the backend does not return 101; update the non-101 branch in the WS proxy logic to call backendResp.Body.Close() before closing backendConn and returning (i.e., add backendResp.Body.Close() in the block that checks backendResp.StatusCode != http.StatusSwitchingProtocols), ensuring the response body is released and the linter is satisfied while keeping the existing logging and http.Error behavior for backendConn, backendResp and outReq.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/start.go`:
- Around line 74-77: The hardcoded remap that silently overrides user-specified
port (the localPort variable in cmd/start.go) needs visibility and tracking:
when you keep the workaround (if localPort == 3502 { localPort = 3500 }) add a
clear log message that reports the original requested port and the remapped port
(e.g., "requested port 3502 remapped to 3500 due to SDK workaround") using the
project's logger or fmt.Printf so users see it, and create a tracking issue in
the repo referencing the FIXME/SDK bug and link that issue ID in the FIXME
comment so the workaround can be removed later.
---
Nitpick comments:
In `@internal/stack/stack.go`:
- Around line 304-322: The flush code ignores errors from clientConn.Write and
backendConn.Write; update the two write calls in the block that reads from
backendBuf and clientBuf (referencing backendBuf.Buffered,
clientBuf.Reader.Buffered, backendConn, clientConn) to capture the (n, err)
returned by Write and, if err != nil, log the error with logger.Error (include
context like "WS proxy: flushing X buffered backend/client bytes to
client/backend" and the err) so write failures are visible for debugging; do not
change control flow besides logging (i.e., keep attempting the subsequent
io.Copy).
- Around line 232-240: Replace the call to net.DialTimeout with a context-aware
dial using a net.Dialer and DialContext so the request context cancellation is
respected: create a net.Dialer (setting Timeout if you want the same 5s
behavior) and call dialer.DialContext(ctx, "tcp", backendAddr) rather than
net.DialTimeout; use the existing request context (r.Context()) and
propagate/handle the returned error the same way (logging via logger.Error and
returning a Bad Gateway) while assigning the resulting connection to backendConn
as before.
- Around line 252-269: The linter warns because backendResp.Body isn't
explicitly closed when the backend does not return 101; update the non-101
branch in the WS proxy logic to call backendResp.Body.Close() before closing
backendConn and returning (i.e., add backendResp.Body.Close() in the block that
checks backendResp.StatusCode != http.StatusSwitchingProtocols), ensuring the
response body is released and the linter is satisfied while keeping the existing
logging and http.Error behavior for backendConn, backendResp and outReq.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 490f741e-6407-404b-af15-0e4a7be72a0f
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (3)
cmd/start.gogo.modinternal/stack/stack.go
📜 Review details
🧰 Additional context used
🪛 golangci-lint (2.11.4)
internal/stack/stack.go
[error] 255-255: response body must be closed
(bodyclose)
[error] 245-245: Error return value of backendConn.Close is not checked
(errcheck)
[error] 257-257: Error return value of backendConn.Close is not checked
(errcheck)
[error] 265-265: Error return value of backendConn.Close is not checked
(errcheck)
[error] 291-291: Error return value of clientConn.Close is not checked
(errcheck)
[error] 310-310: Error return value of clientConn.Write is not checked
(errcheck)
[error] 320-320: Error return value of backendConn.Write is not checked
(errcheck)
[error] 328-328: Error return value of clientConn.Close is not checked
(errcheck)
[error] 235-235: net.DialTimeout must not be called. use (*net.Dialer).DialContext with (*net.Dialer).Timeout
(noctx)
[error] 298-298: QF1008: could remove embedded field "Writer" from selector
(staticcheck)
🔇 Additional comments (5)
internal/stack/stack.go (4)
362-369: Comment is inconsistent with actual implementation.The comment states WebSocket detection uses "both the HTTP/1.1 Upgrade header and the HTTP/2 :protocol pseudo-header / Sec-WebSocket-Version signal," but
isWebSocketUpgrade(lines 621-631) only checks the HTTP/1.1Connection: Upgrade+Upgrade: websocketheaders. Update the comment to match the implementation.📝 Suggested comment update
- // WebSocket upgrade requests need special handling. - // Go's httputil.ReverseProxy does NOT relay WebSocket - // upgrades — it cannot hijack and bridge the bidirectional - // stream. Additionally, when the inbound side is HTTP/2 - // (negotiated via ALPN), the hop-by-hop Connection/Upgrade - // headers are stripped. We detect WebSocket requests via - // both the HTTP/1.1 Upgrade header and the HTTP/2 - // :protocol pseudo-header / Sec-WebSocket-Version signal. + // WebSocket upgrade requests need special handling. + // Go's httputil.ReverseProxy does NOT relay WebSocket + // upgrades — it cannot hijack and bridge the bidirectional + // stream. Since TLS is configured with HTTP/1.1 only (no + // h2 in NextProtos), we detect upgrades via the standard + // Connection: Upgrade + Upgrade: websocket headers.
111-117: LGTM!The HTTP/1.1-only ALPN configuration is correct for WebSocket support. The comment clearly explains why HTTP/2 must be excluded (hop-by-hop header stripping and no Hijacker support).
577-596: LGTM!The conservative connection pool settings (
MaxGravityPeers: 1,StreamsPerGravity: 1,UseMultiConnect: false) are appropriate for DevMode local development, simplifying the connection model while maintaining reliability.
616-631: LGTM!The
isWebSocketUpgradeimplementation correctly detects WebSocket upgrades per RFC 6455. The comma-separated token parsing for theConnectionheader handles multi-value cases like"keep-alive, Upgrade"properly.go.mod (1)
3-68: LGTM!Dependency updates look reasonable. The Go toolchain bump to 1.26.2 and library upgrades (OpenTelemetry, gRPC, go-common) align with the WebSocket proxy feature additions in this PR.
| // FIXME: undo this once we fix the SDK | ||
| if localPort == 3502 { | ||
| localPort = 3500 | ||
| } |
There was a problem hiding this comment.
Track this workaround in an issue to ensure it's addressed.
This hardcoded port remapping silently overrides user configuration when --port 3502 is passed. While the FIXME comment is good, consider:
- Adding a log message so users know their port was remapped
- Creating a tracking issue linked to the SDK fix mentioned in the FIXME
Would you like me to open an issue to track this technical debt, or add logging to make the remapping visible to users?
📝 Optional: Add visibility for the remapping
// FIXME: undo this once we fix the SDK
if localPort == 3502 {
+ logger.Warn("remapping port 3502 → 3500 (SDK workaround)")
localPort = 3500
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // FIXME: undo this once we fix the SDK | |
| if localPort == 3502 { | |
| localPort = 3500 | |
| } | |
| // FIXME: undo this once we fix the SDK | |
| if localPort == 3502 { | |
| logger.Warn("remapping port 3502 → 3500 (SDK workaround)") | |
| localPort = 3500 | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/start.go` around lines 74 - 77, The hardcoded remap that silently
overrides user-specified port (the localPort variable in cmd/start.go) needs
visibility and tracking: when you keep the workaround (if localPort == 3502 {
localPort = 3500 }) add a clear log message that reports the original requested
port and the remapped port (e.g., "requested port 3502 remapped to 3500 due to
SDK workaround") using the project's logger or fmt.Printf so users see it, and
create a tracking issue in the repo referencing the FIXME/SDK bug and link that
issue ID in the FIXME comment so the workaround can be removed later.
Go's httputil.ReverseProxy does not relay WebSocket upgrades — it silently drops the 101 Switching Protocols handshake. Additionally, the TLS config preferred HTTP/2 via ALPN (NextProtos: h2, http/1.1), which strips the hop-by-hop Connection/Upgrade headers and does not support Hijack() on its ResponseWriter.
Together these two issues cause WebSocket connections through the DevMode public tunnel to open successfully (TLS + TCP handshake) but never deliver any frames to the local Hub server.
Changes:
Fixes agentuity/sdk#1382
Summary by CodeRabbit
New Features
Refactor
Behavior
Chores