fix(proxy/test): eliminate startup race in TestChangeResponseHeader #2443
ValentaTomas wants to merge 2 commits into main
The test started internal/masked HTTP servers via http.Server.ListenAndServe
in goroutines without waiting for the listener to come up, then fired a
request through the proxy. If the proxy connected to the internal server
before its goroutine had finished binding to the port, the proxy would
return a non-2xx error and the retry loop (which only retries on
ECONNREFUSED of the *outer* request to the proxy) would not retry it,
producing intermittent CI failures on slower runners.
It also used hardcoded ports 30090/30091/30092 which can collide with
anything else on the box.
Switch the internal and masked servers to the same Listen(":0") + Serve
pattern the rest of this file already uses, and derive maskedHost from the
listener address. The proxy itself still uses ListenAndServe with a fixed
port (its constructor takes a port number), so the existing ECONNREFUSED
retry on the outer request is preserved.
No behavioural change to the proxy or to what the test asserts.
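As an illustration of the Listen(":0") + Serve pattern described above, here is a minimal, self-contained sketch. It is not the code from this PR: startBackend and fetch are hypothetical helper names, and the handler body is a placeholder. The point it tries to show is that the port is already bound when the helper returns, so a client can connect immediately, and the address comes from the listener rather than from a hardcoded port.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// startBackend (hypothetical helper) binds an ephemeral loopback port before
// returning, so callers never race the serving goroutine for the bind.
func startBackend(body string) (string, func(), error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, body)
	})}
	// Serve accepts on the already-bound listener; connections made before
	// the goroutine runs wait in the kernel's accept queue.
	go func() { _ = srv.Serve(l) }()
	// The address is derived from the listener, not from a fixed port.
	return l.Addr().String(), func() { _ = srv.Close() }, nil
}

// fetch performs a GET against addr and returns the response body.
func fetch(addr string) (string, error) {
	rsp, err := http.Get("http://" + addr)
	if err != nil {
		return "", err
	}
	defer rsp.Body.Close()
	b, err := io.ReadAll(rsp.Body)
	return string(b), err
}

func main() {
	addr, stop, err := startBackend("internal")
	if err != nil {
		panic(err)
	}
	defer stop()
	body, err := fetch(addr)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

With ListenAndServe in a goroutine, the same fetch could run before the bind and fail; here the bind happens synchronously in startBackend.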
PR Summary: Low Risk. Reviewed by Cursor Bugbot for commit a5c8686.
    require.NoError(t, err)

    // The proxy created via New() uses ListenAndServe with a fixed port. Close
    // the placeholder listener so the proxy can bind, accepting that there's
    // a tiny TOCTOU window; the request retry below covers it.
    require.NoError(t, proxyListener.Close())

    client := &http.Client{}

    proxy := New(proxyPort, 1, time.Second, func(_ *http.Request) (*pool.Destination, error) {
        return &pool.Destination{
            Url: internalURL,
🔴 The PR fixes the startup race for internalListener and maskedListener by using Serve(listener), but leaves a residual TOCTOU window for the proxy port: it pre-binds proxyListener, closes it, then calls proxy.ListenAndServe() which tries to re-bind the same port — a window where another process can steal it. The fix is trivial: drop proxyListener.Close() and call proxy.Serve(proxyListener) instead, the same pattern already used by newTestProxyWithConnLimit() in the same file.
Extended reasoning...
What the bug is and how it manifests
TestChangeResponseHeader pre-binds proxyListener on an ephemeral port solely to discover a free port number, then immediately releases it with require.NoError(t, proxyListener.Close()) (proxy_test.go:917) before calling proxy.ListenAndServe(t.Context()). ListenAndServe internally calls lisCfg.Listen(ctx, "tcp", p.Addr) (proxy.go:97-105) to re-bind to that same port. Between the Close() and the re-bind there is a genuine TOCTOU window during which any other OS process or parallel test can steal the ephemeral port.
The specific code path that triggers it
- proxyListener.Close() releases the port to the OS (proxy_test.go:917).
- [TOCTOU window: another goroutine or process can call bind() on the same port.]
- proxy.ListenAndServe(t.Context()) calls lisCfg.Listen("tcp", ":proxyPort"), which may return EADDRINUSE.
Why existing code does not prevent it
The PR comment itself acknowledges this: "the existing ECONNREFUSED retry on the outer request covers that one race window." However, the retry loop (proxy_test.go lines ~976-997) only retries when errors.Is(err, syscall.ECONNREFUSED). A bind failure in the proxy goroutine produces a non-http.ErrServerClosed error, causing the goroutine's assert.ErrorIs(t, err, http.ErrServerClosed) assertion to fail immediately, and the proxy never starts. The main test thread then exhausts all 50 ECONNREFUSED retries on a port that may be bound by a port-stealer (returning connection refused or a wrong response), and fails at require.NotNil(t, rsp).
What the impact would be
This is a test-only flaw and port-stealing on loopback is low-probability in isolation, but this PR's stated purpose is to eliminate flaky startup races. On a heavily loaded CI runner with many parallel test processes the probability rises. The PR closes two such races (internal and masked servers) while inadvertently leaving this one open.
How to fix it
Drop require.NoError(t, proxyListener.Close()) and call proxy.Serve(proxyListener) instead of proxy.ListenAndServe(t.Context()). Proxy.Serve(l net.Listener) already exists at proxy.go:107 and is already used by newTestProxyWithConnLimit() in the same test file (proxy_test.go:172): go func() { proxy.Serve(l) }(). No changes to proxy.go are required.
Step-by-step proof
- Test runs on a loaded CI runner; another parallel test's ephemeral-port lookup coincidentally receives the same port number just released by proxyListener.Close().
- That other test calls bind(:proxyPort), which succeeds because the port is temporarily free.
- proxy.ListenAndServe(t.Context()) calls lisCfg.Listen("tcp", ":proxyPort"), which returns EADDRINUSE.
- The goroutine at proxy_test.go:933 receives a non-http.ErrServerClosed error; assert.ErrorIs fires a test failure.
- The proxy never accepts connections; the main test's client receives ECONNREFUSED (or a response from the wrong server) on every retry.
- After 50x20ms = 1s the loop exits; rsp is still nil; require.NotNil(t, rsp) fails.
- With the fix (proxy.Serve(proxyListener)), the port is held continuously from the moment it is bound, so the TOCTOU window is eliminated entirely.
    req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, proxyURL.String(), nil)
    require.NoError(t, err)
    req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort))
    req.Header.Set("e2b-testing", "test123")
🟣 The req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort)) call on line 976 is dead code — Go's net/http client explicitly excludes Header["Host"] from outgoing requests (via reqWriteExcludeHeader in net/http/request.go), so the proxy never sees "localhost" as the Host. This is a pre-existing issue that the PR touched by reordering it after require.NoError. The test still passes but only exercises the URL-derived host (127.0.0.1:<proxyPort>), not the "localhost" host the author appears to have intended; to fix, replace req.Header.Set("Host", ...) with req.Host = fmt.Sprintf("localhost:%d", proxyPort) and update the X-Forwarded-Host assertion to match.
Extended reasoning...
What the bug is
In TestChangeResponseHeader, line 976 calls req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort)). Go's net/http client maintains a map called reqWriteExcludeHeader (in net/http/request.go, line 99) that lists headers the client must never forward from req.Header — and "Host" is one of them. Instead, the actual Host header written on the wire is taken from req.Host (if non-empty) or req.URL.Host. Because only req.URL is set (to http://127.0.0.1:<proxyPort>), the Host on the wire is always 127.0.0.1:<proxyPort>, regardless of what req.Header["Host"] contains.
The specific code path
- client.Do(req) calls req.write(), which calls r.Header.writeSubset(w, reqWriteExcludeHeader, ...); this skips the "Host" key entirely.
- The proxy receives the request with Host: 127.0.0.1:<proxyPort>, stores it as r.In.Host.
- pool/client.go line 114 sets X-Forwarded-Host from r.In.Host (when MaskRequestHost is set).
- The assertion on line 1014 checks fmt.Sprintf("127.0.0.1:%d", proxyPort), which matches, but only because it equals the URL host, not the "localhost" value the dead Header.Set was trying to inject.
Why existing code doesn't prevent it
Nothing in the test or the proxy validates that the Header.Set call had any effect. The assertion uses the URL-derived IP address directly, so the test silently validates a different behavior than the one the author intended. There is no compile-time or runtime error when req.Header["Host"] is set and then silently ignored.
Impact
The test passes and its assertions are internally consistent, but it does not exercise the proxy's Host-forwarding behavior with a localhost hostname. Code reviewers reading the test are misled into thinking it tests the "localhost host" path; in reality it only tests the 127.0.0.1 path.
How to fix it
Replace:
    req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort))
with:
    req.Host = fmt.Sprintf("localhost:%d", proxyPort)
and update the X-Forwarded-Host assertion from fmt.Sprintf("127.0.0.1:%d", proxyPort) to fmt.Sprintf("localhost:%d", proxyPort) to actually test the intended behavior. Alternatively, remove the dead line entirely and rely on the URL host.
Step-by-step proof
- req.URL is set to http://127.0.0.1:<proxyPort>, so req.URL.Host = "127.0.0.1:<proxyPort>".
- req.Host is never set, so it remains empty string ("").
- req.Header.Set("Host", "localhost:<proxyPort>") writes into req.Header["Host"].
- client.Do(req) calls r.write() → r.Header.writeSubset(w, reqWriteExcludeHeader). Because reqWriteExcludeHeader["Host"] = true, the "Host" entry in req.Header is skipped.
- The actual Host line written to the wire (net/http/request.go ~line 610): r.URL.Host → "127.0.0.1:<proxyPort>".
- The proxy sees Host: 127.0.0.1:<proxyPort>, copies it to X-Forwarded-Host.
- The test assertion assert.Equal(t, fmt.Sprintf("127.0.0.1:%d", proxyPort), data.Headers.Get("X-Forwarded-Host")) passes, but never tests the localhost path.
Keep the proxy listener bound through startup and use req.Host so the test exercises the forwarded localhost host it intends to verify.
TestChangeResponseHeader was intermittently failing on ARM64 CI (example). The internal and masked HTTP servers were started with ListenAndServe in goroutines, so the test could fire requests through the proxy before those servers had bound their listeners. The retry loop only covered ECONNREFUSED on the outer request, so the race surfaced as expected: 200.
Pre-bind listeners with Listen(":0") and Serve(listener) for both backends; maskedHost is derived from the listener address. The proxy keeps its fixed-port ListenAndServe (its constructor takes a port); the existing ECONNREFUSED retry covers that and is tightened from 10×100ms to 50×20ms.
No behaviour change to the proxy or the test assertions.