Skip to content

fix(proxy/test): eliminate startup race in TestChangeResponseHeader#2443

Open
ValentaTomas wants to merge 2 commits intomainfrom
fix/proxy-test-startup-race
Open

fix(proxy/test): eliminate startup race in TestChangeResponseHeader#2443
ValentaTomas wants to merge 2 commits intomainfrom
fix/proxy-test-startup-race

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented Apr 18, 2026

TestChangeResponseHeader was intermittently failing on ARM64 CI (example). The internal and masked HTTP servers were started with ListenAndServe in goroutines, so the test could fire requests through the proxy before those servers had bound their listeners. The retry loop only covered ECONNREFUSED on the outer request, so the race surfaced as expected: 200.

Pre-bind listeners with Listen(":0") and Serve(listener) for both backends; maskedHost is derived from the listener address. The proxy keeps its fixed-port ListenAndServe (its constructor takes a port); the existing ECONNREFUSED retry covers that and is tightened from 10×100ms to 50×20ms.

No behaviour change to the proxy or the test assertions.

The test started internal/masked HTTP servers via http.Server.ListenAndServe
in goroutines without waiting for the listener to come up, then fired a
request through the proxy. If the proxy connected to the internal server
before its goroutine had finished binding to the port, the proxy would
return a non-2xx error and the retry loop (which only retries on
ECONNREFUSED of the *outer* request to the proxy) would not retry it,
producing intermittent CI failures on slower runners.

It also used hardcoded ports 30090/30091/30092 which can collide with
anything else on the box.

Switch the internal and masked servers to the same Listen(":0") + Serve
pattern the rest of this file already uses, and derive maskedHost from the
listener address. The proxy itself still uses ListenAndServe with a fixed
port (its constructor takes a port number), so the existing ECONNREFUSED
retry on the outer request is preserved.

No behavioural change to the proxy or to what the test asserts.
@cursor
Copy link
Copy Markdown

cursor bot commented Apr 18, 2026

PR Summary

Low Risk
Low risk since this only changes test setup logic to use pre-bound listeners and a tighter retry loop; production proxy behavior is untouched. Main risk is test flakiness if the new Serve(listener) goroutines don’t shut down cleanly under failure paths.

Overview
Makes TestChangeResponseHeader deterministic under parallel/CI runs by pre-binding ephemeral listeners for the internal, masked, and proxy servers and running them via Serve(listener) (with explicit cleanup), rather than relying on ListenAndServe and hardcoded ports; the request host/X-Forwarded-Host expectations and the connection-refused retry cadence are updated accordingly to match the new setup.

Reviewed by Cursor Bugbot for commit a5c8686. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment on lines +912 to 923
require.NoError(t, err)

// The proxy created via New() uses ListenAndServe with a fixed port. Close
// the placeholder listener so the proxy can bind, accepting that there's
// a tiny TOCTOU window — the request retry below covers it.
require.NoError(t, proxyListener.Close())

client := &http.Client{}

proxy := New(proxyPort, 1, time.Second, func(_ *http.Request) (*pool.Destination, error) {
return &pool.Destination{
Url: internalURL,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The PR fixes the startup race for internalListener and maskedListener by using Serve(listener), but leaves a residual TOCTOU window for the proxy port: it pre-binds proxyListener, closes it, then calls proxy.ListenAndServe() which tries to re-bind the same port — a window where another process can steal it. The fix is trivial: drop proxyListener.Close() and call proxy.Serve(proxyListener) instead, the same pattern already used by newTestProxyWithConnLimit() in the same file.

Extended reasoning...

What the bug is and how it manifests

TestChangeResponseHeader pre-binds proxyListener on an ephemeral port solely to discover a free port number, then immediately releases it with require.NoError(t, proxyListener.Close()) (proxy_test.go:917) before calling proxy.ListenAndServe(t.Context()). ListenAndServe internally calls lisCfg.Listen(ctx, "tcp", p.Addr) (proxy.go:97-105) to re-bind to that same port. Between the Close() and the re-bind there is a genuine TOCTOU window during which any other OS process or parallel test can steal the ephemeral port.

The specific code path that triggers it

  1. proxyListener.Close() releases the port to the OS (proxy_test.go:917)
  2. [TOCTOU window — another goroutine or process can call bind() on the same port]
  3. proxy.ListenAndServe(t.Context()) calls lisCfg.Listen("tcp", ":proxyPort") — may return EADDRINUSE

Why existing code does not prevent it

The PR comment itself acknowledges this: "the existing ECONNREFUSED retry on the outer request covers that one race window." However, the retry loop (proxy_test.go lines ~976-997) only retries when errors.Is(err, syscall.ECONNREFUSED). A bind failure in the proxy goroutine produces a non-http.ErrServerClosed error, causing the goroutine's assert.ErrorIs(t, err, http.ErrServerClosed) assertion to fail immediately, and the proxy never starts. The main test thread then exhausts all 50 ECONNREFUSED retries on a port that may be bound by a port-stealer (returning connection refused or a wrong response), and fails at require.NotNil(t, rsp).

What the impact would be

This is a test-only flaw and port-stealing on loopback is low-probability in isolation, but this PR's stated purpose is to eliminate flaky startup races. On a heavily loaded CI runner with many parallel test processes the probability rises. The PR closes two such races (internal and masked servers) while inadvertently leaving this one open.

How to fix it

Drop require.NoError(t, proxyListener.Close()) and call proxy.Serve(proxyListener) instead of proxy.ListenAndServe(t.Context()). Proxy.Serve(l net.Listener) already exists at proxy.go:107 and is already used by newTestProxyWithConnLimit() in the same test file (proxy_test.go:172): go func() { proxy.Serve(l) }(). No changes to proxy.go are required.

Step-by-step proof

  1. Test runs on a loaded CI runner; another parallel test's ephemeral-port lookup coincidentally receives the same port number just released by proxyListener.Close().
  2. That other test calls bind(:proxyPort) — succeeds, because the port is temporarily free.
  3. proxy.ListenAndServe(t.Context()) calls lisCfg.Listen("tcp", ":proxyPort") — returns EADDRINUSE.
  4. The goroutine at proxy_test.go:933 receives a non-http.ErrServerClosed error; assert.ErrorIs fires a test failure.
  5. The proxy never accepts connections; the main test's client receives ECONNREFUSED (or a response from the wrong server) on every retry.
  6. After 50x20ms = 1s the loop exits; rsp is still nil; require.NotNil(t, rsp) fails.
  7. With the fix (proxy.Serve(proxyListener)), the port is held continuously from the moment it is bound — the TOCTOU window is eliminated entirely.

Comment on lines 974 to 977
req, err := http.NewRequestWithContext(t.Context(), http.MethodGet, proxyURL.String(), nil)
require.NoError(t, err)
req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort))
req.Header.Set("e2b-testing", "test123")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 The req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort)) call on line 976 is dead code — Go's net/http client explicitly excludes Header["Host"] from outgoing requests (via reqWriteExcludeHeader in net/http/request.go), so the proxy never sees "localhost" as the Host. This is a pre-existing issue that the PR touched by reordering it after require.NoError. The test still passes but only exercises the URL-derived host (127.0.0.1:<proxyPort>), not the "localhost" host the author appears to have intended; to fix, replace req.Header.Set("Host", ...) with req.Host = fmt.Sprintf("localhost:%d", proxyPort) and update the X-Forwarded-Host assertion to match.

Extended reasoning...

What the bug is

In TestChangeResponseHeader, line 976 calls req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort)). Go's net/http client maintains a map called reqWriteExcludeHeader (in net/http/request.go, line 99) that lists headers the client must never forward from req.Header — and "Host" is one of them. Instead, the actual Host header written on the wire is taken from req.Host (if non-empty) or req.URL.Host. Because only req.URL is set (to http://127.0.0.1:<proxyPort>), the Host on the wire is always 127.0.0.1:<proxyPort>, regardless of what req.Header["Host"] contains.

The specific code path

  1. client.Do(req) calls req.write(), which calls r.Header.writeSubset(w, reqWriteExcludeHeader, ...) — this skips the "Host" key entirely.
  2. The proxy receives the request with Host: 127.0.0.1:<proxyPort>, stores it as r.In.Host.
  3. pool/client.go line 114 sets X-Forwarded-Host from r.In.Host (when MaskRequestHost is set).
  4. The assertion on line 1014 checks fmt.Sprintf("127.0.0.1:%d", proxyPort) — which matches, but only because it equals the URL host, not the "localhost" value the dead Header.Set was trying to inject.

Why existing code doesn't prevent it

Nothing in the test or the proxy validates that the Header.Set call had any effect. The assertion uses the URL-derived IP address directly, so the test silently validates a different behavior than the one the author intended. There is no compile-time or runtime error when req.Header["Host"] is set and then silently ignored.

Impact

The test passes and its assertions are internally consistent, but it does not exercise the proxy's Host-forwarding behavior with a localhost hostname. Code reviewers reading the test are misled into thinking it tests the "localhost host" path; in reality it only tests the 127.0.0.1 path.

How to fix it

Replace:

req.Header.Set("Host", fmt.Sprintf("localhost:%d", proxyPort))

with:

req.Host = fmt.Sprintf("localhost:%d", proxyPort)

and update the X-Forwarded-Host assertion from fmt.Sprintf("127.0.0.1:%d", proxyPort) to fmt.Sprintf("localhost:%d", proxyPort) to actually test the intended behavior. Alternatively, remove the dead line entirely and rely on the URL host.

Step-by-step proof

  1. req.URL is set to http://127.0.0.1:<proxyPort>, so req.URL.Host = "127.0.0.1:<proxyPort>".
  2. req.Host is never set, so it remains empty string ("").
  3. req.Header.Set("Host", "localhost:<proxyPort>") writes into req.Header["Host"].
  4. client.Do(req) calls r.write()r.Header.writeSubset(w, reqWriteExcludeHeader). Because reqWriteExcludeHeader["Host"] = true, the "Host" entry in req.Header is skipped.
  5. The actual Host line written to the wire (net/http/request.go ~line 610): r.URL.Host"127.0.0.1:<proxyPort>".
  6. The proxy sees Host: 127.0.0.1:<proxyPort>, copies it to X-Forwarded-Host.
  7. The test assertion assert.Equal(t, fmt.Sprintf("127.0.0.1:%d", proxyPort), data.Headers.Get("X-Forwarded-Host")) passes — but never tests the localhost path.

Keep the proxy listener bound through startup and use req.Host so the test exercises the forwarded localhost host it intends to verify.
@ValentaTomas ValentaTomas assigned levb and unassigned djeebus Apr 18, 2026
@ValentaTomas ValentaTomas requested review from levb and removed request for dobrac and jakubno April 18, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants