
Reduce per-connection memory overhead #414

Merged
9seconds merged 2 commits into 9seconds:master from dolonet:optimize-per-connection-overhead
Mar 29, 2026

dolonet (Contributor) commented Mar 28, 2026

Summary

  • sync.Pool for relay buffers: var buf [16379]byte on the goroutine stack forces the Go runtime to grow the stack to 32 KB (stacks grow by doubling, so the next power of two). Pooled buffers keep stacks small (~2-4 KB).
  • sync.Pool for the doppelganger buffer: the same issue with [16384]byte in conn.start().
  • context.AfterFunc replaces idle goroutines in proxy.ServeConn and relay.Relay that existed only to block on <-ctx.Done() and then close connections.

Benchmark (VPS, 1 vCPU, 961 MB RAM, real mtg binary, domain fronting to an echo server)

| Connections | Before RSS | After RSS | Δ RSS | Before MB/s | After MB/s | Before OK | After OK |
|---|---|---|---|---|---|---|---|
| 1000 × 1 MB | 64.6 MB | 70.8 MB | +9.5% | 101 | 112 | 1000/1000 | 1000/1000 |
| 2000 × 512 KB | 80.4 MB | 71.4 MB | -11% | 87 | 67 | 2000/2000 | 2000/2000 |
| 3000 × 256 KB | 72.2 MB | 49.6 MB | -31% | 10 ⚠️ | 63 | 2754/3000 | 3000/3000 |
| 500 × 100 KB | 17.2 MB | 15.5 MB | -10% | 27 | 27 | 500/500 | 500/500 |

At moderate concurrency (~1000 connections) the difference is within noise. At 3000 connections the unmodified binary breaks down: 246 of 3000 connections time out and throughput drops to 10 MB/s. With this change there are zero failures, throughput is 63 MB/s, and RSS is 31% lower.

Test plan

  • go test ./... passes
  • Load tested on VPS with 500 / 1000 / 2000 / 3000 concurrent connections
  • Verified pooled-buffer and AfterFunc behavior under load

Closes #412

dolonet added 2 commits March 28, 2026 13:24
- Use sync.Pool for relay buffers instead of stack-allocated arrays.
  A [16379]byte on the goroutine stack forces Go to grow the stack to 32KB
  (next power of two). Pooled buffers keep goroutine stacks small.

- Same fix for doppelganger write buffer ([16384]byte in conn.start).

- Replace idle goroutines with context.AfterFunc in proxy.ServeConn
  and relay.Relay. These goroutines existed only to wait on ctx.Done()
  and close connections. AfterFunc achieves the same without allocating
  a goroutine until the context is actually cancelled.

Net effect: at 3000 concurrent connections on a 1-vCPU/961MB VPS,
the unmodified binary drops 246 connections and falls to 10 MB/s.
With these changes: zero failures, 63 MB/s, 31% lower RSS.

Closes 9seconds#412
9seconds merged commit 6b51de8 into 9seconds:master on Mar 29, 2026
5 checks passed

Development

Successfully merging this pull request may close these issues.

Optimizing per-connection memory consumption (~40%)
