fix(daemon): set SO_LINGER 0 on accepted sockets to avoid CLOSE_WAIT leaks (partial #32)#87

Merged: zackees merged 1 commit into main from fix/32-so-linger-accepted-sockets on Apr 18, 2026

zackees (Member) commented Apr 18, 2026

Summary

  • Set SO_LINGER with a zero timeout (l_onoff=1, l_linger=0) on the daemon's listening socket before listen(2), so accepted client sockets inherit the zero linger and send an immediate RST on close instead of going through the normal FIN / CLOSE_WAIT / TIME_WAIT sequence.
  • After a hard-kill of the daemon, this prevents the kernel from leaking dangling CLOSE_WAIT state on client sockets that outlives the daemon itself and would otherwise block a fresh instance from re-binding the port.
  • Minimal (18-line) change inside the existing bind_listener_with_retry path.
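
For readers unfamiliar with a bind-with-retry path, here is a minimal sketch of what such a helper could look like. It is not the daemon's actual bind_listener_with_retry: the name bind_with_retry, its parameters, and the retry policy are all hypothetical, and it uses only the standard library.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not the daemon's real code): try to bind `addr`
/// up to `attempts` times, sleeping `delay` after each "address in use"
/// failure. Any other error is returned immediately.
fn bind_with_retry(
    addr: SocketAddr,
    attempts: u32,
    delay: Duration,
) -> io::Result<TcpListener> {
    let mut last_err = None;
    for attempt in 1..=attempts {
        match TcpListener::bind(addr) {
            Ok(listener) => return Ok(listener),
            Err(e) if e.kind() == io::ErrorKind::AddrInUse => {
                last_err = Some(e);
                // Only sleep if another attempt is coming.
                if attempt < attempts {
                    thread::sleep(delay);
                }
            }
            Err(e) => return Err(e),
        }
    }
    // Reached when every attempt hit AddrInUse, or when `attempts` was 0.
    Err(last_err.unwrap_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "attempts must be at least 1")
    }))
}
```

A retry loop like this only papers over a port that is briefly unavailable; the zero-linger change described above is aimed at leaving less stale connection state behind in the first place.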

Why listener-inheritance (and not a custom accept loop)

axum::serve in axum 0.7 owns the accept loop inside its IntoFuture impl and hands each accepted TcpStream straight to hyper-util with no per-connection hook. Replacing that loop means reimplementing graceful shutdown + serve_connection_with_upgrades plumbing.

SO_LINGER is inherited by accept(2) from the listening socket on all three target platforms (Linux sock_copy, macOS/BSD sonewconn, Windows AFD.sys). Setting it once on the listener covers every accepted connection and keeps the change surgical. Axum 0.8's Listener trait would allow a cleaner per-connection hook; we can revisit when we upgrade.
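
The inheritance claim can be checked with a small standalone sketch. This is not the code from this PR: it is Linux-only, declares setsockopt/getsockopt directly to stay dependency-free, and hard-codes the Linux values of SOL_SOCKET and SO_LINGER for common architectures (x86, x86_64, ARM, AArch64). All helper names are hypothetical. Because std's TcpListener::bind binds and listens in one call, the option is set right after listen rather than before it, but still before any connection arrives. A real daemon would more likely use a crate such as socket2, whose Socket::set_linger wraps the same option portably.

```rust
use std::io;
use std::mem;
use std::net::{TcpListener, TcpStream};
use std::os::fd::AsRawFd;
use std::os::raw::{c_int, c_void};

// Assumption: Linux values on x86/x86_64/ARM/AArch64. Other platforms differ.
const SOL_SOCKET: c_int = 1;
const SO_LINGER: c_int = 13;

/// Mirrors C's `struct linger`.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Linger {
    l_onoff: c_int,
    l_linger: c_int,
}

unsafe extern "C" {
    fn setsockopt(fd: c_int, level: c_int, name: c_int, value: *const c_void, len: u32) -> c_int;
    fn getsockopt(fd: c_int, level: c_int, name: c_int, value: *mut c_void, len: *mut u32) -> c_int;
}

/// Enables SO_LINGER with a zero timeout (l_onoff=1, l_linger=0).
fn set_zero_linger(sock: &impl AsRawFd) -> io::Result<()> {
    let linger = Linger { l_onoff: 1, l_linger: 0 };
    let rc = unsafe {
        setsockopt(
            sock.as_raw_fd(),
            SOL_SOCKET,
            SO_LINGER,
            &linger as *const Linger as *const c_void,
            mem::size_of::<Linger>() as u32,
        )
    };
    if rc == 0 { Ok(()) } else { Err(io::Error::last_os_error()) }
}

/// Reads the current SO_LINGER setting of a socket.
fn get_linger(sock: &impl AsRawFd) -> io::Result<Linger> {
    let mut linger = Linger::default();
    let mut len = mem::size_of::<Linger>() as u32;
    let rc = unsafe {
        getsockopt(
            sock.as_raw_fd(),
            SOL_SOCKET,
            SO_LINGER,
            &mut linger as *mut Linger as *mut c_void,
            &mut len,
        )
    };
    if rc == 0 { Ok(linger) } else { Err(io::Error::last_os_error()) }
}

/// Sets zero linger on a loopback listener only, accepts one connection,
/// and returns the linger setting observed on the accepted socket.
fn accepted_linger() -> io::Result<Linger> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    set_zero_linger(&listener)?;
    let _client = TcpStream::connect(listener.local_addr()?)?;
    let (accepted, _peer) = listener.accept()?;
    get_linger(&accepted)
}
```

If inheritance works as described, accepted_linger() reports l_onoff = 1 and l_linger = 0 even though the option was never set on the accepted socket itself.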

Test plan

  • uv run cargo test -p fbuild-daemon --test port_recovery -- --ignored passes (hard-kills a daemon holding an open TCP connection, then asserts a second daemon re-binds the port cleanly).
  • uv run cargo check --workspace --all-targets clean.
  • uv run cargo clippy --workspace --all-targets -- -D warnings clean.
  • uv run cargo fmt --all -- --check clean.

Remaining on #32

  • Process containment via the running-process crate (Windows Job Object kill-on-close + POSIX process group / Linux parent-death). Still open; intentionally deferred to a separate PR since it touches every subprocess spawn site.
  • SO_LINGER 0 on accepted sockets (this PR)
  • SetConsoleCtrlHandler for Windows CTRL_CLOSE / LOGOFF / SHUTDOWN (already merged in PR #61, "feat(daemon): Windows console ctrl handler for graceful shutdown (#18)")

Closes: no (partial progress only). Tracking stays on #32 until process containment lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

🤖 Generated with Claude Code

…leaks (partial #32)

Set SO_LINGER=0 on the daemon's listening socket before `listen(2)` so
that accepted client sockets inherit a zero linger and force an
immediate RST on close instead of going through FIN / CLOSE_WAIT /
TIME_WAIT. After a hard-kill of the daemon, this prevents the kernel
from leaking dangling CLOSE_WAIT state on client sockets that outlives
the daemon itself and would otherwise block a fresh instance from
re-binding the port.

SO_LINGER is inherited from the listener by accept(2) on Linux, macOS,
and Windows (AFD.sys), so setting it once on the listener covers every
accepted connection without needing to hook axum 0.7's internal
accept loop (axum 0.7 hands accept off to hyper-util inside `serve`
with no per-connection extension point).

Verified with the pre-existing regression test at
crates/fbuild-daemon/tests/port_recovery.rs (run with `--ignored`),
which hard-kills a daemon holding an open TCP connection and asserts
a second daemon can re-bind the port cleanly.

Remaining on #32: process containment via the `running-process` crate
(Windows Job Object + POSIX process group parent-death). The
SetConsoleCtrlHandler piece landed earlier in PR #61.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai (Bot) commented Apr 18, 2026

Warning

Rate limit exceeded

@zackees has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 56 minutes and 13 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 56 minutes and 13 seconds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 28fb1f65-bcd2-473e-9a90-af9973245292

📥 Commits

Reviewing files that changed from the base of the PR and between 41982b2 and 5bf790b.

📒 Files selected for processing (1)
  • crates/fbuild-daemon/src/main.rs

zackees merged commit ba25a5d into main on Apr 18, 2026
74 of 76 checks passed
1 participant