Skip to content

Defer TCP socket connect() until DO output gate clears.#6533

Open
vaibhavSharma-cf wants to merge 1 commit intomainfrom
vaibhav/socket-output-gate
Open

Defer TCP socket connect() until DO output gate clears.#6533
vaibhavSharma-cf wants to merge 1 commit intomainfrom
vaibhav/socket-output-gate

Conversation

@vaibhavSharma-cf
Copy link
Copy Markdown

Adds TCP_SOCKET_CONNECT_OUTPUT_GATE autogate that defers httpClient->connect() until IoContext::waitForOutputLocks() resolves. Socket is returned immediately backed by kj::newPromisedStream so reads/writes queue until the real connection arrives. Connect task lives in ConnectionData alongside tlsStarter to prevent
TlsStarterCallback UAF on Socket GC.

@vaibhavSharma-cf vaibhavSharma-cf requested review from a team as code owners April 8, 2026 19:21
@vaibhavSharma-cf vaibhavSharma-cf self-assigned this Apr 8, 2026
@vaibhavSharma-cf
Copy link
Copy Markdown
Author

/bonk review

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

  • [HIGH] Use-after-free: addressStr captured as non-owning kj::StringPtr in coroutine, then the owning kj::String is moved away — the coroutine will dereference freed memory when it resumes after waitForOutputLocks().
  • [MEDIUM] ConnectionData member ordering: connectTask is destroyed after tlsStarter — while currently safe due to coroutine cancellation semantics, this contradicts the stated intent and is fragile against future changes. Moving connectTask after tlsStarter would make the safety explicit.
  • [LOW] Stale comment on connectImpl — the original TODO was removed but the replacement comment is misleading about where the gate wait actually occurs.

This review was generated by an AI assistant and may contain errors. Please verify all findings independently.

Comment on lines 189 to 192
kj::Maybe<kj::Promise<void>> connectTask;
// tlsStarter must be declared after connectionStream so that it is destroyed first,
// since it holds a reference that keeps the connection alive.
kj::Own<kj::TlsStarterCallback> tlsStarter;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Destruction order concern: C++ destroys members in reverse declaration order, so the current order is:

  1. tlsStarter destroyed first (declared last)
  2. connectTask destroyed second
  3. watchForDisconnectTask destroyed third
  4. connectionStream destroyed last

This means tlsStarter is freed before connectTask is cancelled. The PR description says the purpose of storing connectTask here is to "cancel the connect task before tlsStarter is destroyed" — but the current declaration order achieves the opposite.

In practice this is currently safe because cancelling a KJ coroutine just destroys the coroutine frame without running more code, so the dangling httpConnectSettings.tlsStarter pointer inside the frame is never dereferenced during destruction. However, this is a fragile invariant — if anyone later adds cleanup logic or attaches destructors that touch tlsStarter, this becomes a real UAF.

Suggestion: move connectTask after tlsStarter to match the stated intent:

Suggested change
kj::Maybe<kj::Promise<void>> connectTask;
// tlsStarter must be declared after connectionStream so that it is destroyed first,
// since it holds a reference that keeps the connection alive.
kj::Own<kj::TlsStarterCallback> tlsStarter;
kj::Maybe<kj::Promise<void>> connectTask;
// tlsStarter must be declared after connectionStream so that it is destroyed first,
// since it holds a reference that keeps the connection alive.
kj::Own<kj::TlsStarterCallback> tlsStarter;

should become:

    kj::Own<kj::TlsStarterCallback> tlsStarter;
    // connectTask must be declared after tlsStarter so that it is destroyed first,
    // since the coroutine frame holds a raw pointer to tlsStarter via httpConnectSettings.
    kj::Maybe<kj::Promise<void>> connectTask;

Note: this would also require reordering the constructor initializer list accordingly.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Apr 8, 2026

Review posted on PR #6533 with 3 inline comments. Here's a summary of the findings:

[HIGH] Use-after-free on addressStr (sockets.c++:299): The doConnect coroutine captures addressStr as a non-owning kj::StringPtr, but the owning kj::String is moved into setupSocket() on line 323 after the coroutine is created. When the coroutine resumes after co_await ioContext.waitForOutputLocks(), it dereferences a dangling pointer. Fix: have the coroutine take ownership of the string (change parameter to kj::String).

[MEDIUM] ConnectionData member destruction order (sockets.h:189-192): The declared order puts connectTask before tlsStarter, meaning tlsStarter is destroyed first (reverse order). This contradicts the PR description's stated intent of cancelling the connect task before tlsStarter is destroyed. While currently safe due to coroutine cancellation semantics, the ordering should be swapped to make the safety guarantee explicit and resilient to future changes.

[LOW] Misleading function name (sockets.c++:354): The comment on connectImpl explains the output gate is now handled inside connectImplNoOutputLock, but the function name implies it does not handle output locks.

github run

name = "main",
worker = (
modules = [
(name = "worker", esModule = embed "socket-output-gate-test.js"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also get a no-autogate variant of this test, just to show the behavior change when the autogate is not enabled?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed this test case as it can be the case that putCompleted is set to true immediately after storage put, making this test flaky. Instead I have added test case specific to connectImplNoOutputLock with blocking/unblocking output gate.

Copy link
Copy Markdown
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for doing this? Why do we need to prevent network outputs while storage writes are pending?

It also doesn't seem like that is what is happening in this PR. It is actually blocking socket connections completely, shouldn't only writes be blocked instead? Focusing on blocking writes may also simplify the implementation, since connect itself is synchronous, so the code to block it is quite involved and requires these fulfillers (which I don't think are ideal).

@vaibhavSharma-cf vaibhavSharma-cf force-pushed the vaibhav/socket-output-gate branch from afc70fa to 2093ffb Compare April 9, 2026 15:08
Adds TCP_SOCKET_CONNECT_OUTPUT_GATE autogate that defers httpClient->connect()
until IoContext::waitForOutputLocks() resolves. Socket is returned immediately
backed by kj::newPromisedStream so reads/writes queue until the real connection
arrives.
@vaibhavSharma-cf vaibhavSharma-cf force-pushed the vaibhav/socket-output-gate branch from 2093ffb to 828423e Compare April 9, 2026 15:10
@kentonv
Copy link
Copy Markdown
Member

kentonv commented Apr 9, 2026

What is the rationale for doing this? Why do we need to prevent network outputs while storage writes are pending?

That's how output gates work. We don't want applications to have to wait for writes to be confirmed durable before we allow them to continue executing. Instead, we make it impossible to observe that the app has continued executing until the write has become durable. If the write ends up failing, then we prevent the rest of the world from ever knowing that the app kept running -- so from the PoV of the rest of the world, the app stopped exactly where the failed write was performed. This is a huge optimization because it allows an app to perform several writes in rapid succession with only one round trip to durable storage (SRS followers).

https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three/

shouldn't only writes be blocked instead?

Anything that can be observed by the outside world needs to be blocked. That means we have to block both writes and connection initiation -- the remote end can observe that it has received a connection, and in theory you could imagine a protocol where merely connecting to a remote service serves to notify it of something, thus having side effects.

return result;

if (util::Autogate::isEnabled(util::AutogateKey::TCP_SOCKET_CONNECT_OUTPUT_GATE)) {
// Deferred-connect path: return a Socket backed by a promised stream immediately, deferring
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can make this a lot simpler. Use newPromisedWorkerInterface() to wrap the underlying output channel that connect() is being called on, then fulfill the promise when the output gate resolves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants