Skip to content

Have httpd session own the SendspinConnection#57

Merged
kahrendt merged 3 commits into
mainfrom
fix-connection-safety
May 18, 2026
Merged

Have httpd session own the SendspinConnection#57
kahrendt merged 3 commits into
mainfrom
fix-connection-safety

Conversation

@kahrendt
Copy link
Copy Markdown
Contributor

Currently, if a connection is dropped (due to WiFi issues or changing servers) and there are still messages waiting to be sent in the httpd queue, it will hard crash the device due a use after free issue. This PR inverts the ownership model so that the httpd session is the primary owner of the SendspinConnection. This should avoid this type of crash in the future, as the httpd task will only free it after it knows its safe.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the connection ownership model for ESP httpd WebSocket sessions so that queued httpd work items can’t outlive (and dereference) freed SendspinServerConnection objects after a disconnect/reconnect scenario.

Changes:

  • Switch SendspinWsServer new-connection callbacks from std::unique_ptr to std::shared_ptr (host + ESP) to support shared ownership.
  • On ESP, pin each accepted SendspinServerConnection to the httpd session via httpd_sess_set_ctx, and make websocket/worker paths look the connection up via httpd_sess_get_ctx.
  • Simplify ConnectionManager teardown by removing the “dying connection” holding slot and relying on the session-pinned refcount on ESP.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/host/ws_server.h Updates new-connection callback type to shared_ptr.
src/host/ws_server.cpp Creates server connections with make_shared to match new callback ownership.
src/esp/ws_server.h Documents session-pinned ownership model; makes find-connection setter a no-op stub on ESP.
src/esp/ws_server.cpp Pins connection shared_ptr into httpd session ctx; routes handler via session ctx lookup.
src/esp/server_connection.h Updates async time worker contract documentation for session lookup.
src/esp/server_connection.cpp Converts async send workers to resolve connection via session ctx at execution time.
src/connection_manager.h Changes server on-new-connection API to shared_ptr; removes dying-connection fields.
src/connection_manager.cpp Adapts to shared_ptr server connections and removes dying-connection release logic.
docs/internals.md Updates internal documentation to match new ownership / teardown behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/esp/ws_server.cpp Outdated
Comment thread src/esp/server_connection.cpp Outdated
@kahrendt kahrendt merged commit 6098563 into main May 18, 2026
4 checks passed
@kahrendt kahrendt deleted the fix-connection-safety branch May 18, 2026 14:02
kahrendt added a commit that referenced this pull request May 22, 2026
This is a follow-up to #57. That PR stopped the use-after-free crash by making the httpd session the owner of the `SendspinConnection`, but a few related races remain in how queued messages get sent when a connection is reused or replaced. This fixes three of them.

## Changes

**Bind ESP async sends to the connection that queued them.** The httpd async send workers identified their target by `(server, sockfd)` and re-resolved the connection when the worker ran. If the connection closed before the worker ran and the socket fd was recycled by a new connection, the worker could send a frame to the wrong (or torn-down) peer. The work items now hold a `weak_ptr<SendspinServerConnection>` and `lock()` it in the worker. It resolves to the exact connection that queued the work, or null if it's gone, in which case the worker no-ops cleanly. The arg structs hold non-trivial members now, so they're placement-new constructed and explicitly destroyed before `platform_free`.

**Gate sends on the client/hello.** Adds an `allow_before_hello` parameter to `send_text_message`. Async transports drop any frame queued before `client_hello_sent_` is set, preserving the "hello is always first" invariant even if a stale frame gets queued during teardown. Exactly two messages opt out: the hello itself (otherwise it would gate its own send and deadlock the handshake) and goodbye (a rejected connection is told to leave before it ever sends a hello). Synchronous host/IXWebSocket transports ignore the flag.

**Track hello retries per connection.** The retry timer was a single slot, so when a handoff candidate arrived mid-handshake, arming its retry overwrote the current connection's pending hello and the first connection never got retried. It's now one entry per managed connection.

Also updates `docs/internals.md`. The "Server connection ownership" and "Graceful Disconnect" sections still described the old `(server, sockfd)` worker lookup along with a few small comment updates.

### Thread-safety fixes from review

- `client_hello_sent_` / `server_hello_received_` are written on network threads and read from the main loop via `is_handshake_complete()`, so they're now `std::atomic<bool>` (matching the existing `message_dispatch_enabled_`). This was a pre-existing race, but the new pre-hello gate reads the same flag, so it's worth resolving here.
- The destructor cleared `hello_retries_` outside `conn_ptr_mutex_`; moved it inside, matching every other access.

## Testing

Unit tests aren't applicable because this touches the ESP code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants