[NotBug]: Container network becomes unreachable after system restarts #1321

@adiled

Description

I have done the following

  • I have searched the existing issues
  • If possible, I've reproduced the issue using the 'main' branch of this project

Steps to reproduce

  1. Create a network and start a container with --publish port forwarding:
    container network create mynet --subnet 192.168.100.0/24
    container create --name mynginx --net mynet --publish 127.0.0.1:8000:8000 docker.io/library/nginx:alpine
    container start mynginx
    
  2. Verify the container is reachable from the host via the gateway IP (192.168.100.1) and via the published port (127.0.0.1:8000). Both work.
  3. Restart the Mac (full system restart, not just sleep/wake).
  4. After login, verify the container daemon is running (container system status).
  5. Attempt to reach the container via the gateway IP or published port.

Current behavior

After a system restart, the container reports as "running" via container list, but:

  • The gateway IP (e.g. 192.168.100.1) is no longer reachable from inside the container. Requests from the container to host services through the gateway hang indefinitely.
  • Published ports on the host still accept TCP connections (the NIO socket forwarder binds successfully), but requests hang because the forwarder cannot route traffic to the container's vmnet IP.
  • container stop hangs indefinitely (related to #576, "[Bug]: Unable to stop a container when it's frozen"). The graceful shutdown path sends SIGTERM, waits, sends SIGKILL, then calls lc.stop(), but communication with the VM appears to be broken, so none of these steps completes.
  • container exec also hangs.
  • The only recovery is to kill the container's underlying process via kill -9 <pid>, then container rm, then recreate.
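
The manual recovery in the last bullet can be sketched as a small shell helper. This is only an illustration of the kill, remove, recreate sequence, not a supported procedure. The PID has to be supplied by the caller, since nothing above describes how to look it up. `recover_container` and the `CONTAINER_BIN` override are hypothetical names introduced here; the create options are copied from the reproduction steps.

```shell
#!/bin/sh
# Sketch of the manual recovery: kill -9 <pid>, container rm, recreate.
# CONTAINER_BIN is a hypothetical override so the CLI can be swapped or stubbed.
CONTAINER_BIN="${CONTAINER_BIN:-container}"

# recover_container <name> <pid>
# <pid> is the container's underlying process; finding it is left to the caller.
recover_container() {
    name="$1"
    pid="$2"
    if [ -z "$name" ] || [ -z "$pid" ]; then
        echo "usage: recover_container <name> <pid>" >&2
        return 2
    fi
    # Force-kill the wedged process; ignore the error if it is already gone.
    kill -9 "$pid" 2>/dev/null || true
    # Remove the stale container so the name can be reused.
    "$CONTAINER_BIN" rm "$name" || return 1
    # Recreate and start it with the options from the reproduction steps.
    "$CONTAINER_BIN" create --name "$name" --net mynet \
        --publish 127.0.0.1:8000:8000 docker.io/library/nginx:alpine || return 1
    "$CONTAINER_BIN" start "$name"
}
```

Example: `recover_container mynginx <pid>`, with `<pid>` replaced by the process ID of the hung container.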

This is consistently reproducible across system restarts. It does not require VPN changes (#1307), though that issue may share the same root cause (vmnet interface state not surviving system state changes).

Expected behavior

Containers should remain network-reachable after a system restart. Failing that, the runtime should detect the broken state and either recover automatically or transition the container to a stopped/error state, so that container stop and container rm work without hanging.

Environment

- OS: macOS 26.3.2 (Tahoe)
- Hardware: Apple Silicon (M-series)
- Container: container CLI version 0.10.0

Analysis

Looking at the source code, the vmnet network is created once during ReservedVmnetNetwork.start() via vmnet_network_create() and the resulting vmnet_network_ref is stored in a Mutex<State>. There is no mechanism to detect that the underlying vmnet interface has become invalid after a system restart, and no reconnection or re-creation logic.

The SandboxService.stop() path calls gracefulStopContainer(), which relies on communicating with the VM via the container agent. When the VM's network is in a broken state, lc.wait() and lc.stop() appear to block indefinitely, making the container unrecoverable through normal commands.

A potential approach could be to detect stale container/network state on daemon startup after a system restart and either re-establish vmnet interfaces or clean up containers that are in an unrecoverable state.
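
Until something like that exists in the daemon, the "detect stale state" idea can be approximated from outside with a bounded probe. The sketch below is not the runtime's logic. It assumes a `timeout` utility (for example from GNU coreutils) is installed and that the image provides a `true` command. `probe_container`, `CONTAINER_BIN`, and `PROBE_TIMEOUT` are hypothetical names used for illustration.

```shell
#!/bin/sh
# External liveness probe: run a trivial command in the container with a deadline.
# CONTAINER_BIN and PROBE_TIMEOUT are hypothetical knobs, not options of the CLI.
CONTAINER_BIN="${CONTAINER_BIN:-container}"
PROBE_TIMEOUT="${PROBE_TIMEOUT:-10}"

# probe_container <name>
# Prints "responsive", "unresponsive" (probe hit the deadline), or "error".
probe_container() {
    name="$1"
    rc=0
    # GNU timeout exits with status 124 when the command exceeds the deadline.
    timeout "$PROBE_TIMEOUT" "$CONTAINER_BIN" exec "$name" -- true \
        >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   echo "responsive" ;;
        124) echo "unresponsive" ;;
        *)   echo "error" ;;
    esac
}
```

A container that lists as running but for which `probe_container mynginx` prints `unresponsive` matches the state described in this report, and is a candidate for the manual kill-and-recreate recovery.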

Relevant log output

# Container appears running but is unreachable
$ container list
ID           IMAGE                              STATUS    NETWORKS
mynginx      docker.io/library/nginx:alpine     running   mynet

# Attempting to stop hangs (must be killed with timeout)
$ timeout 10 container stop mynginx
# (no output, killed by timeout)

# Gateway IP unreachable from container context
$ container exec mynginx -- wget -q -O - --timeout=5 http://192.168.100.1:9090/
# (hangs indefinitely)

Code of Conduct

  • I agree to follow this project's Code of Conduct
