
Add container cleanup to handle SIGTERM/SIGINT scenarios #234

Merged
jacobtruman merged 1 commit into adobe:main from jacobtruman:main
Mar 31, 2026

Conversation

@jacobtruman
Member

What does this PR do?

Adds signal-based container cleanup to prevent orphaned Docker containers when buildrunner is terminated externally (e.g. SIGTERM from a CI system aborting a superseded build).

Today, buildrunner relies on Python finally blocks to clean up containers. Python installs no handler for SIGTERM by default, so when the process receives it, the operating system's default action terminates the interpreter immediately. finally blocks and atexit hooks never execute, and containers running /usr/sbin/init (systemd) or /run.sh (sshd) are left running indefinitely. Over time these accumulate and consume disk, memory, and network resources on build agents.
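The behavior described above can be reproduced without Docker. The following is a minimal sketch, assuming a POSIX system; the `CHILD` script and the `run_and_terminate` helper are hypothetical names used only for this illustration and are not part of buildrunner.

```python
import signal
import subprocess
import sys

# Child program: enters a try block, reports readiness, then sleeps.
# With no SIGTERM handler installed, the finally block should never run.
CHILD = r"""
import time
try:
    print("ready", flush=True)
    time.sleep(30)
finally:
    print("cleanup ran", flush=True)
"""


def run_and_terminate():
    """Start the child, send SIGTERM once it is inside the try block,
    and return (returncode, remaining stdout)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the child is inside the try block before signalling it.
    assert proc.stdout.readline().strip() == "ready"
    proc.send_signal(signal.SIGTERM)
    remaining, _ = proc.communicate(timeout=10)
    return proc.returncode, remaining
```

On POSIX, `run_and_terminate()` returns a negative return code (the child was killed by the signal) and the remaining output does not contain "cleanup ran", which is the gap this PR closes.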

This change:

  • Adds a global container registry that tracks every container buildrunner starts
  • Installs SIGTERM/SIGINT handlers that force-remove all registered containers before exiting
  • Registers an atexit hook as a safety net for normal exits where cleanup might be missed
  • Covers all 6 container creation paths: build containers, service containers, SSH agent, Docker daemon proxy, source containers, and the multiplatform registry container

What issues does this PR fix or reference?

Fixes orphaned Docker containers, overlay mounts, and virtual network interfaces accumulating on build agents when builds are aborted or superseded.

Previous Behavior

When buildrunner received SIGTERM (e.g. CI system aborting a build), the process terminated immediately without cleaning up Docker containers. Containers running systemd or sshd continued running indefinitely. Each aborted build left behind 2+ containers, associated overlay mounts, and veth interfaces.

New Behavior

When buildrunner receives SIGTERM or SIGINT, a signal handler force-removes all Docker containers started by this buildrunner process, then exits. The handler is deliberately minimal so that it is safe to run from a signal context: it takes no locks, does not use the logging module (it writes directly to stderr), and exits with os._exit() to avoid racing with finally blocks. On normal completion, containers are unregistered as they're cleaned up through the existing code paths, so the signal handler has nothing to do.

Merge requirements satisfied?

  • I have updated the documentation or no documentation changes are required.
  • I have added tests to cover my changes.
  • I have updated the base version in pyproject.toml (if appropriate).

@jacobtruman jacobtruman force-pushed the main branch 5 times, most recently from 946f3b2 to d596be3 Compare March 31, 2026 03:33
@jacobtruman jacobtruman merged commit 78657f4 into adobe:main Mar 31, 2026
11 checks passed


2 participants