Add container cleanup to handle SIGTERM/SIGINT scenarios #234
Merged
mheidbrink
approved these changes
Mar 31, 2026
What does this PR do?
Adds signal-based container cleanup to prevent orphaned Docker containers when buildrunner is terminated externally (e.g. SIGTERM from a CI system aborting a superseded build).
Today, buildrunner relies on Python finally blocks to clean up containers. Python installs no handler for SIGTERM by default, so the operating system's default action terminates the process immediately: finally blocks never execute, and containers running /usr/sbin/init (systemd) or /run.sh (sshd) are left running indefinitely. Over time these accumulate and consume disk, memory, and network resources on build agents.
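The skipped finally block can be reproduced outside buildrunner. The following is a minimal sketch, not buildrunner code: the `CHILD` script and the `finally_runs_on_sigterm` helper are hypothetical names made up for illustration, and it assumes a POSIX system. A child process enters a try/finally, the parent sends it SIGTERM, and the parent checks whether the finally block printed anything.

```python
import signal
import subprocess
import sys

# Hypothetical child script: announces readiness, then sleeps inside try/finally.
CHILD = r'''
import time
try:
    print("ready", flush=True)
    time.sleep(30)
finally:
    print("cleanup", flush=True)
'''


def finally_runs_on_sigterm() -> bool:
    """Return True if the child's finally block ran after SIGTERM."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the child is inside the try block before signalling it.
    assert proc.stdout.readline().strip() == "ready"
    proc.send_signal(signal.SIGTERM)
    remaining_output, _ = proc.communicate(timeout=10)
    return "cleanup" in remaining_output


if __name__ == "__main__":
    print("finally ran:", finally_runs_on_sigterm())
```

With no SIGTERM handler installed in the child, the cleanup line is not expected to appear, which is the gap this PR closes.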
This change:
What issues does this PR fix or reference?
Fixes orphaned Docker containers, overlay mounts, and virtual network interfaces accumulating on build agents when builds are aborted or superseded.
Previous Behavior
When buildrunner received SIGTERM (e.g. CI system aborting a build), the process terminated immediately without cleaning up Docker containers. Containers running systemd or sshd continued running indefinitely. Each aborted build left behind 2+ containers, associated overlay mounts, and veth interfaces.
New Behavior
When buildrunner receives SIGTERM or SIGINT, a signal handler force-removes all Docker containers started by this buildrunner process, then exits. The handler is signal-safe: no locks, no logging (uses stderr), and uses os._exit() to avoid racing with finally blocks. On normal completion, containers are unregistered as they're cleaned up through the existing code paths, so the signal handler has nothing to do.
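The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the names `register_container`, `unregister_container`, `cleanup_containers`, and `install_handlers` are hypothetical, removal is done by shelling out to `docker rm -f` rather than through whatever Docker client buildrunner actually uses, and the `128 + signum` exit status is a common shell convention, not something the PR states.

```python
import os
import signal
import subprocess

# Hypothetical module-level registry of container IDs started by this process.
_containers: set[str] = set()


def register_container(container_id: str) -> None:
    """Record a container so the signal handler can remove it later."""
    _containers.add(container_id)


def unregister_container(container_id: str) -> None:
    """Forget a container that the normal code path already cleaned up."""
    _containers.discard(container_id)


def _docker_rm(container_id: str) -> None:
    # Assumption: force-remove via the docker CLI; output is discarded.
    subprocess.run(
        ["docker", "rm", "-f", container_id],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )


def cleanup_containers(remove=_docker_rm) -> list[str]:
    """Force-remove every registered container; return the IDs handled."""
    removed = []
    for container_id in sorted(_containers):
        try:
            remove(container_id)
        except Exception:
            continue  # best effort: keep going with the remaining containers
        _containers.discard(container_id)
        removed.append(container_id)
    return removed


def _handle_signal(signum, frame) -> None:
    # Write straight to file descriptor 2 instead of using the logging module.
    message = f"signal {signum}: removing {len(_containers)} container(s)\n"
    os.write(2, message.encode())
    cleanup_containers()
    # os._exit skips finally blocks, so they cannot race with this cleanup.
    os._exit(128 + signum)


def install_handlers() -> None:
    """Route SIGTERM and SIGINT to the cleanup handler."""
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _handle_signal)
```

In this sketch a caller would run `install_handlers()` once at startup, call `register_container(...)` after each container start, and call `unregister_container(...)` from the existing cleanup path, so that on normal completion the registry is empty and the handler has nothing to remove.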
Merge requirements satisfied?
version in pyproject.toml (if appropriate).