Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: Prune dangling images on RESTART_CI_DOCKER_BEFORE_RUN #27777

Merged
merged 2 commits into from May 31, 2023

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented May 30, 2023

This should prevent the persistent workers from running out of disk space. Containers are already removed, but not images. This is required since CI images are built and cached.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 30, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK hebasto
Stale ACK stickies-v

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@DrahtBot DrahtBot added the Tests label May 30, 2023
@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

This should fix the out-of-space error, see https://cirrus-ci.com/task/6118746146734080?logs=ci#L4903

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

Looks like it is working, but not claiming a lot of space:

https://cirrus-ci.com/task/6656469207089152?logs=ci#L408

Total reclaimed space: 949.4MB

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

Not sure what is going on. Seems there are unreachable zombie containers in the overlay2 directory. Let's try podman instead.

@hebasto
Copy link
Member

hebasto commented May 30, 2023

Looks like it is working, but not claiming a lot of space:

cirrus-ci.com/task/6656469207089152?logs=ci#L408

Total reclaimed space: 949.4MB

What amount do you expect?

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

I already wiped all servers, but I was expecting that about 50GB of zombie stuff is cleared:

# du -sh /var/lib/docker/overlay2/ 
53G	/var/lib/docker/overlay2/

@hebasto
Copy link
Member

hebasto commented May 30, 2023

Not sure what is going on. Seems there are unreachable zombie containers in the overlay2 directory. Let's try podman instead.

Maybe add:

docker volume prune --force

?

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

I listed all images, volumes, containers, networks, and the log file(s) and the list was empty, apart from 3 small image layers.

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

(CI is green, but it will probably take 6 months to find out if podman is running out of storage space as well)

@hebasto
Copy link
Member

hebasto commented May 30, 2023

(CI is green, but it will probably take 6 months to find out if podman is running out of storage space as well)

Could explicit logging of disk space usage help in the meanwhile?

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

This is already done, see https://cirrus-ci.com/task/4965521926389760?logs=ci#L356

@hebasto
Copy link
Member

hebasto commented May 30, 2023

This is already done, see cirrus-ci.com/task/4965521926389760?logs=ci#L356

Right. I mean, no need to wait for 6 month. If numbers are the same after a week then this PR works as intended, no?

@maflcko
Copy link
Member Author

maflcko commented May 30, 2023

Ok, I'll check back a week after merge.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa12307

systemctl restart docker
podman container kill --all # Similar to "systemctl restart docker"
echo "Prune all dangling images"
docker image prune --force
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be prudent to use the --filter option here to only prune images created by the CI? This may be especially helpful to people running the CI locally/unsandboxed. We'd just need to add a --label "bitcoin-ci" argument to docker build.

Suggested change
docker image prune --force
docker image prune --force --filter label=bitcoin-ci

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do dangling image (layer)s have a label? Anyway, this isn't for local runs, only for the persistent runners, guarded via RESTART_CI_DOCKER_BEFORE_RUN. Happy to rename RESTART_CI_DOCKER_BEFORE_RUN to DANGER_RESTART_CI_DOCKER_BEFORE_RUN?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do dangling image (layer)s have a label?

Yeah they do, just tested it to confirm.

guarded via RESTART_CI_DOCKER_BEFORE_RUN

Fair enough, looks like this is unlikely to be hit locally. I don't think renaming to DANGER_RESTART_CI_DOCKER_BEFORE_RUN is an improvement, so happy to keep as is. Just thought adding a label and only applying batch operations to our own images is a good practice, e.g. if in the future we remove the RESTART_CI_DOCKER_BEFORE_RUN condition, given that it's almost no overhead. But up to you, it's no big deal and I don't have a strong opinion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, could make sense in a follow-up to apply the label to both places and then unguard the image prune from DANGER_RESTART_CI_DOCKER_BEFORE_RUN?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #27793

@@ -42,7 +42,9 @@ if [ -z "$DANGER_RUN_CI_ON_HOST" ]; then

if [ -n "${RESTART_CI_DOCKER_BEFORE_RUN}" ] ; then
echo "Restart docker before run to stop and clear all containers started with --rm"
systemctl restart docker
podman container kill --all # Similar to "systemctl restart docker"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this works? Even though the CLIs have a compatible syntax, I think they operate on their own set of resources. I'm pretty sure podman ps right before this line won't show anything, so I think no containers are killed after this line? At least, that's what I get when I try locally spinning up a container with docker run and then try to show/kill it with podman ps or podman container kill --all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already switched all machines to podman-docker, because it is not possible to install docker.io and podman-docker at the same time. So docker run is actually just an alias for podman run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can't comment on that. Since there is no mention of podman anywhere else in the codebase/documentation, I think this may make maintaining the CI more opaque, but perhaps that's an okay tradeoff.

Copy link
Member Author

@maflcko maflcko May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't expect anyone to use RESTART_CI_DOCKER_BEFORE_RUN beside the persistent workers, so it is an undocumented setting. Anyone using it is expected to infer the documentation from the two echo calls and two lines of code.

systemctl restart docker
podman container kill --all # Similar to "systemctl restart docker"
echo "Prune all dangling images"
docker image prune --force
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do dangling image (layer)s have a label?

Yeah they do, just tested it to confirm.

guarded via RESTART_CI_DOCKER_BEFORE_RUN

Fair enough, looks like this is unlikely to be hit locally. I don't think renaming to DANGER_RESTART_CI_DOCKER_BEFORE_RUN is an improvement, so happy to keep as is. Just thought adding a label and only applying batch operations to our own images is a good practice, e.g. if in the future we remove the RESTART_CI_DOCKER_BEFORE_RUN condition, given that it's almost no overhead. But up to you, it's no big deal and I don't have a strong opinion.

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa9c65a

I've not used podman and don't fully understand the need for fa12307, so I'm hoping that other people do and can comment on that so I don't need to dive into it. Otherwise, happy to revisit this later.

@@ -42,7 +42,9 @@ if [ -z "$DANGER_RUN_CI_ON_HOST" ]; then

if [ -n "${RESTART_CI_DOCKER_BEFORE_RUN}" ] ; then
echo "Restart docker before run to stop and clear all containers started with --rm"
systemctl restart docker
podman container kill --all # Similar to "systemctl restart docker"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I can't comment on that. Since there is no mention of podman anywhere else in the codebase/documentation, I think this may make maintaining the CI more opaque, but perhaps that's an okay tradeoff.

@DrahtBot DrahtBot requested a review from stickies-v May 30, 2023 17:35
@fanquake fanquake merged commit 08722f2 into bitcoin:master May 31, 2023
16 checks passed
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 31, 2023
…EFORE_RUN

fa12307 ci: Use podman for persistent workers (MarcoFalke)
fa9c65a ci: Prune dangling images on RESTART_CI_DOCKER_BEFORE_RUN (MarcoFalke)

Pull request description:

  This should prevent the persistent workers from running out of disk space. Containers are already removed, but not images. This is required since CI images are built and cached.

ACKs for top commit:
  hebasto:
    ACK fa12307

Tree-SHA512: 07c4faec57d659d1762e4e6d776c882ee48d4bac6ce6d438d56d9ab13277be3e39d6aa38816165a5a3e0938ac5d47674ee2921b6e115a4bb54e3e4910b34c4b6
@maflcko maflcko deleted the 2305-ci-prune-images-dangling- branch June 1, 2023 13:07
@@ -42,7 +42,9 @@ if [ -z "$DANGER_RUN_CI_ON_HOST" ]; then

if [ -n "${RESTART_CI_DOCKER_BEFORE_RUN}" ] ; then
echo "Restart docker before run to stop and clear all containers started with --rm"
systemctl restart docker
podman container kill --all # Similar to "systemctl restart docker"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this doesn't work some of the time:

++ echo 'Restart docker before run to stop and clear all containers started with --rm'
Restart docker before run to stop and clear all containers started with --rm
++ podman container kill --all
++ echo 'Prune all dangling images'
Prune all dangling images
++ docker image prune --force
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
+++ docker run --rm --interactive --detach --tty --mount type=bind,src=/tmp/cirrus-build,dst=/ro_base,readonly --mount type=volume,src=ci_native_qt5_ccache,dst=/tmp/ccache_dir --mount type=volume,src=ci_native_qt5_depends,dst=/tmp/cirrus-build/depends --mount type=volume,src=ci_native_qt5_previous_releases,dst=/tmp/cirrus-build/releases/x86_64-pc-linux-gnu -w /tmp/cirrus-build --env-file /tmp/env --name ci_native_qt5 ci_native_qt5
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
time="2023-06-05T06:45:57+02:00" level=warning msg="The input device is not a TTY. The --tty and --interactive flags might not work properly"
Error: error creating container storage: the container name "ci_native_qt5" is already in use by "1b176e38eff6bdc928f2e213260cb9c8abb9efaa5cd6b0cccd09033a1b707601". You have to remove that container to be able to reuse that name.: that name is already in use
++ CI_CONTAINER_ID=
Exit status: 125

fanquake added a commit to bitcoin-core/gui that referenced this pull request Jun 12, 2023
faaa627 ci: Use podman stop over podman kill (MarcoFalke)

Pull request description:

  This should avoid a race where the kill is not done when spinning up the new container. podman stop waits 10 seconds by default.

  Fixes bitcoin/bitcoin#27777 (comment)

ACKs for top commit:
  dimitaracev:
    ACK [faaa627](bitcoin/bitcoin@faaa627)

Tree-SHA512: d46a32429629dcfa711a2d0abe79100f5593d72f8e5eded7b505b0f270e28abfeba0a96bec43e9951a76a96bb23fe2c7092433e5e0c66510e3e3b6c3cb58f4db
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 12, 2023
faaa627 ci: Use podman stop over podman kill (MarcoFalke)

Pull request description:

  This should avoid a race where the kill is not done when spinning up the new container. podman stop waits 10 seconds by default.

  Fixes bitcoin#27777 (comment)

ACKs for top commit:
  dimitaracev:
    ACK [faaa627](bitcoin@faaa627)

Tree-SHA512: d46a32429629dcfa711a2d0abe79100f5593d72f8e5eded7b505b0f270e28abfeba0a96bec43e9951a76a96bb23fe2c7092433e5e0c66510e3e3b6c3cb58f4db
@fanquake fanquake mentioned this pull request Jun 15, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 15, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 15, 2023
fanquake added a commit that referenced this pull request Jun 16, 2023
6233049 ci: Switch to `amd64` container in "ARM" task (Hennadii Stepanov)
d845a3e test: add coverage for `/deploymentinfo` passing a blockhash (brunoerg)
72ead86 rest: bugfix, fix crash error when calling `/deploymentinfo` (brunoerg)
6f7a0ae ci: Use podman stop over podman kill (MarcoFalke)
de56daa ci: Use podman for persistent workers (MarcoFalke)
71f626e ci: Prune dangling images on RESTART_CI_DOCKER_BEFORE_RUN (MarcoFalke)

Pull request description:

  Currently backports:
  * #27777
  * #27844
  * #27853
  * #27886

  Effectively also backports: #27562.

ACKs for top commit:
  stickies-v:
    ACK 6233049

Tree-SHA512: d0f3d5c4cd0cf9792f3270078b49ad6661e28e7a883f91e5981e8cfe95d01c6e415762ee2e3a5fea17d198299989a8b5412961c2cc788cef975b2ee45d84550c
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Sep 26, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Sep 26, 2023
@fanquake
Copy link
Member

Backported for 24.x in #28535.

fanquake added a commit that referenced this pull request Oct 6, 2023
9077f21 depends: fix unusable memory_resource in macos qt build (fanquake)
dccacf0 build, macos: Fix `qt` package build with new Xcode 15 linker (Hennadii Stepanov)
4359649 ci: Switch to `amd64` container in "ARM" task (Hennadii Stepanov)
805f98b ci: Nuke Android APK task, Use credits for tsan (MarcoFalke)
cb5512d test: ensure old fee_estimate.dat not read on restart and flushed (ismaelsadeeq)
01f8ee4 tx fees, policy: read stale fee estimates with a regtest-only option (ismaelsadeeq)
1c98029 tx fees, policy: do not read estimates of old fee_estimates.dat (ismaelsadeeq)
77979e0 tx fees, policy: periodically flush fee estimates to fee_estimates.dat (ismaelsadeeq)
67b6d99 Do not use std::vector = {} to release memory (Pieter Wuille)
defdc15 ci: Use podman stop over podman kill (MarcoFalke)
7f1357d ci: Use podman for persistent workers (MarcoFalke)
0db69a3 ci: Prune dangling images on RESTART_CI_DOCKER_BEFORE_RUN (MarcoFalke)

Pull request description:

  Backports to the 24.x branch. Currently:
  * #27622
  * #27777
  * #27834
  * #27844
  * #27886
  * #28452
  * #28543
  * #28571

ACKs for top commit:
  stickies-v:
    ACK 9077f21

Tree-SHA512: abaafc9a048b67b494993134fd332457ea52695ec007b963c283f962ec40c3b6b3a7e98407481be55d3271a595088a0281cc84b79dad4f24d260381ea0153076
fanquake added a commit to bitcoin-core/gui that referenced this pull request Oct 15, 2023
…ng images selectively

e44c574 ci: always prune all dangling bitcoin-ci-test images (stickies-v)
ce16997 ci: add label to docker images (stickies-v)

Pull request description:

  Follow-up from bitcoin/bitcoin#27777 (comment).

  Labeling the docker images produced by the CI allows us/the user to apply batch operations to all images (including dangling ones) produced by the ci without affecting other, non-bitcoin-ci images. With labeling, we can safely always prune dangling bitcoin-ci-test images without checking for `RESTART_CI_DOCKER_BEFORE_RUN`, which we enable on our persistent runners.

ACKs for top commit:
  fanquake:
    utACK e44c574

Tree-SHA512: 1009fb1be78fbc80b5341ba92eac2991e77d050e1ab6048d1d9a65af73413a6be7afc1e1c764eb3f347f363af31245b93fdb38f6ac016d775aad4a0f36e4c98f
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Oct 21, 2023
…s selectively

e44c574 ci: always prune all dangling bitcoin-ci-test images (stickies-v)
ce16997 ci: add label to docker images (stickies-v)

Pull request description:

  Follow-up from bitcoin#27777 (comment).

  Labeling the docker images produced by the CI allows us/the user to apply batch operations to all images (including dangling ones) produced by the ci without affecting other, non-bitcoin-ci images. With labeling, we can safely always prune dangling bitcoin-ci-test images without checking for `RESTART_CI_DOCKER_BEFORE_RUN`, which we enable on our persistent runners.

ACKs for top commit:
  fanquake:
    utACK e44c574

Tree-SHA512: 1009fb1be78fbc80b5341ba92eac2991e77d050e1ab6048d1d9a65af73413a6be7afc1e1c764eb3f347f363af31245b93fdb38f6ac016d775aad4a0f36e4c98f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants