podman cleanup isn't complete - copy of email from Pat Tovo #54

Closed

lastephey opened this issue Apr 20, 2023 · 0 comments

@lastephey (Collaborator)
Copy of 3/31/23 email from Pat Tovo

Hi everyone. I'm following up on the issue that I found with the fuse-overlayfs-wrap script. I was using a VM for testing when I found the issue; when I tried the same thing on a real system, I didn't reproduce it. But it is a timing issue and could still occur.

The issue occurs when running a podman container using a squashed image, where the umount call in the fuse-overlayfs-wrap script fails. There is no indication to the user that the umount failed. I only noticed it by seeing squashfuse processes that don't exit. Since I was running many "podman run" commands in a loop, I saw a large number of these processes and figured out that they were still there because the umount failed.
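A rough way to reproduce and spot the leak (the image name and loop count here are just illustrative, not the exact test) is to run the container in a loop and count the leftover squashfuse processes after each run:

for i in $(seq 10); do
    # Run a throwaway container, then list any squashfuse processes left behind
    podman run --userns=keep-id --rm hello hello.sh
    echo "squashfuse processes after run $i:"
    pgrep -a squashfuse || echo "(none)"
done

Here is what I saw on my system: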

podtst@standard:~> !pod
podman run --userns=keep-id --rm hello hello.sh
Hello World!
podtst@standard:~> ps -ef |grep squash
podtst 9331 1 0 20:40 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9399 9113 0 20:41 pts/3 00:00:00 grep squash
podtst@standard:~> !podman
podman run --userns=keep-id --rm hello hello.sh
Hello World!
podtst@standard:~> !ps
ps -ef |grep squash
podtst 9331 1 0 20:40 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9425 1 0 20:41 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9480 9113 0 20:41 pts/3 00:00:00 grep squash

I turned on logging in /usr/bin/fuse-overlayfs-wrap, but there were no errors for the umount:

standard:/tmp # cat fow-0.log
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX

So, I changed /usr/bin/fuse-overlayfs-wrap to log the output from umount, from:

    umount $3

to:

    umount -v $3 >> $LOG 2>&1

podtst@standard:~> !pod
podman run --userns=keep-id --rm hello hello.sh
Hello World!

standard:/tmp # cat fow-0.log
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
umount: /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX: target is busy.
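The "target is busy" failure means something still has the squashfuse mountpoint open at that instant, presumably podman's own cleanup that hasn't finished yet. If you want to see what is holding it when this happens, something like the following should show the offending processes (this assumes the fuser utility from psmisc is available; the path is the mountpoint from the log above):

    fuser -vm /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX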

I changed fuse-overlayfs-wrap to check the return code from umount and retry after a sleep. This fixes the issue (it only needs to sleep once, for 1 second, on my VM). This is the code that I used:


# Number of umount attempts and delay between them (overridable via environment)
UMOUNT_WAIT_RETRIES=${UMOUNT_WAIT_RETRIES:-"5"}
UMOUNT_WAIT_DELAY=${UMOUNT_WAIT_DELAY:-"1"}

if [ "$1" = "wait" ] ; then
    # Wait until podman deletes the container's /etc, then try to unmount the squashfuse mount
    inotifywait -e delete $2/etc
    for i in $(seq $UMOUNT_WAIT_RETRIES); do
        umount -v $3 >> $LOG 2>&1
        if [ $? -ne 0 ]; then
            # Sleep to let podman clean up, then retry the umount
            echo "Retry umount after sleep $UMOUNT_WAIT_DELAY second(s)" >> $LOG
            sleep $UMOUNT_WAIT_DELAY
        else
            break
        fi
    done
    exit
fi
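With the defaults above, the worst case adds about UMOUNT_WAIT_RETRIES x UMOUNT_WAIT_DELAY = 5 seconds before the script gives up, and both values can be raised without editing the script, since they only fall back to 5 and 1 when unset. Assuming the environment of the podman invocation reaches the wrapper, something like this would do it:

    UMOUNT_WAIT_RETRIES=10 UMOUNT_WAIT_DELAY=2 podman run --userns=keep-id --rm hello hello.sh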


If the umount never happens and you manually kill the squashfuse processes, you see errors like:

podman run --userns=keep-id --rm hello hello.sh
Error: error creating container storage: creating read-write layer with ID "f9be212358e76bde48c1115fcf54b53f354b79d061544ee5cc81eda6b18e43cb": Stat /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff: transport endpoint is not connected

and you need to:

rmdir /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff
mkdir /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff

or do:

podman system reset
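If many layers end up in that state, a rough cleanup loop like the one below can save retyping the rmdir/mkdir pairs; the storage root is taken from the paths above and is just an assumption about the local setup, and it should only be run when nothing else is using that storage:

STORAGE=/lus/snx11010/ptovo/podtst/storage
for diff in "$STORAGE"/overlay/*/diff; do
    # A dead FUSE mount makes stat fail with "transport endpoint is not connected";
    # recreate those diff directories the same way as above
    if ! stat "$diff" > /dev/null 2>&1; then
        echo "recreating $diff"
        rmdir "$diff" && mkdir "$diff"
    fi
done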

Just wanted to make you aware of this in case you run into it.

Regards,
Pat Tovo
patricia.tovo@hpe.com
