Copy of 3/31/23 email from Pat Tovo
Hi everyone. I'm following up on the issue that I found with the fuse-overlayfs-wrap
script. I found the issue while testing in a VM; when I tried the same thing on a real
system, I couldn't reproduce it. But it is a timing issue and could still occur.
The issue occurs when running a podman container using a squashed image: the umount call
in the fuse-overlayfs-wrap script fails, and there is no indication to the user that it
failed. I only noticed it because squashfuse processes don't exit. Since I was running
many "podman run" commands in a loop, I saw a large number of these processes and figured
out that they were still there because the umount had failed.
podtst@standard:~> !pod
podman run --userns=keep-id --rm hello hello.sh
Hello World!
podtst@standard:~> ps -ef |grep squash
podtst 9331 1 0 20:40 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9399 9113 0 20:41 pts/3 00:00:00 grep squash
podtst@standard:~> !podman
podman run --userns=keep-id --rm hello hello.sh
Hello World!
podtst@standard:~> !ps
ps -ef |grep squash
podtst 9331 1 0 20:40 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9425 1 0 20:41 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9480 9113 0 20:41 pts/3 00:00:00 grep squash
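Since the leak only shows up intermittently, a loop like the following makes the buildup easy to spot (a minimal sketch of the kind of testing I was doing; the "hello" image and hello.sh are the same test container used in the transcripts above):

for i in $(seq 1 50); do
    podman run --userns=keep-id --rm hello hello.sh > /dev/null
    # Count leftover squashfuse processes; the count should stay at 0,
    # but it climbs when the umount in fuse-overlayfs-wrap fails.
    echo "run $i: $(pgrep -c -f /usr/bin/squashfuse) squashfuse process(es) still running"
done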
I turned on logging in /usr/bin/fuse-overlayfs-wrap but there are no errors for the umount:
standard:/tmp # cat fow-0.log
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
So, I changed /usr/bin/fuse-overlayfs-wrap to log the output from umount:
from:
umount $3
to:
umount -v $3 >> $LOG 2>&1
podtst@standard:~> !pod
podman run --userns=keep-id --rm hello hello.sh
Hello World!
standard:/tmp # cat fow-0.log
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
umount: /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX: target is busy.
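"target is busy" means some process still holds the mountpoint open, i.e. podman hasn't finished tearing the container down yet. I didn't capture this in the log, but fuser can show the holder while the race is happening (a diagnostic sketch, assuming psmisc's fuser is available on the system):

# Show which processes are still using the mounted squash filesystem
fuser -vm /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX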
I changed fuse-overlayfs-wrap to check the return code from umount and retry after a
sleep. This fixes the issue (on my VM it only needed to sleep once, for 1 second). This
is the code that I used:
UMOUNT_WAIT_RETRIES=${UMOUNT_WAIT_RETRIES:-"5"}
UMOUNT_WAIT_DELAY=${UMOUNT_WAIT_DELAY:-"1"}
if [ "$1" = "wait" ] ; then
    inotifywait -e delete $2/etc
    for i in $(seq $UMOUNT_WAIT_RETRIES); do
        umount -v $3 >> $LOG 2>&1
        if [ $? -ne 0 ]; then
            # Sleep to let podman clean up
            echo "Retry umount after sleep $UMOUNT_WAIT_DELAY second(s)" >> $LOG
            sleep $UMOUNT_WAIT_DELAY
        else
            break
        fi
    done
    exit
fi
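Both knobs default as shown but can be overridden from the environment if five retries at one second isn't enough on a slower filesystem. This assumes the variables are present in the environment fuse-overlayfs-wrap is invoked in, which I have not verified, for example:

export UMOUNT_WAIT_RETRIES=10   # try the umount up to 10 times
export UMOUNT_WAIT_DELAY=2      # sleep 2 seconds between attempts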
If the umount never happens and you manually kill the squashfuse processes, you see errors like:
podman run --userns=keep-id --rm hello hello.sh
Error: error creating container storage: creating read-write layer with ID "f9be212358e76bde48c1115fcf54b53f354b79d061544ee5cc81eda6b18e43cb": Stat /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff: transport endpoint is not connected
and you need to:
rmdir /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff
mkdir /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff
Or do:
podman system reset
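If several containers are affected, the per-layer rmdir/mkdir can be scripted. This is only a sketch, assuming the same storage root as in the transcripts above; it detects dead layers by the stat failure from the error message, and podman system reset remains the safer option:

STORAGE=/lus/snx11010/ptovo/podtst/storage
for d in "$STORAGE"/overlay/*/diff; do
    # A dead FUSE endpoint makes stat fail ("transport endpoint is not
    # connected"); recreate the directory as described above.
    if ! stat "$d" > /dev/null 2>&1; then
        rmdir "$d" && mkdir "$d"
    fi
done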
Just wanted to make you aware of this in case you run into it.
Regards,
Pat Tovo
patricia.tovo@hpe.com