Copy of 3/31/23 email from Pat Tovo
Hi everyone. I'm following up on the issue that I found with the fuse-overlayfs-wrap
script. I found the issue while testing in a VM; when I tried the same thing on a real
system, I couldn't reproduce it. But it is a timing issue and could still occur.
The issue occurs when running a podman container using a squashed image: the umount call
in the fuse-overlayfs-wrap script fails, and there is no indication to the user that it
failed. I only noticed it because squashfuse processes don't exit. Since I was running
many "podman run" commands in a loop, I saw a large number of these processes and figured
out that they were still there because the umount had failed.
podtst@standard:~> !pod
podman run --userns=keep-id --rm hello hello.sh
Hello World!
podtst@standard:~> ps -ef |grep squash
podtst 9331 1 0 20:40 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9399 9113 0 20:41 pts/3 00:00:00 grep squash
podtst@standard:~> !podman
podman run --userns=keep-id --rm hello hello.sh
Hello World!
podtst@standard:~> !ps
ps -ef |grep squash
podtst 9331 1 0 20:40 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9425 1 0 20:41 ? 00:00:00 /usr/bin/squashfuse /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
podtst 9480 9113 0 20:41 pts/3 00:00:00 grep squash
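Since the leak only shows up intermittently, a loop like the following makes the buildup easy to spot (a minimal sketch of the kind of testing I was doing; the "hello" image and hello.sh are the same test container used in the transcripts above):

for i in $(seq 1 50); do
    podman run --userns=keep-id --rm hello hello.sh > /dev/null
    # Count leftover squashfuse processes; the count should stay at 0,
    # but it climbs when the umount in fuse-overlayfs-wrap fails.
    echo "run $i: $(pgrep -c -f /usr/bin/squashfuse) squashfuse process(es) still running"
done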
I turned on logging in /usr/bin/fuse-overlayfs-wrap but there are no errors for the umount:
standard:/tmp # cat fow-0.log
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
So, I changed /usr/bin/fuse-overlayfs-wrap to log the output from umount:
from:
umount $3
to:
umount -v $3 >> $LOG 2>&1
podtst@standard:~> !pod
podman run --userns=keep-id --rm hello hello.sh
Hello World!
standard:/tmp # cat fow-0.log
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
In fow /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX.squash
Mount suash /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX
umount: /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX: target is busy.
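"target is busy" means some process still holds the mountpoint open, i.e. podman hasn't finished tearing the container down yet. I didn't capture this in the log, but fuser can show the holder while the race is happening (a diagnostic sketch, assuming psmisc's fuser is available on the system):

# Show which processes are still using the mounted squash filesystem
fuser -vm /lus/snx11010/ptovo/podtst/storage/overlay/l/OLHH2JWX7AIUDH4ZKDZNQCYJQX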
I changed fuse-overlayfs-wrap to check the return code from umount and retry after a
sleep. This fixes the issue (on my VM it only needed to sleep once, for 1 second). This
is the code that I used:
UMOUNT_WAIT_RETRIES=${UMOUNT_WAIT_RETRIES:-"5"}
UMOUNT_WAIT_DELAY=${UMOUNT_WAIT_DELAY:-"1"}
if [ "$1" = "wait" ] ; then
    inotifywait -e delete $2/etc
    for i in $(seq $UMOUNT_WAIT_RETRIES); do
        umount -v $3 >> $LOG 2>&1
        if [ $? -ne 0 ]; then
            # Sleep to let podman clean up
            echo "Retry umount after sleep $UMOUNT_WAIT_DELAY second(s)" >> $LOG
            sleep $UMOUNT_WAIT_DELAY
        else
            break
        fi
    done
    exit
fi
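Both knobs default as shown but can be overridden from the environment if five retries at one second isn't enough on a slower filesystem. This assumes the variables are present in the environment fuse-overlayfs-wrap is invoked in, which I have not verified, for example:

export UMOUNT_WAIT_RETRIES=10   # try the umount up to 10 times
export UMOUNT_WAIT_DELAY=2      # sleep 2 seconds between attempts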
If the umount never happens and you manually kill the squashfuse processes, you see errors like:
podman run --userns=keep-id --rm hello hello.sh
Error: error creating container storage: creating read-write layer with ID "f9be212358e76bde48c1115fcf54b53f354b79d061544ee5cc81eda6b18e43cb": Stat /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff: transport endpoint is not connected
and you need to:
rmdir /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff
mkdir /lus/snx11010/ptovo/podtst/storage/overlay/5ef6a27b2f4f810b3cb99bb259632877897c6f3257c15db6532f18b18189941f/diff
Or do:
podman system reset
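If several containers are affected, the per-layer rmdir/mkdir can be scripted. This is only a sketch, assuming the same storage root as in the transcripts above; it detects dead layers by the stat failure from the error message, and podman system reset remains the safer option:

STORAGE=/lus/snx11010/ptovo/podtst/storage
for d in "$STORAGE"/overlay/*/diff; do
    # A dead FUSE endpoint makes stat fail ("transport endpoint is not
    # connected"); recreate the directory as described above.
    if ! stat "$d" > /dev/null 2>&1; then
        rmdir "$d" && mkdir "$d"
    fi
done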
Just wanted to make you aware of this in case you run into it.
Regards,
Pat Tovo
patricia.tovo@hpe.com