You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a long running operation it seems (refreshing many images?):
$ lxc operation list; TZ=UTC date
+--------------------------------------+------+-----------------+---------+------------+----------------------+
| ID | TYPE | DESCRIPTION | STATUS | CANCELABLE | CREATED |
+--------------------------------------+------+-----------------+---------+------------+----------------------+
| 87f6266a-1559-4e68-a04b-144aa48c3a5b | TASK | Updating images | RUNNING | NO | 2024/05/08 16:58 UTC |
+--------------------------------------+------+-----------------+---------+------------+----------------------+
Wed May 8 17:29:01 UTC 2024
And a VM (v1) running.
If I try to stop LXD (sudo snap stop lxd) it seems that LXD will wait ~5 minutes for the operation to complete before asking the instance to stop. This grace period waiting for the operation(s) to finish is taking away a lot of time for the instance(s) to do an orderly shutdown. The snap stop logic is:
# Shutdown the daemons
## LXD
echo "=> Stopping LXD (with instance shutdown)"
echo host-shutdown > "${SNAP_COMMON}/state"
if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
if ! kill -30 "${PID}"; then
echo "==> Failed to signal LXD to shutdown"
fi
DEAD=0
for _ in $(seq 540); do
if ! kill -0 "${PID}" 2>/dev/null; then
DEAD=1
echo "==> Stopped LXD"
break
fi
sleep 1
done
if [ "${DEAD}" = "0" ]; then
echo "==> Forcefully stopping LXD after 9 minutes wait"
if kill -9 "${PID}" 2>/dev/null; then
echo "==> Stopped LXD"
else
echo "==> Failed to stop LXD"
fi
fi
fi
So after sending signal 30, we have 540 seconds (9 minutes) before a hard kill (9) is sent. Waiting 5 minutes before asking the instances to cleanly stop means they have to do so inside of ~4 minutes. This is likely too little if there are many Windows VMs potentially applying Windows Updates during shutdown (=> CPU/IO congestion).
Here is the lxc monitor --pretty during the stop:
INFO [2024-05-08T13:30:26-04:00] Received signal signal="power failure"
INFO [2024-05-08T13:30:26-04:00] Starting shutdown sequence signal="power failure"
ERROR [2024-05-08T13:30:26-04:00] Failed updating images err="context canceled"
INFO [2024-05-08T13:30:26-04:00] Waiting for 1 operation(s) to finish
INFO [2024-05-08T13:31:26-04:00] Waiting for 1 operation(s) to finish
INFO [2024-05-08T13:32:26-04:00] Waiting for 1 operation(s) to finish
INFO [2024-05-08T13:33:26-04:00] Waiting for 1 operation(s) to finish
INFO [2024-05-08T13:34:26-04:00] Waiting for 1 operation(s) to finish
INFO [2024-05-08T13:35:26-04:00] Stopping daemon storage volumes
DEBUG [2024-05-08T13:35:26-04:00] UnmountCustomVolume started driver=zfs pool=default project=default volName=backups
DEBUG [2024-05-08T13:35:26-04:00] UnmountCustomVolume finished driver=zfs pool=default project=default volName=backups
DEBUG [2024-05-08T13:35:26-04:00] Unmounted ZFS dataset dev=default/custom/default_backups driver=zfs path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_backups pool=default volName=default_backups
DEBUG [2024-05-08T13:35:26-04:00] UnmountCustomVolume started driver=zfs pool=default project=default volName=images
DEBUG [2024-05-08T13:35:26-04:00] Failed to unmount attempt=0 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:26-04:00] Failed to unmount attempt=1 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:27-04:00] Failed to unmount attempt=2 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:27-04:00] Failed to unmount attempt=3 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:28-04:00] Failed to unmount attempt=4 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:28-04:00] Failed to unmount attempt=5 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:29-04:00] Failed to unmount attempt=6 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:29-04:00] Failed to unmount attempt=7 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:30-04:00] Failed to unmount attempt=8 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:30-04:00] Failed to unmount attempt=9 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:31-04:00] Failed to unmount attempt=10 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:31-04:00] Failed to unmount attempt=11 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:32-04:00] Failed to unmount attempt=12 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:32-04:00] Failed to unmount attempt=13 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:33-04:00] Failed to unmount attempt=14 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:33-04:00] Failed to unmount attempt=15 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:34-04:00] Failed to unmount attempt=16 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:34-04:00] Failed to unmount attempt=17 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:35-04:00] Failed to unmount attempt=18 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
DEBUG [2024-05-08T13:35:35-04:00] Failed to unmount attempt=19 err="device or resource busy" path=/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images
ERROR [2024-05-08T13:35:36-04:00] Failed to unmount image and backup volumes err="Failed to unmount images storage: Failed to unmount storage volume \"default/images\": Failed to unmount \"/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images\": device or resource busy"
DEBUG [2024-05-08T13:35:36-04:00] UnmountCustomVolume finished driver=zfs pool=default project=default volName=images
DEBUG [2024-05-08T13:35:36-04:00] Shutdown started instance=v1 instanceType=virtual-machine project=default timeout=30s
DEBUG [2024-05-08T13:35:36-04:00] Instance operation lock created action=stop instance=v1 project=default reusable=true
DEBUG [2024-05-08T13:35:36-04:00] Shutdown request sent to instance instance=v1 instanceType=virtual-machine project=default
DEBUG [2024-05-08T13:35:37-04:00] Waiting for VM process to finish instance=v1 instanceType=virtual-machine project=default
DEBUG [2024-05-08T13:35:37-04:00] Instance stopped instance=v1 instanceType=virtual-machine project=default reason=guest-shutdown target=stop
DEBUG [2024-05-08T13:35:37-04:00] onStop hook started instance=v1 instanceType=virtual-machine project=default target=stop
DEBUG [2024-05-08T13:35:37-04:00] Instance operation lock inherited for stop action=stop instance=v1 instanceType=virtual-machine project=default
DEBUG [2024-05-08T13:35:37-04:00] QMP monitor stopped path=/var/snap/lxd/common/lxd/logs/v1/qemu.monitor
DEBUG [2024-05-08T13:35:37-04:00] VM process finished instance=v1 instanceType=virtual-machine project=default
DEBUG [2024-05-08T13:35:37-04:00] Stopping device device=root instance=v1 instanceType=virtual-machine project=default type=disk
DEBUG [2024-05-08T13:35:37-04:00] Stopping device device=eth0 instance=v1 instanceType=virtual-machine project=default type=nic
DEBUG [2024-05-08T13:35:37-04:00] UnmountInstance started driver=zfs instance=v1 pool=default project=default
DEBUG [2024-05-08T13:35:37-04:00] Unmounted ZFS dataset dev=default/virtual-machines/v1 driver=zfs path=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/v1 pool=default volName=v1
DEBUG [2024-05-08T13:35:37-04:00] UnmountInstance finished driver=zfs instance=v1 pool=default project=default
DEBUG [2024-05-08T13:35:37-04:00] Deactivated ZFS volume dev=default/virtual-machines/v1.block driver=zfs pool=default volName=v1
DEBUG [2024-05-08T13:35:38-04:00] onStop hook finished instance=v1 instanceType=virtual-machine project=default target=stop
DEBUG [2024-05-08T13:35:38-04:00] Shutdown finished instance=v1 instanceType=virtual-machine project=default timeout=30s
DEBUG [2024-05-08T13:35:38-04:00] Instance operation lock finished action=stop err="<nil>" instance=v1 project=default reusable=true
INFO [2024-05-08T13:35:38-04:00] Action: instance-shutdown, Source: /1.0/instances/v1
INFO [2024-05-08T13:35:38-04:00] Stopping networks
DEBUG [2024-05-08T13:35:38-04:00] Stop driver=bridge network=juju0 project=default
DEBUG [2024-05-08T13:35:38-04:00] Deleting firewall driver=bridge network=juju0 project=default
DEBUG [2024-05-08T13:35:38-04:00] Stop driver=bridge network=lxdbr0 project=default
DEBUG [2024-05-08T13:35:38-04:00] Deleting firewall driver=bridge network=lxdbr0 project=default
INFO [2024-05-08T13:35:38-04:00] Stopping storage pools
DEBUG [2024-05-08T13:35:38-04:00] Unmount started driver=zfs pool=default
ERROR [2024-05-08T13:35:38-04:00] Unable to unmount storage pool err="Failed to run: zpool export default: exit status 1 (cannot unmount '/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images': pool or dataset is busy)" pool=default
DEBUG [2024-05-08T13:35:38-04:00] Unmount finished driver=zfs pool=default
DEBUG [2024-05-08T13:35:38-04:00] Unmount started driver=btrfs pool=juju-btrfs
DEBUG [2024-05-08T13:35:38-04:00] Unmount finished driver=btrfs pool=juju-btrfs
DEBUG [2024-05-08T13:35:38-04:00] Unmount started driver=zfs pool=juju-zfs
DEBUG [2024-05-08T13:35:39-04:00] Cancel ongoing or future gRPC connection attempts
DEBUG [2024-05-08T13:35:39-04:00] Unmount finished driver=zfs pool=juju-zfs
INFO [2024-05-08T13:35:39-04:00] Closing the database
INFO [2024-05-08T13:35:39-04:00] Stop database gateway
INFO [2024-05-08T13:35:39-04:00] Unmounting temporary filesystems
INFO [2024-05-08T13:35:39-04:00] Closing socket socket=/var/snap/lxd/common/lxd/devlxd/sock type="devlxd socket"
INFO [2024-05-08T13:35:39-04:00] Closing socket socket="host(2):24419" type="VM socket"
INFO [2024-05-08T13:35:39-04:00] Done unmounting temporary filesystems
INFO [2024-05-08T13:35:39-04:00] Closing socket socket=/var/snap/lxd/common/lxd/unix.socket type="REST API Unix socket"
Error: websocket: close 1006 (abnormal closure): unexpected EOF
$ journalctl -fu snap.lxd.daemon
...
May 08 13:30:26 sdeziel-lemur systemd[1]: Stopping Service for snap application lxd.daemon...
May 08 13:30:26 sdeziel-lemur lxd.daemon[2568375]: => Stop reason is: host shutdown
May 08 13:30:26 sdeziel-lemur lxd.daemon[2568375]: => Stopping LXD (with instance shutdown)
May 08 13:30:26 sdeziel-lemur lxd.daemon[2564894]: time="2024-05-08T13:30:26-04:00" level=error msg="Failed updating images" err="context canceled"
May 08 13:35:36 sdeziel-lemur lxd.daemon[2564894]: time="2024-05-08T13:35:36-04:00" level=error msg="Failed to unmount image and backup volumes" err="Failed to unmount images storage: Failed to unmount storage volume \"default/images\": Failed to unmount \"/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images\": device or resource busy"
May 08 13:35:38 sdeziel-lemur lxd.daemon[2564894]: time="2024-05-08T13:35:38-04:00" level=error msg="Unable to unmount storage pool" err="Failed to run: zpool export default: exit status 1 (cannot unmount '/var/snap/lxd/common/lxd/storage-pools/default/custom/default_images': pool or dataset is busy)" pool=default
May 08 13:35:39 sdeziel-lemur lxd.daemon[2564746]: => LXD exited cleanly
May 08 13:35:39 sdeziel-lemur lxd.daemon[2568375]: ==> Stopped LXD
May 08 13:35:39 sdeziel-lemur lxd.daemon[2568375]: => Stopping LXCFS
May 08 13:35:39 sdeziel-lemur lxd.daemon[2564882]: Running destructor lxcfs_exit
May 08 13:35:39 sdeziel-lemur lxd.daemon[2568375]: ==> Stopped LXCFS
May 08 13:35:39 sdeziel-lemur lxd.daemon[2568375]: => Cleaning up PID files
May 08 13:35:39 sdeziel-lemur lxd.daemon[2568375]: => Cleaning up namespaces
May 08 13:35:39 sdeziel-lemur lxd.daemon[2568375]: => All done
May 08 13:35:39 sdeziel-lemur systemd[1]: snap.lxd.daemon.service: Deactivated successfully.
May 08 13:35:39 sdeziel-lemur systemd[1]: Stopped Service for snap application lxd.daemon.
Note: the umount issue is probably unrelated, except maybe for causing the refresh operation to take ~forever.
Additional information:
$ snap list lxd
Name Version Rev Tracking Publisher Notes
lxd git-696b610 28614 latest/edge canonical✓ -
The text was updated successfully, but these errors were encountered:
I have a long running operation it seems (refreshing many images?):
And a VM (
v1
) running.If I try to stop LXD (
sudo snap stop lxd
) it seems that LXD will wait ~5 minutes for the operation to complete before asking the instance to stop. This grace period waiting for the operation(s) to finish is taking away a lot of time for the instance(s) to do an orderly shutdown. The snap stop logic is:So after sending signal
30
, we have 540 seconds (9 minutes) before a hard kill (9
) is sent. Waiting 5 minutes before asking the instances to cleanly stop means they have to do so inside of ~4 minutes. This is likely too little if there are many Windows VMs potentially applying Windows Updates during shutdown (=> CPU/IO congestion).Here is the
lxc monitor --pretty
during the stop:Note: the
umount
issue is probably unrelated, except maybe for causing the refresh operation to take ~forever.Additional information:
The text was updated successfully, but these errors were encountered: