LXD 5.21.1 Failed to ensure schema: not an error #13326

Closed
6 tasks
chuegel opened this issue Apr 14, 2024 · 22 comments
Labels: 5.21 LTS, Bug (Confirmed to be a bug)

@chuegel

chuegel commented Apr 14, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 18.04
  • The output of "snap list --all lxd core20 core22 core24 snapd":
snap list --all lxd core20 core22 core24 snapd
Name    Version         Rev    Tracking       Publisher   Notes
core20  20240111        2182   latest/stable  canonical✓  base,disabled
core20  20240227        2264   latest/stable  canonical✓  base
core22  20231123        1033   latest/stable  canonical✓  base,disabled
core22  20240111        1122   latest/stable  canonical✓  base
lxd     5.20-f3dd836    27049  latest/stable  canonical✓  disabled
lxd     5.21.1-43998c6  28155  latest/stable  canonical✓  -
snapd   2.61.1          20671  latest/stable  canonical✓  snapd,disabled
snapd   2.61.2          21184  latest/stable  canonical✓  snapd
  • The output of "lxc info" or if that fails:
    • Kernel version: 4.15.0-209-generic
    • LXC version: 5.21.1 LTS
    • LXD version: 5.21.1 LTS
    • Storage backend in use: BTRFS
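
A quick way to read the active LXD revision out of a table like the one above is sketched below. The function name `active_lxd_revision` is hypothetical (it is not part of snap or LXD), and the sketch assumes the six-column layout shown here, where disabled revisions carry "disabled" in the Notes column.

```shell
# Hypothetical helper (not part of snap or LXD): read `snap list --all` output
# on stdin and print version, revision and tracking channel of the lxd
# revision that is not marked "disabled" in the last (Notes) column.
active_lxd_revision() {
    awk '$1 == "lxd" && $NF !~ /disabled/ { print $2, $3, $4 }'
}

# Usage (on a host with snapd installed):
#   snap list --all lxd | active_lxd_revision
```

Fed the table from this report, it should print the 5.21.1 line rather than the disabled 5.20 one.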

Issue description

Since the last update, every lxc command fails with the following error:

Error: LXD unix socket not accessible: Get "http://unix.socket/1.0": EOF

The containers themselves are still running, as I can reach the services behind them.

Below are some logs collected from the server:

cat /var/snap/lxd/common/lxd/logs/lxd.log
time="2024-04-14T16:59:23+02:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"

syslog

Apr 14 17:03:09 bionic systemd[1]: Started Service for snap application lxd.daemon.
Apr 14 17:03:09 bionic lxd.daemon[31828]: => Preparing the system (28155)
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Loading snap configuration
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Setting up mntns symlink (mnt:[4026532607])
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Setting up kmod wrapper
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Preparing /boot
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Preparing a clean copy of /run
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Preparing /run/bin
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Preparing a clean copy of /etc
Apr 14 17:03:09 bionic lxd.daemon[31828]: ==> Preparing a clean copy of /usr/share/misc
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Setting up ceph configuration
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Setting up LVM configuration
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Setting up OVN configuration
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Rotating logs
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Unsupported ZFS version (0.7)
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Escaping the systemd cgroups
Apr 14 17:03:10 bionic lxd.daemon[31828]: ====> Detected cgroup V1
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Escaping the systemd process resource limits
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Enabling LXD UI
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Exposing LXD documentation
Apr 14 17:03:10 bionic lxd.daemon[31828]: => Re-using existing LXCFS
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Reloading LXCFS
Apr 14 17:03:10 bionic lxd.daemon[31828]: ==> Cleaning up existing LXCFS namespace
Apr 14 17:03:10 bionic lxd.daemon[31828]: => Starting LXD
Apr 14 17:03:10 bionic lxd.daemon[31828]: time="2024-04-14T17:03:10+02:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
Apr 14 17:03:11 bionic kernel: [22395832.644410] BTRFS warning (device dm-0): direct IO failed ino 615 rw 1,2131969 sector 0x24c96fb0 len 40960 err no 10
Apr 14 17:03:11 bionic kernel: [22395832.644424] BTRFS warning (device dm-0): direct IO failed ino 615 rw 1,2131969 sector 0x24c97000 len 65536 err no 10
Apr 14 17:03:11 bionic kernel: [22395832.644429] BTRFS warning (device dm-0): direct IO failed ino 615 rw 1,2131969 sector 0x24c97080 len 8192 err no 10
Apr 14 17:03:11 bionic lxd.daemon[31828]: time="2024-04-14T17:03:11+02:00" level=error msg="Failed to start the daemon" err="Failed to initialize global database: failed to ensure schema: Failed to ensure schema: not an error"
Apr 14 17:03:12 bionic lxd.daemon[31828]: Error: Failed to initialize global database: failed to ensure schema: Failed to ensure schema: not an error
Apr 14 17:03:12 bionic lxd.user-daemon[31734]: Error: Unable to connect to LXD: Get "http://unix.socket/1.0": EOF
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.user-daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.user-daemon.service: Failed with result 'exit-code'.
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.user-daemon.service: Service hold-off time over, scheduling restart.
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.user-daemon.service: Scheduled restart job, restart counter is at 253.
Apr 14 17:03:12 bionic systemd[1]: Stopped Service for snap application lxd.user-daemon.
Apr 14 17:03:12 bionic systemd[1]: Started Service for snap application lxd.user-daemon.
Apr 14 17:03:12 bionic lxd.daemon[31828]: Killed
Apr 14 17:03:12 bionic lxd.daemon[31828]: => LXD failed to start
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.daemon.service: Service hold-off time over, scheduling restart.
Apr 14 17:03:12 bionic systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 253.
Apr 14 17:03:12 bionic systemd[1]: Stopped Service for snap application lxd.daemon.

Steps to reproduce

Not sure

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (lxc info NAME --show-log)
  • Container configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)
@tomponline
Member

tomponline commented Apr 15, 2024

Hi,

I've taken a look in the LXD source code and cannot see "not an error" as an error that LXD generates.

Those BTRFS errors coming from the kernel certainly look problematic:

Apr 14 17:03:11 bionic kernel: [22395832.644410] BTRFS warning (device dm-0): direct IO failed ino 615 rw 1,2131969 sector 0x24c96fb0 len 40960 err no 10
Apr 14 17:03:11 bionic kernel: [22395832.644424] BTRFS warning (device dm-0): direct IO failed ino 615 rw 1,2131969 sector 0x24c97000 len 65536 err no 10
Apr 14 17:03:11 bionic kernel: [22395832.644429] BTRFS warning (device dm-0): direct IO failed ino 615 rw 1,2131969 sector 0x24c97080 len 8192 err no 10

I'm wondering if this is a dqlite issue with direct I/O. I'll ask the dqlite team if they have any ideas about this.
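
For anyone checking whether their host shows the same kernel messages, a minimal sketch follows. The function name `count_btrfs_dio_failures` is hypothetical, and the pattern is taken only from the three log lines quoted above, so other kernels may word the warning differently.

```shell
# Hypothetical helper: count "direct IO failed" BTRFS warnings in kernel log
# text supplied on stdin. grep -c prints 0 and exits non-zero when nothing
# matches, so "|| true" keeps the function's exit status at 0 in that case.
count_btrfs_dio_failures() {
    grep -c 'BTRFS warning.*direct IO failed' || true
}

# Usage:
#   dmesg | count_btrfs_dio_failures
#   journalctl -k | count_btrfs_dio_failures
```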

@tomponline added the "Incomplete" (Waiting on more information from reporter) label on Apr 15, 2024
@chuegel
Author

chuegel commented Apr 15, 2024

Thanks for your reply. I overlooked those BTRFS errors. Indeed, that doesn't look good.
Interestingly, the containers are still online. I wonder if this is related to the latest kernel update I did.

@haryHuds0n

Hi,
I am now facing the same problem as described in this issue.
The log I get is exactly the same:

level=error msg="Failed to start the daemon" err="Failed to initialize global database: failed to ensure schema: Failed to ensure schema: not an error"

LXD and the services in the containers are still running after I restarted LXD several times, but I can't do anything else. That's odd.

@roosterfish
Contributor

@haryHuds0n do you also see those BTRFS related errors? Which version of Ubuntu (and kernel) are you using?

@haryHuds0n

haryHuds0n commented Apr 15, 2024

@haryHuds0n do you also see those BTRFS related errors? Which version of Ubuntu (and kernel) are you using?

Thanks for your reply.

No, I don't see those BTRFS-related errors.
The only error line in my log is the one above.
FYI, I am using Ubuntu 20.04.6 LTS with kernel version 5.4.0-40-generic.

@haryHuds0n

haryHuds0n commented Apr 16, 2024

UPDATE
After performing an update and upgrading the kernel, as well as installing the HWE kernel, my LXD is now functioning normally. Here is the link to the discourse I referenced.
Current kernel version 5.15.0-102-generic.

@tomponline changed the title from 'Error: Unable to connect to LXD: Get "http://unix.socket/1.0": EOF' to 'LXD 5.21.1 Failed to ensure schema: not an error' on Apr 16, 2024
@tomponline
Member

If you have further issues please can you post over in the forum https://discourse.ubuntu.com/c/lxd/support/149

Thanks

@chuegel
Author

chuegel commented Apr 20, 2024

Unfortunately I cannot create new topics in the forum.
I installed the HWE kernel with
apt-get install --install-recommends linux-generic-hwe-18.04

uname -r
5.4.0-150-generic

but the error still persists. Now the containers are not even starting.

@tomponline reopened this on Apr 20, 2024
@tomponline
Member

tomponline commented Apr 20, 2024

@cole-miller is there a fix we can apply to the dqlite build in lxd to avoid this error on bionic 4.15 kernel?

@tomponline
Member

Unfortunately I cannot create new topics in the forum. I installed the HWE kernel with apt-get install --install-recommends linux-generic-hwe-18.04

uname -r
5.4.0-150-generic

but the error still persists. Now the containers are not even starting.

As it doesn’t appear that the schema changes have been applied, you could try switching to the 5.20/stable channel using:

sudo snap refresh lxd --channel=5.20/stable

@chuegel
Copy link
Author

chuegel commented Apr 20, 2024

Downgrading to 5.20 worked like a charm. Also the BTRFS errors are gone.
Thanks

@tomponline added the "Bug" (Confirmed to be a bug) and "5.21 LTS" labels and removed the "Incomplete" (Waiting on more information from reporter) label on Apr 20, 2024
@tomponline added this to the lxd-6.1 milestone on Apr 20, 2024
@tomponline
Member

@MggMuggins if you have any time, it would be great to see if we can get a reproducer for this one.

@cole-miller
Contributor

Hi! If you're affected by this issue, you can help us troubleshoot by following the instructions here: https://discourse.ubuntu.com/t/lxd-5-21-1-lts-has-been-released/43823/67. Thanks!

@tomponline
Member

Hi @chuegel @haryHuds0n

Would you mind trying to refresh to the 5.21/edge channel? It has a reverted dqlite version in it (canonical/dqlite#641), and we would like to see if this fixes the issue for you. We've not been able to reproduce the issue locally, even using Bionic on BTRFS.

If that works you can remain on 5.21/edge until we have fixed 5.21/stable (we will let you know).

The snap revision you want is git-d15f111.

sudo snap refresh lxd --channel=5.21/edge

@chuegel
Author

chuegel commented Apr 25, 2024

Hi @tomponline ,
As we have already copied all containers to a new production server (Ubuntu 22.04 with ZFS backend), I can do the tests later today.

@ValdikSS

ValdikSS commented Apr 25, 2024

@haryHuds0n do you also see those BTRFS related errors? Which version of Ubuntu (and kernel) are you using?

I'm not @haryHuds0n, but I was hit with this issue on Ubuntu 20.04.6 Focal, with 5.4.0-177-generic kernel. Rootfs is btrfs.
The kernel log was spammed with:

[9688106.385766] new mount options do not match the existing superblock, will be ignored
[9688125.318143] new mount options do not match the existing superblock, will be ignored
[9688129.778376] new mount options do not match the existing superblock, will be ignored
[9688134.244518] proc: Bad value for 'hidepid'
[9688135.519693] new mount options do not match the existing superblock, will be ignored
[9688139.266611] new mount options do not match the existing superblock, will be ignored
[9688145.013936] new mount options do not match the existing superblock, will be ignored
[9688150.900361] new mount options do not match the existing superblock, will be ignored
[9688155.654297] new mount options do not match the existing superblock, will be ignored
[9688161.267383] new mount options do not match the existing superblock, will be ignored
[9688166.514346] new mount options do not match the existing superblock, will be ignored
[9688172.289809] new mount options do not match the existing superblock, will be ignored

journalctl:

Apr 25 22:08:14 homenuc systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 22:08:14 homenuc systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Apr 25 22:08:14 homenuc systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 16.
Apr 25 22:08:14 homenuc systemd[1]: Stopped Service for snap application lxd.daemon.
Apr 25 22:08:14 homenuc systemd[1]: Started Service for snap application lxd.daemon.
Apr 25 22:08:15 homenuc lxd.daemon[427258]: => Preparing the system (28322)
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Loading snap configuration
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Setting up mntns symlink (mnt:[4026532725])
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Setting up kmod wrapper
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Preparing /boot
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Preparing a clean copy of /run
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Preparing /run/bin
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Preparing a clean copy of /etc
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Preparing a clean copy of /usr/share/misc
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Setting up ceph configuration
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Setting up LVM configuration
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Setting up OVN configuration
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Rotating logs
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Unsupported ZFS version (0.8)
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Escaping the systemd cgroups
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ====> Detected cgroup V1
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Escaping the systemd process resource limits
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Enabling LXD UI
Apr 25 22:08:15 homenuc lxd.daemon[427258]: ==> Exposing LXD documentation
Apr 25 22:08:15 homenuc lxd.daemon[427168]: Running destructor lxcfs_exit
Apr 25 22:08:15 homenuc lxd.daemon[427258]: => Starting LXCFS
Apr 25 22:08:15 homenuc lxd.daemon[427409]: Starting LXCFS at lxcfs
Apr 25 22:08:15 homenuc lxd.daemon[427409]: Running constructor lxcfs_init to reload liblxcfs
Apr 25 22:08:15 homenuc lxd.daemon[427409]: mount namespace: 4
Apr 25 22:08:15 homenuc lxd.daemon[427409]: hierarchies:
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   0: fd:   5:
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   1: fd:   6: name=systemd
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   2: fd:   7: net_cls,net_prio
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   3: fd:   8: devices
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   4: fd:   9: blkio
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   5: fd:  10: pids
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   6: fd:  11: cpuset
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   7: fd:  12: cpu,cpuacct
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   8: fd:  13: memory
Apr 25 22:08:15 homenuc lxd.daemon[427409]:   9: fd:  14: freezer
Apr 25 22:08:15 homenuc lxd.daemon[427409]:  10: fd:  15: perf_event
Apr 25 22:08:15 homenuc kernel: new mount options do not match the existing superblock, will be ignored
Apr 25 22:08:15 homenuc lxd.daemon[427409]:  11: fd:  16: hugetlb
Apr 25 22:08:15 homenuc lxd.daemon[427409]:  12: fd:  17: rdma
Apr 25 22:08:15 homenuc lxd.daemon[427409]: Kernel supports pidfds
Apr 25 22:08:15 homenuc lxd.daemon[427409]: Kernel does not support swap accounting
Apr 25 22:08:15 homenuc lxd.daemon[427409]: api_extensions:
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - cgroups
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - sys_cpu_online
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_cpuinfo
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_diskstats
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_loadavg
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_meminfo
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_stat
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_swaps
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_uptime
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - proc_slabinfo
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - shared_pidns
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - cpuview_daemon
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - loadavg_daemon
Apr 25 22:08:15 homenuc lxd.daemon[427409]: - pidfds
Apr 25 22:08:16 homenuc lxd.daemon[427258]: => Starting LXD
Apr 25 22:08:16 homenuc lxd.daemon[427426]: time="2024-04-25T22:08:16+03:00" level=warning msg=" - Couldn't find the CGroup blkio.weight, disk priority will be ignored"
Apr 25 22:08:16 homenuc lxd.daemon[427426]: time="2024-04-25T22:08:16+03:00" level=warning msg=" - Couldn't find the CGroup memory swap accounting, swap limits will be ignored"
Apr 25 22:08:17 homenuc lxd.daemon[427426]: time="2024-04-25T22:08:17+03:00" level=error msg="Failed to start the daemon" err="Failed to initialize global database: failed to ensure schema: Failed to ensure schema: not an error"
Apr 25 22:08:17 homenuc lxd.daemon[427426]: Error: Failed to initialize global database: failed to ensure schema: Failed to ensure schema: not an error
Apr 25 22:08:18 homenuc lxd.daemon[427258]: Killed
Apr 25 22:08:18 homenuc lxd.daemon[427258]: => LXD failed to start
Apr 25 22:08:18 homenuc systemd[1]: snap.lxd.daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 22:08:18 homenuc systemd[1]: snap.lxd.daemon.service: Failed with result 'exit-code'.
Apr 25 22:08:18 homenuc systemd[1]: snap.lxd.daemon.service: Scheduled restart job, restart counter is at 17.
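
To check whether another host shows the same failure signature as the journal above, something like the following could be used. Both function names are hypothetical; the patterns are copied from the log lines in this thread, and the unit name `snap.lxd.daemon.service` is the one that appears in those lines.

```shell
# Hypothetical helpers that read journal text on stdin.

# Succeeds (exit 0) if the log contains the schema error from this issue.
has_schema_error() {
    grep -q 'Failed to ensure schema: not an error'
}

# Prints the most recent systemd restart counter found in the log, if any.
last_restart_counter() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\..*/\1/p' | tail -n 1
}

# Usage:
#   journalctl -u snap.lxd.daemon.service | has_schema_error && echo "same failure"
#   journalctl -u snap.lxd.daemon.service | last_restart_counter
```

A steadily growing restart counter together with the schema error suggests the daemon is stuck in the same restart loop described here.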

@chuegel
Author

chuegel commented Apr 25, 2024

Hi @tomponline,

snap list --all lxd core20 core22 core24 snapd
Name    Version       Rev    Tracking       Publisher   Notes
core20  20240111      2182   latest/stable  canonical✓  base,disabled
core20  20240227      2264   latest/stable  canonical✓  base
core22  20231123      1033   latest/stable  canonical✓  base,disabled
core22  20240111      1122   latest/stable  canonical✓  base
lxd     5.20-f3dd836  27049  5.21/edge      canonical✓  disabled
lxd     git-d15f111   28446  5.21/edge      canonical✓  -
snapd   2.61.2        21184  latest/stable  canonical✓  snapd,disabled
snapd   2.62          21465  latest/stable  canonical✓  snapd

Containers start just fine. No BTRFS errors logged.

@tomponline
Member

Containers start just fine. No BTRFS errors logged.

Excellent. Good news @cole-miller, so what's the next step? Do you want to keep the revert in dqlite and do a release, or do you need to develop it further to address the reason it was removed in the first place?

@tomponline
Member

Apr 25 22:08:17 homenuc lxd.daemon[427426]: time="2024-04-25T22:08:17+03:00" level=error msg="Failed to start the daemon" err="Failed to initialize global database: failed to ensure schema: Failed to ensure schema: not an error"

@ValdikSS please can you try doing sudo snap refresh lxd --channel=5.21/edge and let us know if it fixes it.

@tomponline
Member

The revert to dqlite has been pushed to the 5.21/stable snap channel now.
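
The channel changes suggested in this thread (5.20/stable as a fallback, 5.21/edge for the test build, and 5.21/stable once the revert landed) all use the same `snap refresh` invocation. Below is a minimal sketch of a dry-run wrapper around it. The function name `switch_lxd_channel` and the `APPLY` variable are hypothetical, and the allowed list contains only the channels mentioned in this issue.

```shell
# Hypothetical wrapper: print the channel switch command, or run it when
# APPLY=1 is set. Only channels named in this thread are accepted.
switch_lxd_channel() {
    channel=$1
    case "$channel" in
        5.20/stable|5.21/edge|5.21/stable) ;;
        *) echo "unexpected channel: $channel" >&2; return 1 ;;
    esac
    if [ "${APPLY:-0}" = "1" ]; then
        sudo snap refresh lxd --channel="$channel"
    else
        echo "sudo snap refresh lxd --channel=$channel"
    fi
}

# Usage:
#   switch_lxd_channel 5.21/stable           # dry run, prints the command
#   APPLY=1 switch_lxd_channel 5.21/stable   # actually refreshes the snap
```

Printing the command by default is a deliberate choice here, since switching channels on a production host is worth a second look before it runs.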

7 participants