Services (apparmor, snapd.seeded, ...?) fail to start in nested lxd container #3813
Comments
Launchpad user Dan Watkins(oddbloke) wrote on 2020-12-01T22:52:35.628306+00:00
Hi Ian,
I've just launched such a container and I see a bunch of non-cloud-init errors in the log and when I examine
root@certain-cod:~# systemctl list-jobs
10 jobs listed.
Examining the journal, I see that systemd-logind.service is in a restart loop:
root@certain-cod:~# journalctl -u systemd-logind.service | grep Failed\ w
This is blocking boot before cloud-init's later stages run, so as it is correctly indicating that it hasn't yet completed, I'm marking this Invalid for cloud-init. I'll add a systemd task instead, as that looks to be the source of the issue.
Cheers,
Dan
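The check described in this comment can be sketched as a small shell helper. This is only an illustration: the function name `running_job_units` is made up here, and it assumes the four-column `JOB UNIT TYPE STATE` layout that `systemctl list-jobs --no-legend` prints.

```shell
# Hypothetical helper (not from the thread): print the units whose
# systemd jobs are currently in the "running" state.
# Expects `systemctl list-jobs --no-legend` style lines on stdin,
# i.e. four whitespace-separated columns: JOB UNIT TYPE STATE.
running_job_units() {
    awk '$4 == "running" { print $2 }'
}

# Typical use inside the nested container (needs systemd, so left as comments):
#   systemctl list-jobs --no-legend | running_job_units
#   journalctl -u systemd-logind.service --no-pager | grep -c 'Failed'
```

Any unit that stays in the output for a long time is a candidate for what is holding up the rest of the boot.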
Launchpad user Dan Streetman(ddstreet) wrote on 2021-03-17T18:41:52.657003+00:00
The systemd-logind problem is due to dbus defaulting to apparmor mode 'enabled', but apparmor can't do much of anything inside a container so it fails to start, and dbus can't contact it.
In the 2nd level container, create a file like '/etc/dbus-1/system.d/no-apparmor.conf' with content:
Then restart the 2nd level container and recheck systemd-logind, which should now work.
Of course, a proper fix in dbus should be a bit smarter and disable its use of apparmor only when it's inside a container.
However, cloud-init status --wait still hangs after systemd-logind starts up, so that wasn't the original problem (or at least wasn't the only problem).
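The file content from this comment did not survive in this copy of the thread, so what follows is not the commenter's original text. As an assumption only: dbus-daemon's configuration format has an `<apparmor mode="..."/>` element, and setting it to `disabled` would match the description above. The helper name `write_dbus_no_apparmor` and its optional root-prefix argument are hypothetical, added so the sketch can be tried against a scratch directory first.

```shell
# Hypothetical sketch: write a dbus system config that turns off
# AppArmor mediation. The XML body is an assumption based on the
# dbus-daemon <apparmor mode="..."/> element, not a quote from the thread.
# $1 is an optional root prefix (empty means the real filesystem).
write_dbus_no_apparmor() {
    root="${1:-}"
    dir="$root/etc/dbus-1/system.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/no-apparmor.conf" <<'EOF'
<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <apparmor mode="disabled"/>
</busconfig>
EOF
}

# Inside the 2nd level container, as root:
#   write_dbus_no_apparmor ""
# then restart the container and recheck systemd-logind.
```

Disabling mediation everywhere is a blunt workaround; as the comment notes, a real fix would apply only when dbus detects it is running in a container.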
Launchpad user Dan Watkins(oddbloke) wrote on 2021-03-17T19:26:24.013040+00:00 Given that the logind issue is an AppArmor issue and, per my previous comment, "the two running jobs are systemd-logind.service and snapd.seeded.service", I suspect that we'll find that snapd is running into similar sorts of issues. I'll take a quick look now. |
Launchpad user Ian Johnson(anonymouse67) wrote on 2021-03-17T19:51:36.237331+00:00 FWIW I know what the snapd issue is: snapd does not and will not work in a nested LXD container, so we need to add code to make snapd.seeded.service die/exit gracefully in this situation.
Launchpad user Dan Watkins(oddbloke) wrote on 2021-03-17T19:54:48.252169+00:00 Yep, that's what I've found; cloud-init is just waiting for its later stages to run, which are blocked until snapd.seeded.service exits.
Launchpad user Dan Streetman(ddstreet) wrote on 2021-03-17T21:14:04.959162+00:00
it's interesting that apparmor appears to work ok in the first-level container, but fails in the nested container, e.g.:
$ lxc shell lp1905493-f
Mar 17 18:17:44 lp1905493-f systemd[1]: Starting Load AppArmor profiles...
Mar 17 18:40:15 layer2 apparmor.systemd[147]: /sbin/apparmor_parser: Unable to replace "nvidia_modprobe". Permission denied; attempted to load a profile while confined?
Launchpad user Dan Streetman(ddstreet) wrote on 2021-03-17T21:17:34.368658+00:00 I wonder if this is actually a problem with the specific apparmor profile that's created by lxd, maybe it doesn't provide enough permissions to allow the container's lxd to correctly pass the apparmor profile down to the nested container. Similar to how lxd locks down containers a bit too tight by default and requires enabling 'security.nesting' just to be able to create a nested container. |
Launchpad user Launchpad Janitor(janitor) wrote on 2022-06-13T17:17:37.756437+00:00 Status changed to 'Confirmed' because the bug affects multiple users. |
Launchpad user Christian Ehrhardt (paelzer) wrote on 2022-06-14T05:17:41.716650+00:00 Due to a ping on IRC I wanted to summarize the situation here, as it seems this still affects people. In nested LXD containers we seem to have multiple issues:
Launchpad user Jose Manuel Santamaria Lema(panfaust) wrote on 2022-06-18T18:58:36.314328+00:00 Hi there, thanks for the update. Just in case anyone else here is interested in a temporary workaround, this is what I did for my use case:
After that "runlevel", "systemctl is-system-running" and "cloud-init status --wait" should work. Last but not least, I would like to mention that this issue affects running autopkgtests in nested containers (autopkgtest uses "runlevel" to detect whether the container has started properly). I had an interesting conversation about this here (thanks a lot ddstreet for the help!):
This bug was originally filed in Launchpad as LP: #1905493
Launchpad details
Launchpad user Ian Johnson(anonymouse67) wrote on 2020-11-25T00:04:53.707777+00:00
When booting a nested lxd container inside another lxd container (just a normal container, not a VM) (i.e. just L2), running cloud-init status --wait just prints "." indefinitely and never returns.
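Because the wait never returns on its own in this scenario, scripts that call it may want an upper bound. A minimal sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the limit is hit); the wrapper name `wait_or_give_up` is hypothetical.

```shell
# Hypothetical wrapper: run a command but give up after a number of seconds.
# Returns the command's own exit status, or 124 if the limit was reached
# (the convention used by GNU coreutils `timeout`).
wait_or_give_up() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example inside the nested container:
#   wait_or_give_up 300 cloud-init status --wait
```

This only keeps the caller from hanging; it does nothing about the blocked services underneath.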