cloud-init selects sysconfig netconfig renderer if network-manager is installed on Ubuntu #3354

Closed
ubuntu-server-builder opened this issue May 11, 2023 · 33 comments
Labels
launchpad Migrated from Launchpad priority Fix soon

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #1819994

Launchpad details
affected_projects = ['maas', 'plainbox-provider-certification-server', 'cloud-init (Ubuntu)']
assignee = None
assignee_name = None
date_closed = 2019-05-10T18:08:34.003361+00:00
date_created = 2019-03-14T02:30:40.386891+00:00
date_fix_committed = 2019-04-22T22:46:45.653909+00:00
date_fix_released = 2019-05-10T18:08:34.003361+00:00
id = 1819994
importance = high
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1819994
milestone = None
owner = raharper
owner_name = Ryan Harper
private = False
status = fix_released
submitter = duanbl1
submitter_name = duanbenliang
tags = ['servcert-265']
duplicates = []

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-14T02:30:40.386891+00:00

Configuration:
UEFI/BIOS: TEE136S
IMM/BMC: CDI333V
CPU: Intel(R) Xeon(R) Platinum 8253 CPU @ 2.20GHz
Memory: 16G DIMM * 12
Raid card: ThinkSystem RAID 530-8i
NIC Card: Intel X722 LOM

Reproduce Steps:
1. Configure "network" as the first boot option.
2. Power on the machine.
3. Visit TC through a web browser and commission the machine.
4. When commissioning completes, deploy Ubuntu 18.04 LTS on the SUT.
5. The error appears during OS deployment.

Deploy errors look like the following (see the attachment for details):

cloud-init[xxxx] Date_and_time - handlers.py[WARNING]: failed posting event: start: modules-final/config-xxxx: running config-xxxx

cloud-init[xxxx] Date_and_time - handlers.py[WARNING]: failed posting event: finish: modules-final: SUCCESS: running modules for final

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-14T02:30:40.386891+00:00

Launchpad attachments: Errors during deploy

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-14T02:33:13.606700+00:00

Launchpad attachments: Error during deploy

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-14T02:33:44.999366+00:00

Launchpad attachments: The log under MAAS

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-14T02:34:06.784111+00:00

Launchpad attachments: Output of dpkg_l

Launchpad user Blake Rouse(blake-rouse) wrote on 2019-03-14T11:38:41.250137+00:00

Looks like it might be an issue either in curtin or MAAS based on the network configuration.

Once the machine fails to deploy can you provide the output of:

maas {profile} machine get-curtin-config {system_id}

Launchpad user Jeff Lane (bladernr) wrote on 2019-03-14T16:22:02.476461+00:00

FYI, I've added a cert task for this. I don't know for sure that this is curtin; it looks like something may have changed in one of the hundreds of dependency packages that checkbox pulls in, causing curtin to fail.

Rod is investigating it on our side.

Launchpad user Jeff Lane (bladernr) wrote on 2019-03-14T19:29:51.173736+00:00

We have a bug for this as well, 1189973, but marking this one as a duplicate of it kills the MAAS (possibly curtin) task, so I un-duped it for now.

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-14T19:36:05.407887+00:00

We've traced the problem to the network-manager package, which gets pulled in by a dependency in canonical-certification-server. Apparently, curtin or cloud-init (I'm not sure which) is now skipping netplan configuration when the network-manager package is installed.

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-14T20:53:11.663789+00:00

Launchpad attachments: Output of "maas {profile} machine get-curtin-config {system_id}" on MAAS server

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-14T20:54:08.321929+00:00

Launchpad attachments: Output of "maas {profile} node-results read system_id={system_id}" on MAAS server

Launchpad user Ryan Harper(raharper) wrote on 2019-03-14T21:29:20.282424+00:00

Neither curtin nor cloud-init will skip generating networking. However, if the target system contains additional netplan config that cloud-init is not aware of (perhaps provided by the NetworkManager package or something else), there may be a conflict in the configuration that prevents netplan apply from bringing up the network.

If possible, getting the systemd journal and what's in /etc/netplan and /run/systemd/{netif,network} and /var/log/cloud-init.log could help see what's going on.
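
A minimal sketch of gathering the file-based items requested above, assuming Python 3 is available on the node. The helper name `collect_network_diagnostics` and the `root` parameter are hypothetical and exist only for illustration (the `root` parameter makes it possible to point the check at a mounted target filesystem). The systemd journal is not covered here; it would be captured separately, for example with `journalctl -b`.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Paths named in the request above, relative to the filesystem root.
DIAGNOSTIC_PATHS = [
    "etc/netplan",
    "run/systemd/netif",
    "run/systemd/network",
    "var/log/cloud-init.log",
]


def collect_network_diagnostics(root: str = "/") -> Dict[str, Optional[List[str]]]:
    """Report what exists under each diagnostic path.

    For a directory the value is the sorted list of entry names, for a
    regular file it is a one-element list with the file name, and for a
    missing path it is None.
    """
    base = Path(root)
    report: Dict[str, Optional[List[str]]] = {}
    for rel in DIAGNOSTIC_PATHS:
        path = base / rel
        if path.is_dir():
            report[rel] = sorted(child.name for child in path.iterdir())
        elif path.exists():
            report[rel] = [path.name]
        else:
            report[rel] = None
    return report
```

Calling `collect_network_diagnostics()` on a deployed node returns a dictionary that can be pasted into a bug comment; an empty list for `etc/netplan` means the directory exists but no netplan config was rendered.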

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-14T22:41:35.104569+00:00

Launchpad attachments: /var/log/cloud-init.log on a node that failed deployment

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-14T22:43:47.937184+00:00

I've attached the /var/log/cloud-init.log file from a node that failed deployment. (This is a different node from the one that generated the earlier logs.) The /etc/netplan directory is empty, and there is no /run/systemd directory on this node either.
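
One quick check on a log like this is to look for cloud-init's "Selected renderer" debug message, which records which network renderer was chosen. The sketch below assumes the message format quoted elsewhere in this thread; the function name `selected_renderers` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the debug message text, independent of the timestamp/module prefix.
RENDERER_RE = re.compile(r"Selected renderer '([^']+)' from priority list: (.*)$")


def selected_renderers(log_text: str) -> List[Tuple[str, str]]:
    """Return (renderer, priority_list) pairs found in cloud-init log text."""
    results = []
    for line in log_text.splitlines():
        match = RENDERER_RE.search(line)
        if match:
            results.append((match.group(1), match.group(2).strip()))
    return results


sample = "[DEBUG]: Selected renderer 'sysconfig' from priority list: None"
print(selected_renderers(sample))
```

On an Ubuntu 18.04 node one would expect `netplan` in the first position of the pair; any other renderer name is a hint that the selection went wrong.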

Launchpad user Ryan Harper(raharper) wrote on 2019-03-15T14:51:10.206832+00:00

2019-03-14 17:32:34,606 - __init__.py[DEBUG]: Selected renderer 'sysconfig' from priority list: None

This is a cloud-init bug. The sysconfig renderer has NetworkManager support, which triggered cloud-init to render sysconfig instead of netplan.
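
The failure mode can be illustrated with a simplified model of priority-based renderer selection. This is a sketch, not cloud-init's actual code: the function `select_renderer`, the `available` mapping, and the default order `eni, sysconfig, netplan` are assumptions made for illustration. The point is that the first renderer reporting itself as available wins, so a sysconfig renderer that counts NetworkManager as a usable backend shadows netplan when it is listed earlier.

```python
from typing import Callable, Dict, List, Optional

# Assumed default order for this sketch; used when no priority list is given
# (the "priority list: None" case in the log line above).
DEFAULT_PRIORITY = ["eni", "sysconfig", "netplan"]


def select_renderer(
    available: Dict[str, Callable[[], bool]],
    priority: Optional[List[str]] = None,
) -> str:
    """Return the first renderer in the priority list whose check passes."""
    if priority is None:
        priority = DEFAULT_PRIORITY
    for name in priority:
        check = available.get(name)
        if check is not None and check():
            return name
    raise LookupError(f"no available renderer in {priority!r}")


# Hypothetical Ubuntu node with network-manager installed: the sysconfig
# check passes because it detects NetworkManager.
node = {
    "eni": lambda: False,
    "sysconfig": lambda: True,
    "netplan": lambda: True,
}
print(select_renderer(node))                      # default order
print(select_renderer(node, ["eni", "netplan"]))  # explicit policy
```

With the default order the model picks `sysconfig`; with an explicit list that omits it, the model picks `netplan`.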

Launchpad user Ryan Harper(raharper) wrote on 2019-03-15T19:19:25.169491+00:00

You can work around this issue by including the following curtin config when deploying.

write_files:
  policy:
    path: /etc/cloud/cloud.cfg.d/01_network_renderer_policy.cfg
    content: |
      #cloud-config
      system_info:
        network:
          renderers: ['eni', 'netplan']
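
For testing outside a curtin deployment, the same drop-in file can be generated programmatically. The following is a minimal sketch: the helper names `policy_text` and `write_policy` are hypothetical, and the `root` argument is an assumption added so that the file is staged under a chosen directory (such as a mounted image) rather than written to the live /etc.

```python
from pathlib import Path
from typing import Sequence

# Path from the workaround above, relative to the target root.
POLICY_RELPATH = "etc/cloud/cloud.cfg.d/01_network_renderer_policy.cfg"


def policy_text(renderers: Sequence[str]) -> str:
    """Build the cloud-config text that restricts the renderer list."""
    quoted = ", ".join(f"'{name}'" for name in renderers)
    return (
        "#cloud-config\n"
        "system_info:\n"
        "  network:\n"
        f"    renderers: [{quoted}]\n"
    )


def write_policy(root: str, renderers: Sequence[str] = ("eni", "netplan")) -> Path:
    """Write the policy file under `root` and return its path."""
    target = Path(root) / POLICY_RELPATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(policy_text(renderers))
    return target
```

The renderer list defaults to the one given in the workaround; the order matters because it is a priority list.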

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-15T19:44:41.765139+00:00

Thanks for the quick fix, Ryan! I've confirmed that your curtin config workaround in comment #15 works. Do you have an estimate for how long it'll be before a fix goes live? (I ask so we can plan whether we should push your workaround through one of the certification packages.)

Launchpad user Ryan Harper(raharper) wrote on 2019-03-15T20:51:06+00:00

On Fri, Mar 15, 2019 at 2:50 PM Rod Smith <rod.smith@canonical.com> wrote:

> Thanks for the quick fix, Ryan! I've confirmed that your curtin config
> workaround in comment #15 works. Do you have an estimate for how long
> it'll be before a fix goes live? (I ask so we can plan whether we should
> push your workaround through one of the certification packages.)

Depends on where you need it. It can likely land upstream either today
or on Monday, and would be available via the cloud-init-dev daily PPA;
however, an SRU will take at least another week after next. We're almost
done with an existing cloud-init SRU, so we'd likely not start another SRU
until the current one is in -updates.

Launchpad user Amy Gou(goujm1) wrote on 2019-03-18T09:54:52.332948+00:00

Hi Jeff and all,

After the online upgrade, MAAS 0.4.0 is shown in the version table, but the log still shows 2.4.2. At the same time, the deploy fails again. Please double-check the log and let me know if you have any comments.

Best Regards,
Amy
Launchpad attachments: UbuntuUpdate0318.zip

Launchpad user Jeff Lane (bladernr) wrote on 2019-03-18T15:54:20.830384+00:00

Hi Amy,

First, which machine failed? I see a bunch of machines in the /var/log/maas/rsyslog/ directory, and I'm not sure exactly which one to look at.

Secondly, the version you posted in the screenshot looks correct. Can you show me the output of:

ls -l /etc/maas/preseed/curtin_userdata*

Launchpad user Jeff Lane (bladernr) wrote on 2019-03-18T15:58:34.386862+00:00

Amy: Also, could you send me a tarball containing /etc/maas/preseeds?

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-19T07:59:31.300448+00:00

Launchpad attachments: Screen shot of "ls -l /etc/maas/preseed/curtin_userdata*"

Launchpad user duanbenliang(duanbl1) wrote on 2019-03-19T08:00:41.799679+00:00

Launchpad attachments: Tarball of "/etc/maas"

Launchpad user Amy Gou(goujm1) wrote on 2019-03-19T11:20:07.034916+00:00

Hi Jeff,

The deployment that failed with the new MAAS 0.4.0 was on the SR590 Cascade Lake. The attachment above was collected from the environment with the SR590 Cascade Lake.
In addition, the same issue also occurs on the SR650 Cascade Lake.

Best Regards,
Amy

Launchpad user Rod Smith(rodsmith) wrote on 2019-03-19T23:01:17.784603+00:00

Amy, I think you're confusing the MAAS version (which is 2.4.2 on one of our installations) and the maas-cert-server package version (the latest of which is 0.4.0). The maas-cert-server 0.3.9 package includes a workaround (but NOT A FIX) for this bug, and 0.4.0 provides some unrelated improvements, so the installation SHOULD succeed after you've upgraded maas-cert-server to version 0.3.9 or 0.4.0. If it's still failing, then it could be you'll need to apply the workaround described by Ryan Harper in comment #15, which is different from the workaround in maas-cert-server 0.3.9 and 0.4.0. (Post back if you need help applying Ryan's workaround.) It could also be that you're looking at a completely different problem.

Launchpad user Amy Gou(goujm1) wrote on 2019-03-20T10:38:20.526162+00:00

Hi Rod,

Thanks for your update. We will use the workaround to run the current certification test on Purley Cascade Lake.
As for the deploy failure on MAAS 0.4.0, do you advise that we raise a separate defect to track it?

Best Regards,
Amy

Launchpad user Server Team CI bot(server-team-bot) wrote on 2019-04-22T22:46:43.736734+00:00

This bug is fixed with commit 5de83fc to cloud-init on branch master.
To view that commit see the following URL:
https://git.launchpad.net/cloud-init/commit/?id=5de83fc5

Launchpad user Chad Smith(chad.smith) wrote on 2019-05-10T18:08:35.881688+00:00

This bug is believed to be fixed in cloud-init in version 19.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Launchpad user Amy Gou(goujm1) wrote on 2019-05-13T09:27:29.235445+00:00

Sorry for the late reply. The issue does not occur with the current cloud-init version, 18.5-45-g3554ffe8-0ubuntu1~18.04.1. Please move on and close it. Thanks a lot.

Best Regards,
Amy

Launchpad user Jeff Lane (bladernr) wrote on 2019-05-13T15:28:26.883924+00:00

Hi Amy, it's likely that you're still using our patched tooling that includes a workaround. cloud-init 18.5 should not work.
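
A rough way to tell whether an installed cloud-init package is at or beyond the 19.1 release named in this thread is to compare the leading upstream version. This is a sketch under stated assumptions: the helper names are hypothetical, only the leading `major.minor` part of the package version is inspected, and epoch prefixes are not handled. A daily build versioned below 19.1 that already contains the fix commit would be reported as lacking it.

```python
import re
from typing import Tuple

# Release in which the fix is believed to be included, per this thread.
FIXED_IN = (19, 1)


def upstream_version(pkg_version: str) -> Tuple[int, int]:
    """Extract the leading major.minor numbers from a package version."""
    match = re.match(r"(\d+)\.(\d+)", pkg_version)
    if match is None:
        raise ValueError(f"unrecognized version: {pkg_version!r}")
    return int(match.group(1)), int(match.group(2))


def has_release_with_fix(pkg_version: str) -> bool:
    """True if the upstream version is at least FIXED_IN."""
    return upstream_version(pkg_version) >= FIXED_IN


print(has_release_with_fix("18.5-45-g3554ffe8-0ubuntu1~18.04.1"))
```

For the version Amy reported, the check returns `False`, which is consistent with the deployment succeeding only because of the workaround in the patched tooling.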

Launchpad user Jeff Lane (bladernr) wrote on 2019-06-19T15:54:38.308550+00:00

Just a heads up: the fix is now in -updates. I've tested this locally on a couple of deployments and it seems to resolve the issue we had before. I'm asking my team to verify on a couple more deployments for due diligence.

Launchpad user Rod Smith(rodsmith) wrote on 2019-06-19T19:10:25.401326+00:00

I've tested this on three nodes on two MAAS servers (my own home MAAS server and maastiff, our MAAS server in the certification lab), using both 18.04 and 19.04. It looks good to me.

Launchpad user Amy Gou(goujm1) wrote on 2019-06-20T10:16:23.729470+00:00

Thanks for your kind update. I will double-check with the latest one.

Launchpad user Dan Watkins(oddbloke) wrote on 2019-07-23T16:08:05.314117+00:00

Hi Amy et al,

I'm going to mark this Fix Released, as 19.1 has made its way in to Ubuntu. Please let us know if you don't think this is fixed!

Dan
