cloud-init not running on bare metal installed via coreos-install #1872

Closed
SpencerBrown opened this Issue Mar 17, 2017 · 6 comments

SpencerBrown commented Mar 17, 2017

Issue Report

Bug

Container Linux Version

1353

Environment

bare metal server

What hardware/cloud provider/hypervisor is being used to run Container Linux?

A plain Intel x86 server with 3 hard drives; I was installing to /dev/sdb. I have installed CoreOS on this machine before, but it's been many months.

Expected Behavior

coreos-install would install a system that boots and processes the cloud-init file

Actual Behavior

The system booted but did not process the cloud-config, so I could not log in.

Reproduction Steps

Burned the CoreOS ISO (1335) to a flash drive and booted from it.
Put cloud-config.yaml on another flash drive and mounted it at /media.

coreos-install -d /dev/sdb -C alpha -c /media/cloud-config.yaml

This ran successfully and installed CoreOS.

Now boot from the newly installed hard drive: it comes up without applying the cloud-config (which contained an SSH key), so I cannot log in.

The cloud-config file is present in /var/lib/coreos-install/user_data.

If I modify the system to add a grub.cfg with coreos-autologin so I can get to the console, and then manually run coreos-cloudinit --from-file=/var/lib/coreos-install/user_data, it processes my directives correctly.
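
For context, a rough sketch of that manual workaround as run from the live USB environment; the OEM partition number (6) and the coreos.autologin kernel parameter are assumptions based on a default Container Linux disk layout, not details stated above:

# Mount the installed OEM partition and append the autologin kernel
# parameter via its grub.cfg (OEM is assumed to be partition 6)
mkdir -p /mnt/oem
mount /dev/sdb6 /mnt/oem
echo 'set linux_append="coreos.autologin=tty1"' >> /mnt/oem/grub.cfg
umount /mnt/oem

# After rebooting into the installed system and landing on the console,
# run cloud-init by hand against the user data saved by coreos-install
coreos-cloudinit --from-file=/var/lib/coreos-install/user_data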

Other Information

There are no coreos-cloudinit entries in the systemd journal.
There is no evidence of any config or cloudinit activity in the systemd unit listing (see the sketch after the cloud-config below).
Here is my somewhat redacted cloud-config.yaml:

#cloud-config

ssh_authorized_keys:
  - "ssh-rsa AAAAB3NzaC1yc2EAAAA...redacted"

hostname: server-d

users:
  - name: core
    passwd: "(redacted)"
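
As referenced above, a hedged sketch of how one might check for cloud-init activity on the installed system; the coreos-cloudinit syslog identifier and the unit name patterns are assumptions, not something confirmed in this report:

# Look for any cloud-init output in the journal for the current boot
journalctl -t coreos-cloudinit
journalctl -b | grep -i cloudinit

# See whether the config targets or cloudinit units were ever loaded
systemctl list-units --all '*cloudinit*' '*config.target'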


Member

dm0- commented Mar 17, 2017

I've created coreos/coreos-overlay#2484, which I believe will fix this. As a workaround, can you try running the following and rebooting?

mkdir -p /etc/systemd/system/multi-user.target.wants
ln -s /usr/lib/systemd/system/{user,system}-config.target /etc/systemd/system/multi-user.target.wants/

You should also be able to mount the target disk and do the equivalent after the coreos-install command, if that is easier.
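
For the mount-the-target-disk variant, a minimal sketch, assuming the installed root filesystem is partition 9 (ROOT) of the target disk, as in the default layout:

# Mount the freshly installed root filesystem and create the same
# symlinks offline; the link targets resolve once the system boots
mkdir -p /mnt/root
mount /dev/sdb9 /mnt/root
mkdir -p /mnt/root/etc/systemd/system/multi-user.target.wants
ln -s /usr/lib/systemd/system/user-config.target /mnt/root/etc/systemd/system/multi-user.target.wants/
ln -s /usr/lib/systemd/system/system-config.target /mnt/root/etc/systemd/system/multi-user.target.wants/
umount /mnt/root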

SpencerBrown commented Mar 17, 2017

Thanks @dm0-, the workaround indeed worked, and thanks for the fast turnaround.

llamahunter commented Mar 17, 2017

Note that we hit this too when our working bare-metal install auto-upgraded to 1353.0.0.

larryr commented Mar 18, 2017

I have several nodes that have already downloaded 1353.0.0 but are in a waiting-to-reboot state, so if I run update_engine_client -check_for_update it skips the check because the node is already waiting on a pending update.
How do I force them to bypass 1353.0.0 and go straight to 1353.1.0? I know that after the 1353.0.0 update my cloud-init is not read, so the node does not join my etcd cluster, gets the wrong IP address, and fails to execute a few other key tasks (like joining my Mesos cluster). The nodes do have internet access, so it seems they eventually start downloading 1353.1.0, but I have to manually reboot them one at a time.

Member

crawford commented Mar 18, 2017

@larryr you can use update_engine_client -reset_status. Then if you check for updates, it will download the latest.
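
A sketch of that sequence on a node stuck waiting to reboot; -reset_status and -check_for_update are the flags mentioned in this thread, and -status is only used here to watch progress:

# Discard the already-downloaded payload and clear the pending-update state
update_engine_client -reset_status

# Trigger a fresh check; the engine should now fetch the latest release
update_engine_client -check_for_update

# Optionally watch progress, then reboot once the new image is applied
update_engine_client -status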

larryr commented Mar 18, 2017

Ah, thanks. Other than one node that I needed to get back online right away (since it ran a Mesos master), I've been letting the others just sort themselves out. If they booted into 1353.0.0 they would come up outside the cluster but still with internet access, and then eventually upgrade. I'm waiting for 2 of 12 nodes to finish upgrading; only 1 stays out of the cluster at a time.
Thanks.
