behavior if no Ignition is provided #279

Closed
cgwalters opened this issue Sep 20, 2019 · 21 comments
Labels
jira for syncing to jira

Comments

@cgwalters
Member

Today, we happily boot and sit there if the user doesn't provide Ignition. I didn't read the docs and assumed I could paste Ignition into the GCP "Startup script" (since that's how AWS works).

Also, tangentially related: afterburn seems to choke on ed25519 keys; I need to debug that.

Certainly if afterburn doesn't handle SSH keys from the provider (e.g. bare metal) there's no reason to boot and sit there inaccessible, doing nothing - right?

@jlebon
Member

jlebon commented Sep 20, 2019

Today, we happily boot and sit there if the user doesn't provide Ignition.

Hmm, interesting. Are you able to access the logs? It might be that Ignition errored out but OnFailure=emergency.target isn't kicking in correctly.

@ajeddeloh
Contributor

Ignition uses the GCE "user data" not the startup script iirc. See: coreos/bugs#2558 for a CL analogy.

CL will boot with no ignition config and just add SSH keys from the cloud (via afterburn), which I presume is what we want for FCOS?

@bgilbert
Contributor

Certainly if afterburn doesn't handle SSH keys from the provider (e.g. bare metal) there's no reason to boot and sit there inaccessible, doing nothing - right?

Afterburn generally does handle provider SSH keys. Also, live ISO counts as bare metal, and we autologin there if we don't get an Ignition config. We could define a set of cases where it'd be reasonable to fail the boot without an Ignition config, but... does it matter? Either way you get a useless machine.

@cgwalters
Member Author

Either way you get a useless machine.

The thing I got very frustrated about with GCP is it was not very obvious that my ssh key wasn't added. So a fix here could just be making clear in the serial console log if:

  1. We didn't fetch Ignition
  2. No ssh keys were added by afterburn
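
The two conditions listed above can be sketched as a small decision function. This is only an illustration: the function name and the warning wording are hypothetical, and how the two booleans are actually determined (for example from Ignition or afterburn state) is left open here.

```python
from typing import List


def console_warnings(ignition_fetched: bool, ssh_keys_added: bool) -> List[str]:
    """Return the warning lines to show on the console for this boot.

    Both inputs are assumed to be computed elsewhere; this sketch only
    covers turning them into human-readable messages.
    """
    warnings = []
    if not ignition_fetched:
        # Condition 1 from the list above: no Ignition config was fetched.
        warnings.append("WARNING: no Ignition config was provided for this boot")
    if not ssh_keys_added:
        # Condition 2 from the list above: afterburn added no SSH keys.
        warnings.append("WARNING: no SSH authorized keys were added by afterburn")
    return warnings
```

An empty list means there is nothing to report, so a caller could skip writing anything to the console in that case.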

@dustymabe
Member

Maybe we can build on the work robert did. Here is what we see now on the serial console:

Fedora 30.20190918.dev.3 (CoreOS preview)
Kernel 5.2.14-200.fc30.x86_64 on an x86_64 (ttyS0)

SSH host key: SHA256:OuECn8bBnGxibsCsfN0nR+YrDm+rCRvnCmGe8Edz6Kg (RSA)
SSH host key: SHA256:m8P9DoCKuhKQ8E7loJwF1haGRVIDDRlbaZ1+TcoNQbQ (ED25519)
SSH host key: SHA256:l6ucbtJCX/0SwKzmBzzaDY2qVATKIKk3XnUy1obfPVY (ECDSA)
eth0: 192.168.122.12 fe80::5054:ff:fec0:7829
localhost login:

We could possibly add some info there by adding a new file into /run/console-login-helper-messages/issue.d
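
As a rough sketch of what dropping such a file could look like, the Python below writes a snippet into the directory mentioned above. The file name `30_example_warnings.issue` and the function name are hypothetical, chosen only for illustration; the write goes through a temporary file and a rename so that a reader of the directory should never see a half-written snippet.

```python
import os
import tempfile
from pathlib import Path

# Directory taken from the comment above; everything else here is an assumption.
ISSUE_D = Path("/run/console-login-helper-messages/issue.d")


def write_issue_snippet(lines, name="30_example_warnings.issue", issue_d=ISSUE_D):
    """Write `lines` as a snippet file inside `issue_d` and return its path."""
    issue_d = Path(issue_d)
    issue_d.mkdir(parents=True, exist_ok=True)
    target = issue_d / name
    # Create the temp file in the same directory so os.replace() is a rename
    # on the same filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(issue_d), prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    # mkstemp creates the file with mode 0600; make it world-readable.
    os.chmod(tmp, 0o644)
    os.replace(tmp, target)
    return target
```

Writing to the real `/run` path needs root, so the `issue_d` parameter is there mainly to make the function easy to try against a scratch directory.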

@dustymabe dustymabe added the meeting topics for meetings label Dec 11, 2019
@ajeddeloh
Contributor

I'm in favor of the issue.d method. Ignition can't know if afterburn won't find keys, so it can't fail the boot. Afterburn runs too late to fail the boot in a non-racy way.

@cgwalters
Member Author

Yep, I have no issue with that approach.

@dustymabe
Member

We discussed this in the meeting yesterday. There was mostly consensus that the issue.d approach would be a good path forward. There was a lot of discussion about how we detect whether Ignition ran or not. I believe an RFE for Ignition came out of that, which @jlebon was going to open.

@jlebon
Member

jlebon commented Dec 13, 2019

I believe there was an RFE for ignition that came out of that that @jlebon was going to open.

coreos/ignition#903

@lucab lucab removed the meeting topics for meetings label Dec 17, 2019
@dustymabe dustymabe added the jira for syncing to jira label Mar 23, 2020
@lucab
Contributor

lucab commented Apr 2, 2020

@jlebon your coreos/ignition#903 goes in the direction of structured journal entries. I think it's a good API to have.
Do we still want to have the issue.d/ bits on top? If so, where would that logic live?
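
To illustrate how a consumer might use structured journal entries, here is a minimal sketch that scans `journalctl -o json` style output (one JSON object per line) for an entry carrying a given `MESSAGE_ID`. The identifier below is a placeholder, not the one Ignition or afterburn actually uses, and the function name is hypothetical.

```python
import json
from typing import Iterable

# Placeholder value for illustration only; the real identifier is defined by
# whichever project emits the journal entry.
EXAMPLE_MESSAGE_ID = "00000000000000000000000000000000"


def has_journal_entry(json_lines: Iterable[str], message_id: str) -> bool:
    """Return True if any JSON journal line has the given MESSAGE_ID."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        entry = json.loads(line)
        if entry.get("MESSAGE_ID") == message_id:
            return True
    return False
```

The input could come from running `journalctl -o json MESSAGE_ID=<id>` in a subprocess and passing its output lines to this function; taking an iterable of strings keeps the parsing logic separate from how the journal is read.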

@dustymabe
Member

Do we still want to have the issue.d/ bits on top? If so, where would that logic live?

I think the answer is yes. I'm working with Sohan on this part of the puzzle, as well as on issue.d bits that will let users know if SSH auth keys exist. That is really the information that would have helped Colin understand there was a problem (see the original description of this issue). Where will the script that generates the issue.d snippets live? I think probably either a systemd service/script in FCOS configs OR ignition-dracut. WDYT?

@jlebon
Member

jlebon commented Apr 2, 2020

Yeah, agreed we still want a banner notification for both Ignition config provided and SSH keys found. IMO that glue should live in the FCOS configs.

@lucab
Contributor

lucab commented Apr 22, 2020

I've some doubts on how coreos/afterburn#397 integrates here.

afterburn-sshkeys@core.service may run quite late in the boot process, so I fear that the serial console banner may be printed before keys are fetched.

Is the plan to delay the serial console, or to accommodate false negatives by not printing anything in the "no keys" case?

@cgwalters
Member Author

afterburn-sshkeys@core.service may run quite late in the boot process, so I fear that the serial console banner may be printed before keys are fetched.

Probably...if it doesn't already do this today, the best fix would be to extend CLHM to re-render if something changes.

@dustymabe
Member

I've some doubts on how coreos/afterburn#397 integrates here.

afterburn-sshkeys@core.service may run quite late in the boot process, so I fear that the serial console banner may be printed before keys are fetched.

Is the plan to delay the serial console, or to accommodate false negatives by not printing anything in the "no keys" case?

How bad is it to delay the console login? I think if it only delays it by tens of seconds then we should just pop Before=systemd-user-sessions.service (I think that will work) on it and call it a day.

Probably...if it doesn't already do this today, the best fix would be to extend CLHM to re-render if something changes.

That would work for motd, but I don't think the issue gets regenerated on the live console if the source files get updated does it?

@cgwalters
Member Author

I think if it only delays it by tens of seconds then we should just pop Before=systemd-user-sessions.service (I think that will work) on it and call it a day.

s/tens/tenths/? Yeah, I agree.

That would work for motd, but I don't think the issue gets regenerated on the live console if the source files get updated does it?

Well, with a serial console we can only append. If we have an actual video card we can redraw the screen.

@rfairley

Sorry to have missed discussion here before.

afterburn-sshkeys@core.service may run quite late in the boot process, so I fear that the serial console banner may be printed before keys are fetched.

Probably...if it doesn't already do this today, the best fix would be to extend CLHM to re-render if something changes.

Yes, currently CLHM issuegen will only re-generate when it is started on boot, before systemd-user-sessions.service. If services that generate info during boot do Before=console-login-helper-messages-issuegen.service, snippets dropped in by the service into /run/console-login-helper-messages/issue.d should be displayed reliably.

It'll be more reliable, though, to rely on console-login-helper-messages-issuegen.path once the .path unit is added (coreos/console-login-helper-messages#47); it will cause CLHM issuegen to regenerate whenever something is dropped into the CLHM issue.d directory. This will also allow writing snippets to the CLHM issue.d directory at any time after boot, with the generated issue message being refreshed.
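
The regenerate-on-change idea can be sketched roughly as follows. This is not CLHM's implementation: the `.issue` suffix, the function names, and the output path handling are assumptions for illustration, and the directory watching itself (which the .path unit would provide) is left out.

```python
from pathlib import Path


def render_issue(issue_d, suffix=".issue"):
    """Concatenate all snippet files in `issue_d`, in sorted name order."""
    snippets = sorted(Path(issue_d).glob(f"*{suffix}"))
    return "".join(p.read_text() for p in snippets)


def regenerate(issue_d, out_file):
    """Rewrite `out_file` from the snippets; return True if it changed."""
    text = render_issue(issue_d)
    out = Path(out_file)
    if out.exists() and out.read_text() == text:
        return False  # nothing new was dropped in, leave the file alone
    out.write_text(text)
    return True
```

Calling `regenerate()` each time the directory changes would refresh the generated file only when a snippet was added, removed, or edited.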

util-linux/util-linux#1041 was implemented; beginning with util-linux v2.36 we'll be able to configure agetty so that /run/issue.d is searched when the login console is opened, so there is no need to rely on c-l-h-m regeneration. But for now, dropping snippets during boot into /run/console-login-helper-messages/issue.d and having Before=console-login-helper-messages-issuegen.service would work if the information is sourced quickly (before issuegen finishes).

That would work for motd, but I don't think the issue gets regenerated on the live console if the source files get updated does it?

Well, with a serial console we can only append. If we have an actual video card we can redraw the screen.

I hadn't thought about refreshing the console after already opening it - maybe CLHM should be calling agetty --reload when regenerating the issue file?

dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue May 22, 2020
…rized keys

This PR addresses the concern raised in coreos/fedora-coreos-tracker#279
which talks about the system's behavior when no Ignition is provided. Currently, we're tracking ignitionConfig
messages (coreos/fedora-coreos-tracker#279) and ssh-authorized keys info
(coreos/afterburn#397) by sending structured entries to the journald log. Here,
the systemd units are written to scrape through that information to display meaningful data to users.
@dustymabe
Member

The solution to this problem was implemented via changes to various upstream projects; the resulting information will now be shown to the user because of the work @sohankunkerkar did in coreos/fedora-coreos-config#344.

@dustymabe
Member

The fix for this landed upstream. It is now pending a testing stream release.

@dustymabe dustymabe added the status/pending-testing-release Fixed upstream. Waiting on a testing release. label May 22, 2020
@dustymabe
Member

The fix for this went into testing stream release 32.20200601.2.1. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Jun 4, 2020
@dustymabe
Member

The fix for this went into stable stream release 32.20200601.3.0.

8 participants