Live PXE: Ignition failure: failed to set SSH key: permission denied #496
Comments
|
For what it is worth, version 31.20200517.2.0 in the testing stream is working for bare-metal OKD4 builds with iPXE. I don't know if that helps you narrow down the issue, but there it is. :-) |
|
Nice investigation! Offhand, I don't understand why that commit could cause this. Something like a race in when we mount the rootfs? Clearly though it'd help to refactor |
Nothing jumped out at me either. My late night self couldn't quite grok the change in that diff that would cause this.
Yeah seems like maybe a timing issue. At least it is consistent so investigating this was straightforward.
+1000 for CI coverage so we don't regress here in the future. |
In my testing that version is affected (fails the test). Are you providing an Ignition config when you PXE boot, or are you just doing an install using the kargs (`coreos.inst.install_dev`, etc.)? The former should fail (at least if you set an SSH key), while the latter should work, I think. |
|
Yes, the iPXE boot is passing an ignition config for one of bootstrap, master, or worker. I'm using kargs to set IP info and hostname, but everything else is in the ignition configs created by |
|
One can also reproduce this on the live ISO with an embedded Ignition config that tries to add an SSH key. |
Wanted to sanity-check this while debugging coreos/fedora-coreos-tracker#496.
Super subtle but evil regression from coreos#1423. By switching to `find` to feed filenames to `cpio`, we were passing it `.`. And `cpio` dutifully made note of the permissions of `.`, which at the time it is run are those of the temporary directory we allocated, which is naturally 0700. This in turn breaks Ignition's `files` stage, which purposely runs things like adding SSH keys as the target user, and thus it can't even access the directories in `/` when doing the equivalent of `mkdir -p /sysroot/var/home/core/.ssh/...`. Instead, use `-mindepth 1` here so we always skip the root of the temporary directory itself. This resolves coreos/fedora-coreos-tracker#496 but not marking as `Closes:` because we should really add a basic live PXE + SSH key test.
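To make the mechanism in the commit message above concrete, here is a minimal sketch. It is not the coreos-assembler code: the staging directory and file names are hypothetical, and it only shows the two facts the regression depends on, namely that `mktemp -d` creates a 0700 directory and that `find .` lists `.` itself unless `-mindepth 1` is given. Anything that archives the first list (such as `cpio`) would record the 0700 root along with the real files.

```shell
#!/bin/sh
set -eu

# Hypothetical staging tree standing in for the initramfs contents;
# the real build script and its paths are not shown in this thread.
tmpdir=$(mktemp -d)    # mktemp -d creates the directory with mode 0700
mkdir -p "$tmpdir/etc"
: > "$tmpdir/etc/example.conf"

# Permission bits of the staging root, expected to be "drwx------".
root_perms=$(ls -ld "$tmpdir" | cut -c1-10)

# Without -mindepth, the list includes ".", so an archiver reading it
# would also record the staging root and its 0700 mode.
with_root=$(cd "$tmpdir" && find . | LC_ALL=C sort)

# With -mindepth 1, the starting point itself is skipped.
without_root=$(cd "$tmpdir" && find . -mindepth 1 | LC_ALL=C sort)

echo "staging root perms: $root_perms"
echo "find .             starts with: $(printf '%s\n' "$with_root" | head -n 1)"
echo "find . -mindepth 1 starts with: $(printf '%s\n' "$without_root" | head -n 1)"

rm -rf "$tmpdir"
```

When the 0700 root entry ends up in the archive, a non-root user cannot traverse `/` after unpacking, which matches the `permission denied` that Ignition reports when it adds the SSH key as the target user.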
|
This took me way too long to hunt down. I was looking at squashfs'es, splitting initrds, and doing all sorts of weird diffs. The actual issue turned out to be quite simple: coreos/coreos-assembler#1487 |
Wow, what a needle in a haystack. At least the haystack was much smaller due to our copious builds and rich history information that allowed us to narrow it down to a single PR. The merged fix does seem to work for me in a local build:
|
|
Excellent! Let us know when this hits the |
|
Looking forward to seeing it in Stable soon. Today I wanted to set up a productive system with FCOS and unfortunately stumbled upon this error, when booting from a modified installer with embedded ignition config, containing an SSH key. Push it guys! :) |
|
hey @ratzrattillo - if you're just using the media to do an install (i.e., you aren't running workloads from a live environment) then you can use the previous set of media from the
|
|
@dustymabe Thank you, that helped. Now my systems are up and running :) |
|
This is fixed now in the latest releases, and we have a test to watch for regressions. |
|
This was fixed in coreos/coreos-assembler#1487.
The fix for this went into testing stream release
The fix for this went into stable stream release |
coreos#1509 (comment) This change will fix the CI test for PXE by adding an SSH key to the pxe-live.ign config. For more information: coreos/fedora-coreos-tracker#515 coreos/fedora-coreos-tracker#496
This was originally reported in the discussion forum.

Boots of the Live environment via PXE fail because Ignition hits an error: `failed to set SSH key: permission denied`.

This exists in the latest `stable` release `31.20200505.3.0`. It does not exist in the prior `stable` release `31.20200420.3.0`.

I did a bisect of our `testing-devel` stream to try to narrow down when the problem was introduced. Here is what I found:

- `31.20200521.20.0` - failed
- `31.20200511.20.0` - failed
- `31.20200507.20.0` - failed
- `31.20200506.20.1` - passed
- `31.20200506.20.0` - passed
- `31.20200505.20.2` - passed
- `31.20200427.20.1` - passed

So focusing on the `31.20200506.20.1` -> `31.20200507.20.0` transition, we have:

One might think the Ignition upgrade would be a smoking gun, but the latest stable release doesn't yet have the new Ignition and it still fails. So we need to dig deeper and look at the COSA and FCOS configs repos:

- `31.20200507.20.0` - failed
- `31.20200506.20.1` - passed

The commits for that range are:

- `7b1434d2ace9d680016562d4f27de15c37f52fa9^..5af3d8e8cf0dcf892c7e5616c6487fec7a595d9b`
- `5a07e8a5aef1bdca2272e22cbd9aaed142819f8b^..44bdfbbccb0c7fa73fe10d4bba2a41edeb802b8d`

Out of these, `installer: Refactor initramfs code` (from coreos/coreos-assembler#1423) seems the most likely suspect. I confirmed that if I reset `src/cmd-buildextend-installer` to before that change (`git checkout 6724a4c^ src/cmd-buildextend-installer`) and do a local build on top of the testing-devel branch, then my system boots normally.
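The bisect over `testing-devel` builds described above can be sketched as a binary search. This is only an illustration: `boots_ok` is a hypothetical stand-in that replays the pass/fail results recorded in this issue, whereas a real run would PXE boot each build with an Ignition config that sets an SSH key.

```shell
#!/bin/sh
set -eu

# testing-devel builds from the issue, oldest first.
builds='31.20200427.20.1
31.20200505.20.2
31.20200506.20.0
31.20200506.20.1
31.20200507.20.0
31.20200511.20.0
31.20200521.20.0'

# Hypothetical test: replays the results recorded in the issue instead of
# actually booting anything.
boots_ok() {
    case "$1" in
        31.20200427.20.1|31.20200505.20.2|31.20200506.20.0|31.20200506.20.1)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Return the build on line $1 of the list.
nth_build() {
    printf '%s\n' "$builds" | sed -n "${1}p"
}

lo=1                                              # oldest build, known good
hi=$(( $(printf '%s\n' "$builds" | wc -l) ))      # newest build, known bad

# Halve the range until the good and bad builds are adjacent.
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    if boots_ok "$(nth_build "$mid")"; then
        lo=$mid
    else
        hi=$mid
    fi
done

last_good=$(nth_build "$lo")
first_bad=$(nth_build "$hi")
echo "last good: $last_good"
echo "first bad: $first_bad"
```

With the results recorded here, the search narrows to the same `31.20200506.20.1` -> `31.20200507.20.0` boundary that the issue focuses on.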