Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

providers/qemu: support Ignition block device on s390x and ppc64le #905

Merged
merged 1 commit into from
Mar 16, 2020

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Dec 18, 2019

Add experimental support for fetching Ignition configs via a virtio
block device with serial/ID ignition.

The main advantage of this is that it is cross-platform. But for now, we
only use it on platforms which don't support the QEMU firmware config
approach, which are (of those we care about) s390x and ppc64le. We may
end up using it across the remaining platforms.

See related discussions in:
#928

@jlebon
Copy link
Member Author

jlebon commented Dec 18, 2019

See coreos/coreos-assembler#1004 (comment)
Not sure if we actually need to get this in, or if we can just wait until #825 is fixed.

@Prashanth684
Copy link
Contributor

Prashanth684 commented Dec 18, 2019

we need the same mechanism for ppc too since ppc doesn't support fw_cfg ..should the qemu provider just support fetching config from the config-drive for non x86 arches ?

Of course for ppc , we still need the cluster-api-provider libvirt to support config drive just like for s390x so we can make use of it - but i am bringing this up for the sake of a long term discussion

@ajeddeloh
Copy link
Contributor

If we introduce a hack, we need an exit strategy too. What's the plan for when we want to remove this?

@jaypoulz
Copy link

@dbenoit17 we need to re-initiate this discussion so that we can get this to work for s390x

@Prashanth684
Copy link
Contributor

@jlebon , i don't think #825 is going to be fixed anytime soon as nobody seems to be actively looking at it. Could we get this PR in and while we're at it also add ppc64le to it.Thanks.

@jlebon
Copy link
Member Author

jlebon commented Feb 26, 2020

@Prashanth684 Were you able to test this?

Once this goes in, it's going to be harder to move to something cleaner. Just quickly, has the alternative @lucab mentioned here been investigated? #825 (comment).

@Prashanth684
Copy link
Contributor

Prashanth684 commented Feb 26, 2020

@Prashanth684 Were you able to test this?

Once this goes in, it's going to be harder to move to something cleaner. Just quickly, has the alternative @lucab mentioned here been investigated? #825 (comment).

@jlebon No I have not tested this fix yet, we have just been using the openstack image itself to run the CI. I will build try to test this fix and let you know.

As for #825 , we need some kvm experts from IBM to look into this and give their feedback. I will take this as an action item to talk to some of their kvm experts and get some suggestions from them, but I'm guessing it won't be done anytime soon as #825 has been open for a while now. We need this change for CI on Openshift 4.3 for s390x and that means this fix is our only viable option for now.

@Prashanth684
Copy link
Contributor

@Prashanth684 Were you able to test this?
Once this goes in, it's going to be harder to move to something cleaner. Just quickly, has the alternative @lucab mentioned here been investigated? #825 (comment).

@jlebon No I have not tested this fix yet, we have just been using the openstack image itself to run the CI. I will build try to test this fix and let you know.

As for #825 , we need some kvm experts from IBM to look into this and give their feedback. I will take this as an action item to talk to some of their kvm experts and get some suggestions from them, but I'm guessing it won't be done anytime soon as #825 has been open for a while now. We need this change for CI on Openshift 4.3 for s390x and that means this fix is our only viable option for now.

Just tested this on s390x and it works. @jlebon another alternative to getting the fix upstream is to just patch the downstream Ignition rpm spec for RHCOS since this is specifically needed only in the RHCOS image used by Openshift CI s390x pipeline. This patch would have to carried-forward downstream till #825 is addressed.

internal/platform/platform.go Outdated Show resolved Hide resolved
@lucab
Copy link
Contributor

lucab commented Feb 28, 2020

I don't honestly see a feasible plan (actions+timeline) to ever get out of this terrible hack without breaking the whole world; does that plan exist somewhere?

Additionally, there has been literally no effort in investigating how to do this in a non-hackish way with platform-specific capabilities of qemu-system-s390x (I guess because, to the best of my knowledge, #825 hasn't been properly prioritized & planned on the roadmap of openshift-s390x development team).

@jlebon
Copy link
Member Author

jlebon commented Feb 28, 2020

We're discussing changing the semantics for QEMU here: #928

@Prashanth684
Copy link
Contributor

I don't honestly see a feasible plan (actions+timeline) to ever get out of this terrible hack without breaking the whole world; does that plan exist somewhere?

Additionally, there has been literally no effort in investigating how to do this in a non-hackish way with platform-specific capabilities of qemu-system-s390x (I guess because, to the best of my knowledge, #825 hasn't been properly prioritized & planned on the roadmap of openshift-s390x development team).

I think the problem was when #825 was opened, it was not opened with the motive to find a solution in qemu itself but rather to work around it. The IBM team went with the config drive solution for the Openshift installer, the problem being that with the config drive approach, it would only work with Openstack platform. So for most of their and our CI testing, we were using the Openstack image. But now with the automated CI, we only have the option of using the qemu image (since this was turned down: openshift/installer#2092). We are also discussing an alternative to somehow use the Openstack image for CI instead of the qemu image which would be a better way to do it.

@crawford
Copy link
Contributor

crawford commented Mar 5, 2020

Can we move forward with this and just put a big "experimental" label on it? Within OpenShift, our customers don't actually make use of this mechanism anyway (we recommend using coreos.inst.ignition_url). We only use this in CI, where we don't need to worry about backward compatibility as much.

@jlebon
Copy link
Member Author

jlebon commented Mar 5, 2020

Can we move forward with this and just put a big "experimental" label on it?

Is there a way to do this at the technical level? Or is it more of a policy thing? I guess for now we can just print scary warnings on the console or something.

@crawford
Copy link
Contributor

crawford commented Mar 5, 2020

I think it would be a combination of policy and a s̰̮̉ͮ̇c̞̠̮̤͍͔̊̚ͅa͈̟̘̬̲̟̅̾͋̆r̳̘͎̭͚̦̐̉̀͑͐̋̓y̆͋̓ͫ ̣̺̤͈̪͛̊͂ͩ̃̉̚w̞͍̣̍̓̓̌a͉̗͇̻͈͉̻r̜̜͖͎͖̜͛̈́ͅniͣn͖̼̹̓̓̓͋̐gͪͧ͊͗ͮͪ̂. We could bother with creating a feature flag mechanism eventually, but that seems overkill for this.

@jlebon jlebon changed the title platform: add hack to use openstack fetcher for s390x qemu providers/qemu: support Ignition block device on s390x and ppc64le Mar 6, 2020
@jlebon
Copy link
Member Author

jlebon commented Mar 6, 2020

#928 (comment)

So here's a suggestion: we're currently in dire need of getting something to work for better OCP CI coverage on s390x. It doesn't have to be stabilized and we don't expect production use of whatever mechanism we choose.

Thus, I think this is a good opportunity to try out this approach to see how it looks and get feedback from it. I've updated #905 for this. I've tested it successfully on amd64 (by tweaking the conditional build flags).

@Prashanth684 Can you test this on s390x and ppc64le?

@Prashanth684
Copy link
Contributor

Prashanth684 commented Mar 6, 2020

#928 (comment)

So here's a suggestion: we're currently in dire need of getting something to work for better OCP CI coverage on s390x. It doesn't have to be stabilized and we don't expect production use of whatever mechanism we choose.
Thus, I think this is a good opportunity to try out this approach to see how it looks and get feedback from it. I've updated #905 for this. I've tested it successfully on amd64 (by tweaking the conditional build flags).

@Prashanth684 Can you test this on s390x and ppc64le?

Thanks @jlebon . Will do. @crawford going ahead with this option would require changes to the terraform-libvirt-provider which would then have to be bumped in the Openshift installer FYI. So we have to be prepared for that. And the cluster-api-provider-libvirt too.

@jlebon jlebon requested a review from lucab March 6, 2020 16:34
@Prashanth684
Copy link
Contributor

#928 (comment)

So here's a suggestion: we're currently in dire need of getting something to work for better OCP CI coverage on s390x. It doesn't have to be stabilized and we don't expect production use of whatever mechanism we choose.
Thus, I think this is a good opportunity to try out this approach to see how it looks and get feedback from it. I've updated #905 for this. I've tested it successfully on amd64 (by tweaking the conditional build flags).

@Prashanth684 Can you test this on s390x and ppc64le?

works on s390x. i was able to bring up an rhcos VM up with terraform. will also test ppc64le

@Prashanth684
Copy link
Contributor

#928 (comment)

So here's a suggestion: we're currently in dire need of getting something to work for better OCP CI coverage on s390x. It doesn't have to be stabilized and we don't expect production use of whatever mechanism we choose.
Thus, I think this is a good opportunity to try out this approach to see how it looks and get feedback from it. I've updated #905 for this. I've tested it successfully on amd64 (by tweaking the conditional build flags).

@Prashanth684 Can you test this on s390x and ppc64le?

works on s390x. i was able to bring up an rhcos VM up with terraform. will also test ppc64le

works on ppc64le as well.

internal/providers/qemu/qemu_blockdev.go Outdated Show resolved Hide resolved
internal/providers/qemu/qemu_blockdev.go Outdated Show resolved Hide resolved
internal/providers/qemu/qemu_blockdev.go Outdated Show resolved Hide resolved
internal/providers/qemu/qemu_fwcfg.go Outdated Show resolved Hide resolved
internal/providers/qemu/qemu_blockdev.go Outdated Show resolved Hide resolved
internal/providers/qemu/qemu_blockdev.go Outdated Show resolved Hide resolved
@crawford
Copy link
Contributor

@crawford going ahead with this option would require changes to the terraform-libvirt-provider which would then have to be bumped in the Openshift installer FYI.

That's okay. We had to do that in the 4.2 branch as well. Fortunately, libvirt is not a supported platform, so there is relatively little risk.

@jlebon
Copy link
Member Author

jlebon commented Mar 10, 2020

Had a chat with @lucab and we've agreed that re-using the same context.Context pattern from the other providers is not very appealing. Went with a simpler approach now that just uses a vanilla chan + select.

@jlebon jlebon requested a review from lucab March 11, 2020 17:38
Add experimental support for fetching Ignition configs via a virtio
block device with serial `ignition`.

The main advantage of this is that it is cross-platform. But for now, we
only use it on platforms which don't support the QEMU firmware config
approach, which are (of those we care about) s390x and ppc64le. We may
end up using it across the remaining platforms.

See related discussions in:
coreos#928
Copy link
Contributor

@lucab lucab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM.

@jlebon
Copy link
Member Author

jlebon commented Mar 12, 2020

@bgilbert Make sense? Or did you want to keep exploring options in #928?

@jlebon jlebon requested a review from bgilbert March 12, 2020 14:10
@Prashanth684
Copy link
Contributor

Prashanth684 commented Mar 16, 2020

@jlebon we can continue exploring better options in #928 if needed, but do you think it makes sense to get this in now so we have a solution that unblocks us?

@jlebon
Copy link
Member Author

jlebon commented Mar 16, 2020

@jlebon we can continue exploring better options in #928 if needed, but do you think it makes sense to get this in now so we have a solution that unblocks us?

Yes, agreed. Let's get this in, given that it's experimental and only applies to previously unsupported architectures.

@jlebon jlebon merged commit a25df69 into coreos:master Mar 16, 2020
cgwalters referenced this pull request in coreos/coreos-assembler Apr 20, 2020
This finally unifies the advantages of `cosa run` and `kola spawn`.
I kept getting annoyed by how serial console sizing is broken
(e.g. trying to use `less` etc.).  Using `ssh` via `kola spawn`
addresses that, but it means you can't debug the initramfs.

Now things work in an IMO pretty cool way; if you do e.g.
`cosa run --kargs ignition.config.url=blah://` (or inject a bad
Ignition config) to cause a failure in the initramfs,
you'll see a nice error (building on
coreos/ignition-dracut#146 ) telling you
to rerun with `cosa run --devshell-console`.

Things are also wired up cleanly so that we support rebooting
with the equivalent of `kola spawn --reconnect` (which we should
probably remove now).  You can exit via *either* quitting SSH
cleanly or using `poweroff`, and the lifecycle of ssh and qemu
is wired together.

And finally, if we detect a cosa workdir we also bind it in by
default.

More to come here, such as auto-injecting debugging
tools and containers.
@cgwalters
Copy link
Member

What's the qemu incantation to use this? The "convenience" -drive method seems to reject using serial.

@jlebon
Copy link
Member Author

jlebon commented Apr 22, 2020

What's the qemu incantation to use this? The "convenience" -drive method seems to reject using serial.

Yeah, serial is a property of the virtio-blk device driver, so you have to associate with an explicitly defined virtio-blk device. E.g.:

-drive file=foobar.ign,if=none,format=raw,readonly=on,id=ignition \
  -device virtio-blk,serial=ignition,drive=ignition

(This is also what kola does for primary-disk.)

jcajka added a commit to jcajka/coreos-assembler that referenced this pull request May 27, 2020
…fault

Move from injecting ingnition directly in to the VM image on non-fw_cfg architectures
to passing it via virtio-blk device. It should be more performant and should be more
robust going forward.
See: coreos/ignition#905
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request May 27, 2020
…fault

Move from injecting ingnition directly in to the VM image on non-fw_cfg architectures
to passing it via virtio-blk device. It should be more performant and should be more
robust going forward.
See: coreos/ignition#905
@dustymabe
Copy link
Member

For anyone in the future who stumbles upon this like I did:

@jebon pointed me at #928 (comment)

so this should work instead of passing --qemu-commandline= directly:

virt-install ... --disk path=$PWD/config.ign,format=raw,readonly=on,serial=ignition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants