Boot fails in VirtualBox #51

Closed
faddat opened this issue Dec 16, 2017 · 10 comments

faddat commented Dec 16, 2017

[Screenshot from 2017-12-16 17-17-57]

I was previously using Pixiecore successfully in the same setup. Then this started happening, and I don't know why. I have since reinstalled my test environment (Pixiecore and VirtualBox) and am still getting it. Does anyone have any idea why this is happening?

Ubuntu 17.10 desktop running VirtualBox with a bridged network adapter.

Thanks!

faddat (Author) commented Dec 17, 2017

I'm going to leave this open, mostly because it's very strange, but I think it is caused by a bug in VirtualBox, documented here:

coreos/tectonic-installer#932

When I switched the configuration to booting bare metal from a Raspberry Pi, there were no problems. I'm mostly leaving this open in case anyone can give insight into what happened here. It's really got me flummoxed.

faddat changed the title from "Boot fails with strangely malformed URLs" to "Boot fails in VirtualBox" on Dec 17, 2017
danderson (Owner) commented

Thanks for the bug report!

What a strange bug, and thank you for tracking down the root cause!

It's been a while since I hacked on Pixiecore, but I thought I chainloaded to Pixiecore's embedded copy of iPXE precisely so that I could boot knowing exactly which iPXE features are available, instead of relying on the vendor's PXE implementation... It's possible that VirtualBox's copy of iPXE is good enough that Pixiecore thinks it can skip that step, only to fail immediately afterwards.

danderson (Owner) commented

So, I think I can actually fix this in Pixiecore.

The ProxyDHCP server has a piece of logic that checks whether the booting client is iPXE and, if so, whether that iPXE supports HTTP. If yes, it skips straight to serving the iPXE boot file. If not, it first chainloads to our embedded iPXE.

A quick fix would be to also check for bzImage support, which is one of the features iPXE advertises in its DHCP feature option, and chainload to the Pixiecore-provided iPXE if that support is not present.
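The check described above (existing HTTP test plus the quick-fix bzImage test) can be sketched in Go, the language Pixiecore is written in. This is a minimal illustration, not Pixiecore's actual code: the function names are hypothetical, and it assumes the client's DHCP user class (option 77) and the raw payload of iPXE's encapsulated option 175 have already been extracted from the packet. The feature tags 0x13 (HTTP) and 0x18 (bzImage) are the values in iPXE's `include/ipxe/features.h`; verify them against the iPXE version you target.

```go
package main

import "fmt"

// Feature-indicator tags that iPXE places inside its encapsulated DHCP
// option 175 (values from iPXE's include/ipxe/features.h; verify them).
const (
	ipxeFeatureHTTP    = 0x13
	ipxeFeatureBzImage = 0x18
)

// parseIPXEFeatures walks the tag/length/value payload of option 175
// and returns the set of sub-option tags present. Truncated trailing
// entries are ignored.
func parseIPXEFeatures(opt175 []byte) map[byte]bool {
	features := make(map[byte]bool)
	for i := 0; i < len(opt175); {
		tag := opt175[i]
		switch {
		case tag == 0: // pad byte, has no length field
			i++
			continue
		case tag == 255: // end marker
			return features
		case i+1 >= len(opt175): // no room for a length byte
			return features
		}
		length := int(opt175[i+1])
		if i+2+length > len(opt175) { // value runs past the payload
			return features
		}
		features[tag] = true
		i += 2 + length
	}
	return features
}

// needsChainload reports whether a client should first be chainloaded
// into the embedded iPXE. Only an iPXE client that advertises both HTTP
// and bzImage support may skip that step.
func needsChainload(userClass string, opt175 []byte) bool {
	if userClass != "iPXE" {
		return true // vendor PXE ROM: always chainload
	}
	f := parseIPXEFeatures(opt175)
	return !(f[ipxeFeatureHTTP] && f[ipxeFeatureBzImage])
}

func main() {
	// Hypothetical option 175 payloads; each feature marker is encoded
	// as tag, length 1, one-byte value.
	httpOnly := []byte{ipxeFeatureHTTP, 1, 1}
	httpAndBzImage := []byte{ipxeFeatureHTTP, 1, 1, ipxeFeatureBzImage, 1, 1}

	fmt.Println(needsChainload("", nil))               // true: no iPXE user class
	fmt.Println(needsChainload("iPXE", httpOnly))       // true: HTTP but no bzImage
	fmt.Println(needsChainload("iPXE", httpAndBzImage)) // false: may skip the chainload
}
```

With this rule, an iPXE that advertises HTTP but not bzImage (the VirtualBox case described in this thread) is chainloaded instead of being handed the boot script directly.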

A slightly longer fix is to unconditionally chainload to Pixiecore's builtin iPXE, so that we always chainload to a "known" set of features instead of checking for individual feature support. This is a bit more complicated because we need a way to detect that the client is our iPXE as opposed to any iPXE. iPXE supports setting a custom client name, so we could use that, but I'll have to dig more into how that works. It also makes it more complex to provide your own iPXE images in pixiecore-apache2, but I think it's probably time to declare that dead, honestly.
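The longer fix drops the per-feature checks entirely. Below is a minimal sketch of that decision, assuming the embedded iPXE build is configured to send a distinctive DHCP user class (option 77) instead of the default "iPXE". The marker value `pixiecore-embedded` and every identifier here are hypothetical placeholders, not Pixiecore's real names.

```go
package main

import "fmt"

// bootAction says what the ProxyDHCP server should hand a client.
type bootAction int

const (
	chainloadEmbeddedIPXE bootAction = iota // send the embedded iPXE binary
	serveBootScript                         // send the iPXE script that boots the real OS
)

func (a bootAction) String() string {
	if a == serveBootScript {
		return "serve boot script"
	}
	return "chainload embedded iPXE"
}

// ourUserClass is a HYPOTHETICAL marker that the embedded iPXE build is
// assumed to send in DHCP option 77.
const ourUserClass = "pixiecore-embedded"

// decide implements the "always chainload" policy: feature flags are
// ignored, and only a client identifying as our own iPXE build gets the
// boot script. Every other client, including a firmware iPXE that claims
// HTTP support, is chainloaded first.
func decide(userClass string) bootAction {
	if userClass == ourUserClass {
		return serveBootScript
	}
	return chainloadEmbeddedIPXE
}

func main() {
	for _, uc := range []string{"", "iPXE", ourUserClass} {
		fmt.Printf("%q -> %v\n", uc, decide(uc))
	}
}
```

The trade-off mentioned above shows up directly here: a user-supplied iPXE image would also have to send the marker, or it is chainloaded like any other client.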

danderson added a commit that referenced this issue Dec 25, 2017
Virtualbox inexplicably ships with an iPXE image that knows
how to speak HTTP, but cannot handle bzImage files. This is a
quick and dirty fix for that specific issue, while I work on
the more permanent fix of always chainloading into a known
good iPXE.

danderson (Owner) commented

@faddat Could you test the latest pixiecore code against your virtualbox system? AFAICT, the change I just pushed should be enough to make it boot correctly, but I'd love confirmation before I go deeper into the permanent fix. Thanks!

danderson added a commit that referenced this issue Dec 25, 2017
This guarantees that we load the real OS from an iPXE with a known
featureset, rather than rely on the firmware iPXEs to be correct.

Fixes #51

danderson (Owner) commented

@faddat And if possible, could you also test the code in the bug-51 branch? That branch unconditionally chainloads to pixiecore's embedded iPXE, so if I got it right, it should "fix" your virtualbox issue. Let me know if it works, and I'll merge into master.

faddat (Author) commented Dec 31, 2017

yes :)!

I'll be happy to do both!

I'm also going to file another really odd issue, #52, complete with pcap files.

faddat (Author) commented Dec 31, 2017

Branch bug-51 seems to be affected by #52.

$ sudo ./pixiecore boot \
  https://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe.vmlinuz \
  https://alpha.release.core-os.net/amd64-usr/current/coreos_production_pxe_image.cpio.gz \
  --cmdline='coreos.autologin'
[sudo] password for boot:
[DHCP] Offering to boot 08:00:27:4e:15:86
[DHCP] Offering to boot 08:00:27:4e:15:86

[Screenshot]

The tell, of course, is that the machine's MAC address is coming up as a filename. The last thing I need to do to be 100% sure that #52 is actually a regression is to run one of the old (October?) armv7 binaries from packagecloud on our trusty Raspberry Pi and see whether it boots. So, to be clear: because of #52, I still can't say whether the issues in #51 have been cleared up.

Once I've run the test on our wee Raspberry Pi, I may look under the hood and see if I can figure out what is causing #52. I can only guess that somehow a MAC address is going where a filename should go, or something similar.

danderson (Owner) commented

I've updated #52. There's some weirdness there suggesting the problem is in your environment rather than in the code changes (I hope so, because otherwise I have no idea how my Dec 24 changes could cause this failure...).

danderson (Owner) commented

Okay, the Virtualbox boot can be fixed by doing two things:

  1. Unconditionally chainload to Pixiecore's embedded iPXE, so that we have all the features we need.
  2. Chainload using ipxe.pxe instead of undionly.kpxe. The undionly stack causes network configuration failures when you chainload ipxe->ipxe.

Open question: I need to check that ipxe.pxe can reliably boot bare metal as well as iPXE-based VMs. I've tested a VM that uses Intel's UNDI stack, and that seemed to work, but testing on a real machine would make me feel better.
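The second step above concerns which binary BIOS clients are chainloaded into. ipxe.pxe carries iPXE's own NIC drivers, whereas undionly.kpxe drives the network through the UNDI interface of whatever PXE stack loaded it. A sketch of the resulting selection follows; the firmware classification and the EFI and boot-script file names are hypothetical placeholders, not Pixiecore's actual identifiers.

```go
package main

import "fmt"

// firmware is a simplified, HYPOTHETICAL classification of the booting client.
type firmware int

const (
	biosPXE firmware = iota // legacy BIOS with any PXE ROM (vendor UNDI or firmware iPXE)
	efiX64                  // 64-bit UEFI
	ourIPXE                 // already running the embedded iPXE
)

// bootFile returns the file to send next. BIOS clients always get the
// full-driver ipxe.pxe rather than undionly.kpxe, since undionly.kpxe was
// observed to lose network configuration when chained from one iPXE into
// another.
func bootFile(fw firmware) (string, error) {
	switch fw {
	case biosPXE:
		return "ipxe.pxe", nil
	case efiX64:
		return "ipxe-x86_64.efi", nil // hypothetical file name
	case ourIPXE:
		return "boot.ipxe", nil // hypothetical boot-script name
	default:
		return "", fmt.Errorf("unknown firmware type %d", fw)
	}
}

func main() {
	for _, fw := range []firmware{biosPXE, efiX64, ourIPXE} {
		name, err := bootFile(fw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name)
	}
}
```

Because every non-iPXE-from-us client goes through the same chainload, step 1 (always chainload) and step 2 (use ipxe.pxe for BIOS) reduce to a single lookup on firmware type.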

danderson self-assigned this Jan 2, 2018
danderson added the bug label Jan 2, 2018
danderson added a commit that referenced this issue Jan 2, 2018
This guarantees that we load the real OS from an iPXE with a known
featureset, rather than rely on the firmware iPXEs to be correct.

Also switch to ipxe.pxe for BIOS boots instead of undionly.kpxe.

ipxe.pxe works when you chainload from one iPXE to another, whereas
undionly.kpxe encounters some kind of poorly explained bug where
it loses the ability to configure networking.

Tested against the following configurations:
 - VirtualBox + BIOS w/ iPXE
 - VirtualBox + BIOS w/ Intel UNDI
 - VirtualBox + EFI
 - KVM + SeaBIOS w/ iPXE
 - KVM + OVMF (EFI)
 - Dell R610 + Dell BIOS/PXE

Fixes #51, fixes #52.

danderson (Owner) commented

@faddat Could you test the bug-51 branch on your setup? I've tested on a wide variety of systems that I have access to, and I think this fixes all the issues we encountered in this bug, but confirmation would be great before I merge into master.

2 participants