Broadcom bnx2 driver fails when specifying first_boot #1263

Closed
mkristiansen opened this Issue May 3, 2016 · 12 comments

mkristiansen commented May 3, 2016

Hardware:

  • IBM eServer BladeCenter HS21 -[8853L6G]-/Server Blade, BIOS -[BCE147AUS-1.20]- 10/27/2010
  • QLogic bnx2 Gigabit Ethernet Driver v2.2.6 (January 29, 2014)

Running bootcfg and dnsmasq as documented in github.com/coreos/coreos-baremetal, specifying the pxe profile when starting bootcfg:

  • sudo docker run --net=host -p 8080:8080 --rm -v $PWD/examples:/var/lib/bootcfg:Z -v $PWD/examples/groups/pxe:/var/lib/bootcfg/groups:Z quay.io/coreos/bootcfg:v0.3.0 -address=0.0.0.0:8080 -log-level=debug
  • sudo docker run --rm --cap-add=NET_ADMIN --net=host quay.io/coreos/dnsmasq -d -q --dhcp-range=172.18.0.43,172.18.0.99 --enable-tftp --tftp-root=/var/lib/tftpboot --dhcp-userclass=set:ipxe,iPXE --dhcp-boot=tag:#ipxe,undionly.kpxe --dhcp-boot=tag:ipxe,http://172.18.0.2:8080/boot.ipxe --log-queries --log-dhcp --address=/bootcfg.foo/172.18.0.2

When I boot using the default pxe profile (including config.url, first_boot and autologin), the boot process fails because the NICs fail to initialise.
When I remove first_boot from the list of command-line parameters, it boots successfully.

In the following gists (output from journalctl) I have eliminated config.url to demonstrate the problem with the smallest set of config options

Member

crawford commented May 3, 2016

May 03 11:30:48 localhost systemd-networkd[203]: eth0: eth0            : could not bring up interface: No such file or directory
May 03 11:30:48 localhost kernel: bnx2 0000:04:00.0: Direct firmware load for bnx2/bnx2-mips-06-6.2.3.fw failed with error -2
May 03 11:30:48 localhost kernel: bnx2: Can't load firmware file "bnx2/bnx2-mips-06-6.2.3.fw"

This only happens with first_boot because the network isn't initialized in the initramfs unless that parameter is present.
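
For anyone triaging the same symptom, here is a minimal sketch for pulling the missing firmware names out of saved journalctl or dmesg output. The function name `missing_firmware` is made up for illustration, and the pattern only matches the "Direct firmware load ... failed" kernel message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each firmware
# file the kernel failed to load, one per line, de-duplicated.
missing_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' | sort -u
}
```

On an affected machine it could be fed with something like `journalctl -k -b | missing_firmware`.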

mkristiansen commented May 3, 2016

Forgive me.. I am not sure I understand..

When I boot:

  • with "coreos.first_boot": "" in the profile => the network fails to initialise
  • without any reference to coreos.first_boot in the profile => the network initialises correctly

Is this the expected behaviour?

I thought we needed network init in the initramfs in order to pull the Ignition definition from the URL defined in coreos.config.url? Do I need to implement additional boot phases to lay down the Ignition config (and subsequently use it from oem://)?

Member

crawford commented May 3, 2016

Ignition and the network are only started in the initramfs if coreos.first_boot is present with a non-zero value. The behavior you are seeing is expected, but it is incorrect. We need to ship the firmware in the initramfs so that this NIC can initialize properly.

mkristiansen commented May 3, 2016

I note from https://alpha.release.core-os.net/amd64-usr/983.0.0/coreos_production_image_contents.txt that bnx2/bnx2-mips-06-6.2.3.fw already exists in /lib64/firmware, so I wonder whether something else is amiss.

Is there anything I can do locally to patch the required files into the image to get up and running while waiting for the upstream fix?

Member

crawford commented May 3, 2016

The firmware exists in the base image but not the initramfs. This is where the failure is happening. There isn't really anything you can do locally, since it's going to involve modifying the initramfs CPIO and injecting that into your kernel image.

@crawford crawford added this to the CoreOS 1038.0.0 milestone May 3, 2016

marineam commented May 4, 2016

Hm, well, considering this is PXE, I think you actually can work around it. This guide explains adding files to /usr/share/oem inside the initrd. In your case you want to add /usr/lib64/firmware, which could be extracted from the squashfs or copied from /lib/firmware on most any system.

https://coreos.com/os/docs/latest/booting-with-pxe.html#adding-a-custom-oem
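
A rough sketch of that workaround, assuming the PXE initrd is a gzip-compressed cpio archive (coreos_production_pxe_image.cpio.gz in the linked guide) and that the firmware file is available locally, for example under /lib/firmware. The function names and the staging directory are illustrative only. The append step uses GNU cpio's append mode in the same way the guide does for the OEM directory.

```shell
#!/bin/sh
# Hypothetical helper: copy one firmware file into a staging tree laid out as
# usr/lib64/firmware/<relative path>, mirroring where it must land in the initrd.
#   $1 = source firmware directory (e.g. /lib/firmware)
#   $2 = firmware path relative to it (e.g. bnx2/bnx2-mips-06-6.2.3.fw)
#   $3 = staging directory
stage_firmware() {
    src_dir=$1; fw=$2; stage=$3
    dest="$stage/usr/lib64/firmware/$fw"
    mkdir -p "$(dirname "$dest")" && cp "$src_dir/$fw" "$dest"
}

# Hypothetical helper: append the staged tree to a gzip-compressed cpio initrd.
# The initrd is modified in place, so work on a copy.
#   $1 = staging directory
#   $2 = ABSOLUTE path to the .cpio.gz initrd (the function changes directory)
append_to_initrd() {
    stage=$1; initrd_gz=$2
    initrd=${initrd_gz%.gz}
    gzip -d "$initrd_gz" &&
    (cd "$stage" && find usr | cpio -o -A -H newc -O "$initrd") &&
    gzip "$initrd"
}
```

With the firmware name taken from the kernel log, usage might look like `stage_firmware /lib/firmware bnx2/bnx2-mips-06-6.2.3.fw "$PWD/stage"` followed by `append_to_initrd "$PWD/stage" "$PWD/coreos_production_pxe_image.cpio.gz"`.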

marineam commented May 4, 2016

But we really need to improve our kernel firmware situation, and slim down the image some too. We include all built modules, but that isn't really required; dropping netfilter, for example, is an obvious first step to clean that up. Hopefully the slimming will have a more significant impact on size than adding the firmware ;-)

mkristiansen commented May 4, 2016

Thanks guys. I got it to work by adding the firmware under /usr/lib64/firmware in the initrd, as recommended. I'm happy to carry the inconvenience of doing this on every CoreOS release for the time being.

As a side thought.. is it possible for an end-user of the framework to supply an additional, completely separate (initrd) file (via assets)? This could form the basis for injecting firmware and other files without having to re-add into the pxe image on every release.

spkane commented Jun 23, 2016

It looks like this also impacts installing CoreOS to disk via coreos-install, since the initrd that is provided in coreos_production_image.bin.bz2 and used with an injected Ignition file is also missing these files. This is much harder to work around, since the bin is GPG-signed and much more difficult to modify.

I'm going to try modifying the install script to inject the initrd, but I have no idea if it will work. Getting these firmware images into a release soon would be much appreciated, so that the fix can be tested and get into the next stable release.

mischief commented Jun 23, 2016

@spkane it's currently being worked on at coreos/coreos-overlay#2005

spkane commented Jun 23, 2016

@mischief Thank you. Looking there.

Member

crawford commented Jun 28, 2016

Fixed by coreos/coreos-overlay#2005. This will be in the 1096.0.0 Alpha.

@crawford crawford closed this Jun 28, 2016
