DHCP IP changes on every boot when using PXE #1432

Closed
jgunthorpe opened this Issue Jun 30, 2016 · 16 comments

Comments

@jgunthorpe

jgunthorpe commented Jun 30, 2016

Using coreos 1010.5.0

PXE booting via iPXE and observing that the DHCP IP changes on every boot. This is undesired.

I tracked this down to the DHCP client identifier changing on every boot. This is because networkd now defaults to ClientIdentifier=duid (see http://man7.org/linux/man-pages/man5/systemd.network.5.html)

This might make sense when booting from disk, but when PXE booting is detected CoreOS should change that parameter to ClientIdentifier=mac before starting the network. This will follow the PXE RFC for IPv4 client id generation. Not sure if networkd can do it, but for DHCPv6 the ClientIdentifier should be a type 3 DUID-LL for PXE.

This is similar to #360, but the solution 'use ignition' seems unworkable since the only way to get the ignition settings into a PXE environment is after the network has been started, which is too late to change the dhcp settings.

For reference, here is a DHCP trace of what is happening now:

System boot rom fetching iPXE:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION:  97 ( 17) UUID/GUID                 00805439cf14bae3 ..T9....
                        1197a8a0481ca1f7 ....H...
                        fe               .
OPTION:  94 (  3) Client NDI                010310           ...
OPTION:  93 (  2) Client System             0007             ..
OPTION:  60 ( 32) Vendor class identifier   PXEClient:Arch:00007:UNDI:003016

iPXE fetching CoreOS:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION: 175 ( 27) ???                       b105018086153aeb ......:.
                        0301000017010124 .......$
                        0101130101270101 .....'..
                        150101           ...
OPTION:  61 (  7) Client-identifier         01:a0:48:1c:a1:f7:fe
OPTION:  97 ( 17) UUID/GUID                 00805439cf14bae3 ..T9....
                        1197a8a0481ca1f7 ....H...
                        fe               .

CoreOS Booted:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION:  53 (  1) DHCP message type         1 (DHCPDISCOVER)
OPTION:  61 ( 19) Client-identifier         ff:b6:22:0f:eb:00:02:00:00:ab:11:e7:91:1f:ab:08:35:0f:50
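
To make the difference between the last two traces easier to see, here is a minimal sketch that decodes the option 61 values shown above. It assumes the usual layouts: a leading type byte 0x01 followed by a 6-byte Ethernet MAC, or a leading 0xff followed by a 4-byte IAID and a DUID (the RFC 4361 style). The function name `decode_client_id` is hypothetical and not part of any tool mentioned in this thread.

```python
def decode_client_id(text):
    """Decode a colon-separated DHCP option 61 value (illustrative only)."""
    raw = bytes.fromhex(text.replace(":", ""))

    # 0xff: IAID (4 bytes) followed by a DUID (2-byte type + type-specific data).
    if raw[0] == 0xFF and len(raw) >= 7:
        info = {
            "kind": "iaid+duid",
            "iaid": raw[1:5].hex(),
            "duid_type": int.from_bytes(raw[5:7], "big"),
        }
        # DUID type 2 (DUID-EN): 4-byte enterprise number, then an opaque identifier.
        if info["duid_type"] == 2 and len(raw) >= 11:
            info["enterprise"] = int.from_bytes(raw[7:11], "big")
            info["identifier"] = raw[11:].hex()
        return info

    # 0x01: hardware type Ethernet followed by the 6-byte MAC address.
    if raw[0] == 0x01 and len(raw) == 7:
        return {"kind": "mac", "mac": ":".join(f"{b:02x}" for b in raw[1:])}

    return {"kind": "unknown", "raw": raw.hex()}


ipxe = decode_client_id("01:a0:48:1c:a1:f7:fe")
booted = decode_client_id("ff:b6:22:0f:eb:00:02:00:00:ab:11:e7:91:1f:ab:08:35:0f:50")
print(ipxe)
print(booted)
```

Under those assumptions the iPXE request carries the plain MAC, while the booted system sends an IAID plus a type 2 (enterprise-number) DUID whose trailing identifier is the part that is derived from something other than the hardware address.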
@mischief

mischief commented Jun 30, 2016

looks like your client mac address is consistent - why don't you use that instead?

@jgunthorpe

jgunthorpe commented Jun 30, 2016

How do you mean?

A new version of ISC dhcp (that I am not running) has an option to totally ignore the client-id, but it is not really a good idea.

This is a CoreOS bug because the various RFCs specify that the client-id should be a MAC when PXE booting, and frown on randomizing the DUID on every boot.

@mischief

mischief commented Jun 30, 2016

when you pxe boot, you could set systemd.machine_id= on the kernel commandline. that should give you a stable client identifier, but will require you pre-generate a unique machine id for each pxe host (rather than have it be generated on-system).

@jgunthorpe

jgunthorpe commented Jun 30, 2016

I could do that; however, that means every PXE boot still consumes two IP addresses from the pool (suboptimal if I have a lot of machines) and it doesn't really help make PXE booting a fully functional option in CoreOS.

Better would be to add a coreos.pxe=1 kernel command line that causes the ClientIdentifier to be switched to MAC as I described. Then all PXE users will have a solid fix.

Why the reluctance to see this as a coreos bug? This is probably a regression as things would have worked fine for PXE before networkd changed the default to duid.

@crawford

Member

crawford commented Jun 30, 2016

It should be easy enough to put together a generator that adjusts the network config depending on the OEM ID ("pxe" in this case). We'll probably need it in the future for other providers (cough, DigitalOcean, cough).

@jgunthorpe

jgunthorpe commented Jul 4, 2016

Just to clarify, with the two patches dm0 just made, should coreos.oem=pxe be set on the kernel command line when PXE booting?

Does https://coreos.com/os/docs/latest/booting-with-pxe.html need an update too?

@jgunthorpe

jgunthorpe commented Jul 4, 2016

...and for others stumbling across this, the systemd.machine_id kernel parameter isn't supported until systemd v229, which is newer than what CoreOS stable has today.

@crawford crawford added this to the CoreOS 1109.0.0 milestone Jul 5, 2016

@crawford

Member

crawford commented Jul 5, 2016

Just to clarify, with the two patches dm0 just made, should coreos.oem=pxe be set on the kernel command line when PXE booting?

Given the current implementation, yes, it would have to be set. This is not desirable though, so we will be sure to fix it up so that it's not necessary.

dm0- added a commit to dm0-/init that referenced this issue Jul 5, 2016

network: set ClientIdentifier=mac for the PXE OEM
Follow Ignition's convention of defaulting to the PXE OEM.  It
currently ignores the case where users define coreos.oem.id=pxe
on their kernel command-line manually.

This fixes coreos/bugs#1432.
@dm0-

Member

dm0- commented Jul 5, 2016

I've updated the pull requests to follow Ignition's behavior of using PXE when no OEM is given on the kernel command-line.

@crawford

Member

crawford commented Jul 5, 2016

LGTM

mischief added a commit to mischief/init that referenced this issue Jul 12, 2016

network: set ClientIdentifier=mac for the PXE OEM
Follow Ignition's convention of defaulting to the PXE OEM.  It
currently ignores the case where users define coreos.oem.id=pxe
on their kernel command-line manually.

This fixes coreos/bugs#1432.
@jgunthorpe

jgunthorpe commented Jul 20, 2016

@dm0- this causes a behaviour change in my non-PXE bare metal machines: they don't have a coreos.oem.id kernel command line parameter, so they change away from machine id mode to mac mode.

It would be nice if this stopped changing. Honestly, I'd just permanently set it to mac for everything. I think the target environment for the DUID mode is something like a laptop with docking stations or other variable hardware, which just doesn't seem to be the focus for CoreOS.

@crawford

Member

crawford commented Jul 26, 2016

@jgunthorpe sorry about that. We are going to change it one more time (back to the original behavior). Instead of checking the OEM ID, it's more accurate to check if the root kernel parameter is specified. In your case, it is, so the MAC address will not be used as the DHCP client identifier.
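
The rule described in this comment can be sketched roughly as follows. This is only an illustration of the decision, not the actual CoreOS implementation: the function name `client_identifier_for` is hypothetical, and the naive whitespace split ignores quoted kernel arguments.

```python
def client_identifier_for(cmdline):
    """Pick a networkd ClientIdentifier value from a kernel command line.

    Assumption for illustration: a system booted with no root= parameter is
    treated as PXE-booted and uses the MAC; anything else keeps the
    networkd default (duid).
    """
    has_root = any(arg.startswith("root=") for arg in cmdline.split())
    return "duid" if has_root else "mac"


# A disk-installed machine passes root=, so it keeps the DUID-based client id.
print(client_identifier_for("BOOT_IMAGE=/coreos/vmlinuz-a root=LABEL=ROOT console=tty0"))
# A PXE-booted machine without root= switches to the MAC-based client id.
print(client_identifier_for("initrd=coreos_production_pxe_image.cpio.gz console=tty0"))
```

On a running system the input string would typically come from /proc/cmdline.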

@jgunthorpe

jgunthorpe commented Jul 26, 2016

Thanks, I still think you should seriously consider not using DUID mode at all for CoreOS.

@crawford crawford closed this Jul 26, 2016

@crawford

Member

crawford commented Jul 26, 2016

This has been cleaned up in the latest PRs.

@redbaron

redbaron commented Nov 14, 2017

Stumbled upon this, and it is not clear to me what the recommended way is to make it work in conjunction with matchbox.

The current situation in stable 1520.8.0 is as follows:

yy-pxe.network:

...
[DHCP]
ClientIdentifier=mac
UseMTU=true
UseDomains=true

zz-default.network:

...
[DHCP]
UseMTU=true
UseDomains=true

What happens in my case: PXE starts, downloads the Ignition config from matchbox and calls coreos-install. If the Ignition config is templated with the value of {net0/ip:ipv4}, then it becomes invalid on a subsequent boot :( If Container Linux continues to use DUID, then this quirk should at least be documented in the PXE booting docs.

@bgilbert

Member

bgilbert commented Nov 15, 2017

@redbaron It sounds as though you're having a slightly different issue: one client ID is used by the PXE-booted system that installs Container Linux, and a different one is used by the installed system after rebooting (since the latter won't use the PXE config). If so, could you open a new bug?
