This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

DHCP IP changes on every boot when using PXE #1432

Closed
jgunthorpe opened this issue Jun 30, 2016 · 16 comments · Fixed by coreos/init#209

Comments

@jgunthorpe

jgunthorpe commented Jun 30, 2016

Using coreos 1010.5.0

PXE booting via iPXE and observing that the DHCP IP changes on every boot. This is undesired.

I tracked this down to the DHCP client identifier changing on every boot. This is because networkd now defaults to ClientIdentifier=duid (see http://man7.org/linux/man-pages/man5/systemd.network.5.html)

This might make sense when booting from disk, but when PXE booting is detected CoreOS should change that parameter to ClientIdentifier=mac before starting the network. This follows the PXE RFC for IPv4 client ID generation. I'm not sure whether networkd can do it, but for DHCPv6 the client identifier should be a type 3 DUID-LL for PXE.

This is similar to #360, but the solution 'use ignition' seems unworkable since the only way to get the ignition settings into a PXE environment is after the network has been started, which is too late to change the dhcp settings.

For reference, here is a DHCP trace of what is happening now:

System boot rom fetching iPXE:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION:  97 ( 17) UUID/GUID                 00805439cf14bae3 ..T9....
                        1197a8a0481ca1f7 ....H...
                        fe               .
OPTION:  94 (  3) Client NDI                010310           ...
OPTION:  93 (  2) Client System             0007             ..
OPTION:  60 ( 32) Vendor class identifier   PXEClient:Arch:00007:UNDI:003016

iPXE fetching CoreOS:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION: 175 ( 27) ???                       b105018086153aeb ......:.
                        0301000017010124 .......$
                        0101130101270101 .....'..
                        150101           ...
OPTION:  61 (  7) Client-identifier         01:a0:48:1c:a1:f7:fe
OPTION:  97 ( 17) UUID/GUID                 00805439cf14bae3 ..T9....
                        1197a8a0481ca1f7 ....H...
                        fe               .

CoreOS Booted:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION:  53 (  1) DHCP message type         1 (DHCPDISCOVER)
OPTION:  61 ( 19) Client-identifier         ff:b6:22:0f:eb:00:02:00:00:ab:11:e7:91:1f:ab:08:35:0f:50
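
For anyone comparing the two client identifiers in the trace, here is a minimal Python sketch that splits an option 61 value into its fields. It assumes the usual layouts: type 0x01 followed by a 6-byte Ethernet MAC, and type 0xff followed by a 4-byte IAID and a DUID (RFC 4361). The function name `parse_client_id` is made up for this example and is not part of any CoreOS or systemd tooling.

```python
def to_hex(data: bytes) -> str:
    """Format bytes as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in data)


def parse_client_id(text: str) -> dict:
    """Decode a colon-separated DHCP option 61 value into its parts."""
    raw = bytes.fromhex(text.replace(":", ""))
    kind = raw[0]
    if kind == 0x01 and len(raw) == 7:
        # Type 1: Ethernet hardware type, followed by the 6-byte MAC address.
        return {"type": "mac", "mac": to_hex(raw[1:])}
    if kind == 0xFF and len(raw) >= 7:
        # Type 255 (RFC 4361): 4-byte IAID, then a DUID (2-byte DUID type + body).
        return {
            "type": "duid",
            "iaid": raw[1:5].hex(),
            "duid_type": int.from_bytes(raw[5:7], "big"),
            "duid_body": to_hex(raw[7:]),
        }
    # Anything else is left undecoded.
    return {"type": "other", "raw": to_hex(raw)}


# The two identifiers from the trace above.
ipxe = parse_client_id("01:a0:48:1c:a1:f7:fe")
booted = parse_client_id("ff:b6:22:0f:eb:00:02:00:00:ab:11:e7:91:1f:ab:08:35:0f:50")
print(ipxe)
print(booted)
```

On the values above, the iPXE request decodes as a plain MAC identifier, while the booted system sends an IAID plus a type 2 (enterprise-number) DUID, which is why the DHCP server treats them as two different clients.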
@mischief

looks like your client mac address is consistent - why don't you use that instead?

@jgunthorpe
Author

How do you mean?

A new version of ISC dhcp (that I am not running) has an option to totally ignore the client-id, but it is not really a good idea.

This is a CoreOS bug because the various RFCs specify that the client-id should be a MAC when PXE booting, and frown on randomizing the DUID on every boot.

@mischief

when you pxe boot, you could set systemd.machine_id= on the kernel commandline. that should give you a stable client identifier, but will require you pre-generate a unique machine id for each pxe host (rather than have it be generated on-system).
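
One possible way to pre-generate such an ID is sketched below in Python. It derives a stable 32-character lowercase hex string from the host's MAC address with a name-based UUID, so the same host always gets the same value. The namespace constant and the `machine_id_for` helper are assumptions made up for this example; nothing in CoreOS or systemd prescribes this derivation.

```python
import uuid

# Hypothetical fixed namespace for this sketch; any UUID works as long as it never changes.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def machine_id_for(mac: str) -> str:
    """Derive a stable 32-character hex machine ID from a MAC address."""
    # uuid5 is deterministic: the same namespace and name always give the same UUID.
    return uuid.uuid5(NAMESPACE, mac.lower()).hex


mid = machine_id_for("a0:48:1c:a1:f7:fe")
# Kernel command-line fragment to append to the host's PXE boot entry.
print(f"systemd.machine_id={mid}")
```

The printed fragment would go into the per-host PXE configuration. This only stabilizes the identifier used by the booted system; it does not make it match the MAC-based identifier that iPXE sends.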

@jgunthorpe
Author

I could do that, but it means every PXE boot still consumes two IP addresses from the pool (suboptimal if I have a lot of machines), and it doesn't really help make PXE booting a fully functional option in CoreOS.

Better would be to add a coreos.pxe=1 kernel command line that causes the ClientIdentifier to be switched to MAC as I described. Then all PXE users will have a solid fix.

Why the reluctance to see this as a coreos bug? This is probably a regression as things would have worked fine for PXE before networkd changed the default to duid.

@crawford
Contributor

It should be easy enough to put together a generator that adjusts the network config depending on the OEM ID ("pxe" in this case). We'll probably need it in the future for other providers cough DigitalOcean cough.

@jgunthorpe
Author

Just to clarify, with the two patches dm0 just made, should coreos.oem=pxe be set on the kernel command line when PXE booting?

Does https://coreos.com/os/docs/latest/booting-with-pxe.html need an update too?

@jgunthorpe
Author

.. and for others stumbling across this, the systemd.machine_id kernel parameter isn't supported until systemd v229, which is newer than what coreos stable has today.

crawford added this to the CoreOS 1109.0.0 milestone Jul 5, 2016
@crawford
Contributor

crawford commented Jul 5, 2016

> Just to clarify, with the two patches dm0 just made, should coreos.oem=pxe be set on the kernel command line when PXE booting?

Given the current implementation, yes, it would have to be set. This is not desirable though, so we will be sure to fix it up so that it's not necessary.

@dm0-

dm0- commented Jul 5, 2016

I've updated the pull requests to follow Ignition's behavior of using PXE when no OEM is given on the kernel command-line.

@crawford
Contributor

crawford commented Jul 5, 2016

LGTM

mischief pushed a commit to mischief/init that referenced this issue Jul 12, 2016
Follow Ignition's convention of defaulting to the PXE OEM.  It
currently ignores the case where users define coreos.oem.id=pxe
on their kernel command-line manually.

This fixes coreos/bugs#1432.
@jgunthorpe
Author

@dm0- this causes a behaviour change on my non-PXE bare metal machines. They don't have a coreos.oem.id kernel command line parameter, so they switch from machine ID mode to MAC mode.

It would be nice if this stopped changing. Honestly, I'd just permanently set it to mac for everything. I think the target environment for the DUID mode is something like a laptop with docking stations or other variable hardware that just doesn't seem to be the focus for coreos.

@crawford
Contributor

@jgunthorpe sorry about that. We are going to change it one more time (back to the original behavior). Instead of checking the OEM ID, it's more accurate to check if the root kernel parameter is specified. In your case, it is, so the MAC address will not be used as the DHCP client identifier.
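
As a rough illustration of that check (not the actual generator in coreos/init), here is a Python sketch that decides whether the MAC-based client identifier would apply. It treats a boot as PXE when no `root=` parameter is present on the kernel command line. The function name and the sample command lines are hypothetical, and quoting inside the command line is ignored.

```python
def wants_mac_client_id(cmdline: str) -> bool:
    """Return True when no root= parameter is present (treated here as a PXE boot)."""
    for arg in cmdline.split():
        key, _, value = arg.partition("=")
        # Only an exact "root" key counts; "rootflags=..." and similar do not.
        if key == "root" and value:
            return False
    return True


# Hypothetical command lines for illustration only.
pxe_cmdline = "initrd=coreos_production_pxe_image.cpio.gz coreos.autologin"
disk_cmdline = "BOOT_IMAGE=/coreos/vmlinuz-a root=LABEL=ROOT rootflags=rw"

print(wants_mac_client_id(pxe_cmdline))
print(wants_mac_client_id(disk_cmdline))
```

On a running system the input would come from `/proc/cmdline`. A disk-installed machine passes `root=` and so keeps the default DUID behaviour, which matches the case described in this comment.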

@jgunthorpe
Author

Thanks, I still think you should seriously consider not using DUID mode at all for CoreOS.

@crawford
Contributor

This has been cleaned up in the latest PRs.

@redbaron

I stumbled upon this, and it is not clear to me what the recommended way is to make it work in conjunction with matchbox.

The current situation in stable 1520.8.0 is as follows:

yy-pxe.network:

...
[DHCP]
ClientIdentifier=mac
UseMTU=true
UseDomains=true

zz-default.network:

...
[DHCP]
UseMTU=true
UseDomains=true

What happens in my case: PXE starts, downloads the Ignition config from matchbox and calls coreos-install. If the Ignition config is templated with the value of {net0/ip:ipv4}, it becomes invalid on a subsequent boot :( If Container Linux continues to use DUID, then this quirk should at least be documented in the PXE booting docs.

@bgilbert
Contributor

@redbaron It sounds as though you're having a slightly different issue: one client ID is used by the PXE-booted system that installs Container Linux, and a different one is used by the installed system after rebooting (since the latter won't use the PXE config). If so, could you open a new bug?
