network interface name differs between Fedora CoreOS and RedHat CoreOS #484

Closed
thekoma opened this issue May 19, 2020 · 29 comments · Fixed by coreos/fedora-coreos-config#491

@thekoma

thekoma commented May 19, 2020

I've noticed that Fedora CoreOS uses the old eth* naming, while Red Hat CoreOS uses the net.ifnames=1 naming.
For example, on VMware, FCOS exposes eth0 while RHCOS exposes ens192.

@dustymabe dustymabe added the meeting topics for meetings label May 19, 2020
@lucab
Contributor

lucab commented May 19, 2020

@dustymabe before going for a meeting, I think we should investigate/document why the Fedora behavior is different and how we are defaulting to legacy interface names.

@thekoma
Author

thekoma commented May 19, 2020

@dustymabe
Member

> @dustymabe before going for a meeting, I think we should investigate/document why the Fedora behavior is different and how we are defaulting to legacy interface names.

ok will hold on the meeting discussion pending some investigation

@dustymabe
Member

dustymabe commented May 20, 2020

> I've found this:
> coreos/coreos-installer@8ac0f42#diff-23c9551d9d97b795acef2cacbdd4248b

hey @Koma-Andrea - that particular piece of code is just there to say "if user added a net.ifnames arg to the kernel command line of the install then we'll forward them to the first boot (ignition boot)". It requires the user to set it, so that is not specifically what is causing there to be a difference between FCOS and RHCOS.

@elemental-lf

This is actually quite a serious problem in my opinion, and I'd classify it as a bug. I've got HP servers with six network interfaces (four built-in, two extra), and the order is not deterministic and can vary from boot to boot. I tried setting net.ifnames=1, and I also tried installing the initscripts RPM to see if it would make any difference, because it contains /usr/lib/udev/rules.d/60-net.rules, which calls /usr/lib/udev/rename_device. Neither attempt yielded the result I hoped for.

So, how can I get back predictable interface names which are standard for a "normal" Fedora install since Fedora 19? I'll look into this myself in the coming days but maybe someone on here has a hint.

@lucab
Contributor

lucab commented May 26, 2020

For some reason, it looks like we are missing 99-default.link:

$ rpm -ql systemd-udev | grep 99-default.link
/usr/lib/systemd/network/99-default.link
$ stat /usr/lib/systemd/network/99-default.link
stat: cannot stat '/usr/lib/systemd/network/99-default.link': No such file or directory

Most notably, it is not present in the initramfs. This means that an interface appears at boot, systemd and udev are aware of its persistent name, but there is nothing which performs the renaming. Thus, NM starts configuring it with the legacy name.
Example from a qemu VM initramfs journal:

kernel: e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
kernel: e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
systemd-udevd[473]: Using default interface naming scheme 'v245'.
systemd[1]: Finished udev Wait for Complete Device Initialization.
[...]
NetworkManager[548]: <info>  [1590494796.6892] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
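
To see which name udev would have applied had the link file been present, one can inspect the ID_NET_NAME_* properties that udev attaches to the device. The following is a minimal sketch, not part of the original investigation: the helper name `predicted_ifname` is hypothetical, and the priority order is assumed to match the stock `NamePolicy` in 99-default.link (database, onboard, slot, path), ignoring the `keep` and `kernel` policies.

```shell
# Pick the name udev's default policy would choose, given the properties
# printed by `udevadm info --query=property` on stdin.
# Assumption: stock priority order "database onboard slot path".
predicted_ifname() {
    awk -F= '
        $1 == "ID_NET_NAME_FROM_DATABASE" { db = $2 }
        $1 == "ID_NET_NAME_ONBOARD"       { onboard = $2 }
        $1 == "ID_NET_NAME_SLOT"          { slot = $2 }
        $1 == "ID_NET_NAME_PATH"          { path = $2 }
        END {
            if (db != "") print db
            else if (onboard != "") print onboard
            else if (slot != "") print slot
            else if (path != "") print path
        }'
}

# Hypothetical usage on a running node (eth0 is an example device):
#   udevadm info --query=property /sys/class/net/eth0 | predicted_ifname
#   test -e /usr/lib/systemd/network/99-default.link || echo "99-default.link is missing"
```

If the helper prints a name while the interface is still called eth0, that would be consistent with the diagnosis above: udev knows the persistent name, but nothing performs the rename.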

@lucab lucab added needs/investigation and removed meeting topics for meetings labels May 26, 2020
@lucab
Contributor

lucab commented May 26, 2020

It turns out coreos/fedora-coreos-config#129 is the one that dropped it via:

remove-from-packages:
[...]
  - [systemd-udev, /usr/lib/systemd/network/.*]

Indeed, building a next image but leaving that config entry out results in:

kernel: e1000 0000:00:03.0 ens3: renamed from eth0
[...]
NetworkManager[545]: <info>  [1590497347.5377] manager: (ens3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
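
A quick way to confirm the rename on a test build is to filter the kernel log for the "renamed from" message shown above. This is only a sketch; the helper name `list_renames` is made up for illustration, and it assumes the kernel message format matches the journal excerpt in this comment.

```shell
# Print "<old> -> <new>" for every kernel rename message read from stdin.
# Assumes lines shaped like: "... ens3: renamed from eth0"
list_renames() {
    sed -n 's/.* \([^ :]*\): renamed from \([^ ]*\)$/\2 -> \1/p'
}

# Hypothetical usage on a booted node:
#   journalctl -k -b | list_renames
```

No output would suggest that no interface was renamed during that boot, which is what the affected images show.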

@lucab
Contributor

lucab commented May 26, 2020

For reference, fixing this for new nodes is likely a single-line patch. However, there is a larger story around auto-upgrades flipping the interface names on existing machines. Let's take some time to brainstorm and see if we can come up with a safer update path.

@dustymabe
Member

Thanks for the investigation, Luca! Sounds like something good to discuss on Wednesday at our community meeting?

@bgilbert
Contributor

@lucab What is reading that file if not systemd-networkd?

@lucab
Contributor

lucab commented May 27, 2020

@bgilbert the same question/comment was raised by @jlebon too. After a short code-digging trip, I found that udev has this logic which sources link units.

To be fair, this is stated in the first sentence of the user docs on link units at https://www.freedesktop.org/software/systemd/man/systemd.link.html:

A plain ini-style text file that encodes configuration for matching network devices, used by systemd-udev(8) and in particular its net_setup_link builtin.

@lucab lucab added the meeting topics for meetings label May 27, 2020
@jlebon
Member

jlebon commented Jun 1, 2020

We discussed this in the community meeting last week. This might be as easy as adding a barrier with a script which adds net.ifnames=0 to the kernel cmdline. This is similar to what we did for cgroupsv1. The difference here is that we only want to affect upgraded nodes. So this would be something like:

  • Add a barrier release which adds net.ifnames=0 to the running node. We could technically only release the update for this and not bump the stream metadata.
  • Restore the udev link files.
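
The barrier step described above could look roughly like the following. This is a hedged sketch, not the script that shipped: the function names are hypothetical, and it assumes `rpm-ostree kargs --append` is the mechanism used to persist the argument. It only adds net.ifnames=0 when the running kernel command line does not already set net.ifnames, so a value chosen by the user is left alone.

```shell
# Return success if the given kernel command line already sets net.ifnames.
has_ifnames_karg() {
    case " $1 " in
        *" net.ifnames="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical one-shot step for the barrier release: pin legacy names on
# nodes that booted without an explicit net.ifnames setting.
pin_legacy_names() {
    if ! has_ifnames_karg "$(cat /proc/cmdline)"; then
        rpm-ostree kargs --append=net.ifnames=0
    fi
}
```

Because newly installed nodes never run the barrier release, they would pick up the restored link files and get the predictable names.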

@dustymabe
Member

So the thoughts here are that existing nodes keep their existing behavior, but new nodes get the new behavior? If true, I do think it's the best path forward but it will lead to some confusion. We should try to do our best to document the problem (documentation FAQ entry?) and also raise awareness that a change in behavior is coming.

@cgwalters
Member

Ugh...this is a mess indeed. That's really really unfortunate that we managed to break the default NIC naming scheme 😦

@dustymabe
Member

We discussed this during the meeting today:

* AGREED: For this issue we'd like to correct the behavior but realize
    it would be a disruption for existing nodes that upgrade. We propose
    to add a barrier node in our update graph that will keep behavior
    for updating nodes the same, while fixing the bug and causing new
    behavior for newly installed nodes.  (dustymabe, 17:26:08)

As for when we should make the change, the current proposal is that we wait for all the Fedora 32 changes to propagate out (since there is a lot going on there) and then properly implement this change and communicate it to users.

@dustymabe dustymabe added jira for syncing to jira and removed meeting topics for meetings labels Jun 3, 2020
elemental-lf added a commit to elemental-lf/fedora-coreos-config that referenced this issue Jun 5, 2020
@tkarls

tkarls commented Jun 24, 2020

I agree with your proposed plan to roll this out as an upgrade that only affects new nodes. However, I am in the process of starting a new deployment now, and I would really like predictable network naming from the start.

Is there anything I can do in my ignition file today that will turn this feature on?

@dustymabe dustymabe reopened this Jun 25, 2020
dustymabe added a commit to dustymabe/fedora-coreos-streams that referenced this issue Jun 26, 2020
This release bumps the package set and also restores persistent NIC
naming. See the following issue for context:

coreos/fedora-coreos-tracker#484
dustymabe added a commit to coreos/fedora-coreos-streams that referenced this issue Jun 26, 2020
This release bumps the package set and also restores persistent NIC
naming. See the following issue for context:

coreos/fedora-coreos-tracker#484
@dustymabe
Member

The barrier release (the release that adds net.ifnames=0 to kargs) for the testing stream is rolling out now: 32.20200615.2.2

@dustymabe
Member

The release that fixes this for the next stream is rolling out now: 32.20200625.1.0

@dustymabe
Member

> The release that fixes this for the next stream is rolling out now: 32.20200625.1.0

@tkarls FYI ^^ - If you want to do a proof of concept go grab the latest next stream artifacts from the website.

dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Jun 26, 2020
Now that we are using persistent NIC naming in FCOS we'll update the
test to also use those interface names. See:
coreos/fedora-coreos-tracker#484
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Jun 26, 2020
Now that we are using persistent NIC naming in FCOS we'll update the
test to also use those interface names. See:
coreos/fedora-coreos-tracker#484

Also add a test that makes sure that net.ifnames=0 works as desired.
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Jun 26, 2020
Now that we are using persistent NIC naming in FCOS we'll update the
test to also use those interface names. See:
coreos/fedora-coreos-tracker#484

Also add a test that makes sure that net.ifnames=0 works as desired.
elemental-lf added a commit to elemental-lf/fedora-coreos-config that referenced this issue Jun 29, 2020
travier added a commit to travier/fedora-coreos-streams that referenced this issue Jun 30, 2020
This a barrier release for coreos/fedora-coreos-tracker#484
This also adds the GCP stream.
travier added a commit to travier/fedora-coreos-streams that referenced this issue Jun 30, 2020
This a barrier release for coreos/fedora-coreos-tracker#484
This also adds the GCP Cloud launchable stream.
travier added a commit to coreos/fedora-coreos-streams that referenced this issue Jun 30, 2020
This a barrier release for coreos/fedora-coreos-tracker#484
This also adds the GCP Cloud launchable stream.
@lucab
Contributor

lucab commented Jul 1, 2020

Barriers to pin the legacy names are now in place on all the streams:

  • stable - 32.20200615.3.0
  • testing - 32.20200615.2.2
  • next - 32.20200615.1.3

@lucab lucab added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed meeting topics for meetings labels Jul 1, 2020
@dustymabe
Member

The fix for this went into next stream release 32.20200625.1.0. Please try out the new release and report issues.

The fix for this went into testing stream release 32.20200629.2.0. Please try out the new release and report issues.

@dustymabe
Member

The fix for this went into stable stream release 32.20200629.3.0.
