vmware: persistent network interface name changed on upgrade #2437

Closed
philosifer opened this Issue May 24, 2018 · 30 comments

philosifer commented May 24, 2018

Issue Report

An automatic upgrade from 1688.5.3 to 1745.3.1 changed the network interface name from ens192 to enp11s0 on all my VMware-based systems. As a consequence, the nodes fell back to DHCP and their IP addresses changed, and flannel then failed to find the interface.

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1745.3.1
VERSION_ID=1745.3.1
BUILD_ID=2018-05-23-0922
PRETTY_NAME="Container Linux by CoreOS 1745.3.1 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Bare-metal kubernetes and etcd running on vmware. VMware is version 6.5 deployed using vmware cloud config.

Expected Behavior

Stable upgrades shouldn't change the adapter name.

Actual Behavior

Interface name changed

Reproduction Steps

Not reproduced, but the same thing happened on 7 VMs before I managed to stop the update service.

Other Information

Is there any way to downgrade again to the old version?

philosifer commented May 24, 2018

Got things working again by adding a udev rule that sets the name back based on the PCI address.

cat << EOF > /etc/udev/rules.d/70-persistent-net.rules
# 0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
ACTION=="add", SUBSYSTEM=="net", KERNELS=="0000:0b:00.0", NAME:="ens192"
EOF
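
The KERNELS value in a rule like this is the PCI address of the device behind the interface. A minimal sketch of one way to look it up from sysfs, assuming the usual /sys/class/net/<iface>/device symlink layout. The function name pci_address_of and the SYSFS_ROOT override are illustrative only and not part of any tool:

```shell
#!/bin/sh
# Print the PCI address of the device backing a network interface,
# suitable for a udev KERNELS=="..." match.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; it defaults to the real /sys.
pci_address_of() {
    iface=$1
    sysfs_root=${SYSFS_ROOT:-/sys}
    # <sysfs>/class/net/<iface>/device is a symlink to the parent bus device.
    dev=$(readlink -f "$sysfs_root/class/net/$iface/device") || return 1
    # The last path component is the bus address, e.g. 0000:0b:00.0.
    basename "$dev"
}

# Usage (on an affected node, with whatever name the interface currently has):
#   pci_address_of enp11s0
```

The address should match the tail of the DEVPATH reported by udevadm for the same interface.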

@philosifer philosifer changed the title from container linux stable upgraded changed network adapter name to container linux stable upgrade changed network adapter name May 24, 2018

Member

lucab commented May 24, 2018

Question for the other folks in here (@zeeZ, @richardmoe, @nrk-msa, @gcyre): is this only happening on vmware instances or elsewhere too?

gcyre commented May 24, 2018

For us it's just VMware instances.

Member

lucab commented May 24, 2018

@philosifer thanks for the report. Do you still have some nodes on the previous stable? If so can you please attach the output of udevadm test /sys/class/net/<NAME> for both the current and the previous stable (without your rules fix)?

@lucab lucab changed the title from container linux stable upgrade changed network adapter name to vmware: persistent network interface name changed on upgrade May 24, 2018

gcyre commented May 24, 2018

@lucab here's the output from one of our nodes on previous stable

udevadm test /sys/class/net/ens192
calling: test
version 237
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

Load module index
Parsed configuration file /usr/lib64/systemd/network/99-default.link
Parsed configuration file /usr/lib64/systemd/network/98-virtio.link
Created link configuration context.
Reading rules file: /usr/lib64/udev/rules.d/10-dm.rules
Reading rules file: /usr/lib64/udev/rules.d/11-dm-lvm.rules
Reading rules file: /usr/lib64/udev/rules.d/11-dm-mpath.rules
Reading rules file: /usr/lib64/udev/rules.d/13-dm-disk.rules
Reading rules file: /usr/lib64/udev/rules.d/50-udev-default.rules
Reading rules file: /usr/lib64/udev/rules.d/56-multipath.rules
Reading rules file: /usr/lib64/udev/rules.d/60-block.rules
Reading rules file: /usr/lib64/udev/rules.d/60-cdrom_id.rules
Reading rules file: /usr/lib64/udev/rules.d/60-drm.rules
Reading rules file: /usr/lib64/udev/rules.d/60-evdev.rules
Reading rules file: /usr/lib64/udev/rules.d/60-input-id.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-alsa.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-input.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-storage-tape.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-storage.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-v4l.rules
Reading rules file: /usr/lib64/udev/rules.d/60-sensor.rules
Reading rules file: /usr/lib64/udev/rules.d/60-serial.rules
Reading rules file: /usr/lib64/udev/rules.d/61-trousers.rules
Reading rules file: /usr/lib64/udev/rules.d/63-md-raid-arrays.rules
Reading rules file: /usr/lib64/udev/rules.d/64-btrfs-dm.rules
Reading rules file: /usr/lib64/udev/rules.d/64-btrfs.rules
Reading rules file: /usr/lib64/udev/rules.d/64-md-raid-assembly.rules
Reading rules file: /usr/lib64/udev/rules.d/65-coreos-kvm.rules
Reading rules file: /usr/lib64/udev/rules.d/66-azure-storage.rules
Reading rules file: /usr/lib64/udev/rules.d/66-kpartx.rules
Reading rules file: /usr/lib64/udev/rules.d/69-dm-lvm-metad.rules
Reading rules file: /usr/lib64/udev/rules.d/70-joystick.rules
Reading rules file: /usr/lib64/udev/rules.d/70-mouse.rules
Reading rules file: /usr/lib64/udev/rules.d/70-power-switch.rules
Reading rules file: /usr/lib64/udev/rules.d/70-touchpad.rules
Reading rules file: /usr/lib64/udev/rules.d/71-seat.rules
Reading rules file: /usr/lib64/udev/rules.d/73-seat-late.rules
Reading rules file: /usr/lib64/udev/rules.d/75-net-description.rules
Reading rules file: /usr/lib64/udev/rules.d/75-probe_mtd.rules
Reading rules file: /usr/lib64/udev/rules.d/78-sound-card.rules
Reading rules file: /usr/lib64/udev/rules.d/79-net-google-compat.rules
Reading rules file: /usr/lib64/udev/rules.d/80-drivers.rules
Reading rules file: /usr/lib64/udev/rules.d/80-net-setup-link.rules
Reading rules file: /usr/lib64/udev/rules.d/90-cloud-storage.rules
Reading rules file: /usr/lib64/udev/rules.d/90-configdrive.rules
Reading rules file: /usr/lib64/udev/rules.d/90-issuegen.rules
Reading rules file: /usr/lib64/udev/rules.d/90-ovfenv.rules
Reading rules file: /usr/lib64/udev/rules.d/90-vconsole.rules
Reading rules file: /usr/lib64/udev/rules.d/90-virtfs-metadata.rules
Reading rules file: /usr/lib64/udev/rules.d/95-dm-notify.rules
Reading rules file: /usr/lib64/udev/rules.d/99-azure-product-uuid.rules
Reading rules file: /usr/lib64/udev/rules.d/99-systemd.rules
rules contain 49152 bytes tokens (4096 * 12 bytes), 15132 bytes strings
2142 strings (28429 bytes), 1424 de-duplicated (14016 bytes), 719 trie nodes used
IMPORT builtin 'net_id' /usr/lib64/udev/rules.d/75-net-description.rules:6
IMPORT builtin 'hwdb' /usr/lib64/udev/rules.d/75-net-description.rules:12
IMPORT builtin 'path_id' /usr/lib64/udev/rules.d/80-net-setup-link.rules:5
IMPORT builtin 'net_setup_link' /usr/lib64/udev/rules.d/80-net-setup-link.rules:9
Config file /usr/lib64/systemd/network/99-default.link applies to device ens192
link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
RUN '/usr/lib/coreos/issuegen add $env{INTERFACE}' /usr/lib64/udev/rules.d/90-issuegen.rules:1
RUN '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/$name --prefix=/net/ipv4/neigh/$name --prefix=/net/ipv6/conf/$name --prefix=/net/ipv6/neigh/$name' /usr/lib64/udev/rules.d/99-systemd.rules:60
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:16.0/0000:0b:00.0/net/ens192
ID_BUS=pci
ID_MODEL_FROM_DATABASE=VMXNET3 Ethernet Controller
ID_MODEL_ID=0x07b0
ID_NET_DRIVER=vmxnet3
ID_NET_LINK_FILE=/usr/lib64/systemd/network/99-default.link
ID_NET_NAME_MAC=enx0050569838c8
ID_NET_NAME_PATH=enp11s0
ID_NET_NAME_SLOT=ens192
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_PATH=pci-0000:0b:00.0
ID_PATH_TAG=pci-0000_0b_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=VMware
ID_VENDOR_ID=0x15ad
IFINDEX=2
INTERFACE=ens192
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens192
TAGS=:systemd:
USEC_INITIALIZED=23788204
run: '/usr/lib/coreos/issuegen add ens192'
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/ens192 --prefix=/net/ipv4/neigh/ens192 --prefix=/net/ipv6/conf/ens192 --prefix=/net/ipv6/neigh/ens192'
Unload module index
Unloaded link configuration context.

gcyre commented May 24, 2018

and new version

udevadm test /sys/class/net/enp11s0
calling: test
version 238
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

Load module index
Parsed configuration file /usr/lib64/systemd/network/99-default.link
Parsed configuration file /usr/lib64/systemd/network/98-virtio.link
Created link configuration context.
Reading rules file: /usr/lib64/udev/rules.d/10-dm.rules
Reading rules file: /usr/lib64/udev/rules.d/11-dm-lvm.rules
Reading rules file: /usr/lib64/udev/rules.d/11-dm-mpath.rules
Reading rules file: /usr/lib64/udev/rules.d/13-dm-disk.rules
Reading rules file: /usr/lib64/udev/rules.d/50-udev-default.rules
Reading rules file: /usr/lib64/udev/rules.d/56-multipath.rules
Reading rules file: /usr/lib64/udev/rules.d/60-block.rules
Reading rules file: /usr/lib64/udev/rules.d/60-cdrom_id.rules
Reading rules file: /usr/lib64/udev/rules.d/60-drm.rules
Reading rules file: /usr/lib64/udev/rules.d/60-evdev.rules
Reading rules file: /usr/lib64/udev/rules.d/60-input-id.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-alsa.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-input.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-storage-tape.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-storage.rules
Reading rules file: /usr/lib64/udev/rules.d/60-persistent-v4l.rules
Reading rules file: /usr/lib64/udev/rules.d/60-sensor.rules
Reading rules file: /usr/lib64/udev/rules.d/60-serial.rules
Reading rules file: /usr/lib64/udev/rules.d/61-trousers.rules
Reading rules file: /usr/lib64/udev/rules.d/63-md-raid-arrays.rules
Reading rules file: /usr/lib64/udev/rules.d/64-btrfs-dm.rules
Reading rules file: /usr/lib64/udev/rules.d/64-btrfs.rules
Reading rules file: /usr/lib64/udev/rules.d/64-md-raid-assembly.rules
Reading rules file: /usr/lib64/udev/rules.d/65-coreos-kvm.rules
Reading rules file: /usr/lib64/udev/rules.d/66-azure-storage.rules
Reading rules file: /usr/lib64/udev/rules.d/66-kpartx.rules
Reading rules file: /usr/lib64/udev/rules.d/69-dm-lvm-metad.rules
Reading rules file: /usr/lib64/udev/rules.d/70-joystick.rules
Reading rules file: /usr/lib64/udev/rules.d/70-mouse.rules
Reading rules file: /usr/lib64/udev/rules.d/70-power-switch.rules
Reading rules file: /usr/lib64/udev/rules.d/70-touchpad.rules
Reading rules file: /usr/lib64/udev/rules.d/71-seat.rules
Reading rules file: /usr/lib64/udev/rules.d/73-seat-late.rules
Reading rules file: /usr/lib64/udev/rules.d/75-net-description.rules
Reading rules file: /usr/lib64/udev/rules.d/75-probe_mtd.rules
Reading rules file: /usr/lib64/udev/rules.d/78-sound-card.rules
Reading rules file: /usr/lib64/udev/rules.d/79-net-google-compat.rules
Reading rules file: /usr/lib64/udev/rules.d/80-drivers.rules
Reading rules file: /usr/lib64/udev/rules.d/80-net-setup-link.rules
Reading rules file: /usr/lib64/udev/rules.d/90-cloud-storage.rules
Reading rules file: /usr/lib64/udev/rules.d/90-configdrive.rules
Reading rules file: /usr/lib64/udev/rules.d/90-issuegen.rules
Reading rules file: /usr/lib64/udev/rules.d/90-ovfenv.rules
Reading rules file: /usr/lib64/udev/rules.d/90-vconsole.rules
Reading rules file: /usr/lib64/udev/rules.d/90-virtfs-metadata.rules
Reading rules file: /usr/lib64/udev/rules.d/95-dm-notify.rules
Reading rules file: /usr/lib64/udev/rules.d/99-azure-product-uuid.rules
Reading rules file: /usr/lib64/udev/rules.d/99-systemd.rules
rules contain 49152 bytes tokens (4096 * 12 bytes), 15132 bytes strings
2145 strings (28452 bytes), 1427 de-duplicated (14039 bytes), 719 trie nodes used
IMPORT builtin 'net_id' /usr/lib64/udev/rules.d/75-net-description.rules:6
IMPORT builtin 'hwdb' /usr/lib64/udev/rules.d/75-net-description.rules:12
IMPORT builtin 'path_id' /usr/lib64/udev/rules.d/80-net-setup-link.rules:5
IMPORT builtin 'net_setup_link' /usr/lib64/udev/rules.d/80-net-setup-link.rules:9
Config file /usr/lib64/systemd/network/99-default.link applies to device enp11s0
link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
RUN '/usr/lib/coreos/issuegen add $env{INTERFACE}' /usr/lib64/udev/rules.d/90-issuegen.rules:1
RUN '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/$name --prefix=/net/ipv4/neigh/$name --prefix=/net/ipv6/conf/$name --prefix=/net/ipv6/neigh/$name' /usr/lib64/udev/rules.d/99-systemd.rules:60
ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:16.0/0000:0b:00.0/net/enp11s0
ID_BUS=pci
ID_MODEL_FROM_DATABASE=VMXNET3 Ethernet Controller
ID_MODEL_ID=0x07b0
ID_NET_DRIVER=vmxnet3
ID_NET_LINK_FILE=/usr/lib64/systemd/network/99-default.link
ID_NET_NAME_MAC=enx005056989be0
ID_NET_NAME_PATH=enp11s0
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_PATH=pci-0000:0b:00.0
ID_PATH_TAG=pci-0000_0b_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=VMware
ID_VENDOR_ID=0x15ad
IFINDEX=2
INTERFACE=enp11s0
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/enp11s0
TAGS=:systemd:
USEC_INITIALIZED=9711843
run: '/usr/lib/coreos/issuegen add enp11s0'
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/enp11s0 --prefix=/net/ipv4/neigh/enp11s0 --prefix=/net/ipv6/conf/enp11s0 --prefix=/net/ipv6/neigh/enp11s0'
Unload module index
Unloaded link configuration context.

philosifer commented May 24, 2018

gcyre beat me to it, but I've stopped update-engine for now on one of my etcd nodes, so if you want my output as well let me know.

Member

bgilbert commented May 24, 2018

The obvious difference between the dumps is that ID_NET_NAME_SLOT=ens192 is missing from the new one.
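
One way to spot this kind of difference is to compare only the ID_NET_NAME_* property names from the two dumps. The sketch below inlines the relevant lines from the dumps above rather than reading real files, and the helper name net_name_keys is made up for illustration:

```shell
#!/bin/sh
# Reduce a udevadm dump on stdin to the sorted ID_NET_NAME_* property names.
net_name_keys() {
    grep '^ID_NET_NAME_' | cut -d= -f1 | sort
}

# Properties copied from the previous-stable dump (udev version 237).
old_keys=$(net_name_keys <<'EOF'
ID_NET_NAME_MAC=enx0050569838c8
ID_NET_NAME_PATH=enp11s0
ID_NET_NAME_SLOT=ens192
EOF
)

# Properties copied from the new-stable dump (udev version 238).
new_keys=$(net_name_keys <<'EOF'
ID_NET_NAME_MAC=enx005056989be0
ID_NET_NAME_PATH=enp11s0
EOF
)

# Collect every property name present in the old dump but absent from the new one.
missing=""
for key in $old_keys; do
    printf '%s\n' "$new_keys" | grep -qxF "$key" || missing="$missing $key"
done
echo "missing from new dump:$missing"
```

With the slot-based name gone, the default link policy presumably fell through to the path-based name, which would explain enp11s0 (the ID_NET_NAME_PATH value in both dumps) becoming the interface name.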

Member

bgilbert commented May 24, 2018

Looks like the issue is probably systemd/systemd#8446, fixed by systemd/systemd#8458.

zeeZ commented May 24, 2018

@lucab Yes vmware, after upgrade from an existing vmware_raw iso installation.

Since we know all MAC addresses beforehand, my workaround is putting a copy of /usr/lib/systemd/network/99-default.link next to the network configs with a MACAddress match and the NamePolicy replaced with a static Name to make sure names actually do stay the same.
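
A rough sketch of what such a per-interface link file might look like, written as a small shell helper. The helper name write_static_link and the output file name are hypothetical, and the sketch contains only the match and the static name; any other [Link] settings from the stock 99-default.link would need to be carried over by hand:

```shell
#!/bin/sh
# Write a systemd .link file that pins an interface name to a MAC address.
# Arguments: <mac-address> <interface-name> <output-file>
write_static_link() {
    mac=$1
    name=$2
    out=$3
    cat > "$out" <<EOF
[Match]
MACAddress=$mac

[Link]
Name=$name
EOF
}

# Example, using the MAC from the previous-stable dump above (the file name
# is an assumption; it only needs to sort before 99-default.link):
#   write_static_link 00:50:56:98:38:c8 ens192 /etc/systemd/network/00-ens192.link
```

systemd applies the first matching .link file in lexical order, so the file has to sort ahead of 99-default.link for the static name to win.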

Member

bgilbert commented May 24, 2018

Confirmed that systemd/systemd#8458 fixes it.

philosifer commented May 24, 2018

@bgilbert Thanks for the update. Mine is an upgraded vmware raw iso also so that's a match for me. I've now switched one of my clusters from stable to beta so I should pick this sort of thing up before it hits production next time and I can remove my workaround easily once this fix gets into the releases.

Member

bgilbert commented May 24, 2018

This should be fixed in alpha 1786.1.0 and stable 1745.4.0, due shortly. The next beta release will also include the fix. Thanks for the report.

@bgilbert bgilbert closed this May 24, 2018

Member

bgilbert commented May 24, 2018

@philosifer And thanks for switching one of your clusters to beta. 😄

scott1138 commented May 25, 2018

@philosifer I hate to ask this here, but how did you get into the VM to change the network?

philosifer commented May 25, 2018

scott1138 commented May 25, 2018

Member

bgilbert commented May 31, 2018

This should be fixed in beta 1772.2.0, due shortly.

philosifer commented Jun 1, 2018

Confirmed working fine with my workaround removed in 1772.2.0

Savemech commented Jun 1, 2018

Maybe this sounds crazy, but could the team stop rolling out the version that is causing problems?

scott1138 commented Jun 1, 2018

FYI - if you don't have DHCP there is a way to get into the console and roll back to the previous OS version

Savemech commented Jun 1, 2018

Member

bgilbert commented Jun 1, 2018

@Savemech We did stop rolling stable until it was fixed. We didn't stop beta, since the previous beta also had the problem. As of today, all three channels include the fix.

Savemech commented Jun 1, 2018

Member

bgilbert commented Jun 1, 2018

@Savemech I'm sorry this bug caused so much additional work for you. By default, Container Linux will DHCP on any network interface it finds, regardless of the interface name, so typical network configurations should not have been affected. In what way do your nodes depend on the interface name? Do you have static IP address bindings, firewall rules, something else?

You can configure locksmith to allow a limited number of machines in a cluster to reboot at a time. That way, if this happened again, you'd lose only a small number of nodes and would have time to track down the problem. Unfortunately that doesn't address more subtle issues that still allow the machine to boot, but at least in that case you'd likely have shell access to fix the problem.
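
As a hedged illustration of the locksmith setup described above: the sketch below switches a node to the etcd-lock reboot strategy by editing an update.conf-style file. The helper name set_reboot_strategy is invented for this example, and the follow-up commands in the comments assume a working etcd cluster and root access:

```shell
#!/bin/sh
# Set REBOOT_STRATEGY in an update.conf-style file, replacing any existing value.
# Arguments: <strategy> [config-file]  (config file defaults to /etc/coreos/update.conf)
set_reboot_strategy() {
    strategy=$1
    conf=${2:-/etc/coreos/update.conf}
    touch "$conf"
    # Copy every line except a previous REBOOT_STRATEGY setting.
    # grep exits non-zero when nothing is left, which is fine here.
    grep -v '^REBOOT_STRATEGY=' "$conf" > "$conf.tmp" || true
    echo "REBOOT_STRATEGY=$strategy" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}

# Possible usage on a node (as root):
#   set_reboot_strategy etcd-lock
#   systemctl restart locksmithd
#   locksmithctl set-max 1    # at most one machine holds the reboot lock
#   locksmithctl status       # show current lock holders
```

With a lock limit of one, a bad update that takes a node off the network would leave that node holding the lock, so the remaining machines should stay on the old version until someone intervenes.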

One way you can help avoid similar issues in the future is to run some of your nodes on the alpha or beta channels. This problem wasn't caught by our internal testing, and we didn't receive any reports about it while the change was in alpha or beta, so unfortunately we weren't aware of the problem until the bug was partially rolled out to the stable channel. If you run some nodes on alpha or beta, you can help us catch these sorts of problems early.

scott1138 commented Jun 1, 2018

@bgilbert I only have 10 nodes on vSphere, but we use static IPs. The instructions for changing some of those settings could be better. I followed a KB for disabling reboots so it wouldn't continue to affect my remaining nodes when we saw it (I had two nodes down), but it just kept rebooting anyway. I see in one of the docs that support for maintenance windows is coming to Container Linux. Any chance that will be happening soon?

Member

bgilbert commented Jun 2, 2018

@scott1138 Whoops, the maintenance window documentation wasn't updated when we added support in Container Linux Configs; thanks for pointing that out! Fixed in coreos/docs#1244. Which instructions did you follow for disabling reboots?

scott1138 commented Jun 3, 2018

Member

bgilbert commented Jun 12, 2018

@scott1138 To confirm, you're running Tectonic, or at least the Container Linux Update Operator?

scott1138 commented Jun 13, 2018
