networking.vlans fails to bring up interfaces at boot (RTNETLINK answers: Network is down) #28620

Closed
yesbox opened this issue Aug 27, 2017 · 10 comments · Fixed by #44347

@yesbox
Contributor

yesbox commented Aug 27, 2017

Issue description

VLAN interfaces created by networking.vlans no longer work properly after a reboot. The VLAN interface service fails with the error "RTNETLINK answers: Network is down". This is a regression from NixOS 17.03; I don't know when this functionality broke.

Steps to reproduce

The following was reproduced in a VM with a single interface named "ens32".
The same issue is present on different hardware.

Added to configuration.nix:

  networking = {
    vlans = {
      testvlan = { id = 10; interface = "ens32"; };
    };
  };

When switching to this config, the interface "testvlan" comes up.

After rebooting, the interface is not created, although it should be:

root@nixos> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:7e:b7:7f brd ff:ff:ff:ff:ff:ff

root@nixos> systemctl status testvlan-netdev.service
● testvlan-netdev.service - Vlan Interface testvlan
   Loaded: loaded (/nix/store/hkwsjqpdcwzwr6y0kljjafw3cmalcnw3-unit-testvlan-netdev.service/testvlan-netdev.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2017-08-27 20:35:06 CEST; 33min ago
  Process: 963 ExecStopPost=/nix/store/lmbchj0zb85s7gbwr8dlkwqr75pgmajf-unit-script/bin/testvlan-netdev-post-stop (code=exited, status=0/SUCCESS)
  Process: 930 ExecStart=/nix/store/cj7d103w98rp3gws07swkmmm34dmk8s3-unit-script/bin/testvlan-netdev-start (code=exited, status=2)
 Main PID: 930 (code=exited, status=2)

Aug 27 20:35:06 nixos systemd[1]: Starting Vlan Interface testvlan...
Aug 27 20:35:06 nixos testvlan-netdev-start[930]: RTNETLINK answers: Network is down
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 27 20:35:06 nixos systemd[1]: Failed to start Vlan Interface testvlan.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Unit entered failed state.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Failed with result 'exit-code'.

root@nixos> journalctl -b | grep -i vlan
Aug 27 20:35:03 nixos systemd[1]: testvlan-netdev.service: Dependency Before=sys-subsystem-net-devices-testvlan.device ignored (.device units cannot be delayed)
Aug 27 20:35:06 nixos systemd[1]: Starting Vlan Interface testvlan...
Aug 27 20:35:06 nixos kernel: 8021q: 802.1Q VLAN Support v1.8
Aug 27 20:35:06 nixos testvlan-netdev-start[930]: RTNETLINK answers: Network is down
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 27 20:35:06 nixos systemd[1]: Failed to start Vlan Interface testvlan.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Unit entered failed state.
Aug 27 20:35:06 nixos systemd[1]: testvlan-netdev.service: Failed with result 'exit-code'.
Aug 27 20:35:06 nixos kernel: 8021q: adding VLAN 0 to HW filter on device ens32

Starting the service manually after boot brings the interface up.

root@nixos> systemctl start testvlan-netdev.service

root@nixos> systemctl status testvlan-netdev.service
● testvlan-netdev.service - Vlan Interface testvlan
   Loaded: loaded (/nix/store/hkwsjqpdcwzwr6y0kljjafw3cmalcnw3-unit-testvlan-netdev.service/testvlan-netdev.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sun 2017-08-27 22:03:29 CEST; 15s ago
  Process: 963 ExecStopPost=/nix/store/lmbchj0zb85s7gbwr8dlkwqr75pgmajf-unit-script/bin/testvlan-netdev-post-stop (code=exited, status=0/SUCCESS)
  Process: 5038 ExecStart=/nix/store/cj7d103w98rp3gws07swkmmm34dmk8s3-unit-script/bin/testvlan-netdev-start (code=exited, status=0/SUCCESS)
 Main PID: 5038 (code=exited, status=0/SUCCESS)

Aug 27 22:03:29 nixos systemd[1]: Starting Vlan Interface testvlan...
Aug 27 22:03:29 nixos systemd[1]: Started Vlan Interface testvlan.

root@nixos> ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:7e:b7:7f brd ff:ff:ff:ff:ff:ff
4: testvlan@ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:0c:29:7e:b7:7f brd ff:ff:ff:ff:ff:ff
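
For reference, the failing step can be approximated by hand. The sketch below is an assumption about what the generated testvlan-netdev unit roughly does, not a copy of the NixOS script: it creates the VLAN link with iproute2 and then tries to set it up, which is the step expected to be rejected with "Network is down" while the parent is still administratively down. The function name create_vlan and the IP override variable are hypothetical, added only so the commands can be dry-run.

```shell
# Hypothetical helper: create a VLAN link on top of a parent interface.
# IP can be overridden (e.g. IP=echo) to print the commands instead of running them.
create_vlan() {
    parent=$1 name=$2 id=$3
    ip_cmd=${IP:-ip}
    # Create the 802.1Q link on top of the parent interface.
    "$ip_cmd" link add link "$parent" name "$name" type vlan id "$id" || return 1
    # Bringing the VLAN link up is the step assumed to fail with
    # "RTNETLINK answers: Network is down" while the parent is down.
    "$ip_cmd" link set dev "$name" up
}
```

Run as root, `create_vlan ens32 testvlan 10` would match the configuration above; with `IP=echo` set it only prints the two ip invocations.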

Technical details

  • System: 17.09pre113138.96457d26dd
  • Nix version: 1.11.13
  • Nixpkgs version: 17.09pre113138.96457d26dd
  • Sandboxing enabled: Yes
@globin globin added this to the 17.09 milestone Aug 27, 2017
@fpletz fpletz added this to Already being worked on in Blocking Issues 17.09 Sep 10, 2017
@fpletz fpletz self-assigned this Sep 24, 2017
fpletz added a commit that referenced this issue Sep 25, 2017
@fpletz fpletz closed this as completed in 263185a Sep 25, 2017
@fpletz fpletz moved this from Already being worked on to Done in Blocking Issues 17.09 Sep 25, 2017
@yesbox
Contributor Author

yesbox commented Sep 25, 2017

Thanks! I can confirm this fixed the issue.

@fpletz
Member

fpletz commented Sep 25, 2017

Awesome! Thanks for testing!

@jemilsson

jemilsson commented Jun 22, 2018

I am still experiencing this issue. I have a fairly complex networking setup with several vlan interfaces attached to a single physical interface. More often than not the system fails to bring up some of the vlan interfaces. It seems as if it tries to bring up the vlan interface before the physical interface is completely up and running.

This is what it often looks like in the journal:

Jun 22 08:23:22 brody systemd[1]: Found device Ethernet Connection I354.
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1001...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface lan-1...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1005...
Jun 22 08:23:22 brody systemd[1]: Starting Address configuration of enp0s20f0...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1000...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1004...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface management...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface wan...
Jun 22 08:23:22 brody systemd[1]: Starting Link configuration of enp0s20f0...
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1002...
Jun 22 08:23:22 brody kernel: 8021q: 802.1Q VLAN Support v1.8
Jun 22 08:23:22 brody network-link-enp0s20f0-start[669]: Configuring link...
Jun 22 08:23:22 brody lan-1-netdev-start[643]: RTNETLINK answers: Network is down
Jun 22 08:23:22 brody systemd[1]: Starting Vlan Interface vlan1006...
Jun 22 08:23:22 brody vlan1001-netdev-start[642]: RTNETLINK answers: Network is down
Jun 22 08:23:22 brody systemd[1]: vlan1001-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 22 08:23:22 brody systemd[1]: lan-1-netdev.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 22 08:23:22 brody systemd[1]: Started Address configuration of enp0s20f0.
Jun 22 08:23:22 brody systemd[1]: Found device /sys/subsystem/net/devices/vlan1001.
Jun 22 08:23:22 brody systemd[1]: Starting Link configuration of vlan1001...
Jun 22 08:23:22 brody systemd[1]: Found device /sys/subsystem/net/devices/lan-1.
Jun 22 08:23:22 brody network-link-vlan1001-start[703]: Configuring link...
Jun 22 08:23:22 brody systemd[1]: Starting Link configuration of lan-1...
Jun 22 08:23:22 brody network-link-lan-1-start[705]: Configuring link...
Jun 22 08:23:22 brody kernel: IPv6: ADDRCONF(NETDEV_UP): enp0s20f0: link is not ready
Jun 22 08:23:22 brody kernel: 8021q: adding VLAN 0 to HW filter on device enp0s20f0
Jun 22 08:23:22 brody kernel: IPv6: ADDRCONF(NETDEV_UP): lan-1: link is not ready
Jun 22 08:23:22 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1001: link is not ready
Jun 22 08:23:22 brody network-link-enp0s20f0-start[669]: bringing up interface... done
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1005: link is not ready
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): management: link is not ready
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1004: link is not ready
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1000: link is not ready
Jun 22 08:23:23 brody network-link-lan-1-start[705]: bringing up interface... done
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): wan: link is not ready
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface vlan1005.
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface vlan1000.
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1002: link is not ready
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface vlan1004.
Jun 22 08:23:23 brody kernel: IPv6: ADDRCONF(NETDEV_UP): vlan1006: link is not ready
Jun 22 08:23:23 brody systemd[1]: Started Vlan Interface management.
Jun 22 08:23:23 brody network-link-vlan1001-start[703]: bringing up interface... Cannot find device "vlan1001"
Jun 22 08:23:23 brody network-link-vlan1001-start[703]: failed
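
If the journal above does reflect a race between the VLAN units and the link configuration of enp0s20f0, one way to test that theory by hand is to wait for the parent to be administratively up before touching the VLAN links. The following is only a rough sketch under that assumption, not the fix proposed in any PR. The helper names parent_is_up and wait_for_parent and the SYS_NET override are hypothetical; the check reads the interface flags from sysfs, where bit 0x1 is IFF_UP.

```shell
# Hypothetical helper: succeed if the interface is administratively up.
# SYS_NET can be overridden for testing; it defaults to the real sysfs path.
parent_is_up() {
    flags_file="${SYS_NET:-/sys/class/net}/$1/flags"
    [ -r "$flags_file" ] || return 1
    flags=$(cat "$flags_file")
    # Bit 0x1 of the interface flags is IFF_UP.
    [ $(( $flags & 1 )) -eq 1 ]
}

# Poll until the parent is up, giving up after a number of attempts.
wait_for_parent() {
    parent=$1 attempts=${2:-10} delay=${3:-1}
    while [ "$attempts" -gt 0 ]; do
        parent_is_up "$parent" && return 0
        attempts=$(( attempts - 1 ))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_parent enp0s20f0 && systemctl start vlan1001-netdev.service` would only start the unit once the parent reports IFF_UP, which should show whether ordering is really the cause.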

The complete nixos configuration for a system with this issue can be found here: https://github.com/jemilsson/nixos-configuration/blob/master/machines/brody/configuration.nix.

I am running NixOS 18.03.132748.68e02f8ff21 (Impala) on a physical machine.

I would be happy to assist in troubleshooting this issue.

@yesbox
Contributor Author

yesbox commented Jun 23, 2018

I too am having issues once more on 18.03; booting with VLAN interfaces is very unreliable. I have not taken the time to look into it, but I can also try to assist.

@jemilsson

@fpletz Could you perhaps reopen this issue, or would it be better to create a new one?

@zhangyoufu
Contributor

I'm experiencing random failures during boot on NixOS 18.03.
Please review my PR.

@yesbox
Contributor Author

yesbox commented Aug 2, 2018

The issue is intermittent for me, so it's difficult to say for sure. I can reboot 5 times in a row successfully and then fail 3 times in a row, which is roughly what happened when I tried to reproduce this now.

With #44347 applied to NixOS 18.03, it worked every time across about 10 reboots... today anyway. After rolling back to the pre-patch state, I got it to fail again after just a couple of attempts.

@Mic92
Member

Mic92 commented Aug 3, 2018

Given all the hassle with our networking scripts, should we enable the networkd backend for the networking module by default?

@fpletz
Member

fpletz commented Aug 3, 2018

@Mic92 I'm currently working on making that a reality for 18.09.

@yesbox
Contributor Author

yesbox commented Aug 8, 2018

@fpletz Cool, is there an issue tracking that somewhere? How do I try it: is it networking.useNetworkd, systemd.network.enable, or both?

Shouldn't this issue be reopened? The issue as it's described in the first comment is still true, only harder to reproduce.

Update: I guess it's #10001.

@fpletz fpletz reopened this Aug 13, 2018