Integration test failure: test_ovs_member_interfaces_not_excluded #4350

Open
TheRealFalcon opened this issue Aug 15, 2023 · 5 comments
Labels
bug Something isn't working correctly


@TheRealFalcon
Member

TheRealFalcon commented Aug 15, 2023

tests/integration_tests/bugs/test_lp1898997.py::TestInterfaceListingWithOpenvSwitch::test_ovs_member_interfaces_not_excluded is currently failing on Ubuntu Lunar under LXD VMs.

        gateway = client.execute(
            "ip -4 route show default | awk '{ print $3 }'"
        )

This call is unexpectedly returning an empty string, causing the assert on the following line to fail.
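For reference, a minimal sketch of what that pipeline extracts, written as a pure function so the empty-output case is explicit. The function name `default_gateway` and the sample address in the usage note are hypothetical and are not part of the cloud-init test suite.

```python
from typing import Optional


def default_gateway(route_output: str) -> Optional[str]:
    """Return the gateway from `ip -4 route show default` output, or None.

    Mirrors the awk '{ print $3 }' step, but only for lines shaped like
    "default via <gateway> dev <iface> ...".
    """
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            return fields[2]
    # No default route at all: this is the situation described in this issue.
    return None
```

With no default route the function returns `None`, which corresponds to the empty string the test sees; a line such as `default via 10.0.0.1 dev ovs-br` (an invented address) would yield `10.0.0.1`.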

The test config indicates that the ovs-br interface should get a DHCPv4 address, yet it is not getting one:

root@cloudinit-0815-152252wjsu05rn:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether 02:00:00:06:f5:d2 brd ff:ff:ff:ff:ff:ff
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ba:a3:17:61:15:df brd ff:ff:ff:ff:ff:ff
4: ovs-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 22:6c:90:99:14:46 brd ff:ff:ff:ff:ff:ff
    inet6 fd42:eaab:10a1:ad48:206c:90ff:fe99:1446/64 scope global mngtmpaddr noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fed9:81c/64 scope link 
       valid_lft forever preferred_lft forever

A journal snippet shows networkd failing to find the ovs-br device:
Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn systemd-networkd[309]: /run/systemd/network/10-netplan-enp5s0.network: ovs-br NetDev could not be found, ignoring assignment
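A small sketch for spotting this warning when scanning a journal dump. The regular expression is written against the message format shown in the line above and may not match other systemd versions; the names `MISSING_NETDEV` and `find_missing_netdev` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches "<file>: <name> NetDev could not be found" as logged by networkd above.
MISSING_NETDEV = re.compile(
    r"systemd-networkd\[\d+\]: (?P<file>\S+): (?P<netdev>\S+) "
    r"NetDev could not be found"
)


def find_missing_netdev(line: str) -> Optional[Tuple[str, str]]:
    """Return (network_file, netdev_name) if the line is the warning, else None."""
    match = MISSING_NETDEV.search(line)
    if match is None:
        return None
    return match.group("file"), match.group("netdev")
```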

If I manually run netplan apply, everything works as expected.

I couldn't find anything out of place in the cloud-init logs:

2023-08-15 15:23:29,719 - networking.py[DEBUG]: net: all expected physical devices present
2023-08-15 15:23:29,719 - stages.py[DEBUG]: applying net config names for {'bridges': {'ovs-br': {'dhcp4': True, 'interfaces': ['enp5s0'], 'macaddress': '52:54:00:d9:08:1c', 'mtu': 1500, 'openvswitch': {}}}, 'ethernets': {'enp5s0': {'mtu': 1500, 'set-name': 'enp5s0', 'match': {'macaddress': '02:00:00:06:f5:d2'}}}, 'version': 2}

and the problem that originally caused #3792 is not happening here.

tests/integration_tests/bugs/test_lp1912844.py::test_get_interfaces_by_mac_doesnt_traceback is failing similarly. The call in the test works fine, but the interface isn't coming up so pycloudlib thinks the instance is down.

Since the issue appears to be unrelated to cloud-init, I suggest we skip/xfail the tests until the root cause is fixed.

@TheRealFalcon TheRealFalcon added the bug Something isn't working correctly label Aug 15, 2023
@blackboxsw blackboxsw assigned blackboxsw and unassigned blackboxsw Aug 16, 2023
@TheRealFalcon
Member Author

/etc/netplan/50-cloud-init.yaml:

network:
    bridges:
        ovs-br:
            dhcp4: true
            interfaces:
            - enp5s0
            macaddress: 52:54:00:d9:08:1c
            mtu: 1500
            openvswitch: {}
    ethernets:
        enp5s0:
            match:
                macaddress: 02:00:00:8c:54:57
            mtu: 1500
            set-name: enp5s0
    version: 2

/run/systemd/network/10-netplan-enp5s0.link:

[Match]
PermanentMACAddress=02:00:00:8c:54:57

[Link]
Name=enp5s0
WakeOnLan=off
MTUBytes=1500

/run/systemd/network/10-netplan-enp5s0.network:

[Match]
PermanentMACAddress=02:00:00:8c:54:57
Name=enp5s0

[Link]
MTUBytes=1500

[Network]
LinkLocalAddressing=no
Bridge=ovs-br

/run/systemd/network/10-netplan-ovs-br.network:

[Match]
Name=ovs-br

[Link]
MTUBytes=1500
MACAddress=52:54:00:d9:08:1c

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6
ConfigureWithoutCarrier=yes

[DHCP]
RouteMetric=100
UseMTU=true
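To make the relationship between these generated files easier to check, here is a rough sketch that reads the `Bridge=` assignment from `.network` file contents and reports any bridge name that is not in a given set of known netdevs. It assumes the networkd warning means no definition for the bridge was known when the file was parsed, which is not confirmed in this thread. The helper names and the `netdev_names` input are hypothetical.

```python
import configparser
from typing import Dict, Optional, Set


def bridge_of(network_text: str) -> Optional[str]:
    """Return the [Network] Bridge= value of a .network file, or None."""
    parser = configparser.ConfigParser(
        delimiters=("=",), strict=False, interpolation=None
    )
    parser.optionxform = str  # keep key case, e.g. "Bridge"
    parser.read_string(network_text)
    return parser.get("Network", "Bridge", fallback=None)


def unresolved_bridges(
    network_files: Dict[str, str], netdev_names: Set[str]
) -> Dict[str, str]:
    """Map file name -> bridge name for bridges missing from netdev_names."""
    missing = {}
    for file_name, text in network_files.items():
        bridge = bridge_of(text)
        if bridge is not None and bridge not in netdev_names:
            missing[file_name] = bridge
    return missing
```

Fed the two `.network` files above and an empty set of netdev names, this reports `ovs-br` for `10-netplan-enp5s0.network`, which lines up with the file named in the journal warning.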

@TheRealFalcon
Member Author

@slyon Does anything here look out of the ordinary from a netplan perspective? I think this is a bug outside of cloud-init, but I'm not sure where to file it.

@slyon
Contributor

slyon commented Aug 21, 2023

This smells like a systemd dependency issue to me: we can see an ovs-vsctl call failing because OVS is not ready yet, so the ovs-br NetDev is probably not created:

Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn ovs-vsctl[305]: ovs|00001|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn systemd[1]: Finished cloud-init-local.service - Initial cloud-init job (pre-networking).
Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn systemd[1]: Reached target network-pre.target - Preparation for Network.
Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn systemd[1]: Starting ovsdb-server.service - Open vSwitch Database Unit...
Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn systemd[1]: Starting systemd-networkd.service - Network Configuration...
Aug 15 15:36:12 cloudinit-0815-152252wjsu05rn systemd-networkd[309]: /run/systemd/network/10-netplan-enp5s0.network: ovs-br NetDev could not be found, ignoring assignment.

Can we please get the output of systemctl status netplan-ovs-ovs-br.service and the contents of /run/systemd/system/systemd-networkd.service.wants/netplan-ovs-ovs-br.service?

The ovsdb-server.service systemd service needs to be ready when Netplan wants to create that OVS bridge, and I suppose this is not the case here for some reason.
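As a debugging aid only (this is not how Netplan or systemd orders the services), one could measure how long the OVS database socket takes to appear during boot. The sketch below polls for a path with a timeout; the function name `wait_for_path` is hypothetical, and the socket path in the usage note is the one from the ovs-vsctl error above.

```python
import os
import time


def wait_for_path(path: str, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until `path` exists; return False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_path("/var/run/openvswitch/db.sock")` returns `True` as soon as the socket exists, or `False` after ten seconds.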

@slyon
Contributor

slyon commented Aug 21, 2023

So I tried building a Netplan integration test around it, and it seems to pass. But then again, Netplan's tests call netplan apply and do not go through the full early-boot service dependency mangling that cloud-init goes through, which I think is the issue here.

Can we try to trace the exact sequence of services being started by systemd in this scenario?

diff --git a/tests/integration/ovs.py b/tests/integration/ovs.py
index 958800cd..c6bdad8e 100644
--- a/tests/integration/ovs.py
+++ b/tests/integration/ovs.py
@@ -604,6 +607,35 @@ class _CommonTests():
             self.assertIn(b'netplan=true', before['external-ids-%s' % tbl])
             self.assertIn(b'netplan=true', after['external-ids-%s' % tbl])
 
+    def test_cloudinit_member_interface_not_excluded(self):
+        '''
+        https://github.com/canonical/cloud-init/issues/4350
+        '''
+        self.setup_eth('ra-only', True)
+        self.addCleanup(subprocess.call, ['ovs-vsctl', '-t', '5', '--if-exists', 'del-br', 'ovs-br'])
+        with open(self.config, 'w') as f:
+            f.write('''network:
+    bridges:
+        ovs-br:
+            dhcp6: true
+            interfaces:
+            - %(ec)s
+            macaddress: 52:54:00:d9:08:1c
+            mtu: 1500
+            openvswitch: {}
+    ethernets:
+        %(ec)s:
+            mtu: 1500
+    version: 2''' % {'ec': self.dev_e_client})
+        self.generate_and_settle([self.dev_e_client, self.state_dhcp6('ovs-br')])
+        # Basic verification that the interfaces/ports are set up in OVS
+        out = subprocess.check_output(['ovs-vsctl', '-t', '5', 'show'], text=True)
+        self.assertIn('    Bridge ovs-br', out)
+        self.assertIn('''        Port ovs-br
+            Interface ovs-br
+                type: internal''', out)
+        self.assert_iface_up(self.dev_e_client)
+
     @unittest.skip("For debugging only")
     def test_zzz_ovs_debugging(self):  # Runs as the last test, to collect all logs
         """Display OVS logs of the previous tests"""
$ autopkgtest -U -B . --test-name=ovs --shell -- lxd autopkgtest/ubuntu/mantic/amd64
[...]
autopkgtest [14:47:04]: test ovs: systemctl is-active openvswitch-switch.service && ./debian/tests/prep-testbed.sh && python3 tests/integration/run.py --test=ovs || exit 77
autopkgtest [14:47:04]: test ovs: [-----------------------
active
+ systemctl is-active NetworkManager.service
+ [ active = active ]
+ systemctl stop NetworkManager.service
+ dpkg-vendor --is Debian
+ exit 0
test_bond_base (__main__.TestOVS.test_bond_base) ... eth42 eth43 ovsbr ok
test_bridge_base (__main__.TestOVS.test_bridge_base) ... eth42 eth43 ovsbr ok
test_bridge_non_ovs_bond (__main__.TestOVS.test_bridge_non_ovs_bond) ... eth42 eth43 ovs-br non-ovs-bond ok
test_bridge_patch_ports (__main__.TestOVS.test_bridge_patch_ports) ... br0 br1 ok
test_bridge_vlan (__main__.TestOVS.test_bridge_vlan) ... eth42 br-eth42 br-data br-eth42.100 ok
test_bridge_vlan_deletion (__main__.TestOVS.test_bridge_vlan_deletion) ... eth42 br-eth42 br-eth42.100 eth42 br-eth42 ok
test_cleanup_interfaces (__main__.TestOVS.test_cleanup_interfaces) ... ovs0 ovs1 eth42 ok
test_cleanup_patch_ports (__main__.TestOVS.test_cleanup_patch_ports) ... eth42 ovs0 eth42 ovs1 ok
test_cloudinit_member_interface_not_excluded (__main__.TestOVS.test_cloudinit_member_interface_not_excluded)
https://github.com/canonical/cloud-init/issues/4350 ... eth42 ovs-br .ok
test_missing_ovs_tools (__main__.TestOVS.test_missing_ovs_tools) ... ok
test_ovsdb_server_is_not_running (__main__.TestOVS.test_ovsdb_server_is_not_running) ... ok
test_settings_tag_cleanup (__main__.TestOVS.test_settings_tag_cleanup) ... eth42 eth43 ovs0 ovs1 
** (process:5231): WARNING **: 12:50:21.470: Permissions for /etc/netplan/01-main.yaml are too open. Netplan configuration should NOT be accessible by others.
ok
test_vlan_maas (__main__.TestOVS.test_vlan_maas) ... eth42 ovs0 eth42.21 ok
test_zzz_ovs_debugging (__main__.TestOVS.test_zzz_ovs_debugging)
Display OVS logs of the previous tests ... skipped 'For debugging only'

----------------------------------------------------------------------
Ran 14 tests in 229.549s

OK (skipped=1)
autopkgtest [14:50:54]: test ovs: -----------------------]
autopkgtest [14:50:55]: test ovs:  - - - - - - - - - - results - - - - - - - - - -
ovs                  PASS
autopkgtest [14:51:37]: @@@@@@@@@@@@@@@@@@@@ summary
ovs                  PASS

@TheRealFalcon
Member Author

@slyon, thanks for the response. Sorry for the delay; I was out on PTO last week.

Here's the output requested:

root@cloudinit-0828-205722vfbhipps:~# systemctl status netplan-ovs-ovs-br.service
○ netplan-ovs-ovs-br.service - OpenVSwitch configuration for ovs-br
     Loaded: loaded (/run/systemd/system/netplan-ovs-ovs-br.service; enabled-runtime; preset: enabled)
     Active: inactive (dead) since Mon 2023-08-28 20:58:01 UTC; 2min 12s ago
    Process: 428 ExecStart=/usr/bin/ovs-vsctl --may-exist add-br ovs-br (code=exited, status=0/SUCCESS)
    Process: 440 ExecStart=/usr/bin/ovs-vsctl --may-exist add-port ovs-br enp5s0 (code=exited, status=0/SUCCESS)
    Process: 441 ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan=true (code=exited, status=0/SUCCESS)
    Process: 442 ExecStart=/usr/bin/ovs-vsctl set-fail-mode ovs-br standalone (code=exited, status=0/SUCCESS)
    Process: 443 ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/global/set-fail-mode=standalone (code=exited, status=0/SUCCESS)
    Process: 444 ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br mcast_snooping_enable=false (code=exited, status=0/SUCCESS)
    Process: 445 ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/mcast_snooping_enable=false (code=exited, status=0/SUCCESS)
    Process: 446 ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br rstp_enable=false (code=exited, status=0/SUCCESS)
    Process: 447 ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/rstp_enable=false (code=exited, status=0/SUCCESS)
   Main PID: 447 (code=exited, status=0/SUCCESS)
        CPU: 15ms

Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[440]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --may-exist add-port ovs-br enp5s0
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[441]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan=true
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[442]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set-fail-mode ovs-br standalone
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[443]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/global/set-fail-mode=standalone
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[444]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs-br mcast_snooping_enable=false
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[445]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/mcast_snooping_enable=false
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[446]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs-br rstp_enable=false
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps ovs-vsctl[447]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/rstp_enable=false
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps systemd[1]: netplan-ovs-ovs-br.service: Deactivated successfully.
Aug 28 20:58:01 cloudinit-0828-205722vfbhipps systemd[1]: Finished netplan-ovs-ovs-br.service - OpenVSwitch configuration for ovs-br.

and

root@cloudinit-0828-205722vfbhipps:~# cat /run/systemd/system/systemd-networkd.service.wants/netplan-ovs-ovs-br.service 
[Unit]
Description=OpenVSwitch configuration for ovs-br
DefaultDependencies=no
Wants=ovsdb-server.service
After=ovsdb-server.service
After=netplan-ovs-cleanup.service
Before=network.target
Wants=network.target

[Service]
Type=oneshot
TimeoutStartSec=10s
ExecStart=/usr/bin/ovs-vsctl --may-exist add-br ovs-br
ExecStart=/usr/bin/ovs-vsctl --may-exist add-port ovs-br enp5s0
ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan=true
ExecStart=/usr/bin/ovs-vsctl set-fail-mode ovs-br standalone
ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/global/set-fail-mode=standalone
ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br mcast_snooping_enable=false
ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/mcast_snooping_enable=false
ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br rstp_enable=false
ExecStart=/usr/bin/ovs-vsctl set Bridge ovs-br external-ids:netplan/rstp_enable=false
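To compare the ordering declared by this unit against other units, a short sketch that collects the dependency directives from unit file text. It uses a hand-rolled parser because keys such as `After=` and `Wants=` repeat, which `configparser` would collapse to the last value. The function name `unit_dependencies` and the default key list are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def unit_dependencies(
    unit_text: str,
    keys: Tuple[str, ...] = ("After", "Before", "Wants", "Requires"),
) -> Dict[str, List[str]]:
    """Collect dependency directives from the [Unit] section, keeping repeats."""
    found: Dict[str, List[str]] = defaultdict(list)
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Unit" and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() in keys:
                # One directive may list several space-separated units.
                found[key.strip()].extend(value.split())
    return dict(found)
```

For the unit shown above, `After` comes out as `ovsdb-server.service` and `netplan-ovs-cleanup.service`, and `Before` as `network.target` only.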

Regarding "Can we try to trace the exact sequence of services being started by systemd in this scenario?":

Do you know of a good way to generate this? I know of some systemd-analyze commands (e.g., plot), but I'm not sure which is best to use. Are you looking for a runtime dependency graph?
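One possible approach, sketched under assumptions rather than offered as the recommended method: take the boot journal (for example from `journalctl -b -o short-precise`) and pull out the systemd[1] start and finish messages in order. The regular expression relies on the message format seen in the journal lines quoted in this thread; the names `EVENT` and `unit_events` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches e.g. "systemd[1]: Starting ovsdb-server.service - Open vSwitch ..."
EVENT = re.compile(
    r"systemd\[1\]: (?P<verb>Starting|Started|Finished|Reached target) "
    r"(?P<unit>\S+)"
)


def unit_events(journal_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (verb, unit) pairs in the order they appear in the journal."""
    events = []
    for line in journal_lines:
        match = EVENT.search(line)
        if match:
            events.append((match.group("verb"), match.group("unit")))
    return events
```

Run over the journal excerpt quoted earlier, this shows `Starting ovsdb-server.service` immediately followed by `Starting systemd-networkd.service`, which is the ordering under discussion.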
