
F #183: Add ability to rename net/pci/vfio devices #208

Merged: sk4zuzu merged 21 commits into master from f-183 on May 18, 2026

@sk4zuzu (Collaborator) commented Apr 27, 2026

  • Normalize lspci_devices's structure
  • Add 'unlisted' option for hiding PCI devices from OpenNebula
  • Add 'set_name' option for naming PCI devices
  • Add 'match_address' test plugin to handle wildcards in PCI/MAC addresses
  • Add ability to match NICs by their MAC address
  • Add udev rules for net/pci/vfio device renaming/symlinking (set_name)
  • Rename 99-vfio.rules to 99-mode.rules in both helper/pci and openvswitch roles
  • Remove 'pci_passthrough_enabled'
  • Rename 'pci_passthrough_default_filter' to 'pci_default_filter'
  • Update precheck role
  • Update README.md files
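The 'match_address' test plugin is only named above, not shown. A minimal sketch of what such a wildcard test could look like, assuming shell-style `*` globbing via Python's fnmatch and case-insensitive comparison for MAC addresses (the real plugin's semantics may differ). Ansible discovers Jinja2 tests through a class named TestModule whose tests() method returns a name-to-function mapping:

```python
from fnmatch import fnmatchcase


def match_address(value, pattern):
    """True if a PCI slot or MAC address matches a pattern containing '*' wildcards.

    Both sides are lower-cased, so MAC addresses compare case-insensitively.
    Note that fnmatch also treats '?' and '[...]' as wildcards.
    """
    return fnmatchcase(str(value).lower(), str(pattern).lower())


class TestModule:
    """Entry point Ansible uses to load Jinja2 test plugins."""

    def tests(self):
        return {"match_address": match_address}
```

In a template this would be used along the lines of `lspci_devices | selectattr('Slot', 'match_address', '0000:81:00.*')`, with the collection's fully qualified test name if the plugin ships inside a collection.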

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
sk4zuzu requested review from dann1, rsmontero and tinova April 27, 2026 18:17
sk4zuzu added 6 commits May 6, 2026 18:39
Imperative top-to-bottom processing should be less
confusing to users and allow mixing of address and
vendor/device/class queries in predictable ways.

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
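The commit above switches pci_devices to imperative top-to-bottom processing. Judging from the lspci_devices facts and the generated udev rules later in this thread, a later matching entry overrides the set_* values of an earlier one for the same device. A minimal sketch of that ordering, assuming the full device list is already known (the actual role queries lspci per entry; DEVICES, entry_matches and apply_entries are hypothetical names):

```python
from fnmatch import fnmatchcase

# Hypothetical device records shaped like the lspci_devices facts (illustrative only).
DEVICES = [
    {"Slot": "0000:81:00.0", "Vendor": "15b3", "Device": "1015", "Class": "0200"},
    {"Slot": "0000:81:00.2", "Vendor": "15b3", "Device": "1016", "Class": "0200"},
    {"Slot": "0000:01:00.1", "Vendor": "14e4", "Device": "16d8", "Class": "0200"},
]

SELECTORS = (("vendor", "Vendor"), ("device", "Device"), ("class", "Class"))


def entry_matches(entry, dev):
    """An entry selects devices by (wildcard) address or by vendor/device/class."""
    if "address" in entry:
        return fnmatchcase(dev["Slot"], entry["address"])
    return all(fnmatchcase(dev[key], entry.get(field, "*")) for field, key in SELECTORS)


def apply_entries(entries, devices):
    """Apply entries top to bottom; a later match overrides earlier set_* values."""
    result = {dev["Slot"]: dict(dev) for dev in devices}
    for entry in entries:
        for dev in result.values():
            if entry_matches(entry, dev):
                for attr, value in entry.items():
                    if attr.startswith("set_"):
                        dev[attr.capitalize()] = value  # e.g. set_name -> Set_name
    return result
```

With the inventory from this thread (vendor 15b3 with device "*" first, then device 1015), the PFs end up with Set_name pf{0[3]} while the VFs keep vf{0[2]}{0[3]}.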
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Require users to define driver explicitly to prevent later confusion.

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Ignore devices for which helper/pci role does not manage driver changes.

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Allow opennebula-ovs.service to pick up renamed devices after reboot.
This makes it possible to mix PCI/SR-IOV and OVS/DPDK devices.

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@dann1 (Collaborator) commented May 8, 2026

Hi @sk4zuzu. After the last batch of commits, I'm no longer able to use the PCI roles. Here is the error:

TASK [opennebula.deploy.helper/pci : Parse lspci's output] ***********************************************************************************************************************
Friday 08 May 2026  13:26:28 +0200 (0:00:00.384)       0:00:27.447 ************
ok: [sm15] =>
    ansible_facts:
        lspci_devices:
        -   Class: '0200'
            Device: '1015'
            Driver: mlx5_core
            Ecap_sriov: 'yes'
            Excluded: 'no'
            IOMMUGroup: '17'
            Module: mlx5_core
            NUMANode: '0'
            SDevice: 1c69
            SVendor: 15d9
            Set_driver: omit
            Set_name: pf{0[3]}
            Set_numvfs: max
            Slot: 0000:81:00.0
            Unlisted: 'no'
            Vendor: 15b3
        -   Class: '0200'
            Device: '1015'
            Driver: mlx5_core
            Ecap_sriov: 'yes'
            Excluded: 'no'
            IOMMUGroup: '18'
            Module: mlx5_core
            NUMANode: '0'
            SDevice: 1c69
            SVendor: 15d9
            Set_driver: omit
            Set_name: pf{0[3]}
            Set_numvfs: max
            Slot: 0000:81:00.1
            Unlisted: 'no'
            Vendor: 15b3
    changed: false

TASK [opennebula.deploy.helper/pci : Query udev for device info] *****************************************************************************************************************
Friday 08 May 2026  13:26:28 +0200 (0:00:00.062)       0:00:27.510 ************
fatal: [sm15]: FAILED! =>
    changed: false
    cmd:
    - udevadm
    - info
    - --query=property
    - --property=ID_PATH
    - --value
    delta: '0:00:00.008617'
    end: '2026-05-08 11:26:28.833168'
    msg: non-zero return code
    rc: 1
    start: '2026-05-08 11:26:28.824551'
    stderr: A device name or path is required
    stderr_lines: <omitted>
    stdout: ''
    stdout_lines: <omitted>

NO MORE HOSTS LEFT ***************************************************************************************************************************************************************

PLAY RECAP ***********************************************************************************************************************************************************************
sm15                       : ok=72   changed=0    unreachable=0    failed=1    skipped=47   rescued=0    ignored=0

That is the result of running the main playbook with this inventory snippet:

  hosts:
    sm15:
      ansible_host: sm15
      pci_devices:
        # - address: "0000:01:00.0" TODO: active interface still no rename
        #   set_driver: omit
        #   set_name: vmnic0
        # - address: "0000:01:00.1"
        #   set_driver: omit
        #   set_name: vmnic1
        - vendor: "15b3"
          device: "*"
          class: "0200"
          set_driver: omit
          set_name: "vf{0[2]}{0[3]}"
        - vendor: "15b3"
          device: "1015"
          class: "0200"
          set_driver: omit
          set_numvfs: max
          set_name: "pf{0[3]}"
        # - vendor: "15b3"
        #   device: "1016"
        #   class: "0200"
        #   set_driver: omit
        #   set_name: "vf{0[2]}{0[3]}"

      ovs:
        set:
          - other_config:dpdk-init: 'false' # DPDK disabled
          - other_config:dpdk-socket-mem: '6144'
          - other_config:dpdk-hugepage-dir: '/dev/hugepages'
          - other_config:pmd-cpu-mask: '0x3'
          - other_config:hw-offload: 'true'
          - other_config:tc-policy: 'skip_sw'
        iface:
          ovsbr0:
            set:
              - mtu_request: 1500
          enp1s0f0np0: # TODO: infer name from PCI address
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.0'
              - mtu_request: 9126
          enp1s0f1np1: # TODO: infer name from PCI address
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.1'
              - mtu_request: 9126
        bond:
          bond0:
            ifaces: [enp1s0f0np0, enp1s0f1np1]
            set:
              - bond_mode: balance-slb
        br:
          ovsbr0:
            ports: [bond0]
            set:
              - datapath_type: netdev
            addrs:
              - cidr: "{{ ansible_default_ipv4.address ~ '/' ~ ansible_default_ipv4.prefix }}"
                metric: 400
            gw: "{{ ansible_default_ipv4.gateway }}"
            dns: [8.8.8.8, 1.1.1.1]

Going back to commit cd607a4 works. Here is the output of that commit:

TASK [opennebula.deploy.helper/pci : Parse lspci's output] ***********************************************************************************************************************
Friday 08 May 2026  13:28:25 +0200 (0:00:00.389)       0:00:28.240 ************
ok: [sm15] =>
    ansible_facts:
        lspci_devices:
        -   Class: '0200'
            Device: '1015'
            Driver: mlx5_core
            Ecap_sriov: 'yes'
            Excluded: 'no'
            IOMMUGroup: '17'
            Module: mlx5_core
            NUMANode: '0'
            SDevice: 1c69
            SVendor: 15d9
            Set_driver: omit
            Set_name: pf{0[3]}
            Set_numvfs: max
            Slot: 0000:81:00.0
            Unlisted: 'no'
            Vendor: 15b3
        -   Class: '0200'
            Device: '1015'
            Driver: mlx5_core
            Ecap_sriov: 'yes'
            Excluded: 'no'
            IOMMUGroup: '18'
            Module: mlx5_core
            NUMANode: '0'
            SDevice: 1c69
            SVendor: 15d9
            Set_driver: omit
            Set_name: pf{0[3]}
            Set_numvfs: max
            Slot: 0000:81:00.1
            Unlisted: 'no'
            Vendor: 15b3
    changed: false

TASK [opennebula.deploy.helper/pci : Query udev for device info] *****************************************************************************************************************
Friday 08 May 2026  13:28:25 +0200 (0:00:00.068)       0:00:28.309 ************
ok: [sm15] =>
    changed: false
    cmd:
    - udevadm
    - info
    - --query=property
    - --property=ID_PATH
    - --value
    - -p
    - /sys/class/net/enp1s0f0np0
    - -p
    - /sys/class/net/enp1s0f1np1
    delta: '0:00:00.008968'
    end: '2026-05-08 11:28:26.207890'
    msg: ''
    rc: 0
    start: '2026-05-08 11:28:26.198922'
    stderr: ''
    stderr_lines: <omitted>
    stdout: |-
        pci-0000:01:00.0
        pci-0000:01:00.1
    stdout_lines: <omitted>
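Comparing the two runs, the failing "Query udev for device info" invocation had no `-p` path arguments at all, which is exactly when udevadm reports "A device name or path is required". A minimal sketch of building that command defensively, assuming the interface names come from the OVS config; the helper names are hypothetical and the actual fix in 068f6cf may look different:

```python
import subprocess

UDEVADM_QUERY = ["udevadm", "info", "--query=property", "--property=ID_PATH", "--value"]


def build_udevadm_cmd(ifaces):
    """Build the udevadm argv, or return None when there is nothing to query.

    Calling udevadm without any -p argument fails with
    "A device name or path is required" (rc=1), as in the failing run.
    """
    if not ifaces:
        return None
    cmd = list(UDEVADM_QUERY)
    for iface in ifaces:
        cmd += ["-p", f"/sys/class/net/{iface}"]
    return cmd


def query_id_paths(ifaces):
    """Return one ID_PATH per interface, e.g. ['pci-0000:01:00.0', ...]."""
    cmd = build_udevadm_cmd(ifaces)
    if cmd is None:
        return []
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.splitlines()
```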

@sk4zuzu (Collaborator, Author) commented May 8, 2026

@dann1 Hi, the error doesn't make much sense, which means I need to debug the actual udev check (I'll get back to you 🤗).

Apart from the error, there is a logical inconsistency in your inventory:

  1. Assuming enp1s0f0np0 and enp1s0f1np1 are Mellanox devices. (right?)
  2. You explicitly rename them with
        - vendor: "15b3"
          device: "*"
          class: "0200"
          set_driver: omit
          set_name: "vf{0[2]}{0[3]}"
        - vendor: "15b3"
          device: "1015"
          class: "0200"
          set_driver: omit
          set_numvfs: max
          set_name: "pf{0[3]}"
  3. Then, because you're treating them as non-DPDK devices (kernel drivers),
        iface:
          ovsbr0:
            set:
              - mtu_request: 1500
          enp1s0f0np0: # TODO: infer name from PCI address
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.0'
              - mtu_request: 9126
          enp1s0f1np1: # TODO: infer name from PCI address
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.1'
              - mtu_request: 9126

those are matched exactly as enp1s0f0np0 and enp1s0f1np1, but those names no longer exist at this point (I truly think we should use an altname instead of a full rename; I mean, I don't agree with the requirement 👎😇). 🤔

@sk4zuzu (Collaborator, Author) commented May 8, 2026

@dann1 Do you think we should keep the old name as an altname? 🤔

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@sk4zuzu (Collaborator, Author) commented May 8, 2026

@dann1 Thank you, it was indeed a defect (a rather basic one), fixed in 068f6cf 🥹 Please continue... 🙏 🙇

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@dann1 (Collaborator) commented May 11, 2026

Perfect, now it works.

> Assuming enp1s0f0np0 and enp1s0f1np1 are Mellanox devices. (right?)

Here is the PCI context of this server (sm15) where I'm testing all of the OVS and PCI related changes.

These are the physical devices

0000:01:00.0 'BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller 16d8' numa_node=0 if=enp1s0f0np0 drv=bnxt_en unused=vfio-pci
0000:01:00.1 'BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller 16d8' numa_node=0 if=vmnic1 drv=bnxt_en unused=vfio-pci
0000:81:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=pf0 drv=mlx5_core unused=vfio-pci
0000:81:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=pf1 drv=mlx5_core unused=vfio-pci

0000:01:00.0 is the server's default active interface (managed by NetworkManager) and holds an IP address. Together with 0000:01:00.1, these are the interfaces that become part of the OVS bond, with or without DPDK.

0000:81:00.0 and 0000:81:00.1 are the Mellanox devices, which are used only as physical functions.

> @dann1 Do you think we should keep the old name as an altname? 🤔

It isn't a requirement of the feature, but I can see it being useful for reverts and for debugging renaming issues. If it isn't too problematic, it would be good to have.

@dann1 (Collaborator) commented May 11, 2026

Some more feedback. The timestamps are from last week, but the issues still occur with the latest changes.

Cannot rename interface in use

When declaring a PCI device by address:

node:
  hosts:
    sm15:
      ansible_host: sm15
      pci_devices:
        - address: "0000:01:00.0"
          set_driver: omit
          set_name: vmnic0

and that PCI device is the active network interface used for the SSH connection:

[root@sm15 ~]# ip addr show enp1s0f0np0
53: enp1s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 90:5a:08:0a:8b:4e brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.16/24 brd 10.0.1.255 scope global noprefixroute enp1s0f0np0
       valid_lft forever preferred_lft forever
    inet6 fe80::925a:8ff:fe0a:8b4e/64 scope link
       valid_lft forever preferred_lft forever
[root@sm15 ~]# ls -ld /sys/bus/pci/devices/0000\:01\:00.0/net/enp1s0f0np0
drwxr-xr-x. 5 root root 0 May  4 13:49 /sys/bus/pci/devices/0000:01:00.0/net/enp1s0f0np0

there is an Ansible error:

TASK [opennebula.deploy.helper/pci : Assert 'lspci_devices' contains no forbidden PCI addresses] *********************************************************************************
Tuesday 05 May 2026  16:43:44 +0200 (0:00:00.101)       0:00:05.913 ***********
fatal: [sm15]: FAILED! =>
    assertion: (pci_forbidden_addresses is undefined) or (pci_forbidden_addresses | count
        == 0) or (_detected | count == 0)
    changed: false
    evaluated_to: false
    msg: Forbidden PCI addresses ['0000:01:00.0'] detected, aborting! Please adjust 'pci_devices'
        to exclude forbidden PCI addresses. You might also want to look for conflicts
        with OVS/DPDK config.

Would it be possible to rename the management interface used for the Ansible SSH connection? For context, in this project these interfaces become members of an OVS bond. With DPDK, we are thinking about setting the desired name directly in the OVS declaration, since the interface loses its name anyway. But for regular OVS we would need to rename the interface and use that name in the OVS declaration.

As of now, renaming plus the OVS interface declaration works:

          enp1s0f0np0:
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.0'
              - mtu_request: 9126
          vmnic1:
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.1'
              - mtu_request: 9126

but we would also need the primary interface to be renamed.

VF renaming

How can I set names for virtual functions that don't exist yet? For example, exposing VFs from these two physical functions:

0000:81:00.0 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=eth6,eth4,eth2,eth0,eth7,eth5,eth3,enp129s0f0np0,eth1 drv=mlx5_core unused=vfio-pci *Active*
0000:81:00.1 'MT27710 Family [ConnectX-4 Lx] 1015' numa_node=0 if=enp129s0f1np1 drv=mlx5_core unused=vfio-pci

yields:

0000:81:00.2 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v0 drv=mlx5_core unused=vfio-pci
0000:81:00.3 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v1 drv=mlx5_core unused=vfio-pci
0000:81:00.4 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v2 drv=mlx5_core unused=vfio-pci
0000:81:00.5 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v3 drv=mlx5_core unused=vfio-pci
0000:81:00.6 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v4 drv=mlx5_core unused=vfio-pci
0000:81:00.7 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v5 drv=mlx5_core unused=vfio-pci
0000:81:01.0 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v6 drv=mlx5_core unused=vfio-pci
0000:81:01.1 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f0v7 drv=mlx5_core unused=vfio-pci
0000:81:01.2 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v0 drv=mlx5_core unused=vfio-pci
0000:81:01.3 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v1 drv=mlx5_core unused=vfio-pci
0000:81:01.4 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v2 drv=mlx5_core unused=vfio-pci
0000:81:01.5 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v3 drv=mlx5_core unused=vfio-pci
0000:81:01.6 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v4 drv=mlx5_core unused=vfio-pci
0000:81:01.7 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v5 drv=mlx5_core unused=vfio-pci
0000:81:02.0 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v6 drv=mlx5_core unused=vfio-pci
0000:81:02.1 'MT27710 Family [ConnectX-4 Lx Virtual Function] 1016' numa_node=0 if=enp129s0f1v7 drv=mlx5_core unused=vfio-pci

The PFs have device ID 1015 and the VFs have device ID 1016. The following inventory snippet

        - vendor: "15b3"
          device: "1016"
          class: "0200"
          set_driver: omit
          set_name: "vf{0[2]}{0[3]}"

fails, maybe because the 1016 devices do not exist initially:

TASK [opennebula.deploy.helper/pci : Query lspci for devices] ********************************************************************************************************************
Tuesday 05 May 2026  17:03:11 +0200 (0:00:00.046)       0:00:10.407 ***********
fatal: [sm15]: FAILED! =>
    changed: false
    cmd: |-
        set -o errexit -o pipefail
        STDOUT="$(lspci -vmm -nkD -d '15b3:1015:0200')"
        if [[ -n "$STDOUT" ]]; then
          echo "$STDOUT"
          echo
        else
          echo "Could not find '15b3:1015:0200'" >&2
          exit 1
        fi
        setpci -v -d '15b3:1015:0200' STATUS | while IFS=' ' read -r SLOT _; do
          echo -e "Slot:\t$SLOT"
          if setpci -s "$SLOT" ECAP_SRIOV.B 1>/dev/null; then
            echo -e "Ecap_sriov:\tyes"
          else
            echo -e "Ecap_sriov:\tno"
          fi
          echo -e 'Set_driver:\tomit'
          echo -e 'Set_name:\tpf{0[3]}'
          echo -e 'Set_numvfs:\tmax'
          echo -e 'Unlisted:\tno'
          echo -e 'Excluded:\tno'
          echo
        done
        STDOUT="$(lspci -vmm -nkD -d '15b3:1016:0200')"
        if [[ -n "$STDOUT" ]]; then
          echo "$STDOUT"
          echo
        else
          echo "Could not find '15b3:1016:0200'" >&2
          exit 1
        fi
        setpci -v -d '15b3:1016:0200' STATUS | while IFS=' ' read -r SLOT _; do
          echo -e "Slot:\t$SLOT"
          if setpci -s "$SLOT" ECAP_SRIOV.B 1>/dev/null; then
            echo -e "Ecap_sriov:\tyes"
          else
            echo -e "Ecap_sriov:\tno"
          fi
          echo -e 'Set_driver:\tomit'
          echo -e 'Set_name:\tvf{0[2]}{0[3]}'
          echo -e 'Set_numvfs:\t0'
          echo -e 'Unlisted:\tno'
          echo -e 'Excluded:\tno'
          echo
        done
        STDOUT="$(lspci -vmm -nkD -s '0000:01:00.1')"
        if [[ -n "$STDOUT" ]]; then
          echo "$STDOUT"
          echo
        else
          echo "Could not find '0000:01:00.1'" >&2
          exit 1
        fi
        setpci -v -s '0000:01:00.1' STATUS | while IFS=' ' read -r SLOT _; do
          echo -e "Slot:\t$SLOT"
          if setpci -s "$SLOT" ECAP_SRIOV.B 1>/dev/null; then
            echo -e "Ecap_sriov:\tyes"
          else
            echo -e "Ecap_sriov:\tno"
          fi
          echo -e 'Set_driver:\tomit'
          echo -e 'Set_name:\tvmnic1'
          echo -e 'Set_numvfs:\t0'
          echo -e 'Unlisted:\tno'
          echo -e 'Excluded:\tno'
          echo
        done
    delta: '0:00:00.020136'
    end: '2026-05-05 15:03:11.667910'
    msg: non-zero return code
    rc: 1
    start: '2026-05-05 15:03:11.647774'
    stderr: Could not find '15b3:1016:0200'
    stderr_lines: <omitted>
    stdout: |-
        Slot:   0000:81:00.0
        Class:  0200
        Vendor: 15b3
        Device: 1015
        SVendor:        15d9
        SDevice:        1c69
        Driver: mlx5_core
        Module: mlx5_core
        NUMANode:       0
        IOMMUGroup:     17

        Slot:   0000:81:00.1
        Class:  0200
        Vendor: 15b3
        Device: 1015
        SVendor:        15d9
        SDevice:        1c69
        Driver: mlx5_core
        Module: mlx5_core
        NUMANode:       0
        IOMMUGroup:     18

        Slot:   0000:81:00.1
        Ecap_sriov:     yes
        Set_driver:     omit
        Set_name:       pf{0[3]}
        Set_numvfs:     max
        Unlisted:       no
        Excluded:       no

        Slot:   0000:81:00.0
        Ecap_sriov:     yes
        Set_driver:     omit
        Set_name:       pf{0[3]}
        Set_numvfs:     max
        Unlisted:       no
        Excluded:       no
    stdout_lines: <omitted>
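The stdout above shows the intermediate format the query task emits: `lspci -vmm` records followed by per-slot Set_*/Unlisted/Excluded records, all as "Key:<TAB>Value" lines separated by blank lines. A minimal sketch of how a "Parse lspci's output" step could fold them into one dict per Slot, assuming later records for the same slot extend the earlier one; parse_vmm is a hypothetical name, not the role's actual parser:

```python
def parse_vmm(text):
    """Parse 'Key:<TAB>Value' records separated by blank lines.

    Records sharing the same Slot are merged in order, so the Set_* block
    emitted after an lspci block extends (or overrides) that device's record.
    """
    devices = {}
    record = {}
    for line in text.splitlines() + [""]:  # trailing "" flushes the last record
        if not line.strip():
            if "Slot" in record:
                devices.setdefault(record["Slot"], {}).update(record)
            record = {}
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            record[key.strip()] = value.strip()
    return list(devices.values())
```

Splitting at the first colon only matters here because values such as Slot (0000:81:00.0) contain colons themselves.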

Using this instead

      pci_devices:
        - address: "0000:01:00.1"
          set_driver: omit
          set_name: vmnic1
        - vendor: "15b3"
          device: "*"
          class: "0200"
          set_driver: omit
          set_name: "vf{0[2]}{0[3]}"
        - vendor: "15b3"
          device: "1015"
          class: "0200"
          set_driver: omit
          set_numvfs: max
          set_name: "pf{0[3]}"

works, and the generated udev rules are correct:

[root@sm15 ~]# cat /etc/udev/rules.d/99-rename.rules
# managed by one-deploy
# --- NET
# 0000:81:00.0 <- pf0
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.0", NAME="pf0"
# 0000:81:00.1 <- pf1
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.1", NAME="pf1"
# 0000:81:00.2 <- vf002
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.2", NAME="vf002"
# 0000:81:00.3 <- vf003
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.3", NAME="vf003"
# 0000:81:00.4 <- vf004
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.4", NAME="vf004"
# 0000:81:00.5 <- vf005
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.5", NAME="vf005"
# 0000:81:00.6 <- vf006
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.6", NAME="vf006"
# 0000:81:00.7 <- vf007
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:00.7", NAME="vf007"
# 0000:81:01.0 <- vf010
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.0", NAME="vf010"
# 0000:81:01.1 <- vf011
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.1", NAME="vf011"
# 0000:81:01.2 <- vf012
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.2", NAME="vf012"
# 0000:81:01.3 <- vf013
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.3", NAME="vf013"
# 0000:81:01.4 <- vf014
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.4", NAME="vf014"
# 0000:81:01.5 <- vf015
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.5", NAME="vf015"
# 0000:81:01.6 <- vf016
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.6", NAME="vf016"
# 0000:81:01.7 <- vf017
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:01.7", NAME="vf017"
# 0000:81:02.0 <- vf020
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:02.0", NAME="vf020"
# 0000:81:02.1 <- vf021
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:81:02.1", NAME="vf021"
# 0000:01:00.1 <- vmnic1
SUBSYSTEM=="net", ACTION=="add", ENV{ID_PATH}=="pci-0000:01:00.1", NAME="vmnic1"
# --- PCI
# --- VFIO
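Each entry in the NET section above is a comment plus one rule keyed on udev's ID_PATH property (pci-<slot> for these NICs). A minimal sketch that renders the same lines from an ordered slot-to-name mapping; render_rename_rules is a hypothetical helper and only covers the NET section, not the PCI or VFIO ones:

```python
def render_rename_rules(names):
    """Render the '--- NET' part of a 99-rename.rules file.

    names maps a PCI slot to the desired interface name, in the order the
    rules should be written, e.g. {"0000:81:00.0": "pf0"}.
    """
    lines = ["# managed by one-deploy", "# --- NET"]
    for slot, name in names.items():
        lines.append(f"# {slot} <- {name}")
        # ID_PATH is set by udev's path_id builtin, e.g. "pci-0000:81:00.0".
        lines.append(
            'SUBSYSTEM=="net", ACTION=="add", '
            f'ENV{{ID_PATH}}=="pci-{slot}", NAME="{name}"'
        )
    return "\n".join(lines) + "\n"
```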

Is this an appropriate way to handle this use case (renaming VFs and their PF)? Just want to confirm.

Since the PCI device ID of the virtual functions can be known beforehand, maybe the assertion that the PCI devices exist could be relaxed with a parameter?

        - vendor: "15b3"
          device: "1016" # TODO: How to deal with VFs that do not exist 
          exist: promise # <--
          class: "0200"
          set_driver: omit
          set_name: "vf{0[2]}{0[3]}"

or these entries could be processed after the VFs are enabled.

Again, this is not really a problem, since we can wildcard the device and then rename the PFs with their own naming.

No VFIO rules when set_driver: omit

When declaring a VF to be used in a VM, OpenNebula manages the vfio binding and unbinding automatically via the managed=yes declaration on the interface element of the libvirt domain XML. So no driverctl override is really needed, and set_driver: omit can be used. However, when using it, for example:

      pci_devices:
        - vendor: "15b3"
          device: "*"
          class: "0200"
          set_driver: omit
          set_name: "vf{0[2]}{0[3]}"
        - vendor: "15b3"
          device: "1015"
          class: "0200"
          set_driver: omit
          set_numvfs: max
          set_name: "pf{0[3]}"

VFIO symlink udev rules are not created; they are only created when set_driver is vfio-pci. When libvirt binds the interface, there is no symlink recording the name.

That said, it is a very minor issue, as we could always declare the SR-IOV interfaces as bound to vfio-pci.

sk4zuzu added 4 commits May 11, 2026 14:52
- Add generic way to skip primary NIC checks (unguarded: true)
- Decouple helper/pci from openvswitch role
- Update README.md

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@sk4zuzu (Collaborator, Author) commented May 11, 2026

@dann1 I came to the conclusion that coupling the helper/pci and openvswitch roles has been a mistake all along. Keeping them coupled, I would have to do something like grepping lspci_devices to find the PCI address of a renamed device defined in the OVS config, then exclude it by PCI address instead of interface name, but that's too spaghetti for my taste (pun intended). So let's try a new approach: we'll let users decide and bypass the check via the 'unguarded: true' attribute. I think this is less "magical" and actually more instructive. Please test it if you can; if we agree that's the way to go, let's proceed to the other issues. 🙏 😇
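The forbidden-address assertion shown earlier aborts whenever a forbidden slot (here the primary NIC, 0000:01:00.0) is targeted by pci_devices. A minimal sketch of the same check with an 'unguarded' bypass, assuming pci_forbidden_addresses is already populated and that unguarded entries surface as an "Unguarded: yes" fact, by analogy with Unlisted/Excluded (that key is an assumption, as are the function names):

```python
def detect_forbidden(lspci_devices, pci_forbidden_addresses):
    """Slots targeted by pci_devices that are forbidden and not marked unguarded."""
    forbidden = set(pci_forbidden_addresses or [])
    return [
        dev["Slot"]
        for dev in lspci_devices
        if dev["Slot"] in forbidden and dev.get("Unguarded", "no") != "yes"
    ]


def assert_no_forbidden(lspci_devices, pci_forbidden_addresses):
    """Mirror the role's assertion, but let 'unguarded: true' entries through."""
    detected = detect_forbidden(lspci_devices, pci_forbidden_addresses)
    if detected:
        raise AssertionError(
            f"Forbidden PCI addresses {detected} detected, aborting! "
            "Please adjust 'pci_devices' to exclude forbidden PCI addresses."
        )
```

The bypass is opt-in per entry, so an accidental match on the management NIC still fails loudly.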

@dann1 (Collaborator) commented May 11, 2026

Perfect, this worked

      pci_devices:
        - address: "0000:01:00.0"
          set_driver: omit
          set_name: vmnic0
          unguarded: true
        - address: "0000:01:00.1"
          set_driver: omit
          set_name: vmnic1
.
.
        iface:
          ovsbr0:
            set:
              - mtu_request: 1500
          vmnic0:
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.0'
              - mtu_request: 9126
          vmnic1:
            set:
              # - type: dpdk
              # - options:dpdk-devargs: '0000:01:00.1'
              - mtu_request: 9126
        bond:
          bond0:
            ifaces: [vmnic0, vmnic1]
            set:
              - bond_mode: balance-slb

Both interfaces got renamed and bonded

[root@sm15 ~]# ovs-vsctl show
f4eabec7-0b73-4a45-9024-a640953fde18
    Bridge ovsbr0
        Port bond0
            Interface vmnic1
            Interface vmnic0
        Port ovsbr0
            Interface ovsbr0
                type: internal
    ovs_version: "3.6.1-11.el9fdp"
[root@sm15 ~]# ip addr show ovsbr0
67: ovsbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 90:5a:08:0a:8b:4e brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.16/24 metric 400 scope global ovsbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::925a:8ff:fe0a:8b4e/64 scope link
       valid_lft forever preferred_lft forever

@dann1 (Collaborator) commented May 14, 2026

Hi @sk4zuzu, now that the management interface can be renamed, could we proceed with the other minor issues mentioned before?

  • VF renaming
  • No VFIO rules when set_driver: omit

And regarding VF renaming, would it be possible to define a naming scheme based on the parent physical function of each virtual function? For example pfXvfX. With the current scheme we can use the domain, bus, device and function, as the README suggests:

            # NOTE: 0[0] -> PCI Domain
            #       0[1] -> PCI Bus
            #       0[2] -> PCI Device
            #       0[3] -> PCI Function
            set_name: "asd{0[1]}v{0[3]}"
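The {0[i]} placeholders look like Python str.format indexing into the split PCI address, and that reading reproduces the names in the generated rules above (0000:81:00.3 with "vf{0[2]}{0[3]}" gives vf003). A minimal sketch under that assumption, with hypothetical helper names; it also rejects names longer than 15 characters, since Linux interface names are limited by IFNAMSIZ (16 bytes including the trailing NUL):

```python
import re

IFNAME_MAX = 15  # IFNAMSIZ (16) minus the trailing NUL


def pci_parts(address):
    """Split 'DDDD:BB:DD.F' into [domain, bus, device, function] strings."""
    return re.split(r"[:.]", address)


def expand_name(template, address):
    """Expand a set_name template where {0[i]} indexes the device's address parts."""
    name = template.format(pci_parts(address))
    if len(name) > IFNAME_MAX:
        raise ValueError(f"interface name {name!r} exceeds {IFNAME_MAX} characters")
    return name
```

Keeping the parts as strings preserves leading zeros, which is why 0000:81:01.0 becomes vf010 rather than vf10.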

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@sk4zuzu (Collaborator, Author) commented May 15, 2026

@dann1 Regarding VF naming, see 7c915be 😇

Would these two snippets work for you? 🤔

All Mellanox cards:

pci_devices:
  - vendor: '15b3'
    device: '*'
    class: '0200'
    set_name: 'pf{1[1]}{1[2]}{1[3]}vf{2}'
  - vendor: '15b3'
    device: '1015'
    class: '0200'
    unlisted: true
    set_numvfs: max
    set_name: 'pf{0[1]}{0[2]}{0[3]}'

Per single PF using PCI address wildcard:

pci_devices:
  - address: '0000:81:00.*'
    set_name: 'pf{1[1]}{1[2]}{1[3]}vf{2}'
  - address: '0000:81:01.*'
    set_name: 'pf{1[1]}{1[2]}{1[3]}vf{2}'    
  - address: '0000:81:00.0'
    unlisted: true
    set_numvfs: max
    set_name: 'pf{0[1]}{0[2]}{0[3]}'

For the first device in both cases it would produce:

pf81000
pf81000vf0
pf81000vf1
...

Note that the second one can be a bit imprecise with 2 cards, though... 🤔

You could also specify every VF by its full address and set set_numvfs to 8 to have complete control without wildcards.
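From the expected output (pf81000vf0), {1[...]} appears to index the parent PF's address parts and {2} to be the VF index. On Linux, each VF's sysfs directory has a physfn link to its PF, and the PF has virtfnN links to its VFs, where N is the VF index. A minimal sketch resolving both from sysfs, assuming the role derives them the same way (it may not); the helper names are hypothetical:

```python
import os
import re


def pci_parts(address):
    """Split 'DDDD:BB:DD.F' into [domain, bus, device, function]."""
    return re.split(r"[:.]", address)


def vf_parent_and_index(vf_slot, sysfs="/sys/bus/pci/devices"):
    """Return (pf_slot, vf_index) using the kernel's physfn / virtfnN links.

    Raises for devices that are not VFs (no physfn link).
    """
    vf_dir = os.path.realpath(os.path.join(sysfs, vf_slot))
    pf_dir = os.path.realpath(os.path.join(vf_dir, "physfn"))
    for entry in os.listdir(pf_dir):
        match = re.fullmatch(r"virtfn(\d+)", entry)
        if match and os.path.realpath(os.path.join(pf_dir, entry)) == vf_dir:
            return os.path.basename(pf_dir), int(match.group(1))
    raise LookupError(f"{vf_slot} is not listed as a VF of {os.path.basename(pf_dir)}")


def expand_vf_name(template, vf_slot, sysfs="/sys/bus/pci/devices"):
    """Format with {0}=VF address parts, {1}=parent PF address parts, {2}=VF index."""
    pf_slot, index = vf_parent_and_index(vf_slot, sysfs)
    return template.format(pci_parts(vf_slot), pci_parts(pf_slot), index)
```

Deriving the index from virtfnN rather than from the VF's own address avoids the ambiguity of the address-wildcard variant, where VFs of two PFs can share the same PCI device number.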

sk4zuzu added 2 commits May 15, 2026 16:35
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@sk4zuzu (Collaborator, Author) commented May 15, 2026

@dann1 Indeed, not being able to simply use device: '1016' is horrible UX at best... What about ecf7bdd plus the snippet below? 🤔

pci_devices:
  - vendor: '15b3'
    device: '1015'
    class: '0200'
    unlisted: true
    set_numvfs: max
    set_name: 'pf{0[1]}{0[2]}{0[3]}'
  - vendor: '15b3'
    device: '1016'
    class: '0200'
    virtual: true
    set_name: 'pf{1[1]}{1[2]}{1[3]}vf{2}'

The new 'virtual' attribute ignores the "not found" error on demand, so processing proceeds to the later stages, where the VFs eventually appear and can be renamed. 👍 😇
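A minimal sketch of that behaviour: an empty lspci result for a selector is tolerated when the entry is virtual, instead of failing like the "Could not find '15b3:1016:0200'" branch in the earlier task. The role itself does this in shell; the function names and the injectable run parameter here are hypothetical:

```python
import subprocess


def run_lspci(args):
    """Run lspci in the machine-readable form used by the role's query task."""
    return subprocess.run(
        ["lspci", "-vmm", "-nkD", *args], capture_output=True, text=True, check=True
    ).stdout


def query_entry(args, virtual=False, run=run_lspci):
    """Return lspci output for one selector, e.g. ["-d", "15b3:1016:0200"].

    With virtual=True an empty result is not an error: the VFs are expected
    to appear later, once set_numvfs has been applied to their parent PF.
    """
    stdout = run(args)
    if stdout.strip():
        return stdout
    if virtual:
        return ""
    raise LookupError(f"Could not find {args[-1]!r}")
```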

@sk4zuzu (Collaborator, Author) commented May 15, 2026

@dann1 Regarding "No VFIO rules when set_driver: omit", I don't see how I can create a VFIO-like rule in UDEV for a device that is not bound to vfio-pci 😞

Maybe core OpenNebula should do this, since it's supposed to take over and manage the device?

TBH, IMHO all this code should be part of core OpenNebula and not some supplementary yaml files. 🤗

sk4zuzu added 4 commits May 15, 2026 18:49
It seems that it is a safer default to force users to pass
devices to OpenNebula explicitly.

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
- Make VFIO symlink config survive reboots
- Make "leap of faith" and create symlink rules on VFIO-unbound devices
- Remove useless PCI config
- Apply linter fixes

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
@sk4zuzu (Collaborator, Author) commented May 17, 2026

@dann1 I think I made it work in the end; please take a look at 4150d8b.

  1. The initial implementation of the VFIO rules was absolutely wrong: it seems the kernel assigns IOMMU groups non-deterministically (or maybe it's systemd, IDK), so after a reboot the IOMMU groups of specific devices are simply shuffled and the names no longer matched the devices.
  2. This PROGRAM="/usr/bin/test /sys/bus/pci/devices/{{ v.Slot }}/iommu_group -ef /sys/kernel/iommu_groups/%k" is crucial, as UDEV doesn't know about the PCI -> IOMMU mapping. It's the simplest way I came up with: it's just the execution of a single binary (from coreutils, so it should be available everywhere).
  3. VFIO mapping is now a "leap of faith" thing: initially you won't see the symlink, but the moment something binds the device to the vfio-pci driver, UDEV picks that up and creates the symlink. Unbinding also seems to remove the symlink correctly.

🤗
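`test A -ef B` succeeds when both paths resolve (following symlinks) to the same device and inode, so the PROGRAM check only passes for the VFIO group node whose kernel name (presumably the group number, as in /dev/vfio/<group>) matches the device's current iommu_group link. A minimal Python analogue of that check, for illustration only; iommu_group_matches and its sysfs parameter are hypothetical:

```python
import os


def iommu_group_matches(slot, kernel_name, sysfs="/sys"):
    """Python analogue of the udev PROGRAM check

        /usr/bin/test /sys/bus/pci/devices/<slot>/iommu_group -ef /sys/kernel/iommu_groups/%k

    os.path.samefile follows symlinks and compares device and inode numbers,
    like `test -ef`. Missing paths count as "no match".
    """
    device_group = os.path.join(sysfs, "bus", "pci", "devices", slot, "iommu_group")
    candidate = os.path.join(sysfs, "kernel", "iommu_groups", kernel_name)
    try:
        return os.path.samefile(device_group, candidate)
    except FileNotFoundError:
        return False
```

Because the comparison is made against the live iommu_group link at event time, it keeps working even if group numbers are shuffled between reboots.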

@dann1 (Collaborator) commented May 18, 2026

Everything is working perfectly now. The issue can be closed; thanks a lot for the very hard work.

One minor detail about the ability to "unguard" the main interface: reverting the configuration when set_name is used can lead to a situation where the network management software still targets the default name.

For example, deleting the OVS bridge whose internal interface holds the management IP and restoring NetworkManager, all within the same command, leads to the SSH connection being lost. This was mitigated by copying the connection file and changing the interface name:

[root@sm15 ~]# cat /etc/NetworkManager/system-connections/enp1s0f0np0.nmconnection
[connection]
id=enp1s0f0np0
uuid=e83e0ddf-eb3c-457d-aef3-5c76e45a014b
type=ethernet
interface-name=enp1s0f0np0 # original name

[ethernet]
mtu=1500

[ipv4]
address1=10.0.1.16/24
dns=8.8.8.8;
gateway=10.0.1.1
method=manual

[ipv6]
addr-gen-mode=eui64
method=ignore

[proxy]

[root@sm15 ~]# cat /etc/NetworkManager/system-connections/vmnic0.nmconnection
[connection]
id=enp1s0f0np0
uuid=e83e0ddf-eb3c-457d-aef3-5c76e45a014b
type=ethernet
interface-name=vmnic0 # new name

[ethernet]
mtu=1500

[ipv4]
address1=10.0.1.16/24
dns=8.8.8.8;
gateway=10.0.1.1
method=manual

[ipv6]
addr-gen-mode=eui64
method=ignore

[proxy]

This isn't really needed for the purpose of this development, but it may be useful for testing; in any case, the development has reached its goal.
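The mitigation above amounts to copying the keyfile and switching interface-name in its [connection] section. A minimal sketch of that edit done line by line, so the rest of the file stays untouched; rename_nm_keyfile is a hypothetical helper, and writing the result out is left to the caller:

```python
def rename_nm_keyfile(text, old, new):
    """Return the keyfile text with [connection] interface-name switched from old to new.

    Works line by line so comments, ordering and other sections stay untouched.
    """
    out = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
        elif section == "connection" and stripped == f"interface-name={old}":
            line = f"interface-name={new}"
        out.append(line)
    return "\n".join(out) + "\n"
```

NetworkManager ignores keyfiles that are not owned by root with mode 0600, so the new file under /etc/NetworkManager/system-connections/ needs those permissions, followed by `nmcli connection reload`.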

@dann1 (Collaborator) left a comment:

LGTM

Signed-off-by: Michal Opala <sk4zuzu@gmail.com>
sk4zuzu merged commit 849416c into master May 18, 2026