
dpdk_setup_ports.py gets NUMA topology wrong with Sub-NUMA clustering enabled #1119

Open · Civil opened this issue Mar 18, 2024 · 3 comments · May be fixed by #1120

Comments

@Civil
Contributor

Civil commented Mar 18, 2024

I have a new test bench setup that happens to have 8 NUMA nodes.

# numactl -H
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
node 0 size: 31853 MB
node 0 free: 30260 MB
node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149
node 1 size: 32248 MB
node 1 free: 31565 MB
node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164
node 2 size: 32248 MB
node 2 free: 30914 MB
node 3 cpus: 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
node 3 size: 32248 MB
node 3 free: 29827 MB
node 4 cpus: 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194
node 4 size: 32248 MB
node 4 free: 31231 MB
node 5 cpus: 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209
node 5 size: 32248 MB
node 5 free: 30703 MB
node 6 cpus: 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224
node 6 size: 32248 MB
node 6 free: 30712 MB
node 7 cpus: 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
node 7 size: 32192 MB
node 7 free: 28823 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  21  21  21  21
  1:  12  10  12  12  21  21  21  21
  2:  12  12  10  12  21  21  21  21
  3:  12  12  12  10  21  21  21  21
  4:  21  21  21  21  10  12  12  12
  5:  21  21  21  21  12  10  12  12
  6:  21  21  21  21  12  12  10  12
  7:  21  21  21  21  12  12  12  10

However, when I try to run dpdk_setup_ports.py, I get a KeyError:

Traceback (most recent call last):
  File "/srv/v3.04/./dpdk_setup_ports.py", line 1760, in main
    obj.do_interactive_create();
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/v3.04/./dpdk_setup_ports.py", line 1473, in do_interactive_create
    config.create_config(print_config = True)
  File "/srv/v3.04/./dpdk_setup_ports.py", line 266, in create_config
    lcores_for_this_dual_if  = lcores_pool[numa]['all'][:lcores_per_dual_if]
                               ~~~~~~~~~~~^^^^^^
KeyError: 2

After looking around, it seems that cpu_topology ends up with only 2 NUMA nodes. That is because, instead of using bindings to libnuma or some other library that would report the correct result in this case, the script assumes that physical_package_id == NUMA node ID, which is not correct on a lot of CPUs.

I have Sapphire Rapids, which has 4 NUMA clusters per CPU with Sub-NUMA Clustering enabled; Emerald Rapids should have 2 NUMA nodes per CPU. There are also AMD EPYCs that have more than one NUMA node per CPU, and so on.
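To make the mismatch concrete, here is a minimal sketch (my own illustration, not code from the script) that compares the two notions as sysfs reports them. On this machine the first set is {0, 1} while the second is {0, ..., 7}:

```python
import glob
import re

# Sockets as seen through the per-CPU topology files: on this box this
# yields {0, 1}, i.e. two physical packages.
packages = set()
for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology/physical_package_id'):
    with open(path) as f:
        packages.add(int(f.read()))

# NUMA nodes as the kernel actually exposes them: with SNC enabled this
# yields {0, 1, ..., 7}, i.e. eight nodes.
nodes = set()
for path in glob.glob('/sys/devices/system/node/node[0-9]*'):
    nodes.add(int(re.search(r'node(\d+)$', path).group(1)))

print('physical packages:', sorted(packages))
print('NUMA nodes:       ', sorted(nodes))
```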

I think it would be better to rely on libnuma instead and fall back to the old way only if the numa bindings are not available, as sketched below.
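For example, a minimal sketch of that fallback using ctypes against libnuma directly (numa_available, numa_max_node, and numa_node_of_cpu are part of the standard libnuma API; the wrapper name is mine):

```python
import ctypes
import ctypes.util
import os

def libnuma_topology():
    """Return {numa_node: [cpu, ...]} via libnuma, or None if libnuma is
    unavailable so the caller can fall back to the old
    physical_package_id heuristic."""
    libname = ctypes.util.find_library('numa')
    if not libname:
        return None
    numa = ctypes.CDLL(libname)
    if numa.numa_available() < 0:   # kernel built without NUMA support
        return None
    topo = {node: [] for node in range(numa.numa_max_node() + 1)}
    for cpu in range(os.cpu_count()):
        node = numa.numa_node_of_cpu(cpu)
        if node >= 0:               # -1 means the CPU is offline/unknown
            topo[node].append(cpu)
    return topo
```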

@Civil
Contributor Author

Civil commented Mar 18, 2024

Alternatively, the same information can be obtained from

/sys/devices/system/node/
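A minimal sketch of that approach (pure sysfs, no extra dependencies; the helper names are mine), assuming each node directory exposes the standard cpulist file with ranges like 0-14,120-134:

```python
import glob
import re

def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-14,120-134' into CPU ids."""
    cpus = []
    for chunk in text.strip().split(','):
        if '-' in chunk:
            lo, hi = chunk.split('-')
            cpus.extend(range(int(lo), int(hi) + 1))
        elif chunk:
            cpus.append(int(chunk))
    return cpus

def numa_topology():
    """Map NUMA node id -> CPUs, straight from /sys/devices/system/node/."""
    topo = {}
    for node_dir in glob.glob('/sys/devices/system/node/node[0-9]*'):
        node_id = int(re.search(r'node(\d+)$', node_dir).group(1))
        with open(node_dir + '/cpulist') as f:
            topo[node_id] = parse_cpulist(f.read())
    return topo
```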

@Civil
Contributor Author

Civil commented Mar 18, 2024

It is actually weirder; my port topology is:

+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| ID | NUMA |   PCI   |        MAC        |                              Name                               |  Driver   | Linux IF  |  Active  |
+====+======+=========+===================+=================================================================+===========+===========+==========+
| 0  | 0    | 16:00.0 | 10:70:fd:5d:70:30 | MT2892 Family [ConnectX-6 Dx]                                   | mlx5_core | ens3f0np0 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 1  | 0    | 16:00.1 | 10:70:fd:5d:70:31 | MT2892 Family [ConnectX-6 Dx]                                   | mlx5_core | ens3f1np1 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 2  | 2    | 27:00.0 | 0c:42:a1:3a:31:c0 | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens6f0np0 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 3  | 2    | 27:00.1 | 0c:42:a1:3a:31:c1 | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens6f1np1 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 4  | 2    | 38:00.0 | ec:0d:9a:bf:dd:bc | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens2f0np0 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 5  | 2    | 38:00.1 | ec:0d:9a:bf:dd:bd | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens2f1np1 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 6  | 3    | 4a:00.0 | 74:56:3c:ed:b7:42 | I210 Gigabit Network Connection                                 | igb       | eno3      | *Active* |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 7  | 3    | 4b:00.0 | 74:56:3c:ed:b7:43 | I210 Gigabit Network Connection                                 | igb       | enp75s0   |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 8  | 4    | 98:00.0 | 04:3f:72:ea:e2:10 | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens1f0np0 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 9  | 4    | 98:00.1 | 04:3f:72:ea:e2:11 | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens1f1np1 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 10 | 6    | a8:00.0 | b8:ce:f6:75:63:6c | MT42822 BlueField-2 integrated ConnectX-6 Dx network controller | mlx5_core | ens5f0np0 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 11 | 6    | a8:00.1 | b8:ce:f6:75:63:6d | MT42822 BlueField-2 integrated ConnectX-6 Dx network controller | mlx5_core | ens5f1np1 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 12 | 6    | b8:00.0 | 04:3f:72:ea:e1:e8 | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens4f0np0 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+
| 13 | 6    | b8:00.1 | 04:3f:72:ea:e1:e9 | MT28800 Family [ConnectX-5 Ex]                                  | mlx5_core | ens4f1np1 |          |
+----+------+---------+-------------------+-----------------------------------------------------------------+-----------+-----------+----------+

And that fails with:

Not enough cores at NUMA 0. This NUMA has 3 processing units and 2 interfaces.

Even though that is not true: I have about 15 cores (plus their HT siblings) per NUMA node here.

@Civil
Contributor Author

Civil commented Mar 18, 2024

Found the problem:

# grep -c processor /proc/cpuinfo
240

And:
https://github.com/cisco-system-traffic-generator/trex-core/blob/master/scripts/dpdk_setup_ports.py#L65-L68

I'm not sure why the limit is there, but if I remove it, I can generate a config with the improved NUMA detection code.
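A hypothetical illustration of why the cap breaks this box (constants and names are mine, not the script's actual code): truncating the CPU list to a fixed maximum before grouping by NUMA node silently drops every CPU above the cap.

```python
MAX_CPUS = 128                      # hypothetical cap, standing in for L65-L68
all_cpus = list(range(240))         # this machine really has 240 logical CPUs
seen_cpus = all_cpus[:MAX_CPUS]     # truncation: CPUs 128-239 silently vanish

# Group surviving CPUs by NUMA node, matching the numactl output above
# (node n owns CPUs n*15..n*15+14 plus HT siblings 120+n*15..120+n*15+14).
per_node = {n: [c for c in seen_cpus if (c % 120) // 15 == n] for n in range(8)}
print({n: len(cpus) for n, cpus in per_node.items()})
# {0: 23, 1: 15, 2: 15, 3: 15, 4: 15, 5: 15, 6: 15, 7: 15}
# Everything above CPU 127 is gone, so any per-node core count derived from
# the truncated list is wrong.
```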
