Regression in ena NIC - /sys/class/net/eth0/device/msi_irqs/ is empty #268

Closed
mykaul opened this issue Apr 20, 2023 · 17 comments

Comments

@mykaul

mykaul commented Apr 20, 2023

(Copy of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2016991, since I'm unsure whether this or that is the right place to report this regression):

It worked on kernel 5.15 but doesn't on 5.19 (Ubuntu 22.04).
Running on an AWS i3.4xlarge instance, with the following:

driver: ena
version: 5.19.0-1022-aws
firmware-version:
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

Linux ip-10-0-1-194 5.19.0-1022-aws #23~22.04.1-Ubuntu SMP Fri Mar 17 15:38:24 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

dmesg:
[ 5.485138] ena 0000:00:03.0: ENA device version: 0.10
[ 5.485146] ena 0000:00:03.0: ENA controller version: 0.0.1 implementation version 1
[ 5.487074] ena 0000:00:03.0: LLQ is not supported Fallback to host mode policy.
...
[ 5.533589] ena 0000:00:03.0: Elastic Network Adapter (ENA) found at mem 83000000, mac addr 02:08:84:31:35:a9

Interrupt mapping:
210: 35669 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 xen-pirq -msi-x ena-mgmnt@pci:0000:00:03.0
211: 1079105381 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 xen-pirq -msi-x eth0-Tx-Rx-0
212: 0 1047885215 0 0 0 0 0 0 0 0 0 0 0 0 0 0 xen-pirq -msi-x eth0-Tx-Rx-1
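For reference, rows like the ones above can be parsed to see which CPU services each queue (IRQ 211 lands on CPU0 and IRQ 212 on CPU1 here). A minimal Python sketch, assuming the 16 count columns of this i3.4xlarge output and that the action name is the last token of a row; `parse_irq_line` and `busiest_cpu` are hypothetical helpers, not part of perftune.py or any other existing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IrqLine:
    irq: int           # IRQ number (first column)
    counts: List[int]  # per-CPU interrupt counts
    name: str          # action name, e.g. "eth0-Tx-Rx-0"


def parse_irq_line(line: str, ncpus: int) -> IrqLine:
    """Parse one numbered row of /proc/interrupts.

    Assumption: the row has exactly `ncpus` count columns and the action
    name is its last whitespace-separated token.
    """
    head, rest = line.split(":", 1)  # only the first ':' separates the IRQ number
    fields = rest.split()
    counts = [int(f) for f in fields[:ncpus]]
    return IrqLine(int(head), counts, fields[-1])


def busiest_cpu(entry: IrqLine) -> int:
    """Index of the CPU that has serviced this IRQ the most."""
    return max(range(len(entry.counts)), key=lambda cpu: entry.counts[cpu])


if __name__ == "__main__":
    row = " 211: 1079105381" + " 0" * 15 + " xen-pirq -msi-x eth0-Tx-Rx-0"
    entry = parse_irq_line(row, ncpus=16)
    print(entry.irq, entry.name, busiest_cpu(entry))  # 211 eth0-Tx-Rx-0 0
```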
@davidarinzon
Contributor

Hi @mykaul

This is actually a change on the kernel side to remove msi_irqs sysfs entries.

The change is torvalds/linux@24cff37
It was introduced in kernel 5.17.

@mykaul
Author

mykaul commented Apr 23, 2023

@davidarinzon - thanks for the quick response. Do you have an alternative? We've been using that interface for quite some time, and now we'll need to figure something out (see https://github.com/scylladb/seastar/blob/0b9afec2e8f0f29b35801b88b202508704faf507/scripts/perftune.py#L214).
(I've opened an issue on Seastar to find an alternative - scylladb/seastar#1622 )
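The interface in question is a directory listing: each file under `.../device/msi_irqs/` is named after one MSI/MSI-X IRQ of the device. A minimal Python sketch of reading it, assuming the standard `/sys/class/net/<iface>/device` link; the helper name `msi_irqs` and the `sysfs_root` parameter (there only to make it testable) are illustrative and not perftune.py's actual code. On the affected 5.19 kernel it would return an empty list.

```python
import os
from typing import List


def msi_irqs(iface: str, sysfs_root: str = "/sys") -> List[int]:
    """Return the MSI/MSI-X IRQ numbers listed for a network interface's device.

    An empty list means the msi_irqs directory is empty (the symptom in this
    issue) or missing; the two cases are deliberately not distinguished here.
    """
    path = os.path.join(sysfs_root, "class", "net", iface, "device", "msi_irqs")
    try:
        entries = os.listdir(path)
    except (FileNotFoundError, NotADirectoryError):
        return []
    # Entry names are IRQ numbers; ignore anything that isn't purely numeric.
    return sorted(int(e) for e in entries if e.isdigit())
```

For example, `msi_irqs("eth0")` on the i4i instance shown later in this thread would be expected to list IRQs 36 through 44.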

@amitbern-aws
Contributor

amitbern-aws commented Apr 23, 2023

@mykaul
You can get all eth0 IRQs from /proc/interrupts:
"ena": IRQ of the management queue
"eth0": IRQs of the Tx/Rx queues
For example:
grep -E "eth0|ena" /proc/interrupts | awk -F':' '{print $1}'
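The same filter can be written in Python. A minimal sketch, assuming the interface is `eth0` as in the command above; unlike the grep/awk pipeline, it keeps only numbered rows and returns the IRQ column as integers (awk leaves the leading padding in place). `irqs_matching` is a hypothetical helper name.

```python
import re
from typing import List


def irqs_matching(proc_interrupts: str, pattern: str = r"eth0|ena") -> List[int]:
    """Return IRQ numbers of /proc/interrupts rows whose text matches `pattern`.

    Roughly: grep -E "eth0|ena" /proc/interrupts | awk -F':' '{print $1}'
    restricted to numbered rows (the CPU header and rows like NMI: are skipped).
    """
    irqs = []
    for line in proc_interrupts.splitlines():
        head, sep, _ = line.partition(":")
        if sep and head.strip().isdigit() and re.search(pattern, line):
            irqs.append(int(head))  # int() tolerates the leading padding
    return irqs
```

Usage would be along the lines of `irqs_matching(open("/proc/interrupts").read())`.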

@avikivity

@vladzcloudius is the information in /proc/interrupts a suitable replacement?

@vladzcloudius

vladzcloudius commented Apr 24, 2023

> @vladzcloudius is the information in /proc/interrupts a suitable replacement?

We have been using the info in /proc/interrupts as a fallback for outdated configurations for a long time. However, the kernel has (had) better interfaces for this.

However, what you, @amitbern-aws, wrote above still makes little sense to me, because on a system with the 6.2.12-060212-generic kernel I still see the msi_irqs interface for my WiFi interface:

ll /sys/class/net/wlp110s0/device/msi_irqs/
total 0
-r--r--r-- 1 root root 4.0K Apr 24 09:23 144
-r--r--r-- 1 root root 4.0K Apr 24 09:23 145
-r--r--r-- 1 root root 4.0K Apr 24 09:23 146
-r--r--r-- 1 root root 4.0K Apr 24 09:23 147
-r--r--r-- 1 root root 4.0K Apr 24 09:23 148
-r--r--r-- 1 root root 4.0K Apr 24 09:23 149

On top of that, I can't imagine Linus removing the sysfs interfaces without providing something in their place, and /proc/interrupts is not such an interface.

@avikivity

Normal practice is not to remove user-facing interfaces without a long deprecation period.

@vladzcloudius

> Hi @mykaul
>
> This is actually a change on the kernel side to remove msi_irqs sysfs entries.
>
> The change is torvalds/linux@24cff37. It was introduced in kernel 5.17.

@mykaul could you please re-open this GH issue?

@davidarinzon the patch you referenced doesn't seem to disable sysfs interfaces. AFAICT it's an internal kernel code rework that isn't intended to remove any user-facing interfaces.

sysfs interfaces can be disabled by setting CONFIG_SYSFS=n, but that is not the case here.

The configuration of the kernel in question has the following:

$ grep CONFIG_SYSFS /boot/config-5.19.0-1022-aws
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_SYSFS_SYSCALL=y
CONFIG_SYSFS=y

This is the same as your 5.15 kernel, where msi_irqs is properly populated:

$ grep CONFIG_SYSFS /boot/config-5.15.0-1031-aws
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_SYSFS_SYSCALL=y
CONFIG_SYSFS=y

And the same as the 6.2.12 kernel on my laptop, which also has msi_irqs populated, as I mentioned above:

$ grep CONFIG_SYSFS /boot/config-6.2.12-060212-generic
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_SYSFS_SYSCALL=y
CONFIG_SYSFS=y

So, unfortunately, your previous explanation makes very little sense to me.
Could you please clarify why you think the patch above caused your 5.19 kernel build not to populate the sysfs msi_irqs interface, while having no such effect on the 6.2.12 kernel?

Attaching kernel configurations of all kernels referenced above:
configs.zip
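The CONFIG_SYSFS comparison above can be scripted across the attached configs. A minimal Python sketch, assuming the usual `/boot/config-*` format (`CONFIG_X=value` lines and `# CONFIG_X is not set` lines); `parse_kconfig` and `diff_prefix` are hypothetical helpers. An empty diff, as with the three listings above, means that option family cannot explain the behavioral difference.

```python
from typing import Dict, Optional, Tuple


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse a kernel config file into {option: value}.

    "CONFIG_X=y" lines map to their value; "# CONFIG_X is not set"
    lines map to "n". Other comments and blank lines are ignored.
    """
    opts: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            opts[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value
    return opts


def diff_prefix(a: Dict[str, str], b: Dict[str, str],
                prefix: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Options starting with `prefix` whose values differ between two configs."""
    keys = {k for k in a.keys() | b.keys() if k.startswith(prefix)}
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

Comparing two kernels would then look like `diff_prefix(parse_kconfig(open(old).read()), parse_kconfig(open(new).read()), "CONFIG_SYSFS")`, and the same call works for any other option prefix one wants to rule out.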

@vladzcloudius

I tested the same kernel on the KVM-based i4i instances and, not very surprisingly, msi_irqs was populated as expected:

scyllaadm@ip-10-99-17-182:~$ uname -a
Linux ip-10-99-17-182 5.19.0-1022-aws #23~22.04.1-Ubuntu SMP Fri Mar 17 15:38:24 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
scyllaadm@ip-10-99-17-182:~$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
scyllaadm@ip-10-99-17-182:~$ sudo ls -al /sys/devices/pci0000:00/0000:00:05.0/msi_irqs/
total 0
drwxr-xr-x 2 root root    0 Apr 26 14:11 .
drwxr-xr-x 6 root root    0 Apr 26 14:10 ..
-r--r--r-- 1 root root 4096 Apr 26 14:11 36
-r--r--r-- 1 root root 4096 Apr 26 14:11 37
-r--r--r-- 1 root root 4096 Apr 26 14:11 38
-r--r--r-- 1 root root 4096 Apr 26 14:11 39
-r--r--r-- 1 root root 4096 Apr 26 14:11 40
-r--r--r-- 1 root root 4096 Apr 26 14:11 41
-r--r--r-- 1 root root 4096 Apr 26 14:11 42
-r--r--r-- 1 root root 4096 Apr 26 14:11 43
-r--r--r-- 1 root root 4096 Apr 26 14:11 44
scyllaadm@ip-10-99-17-182:~$ 

This proves it has something to do with this specific kernel build and its interaction with Xen, and that it's a real bug.

@davidarinzon @amitbern-aws FYI
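One way to tell whether an instance is on the Xen path discussed here is to look at the interrupt chip reported for the NIC's rows in /proc/interrupts: on the i3.4xlarge in the original report the ENA rows show `xen-pirq`. A minimal Python sketch, assuming the chip is the first non-numeric token after the IRQ number (true for the rows quoted in this issue). The chip names on KVM-based instances are not shown in this thread, so the sketch only tests for `xen-pirq`; `chips_for` and `routed_via_xen_pirq` are hypothetical helpers.

```python
from typing import Dict


def chips_for(proc_interrupts: str, name_substring: str) -> Dict[int, str]:
    """Map IRQ number -> interrupt chip for numbered rows whose action name
    (last token) contains `name_substring`. Assumes the chip is the first
    non-numeric token after the per-CPU counts."""
    result: Dict[int, str] = {}
    for line in proc_interrupts.splitlines():
        head, sep, rest = line.partition(":")
        if not (sep and head.strip().isdigit()):
            continue  # skip the CPU header and rows such as NMI:
        fields = rest.split()
        if not fields or name_substring not in fields[-1]:
            continue
        chip = next((f for f in fields if not f.isdigit()), None)
        if chip is not None:
            result[int(head)] = chip
    return result


def routed_via_xen_pirq(proc_interrupts: str, name_substring: str) -> bool:
    """True if any matching IRQ is delivered through the xen-pirq chip."""
    return any(c == "xen-pirq" for c in chips_for(proc_interrupts, name_substring).values())
```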

@davidarinzon
Contributor

Hi @vladzcloudius, @mykaul, @avikivity

After conducting additional tests, we acknowledge the issue. While the issue is unrelated to the ENA driver, we have internally assigned it to one of our kernel teams. We appreciate your communication regarding this matter.

@mykaul
Author

mykaul commented Apr 27, 2023

@davidarinzon - thanks for letting us know (and @vladzcloudius for analyzing this!).
How/where can we track progress on this matter (as this issue is closed)?

@davidarinzon
Contributor

Hi @mykaul, you can contact AWS support.

@vladzcloudius

vladzcloudius commented Apr 27, 2023

> @davidarinzon - thanks for letting us know (and @vladzcloudius for analyzing this!). How/where can we track progress on this matter (as this issue is closed)?

@mykaul As the author of this GH issue, you should be able to re-open it, shouldn't you?

@mykaul
Author

mykaul commented Apr 27, 2023

>> @davidarinzon - thanks for letting us know (and @vladzcloudius for analyzing this!). How/where can we track progress on this matter (as this issue is closed)?
>
> @mykaul As the author of this GH issue, you should be able to re-open it, shouldn't you?

I thought so too :-/

@vladzcloudius

>>> @davidarinzon - thanks for letting us know (and @vladzcloudius for analyzing this!). How/where can we track progress on this matter (as this issue is closed)?
>>
>> @mykaul As the author of this GH issue, you should be able to re-open it, shouldn't you?
>
> I thought so too :-/

And you can't? That's weird.
I suggest opening a new one that references this one and asking the Amazon folks not to close it until the issue is actually resolved.

@ndagan

ndagan commented Apr 28, 2023

I reopened the issue for your convenience. However, let's take the discussion off this thread; this forum is for ENA driver-specific issues.
We opened an internal ticket for AWS kernel teams and they are investigating this issue further. I encourage you to also contact AWS support.

Thanks.

ndagan reopened this Apr 28, 2023
@amitbern-aws
Contributor

Patch for upstream was posted today by AWS kernel team: https://lore.kernel.org/xen-devel/20230503131656.15928-1-mheyne@amazon.de/

@davidarinzon
Contributor

An update on upstream: it was merged (torvalds/linux@335b422) and is part of https://github.com/torvalds/linux/releases/tag/v6.4-rc4
