All eth0 interrupts going to CPU0 and saturating it on Debian Stretch, kernel 4.9.0 #122
Comments
I have also seen this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926967 |
Based on that bug, it sounds like you are using irqbalance in a virtual guest. You shouldn't do that unless you also pin your virtual CPUs to physical CPUs; otherwise you have no idea what the actual mapping will be. If you are indeed running on a physical host, please let me know what version of irqbalance you are using, and post the output of irqbalance -f -d here. |
I shouldn't have linked that bug then, I'm running on bare metal. I mistakenly believed the discussion in that bug was relevant to my issue. irqbalance version: 1.1.0-2.3. There's no config file, irqbalance is running as a systemd daemon (regular, not oneshot). irqbalance -f -d output:
|
Four things:
|
Here's an output with 1 and 2 taken into account. As for isolcpus, I don't know much about it - we don't run our kernel with the isolcpus boot line, and
I will try an upstream release now. |
1.6.0, just built from master:
After running it this way (and stopping the continually running irqbalance daemon in the background), the interrupts are becoming better spread:
|
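For anyone wanting to put a number on "better spread", here is a minimal sketch that sums per-CPU interrupt counts from /proc/interrupts for rows whose description mentions a given device name. The function names `per_cpu_irq_counts` and `busiest_cpu_share` are made up for this example, and the parsing assumes the usual layout (a header row of CPU columns, then one row per IRQ); it is not part of irqbalance.

```python
from pathlib import Path


def per_cpu_irq_counts(text, name_filter):
    """Sum interrupt counts per CPU for rows whose description contains name_filter."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        fields = rest.split()
        # Rows such as "ERR:" have no per-CPU columns; skip them.
        if len(fields) < len(cpus):
            continue
        counts = fields[:len(cpus)]
        description = " ".join(fields[len(cpus):])
        if name_filter not in description:
            continue
        if not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


def busiest_cpu_share(totals):
    """Fraction of the matching interrupts handled by the single busiest CPU."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return max(totals.values()) / total


if __name__ == "__main__":
    source = Path("/proc/interrupts")
    if source.exists():
        totals = per_cpu_irq_counts(source.read_text(), "eth0")
        for cpu, count in totals.items():
            print(f"{cpu}: {count}")
        print(f"busiest CPU share: {busiest_cpu_share(totals):.0%}")
```

A share close to 100% on one CPU corresponds to the saturation described in this issue. The counters are cumulative since boot, so comparing two snapshots taken some time apart gives a better picture of the current distribution than a single reading.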
I also don't know what the desired behavior of irqbalance is - should it be putting all Ethernet interrupts on CPU0 for cache coherency? Or spreading them across other CPUs to reduce single hotspots/saturation? |
Thanks for the information. In answer to your question, the desired behavior of irqbalance for most general-purpose workloads is to spread IRQs as much as possible across all of your CPUs. This helps give even latency to your user-space processes, as well as maintaining a high data cache hit rate (nominally, by default, IRQs will trigger on a different CPU for each raising, dirtying cache on multiple CPUs and then not reusing that data). That said, you seem to have a unique situation here. For some reason your distro version of irqbalance is doing something very wrong. It's somehow improperly parsing the isolated bitmap and giving you an isolated CPU mask of 0x1, which it shouldn't, and that is likely what leads to the misbalancing. In either case, this was fixed back in 2016 with commit 3c9a009, and from your testing you can see the upstream version is behaving. The Debian maintainers need to backport that fix or, better still, just update to a modern version of irqbalance. |
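To make the "isolated CPU mask of 0x1" concrete: in a CPU bitmask, bit 0 stands for CPU0, so 0x1 means "only CPU0 is isolated". The sketch below converts a kernel-style cpulist string (for example "1-3,8") into such a mask. It assumes the isolated set is supplied in cpulist form, as in /sys/devices/system/cpu/isolated; the helper name `cpulist_to_mask` is hypothetical, and this is neither irqbalance's actual C code nor a reconstruction of what commit 3c9a009 changed.

```python
def cpulist_to_mask(cpulist):
    """Convert a cpulist string such as "1-3,8" into an integer bitmask."""
    mask = 0
    for part in cpulist.strip().split(","):
        if not part:
            continue                      # empty input or stray comma: no CPUs
        first, _, last = part.partition("-")
        low = int(first)
        high = int(last) if last else low
        for cpu in range(low, high + 1):
            mask |= 1 << cpu              # bit N represents CPU N
    return mask


if __name__ == "__main__":
    for example in ("", "0", "1-3,8"):
        print(f"{example!r} -> {cpulist_to_mask(example):#x}")
```

On a machine booted without isolcpus, the isolated list is expected to be empty, which this sketch maps to a mask of 0 rather than 0x1.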
Thanks for the help. Should I be filing a Debian bug/upgrade request somewhere? |
yes, following the instructions here: |
On Debian Stretch hosts with kernel 4.9.0-9-amd64, irqbalance is putting all eth0 interrupts on CPU0, causing some CPU saturation issues. It's almost as if it's doing the inverse of the expected behavior - on older Debian hosts, it does "the right thing" (unless my expectations are wrong).
Should I expect irqbalance to be balancing these across more than one core? That's what I see on my older hosts. E.g. Debian Jessie, kernel 3.16.0: