All interrupts running on single cpu #125
Comments
irqbalance 1.3.0 is very old. You are likely running into an issue fixed by commit 723c41c. Please try building the latest irqbalance version.
I already tried v1.6.0, the latest version, on one of my servers, but the problem persists. I'll try it on the other servers as well and reply back here with the results.
Please include the debug output of version 1.6.0, as you did above with version 1.3.0.
Sure, I will.
@nhorman Sorry to bother you, I've added the debug output of irqbalance v1.6.0.
Wait a second, this is working exactly as it's supposed to. I'm sorry, I should have looked closer at your initial comments. Irqbalance is meant to balance workloads on CPUs, not balance any single IRQ across multiple CPUs. It does this intentionally, because it increases data cache hit rates. If you continually re-affine interrupts to different CPUs, you will get worse performance than you would otherwise. Don't do that. If you want to see that in action, just don't run irqbalance. That will make all CPUs eligible to receive all interrupts, but what you will find is that the hardware will just select the first available CPU, which will coalesce interrupts on a single CPU until it's saturated, at which point they will overflow to the next CPU. That's well and good, but what you will find is that any application that gets scheduled on a low CPU (i.e. cpu0) will experience higher latency than other processes.
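One way to check which CPU a given interrupt is actually being delivered to is to read its per-CPU counters in /proc/interrupts. The following is only a minimal sketch, assuming the usual layout of that file (a header row of CPU names, then one row per IRQ starting with `N:`). The name `irq_per_cpu` is a hypothetical helper, not part of irqbalance.

```shell
#!/usr/bin/env bash
# irq_per_cpu IRQ [FILE]
# Print one "CPUn count" line per CPU for the given IRQ number.
# FILE defaults to /proc/interrupts; passing another file is only
# useful for testing against a saved copy.
irq_per_cpu() {
    local irq=$1 file=${2:-/proc/interrupts}
    awk -v irq="${irq}:" '
        # The first row holds the CPU column names (CPU0, CPU1, ...).
        NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; ncpu = NF; next }
        # On the matching IRQ row, field 1 is "N:" and fields 2.. are counts.
        $1 == irq { for (i = 1; i <= ncpu; i++) print cpu[i], $(i + 1) }
    ' "$file"
}
```

Running `irq_per_cpu 25` twice a few seconds apart and comparing the counts should show which CPU column is increasing for that IRQ.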
All interrupts are handled by a single CPU, and as soon as incoming requests increase, that CPU reaches 100% utilisation even though the other cores are at 70%.
irqbalance is not balancing interrupts to other CPUs.
Server Details-
Type: Dedicated
Model: Intel(R) Xeon(R) CPU E3-1270 V2
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
Irqbalance version: 1.3.0 (Bundled with OS)
# irqbalance --debug
Isolated CPUs: 00000000
Adaptive-ticks CPUs: 00000000
Banned CPUs: 00000000
Package 0: numa_node 0 cpu mask is 000000ff (load 0)
  Cache domain 0: numa_node is 0 cpu mask is 00000088 (load 0)
    CPU number 7 numa_node is 0 (load 0)
    CPU number 3 numa_node is 0 (load 0)
  Cache domain 1: numa_node is 0 cpu mask is 00000022 (load 0)
    CPU number 5 numa_node is 0 (load 0)
    CPU number 1 numa_node is 0 (load 0)
  Cache domain 2: numa_node is 0 cpu mask is 00000044 (load 0)
    CPU number 6 numa_node is 0 (load 0)
    CPU number 2 numa_node is 0 (load 0)
  Cache domain 3: numa_node is 0 cpu mask is 00000011 (load 0)
    CPU number 4 numa_node is 0 (load 0)
    CPU number 0 numa_node is 0 (load 0)
Adding IRQ 28 to database
Adding IRQ 30 to database
Adding IRQ 29 to database
Adding IRQ 24 to database
Adding IRQ 16 to database
Adding IRQ 25 to database
Adding IRQ 26 to database
Adding IRQ 27 to database
Adding IRQ 11 to database
Adding IRQ 23 to database
Adding IRQ 18 to database
Adding IRQ 0 to database
Adding IRQ 1 to database
Adding IRQ 4 to database
Adding IRQ 8 to database
Adding IRQ 9 to database
Adding IRQ 12 to database
NUMA NODE NUMBER: -1
LOCAL CPU MASK:
ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff NUMA NODE NUMBER: 0 LOCAL CPU MASK: 000000ff ----------------------------------------------------------------------------- Package 0: numa_node 0 cpu mask is 000000ff (load 0) Cache domain 0: numa_node is 0 cpu mask is 00000088 (load 0) CPU number 7 numa_node is 0 (load 0) Interrupt 27 node_num is -1 (ethernet/0:0) CPU number 3 numa_node is 0 (load 0) Interrupt 30 node_num is -1 (ethernet/0:0) Interrupt 18 node_num is -1 (legacy/0:0) Cache domain 1: numa_node is 0 cpu mask is 00000022 (load 0) CPU number 5 numa_node is 0 (load 0) Interrupt 26 node_num is -1 (ethernet/0:28295) CPU number 1 numa_node is 0 (load 0) Interrupt 28 node_num is -1 (ethernet/0:11) Interrupt 23 node_num is -1 (legacy/0:0) Cache domain 2: numa_node is 0 cpu mask is 00000044 (load 0) CPU number 6 numa_node is 0 (load 0) Interrupt 25 node_num is -1 
(ethernet/0:31647)
    CPU number 2 numa_node is 0 (load 0)
      Interrupt 24 node_num is -1 (storage/0:4)
      Interrupt 11 node_num is -1 (legacy/0:0)
  Cache domain 3: numa_node is 0 cpu mask is 00000011 (load 0)
    CPU number 4 numa_node is 0 (load 0)
      Interrupt 29 node_num is -1 (ethernet/0:1)
    CPU number 0 numa_node is 0 (load 0)
      Interrupt 16 node_num is -1 (legacy/0:0)
      Interrupt 12 node_num is -1 (other/0:0)
      Interrupt 9 node_num is -1 (other/0:0)
      Interrupt 8 node_num is -1 (other/0:0)
      Interrupt 4 node_num is -1 (other/0:0)
      Interrupt 1 node_num is -1 (other/0:0)
      Interrupt 0 node_num is -1 (other/0:0)
-----------------------------------------------------------------------------
Package 0: numa_node 0 cpu mask is 000000ff (load 1500000000)
  Cache domain 0: numa_node is 0 cpu mask is 00000088 (load 90000000)
    CPU number 7 numa_node is 0 (load 80000000)
      Interrupt 27 node_num is -1 (ethernet/1:0)
    CPU number 3 numa_node is 0 (load 10000000)
      Interrupt 30 node_num is -1 (ethernet/1:0)
      Interrupt 18 node_num is -1 (legacy/1:0)
  Cache domain 1: numa_node is 0 cpu mask is 00000022 (load 50000000)
    CPU number 5 numa_node is 0 (load 20000000)
      Interrupt 26 node_num is -1 (ethernet/19992390:28358)
    CPU number 1 numa_node is 0 (load 30000000)
      Interrupt 28 node_num is -1 (ethernet/29999997:9)
      Interrupt 23 node_num is -1 (legacy/1:0)
  Cache domain 2: numa_node is 0 cpu mask is 00000044 (load 1300000000)
    CPU number 6 numa_node is 0 (load 1290000000)
      Interrupt 25 node_num is -1 (ethernet/1289986894:31591)
    CPU number 2 numa_node is 0 (load 10000000)
      Interrupt 24 node_num is -1 (storage/9999978:51)
      Interrupt 11 node_num is -1 (legacy/1:0)
  Cache domain 3: numa_node is 0 cpu mask is 00000011 (load 60000000)
    CPU number 4 numa_node is 0 (load 10000000)
      Interrupt 29 node_num is -1 (ethernet/1:0)
    CPU number 0 numa_node is 0 (load 50000000)
      Interrupt 16 node_num is -1 (legacy/1:0)
      Interrupt 12 node_num is -1 (other/1:0)
      Interrupt 9 node_num is -1 (other/1:0)
      Interrupt 8 node_num is -1 (other/1:0)
      Interrupt 4 node_num is -1 (other/1:0)
      Interrupt 1 node_num is -1 (other/1:0)
      Interrupt 0 node_num is -1 (other/1:0)
-----------------------------------------------------------------------------
Package 0: numa_node 0 cpu mask is 000000ff (load 1540000000)
  Cache domain 0: numa_node is 0 cpu mask is 00000088 (load 70000000)
    CPU number 7 numa_node is 0 (load 60000000)
      Interrupt 27 node_num is -1 (ethernet/1:0)
    CPU number 3 numa_node is 0 (load 10000000)
      Interrupt 30 node_num is -1 (ethernet/1:0)
      Interrupt 18 node_num is -1 (legacy/1:0)
  Cache domain 1: numa_node is 0 cpu mask is 00000022 (load 60000000)
    CPU number 5 numa_node is 0 (load 20000000)
      Interrupt 26 node_num is -1 (ethernet/19997232:28086)
    CPU number 1 numa_node is 0 (load 40000000)
      Interrupt 28 node_num is -1 (ethernet/40000000:10)
      Interrupt 23 node_num is -1 (legacy/1:0)
  Cache domain 2: numa_node is 0 cpu mask is 00000044 (load 1350000000)
    CPU number 6 numa_node is 0 (load 1340000000)
      Interrupt 25 node_num is -1 (ethernet/1339979553:31259)
    CPU number 2 numa_node is 0 (load 10000000)
      Interrupt 24 node_num is -1 (storage/9999999:9)
      Interrupt 11 node_num is -1 (legacy/1:0)
  Cache domain 3: numa_node is 0 cpu mask is 00000011 (load 60000000)
    CPU number 4 numa_node is 0 (load 0)
      Interrupt 29 node_num is -1 (ethernet/1:0)
    CPU number 0 numa_node is 0 (load 60000000)
      Interrupt 16 node_num is -1 (legacy/1:0)
      Interrupt 12 node_num is -1 (other/1:0)
      Interrupt 9 node_num is -1 (other/1:0)
      Interrupt 8 node_num is -1 (other/1:0)
      Interrupt 4 node_num is -1 (other/1:0)
      Interrupt 1 node_num is -1 (other/1:0)
      Interrupt 0 node_num is -1 (other/1:0)
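To see at a glance which interrupts carry the load in a dump like the one above, the `Interrupt N node_num is M (class/load:count)` entries can be extracted and sorted. This is a rough sketch: it assumes the two numbers in parentheses are irqbalance's load estimate followed by the interrupt count for the interval, which is how the output above appears to read. The name `top_irqs_by_load` is hypothetical.

```shell
#!/usr/bin/env bash
# top_irqs_by_load [N]
# Read irqbalance --debug output on stdin and print the N (default 3)
# entries with the highest load as "load irq class count".
# Matching is done per entry, so it also works if the dump has lost
# its line breaks.
top_irqs_by_load() {
    grep -oE 'Interrupt [0-9]+ node_num is -?[0-9]+ \([a-z]+/[0-9]+:[0-9]+\)' |
        sed -E 's#Interrupt ([0-9]+) node_num is -?[0-9]+ \(([a-z]+)/([0-9]+):([0-9]+)\)#\3 \1 \2 \4#' |
        sort -rn |
        head -n "${1:-3}"
}
```

With a saved dump this could be run as `top_irqs_by_load 5 < debug.txt`, where `debug.txt` is a placeholder file name. Because the debug output repeats once per balancing interval, the same IRQ can appear more than once in the result.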
content of smp_affinity files-
# for i in $(seq 0 300); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
/proc/irq/0/smp_affinity:ff
/proc/irq/1/smp_affinity:ff
/proc/irq/2/smp_affinity:ff
/proc/irq/3/smp_affinity:ff
/proc/irq/4/smp_affinity:ff
/proc/irq/5/smp_affinity:ff
/proc/irq/6/smp_affinity:ff
/proc/irq/7/smp_affinity:ff
/proc/irq/8/smp_affinity:ff
/proc/irq/9/smp_affinity:ff
/proc/irq/10/smp_affinity:ff
/proc/irq/11/smp_affinity:44
/proc/irq/12/smp_affinity:ff
/proc/irq/13/smp_affinity:ff
/proc/irq/14/smp_affinity:ff
/proc/irq/15/smp_affinity:ff
/proc/irq/16/smp_affinity:11
/proc/irq/23/smp_affinity:22
/proc/irq/24/smp_affinity:04
/proc/irq/25/smp_affinity:40
/proc/irq/26/smp_affinity:20
/proc/irq/27/smp_affinity:80
/proc/irq/28/smp_affinity:02
/proc/irq/29/smp_affinity:10
/proc/irq/30/smp_affinity:08
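The smp_affinity values are hexadecimal bitmasks in which bit N stands for CPU N. A small sketch for decoding them into CPU numbers is shown here. The name `mask_to_cpus` is a hypothetical helper, and the sketch assumes the mask fits in bash's 64-bit arithmetic, so it does not handle the comma-separated masks used on very large systems.

```shell
#!/usr/bin/env bash
# mask_to_cpus HEXMASK
# Print the CPU numbers selected by an smp_affinity mask, space separated.
mask_to_cpus() {
    local mask=$((16#$1))   # interpret the argument as base-16
    local cpu=0
    local out=()
    while [ "$mask" -gt 0 ]; do
        # Bit 0 set means the current CPU number is part of the mask.
        if [ $((mask & 1)) -eq 1 ]; then
            out+=("$cpu")
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${out[*]}"
}
```

For example, `mask_to_cpus 40` prints `6`, which matches the debug output above where Interrupt 25 is listed under CPU number 6, and `mask_to_cpus 44` prints `2 6`.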
The htop screenshot below was taken at very low requests per second.
I referred to the following issues, but nothing helped-
All eth0 interrupts going to CPU0 and saturating it on Debian Stretch, kernel 4.9.0
All interruptions seem to run on a single CPU