All interrupts running on single cpu #125

Closed
saifat29 opened this issue Oct 11, 2019 · 7 comments

Comments

@saifat29

All interrupts are handled by a single CPU. As soon as incoming requests increase, that CPU reaches 100% utilisation even though the other cores are at 70%.

irqbalance is not balancing interrupts to other CPUs.

Server Details-

Type: Dedicated
Model: Intel(R) Xeon(R) CPU E3-1270 V2
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)

Irqbalance version: 1.3.0 (Bundled with OS)

# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         37          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          0          0          0          0          2          0          0   IO-APIC   1-edge      i8042
  4:          0          0          0          0          0          0          0       2191   IO-APIC   4-edge      ttyS0
  8:          0          0          0          0          0          0          1          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:          0          0          0          0          4          0          0          0   IO-APIC  12-edge      i8042
 16:          0          0         71          0          0          0          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 23:          0          0          0         27          0          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:       4556          0      57417          0          0          0          0          0   PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 25:          0          0          0          0        241          0    2919057          0   PCI-MSI 1048576-edge      enp2s0-rx-0
 26:          0          0          0          0          0    3009931          0          0   PCI-MSI 1048577-edge      enp2s0-tx-0
 27:          0          0          0          0          0          0          2          0   PCI-MSI 1048578-edge      enp2s0
 28:          0       5292          0          0          0          0          0          0   PCI-MSI 1572864-edge      enp3s0-rx-0
 29:          0          0         12          0         88          0          0          0   PCI-MSI 1572865-edge      enp3s0-tx-0
 30:          0          0          0          2          0          0          0          0   PCI-MSI 1572866-edge      enp3s0
NMI:         11          9         10          8          8          9         24         13   Non-maskable interrupts
LOC:     688031     752511    1074206    1117759    1284159    1576357     613882     828182   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:         11          9         10          8          8          9         24         13   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:      33978      19889      10239       7471       6435      16293      33420       8775   Rescheduling interrupts
CAL:      17249      16771      16739      15677      16670      17259      17144      16861   Function call interrupts
TLB:       7915       8115       8396       8152       8480       8481       8441       8372   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         17         18         18         18         18         18         18         18   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
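
To make the per-CPU distribution in the /proc/interrupts dump above easier to read, here is a minimal sketch that reports which CPU has handled the most interrupts for one IRQ. The function name `busiest_cpu_for_irq` is hypothetical (it is not part of irqbalance), and the sketch assumes the usual layout: a header row with one column per CPU, followed by rows that start with the IRQ label.

```shell
# Hypothetical helper, not part of irqbalance.
# Reads /proc/interrupts-formatted text on stdin; $1 is the IRQ number (e.g. 25).
# Prints the CPU with the highest counter for that IRQ.
busiest_cpu_for_irq() {
    awk -v irq="$1" '
        NR == 1 { ncpu = NF; next }            # header row: one field per CPU
        $1 == irq ":" {
            best = 0; max = -1
            for (i = 2; i <= ncpu + 1; i++)    # per-CPU counters follow the label
                if ($i + 0 > max) { max = $i + 0; best = i - 2 }
            printf "IRQ %s: CPU%d handled %d interrupts\n", irq, best, max
        }
    '
}
```

On a live system it could be run as `busiest_cpu_for_irq 25 < /proc/interrupts`.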
# irqbalance --debug
Isolated CPUs: 00000000
Adaptive-ticks CPUs: 00000000
Banned CPUs: 00000000
Package 0:  numa_node 0 cpu mask is 000000ff (load 0)
        Cache domain 0:  numa_node is 0 cpu mask is 00000088  (load 0)
                CPU number 7  numa_node is 0 (load 0)
                CPU number 3  numa_node is 0 (load 0)
        Cache domain 1:  numa_node is 0 cpu mask is 00000022  (load 0)
                CPU number 5  numa_node is 0 (load 0)
                CPU number 1  numa_node is 0 (load 0)
        Cache domain 2:  numa_node is 0 cpu mask is 00000044  (load 0)
                CPU number 6  numa_node is 0 (load 0)
                CPU number 2  numa_node is 0 (load 0)
        Cache domain 3:  numa_node is 0 cpu mask is 00000011  (load 0)
                CPU number 4  numa_node is 0 (load 0)
                CPU number 0  numa_node is 0 (load 0)
Adding IRQ 28 to database
Adding IRQ 30 to database
Adding IRQ 29 to database
Adding IRQ 24 to database
Adding IRQ 16 to database
Adding IRQ 25 to database
Adding IRQ 26 to database
Adding IRQ 27 to database
Adding IRQ 11 to database
Adding IRQ 23 to database
Adding IRQ 18 to database
Adding IRQ 0 to database
Adding IRQ 1 to database
Adding IRQ 4 to database
Adding IRQ 8 to database
Adding IRQ 9 to database
Adding IRQ 12 to database
NUMA NODE NUMBER: -1
LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff

NUMA NODE NUMBER: 0
LOCAL CPU MASK: 000000ff




-----------------------------------------------------------------------------
Package 0:  numa_node 0 cpu mask is 000000ff (load 0)
        Cache domain 0:  numa_node is 0 cpu mask is 00000088  (load 0)
                CPU number 7  numa_node is 0 (load 0)
                  Interrupt 27 node_num is -1 (ethernet/0:0)
                CPU number 3  numa_node is 0 (load 0)
                  Interrupt 30 node_num is -1 (ethernet/0:0)
          Interrupt 18 node_num is -1 (legacy/0:0)
        Cache domain 1:  numa_node is 0 cpu mask is 00000022  (load 0)
                CPU number 5  numa_node is 0 (load 0)
                  Interrupt 26 node_num is -1 (ethernet/0:28295)
                CPU number 1  numa_node is 0 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/0:11)
          Interrupt 23 node_num is -1 (legacy/0:0)
        Cache domain 2:  numa_node is 0 cpu mask is 00000044  (load 0)
                CPU number 6  numa_node is 0 (load 0)
                  Interrupt 25 node_num is -1 (ethernet/0:31647)
                CPU number 2  numa_node is 0 (load 0)
                  Interrupt 24 node_num is -1 (storage/0:4)
          Interrupt 11 node_num is -1 (legacy/0:0)
        Cache domain 3:  numa_node is 0 cpu mask is 00000011  (load 0)
                CPU number 4  numa_node is 0 (load 0)
                  Interrupt 29 node_num is -1 (ethernet/0:1)
                CPU number 0  numa_node is 0 (load 0)
          Interrupt 16 node_num is -1 (legacy/0:0)
  Interrupt 12 node_num is -1 (other/0:0)
  Interrupt 9 node_num is -1 (other/0:0)
  Interrupt 8 node_num is -1 (other/0:0)
  Interrupt 4 node_num is -1 (other/0:0)
  Interrupt 1 node_num is -1 (other/0:0)
  Interrupt 0 node_num is -1 (other/0:0)



-----------------------------------------------------------------------------
Package 0:  numa_node 0 cpu mask is 000000ff (load 1500000000)
        Cache domain 0:  numa_node is 0 cpu mask is 00000088  (load 90000000)
                CPU number 7  numa_node is 0 (load 80000000)
                  Interrupt 27 node_num is -1 (ethernet/1:0)
                CPU number 3  numa_node is 0 (load 10000000)
                  Interrupt 30 node_num is -1 (ethernet/1:0)
          Interrupt 18 node_num is -1 (legacy/1:0)
        Cache domain 1:  numa_node is 0 cpu mask is 00000022  (load 50000000)
                CPU number 5  numa_node is 0 (load 20000000)
                  Interrupt 26 node_num is -1 (ethernet/19992390:28358)
                CPU number 1  numa_node is 0 (load 30000000)
                  Interrupt 28 node_num is -1 (ethernet/29999997:9)
          Interrupt 23 node_num is -1 (legacy/1:0)
        Cache domain 2:  numa_node is 0 cpu mask is 00000044  (load 1300000000)
                CPU number 6  numa_node is 0 (load 1290000000)
                  Interrupt 25 node_num is -1 (ethernet/1289986894:31591)
                CPU number 2  numa_node is 0 (load 10000000)
                  Interrupt 24 node_num is -1 (storage/9999978:51)
          Interrupt 11 node_num is -1 (legacy/1:0)
        Cache domain 3:  numa_node is 0 cpu mask is 00000011  (load 60000000)
                CPU number 4  numa_node is 0 (load 10000000)
                  Interrupt 29 node_num is -1 (ethernet/1:0)
                CPU number 0  numa_node is 0 (load 50000000)
          Interrupt 16 node_num is -1 (legacy/1:0)
  Interrupt 12 node_num is -1 (other/1:0)
  Interrupt 9 node_num is -1 (other/1:0)
  Interrupt 8 node_num is -1 (other/1:0)
  Interrupt 4 node_num is -1 (other/1:0)
  Interrupt 1 node_num is -1 (other/1:0)
  Interrupt 0 node_num is -1 (other/1:0)



-----------------------------------------------------------------------------
Package 0:  numa_node 0 cpu mask is 000000ff (load 1540000000)
        Cache domain 0:  numa_node is 0 cpu mask is 00000088  (load 70000000)
                CPU number 7  numa_node is 0 (load 60000000)
                  Interrupt 27 node_num is -1 (ethernet/1:0)
                CPU number 3  numa_node is 0 (load 10000000)
                  Interrupt 30 node_num is -1 (ethernet/1:0)
          Interrupt 18 node_num is -1 (legacy/1:0)
        Cache domain 1:  numa_node is 0 cpu mask is 00000022  (load 60000000)
                CPU number 5  numa_node is 0 (load 20000000)
                  Interrupt 26 node_num is -1 (ethernet/19997232:28086)
                CPU number 1  numa_node is 0 (load 40000000)
                  Interrupt 28 node_num is -1 (ethernet/40000000:10)
          Interrupt 23 node_num is -1 (legacy/1:0)
        Cache domain 2:  numa_node is 0 cpu mask is 00000044  (load 1350000000)
                CPU number 6  numa_node is 0 (load 1340000000)
                  Interrupt 25 node_num is -1 (ethernet/1339979553:31259)
                CPU number 2  numa_node is 0 (load 10000000)
                  Interrupt 24 node_num is -1 (storage/9999999:9)
          Interrupt 11 node_num is -1 (legacy/1:0)
        Cache domain 3:  numa_node is 0 cpu mask is 00000011  (load 60000000)
                CPU number 4  numa_node is 0 (load 0)
                  Interrupt 29 node_num is -1 (ethernet/1:0)
                CPU number 0  numa_node is 0 (load 60000000)
          Interrupt 16 node_num is -1 (legacy/1:0)
  Interrupt 12 node_num is -1 (other/1:0)
  Interrupt 9 node_num is -1 (other/1:0)
  Interrupt 8 node_num is -1 (other/1:0)
  Interrupt 4 node_num is -1 (other/1:0)
  Interrupt 1 node_num is -1 (other/1:0)
  Interrupt 0 node_num is -1 (other/1:0)

content of smp_affinity files-

# for i in $(seq 0 300); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
/proc/irq/0/smp_affinity:ff
/proc/irq/1/smp_affinity:ff
/proc/irq/2/smp_affinity:ff
/proc/irq/3/smp_affinity:ff
/proc/irq/4/smp_affinity:ff
/proc/irq/5/smp_affinity:ff
/proc/irq/6/smp_affinity:ff
/proc/irq/7/smp_affinity:ff
/proc/irq/8/smp_affinity:ff
/proc/irq/9/smp_affinity:ff
/proc/irq/10/smp_affinity:ff
/proc/irq/11/smp_affinity:44
/proc/irq/12/smp_affinity:ff
/proc/irq/13/smp_affinity:ff
/proc/irq/14/smp_affinity:ff
/proc/irq/15/smp_affinity:ff
/proc/irq/16/smp_affinity:11
/proc/irq/23/smp_affinity:22
/proc/irq/24/smp_affinity:04
/proc/irq/25/smp_affinity:40
/proc/irq/26/smp_affinity:20
/proc/irq/27/smp_affinity:80
/proc/irq/28/smp_affinity:02
/proc/irq/29/smp_affinity:10
/proc/irq/30/smp_affinity:08
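
The smp_affinity values above are hexadecimal bitmasks in which bit N stands for CPU N. As a rough sketch, the mask can be expanded into CPU numbers like this. The function name `mask_to_cpus` is hypothetical, and the sketch assumes a single hex word without commas, which is enough for the 8-CPU machine in this report.

```shell
# Hypothetical helper: expand a hex smp_affinity mask into CPU numbers.
# Assumes one hex word (no comma-separated groups).
mask_to_cpus() (
    mask=$((0x$1))      # interpret the argument as hexadecimal
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out${out:+ }$cpu"   # append, space-separated
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
)
```

For example, `mask_to_cpus 40` prints `6`, which matches IRQ 25 (enp2s0-rx-0) being counted almost entirely under CPU6 in /proc/interrupts.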
# lstopo-no-graphics
Machine (16GB)
  Package L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#4)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#5)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#6)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#7)
  HostBridge L#0
    PCIBridge
      PCI 8086:10d3
        Net L#0 "enp2s0"
    PCIBridge
      PCI 8086:10d3
        Net L#1 "enp3s0"
    PCIBridge
      PCI 102b:0532
        GPU L#2 "controlD64"
        GPU L#3 "card0"
    PCI 8086:1c02
      Block(Disk) L#4 "sda"

The htop screenshot below was taken at a very low request rate-
[htop screenshot]

I referred to the following issues but nothing helped-
All eth0 interrupts going to CPU0 and saturating it on Debian Stretch, kernel 4.9.0
All interruptions seem to run on a single CPU

@nhorman
Member

nhorman commented Oct 14, 2019

irqbalance 1.3.0 is very old. You are likely running into an issue fixed by commit 723c41c. Please try and build the latest irqbalance version.

@saifat29
Author

I already tried v1.6.0, the latest version, on one of my servers, but the problem persists. I'll try it on the other servers as well and reply here with the results.

@nhorman
Member

nhorman commented Oct 21, 2019

Please include the debug output of version 1.6.0, as you did above for version 1.3.0.

@saifat29
Author

Sure, I will

@saifat29
Author

saifat29 commented Nov 3, 2019

The problem still persists on the latest irqbalance, v1.6.0.

After uninstalling irqbalance v1.3.0, which was bundled with the OS, I installed the latest version (v1.6.0) of irqbalance as follows-

git clone https://github.com/Irqbalance/irqbalance.git
cd irqbalance
git tag
git checkout tags/v1.6.0 -b install-v1.6.0

apt install -y  autoconf pkg-config libtool libglib2.0-dev libncurses5-dev libncursesw5-dev

./autogen.sh
./configure
make
make install
mv /usr/local/sbin/irqbalance /usr/sbin/
mv /usr/local/sbin/irqbalance-ui /usr/sbin/
~# irqbalance --version
irqbalance version 1.6.0

~# cat /lib/systemd/system/irqbalance.service
[Unit]
Description=irqbalance daemon
Documentation=man:irqbalance(1)
Documentation=https://github.com/Irqbalance/irqbalance
ConditionVirtualization=!container

[Service]
EnvironmentFile=-/etc/default/irqbalance.env
ExecStart=/usr/sbin/irqbalance --foreground $IRQBALANCE_ARGS
CapabilityBoundingSet=
NoNewPrivileges=yes
ReadOnlyPaths=/
ReadWritePaths=/proc/irq
RestrictAddressFamilies=AF_UNIX
RuntimeDirectory=irqbalance/

[Install]
WantedBy=multi-user.target

~# cat /etc/default/irqbalance.env
# irqbalance is a daemon process that distributes interrupts across
# CPUs on SMP systems.  The default is to rebalance once every 10
# seconds.  This is the environment file that is specified to systemd via the
# EnvironmentFile key in the service unit file (or via whatever method the init
# system you're using has).

#
# IRQBALANCE_ONESHOT
#    After starting, wait for a minute, then look at the interrupt
#    load and balance it once; after balancing exit and do not change
#    it again.
#
#IRQBALANCE_ONESHOT=

#
# IRQBALANCE_BANNED_CPUS
#    64 bit bitmask which allows you to indicate which CPUs should
#    be skipped when reblancing IRQs.  CPU numbers which have their
#    corresponding bits set to one in this mask will not have any
#    IRQs assigned to them on rebalance.
#
#IRQBALANCE_BANNED_CPUS=

#
# IRQBALANCE_ARGS
#    Append any args here to the irqbalance daemon as documented in the man
#    page.
#
#IRQBALANCE_ARGS=

~# systemctl enable irqbalance.service
~# systemctl start irqbalance
~# systemctl status irqbalance
● irqbalance.service - irqbalance daemon
   Loaded: loaded (/lib/systemd/system/irqbalance.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-11-03 12:20:49 UTC; 21min ago
     Docs: man:irqbalance(1)
           https://github.com/Irqbalance/irqbalance
 Main PID: 1171 (irqbalance)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/irqbalance.service
           └─1171 /usr/sbin/irqbalance --foreground

In this htop screenshot, purple denotes IRQ time, which can be seen on only a single core-
[htop screenshot]

Server Details-
Type: Dedicated
Model: Intel(R) Xeon(R) CPU E3-1270 V2
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)

Irqbalance version: 1.6.0 (Manual build and install from source)

Note: enp2s0 is the IPv4 network interface being used

~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         37          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          0          0          0          0          2          0          0   IO-APIC   1-edge      i8042
  4:          0          0          0          0          0          0          0       2172   IO-APIC   4-edge      ttyS0
  8:          0          0          0          0          0          0          1          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 12:          0          0          0          0          4          0          0          0   IO-APIC  12-edge      i8042
 16:          0          0         70          0          0          0          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 23:          0          0          0         35          0          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:       6226          0       9853          0          0          0          0          0   PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 25:          0          0          0          0        281          0     572789          0   PCI-MSI 1048576-edge      enp2s0-rx-0
 26:          0          0          0          0          0     574776          0          0   PCI-MSI 1048577-edge      enp2s0-tx-0
 27:          0          0          0          0          0          0          2          0   PCI-MSI 1048578-edge      enp2s0
 28:          0       2154          0          0          0          0          0          0   PCI-MSI 1572864-edge      enp3s0-rx-0
 29:          0          0         12          0         39          0          0          0   PCI-MSI 1572865-edge      enp3s0-tx-0
 30:          0          0          0          2          0          0          0          0   PCI-MSI 1572866-edge      enp3s0
NMI:          2          2          2          2          2          2          3          3   Non-maskable interrupts
LOC:     129900     119613     110440     119125     115008     109256     102699     131138   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          2          2          2          2          2          2          3          3   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:       4806       3508       2495       2303       1918       9876       5228       2874   Rescheduling interrupts
CAL:       3831       3429       3637       3659       3796       3673       3986       3594   Function call interrupts
TLB:        894        839        882        819       1141        782       1163        949   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          7          8          8          8          8          8          8          8   Machine check polls
HYP:          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
~# irqbalance --debug
This machine seems not NUMA capable.
Isolated CPUs: 00000000
Adaptive-ticks CPUs: 00000000
Banned CPUs: 00000000
Package 0:  numa_node -1 cpu mask is 000000ff (load 0)
        Cache domain 0:  numa_node is -1 cpu mask is 00000088  (load 0)
                CPU number 7  numa_node is -1 (load 0)
                CPU number 3  numa_node is -1 (load 0)
        Cache domain 1:  numa_node is -1 cpu mask is 00000022  (load 0)
                CPU number 5  numa_node is -1 (load 0)
                CPU number 1  numa_node is -1 (load 0)
        Cache domain 2:  numa_node is -1 cpu mask is 00000044  (load 0)
                CPU number 6  numa_node is -1 (load 0)
                CPU number 2  numa_node is -1 (load 0)
        Cache domain 3:  numa_node is -1 cpu mask is 00000011  (load 0)
                CPU number 4  numa_node is -1 (load 0)
                CPU number 0  numa_node is -1 (load 0)
Adding IRQ 28 to database
Adding IRQ 30 to database
Adding IRQ 29 to database
Adding IRQ 24 to database
Adding IRQ 16 to database
Adding IRQ 25 to database
Adding IRQ 26 to database
Adding IRQ 27 to database
Adding IRQ 11 to database
Adding IRQ 23 to database
Adding IRQ 18 to database
Adding IRQ 0 to database
Adding IRQ 1 to database
Adding IRQ 4 to database
Adding IRQ 8 to database
Adding IRQ 9 to database
Adding IRQ 12 to database
NUMA NODE NUMBER: -1
LOCAL CPU MASK: ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff




-----------------------------------------------------------------------------
Package 0:  numa_node -1 cpu mask is 000000ff (load 0)
        Cache domain 0:  numa_node is -1 cpu mask is 00000088  (load 0)
                CPU number 7  numa_node is -1 (load 0)
                  Interrupt 27 node_num is -1 (ethernet/0:0)
                CPU number 3  numa_node is -1 (load 0)
                  Interrupt 30 node_num is -1 (ethernet/0:0)
          Interrupt 18 node_num is -1 (legacy/0:0)
        Cache domain 1:  numa_node is -1 cpu mask is 00000022  (load 0)
                CPU number 5  numa_node is -1 (load 0)
                  Interrupt 26 node_num is -1 (ethernet/0:1)
                CPU number 1  numa_node is -1 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/0:10)
          Interrupt 23 node_num is -1 (legacy/0:0)
        Cache domain 2:  numa_node is -1 cpu mask is 00000044  (load 0)
                CPU number 6  numa_node is -1 (load 0)
                  Interrupt 25 node_num is -1 (ethernet/0:103)
                CPU number 2  numa_node is -1 (load 0)
                  Interrupt 24 node_num is -1 (storage/0:0)
          Interrupt 11 node_num is -1 (legacy/0:0)
        Cache domain 3:  numa_node is -1 cpu mask is 00000011  (load 0)
                CPU number 4  numa_node is -1 (load 0)
                  Interrupt 29 node_num is -1 (ethernet/0:0)
                CPU number 0  numa_node is -1 (load 0)
          Interrupt 16 node_num is -1 (legacy/0:0)
  Interrupt 12 node_num is -1 (other/0:0)
  Interrupt 9 node_num is -1 (other/0:0)
  Interrupt 8 node_num is -1 (other/0:0)
  Interrupt 4 node_num is -1 (other/0:0)
  Interrupt 1 node_num is -1 (other/0:0)
  Interrupt 0 node_num is -1 (other/0:0)



-----------------------------------------------------------------------------
Package 0:  numa_node -1 cpu mask is 000000ff (load 0)
        Cache domain 0:  numa_node is -1 cpu mask is 00000088  (load 0)
                CPU number 7  numa_node is -1 (load 0)
                  Interrupt 27 node_num is -1 (ethernet/1:0)
                CPU number 3  numa_node is -1 (load 0)
                  Interrupt 30 node_num is -1 (ethernet/1:0)
          Interrupt 18 node_num is -1 (legacy/1:0)
        Cache domain 1:  numa_node is -1 cpu mask is 00000022  (load 0)
                CPU number 5  numa_node is -1 (load 0)
                  Interrupt 26 node_num is -1 (ethernet/1:13)
                CPU number 1  numa_node is -1 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/1:11)
          Interrupt 23 node_num is -1 (legacy/1:0)
        Cache domain 2:  numa_node is -1 cpu mask is 00000044  (load 0)
                CPU number 6  numa_node is -1 (load 0)
                  Interrupt 25 node_num is -1 (ethernet/1:123)
                CPU number 2  numa_node is -1 (load 0)
                  Interrupt 24 node_num is -1 (storage/1:0)
          Interrupt 11 node_num is -1 (legacy/1:0)
        Cache domain 3:  numa_node is -1 cpu mask is 00000011  (load 0)
                CPU number 4  numa_node is -1 (load 0)
                  Interrupt 29 node_num is -1 (ethernet/1:0)
                CPU number 0  numa_node is -1 (load 0)
          Interrupt 16 node_num is -1 (legacy/1:0)
  Interrupt 12 node_num is -1 (other/1:0)
  Interrupt 9 node_num is -1 (other/1:0)
  Interrupt 8 node_num is -1 (other/1:0)
  Interrupt 4 node_num is -1 (other/1:0)
  Interrupt 1 node_num is -1 (other/1:0)
  Interrupt 0 node_num is -1 (other/1:0)



-----------------------------------------------------------------------------
Package 0:  numa_node -1 cpu mask is 000000ff (load 10000000)
        Cache domain 0:  numa_node is -1 cpu mask is 00000088  (load 0)
                CPU number 7  numa_node is -1 (load 0)
                  Interrupt 27 node_num is -1 (ethernet/1:0)
                CPU number 3  numa_node is -1 (load 0)
                  Interrupt 30 node_num is -1 (ethernet/1:0)
          Interrupt 18 node_num is -1 (legacy/1:0)
        Cache domain 1:  numa_node is -1 cpu mask is 00000022  (load 0)
                CPU number 5  numa_node is -1 (load 0)
                  Interrupt 26 node_num is -1 (ethernet/1:21)
                CPU number 1  numa_node is -1 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/1:11)
          Interrupt 23 node_num is -1 (legacy/1:0)
        Cache domain 2:  numa_node is -1 cpu mask is 00000044  (load 10000000)
                CPU number 6  numa_node is -1 (load 0)
                  Interrupt 25 node_num is -1 (ethernet/1:122)
                CPU number 2  numa_node is -1 (load 10000000)
                  Interrupt 24 node_num is -1 (storage/10000000:50)
          Interrupt 11 node_num is -1 (legacy/1:0)
        Cache domain 3:  numa_node is -1 cpu mask is 00000011  (load 0)
                CPU number 4  numa_node is -1 (load 0)
                  Interrupt 29 node_num is -1 (ethernet/1:1)
                CPU number 0  numa_node is -1 (load 0)
          Interrupt 16 node_num is -1 (legacy/1:0)
  Interrupt 12 node_num is -1 (other/1:0)
  Interrupt 9 node_num is -1 (other/1:0)
  Interrupt 8 node_num is -1 (other/1:0)
  Interrupt 4 node_num is -1 (other/1:0)
  Interrupt 1 node_num is -1 (other/1:0)
  Interrupt 0 node_num is -1 (other/1:0)

content of smp_affinity files-

~# for i in $(seq 0 300); do grep . /proc/irq/$i/smp_affinity /dev/null 2>/dev/null; done
/proc/irq/0/smp_affinity:ff
/proc/irq/1/smp_affinity:ff
/proc/irq/2/smp_affinity:ff
/proc/irq/3/smp_affinity:ff
/proc/irq/4/smp_affinity:ff
/proc/irq/5/smp_affinity:ff
/proc/irq/6/smp_affinity:ff
/proc/irq/7/smp_affinity:ff
/proc/irq/8/smp_affinity:ff
/proc/irq/9/smp_affinity:ff
/proc/irq/10/smp_affinity:ff
/proc/irq/11/smp_affinity:44
/proc/irq/12/smp_affinity:ff
/proc/irq/13/smp_affinity:ff
/proc/irq/14/smp_affinity:ff
/proc/irq/15/smp_affinity:ff
/proc/irq/16/smp_affinity:11
/proc/irq/23/smp_affinity:22
/proc/irq/24/smp_affinity:04
/proc/irq/25/smp_affinity:40
/proc/irq/26/smp_affinity:20
/proc/irq/27/smp_affinity:80
/proc/irq/28/smp_affinity:02
/proc/irq/29/smp_affinity:10
/proc/irq/30/smp_affinity:08
~# lstopo-no-graphics
Machine (16GB)
  Package L#0 + L3 L#0 (8192KB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#4)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#5)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#6)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#7)
  HostBridge L#0
    PCIBridge
      PCI 8086:10d3
        Net L#0 "enp2s0"
    PCIBridge
      PCI 8086:10d3
        Net L#1 "enp3s0"
    PCIBridge
      PCI 102b:0532
        GPU L#2 "controlD64"
        GPU L#3 "card0"
    PCI 8086:1c02
      Block(Disk) L#4 "sda"

@saifat29
Author

saifat29 commented Nov 6, 2019

@nhorman Sorry to bother you. I've added the debug output of irqbalance v1.6.0.
I am facing the issue in the latest version (v1.6.0) too.
I've been stuck on this issue for a very long time and currently don't see a way forward without irqbalance.

@nhorman
Member

nhorman commented Nov 20, 2019

Wait a second, this is working exactly as it's supposed to. I'm sorry, I should have looked closer at your initial comments. Irqbalance is meant to balance workloads on cpus, not balance any single irq across multiple cpus. It does this intentionally, because it increases data cache hit rates. If you continually re-affine interrupts to different cpus, you will get worse performance than you would otherwise. Don't do that. If you want to see that in action, just don't run irqbalance. That will make all cpus eligible to receive all interrupts, but what you will find is that the hardware will just select the first available cpu, which will coalesce interrupts on a single cpu until it's saturated, at which point they will overflow to the next cpu. That's well and good, but what you will find is that any application that gets scheduled on a low cpu (i.e. cpu0) will experience higher latency than other processes.
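
One way to check the "one irq, one cpu" placement described here is to test whether each affinity mask has exactly one bit set. The following is a minimal sketch, not an irqbalance feature: the function name `classify_affinity` is hypothetical, and it assumes input lines in the `path:mask` form posted earlier in this thread, with a single hex word per mask.

```shell
# Hypothetical helper: classify "path:mask" lines by whether the mask
# selects exactly one CPU. A value with exactly one bit set satisfies
# value & (value - 1) == 0. A zero mask would fall into the second branch.
classify_affinity() (
    while IFS=: read -r path mask; do
        irq=${path#/proc/irq/}      # strip the leading directory
        irq=${irq%%/*}              # keep only the IRQ number
        value=$((0x$mask))
        if [ "$value" -ne 0 ] && [ $((value & (value - 1))) -eq 0 ]; then
            echo "IRQ $irq: single CPU"
        else
            echo "IRQ $irq: multiple CPUs"
        fi
    done
)
```

Fed the listing from this report, IRQs 24 to 30 (the storage and network interrupts) would each come out as single CPU, which is consistent with the explanation above.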

@nhorman nhorman closed this as completed Nov 20, 2019