Coredumps (Kernel 6.3.12) #267
Comments
[root@rdsan01 uglymotha]# ./irqbalance -df
NUMA NODE NUMBER: 2
NUMA NODE NUMBER: 0
NUMA NODE NUMBER: 3
NUMA NODE NUMBER: 1
IRQ 214 is removed from interrupts_db.
cannot change irq 98's affinity, add it to banned list
IRQ 98 was BANNED.
Segmentation fault (core dumped) |
provide a backtrace from the core please, and note the git hash that this was built from |
Core was generated by `./irqbalance --foreground'. It was built with git master yesterday |
Can you please rebuild from commit 184c950 and try again? If it fixes the problem I'll revert the offending commit |
That looks to be the cause; irqbalance has now been running for 24 hours without a coredump. Thanks. |
reverted the offending commit |
Hmm, the root cause is probably from commit 55c5c32. |
This reverts commit 55c5c32. The banned_irqs list is meant to be configured by the user, it is not a dynamic list. Also, remove_one_irq_from_db() was freeing an irq_info struct which was still referenced in the banned_irqs list. Iterating on any of these lists will cause a use after free error. Fixes: Irqbalance#267 Signed-off-by: Robin Jarry <rjarry@redhat.com>
This reverts commit 55c5c32. The banned_irqs list is meant to be configured by the user, it is not a dynamic list. Also, remove_one_irq_from_db() was freeing an irq_info struct which was still referenced in banned_irqs and potentially in rebalance_irq_list. Iterating on any of these lists will cause a use after free error. Fixes: Irqbalance#267 Signed-off-by: Robin Jarry <rjarry@redhat.com>
add_banned_irq appends the irq_info to the banned_irqs list. remove_one_irq_from_db removes it from the interrupts_db and free()s it. This leaves an invalid pointer dangling in banned_irqs *and* potentially in rebalance_irq_list which can cause use-after-free errors. Do not move the irq_info around. Only add a flag to indicate that this irq's affinity cannot be managed and ignore the irq when this flag is set. Link: Irqbalance#267 Fixes: 55c5c32 ("arm64: Add irq aff change check For aarch64...") Signed-off-by: Robin Jarry <rjarry@redhat.com>
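The flag-based approach described in the commit message above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not irqbalance's actual code: `struct irq_info` here is a stripped-down stand-in, and `struct irq_db`, the `unmanaged` field, and every `db_*` function are hypothetical names invented for this example. The point it shows is that the entry stays in a single list and is only marked, so no second list can end up holding a pointer to freed memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for irqbalance's struct irq_info. */
struct irq_info {
    int irq;
    bool unmanaged;            /* hypothetical flag: affinity cannot be set */
    struct irq_info *next;
};

/* Hypothetical stand-in for interrupts_db: the single owner of all entries. */
struct irq_db {
    struct irq_info *head;
};

/* Allocate an entry and link it into the database. */
struct irq_info *db_add(struct irq_db *db, int irq)
{
    struct irq_info *info = calloc(1, sizeof(*info));
    if (info == NULL)
        return NULL;
    info->irq = irq;
    info->next = db->head;
    db->head = info;
    return info;
}

/* Look up an entry by IRQ number; returns NULL if it is not present. */
struct irq_info *db_find(struct irq_db *db, int irq)
{
    for (struct irq_info *i = db->head; i != NULL; i = i->next)
        if (i->irq == irq)
            return i;
    return NULL;
}

/*
 * Mark an IRQ as unmanageable in place. The entry is neither moved to
 * another list nor freed, so every existing pointer to it stays valid.
 */
bool db_mark_unmanaged(struct irq_db *db, int irq)
{
    struct irq_info *info = db_find(db, irq);
    if (info == NULL)
        return false;
    info->unmanaged = true;
    return true;
}

/* Count the entries a balancing pass would still act on. */
size_t db_count_managed(const struct irq_db *db)
{
    size_t n = 0;
    for (const struct irq_info *i = db->head; i != NULL; i = i->next)
        if (!i->unmanaged)
            n++;
    return n;
}

/* Free every entry exactly once; the database is the only owner. */
void db_free(struct irq_db *db)
{
    struct irq_info *i = db->head;
    while (i != NULL) {
        struct irq_info *next = i->next;
        free(i);
        i = next;
    }
    db->head = NULL;
}
```

In this sketch a caller would call `db_mark_unmanaged(&db, 98)` when setting the affinity of IRQ 98 fails, and later passes would skip any entry whose `unmanaged` flag is set. Because each entry has exactly one owner, freeing it cannot invalidate a pointer held elsewhere.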
On my system running the 6.3.12 kernel, irqbalance core dumps with a SIGSEGV:
[ 832.859285] traps: irqbalance[467787] general protection fault ip:55b99ffd5e9d sp:7ffe76006820 error:0 in irqbalance[55b99ffcb000+d000]
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, 9940) = 0 (Timeout)
openat(AT_FDCWD, "/proc/interrupts", O_RDONLY) = 7
newfstatat(7, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(7, " CPU0 CPU1 "..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 PCI"..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 IR-PCI-MSI-0000:00:03"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "-PCI-MSI-0000:20:01.1 0-edge "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, ":20:07.1 0-edge PCIe PME"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "-edge PCIe PME\n 56: "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0-edge nvme2q0\n 67: "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "2\n 73: 0 0 "..., 1024) = 1024
read(7, " 0 0 "..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, " 0 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, " 480867 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 0 226 "..., 1024) = 1024
read(7, " 0 0 231"..., 1024) = 1024
read(7, " 0 0 0 IR-PC"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "PCI-MSI-0000:28:00.0 0-edge "..., 1024) = 1024
read(7, " 0 727 145 "..., 1024) = 1024
read(7, ":00.0 3-edge eno1-TxRx-2"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, ":00.0 3-edge eno2-TxRx-2"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "0.0 3-edge xhci_hcd\n 159"..., 1024) = 1024
read(7, "0 0 0 "..., 1024) = 1024
read(7, "-edge xhci_hcd\n 165: "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " xhci_hcd\n 170: 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, "hcd\n 176: 0 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, ": 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 1024
read(7, " 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 "..., 1024) = 976
read(7, " NMI: 0 0 "..., 1024) = 1024
close(7) = 0
openat(AT_FDCWD, "/proc/stat", O_RDONLY) = 7
newfstatat(7, "", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_EMPTY_PATH) = 0
read(7, "cpu 22176064 17 6522492 2243695"..., 1024) = 1024
read(7, "2 0\ncpu18 578787 0 169788 716682"..., 1024) = 1024
read(7, "984 118751 0 768842 485529 48751"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(7, " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"..., 1024) = 810
close(7) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
[core.irqbalance.0.a64b7450a3574ec4a5c28d201f6cac33.27879.zip](https://github.com/Irqbalance/irqbalance/files/11982114/core.irqbalance.0.a64b7450a3574ec4a5c28d201f6cac33.27879.zip)