Affinity mapping fixes #269
I think this looks pretty reasonable; I will test in the morning. Can I ask what testing you have done on it?
Hi Neil, Here is a dump of https://gist.github.com/rjarry/c465c00d151397da3f9b9e64bd174723 NB: I have set
add_banned_irq appends the irq_info to the banned_irqs list. remove_one_irq_from_db removes it from the interrupts_db and free()s it. This leaves an invalid pointer dangling in banned_irqs *and* potentially in rebalance_irq_list which can cause use-after-free errors.

Do not move the irq_info around. Only add a flag to indicate that this irq's affinity cannot be managed and ignore the irq when this flag is set.

Link: Irqbalance#267
Fixes: 55c5c32 ("arm64: Add irq aff change check For aarch64...")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
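The flag-based approach described in this commit message can be sketched as follows. This is a simplified illustration, not the actual patch: the `struct irq_info` layout, the flag value, and the helper names `mark_affinity_unmanaged`, `affinity_is_managed` and `count_managed` are assumptions made for the example. Only the flag name `IRQ_FLAG_AFFINITY_UNMANAGED` comes from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name taken from the series; the bit value is an assumption. */
#define IRQ_FLAG_AFFINITY_UNMANAGED (1u << 0)

/* Simplified, hypothetical stand-in for irqbalance's struct irq_info. */
struct irq_info {
    int irq;
    unsigned int flags;
};

/*
 * Mark the IRQ in place. Nothing is moved between lists or freed,
 * so no list can end up holding a stale pointer.
 */
static void mark_affinity_unmanaged(struct irq_info *info)
{
    assert(info != NULL);
    info->flags |= IRQ_FLAG_AFFINITY_UNMANAGED;
}

/* Return nonzero when the IRQ affinity may still be changed. */
static int affinity_is_managed(const struct irq_info *info)
{
    return (info->flags & IRQ_FLAG_AFFINITY_UNMANAGED) == 0;
}

/* Count the IRQs that a rebalance pass would still consider. */
static size_t count_managed(const struct irq_info *irqs, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (affinity_is_managed(&irqs[i]))
            count++;
    return count;
}
```

A flagged entry stays where it is and is simply skipped, which is the property that avoids the dangling pointer described above.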
fprintf() is buffered and may not report an error immediately: the error may be deferred until fflush() is called (either explicitly or internally by fclose()).

Check for errors returned by fopen(), fprintf() and fclose() and add IRQ_FLAG_AFFINITY_UNMANAGED accordingly.

Fixes: 55c5c32 ("arm64: Add irq aff change check For aarch64, ...")
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2184735
Signed-off-by: Robin Jarry <rjarry@redhat.com>
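A minimal sketch of the checking pattern this commit message describes, assuming a POSIX system where the stdio calls set errno on failure. The helper name `write_checked` and its return convention (0 on success, a positive errno value otherwise) are hypothetical and not taken from irqbalance.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write text to path, checking every stdio call.
 * Returns 0 on success or a positive errno value on failure.
 */
static int write_checked(const char *path, const char *text)
{
    FILE *file;
    int err = 0;

    file = fopen(path, "w");
    if (!file)
        return errno;

    /* This may succeed even if the data never reaches the file: */
    /* the bytes can sit in the stdio buffer until the flush. */
    if (fprintf(file, "%s", text) < 0)
        err = errno;

    /* fclose() flushes the buffer, so a deferred write error shows up here. */
    if (fclose(file) != 0 && err == 0)
        err = errno;

    return err;
}
```

On Linux, writing a short string to `/dev/full` is one way to observe the deferred case: the `fprintf()` call is expected to succeed and the error to be reported by `fclose()`.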
Interesting that several of your irqs report an ENOENT error:
I did notice one issue in your code (comment marked in the commit). Other than that, my (admittedly limited) testing here shows it works. Thank you for that. If you fix up the errno issue, I'll merge this.
Hi, thanks for the review and testing. I missed your comment. On what commit did you leave it?
And yes, I was also surprised about the ENOENT errors. Maybe these IRQs are leftovers not properly cleaned up in procfs by the kernel when adding/removing VFs. The folders do exist but seem to be empty.
It's listed immediately above your last comment. Your guess could very well be correct: if irq registration is happening while we're processing procfs, we may well get that error. In either case, I would imagine that your handling of it is correct.
here:
I did now :)
If a given IRQ affinity cannot be set, include strerror in the warning message.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2184735
Signed-off-by: Robin Jarry <rjarry@redhat.com>
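As a rough illustration of what including strerror in the warning can look like, here is a small sketch. The helper name `format_affinity_warning` and the message wording are assumptions made for the example; irqbalance's actual log call and text are not shown in this thread.

```c
#include <errno.h>   /* errno constants such as ENOSPC, passed in as err */
#include <stdio.h>
#include <string.h>

/*
 * Format a warning that names the IRQ and the human-readable reason.
 * Returns the value of snprintf(), i.e. the length the full message needs.
 */
static int format_affinity_warning(char *buf, size_t len, int irq, int err)
{
    return snprintf(buf, len, "cannot change IRQ %d affinity: %s",
                    irq, strerror(err));
}
```

Passing the saved errno value rather than reading errno at log time keeps the reported reason tied to the call that actually failed.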
Some errors reported when writing to smp_affinity are transient. For example, when a CPU interrupt controller does not have enough room to map the IRQ, the kernel will return "No space left on device". This kind of situation can change over time.

Do not mark the IRQ affinity as "unmanaged". Let irqbalance try again later.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
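The distinction between transient and permanent errors could be sketched like this. It is an assumption-laden illustration: only ENOSPC ("No space left on device") is named in the commit message, the real list in activate.c may contain other codes, and treating every other error as permanent is a simplification made here. The helper names and the simplified `struct irq_info` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name taken from the series; the bit value is an assumption. */
#define IRQ_FLAG_AFFINITY_UNMANAGED (1u << 0)

/* Simplified, hypothetical stand-in for irqbalance's struct irq_info. */
struct irq_info {
    int irq;
    unsigned int flags;
};

/* Return true for errors that may go away on a later attempt. */
static bool affinity_error_is_transient(int err)
{
    switch (err) {
    case ENOSPC: /* interrupt controller has no room for the IRQ right now */
        return true;
    default:     /* assumption: everything else is treated as permanent */
        return false;
    }
}

/* Only flag the IRQ as unmanaged when the error is not transient. */
static void handle_affinity_error(struct irq_info *info, int err)
{
    assert(info != NULL);
    if (affinity_error_is_transient(err))
        return; /* leave the flags alone so a later pass can retry */
    info->flags |= IRQ_FLAG_AFFINITY_UNMANAGED;
}
```

The trade-off is that a transient error which never clears will be retried, and logged, on every pass.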
Should be good now. I have also fixed a typo in a comment.
Thanks 👍
}
fclose(file);
if (ret < 0)
    goto error;
info->moved = 0; /*migration is done*/
Argl I forgot to return after this line... I'll send another patch to fix this.
This seems to cause log spam for some ARM 7 based routers in OpenWrt. Log and strace output for IRQ 48.
Removing EINVAL from the temporary error list in activate.c stops the log spam after one error:
(Curious that it fails in this kernel 5.15 based Linux, but seems to work for the same device with a slightly newer 6.1 build.) This change is probably not a bug from irqbalance's perspective, but just brings an already existing assignment problem to the surface. Not quite sure what to make of this:
Hi there,
This makes me think that it is a driver issue that has been fixed somehow but I could be wrong. In any case, assuming that https://elixir.bootlin.com/linux/v5.15/source/kernel/irq/proc.c#L167 I am not sure why you have that issue with 5.15 and not with 6.1 as this code has not changed in a long time.
That was actually a wrong statement from me. The behaviour was actually the same.
Hi,
This series is a follow-up to #265, #266, #267 and #268 and addresses this issue: https://bugzilla.redhat.com/show_bug.cgi?id=2184735
The first commit fixes a potential use-after-free error introduced by commit 55c5c32.
The second and third patches aim at better error reporting when an IRQ affinity cannot be set.
The last patch makes sure that an IRQ affinity is only considered unmanageable in specific cases.
I have done some manual testing and everything looks in order.