Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix min_load to pick actual min load across all objects #61

Closed
vallishgv opened this Issue Nov 14, 2017 · 1 comment

Comments

Projects
None yet
2 participants
@vallishgv
Copy link

vallishgv commented Nov 14, 2017

We recently encountered a corner case bug in irqbalance where irqbalance
was not balancing interrupts across 2 CPU's.

The code snippet in question is:
static void gather_load_stats(struct topo_obj *obj, void *data)
{
struct load_balance_info *info = data;
if (info->min_load == 0 || obj->load < info->min_load)
info->min_load = obj->load;
info->total_load += obj->load;
info->load_sources += 1;
}

The bug we encountered had 2 CPU's with load for the first CPU being 0.
Eg: obj1->load = 0 and obj2->load = 5000.

Iteration 1: obj1 is passed and info initialized to 0 is passed in
as param.
if (info->min_load == 0 || obj->load < info->min_load)
info->min_load = obj->load;
Because info->min_load is 0, info->min_load is set to obj->load which
is also 0 in this case.

Iteration 2: obj2 is passed with info having values set from previous
iteration:
info->min_load = 0, info->total_load = 0, info->load_sources = 1

if (info->min_load == 0 || obj->load < info->min_load)
            info->min_load = obj->load;

Because of the logic used in gather_load_stats() as shown above,
info->min_load gets set to obj2->load which is 5000. This is not the
minimum load. Because of this bug, interrupts do not migrate as this
value is checked in the function which migrates the interrupts.

Test results:

i3.large system - single socket, single core, hyperthreading enabled

With fix:
[ec2-user@ip-10-0-38-254 tmp]$ cat /proc/interrupts | grep "<CPU0>|<60>|<61>|<66>|<67>|<69>|<70>"
CPU0 CPU1
60: 1593819 292297 xen-pirq-msi-x nvme0q0, nvme0q1
61: 456537 831877 xen-pirq-msi-x nvme0q2
66: 1498 1644 xen-pirq-msi-x eth1-Tx-Rx-0
67: 1535 1034 xen-pirq-msi-x eth1-Tx-Rx-1
69: 3579919 8973629 xen-pirq-msi-x eth2-Tx-Rx-0
70: 9265764 3747096 xen-pirq-msi-x eth2-Tx-Rx-1

Without fix:
[ec2-user@ip-10-0-10-75 ~]$ cat /proc/interrupts | grep "<CPU0>|<60>|<61>|<66>|<67>|<69>|<70>"
CPU0 CPU1
60: 4014 1088758 xen-pirq-msi-x nvme0q0, nvme0q1
61: 2046 2218741 xen-pirq-msi-x nvme0q2
66: 11 2978 xen-pirq-msi-x eth1-Tx-Rx-0
67: 8 43 xen-pirq-msi-x eth1-Tx-Rx-1
69: 43 13773325 xen-pirq-msi-x eth2-Tx-Rx-0
70: 4 13102621 xen-pirq-msi-x eth2-Tx-Rx-1

Patch:

Patch can be found at:
http://lists.infradead.org/pipermail/irqbalance/2017-November/000122.html

@nhorman

This comment has been minimized.

Copy link
Contributor

nhorman commented Nov 15, 2017

fixed in 723c41c. Thanks!

@nhorman nhorman closed this Nov 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.