likwid-topology confused by non-standard core assignments? #46
Thanks for the perfect bug documentation. I will check it next week.
It seems that the code I added to handle non-standard core assignments on AMD systems cannot deal with Intel systems. Can you please supply the output of
Sure, no problem, you'll find it attached.
I cannot clearly identify where the problem comes from. Since you have hwloc installed, can you send me the topology tarball so I can run likwid-topology virtually on your hardware:
The tarball is attached.
Or not :) GitHub doesn't accept bz2, so now it is zipped.
Hi, thanks for supplying the tarball. Please try the patch: likwid-non-std-cores.zip. Basically, the patch only excludes the AMD fixup code on Intel systems.
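The idea behind the patch, skipping the AMD-specific fixup when the CPU is not an AMD part, can be sketched in Python as follows. This is not the LIKWID patch (which is C code in the attached zip); `cpu_vendor`, `maybe_apply_amd_fixup` and the `fixup` callback are hypothetical names, and reading the vendor from the `vendor_id` field of /proc/cpuinfo is only an assumption for illustration.

```python
def cpu_vendor(cpuinfo_text):
    """Return the vendor_id of the first processor block in /proc/cpuinfo
    text (e.g. 'GenuineIntel' or 'AuthenticAMD'), or None if absent."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def maybe_apply_amd_fixup(mapping, vendor, fixup):
    """Apply a HYPOTHETICAL AMD-specific core renumbering only on AMD CPUs.

    On Intel (and any other vendor) the mapping reported by the OS/hwloc
    is returned unchanged.
    """
    if vendor == "AuthenticAMD":
        return fixup(mapping)
    return mapping
```

With vendor `GenuineIntel`, the mapping passes through untouched, so the Intel numbering reported by hwloc is used as-is.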
Thanks for the patch, but no cigar, I'm afraid. I've attached likwid-topology's output. Just so that we're on the same page:
Hmm, not the result I hoped for. I played around a little with your tarball and the result looks fine. I attached the topology_hwloc.c file; just change the file suffix from txt to c and copy it into the src folder.
Unfortunately, no go. The result is still the same. Attached is the -V 3 output.
Are you sure that you rebuilt LIKWID properly? In the topology_hwloc.c I sent, the debug print is in line 329, but in your -V 3 output it is in line 236. Always do a
In fact, I removed the directory, untarred the sources, added my config.mk and replaced topology_hwloc.c. So, are you sure I got the right file? ;)
I just checked the file from above and it should behave differently. Have you installed the patched version? Or might there be another liblikwid.so in your LD_LIBRARY_PATH that is used instead of the patched one?
There are some differences because some values are gathered from the actual system rather than from the tarball (no * at available). But the core assignment should be valid.
Hm, no. This is what I do:
(Since this is a public repository, I replaced a node name and a path with something uninformative.)
There is the problem. You only make likwid-topology executable but don't set LD_LIBRARY_PATH to the directory containing the built liblikwid.so.
With your setup, likwid-topology uses the already installed library, probably
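To see why the freshly built library was not picked up, here is a deliberately simplified Python model of the loader's search: directories listed in LD_LIBRARY_PATH are tried in order before the default locations. It ignores RPATH/RUNPATH, /etc/ld.so.cache and the rules for setuid binaries, and `resolve_library` as well as the directory names in the test are hypothetical.

```python
import os


def resolve_library(name, ld_library_path, default_dirs, exists=os.path.exists):
    """Simplified model of where the dynamic loader finds `name`.

    LD_LIBRARY_PATH entries are searched first, in order, then the default
    directories. Empty entries are skipped in this sketch. RPATH/RUNPATH,
    ld.so.cache and setuid rules are NOT modelled.
    """
    search = [d for d in ld_library_path.split(":") if d] + list(default_dirs)
    for directory in search:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate  # first match wins, later copies are never seen
    return None
```

Without LD_LIBRARY_PATH the installed copy in a default directory wins; once the build directory is listed, its liblikwid.so is found first. On a real system, running the tool with `LD_DEBUG=libs` set shows which library the loader actually opens.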
Any new findings?
Dear Thomas, This is weird: I answered quite a while ago, but apparently that reply got lost. The problem is indeed solved; I had not taken into account that some paths
Thanks for solving it, best regards, -gjb-
On Tue, Sep 13, 2016 at 10:56 AM, Thomas Roehl notifications@github.com
This is about likwid 4.1.1 (release), built to use the bundled hwloc (config.mk included for completeness).
For some obscure reason, the assignment of processors to physical id/core id is not what one would expect on Intel hardware. Normally, on a dual-socket machine with 12-core CPUs (Haswell E5-2680 v3) and hyperthreading disabled, one expects:
0 -> 0:0
1 -> 0:1
..
11 -> 0:11
12 -> 1:0
13 -> 1:1
...
23 -> 1:11
The left-hand number is the processor, the first right-hand number the physical id (socket), and the second the core id, according to /proc/cpuinfo.
On some machines, however, we get:
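The processor to physical id/core id mapping can be extracted from /proc/cpuinfo with a few lines of Python. This is a minimal sketch, not part of LIKWID; `parse_cpuinfo` and `print_mapping` are hypothetical helpers that read the `processor`, `physical id` and `core id` fields of each processor block.

```python
def parse_cpuinfo(text):
    """Map each logical processor to its (physical id, core id) pair.

    /proc/cpuinfo contains one block per logical processor, separated by
    blank lines; each line inside a block has the form "key : value".
    """
    mapping = {}
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            mapping[int(fields["processor"])] = (
                int(fields.get("physical id", 0)),  # socket
                int(fields.get("core id", 0)),
            )
    return mapping


def print_mapping(path="/proc/cpuinfo"):
    """Print 'processor -> physical id:core id' lines, sorted by processor."""
    with open(path) as f:
        mapping = parse_cpuinfo(f.read())
    for proc, (phys, core) in sorted(mapping.items()):
        print(f"{proc} -> {phys}:{core}")
```

Calling `print_mapping()` on an affected node prints lines in the same `processor -> physical id:core id` form as the lists in this report.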
0 -> 0:0
1 -> 0:2
2 -> 0:4
..
5 -> 0:10
6 -> 1:0
7 -> 1:2
...
11 -> 1:10
12 -> 0:1
13 -> 0:3
..
17 -> 0:11
18 -> 1:1
19 -> 1:3
...
23 -> 1:11
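Both numberings can be written as closed-form functions of the processor number, which makes it easy to check that they describe the same set of cores. The formulas below are reverse-engineered from the two lists above and only cover this 2 x 12-core configuration; the function names are hypothetical.

```python
SOCKETS = 2
CORES_PER_SOCKET = 12
N = SOCKETS * CORES_PER_SOCKET


def standard_layout(p):
    """Expected numbering: socket 0 gets processors 0-11, socket 1 gets 12-23."""
    return divmod(p, CORES_PER_SOCKET)  # (physical id, core id)


def round_robin_layout(p):
    """Numbering seen on the affected nodes: six even core ids on socket 0,
    six even core ids on socket 1, then the same for the odd core ids."""
    half = CORES_PER_SOCKET // 2                   # processors per socket per pass
    socket = (p // half) % SOCKETS                 # switch sockets every `half` processors
    core = 2 * (p % half) + p // (SOCKETS * half)  # even ids first, then odd
    return socket, core


# Same 24 (physical id, core id) pairs; only the processor numbering differs.
assert sorted(map(round_robin_layout, range(N))) == sorted(map(standard_layout, range(N)))
```

The final assertion holds, so the round-robin layout is only a renumbering: every (physical id, core id) pair still occurs exactly once, and no core is shared by two processors.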
Obviously, this is not what we want, but that is our problem.
However, when likwid-topology is run on such a node, it seems to get confused. It reports:
Sockets: 2
Cores per socket: 6
Threads per core: 2
Apparently, the unusual round-robin assignment tricks likwid-topology into assuming that hyperthreading is enabled. The complete output of likwid-topology is attached.
The lscpu and lstopo (version 1.10.1) reports are consistent with /proc/cpuinfo, though (output of both attached as well). So it would seem that the information coming from hwloc is somehow misinterpreted.
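One way a tool can be misled: if the topology is inferred from processor numbering (assuming the first socket's cores are numbered contiguously), the round-robin layout shows only six processors on socket 0 before socket 1 appears, which suggests 6 cores with 2 threads each. Counting distinct (physical id, core id) pairs instead does not depend on numbering. The sketch below contrasts both; `contiguous_guess` is a hypothetical heuristic for illustration and is not taken from LIKWID's source.

```python
from collections import Counter


def infer_topology(mapping):
    """Derive the topology from a {processor: (physical id, core id)} mapping.

    Processors sharing a (physical id, core id) pair are hardware threads of
    the same core, so the result does not depend on processor numbering.
    """
    sockets = {phys for phys, _ in mapping.values()}
    threads_per_core = Counter(mapping.values())  # pair -> number of processors
    return {
        "sockets": len(sockets),
        "cores_per_socket": len(threads_per_core) // len(sockets),
        "threads_per_core": max(threads_per_core.values()),
    }


def contiguous_guess(mapping):
    """HYPOTHETICAL numbering-based heuristic, for contrast only.

    Takes the number of processors listed before the first one on another
    socket as the core count per socket and attributes the remainder to
    hyperthreads.
    """
    procs = sorted(mapping)
    first_socket = mapping[procs[0]][0]
    cores_per_socket = next(
        (i for i, p in enumerate(procs) if mapping[p][0] != first_socket),
        len(procs),
    )
    sockets = len({phys for phys, _ in mapping.values()})
    return {
        "sockets": sockets,
        "cores_per_socket": cores_per_socket,
        "threads_per_core": len(procs) // (sockets * cores_per_socket),
    }
```

Applied to the round-robin mapping from this report, `infer_topology` yields 2 sockets, 12 cores per socket and 1 thread per core, while `contiguous_guess` yields 6 cores per socket and 2 threads per core, the same numbers likwid-topology printed.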
Thanks, best regards, Geert Jan Bex
lscpu_out.txt
cpuinfo_out.txt
likwid_topology_out.txt
lstopo_out.txt
config.txt