Would like option to report hwloc bindings #973

Closed
eschnett opened this issue Oct 18, 2013 · 11 comments
Labels
category: init, difficulty: easy (Good issues for starting out with HPX development), type: enhancement

Comments

@eschnett
Contributor

I would like to have an option that makes HPX applications report the actual hwloc bindings used. This should use hwloc to read the bindings from the OS for each thread.

@ghost ghost assigned sithhell Oct 20, 2013
@hkaiser
Member

hkaiser commented Oct 20, 2013

We should rather change --hpx:print-bind to report the actual bindings...

@sithhell
Member

If --hpx:print-bind doesn't report the actual bindings, there is a severe bug somewhere between setting the affinity masks and binding them to the cores in use.

@hkaiser
Member

hkaiser commented Oct 21, 2013

Either we have that bug and it needs to be fixed or we can close this ticket. Do we have any evidence for such a problem?

@sithhell
Member

I'll investigate this issue on Wednesday. I thought I fixed those problems. Will check on various machines with different CPUs. I am currently using hwloc 1.7.2. It might be that earlier hwloc versions have a bug.

@sithhell
Member

I cannot reproduce this problem. Which version of hwloc are you using?

@eschnett
Contributor Author

The issue was originally that --hpx:print-bind examines the command line options, creates an HPX-internal representation of these, and then outputs these. It did not call hwloc_get_cpubind to find out the actual bindings. This led to several errors in the past, since what hwloc_set_cpubind actually did was different from what was reported.

I thus request that the code should call hwloc_get_cpubind to find out the actual bindings, and then report these.

The only call to hwloc_get_cpubind is in the "tests" directory. I thus assume that hpx:print-bind does not actually call hwloc_get_cpubind.
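
To illustrate the request above, here is a minimal, Linux-only sketch of what reading the binding back from the OS could look like. It is not HPX code and does not use hwloc: `sched_getaffinity` stands in for `hwloc_get_cpubind` so the example has no external dependency, and the helper names `actual_binding` and `format_binding`, as well as the output format, are made up for illustration.

```cpp
// Linux-only sketch. sched_getaffinity is used here as a stand-in for
// hwloc_get_cpubind (an assumption, chosen to avoid an external dependency).
#include <sched.h>

#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the CPU indices the calling thread is
// currently allowed to run on, as reported by the kernel.
// Returns an empty vector if the query fails.
std::vector<int> actual_binding()
{
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);

    // A pid of 0 refers to the calling thread.
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return cpus;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
        if (CPU_ISSET(cpu, &set))
            cpus.push_back(cpu);
    }
    return cpus;
}

// Hypothetical helper: formats the CPU list as one report line.
// The "PU(s):" prefix is an illustrative format, not HPX's actual output.
std::string format_binding(std::vector<int> const& cpus)
{
    std::ostringstream os;
    os << "PU(s):";
    for (int cpu : cpus)
        os << ' ' << cpu;
    return os.str();
}
```

A reporting option could call something like `actual_binding()` from within each worker thread after it has been bound, instead of printing the mask that was computed from the command line.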

@eschnett
Contributor Author

On a phone call this past Wednesday, Hartmut suggested revisiting this issue once we had a new case where --hpx:print-bind outputs wrong information.

There is now such a case; see #981.

@eschnett
Contributor Author

The current code still doesn't use hwloc_get_cpubind to output the actual bindings. As before, only HPX's view of the world is output. Given that view was wrong multiple times in the past weeks, I still strongly suggest to use hwloc_get_cpubind to query the actual CPU bindings, and to output these. Errors in CPU bindings are difficult to detect by an unsuspecting user, and tools such as --hpx:print-bind must be reliable.

@eschnett eschnett reopened this Oct 29, 2013
@pagrubel
Member

watching

@sithhell
Member

On 29.10.2013 at 14:46, "Erik Schnetter" notifications@github.com wrote:

> The current code still doesn't use hwloc_get_cpubind to output the actual
> bindings. As before, only HPX's view of the world is output.

That's not entirely true. The commit I made now uses the exact same masks
which are used to bind the threads. I ditched the code which did the whole
command line parsing and led to errors. The only thing that could lead to
a wrong output is that hwloc fails to bind the threads correctly. I could
add code which queries the current binding again but I don't see any
additional value in that.

> Given that view was wrong multiple times in the past weeks, I still
> strongly suggest to use hwloc_get_cpubind to query the actual CPU bindings,
> and to output these. Errors in CPU bindings are difficult to detect by an
> unsuspecting user, and tools such as --hpx:print-bind must be reliable.

I agree that there were bugs in the way we reported and calculated the
thread affinities. This should be detectable now for the reasons described
above.
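
For reference, the re-query mentioned above is cheap to sketch. The following Linux-only example is an illustration under stated assumptions, not the HPX implementation: it uses `sched_setaffinity` and `sched_getaffinity` in place of `hwloc_set_cpubind` and `hwloc_get_cpubind`, and the helpers `bind_and_verify` and `first_allowed_cpu` are hypothetical names.

```cpp
// Linux-only sketch. The glibc affinity calls stand in for
// hwloc_set_cpubind / hwloc_get_cpubind (an assumption for illustration).
#include <sched.h>

#include <cassert>

// Hypothetical helper: binds the calling thread to `requested`, then reads
// the binding back and returns true only if the kernel reports exactly the
// requested mask. Returns false if either call fails or the masks differ.
bool bind_and_verify(cpu_set_t const& requested)
{
    // A pid of 0 refers to the calling thread.
    if (sched_setaffinity(0, sizeof(requested), &requested) != 0)
        return false;    // the binding request was rejected

    cpu_set_t actual;
    CPU_ZERO(&actual);
    if (sched_getaffinity(0, sizeof(actual), &actual) != 0)
        return false;    // could not read the binding back

    // Compare what was asked for with what the OS actually applied.
    return CPU_EQUAL(&requested, &actual) != 0;
}

// Hypothetical helper: returns the lowest CPU index the calling thread may
// currently run on, or -1 if the affinity mask cannot be queried.
int first_allowed_cpu()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

A check of this kind would flag the one remaining failure mode described above, where the reported mask is the one passed to the binding call but the binding itself did not take effect as requested.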

@sithhell
Member

As far as I can tell, we have a reliable solution right now. It is not 100% foolproof against future changes. Will move the final resolution to 1.0.0.

@hkaiser hkaiser closed this as completed Mar 25, 2014
@hkaiser hkaiser reopened this Mar 25, 2014
@hkaiser hkaiser modified the milestones: 0.9.9, 0.9.8 Mar 25, 2014
@hkaiser hkaiser assigned hkaiser and unassigned sithhell Jun 11, 2014
4 participants