--hpx:bind throws unexpected error #1370

Closed
hkaiser opened this issue Feb 2, 2015 · 12 comments

Comments

@hkaiser
Member

hkaiser commented Feb 2, 2015

From hpx-users:

I’m trying to bind threads manually for the Xeon Phi on hermione. I am able to use --hpx:bind=balance, compact, or scatter, but none of those lets me run with a specified number of cores and the same number of threads on each core (which would be a nice enhancement).

So I tried binding manually, and the example in the manual fails:

$ ./hello_world -t4 --hpx:bind=thread:0-3=core:0-3.pu:0
{env}: 13 entries:
  HOME=/home/pagrubel
  LD_LIBRARY_PATH=/opt/intel/composer_xe_2013/lib/mic:/opt/hwloc/1.7-k10m-release/lib
  LOGNAME=pagrubel
  MAIL=/var/mail/pagrubel
  OLDPWD=/home/pagrubel
  PATH=/usr/bin:/bin:/usr/sbin:/sbin
  PWD=/home/pagrubel/build/hpx_buildmic/bin
  SHELL=/bin/sh
  SSH_CLIENT=172.31.1.254 41705 22
  SSH_CONNECTION=172.31.1.254 41705 172.31.1.1 22
  SSH_TTY=/dev/pts/0
  TERM=xterm-256color
  USER=pagrubel
hpx::init: std::exception caught: The number of OS threads requested (4) does not 
    match the number of threads to bind (3): HPX(bad_parameter)
@hkaiser
Member Author

hkaiser commented Feb 2, 2015

I'm not able to reproduce this. For me (granted, not on a MIC), this command line with --hpx:print-bind added produces the expected output:

*******************************************************************************
locality: 0
   0: PU L#0(P#0), Core L#0, Socket L#0, Node L#0(P#0)
   1: PU L#2(P#2), Core L#1, Socket L#0, Node L#0(P#0)
   2: PU L#4(P#4), Core L#2, Socket L#0, Node L#0(P#0)
   3: PU L#6(P#6), Core L#3, Socket L#0, Node L#0(P#0)
hello world from OS-thread 2 on locality 0
hello world from OS-thread 0 on locality 0
hello world from OS-thread 1 on locality 0
hello world from OS-thread 3 on locality 0

@pagrubel
Member

pagrubel commented Feb 3, 2015

Yes, it also worked for me on the Ivy Bridge node with hyper-threading turned on, but not on the Xeon Phi.

@sithhell
Member

sithhell commented Feb 3, 2015

The bind specification only specifies 3 threads, but 4 worker threads are
requested.

@pagrubel
Member

pagrubel commented Feb 3, 2015

0-3 is four, and the exact same command works on other machines.
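
To make the arithmetic concrete, here is a minimal sketch of counting the threads named by an inclusive range such as `0-3` and comparing that count with the `-t` value. This is not HPX's actual parser: the names `expand_range` and `check_thread_count` are hypothetical, and the sketch only assumes that ranges in the bind description are inclusive on both ends.

```python
from typing import List


def expand_range(spec: str) -> List[int]:
    """Expand an inclusive range like "0-3", or a single index like "2".

    Hypothetical helper for illustration; not part of HPX.
    """
    first, sep, last = spec.partition("-")
    start = int(first)
    end = int(last) if sep else start
    if end < start:
        raise ValueError(f"invalid range: {spec!r}")
    # Both endpoints are included, so "0-3" yields four indices.
    return list(range(start, end + 1))


def check_thread_count(requested: int, thread_spec: str) -> None:
    """Raise if the requested thread count differs from the bound count.

    The message mirrors the wording of the error reported above; the
    check itself is an assumption about what a consistent parser does.
    """
    bound = len(expand_range(thread_spec))
    if bound != requested:
        raise ValueError(
            f"The number of OS threads requested ({requested}) does not "
            f"match the number of threads to bind ({bound})"
        )


if __name__ == "__main__":
    print(expand_range("0-3"))
    check_thread_count(4, "0-3")  # consistent, so nothing is raised
```

Under that inclusive reading, `-t4` together with `thread:0-3` is consistent, so a reported count of 3 suggests something goes wrong before or during parsing on the Phi rather than in the command line itself.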

@sithhell
Member

sithhell commented Feb 3, 2015

On 03.02.2015 at 18:08, "Patricia Grubel" notifications@github.com wrote:

0-3 is four

You are of course right. Sorry for the noise...

@sithhell
Member

sithhell commented Feb 4, 2015

This is related to #1254.

@hkaiser
Member Author

hkaiser commented Feb 4, 2015

This is related to #1254.

How so?

@hkaiser
Member Author

hkaiser commented Feb 15, 2015

When building with HPX_MAX_CPU_COUNT=256, this is still not reproducible. AFAICT, this is the only setting that could possibly cause the issue.

@hkaiser
Member Author

hkaiser commented Feb 15, 2015

@pagrubel: do you have HWLOC enabled on the Phi? Does the --hpx:print-bind option produce any output before the error is raised?

@pagrubel
Member

Yes, I have hwloc enabled.
No output from --hpx:print-bind appears before the error is raised.

@hkaiser
Member Author

hkaiser commented Feb 16, 2015

@pagrubel: could you verify whether this breaks as well, please:

./hello_world -t4 --hpx:bind=thread:0-3=core:1-4.pu:0

I would like to make sure that the special handling of core 0 on the Phi does not get in the way here.

@pagrubel
Member

same error
