Error parsing host file #643

Closed
sameershende opened this issue Dec 18, 2012 · 5 comments

@sameershende

Hi,
The ACISS system has two different types of nodes. When I use

qsub -I -V -q generic -l nodes=2:ppn=12 -d /home3/sameer
to allocate the generic nodes, I can run an application on the two nodes properly using:
[sameer@cn169 bin]$ pwd
/ibrix/packages/HPX/apps/hpx.un/bin

[sameer@cn169 bin]$ pbsdsh -v -u `pwd`/hello_world --hpx:nodes=`cat $PBS_NODEFILE` -t 2
...
pbsdsh: rescinfo from 10: Linux cn169 2.6.32-279.9.1.el6.x86_64 #1 SMP Fri Aug 31 09:04:24 EDT 2012 x86_64:nodes=2:ppn=12,walltime=24:00:00
pbsdsh: rescinfo from 11: Linux cn169 2.6.32-279.9.1.el6.x86_64 #1 SMP Fri Aug 31 09:04:24 EDT 2012 x86_64:nodes=2:ppn=12,walltime=24:00:00
pbsdsh: spawned task 0
pbsdsh: spawned task 1
pbsdsh: spawn event returned: 0 (2 spawns and 0 obits outstanding)
pbsdsh: sending obit for task 8
pbsdsh: spawn event returned: 1 (1 spawns and 1 obits outstanding)
pbsdsh: sending obit for task 9
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0
hello world from OS-thread 0 on locality 1
hello world from OS-thread 1 on locality 1
pbsdsh: obit event returned: 0 (0 spawns and 2 obits outstanding)
pbsdsh: task 0 exit status 0
pbsdsh: obit event returned: 1 (0 spawns and 1 obits outstanding)
pbsdsh: task 1 exit status 0

When I try to use the fatnodes, I get an error.

[sameer@hn1 ~]$ qsub -I -V -q fatnodes -l nodes=2:ppn=32 -d /home3/sameer

[sameer@un12 bin]$ pbsdsh -v -u `pwd`/hello_world --hpx:nodes=`cat $PBS_NODEFILE` -t 2
pbsdsh: rescinfo from 29: Linux un12 2.6.32-279.9.1.el6.x86_64 #1 SMP Fri Aug 31 09:04:24 EDT 2012 x86_64:nodes=2:ppn=32,walltime=24:00:00
pbsdsh: rescinfo from 30: Linux un12 2.6.32-279.9.1.el6.x86_64 #1 SMP Fri Aug 31 09:04:24 EDT 2012 x86_64:nodes=2:ppn=32,walltime=24:00:00
pbsdsh: rescinfo from 31: Linux un12 2.6.32-279.9.1.el6.x86_64 #1 SMP Fri Aug 31 09:04:24 EDT 2012 x86_64:nodes=2:ppn=32,walltime=24:00:00
pbsdsh: spawned task 0
pbsdsh: spawned task 1
pbsdsh: spawn event returned: 0 (2 spawns and 0 obits outstanding)
pbsdsh: sending obit for task 2
hpx::init: std::exception caught: Cannot retrieve number of OS threads for host_name: un12
pbsdsh: obit event returned: 0 (1 spawns and 1 obits outstanding)
pbsdsh: task 0 exit status 255
pbsdsh: spawn event returned: 1 (1 spawns and 0 obits outstanding)
pbsdsh: sending obit for task 3
hpx::init: std::exception caught: Cannot retrieve number of OS threads for host_name: un10
pbsdsh: obit event returned: 1 (0 spawns and 1 obits outstanding)
pbsdsh: task 1 exit status 255
cat $PBS_NODEFILE | more

fn12
fn12
fn12
fn12
fn12
fn12
fn12
fn12
fn12
fn12
fn12
fn12
...

I am not sure how it ends up with un12 when $PBS_NODEFILE contains fn12.

hpx::init: std::exception caught: Cannot retrieve number of OS threads for host_name: un12
The executables are in /ibrix/packages/HPX/apps/hpx.un/bin
Thanks,

- Sameer
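One reading of the error, inferred from its wording rather than from the HPX source, is that each process looks for its own host name (un12, un10) in the list passed with --hpx:nodes, while $PBS_NODEFILE only lists the fn* names. A minimal sketch for checking that on a compute node; count_self_in_nodefile is a hypothetical helper, and uname -n may return a fully qualified name on some systems, in which case the comparison needs adjusting:

```shell
#!/bin/sh
# Sketch: count how many lines of a node file exactly match this node's name.
# count_self_in_nodefile is a hypothetical helper, not part of HPX or Torque.
count_self_in_nodefile() {
    nodefile=${1:?usage: count_self_in_nodefile NODEFILE}
    # -x: whole-line match, -F: literal string, -c: print the count.
    # grep exits with status 1 when the count is 0, but still prints "0".
    grep -cxF "$(uname -n)" "$nodefile"
}
```

Running count_self_in_nodefile "$PBS_NODEFILE" inside the fatnodes job would print 0 on un12 if the two name sets really differ.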
@hkaiser

hkaiser commented Dec 19, 2012

Sameer,

when I simply execute ./hello_world from inside the /ibrix/packages/HPX/apps/hpx.un/bin directory I'm getting:

./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by ./hello_world)
./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/local/packages/HPX/apps/hpx.un/lib/hpx/libiostreams.so.0)
./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/local/packages/HPX/apps/hpx.un/lib/hpx/libhpx.so.0)
./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/local/packages/HPX/apps/boost_1_51_0/stage/lib/libboost_date_time.so.1.51.0)
./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/local/packages/HPX/apps/boost_1_51_0/stage/lib/libboost_regex.so.1.51.0)
./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/local/packages/HPX/apps/boost_1_51_0/stage/lib/libboost_thread.so.1.51.0)
./hello_world: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /usr/local/packages/HPX/apps/boost_1_51_0/stage/lib/libboost_serialization.so.1.51.0)

That is an indication that I'm missing something (or the system configuration is messed up, which I don't think is the case).
What settings do I need to add to my environment to make HPX work from the command line?

@sameershende

Hi Hartmut,
I have the following modules loaded in my environment and in my login scripts:

[sameer@hn1 ~]$ module list
Currently Loaded Modulefiles:
  1) cmake/2.8.6   2) gcc/4.6.3
[sameer@hn1 ~]$
That should take care of these dependencies. I also use 

source /etc/profile.d/modules.csh
in my .tcshrc.
Thanks,
- Sameer
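The loader errors in Hartmut's comment mean that the system copy /usr/lib64/libstdc++.so.6 lacks the GLIBCXX_3.4.15 symbol version, which first shipped with the GCC 4.6 runtime. Loading gcc/4.6.3 presumably points the loader at a newer libstdc++, though the module's contents are not shown in this thread. A small sketch for checking a given libstdc++ file for a version tag; has_symbol_version is a hypothetical helper, and it relies on GNU grep's -z option:

```shell
#!/bin/sh
# Sketch: report whether a shared library file contains a symbol-version tag
# such as GLIBCXX_3.4.15. has_symbol_version is a hypothetical helper.
# It scans the raw bytes instead of parsing the ELF version tables:
#   -a  treat the binary file as text
#   -z  split "lines" at NUL bytes (GNU grep); version names are stored
#       as NUL-terminated strings
#   -x  require the whole string to match, so GLIBCXX_3.4.1 does not
#       accidentally match GLIBCXX_3.4.15
#   -F  literal match (the dots are not regex wildcards)
has_symbol_version() {
    lib=${1:?usage: has_symbol_version LIBRARY TAG}
    tag=${2:?usage: has_symbol_version LIBRARY TAG}
    grep -qazxF "$tag" "$lib"
}
```

For example, ldd ./hello_world | grep libstdc++ shows which copy the loader resolves in the current environment, and has_symbol_version /usr/lib64/libstdc++.so.6 GLIBCXX_3.4.15 would be expected to fail for the copy named in the errors.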

@ghost

ghost commented Jan 22, 2013

Hartmut -

The issue is simpler than that... ACISS has two networks. The $PBS_NODEFILE contains the hostnames on the torque network. HPX is trying to get information for the hostnames on the GigE network. We were able to work around the problem by doing this in our submission script:

for node in `cat $PBS_NODEFILE` ; do ssh $node hostname >> myhosts; done

pbsdsh -u /home3/khuck/src/hpx/hpx.git/taubuild/runme.sh \
    /home3/khuck/src/hpx/hpx.git/taubuild/bin/hello_world \
    --hpx:nodefile=myhosts

That way, we pass the GigE network names to HPX rather than the torque names. It would be nice if HPX could use the torque names, because they belong to the faster 10GigE network.

Thanks -
Kevin
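Because $PBS_NODEFILE repeats each node once per allocated core, Kevin's loop calls ssh 32 times per fat node (ppn=32). A sketch of the same translation that contacts each run of identical lines once and reproduces the loop's output line for line; if a node's entries are not contiguous, the output is still identical, just with extra ssh calls. translate_nodefile and the RESOLVE override are hypothetical names introduced for illustration:

```shell
#!/bin/sh
# Sketch: rewrite a PBS node file into the names the nodes report for
# themselves, contacting each run of identical lines only once.
# The output has the same lines, in the same order, as running
# "ssh $node hostname" once per line; only the number of remote calls changes.
#
# RESOLVE is a hypothetical override for the remote command (default: ssh),
# handy for testing without a cluster.
translate_nodefile() {
    nodefile=${1:?usage: translate_nodefile NODEFILE}
    resolve=${RESOLVE:-ssh}
    # uniq -c collapses adjacent duplicates into "COUNT NAME" pairs.
    uniq -c "$nodefile" | while read -r count node; do
        # </dev/null keeps ssh from swallowing the rest of the loop's input.
        name=$($resolve "$node" hostname </dev/null)
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '%s\n' "$name"
            i=$((i + 1))
        done
    done
}

# Usage inside a job script:
#   translate_nodefile "$PBS_NODEFILE" > myhosts
```

Writing myhosts with > rather than >> also keeps a rerun of the job script from appending to a stale list.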

@hkaiser

hkaiser commented Jan 22, 2013

If it is possible to deduce the 10GigE hostnames from the 1GigE ones you can use --hpx:iftransform (or --hpx:ifsuffix or --hpx:ifprefix) to mangle the given names into what you need (see also: http://stellar.cct.lsu.edu/files/hpx_0.9.5/html/hpx/manual/init/commandline.html).
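The HPX manual describes --hpx:iftransform as taking a sed-style s/search/replace/ expression; whether it is applied to the node-file names or to the local host name is best confirmed on the linked page. Before relying on it, a candidate rule can be tried with plain sed. The rule s/^un/fn/ below is an assumption inferred only from the un12/fn12 pair in this thread, and to_torque_name and check_mapping are hypothetical helpers:

```shell
#!/bin/sh
# Sketch: try a candidate host-name rule with sed before passing an
# equivalent expression to HPX. The rule s/^un/fn/ is an ASSUMPTION based
# only on the un12 / fn12 pair in this thread.
to_torque_name() {
    printf '%s\n' "$1" | sed 's/^un/fn/'
}

# Check that this node's transformed name appears in a node file.
# check_mapping is a hypothetical helper, like to_torque_name.
check_mapping() {
    nodefile=${1:?usage: check_mapping NODEFILE}
    self=$(uname -n)
    mapped=$(to_torque_name "$self")
    if grep -qxF "$mapped" "$nodefile"; then
        echo "ok: $self -> $mapped"
    else
        echo "no match: $self -> $mapped" >&2
        return 1
    fi
}
```

Running check_mapping "$PBS_NODEFILE" on every node of a job shows whether a single rule describes the cluster's naming; one "no match" means the rule needs revisiting.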

ghost assigned hkaiser Jan 25, 2013
@hkaiser

hkaiser commented Jan 25, 2013

This is resolved, I'm closing it. Please reopen if you need more information/help.

hkaiser closed this as completed Jan 25, 2013