exception while executing any application linked with hwloc #691

Closed
maciekab opened this issue Jan 31, 2013 · 9 comments

Comments

@maciekab
Member

Any attempt to run an application linked with hwloc terminates prematurely with the following error message:

hpx::init: std::exception caught: failed to initialize core affinity mask for thread 4: HPX(kernel_error)

Disabling hwloc during configuration fixes the problem.
(boost 1.52.0, gcc 4.6.3, hwloc 1.6.1)

@ghost ghost assigned sithhell Jan 31, 2013
@hkaiser
Member

hkaiser commented Jan 31, 2013

Could you provide the full trace info it prints? I'd like to see the filename and line number where this exception gets raised.

@maciekab
Member Author

On 01/31/2013 12:58 PM, Hartmut Kaiser wrote:

> Could you provide the full trace info it prints? I'd like to see the
> filename and line number where this exception gets raised.

What was reported is precisely what gets printed out. There's no stack
trace and no value being computed.

I can rerun with additional options/env. variables if that helps, just
let me know which.

@sithhell
Member

sithhell commented Feb 1, 2013

I am not sure why this is happening. Can you please provide additional information:

  • The git commit you are on
  • The topology of the system on which the problem occurs (attach the output of lstopo)
  • An example invocation of an application that leads to that error

Thanks!

@sithhell
Member

sithhell commented Feb 1, 2013

I am not able to reproduce the problem on any machine I have access to with the latest git commit (fce1042).

@sithhell
Member

sithhell commented Feb 1, 2013

This is really strange. Two other people independently reported the same problem. The problem is known to persist in the 0.9.5 release. However, I thought I had fixed it with the latest commits, as I can no longer reproduce any of this.

@maciekab
Member Author

maciekab commented Feb 1, 2013

Thomas,

Here's the requested info.

Commit hash: 7f09dc0


Topology:
Machine (24GB)
  NUMANode L#0 (P#0 10223MB) + Socket L#0 + L3 L#0 (12MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#9)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#10)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#11)
  NUMANode L#1 (P#1 14GB) + Socket L#1 + L3 L#1 (12MB)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#12)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#13)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#14)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
  HostBridge L#0
    PCIBridge
      PCI 1002:679a
    PCIBridge
      PCI 14e4:1684
        Net L#0 "eth0"
    PCIBridge
      PCI 14e4:1684
        Net L#1 "eth1"
    PCI 8086:2822
      Block L#2 "sda"
      Block L#3 "sdb"
      Block L#4 "sdc"
      Block L#5 "sdd"
      Block L#6 "sr0"
  HostBridge L#4
    PCIBridge
      PCI 1000:0058


Example invocation:

./packages/hpx/bin/fibonacci
hpx::init: std::exception caught: failed to initialize core affinity mask for thread 4: HPX(kernel_error)

HTH,

Maciek
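
Editor's note: in the lstopo listing above, hwloc's logical PU numbers (L#) differ from the OS numbers (P#), which matters when reading an error about "thread 4". The following is a minimal Python sketch, not part of HPX or hwloc, that extracts that mapping from lstopo text. The helper name `parse_pu_map` is hypothetical, and whether HPX pins worker thread N to logical PU N is an assumption the thread does not confirm.

```python
import re

# Matches lstopo PU lines such as "PU L#4 (P#2)".
PU_LINE = re.compile(r"PU L#(\d+) \(P#(\d+)\)")


def parse_pu_map(lstopo_text):
    """Map each logical PU index (L#) to its OS index (P#)."""
    return {int(logical): int(os_index)
            for logical, os_index in PU_LINE.findall(lstopo_text)}


# Three lines copied from the topology reported in this issue.
sample = """\
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
PU L#4 (P#2)
PU L#5 (P#10)
"""

pu_map = parse_pu_map(sample)
print(pu_map[4])  # OS index of logical PU 4 on the reported machine
```

On the reported machine, logical PU 4 corresponds to OS CPU 2, so a failure mentioning index 4 does not necessarily involve OS CPU 4.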

@sithhell
Member

sithhell commented Feb 1, 2013

Thanks. This commit should not lead to that error. One of the other reports was resolved simply by making a clean, fresh clone of the repo. Do you have any local modifications, such as merges?
Another cause of the error could be that you use SLURM to start the job. I have isolated the problem there (at least I hope so) and have a fix in mind. Please be patient.
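
Editor's note: one way to check whether a launcher such as SLURM is restricting the CPUs available to a process is to compare the CPUs the program expects to use with the set the OS actually allows. The sketch below is a hedged, Linux-only illustration in Python. It assumes the failure mode is a binding to a CPU outside the allowed set, which this thread does not confirm, and the helper name `disallowed_cpus` is hypothetical.

```python
import os


def disallowed_cpus(wanted, allowed=None):
    """Return the OS CPU indices in `wanted` that the process may not use."""
    if allowed is None:
        # Linux-only call: the CPUs the current process is allowed to run on.
        allowed = os.sched_getaffinity(0)
    return sorted(set(wanted) - set(allowed))


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Assumption: OS CPUs 0..15, as in the topology reported in this issue.
    print("CPUs outside the allowed set:", disallowed_cpus(range(16)))
```

Running this once from a plain shell and once inside the job launcher shows whether the launcher narrows the allowed set. An empty list means all 16 CPUs are available to the process.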

@sithhell
Member

sithhell commented Feb 4, 2013

I committed a series of patches which should have fixed the issue. Please retry.

@maciekab
Member Author

maciekab commented Feb 4, 2013

It works for me now.

@maciekab maciekab closed this as completed Feb 4, 2013