processing unit out of bounds #3399

Closed
Red-Portal opened this issue Aug 6, 2018 · 7 comments

3 participants
@Red-Portal
commented Aug 6, 2018

Expected Behavior

HPX gets initialized

Actual Behavior

terminate called after throwing an instance of 'std::invalid_argument'
  what():  init_pool_data::add_resource: init_pool_data::add_resource: processing unit index out of bounds. The total available number of processing units on this machine is 40
Aborted

Steps to Reproduce the Problem

Call hpx::init().

Specifications

  • HPX Version: Latest
  • Platform (compiler, OS): CentOS 7.4 Server

I'm currently running HPX on a cloud server.
I think the error has something to do with the virtualized environment of the cloud server.
How could I solve this issue?

@hkaiser

Member

commented Aug 6, 2018

@Red-Portal Could you post the output of lstopo (the hwloc command line tool) here, please?

@Red-Portal

Author

commented Aug 7, 2018

@hkaiser Hi, here is the lstopo output:

Machine (125GB)
  Package L#0 + L3 L#0 (25MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#20)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#21)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#22)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#23)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#24)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#25)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#26)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#27)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#28)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#29)
  Package L#1 + L3 L#1 (25MB)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#10)
      PU L#21 (P#30)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#11)
      PU L#23 (P#31)
    L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
      PU L#24 (P#12)
      PU L#25 (P#32)
    L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
      PU L#26 (P#13)
      PU L#27 (P#33)
    L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
      PU L#28 (P#14)
      PU L#29 (P#34)
    L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
      PU L#30 (P#15)
      PU L#31 (P#35)
    L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
      PU L#32 (P#16)
      PU L#33 (P#36)
    L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
      PU L#34 (P#17)
      PU L#35 (P#37)
    L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
      PU L#36 (P#18)
      PU L#37 (P#38)
    L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
      PU L#38 (P#19)
      PU L#39 (P#39)
  HostBridge L#0
    PCIBridge
      PCI 1000:005d
        Block(Disk) L#0 "sda"
    PCIBridge
      PCI 8086:10fb
        Net L#1 "eth0"
      PCI 8086:10fb
        Net L#2 "eth1"
    PCI 8086:8d62
    PCIBridge
      PCI 19e5:1711
    PCI 8086:8d02
  HostBridge L#4
    PCIBridge
      PCI 8086:10fb
        Net L#3 "eth2"
      PCI 8086:10fb
        Net L#4 "eth3"
@hkaiser

Member

commented Aug 7, 2018

@msimberg would you have any idea why this is happening? The machine topology doesn't look out of the ordinary...

@Red-Portal what version of hwloc do you use?
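
For anyone who wants to double-check a topology dump like the one above, here is a minimal sketch (standard C++ only, not part of HPX or hwloc) that scans lstopo text output for `PU L#n (P#m)` lines and verifies that every logical (L#) and OS (P#) index is below the number of PUs found. The names `PuEntry`, `parse_pus`, and `indices_in_bounds` are hypothetical helpers made up for illustration, and the check is only an assumption about what "in bounds" means here; it does not reproduce how HPX validates processing unit indices internally.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One processing unit as printed by lstopo, e.g. "PU L#1 (P#20)".
struct PuEntry {
    unsigned logical;   // hwloc logical index (L#)
    unsigned os_index;  // OS / physical index (P#)
};

// Collect every "PU L#<n> (P#<m>)" line from lstopo's text output.
// Lines describing caches, cores, or PCI devices are ignored.
std::vector<PuEntry> parse_pus(const std::string& lstopo_text) {
    static const std::regex pu_line(R"(PU L#(\d+) \(P#(\d+)\))");
    std::vector<PuEntry> pus;
    std::istringstream in(lstopo_text);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_search(line, m, pu_line)) {
            pus.push_back({static_cast<unsigned>(std::stoul(m[1].str())),
                           static_cast<unsigned>(std::stoul(m[2].str()))});
        }
    }
    return pus;
}

// True if at least one PU was found and every logical and OS index
// is smaller than the total number of PUs found.
bool indices_in_bounds(const std::vector<PuEntry>& pus) {
    for (const PuEntry& pu : pus) {
        if (pu.logical >= pus.size() || pu.os_index >= pus.size()) {
            return false;
        }
    }
    return !pus.empty();
}
```

Fed the output posted above, this finds 40 PUs whose L# and P# values all lie in 0..39, so the check passes, which matches the observation that the topology itself looks ordinary.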

@msimberg

Contributor

commented Aug 7, 2018

@hkaiser @Red-Portal I don't... My first guess was some variation of the max CPU count being too low, but that doesn't seem to be the case.

Does setting --hpx:threads=1/20/40 explicitly change anything? Are you doing any setup with the resource partitioner before calling hpx::init?

@Red-Portal

Author

commented Aug 7, 2018

@msimberg I'm not doing anything before calling hpx::init other than setting up the Boost command-line options. Explicitly setting the thread count didn't change anything, but I'll try again later. In fact, the HPX examples emit the same error.

@hkaiser hwloc is the one from the official CentOS 7.4 distribution. I believe it's 1.7, but I'll check. It should be a very old version, like any official CentOS package.

@hkaiser

Member

commented Aug 7, 2018

> hwloc is the Centos 7.4 official distribution. I believe 1.7 but I'll check. It should be a very old version just as any centos official package.

@Red-Portal could you try building a more recent version of hwloc and retry, please? This should be straightforward.

@Red-Portal

Author

commented Aug 7, 2018

@hkaiser @msimberg Hi, I built the latest hwloc 2.1 from source and now everything works.
This is the lstopo output:

   NUMANode L#0 (P#0 128GB)
  Package L#0 + L3 L#0 (25MB)
    L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#20)
    L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
      PU L#2 (P#1)
      PU L#3 (P#21)
    L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
      PU L#4 (P#2)
      PU L#5 (P#22)
    L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
      PU L#6 (P#3)
      PU L#7 (P#23)
    L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
      PU L#8 (P#4)
      PU L#9 (P#24)
    L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
      PU L#10 (P#5)
      PU L#11 (P#25)
    L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
      PU L#12 (P#6)
      PU L#13 (P#26)
    L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#27)
    L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
      PU L#16 (P#8)
      PU L#17 (P#28)
    L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
      PU L#18 (P#9)
      PU L#19 (P#29)
  Package L#1 + L3 L#1 (25MB)
    L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
      PU L#20 (P#10)
      PU L#21 (P#30)
    L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
      PU L#22 (P#11)
      PU L#23 (P#31)
    L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
      PU L#24 (P#12)
      PU L#25 (P#32)
    L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
      PU L#26 (P#13)
      PU L#27 (P#33)
    L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
      PU L#28 (P#14)
      PU L#29 (P#34)
    L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
      PU L#30 (P#15)
      PU L#31 (P#35)
    L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
      PU L#32 (P#16)
      PU L#33 (P#36)
    L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
      PU L#34 (P#17)
      PU L#35 (P#37)
    L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
      PU L#36 (P#18)
      PU L#37 (P#38)
    L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
      PU L#38 (P#19)
      PU L#39 (P#39)
  HostBridge
    PCIBridge
      PCI 01:00.0 (RAID)
        Block(Disk) "sda"
    PCIBridge
      PCI 04:00.0 (Ethernet)
        Net "eth0"
      PCI 04:00.1 (Ethernet)
        Net "eth1"
    PCI 00:11.4 (SATA)
    PCIBridge
      PCI 06:00.0 (VGA)
    PCI 00:1f.2 (SATA)
  HostBridge
    PCIBridge
      PCI 81:00.0 (Ethernet)
        Net "eth2"
      PCI 81:00.1 (Ethernet)
        Net "eth3"

Thanks everyone for the support.

@Red-Portal Red-Portal closed this Aug 7, 2018

@hkaiser hkaiser added this to the 1.2.0 milestone Aug 8, 2018
