I further simplified my code to isolate the problem. I run this code on 8
nodes of our cluster with 1 locality per node.
#include <hpx/hpx_init.hpp>
#include <hpx/hpx.hpp>
#include <hpx/include/actions.hpp>
#include <hpx/runtime/serialization/serialize.hpp>
#include <hpx/include/iostreams.hpp>
#include <math.h>
#include <vector>
#include <list>
#include <set>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <boost/ref.hpp>
#include <boost/format.hpp>
#include <boost/thread/locks.hpp>
#include <boost/serialization/vector.hpp>
void out(std::vector<uint> vec)
{
    hpx::cout << "out called " << hpx::find_here() << std::endl << hpx::flush;
}
HPX_PLAIN_ACTION(out, out_action);

int main(int argc, char* argv[])
{
    // Initialize and run HPX.
    return hpx::init(argc, argv);
}

int hpx_main(boost::program_options::variables_map& vm)
{
    // find locality info
    std::vector<hpx::naming::id_type> locs = hpx::find_all_localities();
    uint locid = hpx::get_locality_id();

    // create data
    std::vector<uint> vec;
    for (unsigned long j = 0; j < 300000; j++)
    {
        vec.push_back(1);
    }

    // send out data
    for (uint j = 0; j < 8; j++)
    {
        std::vector<hpx::future<void> > fut1;
        for (uint i = 0; i < locs.size(); i++)
        {
            typedef out_action out_act;
            fut1.push_back(hpx::async<out_act>(locs.at(i), vec));
            hpx::cout << "Scheduled out to " << i + 1 << std::endl << hpx::flush;
        }
        hpx::wait_all(fut1);
        hpx::cout << j + 1 << ". round finished " << std::endl << hpx::flush;
    }

    hpx::cout << "program finished!!!" << std::endl << hpx::flush;
    return hpx::finalize();
}
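For scale (assuming uint is 4 bytes on this platform, which the code above already relies on), each async invocation serializes the whole vector, so every round ships roughly 1.2 MB to each of the 8 localities:

    // Back-of-the-envelope payload size; assumes a 4-byte uint.
    constexpr std::size_t elements        = 300000;
    constexpr std::size_t bytes_per_send  = elements * sizeof(uint); // ~1.2 MB per action
    constexpr std::size_t bytes_per_round = bytes_per_send * 8;      // ~9.6 MB per round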
And this is my output:
Scheduled out to 1
out called {0000000300000000, 0000000000000000}
out called {0000000500000000, 0000000000000000}
Scheduled out to 2
Scheduled out to 3
Scheduled out to 4
Scheduled out to 5
Scheduled out to 6
Scheduled out to 7
Scheduled out to 8
out called {0000000400000000, 0000000000000000}
out called {0000000100000000, 0000000000000000}
out called {0000000800000000, 0000000000000000}
out called {0000000600000000, 0000000000000000}
out called {0000000700000000, 0000000000000000}
out called {0000000200000000, 0000000000000000}
1. round finished
Scheduled out to 1
Scheduled out to 2
Scheduled out to 3
Scheduled out to 4
Scheduled out to 5
Scheduled out to 6
Scheduled out to 7
Scheduled out to 8
out called {0000000100000000, 0000000000000000}
Then the program gets stuck in what looks like an endless loop until my job times out.
The same code runs to completion if I use 8 localities on a single node,
or if I decrease the size of "vec" to 3000 elements.
I need to send data of this size because I'm trying to do image
compositing.
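As a possible workaround while this is being investigated, I could split the vector into pieces small enough to go through, since sends of 3000 elements work. A minimal sketch only (it reuses the includes from the example above plus <algorithm>; chunk_out, chunk_out_action and send_chunked are hypothetical names, not part of my code):

    #include <algorithm>  // for std::min

    // Hypothetical action receiving one piece of the data together with its
    // offset, so the receiver could reassemble the full image later.
    void chunk_out(std::vector<uint> piece, std::size_t offset)
    {
        hpx::cout << "chunk received at " << hpx::find_here() << std::endl << hpx::flush;
    }
    HPX_PLAIN_ACTION(chunk_out, chunk_out_action);

    // Send 'vec' to 'loc' in pieces of at most 'chunk' elements and wait for all of them.
    void send_chunked(hpx::naming::id_type const& loc,
        std::vector<uint> const& vec, std::size_t chunk)
    {
        std::vector<hpx::future<void> > futs;
        for (std::size_t off = 0; off < vec.size(); off += chunk)
        {
            std::size_t end = std::min(off + chunk, vec.size());
            std::vector<uint> piece(vec.begin() + off, vec.begin() + end);
            futs.push_back(hpx::async<chunk_out_action>(loc, piece, off));
        }
        hpx::wait_all(futs);
    }

If this avoided the hang it would only confirm that the problem is tied to message size, but it might keep the image compositing going in the meantime.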
I could not reproduce this issue on supermic using multiple MPI versions
(mvapich2/2.0 and impi/5.0.1.035), in both debug and release builds, compiled with the
Intel compiler version 15.0.0 and Boost 1.55.0.
Now I have “good” news: I have recompiled HPX 0.9.12 (git) using OpenMPI 1.8.5 (instead of Intel MPI 2016), and now the example seems to work as expected. As Jan already said, using 0.9.12 with Intel MPI 2016 had not resolved the issue before. So the problem seems to be MPI-related.