scan_partitioner is not working as expected. #2325

Closed
sithhell opened this issue Sep 8, 2016 · 15 comments

Comments

@sithhell
Member

sithhell commented Sep 8, 2016

Parallel algorithms using the scan partitioner are spuriously failing when
the continuation in dataflow is executed asynchronously.

It seems that passing hpx::launch::sync to the dataflow invocation works around the issue.

A fix needs to be found such that it works with (async) executors again.
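
For readers unfamiliar with the distinction, here is a minimal sketch in standard C++ (not HPX) of what "synchronous" versus "asynchronous" continuation means. It assumes `std::launch::deferred` as a rough stand-in for `hpx::launch::sync` and `std::launch::async` as a stand-in for an asynchronous executor; `combine`, `run_info`, and `run_with` are hypothetical names made up for this illustration.

```cpp
#include <future>
#include <thread>

// Hypothetical helper, not an HPX API: invokes f(a.get(), b.get()) under the
// given launch policy. The futures are captured BY VALUE so the shared state
// they refer to stays alive for as long as the continuation needs it.
template <typename F>
std::future<int> combine(std::launch policy, F f,
    std::shared_future<int> a, std::shared_future<int> b)
{
    return std::async(policy, [f, a, b]() { return f(a.get(), b.get()); });
}

struct run_info
{
    int value;           // result of the continuation
    bool same_thread;    // did the continuation run on the caller's thread?
};

run_info run_with(std::launch policy)
{
    std::thread::id const caller = std::this_thread::get_id();

    std::shared_future<int> a =
        std::async(std::launch::async, [] { return 20; }).share();
    std::shared_future<int> b =
        std::async(std::launch::async, [] { return 22; }).share();

    bool same = false;
    std::future<int> r = combine(policy,
        [&same, caller](int x, int y) {
            same = (std::this_thread::get_id() == caller);
            return x + y;
        },
        a, b);

    // 'same' is only read after get() returns, so the reference capture
    // above cannot outlive this function.
    int const value = r.get();
    return run_info{value, same};
}
```

With the deferred policy the continuation runs inline on the thread that calls `get()`, so locals of the caller are guaranteed to be alive. With the async policy it runs on another thread, and anything it touches must be kept alive explicitly.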

@taeguk
Member

taeguk commented Jul 2, 2017

What is the current status of this issue?
Because of it, parallel algorithms that use scan_partitioner are very slow.
What exactly goes wrong when policy.executor() is used instead of hpx::launch::sync?
I want to try to resolve this issue.

@hkaiser
Member

hkaiser commented Jul 2, 2017

> What exactly goes wrong when policy.executor() is used instead of hpx::launch::sync?

We see random segfaults in this case. Apparently things are going out of scope too early.

> I want to try to resolve this issue.

Sure, go ahead - please be our guest.

@taeguk
Member

taeguk commented Jul 3, 2017

@hkaiser Could I get more information about this issue?
I couldn't find the problem by reading the code.
Can you tell me when the problem happens (e.g., which parallel algorithm is executing when the segfault occurs)?

@hkaiser
Member

hkaiser commented Jul 4, 2017

@sithhell: do you have any recollection of how to reproduce the original problem?

@taeguk
Member

taeguk commented Aug 13, 2017

@hkaiser One suspicion:

    for(auto const& elem: shape)
    {
        FwdIter it = hpx::util::get<0>(elem);
        std::size_t size = hpx::util::get<1>(elem);
        hpx::shared_future<Result1> prev = workitems.back();
        auto curr = execution::async_execute(
            policy.executor(), f1, it, size).share();
        finalitems.push_back(dataflow(hpx::launch::sync,
            f3, it, size, prev, curr));
        workitems.push_back(dataflow(hpx::launch::sync,
            f2, prev, curr));
    }

Is there a possibility of dangling references to prev and curr?
If prev or curr is passed by reference to f2 or f3, a dangling reference could occur when we use policy.executor() instead of hpx::launch::sync.
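
To make the shape of that loop and the lifetime concern concrete, here is a rough standard-C++ analogue. It is not the HPX implementation: `chunked_inclusive_scan` is a hypothetical name, `std::async` stands in for both the executor and dataflow, and the f1/f2/f3 roles are assumed to be "reduce a chunk", "combine carries", and "write the final values", with f3 simplified so that it does not take `curr`. The point of the sketch is that `prev` and `curr` are captured by value. They are locals of the loop body, so capturing them by reference in a continuation that runs on another thread would leave a dangling reference as soon as the iteration ends.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical analogue of the loop above: a chunked inclusive scan (prefix sum).
std::vector<int> chunked_inclusive_scan(
    std::vector<int> const& in, std::size_t chunk)
{
    std::vector<int> out(in.size());
    if (chunk == 0)
        chunk = 1;

    // The carry into the first chunk is 0 and is ready immediately.
    std::promise<int> init;
    init.set_value(0);
    std::vector<std::shared_future<int>> workitems;
    workitems.push_back(init.get_future().share());
    std::vector<std::future<void>> finalitems;

    for (std::size_t first = 0; first < in.size(); first += chunk)
    {
        std::size_t const size = std::min(chunk, in.size() - first);
        auto const b = in.begin() + static_cast<std::ptrdiff_t>(first);
        auto const e = b + static_cast<std::ptrdiff_t>(size);

        std::shared_future<int> prev = workitems.back();

        // f1 role: reduce this chunk to a single value.
        std::shared_future<int> curr =
            std::async(std::launch::async,
                [b, e] { return std::accumulate(b, e, 0); }).share();

        // f3 role: write the final values of this chunk, offset by the carry.
        // 'prev' is captured by value; 'in' and 'out' outlive all tasks
        // because this function waits for them before returning.
        finalitems.push_back(std::async(std::launch::async,
            [&in, &out, first, size, prev] {
                int running = prev.get();
                for (std::size_t i = first; i != first + size; ++i)
                {
                    running += in[i];
                    out[i] = running;
                }
            }));

        // f2 role: the carry into the next chunk. Both futures are
        // captured by value for the same reason.
        workitems.push_back(
            std::async(std::launch::async,
                [prev, curr] { return prev.get() + curr.get(); }).share());
    }

    for (auto& f : finalitems)
        f.get();
    return out;
}
```

For example, calling `chunked_inclusive_scan` on the values 1 through 10 with a chunk size of 3 yields the running sums 1, 3, 6, 10, 15, 21, 28, 36, 45, 55. Whether the HPX code actually passes `prev` or `curr` by reference anywhere is exactly the open question in this comment; the sketch only shows why it would matter.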

@hkaiser
Member

hkaiser commented Oct 12, 2017

@taeguk, @heller: has this been resolved now?

@taeguk
Member

taeguk commented Oct 12, 2017

@hkaiser I don't know. I have no test code that reproduces the original problem, so I can't try to resolve this issue. Could I get code that reproduces the problem?

@hkaiser
Member

hkaiser commented Oct 13, 2017

@taeguk, @sithhell Let's close this for now. We can always reopen it, if needed.

@hkaiser hkaiser closed this as completed Oct 13, 2017
@sithhell
Member Author

Reopening again. The issue hasn't been worked on. The partitioner still doesn't use the passed executors.

@sithhell sithhell reopened this Oct 13, 2017
@taeguk taeguk mentioned this issue Dec 14, 2017
12 tasks
@taeguk
Member

taeguk commented Jan 5, 2018

@sithhell Can you give me test cases that make scan_partitioner misbehave?
I really want to address this issue, but I have no test code that reproduces the original problem.

@sithhell
Member Author

Sorry for not getting back to you earlier.
If you look at https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/scan_partitioner.hpp,
there are a few places where we use dataflow(hpx::launch::sync, ...) instead of the passed executor (the PR mentioned above should point you to those places). If you change those to use the executor, you should see failures in the tests that use the scan partitioner. It is a race condition, so you will need to run the tests several times.
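
Because a race only shows up occasionally, one way to hunt for it is to repeat a check many times with different seeds and count the failures. The following is a minimal, generic sketch of such a harness in standard C++. It is not part of the HPX test suite; `count_failures` and `two_half_sum_matches` are hypothetical names, and the example trial is a stand-in that would be replaced by a call into the algorithm under test.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical harness: runs trial(seed) for seed = 0 .. runs-1 and returns
// how many runs reported failure (trial returns false).
std::size_t count_failures(
    std::function<bool(unsigned)> const& trial, unsigned runs)
{
    std::size_t failures = 0;
    for (unsigned seed = 0; seed != runs; ++seed)
    {
        if (!trial(seed))
            ++failures;
    }
    return failures;
}

// Example trial (a placeholder, not an HPX test): sums the two halves of a
// seeded random vector concurrently and compares against a sequential sum.
bool two_half_sum_matches(unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(-100, 100);
    std::vector<int> data(1000);
    for (int& x : data)
        x = dist(gen);

    auto const mid =
        data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);

    // 'data' and 'mid' outlive the task because get() is called below.
    std::future<long> lo = std::async(std::launch::async,
        [&data, mid] { return std::accumulate(data.begin(), mid, 0L); });
    long const hi = std::accumulate(mid, data.end(), 0L);

    return lo.get() + hi == std::accumulate(data.begin(), data.end(), 0L);
}
```

A call such as `count_failures(two_half_sum_matches, 1000)` returning a nonzero value would indicate an intermittent failure. A zero result does not prove the absence of a race, only that it did not trigger in those runs.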

@taeguk
Member

taeguk commented Jan 27, 2018

@sithhell I can't reproduce the problem.
I ran the unit tests in the hpx repository that use scan_partitioner with various random seeds,
but I didn't see any strange behavior or test failures.
Do you have any tests that reproduce the problem? And could you check once more whether this problem still exists?

taeguk added a commit to taeguk/hpx that referenced this issue Jan 27, 2018
@taeguk taeguk mentioned this issue Jan 31, 2018
Standard Algorithms automation moved this from Open Tickets to Merged to master Feb 1, 2018
msimberg added a commit that referenced this issue Feb 1, 2018
msimberg added a commit that referenced this issue Feb 2, 2018
msimberg added a commit that referenced this issue Feb 2, 2018
@taeguk
Member

taeguk commented Feb 3, 2018

Reopening because of #3136.

@taeguk taeguk reopened this Feb 3, 2018
@msimberg msimberg removed this from the 1.1.0 milestone Mar 22, 2018
@stale

stale bot commented Jul 4, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the tag: wontfix label Jul 4, 2019
@stale

stale bot commented Aug 3, 2019

This issue has been automatically closed. Please re-open if necessary.

@stale stale bot closed this as completed Aug 3, 2019