
scan_partitioner is not working as expected. #2325

Open
sithhell opened this Issue Sep 8, 2016 · 13 comments


4 participants
@sithhell
Member

sithhell commented Sep 8, 2016

Parallel algorithms using the scan partitioner are spuriously failing when
the continuation in dataflow is executed asynchronously.

It seems that passing hpx::launch::sync to the dataflow invocation works around the issue.

A fix needs to be found so that it works with (async) executors again.

@taeguk
Member

taeguk commented Jul 2, 2017

What is the status of this now?
Because of this issue, parallel algorithms that use scan_partitioner are very slow.
What exactly goes wrong when policy.executor() is used instead of hpx::launch::sync?
I would like to try to resolve this issue.

@hkaiser
Member

hkaiser commented Jul 2, 2017

> What exactly goes wrong when policy.executor() is used instead of hpx::launch::sync?

We see random segfaults in this case. Apparently things are going out of scope too early.

> I would like to try to resolve this issue.

Sure, go ahead - please be our guest.

@taeguk
Member

taeguk commented Jul 3, 2017

@hkaiser Can I get more information about this issue?
I couldn't find the problem by reading the code.
Can you tell me when the problem happens? (e.g., which parallel algorithm is executing when the segfault occurs?)

@taeguk taeguk referenced this issue Jul 3, 2017

Merged

Implement parallel::partition_copy. #2716

10 of 10 tasks complete
@hkaiser
Member

hkaiser commented Jul 4, 2017

@sithhell: do you have any recollection of how to reproduce the original problem?

@taeguk
Member

taeguk commented Aug 13, 2017

@hkaiser One suspicion:

    for (auto const& elem : shape)
    {
        FwdIter it = hpx::util::get<0>(elem);
        std::size_t size = hpx::util::get<1>(elem);
        hpx::shared_future<Result1> prev = workitems.back();
        auto curr = execution::async_execute(
            policy.executor(), f1, it, size).share();
        finalitems.push_back(dataflow(hpx::launch::sync,
            f3, it, size, prev, curr));
        workitems.push_back(dataflow(hpx::launch::sync,
            f2, prev, curr));
    }

Is there any possibility of a dangling reference to prev and curr?
If prev or curr is passed by reference to f2 or f3, a dangling reference can occur when we use policy.executor() instead of hpx::launch::sync.

@hkaiser
Member

hkaiser commented Oct 12, 2017

@taeguk, @heller: has this been resolved now?

@taeguk
Member

taeguk commented Oct 12, 2017

@hkaiser I don't know. I don't have test code that reproduces the original problem, so I can't try to resolve this issue. Can I get code that reproduces the problem?

@hkaiser
Member

hkaiser commented Oct 13, 2017

@taeguk, @sithhell Let's close this for now. We can always reopen it if needed.

@hkaiser hkaiser closed this Oct 13, 2017

@sithhell
Member

sithhell commented Oct 13, 2017

Reopening again. The issue hasn't been worked on. The partitioner still doesn't use the passed executors.

@sithhell sithhell reopened this Oct 13, 2017

@taeguk taeguk referenced this issue Dec 14, 2017

Closed

Version 1.1.0 must-haves #3050

12 of 12 tasks complete
@taeguk
Member

taeguk commented Jan 5, 2018

@sithhell Can you give me the test cases that make scan_partitioner misbehave?
I really want to address this issue, but I have no test code that reproduces the original problem.

@sithhell
Member

sithhell commented Jan 12, 2018

Sorry for not getting back to you earlier ...
If you look at https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/parallel/util/scan_partitioner.hpp,
there are a few places where we use dataflow(hpx::launch::sync, ...) instead of making use of the passed executor (the PR mentioned above should point you to those places). By changing that to use the executors, you should see a failure in the tests using the scan partitioner. It's a race condition, so you will need to run a few samples.

@taeguk
Member

taeguk commented Jan 27, 2018

@sithhell I can't reproduce the problem.
I ran the unit tests that use scan_partitioner in the hpx repository with various random seeds, but I didn't see any strange behavior or test failures.
Do you have any tests that reproduce the problem? And can you check once more whether this problem still exists?

taeguk added a commit to taeguk/hpx that referenced this issue Jan 27, 2018

@taeguk taeguk referenced this issue Jan 31, 2018

Merged

Fixing #2325 #3131

Standard Algorithms automation moved this from Open Tickets to Merged to master Feb 1, 2018

msimberg added a commit that referenced this issue Feb 1, 2018

msimberg added a commit that referenced this issue Feb 2, 2018

msimberg added a commit that referenced this issue Feb 2, 2018

@taeguk
Member

taeguk commented Feb 3, 2018

Reopening because of #3136

@taeguk taeguk reopened this Feb 3, 2018

@msimberg msimberg removed this from the 1.1.0 milestone Mar 22, 2018
