
Implement parallel::partition_copy. #2716

Merged: 14 commits into STEllAR-GROUP:master on Jul 10, 2017

Conversation

taeguk (Member) commented Jun 25, 2017

Check Box

Issue List

Note

2017/6/25

I implemented partition_copy with reference to copy_if.
The behavior is correct, but the benchmark results are very bad.
I have to find a more efficient approach and re-implement partition_copy.

2017/7/4

The bad performance is due to #2325.
When policy.executor() is used instead of hpx::launch::sync in the scan_partitioner, the performance is good (https://gist.github.com/taeguk/6abe03f9b4cb878872d2bb634cae65b0).
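
For illustration only (this is not the actual scan_partitioner code, and the spawn_stage helper as well as the exact hpx::async overloads used here are assumptions), the change described above boils down to whether a stage is forced to run synchronously or handed to the policy's executor:

#include <hpx/include/async.hpp>

#include <utility>

// Rough sketch of the idea only, not HPX's partitioner code.
template <typename ExPolicy, typename F>
auto spawn_stage(ExPolicy && policy, F && f)
{
    // Before: the stage is forced to execute synchronously on the calling thread.
    // return hpx::async(hpx::launch::sync, std::forward<F>(f));

    // After: the stage is scheduled through the policy's executor and can overlap
    // with the other stages.
    return hpx::async(policy.executor(), std::forward<F>(f));
}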

hkaiser (Member) commented Jun 25, 2017

@taeguk: invoking the predicate twice for each element is not allowed, see here: http://en.cppreference.com/w/cpp/algorithm/partition_copy

Complexity

Exactly distance(first, last) applications of p.

taeguk (Member Author) commented Jun 25, 2017

@hkaiser You're right, I did not know that.
Still, it does not seem good to allocate additional memory, but unfortunately I see no way to avoid it.
It would be nice to be able to reuse the memory of the container pointed to by the output iterator, but that does not seem easy given how general the algorithm interface is.

hkaiser (Member) commented Jun 25, 2017

Yes, I agree. I think the only viable solution is to allocate that additional array of Booleans as you had it in the first place.
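
A minimal sequential sketch of that idea (this is not the HPX implementation; partition_copy_with_flags is a made-up name): the predicate results go into a temporary array of Booleans, so pred is applied exactly once per element, and the flags then route the copies. A parallel version would run both phases through HPX's partitioners.

#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Sketch only: phase 1 evaluates pred exactly once per element into a flag
// array, phase 2 copies each element to the matching destination.
template <typename FwdIter, typename OutIter1, typename OutIter2, typename Pred>
std::pair<OutIter1, OutIter2>
partition_copy_with_flags(FwdIter first, FwdIter last,
    OutIter1 dest_true, OutIter2 dest_false, Pred pred)
{
    std::vector<bool> flags;
    flags.reserve(static_cast<std::size_t>(std::distance(first, last)));

    for (FwdIter it = first; it != last; ++it)
        flags.push_back(pred(*it));    // exactly one application of pred per element

    std::size_t i = 0;
    for (FwdIter it = first; it != last; ++it, ++i)
    {
        if (flags[i])
            *dest_true++ = *it;
        else
            *dest_false++ = *it;
    }
    return std::make_pair(dest_true, dest_false);
}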

// sequential partition_copy with projection function
template <typename InIter, typename OutIter1, typename OutIter2,
    typename Pred, typename Proj>
inline std::pair<OutIter1, OutIter2>

Contributor commented:

Just a small hint: member functions and templated functions are implicitly inline.
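
As a small aside illustrating the hint (this example is not from the PR): a function template may be defined in every translation unit that uses it, so the explicit inline adds nothing:

// Sketch: 'inline' is redundant on a function template definition.
template <typename T>
T twice(T value)    // behaves the same with or without 'inline'
{
    return value + value;
}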

{
    while (first != last)
    {
        if (hpx::util::invoke(pred, hpx::util::invoke(proj, *first)))

Naios (Contributor) commented:

You should perfect-forward callable objects, since those can be overloaded on r-value references:

struct my_callable {
  void operator()()&& {
    // ...
  }
};

hkaiser (Member) commented:

@Naios: Not sure if all of our compilers support that. We should add a feature test if we want to use this. Also, if we do, this could be applied in many places.

hkaiser (Member) commented:

@Naios: To clarify, I meant the operator()() &&

hkaiser (Member) commented:

@Naios: Also, in this context, perfect forwarding wouldn't work, as pred (and proj) shouldn't be invalidated (as they might be if moved); pred will be used more than once.
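
An illustration of that concern (not from the PR; stateful_pred and use_twice are made-up names): if the callable were perfect-forwarded into each invocation, an rvalue-qualified overload could consume its state on the first call and leave a moved-from object for the second:

#include <utility>
#include <vector>

// Made-up example: a predicate whose rvalue-qualified call operator may
// consume its internal state.
struct stateful_pred
{
    std::vector<int> cache;

    bool operator()(int x) const & { return x > 0; }   // safe to call repeatedly

    bool operator()(int x) &&                          // may steal 'cache'
    {
        std::vector<int> stolen = std::move(cache);
        (void) stolen;
        return x > 0;
    }
};

template <typename Pred>
void use_twice(Pred && pred)
{
    // std::forward<Pred>(pred)(1);   // would pick the && overload for rvalues,
    // std::forward<Pred>(pred)(2);   // so the second call could see a moved-from pred

    pred(1);    // calling through the lvalue keeps pred intact ...
    pred(2);    // ... for the second application
}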

Naios (Contributor) commented Jun 25, 2017:

@hkaiser I guess all required compiler versions should support it:

  • GCC since 4.8 or 4.9
  • Clang since 3.3 or 3.4
  • MSVC since 2015 (19.0)

However, supporting perfect forwarding for callable objects would be the safer choice for the future.

EDIT (regarding the second comment): yes, that's right.

hkaiser (Member) commented Jun 26, 2017

Also, the inspect tool is not happy with your code: https://7095-4455628-gh.circle-artifacts.com/0/tmp/circle-artifacts.KS6bV0d/hpx_inspect_report.html

hkaiser (Member) left a comment:

I'd suggest keeping the work on the scan_partitioner separate from the partition_copy algorithm implementation.

taeguk (Member Author) commented Jul 2, 2017

@hkaiser Sorry, this is my mistake. This is just for backup. This commit will be removed.

hkaiser (Member) commented Jul 8, 2017

@taeguk: could we merge this step by step? Could we merge the current state of partition_copy?

Regarding the bad performance: I'd suggest making the change from hpx::launch::sync to policy.executor() and watching the regression tests (rostam.cct.lsu.edu).

taeguk (Member Author) commented Jul 8, 2017

@hkaiser I had already run such a test. With policy.executor() I could get acceptable performance.
I have some commits that are not pushed yet, and I have to fix some small things like inspection errors.

I stopped work on this PR because of #2733.
My plan was to add commits adapting this PR to #2733 once #2733 is merged, so I am waiting for that merge.

Do you want me to finish this PR without adapting it to #2733?

hkaiser (Member) commented Jul 8, 2017

@taeguk ok, understood. I applied the first batch of name changes to #2733. I will try to apply the rest asap. In the meantime it might be helpful if you added your review to #2733 as our policy is to have at least one review before merging.

taeguk changed the title from "[Working] Implement parallel::partition_copy." to "Implement parallel::partition_copy." on Jul 8, 2017
taeguk (Member Author) commented Jul 8, 2017

@hkaiser This PR is ready to be merged if you are willing to adapt it to #2733 yourself.
Otherwise, if you want me to do that, I will push more commits here after #2733 is merged.

taeguk (Member Author) commented Jul 8, 2017

@hkaiser Sorry, I have one more thing to do for the Ranges TS: I will add partition_copy to container_algorithms/partition.hpp.

I also have a question: which do you think is better, a single PR containing both the implementation of a parallel algorithm and its adaptation for the Ranges TS, or splitting that into two separate PRs?

taeguk changed the title from "Implement parallel::partition_copy." to "[Working] Implement parallel::partition_copy." on Jul 8, 2017
… unit tests for parallel is_heap, is_heap_until, and partition_copy.
…s of parallel::partition_copy. And fix some tiny things.
taeguk changed the title from "[Working] Implement parallel::partition_copy." to "Implement parallel::partition_copy." on Jul 9, 2017
taeguk (Member Author) commented Jul 9, 2017

@hkaiser Finally, this PR is ready to be merged!

hkaiser (Member) left a comment:

LGTM, thanks a lot!

hkaiser (Member) commented Jul 10, 2017

Quoting @taeguk: "Benchmark results are inconsistent. (They are different whenever I benchmark.)" (https://gist.github.com/taeguk/e40b628fefe025aa3ffc1335bbeed4ee)

@taeguk: Have you tried whether you get repeatable results when forcing the same seed? The different results could be caused by a different amount of work needed to perform the partitioning.
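
For illustration (assuming the benchmark fills its input from a pseudo-random generator; make_input is a made-up helper), fixing the seed makes the generated input, and therefore the partitioning work, identical across runs:

#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Sketch: generate the benchmark input with a fixed seed so that every run
// partitions exactly the same data.
std::vector<int> make_input(std::size_t n, unsigned seed = 12345)
{
    std::mt19937 gen(seed);                              // fixed seed -> repeatable input
    std::uniform_int_distribution<int> dist(0, 1000000);

    std::vector<int> v(n);
    std::generate(v.begin(), v.end(), [&] { return dist(gen); });
    return v;
}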

taeguk (Member Author) commented Jul 10, 2017

@hkaiser Oh, I got almost the same results when forcing the same seed! Great :)

@hkaiser hkaiser merged commit 11bce88 into STEllAR-GROUP:master Jul 10, 2017
hkaiser (Member) commented Jul 10, 2017

@taeguk: Thanks a lot for your work on this. That's much appreciated! Please get in contact with aserio on IRC so he can send you a STE||AR-group t-shirt ;)

hkaiser moved this from "Work in progress" to "Merged to master" in Standard Algorithms on Jul 21, 2017
Projects: Standard Algorithms (Merged to master)
Linked issues: none yet
Participants: 3