Implemented a variety of segmented algorithms #2859

Open
wants to merge 11 commits into
from

Conversation

4 participants
@ajaivgeorge
Contributor

ajaivgeorge commented Aug 24, 2017

Implemented the following segmented algorithms:

  1. all_of, any_of, none_of
  2. binary transform_reduce
  3. adjacent_find
  4. adjacent_difference

Unit tests provided for each.

See #1338

Contributor

ajaivgeorge commented Aug 24, 2017

@hkaiser, @mcopik Please have a look.

Contributor

mcopik commented Aug 24, 2017

This implements several missing algorithms from #1338

@mcopik

Good work Ajai! I have left a few comments suggesting improvements. Most of them are minor nitpicks, except for the question of how the binary operator is invoked in the adjacent family of algorithms. We should discuss this issue further.

dest, std::forward<Op>(op), is_seq());
}
// forward declare the non-segmented version of this algorithm

@mcopik

mcopik Aug 27, 2017

Contributor

Why is there a forward declaration if you are including the definition with algorithms/adjacent_difference.hpp?

@ajaivgeorge

ajaivgeorge Aug 29, 2017

Contributor

This is done in all the segmented algorithms I have implemented. I started with the existing for_each as a base, which contained this forward declaration. It is also present in the existing scans, generate, etc. I am not sure whether it is needed; I have just been following the existing convention. :P

@@ -0,0 +1,100 @@
// Copyright (c) 2017 Ajai V George

@mcopik

mcopik Aug 27, 2017

Contributor

Tests include only a case where the difference in adjacent pairs is constant. I think it should be extended with at least one case where it is either constantly increasing/decreasing or random.

@mcopik

mcopik Aug 27, 2017

Contributor

A test case with an operator (x, y) -> y would produce non-constant results, which can be checked easily on a vector of N elements (0, 1, ..., N-1).
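As a rough illustration of the suggested test, here is a minimal sketch using std::adjacent_difference on a plain std::vector as a stand-in for the segmented version on a partitioned_vector. The helper names adjacent_second_arg and matches_expected are hypothetical. Note the assumption about argument order: std::adjacent_difference invokes op(current, previous), so (x, y) -> y yields the previous element.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: runs std::adjacent_difference on 0, 1, ..., n-1
// with an operator that ignores its first argument and returns the second.
std::vector<int> adjacent_second_arg(std::size_t n)
{
    std::vector<int> input(n);
    std::iota(input.begin(), input.end(), 0);    // 0, 1, ..., n-1

    std::vector<int> output(n);
    // The standard algorithm calls op(current, previous), so this operator
    // writes the previous element; the first element is copied unchanged.
    std::adjacent_difference(input.begin(), input.end(), output.begin(),
        [](int /*x*/, int y) { return y; });
    return output;
}

// Hypothetical check: the expected output is 0, 0, 1, ..., n-2.
bool matches_expected(std::vector<int> const& output)
{
    if (output.empty())
        return true;
    if (output[0] != 0)
        return false;
    for (std::size_t i = 1; i < output.size(); ++i)
    {
        if (output[i] != static_cast<int>(i) - 1)
            return false;
    }
    return true;
}
```

Because the expected value depends on the index, an off-by-one error at a segment boundary would show up as a mismatch, which a constant-difference input cannot detect.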

@ajaivgeorge

ajaivgeorge Aug 29, 2017

Contributor

Please check the current test cases.

std::true_type(), beg, end, ldest, op);
beginning = traits1::compose(sit, beg);
if(beginning != last)

@mcopik

mcopik Aug 27, 2017

Contributor

If I understand correctly, here you apply the difference operator to the adjacent pair (the last value in the previous segment, the first value in the current segment)? I think this will cause unnecessary communication between localities. It could be simplified by dispatching a modified function object which accepts an additional parameter and returns the last value in the segment. I think it's worth saving time by sending the data together.

@hkaiser, what do you think?
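The idea of a function object that takes the previous segment's last value and returns its own last value could look roughly like the sketch below. This is not the PR's code: segments are modelled as plain std::vector objects rather than remote partitions, and the names segment_adjacent_difference and segmented_adjacent_difference are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-segment function object. `prev` points to the last value
// of the preceding segment (nullptr for the very first segment). The return
// value is the last input value of this segment, to be passed to the next call.
template <typename T, typename Op>
T segment_adjacent_difference(std::vector<T> const& seg, std::vector<T>& out,
    T const* prev, Op op)
{
    assert(!seg.empty());
    out.resize(seg.size());
    // Boundary element: combine with the carried value, or copy it if this
    // is the first segment (as std::adjacent_difference does).
    out[0] = prev ? op(seg[0], *prev) : seg[0];
    for (std::size_t i = 1; i < seg.size(); ++i)
        out[i] = op(seg[i], seg[i - 1]);
    return seg.back();
}

// Stand-in for the sequential dispatch loop: one call per segment, with the
// boundary value travelling together with the call instead of being fetched
// in a separate step afterwards.
template <typename T, typename Op = std::minus<T>>
std::vector<T> segmented_adjacent_difference(
    std::vector<std::vector<T>> const& segments, Op op = Op())
{
    std::vector<T> result;
    T carry{};
    bool have_carry = false;
    for (auto const& seg : segments)
    {
        if (seg.empty())
            continue;
        std::vector<T> out;
        T last = segment_adjacent_difference(
            seg, out, have_carry ? &carry : nullptr, op);
        carry = last;
        have_carry = true;
        result.insert(result.end(), out.begin(), out.end());
    }
    return result;
}
```

In this form the sequential version needs one round trip per segment: the carried value is sent with the request and the new carry comes back with the reply.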

@ajaivgeorge

ajaivgeorge Aug 28, 2017

Contributor

OK, so I will create another function object for both adjacent_difference and adjacent_find which accepts a parameter for the previous value. Also, just to confirm: the function object should return the last value, right? Not an iterator to the last input value? Currently the function object returns an iterator to the last output.

What about the case of the parallel version? As you know parallel and sequential versions cannot call different function objects. So the parallel version will be an exact copy of the existing parallel version, right?

while(start != between_segments.end())
{
FwdIter2 curr = dest;
std::advance(curr, std::distance(first, *start));

@mcopik

mcopik Aug 27, 2017

Contributor

Since you're invoking op synchronously on the executing thread, the execution time of this code depends on the latency of communication between localities. Two reads and one write are necessary for each call to op. There are different ways to do the same thing, though. For example, could we attach a continuation to each future which would immediately write the last value of adjacent_difference on a given segment to the first element of the next segment? It would only require disabling the first copy in the adjacent_difference function object to avoid a race condition.

Perhaps it would be good to make a quick benchmark comparing how much time could be spent in this section, compared to the parallel execution. I might be wrong about this.

@hkaiser, do you have an opinion on this?

@ajaivgeorge

ajaivgeorge Aug 28, 2017

Contributor

Hmm, so the parallel function object will be exactly the same, except that the first value is ignored. Instead, I will attach a then clause to the future, which will call the function object on the first value of the current segment and the last value of the previous segment and write the result to the correct position. Am I right? And something similar for adjacent_find, right?

output = traits::compose(sit, out);
}
}
FwdIter ending = traits::compose(sit, std::prev(end));

@mcopik

mcopik Aug 27, 2017

Contributor

I think the arguments of this logical conjunction should be reversed. The comparison should be performed only if found is false.
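The condition under review is not visible in this excerpt, so the following is only a generic sketch of why the operand order matters. With the flag first, && short-circuits and the comparison (which may read a remote element) is skipped once a match has been found. The counting_equal type and both function names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical comparison that counts how often it is invoked, standing in
// for a comparison that has to read values from another locality.
struct counting_equal
{
    std::size_t* calls;
    bool operator()(int a, int b) const
    {
        ++*calls;
        return a == b;
    }
};

// Flag first: the comparison is not evaluated once `found` is true.
std::size_t comparisons_flag_first(
    std::vector<std::pair<int, int>> const& boundary_pairs)
{
    std::size_t calls = 0;
    counting_equal eq{&calls};
    bool found = false;
    for (auto const& p : boundary_pairs)
    {
        if (!found && eq(p.first, p.second))
            found = true;
    }
    return calls;
}

// Comparison first: every pair is compared, even after a match.
std::size_t comparisons_flag_last(
    std::vector<std::pair<int, int>> const& boundary_pairs)
{
    std::size_t calls = 0;
    counting_equal eq{&calls};
    bool found = false;
    for (auto const& p : boundary_pairs)
    {
        if (eq(p.first, p.second) && !found)
            found = true;
    }
    return calls;
}
```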

hpx/parallel/segmented_algorithms/all_any_none.hpp
std::vector<bool> res =
hpx::util::unwrap(std::move(r));
auto it = res.begin();
while (it != res.end())

@mcopik

mcopik Aug 27, 2017

Contributor

Not a problem, but a small remark: this loop could be replaced by one call to std::all_of.
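For reference, a minimal sketch of that replacement, assuming the per-segment results have already been unwrapped into a std::vector<bool> as in the snippet above. The function names are hypothetical; the same pattern with std::any_of covers the any_of case.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if every segment reported true.
inline bool all_segments_true(std::vector<bool> const& res)
{
    return std::all_of(res.begin(), res.end(), [](bool b) { return b; });
}

// Hypothetical helper: true if at least one segment reported true.
inline bool any_segment_true(std::vector<bool> const& res)
{
    return std::any_of(res.begin(), res.end(), [](bool b) { return b; });
}
```

Both standard algorithms stop at the first element that decides the result, so they behave like a hand-written loop with an early exit.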

hpx/parallel/segmented_algorithms/all_any_none.hpp
>::call(r, errors);
std::vector<bool> res =
hpx::util::unwrap(std::move(r));
auto it = res.begin();

@mcopik

mcopik Aug 27, 2017

Contributor

As above, std::any_of.

local_iterator_type1 end1 = traits1::local(last1);
if (beg1 != end1)
{
overall_result = hpx::util::invoke(red_op, overall_result,

@mcopik

mcopik Aug 27, 2017

Contributor

The indentation and brace placement are slightly confusing here. The style used later in this function (line 292) is much easier to read.

> forced_seq;
std::vector<shared_future<T> > segments;

@mcopik

mcopik Aug 27, 2017

Contributor

Same as any_of etc. Is a shared_future really necessary here?

template <typename T>
void initialize(hpx::partitioned_vector<T> & xvalues)
{
T init_array[SIZE] = {1,2,3,4, 5,1,2,3, 1,5,2,3, 4,2,3,2, 1,2,3,4, 5,6,5,6,

@mcopik

mcopik Aug 27, 2017

Contributor

Could you add a test using the binary predicate?
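A sketch of what such a test could check, using std::adjacent_find on a plain std::vector as a stand-in for the segmented overload. The helper first_adjacent_index is hypothetical, and the sample data in the note below is just the first eight values of the initializer shown above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: index of the first position i for which
// pred(v[i], v[i + 1]) holds, or v.size() if there is no such pair.
template <typename Pred>
std::size_t first_adjacent_index(std::vector<int> const& v, Pred pred)
{
    auto it = std::adjacent_find(v.begin(), v.end(), pred);
    return static_cast<std::size_t>(std::distance(v.begin(), it));
}
```

For the values 1,2,3,4,5,1,2,3, calling first_adjacent_index with std::greater<int>() finds the first descent (5 followed by 1) at index 4, while the default equality comparison would find nothing in this range.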

@ajaivgeorge

ajaivgeorge Aug 29, 2017

Contributor

Please check the current test cases.

Contributor

ajaivgeorge commented Aug 28, 2017

@mcopik, I will try to fix the minor issues by today and then work on the new function objects.

Member

hkaiser commented Sep 5, 2017

Now that GSoC is formally over - what is the state of this PR?

Contributor

mcopik commented Sep 6, 2017

@hkaiser, @ajaivgeorge has fixed some of the issues I mentioned, but I believe that two algorithms should still be improved. Right now, there is an additional round of communication between segments after the dispatched work has finished. I think it should be possible to hide the latencies introduced by this additional communication.

Contributor

ajaivgeorge commented Sep 7, 2017

@hkaiser This PR is almost ready for merging.

@mcopik I am working on eliminating the additional round of communication in the sequential versions of adjacent_find and adjacent_difference with a custom function object which returns the last value of the segment (as you described). I will try to get that done by this evening.

Regarding the parallel version, I do not see how appending a future will eliminate the extra round of communication.

Contributor

ajaivgeorge commented Sep 11, 2017

@mcopik I have been trying to implement the custom function object for the sequential version to reduce communication overhead, as you suggested. The new function objects return the last value of each segment, which the function object working on the next segment takes as a parameter. One issue is that I also need to return an iterator to the last destination element computed, as required by the specification of adjacent_find/adjacent_difference. This is easy to compute when all segments complete successfully. But what happens if one of the segments errors out and cannot compute its entire result, so that the last computed dest != the end of the dest segment? How would this error be detected if the function object does not return this iterator? I suppose I could return a pair<last element in input, last computed iterator in dest>, but this does not seem very elegant. What do you suggest?
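One possible shape for such a combined return value is a small struct with named fields instead of a bare pair. The sketch below is an assumption for illustration only, not a design settled in this thread. It uses a count of written elements in place of a destination iterator, and the names segment_result and total_written are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result of processing one segment.
template <typename T>
struct segment_result
{
    T last_value;           // last input value, passed on to the next segment
    std::size_t written;    // number of destination elements actually written
};

// Hypothetical helper: number of destination elements written overall,
// stopping at the first segment that did not finish. The caller can turn
// this count into the "last computed" position in the destination.
template <typename T>
std::size_t total_written(std::vector<segment_result<T>> const& results,
    std::vector<std::size_t> const& segment_sizes)
{
    std::size_t total = 0;
    for (std::size_t i = 0;
         i < results.size() && i < segment_sizes.size(); ++i)
    {
        total += results[i].written;
        if (results[i].written != segment_sizes[i])
            break;    // this segment stopped early; later output is not valid
    }
    return total;
}
```

Named fields keep the call site readable (r.last_value, r.written) where a pair would expose only first and second.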

@hkaiser hkaiser added this to Open Tickets in Standard Algorithms Jan 21, 2018

@hkaiser hkaiser moved this from Open Tickets to Work in progress in Standard Algorithms Jan 21, 2018

Member

hkaiser commented Jan 26, 2018

@ajaivgeorge, @mcopik what will happen to this PR? Should we abandon it? How much work is left to get this into the main branch?

Contributor

mcopik commented Jan 29, 2018

@hkaiser Ignoring merge conflicts, the PR is ready - all algorithms seem to be correctly implemented. I have a few doubts about whether this is the optimal way to achieve that, that's all.

Contributor

mcopik commented Jan 29, 2018

@ajaivgeorge Can you resolve those conflicts?

@msimberg msimberg removed this from the 1.1.0 milestone Mar 22, 2018