Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding stencil_iterator and transform_iterator #1427

Merged
merged 12 commits into from Apr 9, 2015
Merged

Conversation

hkaiser
Copy link
Member

@hkaiser hkaiser commented Mar 27, 2015

This pull request proposes to add two new iterators: transform_iterator and stencil3_iterator.

The transform_iterator is very similar to boost::transform_iterator with the main difference of allowing to modify the iterator itself instead of the dereferenced value only.

The stencil3_iterator is an example of how the new transform_iterator can be used to generate a stencil (3 elements wide) on the fly, which allows to reuse the standard algorithms for problems relying on stencil computations:

std::vector<double> values = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
auto r = hpx::util::make_stencil3_range(
    values.begin(), values.end(), &values.back(), &values.front());

typedef typename std::iterator_traits<decltype(r.first)>::reference
    reference;

std::for_each(r.first, r.second, [](reference val)
{
    using hpx::util::get;
    hpx::cout
        << get<0>(val) << " "
        << get<1>(val) << " "
        << get<2>(val) << std::endl;
});

This will print:

9 0 1
0 1 2
...
7 8 9
8 9 0


template <typename StencilIterator, typename Iterator>
inline StencilIterator
make_stencil3_iterator(Iterator const& it)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this function? Doesn't look like it is necessarily creating a stencil3 iterator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this function is to create a new stencil3_iterator from the given iterator (it) for the underlying range [begin, end) and a pair of iterators referring to the values to be used as the boundary elements (begin_val and end_val).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't a return type of stencil3_iterator<Iterator, Iterator, Iterator, Iterator, Iterator> be better? That way the purpose of that function gets clearer by it's return type. Apart from the name it is not clear what this function does, this function could do anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that wouldn't work. This function is supposed to produce an end-iterator compatible with the given iterator type (sorry, I misinterpreted what function you were referring to).

@sithhell
Copy link
Member

I like the general idea of this, however I think it the stencil3_iterator has a potential performance problem: Dereferencing it involves 3 comparision operations. Compared to an equivalent "naive" implementation this sounds like quite some performance hit.
In addition, I am not sure if hpx::util Is the proper namespace for that, as naming is hard, I don't have a better suggestion ...

@sithhell
Copy link
Member

Unit tests and documentation would be nice too ;)

@hkaiser
Copy link
Member Author

hkaiser commented Mar 27, 2015

Yes, the performance has to be properly tested. I'll go and add a corresponding test. I'm not too concerned about the dereferencing (this has to be done in any case, even for the 'naive' case), but I agree that the comparison operation (there are only two of those, not three) might impact the performance. Measurements will show...

@sithhell
Copy link
Member

Of course ... two comparisions per dereference and the three in each loop have already been optimized. Sure, dereferencing per se is not the problem, the comparison hidden inside probably is.

@sithhell
Copy link
Member

Yes, the performance has to be properly tested. I'll go and add a
corresponding test. I'm not too concerned about the dereferencing (this has
to be done in any case, even for the 'naive' case), but I agree that the
comparison operation (there are only two of those, not three) might impact
the performance. Measurements will show...

I guess one possible way to avoid those unnecessary comparisons is that the
stencil iterator only deals with the interior region and the boundary
should be handled separately, at least that's what the current best
practices suggest.

@hkaiser
Copy link
Member Author

hkaiser commented Mar 28, 2015

I guess one possible way to avoid those unnecessary comparisons is that the
stencil iterator only deals with the interior region and the boundary
should be handled separately, at least that's what the current best
practices suggest.

I'll do some measurements.

…the user has to handle the boundary cases explicitly. Adding full set of tests.
@hkaiser
Copy link
Member Author

hkaiser commented Mar 28, 2015

The commit above (5e372cd) adds versions for both iterators which require to handle the boundary cases explicitly. It also adds tests.

@eschnett
Copy link
Contributor

I want to suggest a slight generalization of this API. This generalization is probably not relevant for shared memory, but improves efficiency in many cases when using distributed memory.

The use case is that the iterator is not used for finite differencing, but for finite volumes or DG finite elements. In this case, each element of the data structure contains not just one value, but many -- for example, it could hold five coefficients of a Chebyshev expansion.

What is needed when calculating fluxes for one element is usually not the full information of the neighbouring elements, but only their boundary data. Thus I want to suggest to expand the API to allow for an additional transformation function to be called before creating the tuple that represents (prev, elem, next). Let's call the function b (for "extract boundary value"); the tuple should then contain (b(prev), elem, b(next)).

The efficiency gain comes from evaluating b on the remote locality where the respective element is already processed, and then transferring less data.

@hkaiser
Copy link
Member Author

hkaiser commented Mar 29, 2015

@eschnett: Thanks for those suggestions. I added the ability to supply your own transformer. For an example, see: https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/unit/util/stencil3_iterator.cpp#L40.

@hkaiser
Copy link
Member Author

hkaiser commented Mar 29, 2015

The implementation of hpx::util::stencil3_iterator is now almost as efficient as the plain explicit implementation of a 3-wide 1d stencil (see https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/performance/local/stencil3_iterators.cpp for some timing results). I think we can go ahead and merge this pull request now.

@eschnett
Copy link
Contributor

That was quick. Thanks!

@sithhell
Copy link
Member

I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should stop polluting this namespace unnecessarily, especially since this iterator is a very special case and more of an example of how to use the transfomer functionality.

@hkaiser
Copy link
Member Author

hkaiser commented Mar 30, 2015

I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should
stop polluting this namespace unnecessarily, especially since this iterator is a very special case and
more of an example of how to use the transfomer functionality.

What do you suggest as an alternative?

@eschnett
Copy link
Contributor

In my mind, a stencil3 iterator is a special case of a more generic operation that lives on the same footing as each of map, reduce, or mapreduce. It should be treated as such. If you modify the name to stencil_nearest_neighbour, then it is clear that this special case is rather important in practice and deserves to be part of a standard library. The namespace choices are thus either hpx::experimental or hpx::example, with the goal of putting it directly into hpx.

@gentryx
Copy link
Member

gentryx commented Mar 31, 2015

Are there any plans to extend this beyond a 1D stencil? If so: what's the overhead (compared to an optimized implementation) you'd be able to accept?

@sithhell
Copy link
Member

My suggestion would be to provide a generic iterator utility module. Probably living under hpx::iterators. I don't think it is wise for us to include something like stencil or nearest neighbor iterators with HPX itself, this clearly is out of scope for HPX itself. As such, I think the stencil3_iterator should be merely an example while the other iterator utilities should live under hpx::iterators. Does this make sense?

@hkaiser
Copy link
Member Author

hkaiser commented Mar 31, 2015

@sithhell: I agree. My plan is to build a set of tools which allow to efficiently build views (stencil3_iterator is just one example of doing this). The concept of views allows to integrate parallelization techniques with container access. All of the current work is merely some experiment to find the right abstractions.

@eschnett
Copy link
Contributor

@gentryx: Overhead? Why should there be an overhead? I expect the actual stencil operation to be cheap (otherwise overhead would not be an issue). Thus the stencil operation and the iterator calls should be inlined into the enclosing loop, which should then be SIMD-vectorized. I would not expect HPX to stand in the way of this.

In a multi-dimensional loop, one probably needs to employ loop blocking. I don't expect the compiler to make a good choice; the blocking size would be chosen by the user either with the loop or with the iterator's container. In this case, I expect the innermost loop to be inlined as described above, and the outermost loop to schedule each innermost loop as a separate thread. I don't know whether HPX's containers or iterators can do this already, but that should be the goal. In the end, it should run as efficiently as an OpenMP construct if the loop is perfectly regular, and faster if HPX can exploit any dynamic differences.

@sithhell
Copy link
Member

@eschnett: That's exactly the goal of this excercise! Nicely put!

@sithhell
Copy link
Member

Am 31.03.2015 15:02 schrieb "Hartmut Kaiser" notifications@github.com:

@sithhell: I agree. My plan is to build a set of tools which allow to
efficiently build views (stencil3_iterator is just one example of doing
this). The concept of views allows to integrate parallelization techniques
with container access. All of the current work is merely some experiment to
find the right abstractions.

nod I think having those experiments live in the right namespace from the
start avoids future confusions and potential massive refactoring.

@gentryx
Copy link
Member

gentryx commented Mar 31, 2015

@eschnett That's why I was asking. The assumption that the compiler will efficiently vectorize the code should not be taken for granted. Compilers easily get confused, especially with boundary cases. In 3D there are 2^6 = 64 different boundary cases. All can be resolved at compile time, so that there is (next to) no runtime overhead, but it's complicated. Very subtle changes in the iterator can severely affect performance. Things get worse if you want to ensure efficiency across a wide range of compilers and hardware platforms.

It's all doable (I'm doing it with LibGeoDecomp), but I'd suggest to not reinvent the wheel if HPX library support for 3D stencils was planned. Then again @sithhell didn't sound like this was the overall goal.

@hkaiser
Copy link
Member Author

hkaiser commented Apr 4, 2015

The latest commit solves all issues mentioned above.

#include <hpx/include/iostreams.hpp>

#include <hpx/util/transform_iterator.hpp>
#include <hpx/util/stencil3_iterator.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an oversight, this file doesn't seem to exist anymore.

sithhell added a commit that referenced this pull request Apr 9, 2015
Adding stencil_iterator and transform_iterator
@sithhell sithhell merged commit 472bcca into master Apr 9, 2015
@sithhell sithhell deleted the transform_iterator branch April 9, 2015 06:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants