Adding stencil_iterator and transform_iterator #1427

hkaiser · 2015-03-27T19:13:20Z

This pull request proposes to add two new iterators: transform_iterator and stencil3_iterator.

The transform_iterator is very similar to boost::transform_iterator, the main difference being that it allows the transformation to operate on the iterator itself, not only on the dereferenced value.

The stencil3_iterator is an example of how the new transform_iterator can be used to generate a stencil (3 elements wide) on the fly, which makes it possible to reuse the standard algorithms for problems relying on stencil computations:

std::vector<double> values = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
auto r = hpx::util::make_stencil3_range(
    values.begin(), values.end(), &values.back(), &values.front());

typedef typename std::iterator_traits<decltype(r.first)>::reference
    reference;

std::for_each(r.first, r.second, [](reference val)
{
    using hpx::util::get;
    hpx::cout
        << get<0>(val) << " "
        << get<1>(val) << " "
        << get<2>(val) << std::endl;
});

This will print:

sithhell · 2015-03-27T20:25:16Z

hpx/util/stencil3_iterator.hpp

+
+    template <typename StencilIterator, typename Iterator>
+    inline StencilIterator
+    make_stencil3_iterator(Iterator const& it)


What is the purpose of this function? Doesn't look like it is necessarily creating a stencil3 iterator.

The purpose of this function is to create a new stencil3_iterator from the given iterator (it) for the underlying range [begin, end) and a pair of iterators referring to the values to be used as the boundary elements (begin_val and end_val).

Wouldn't a return type of stencil3_iterator<Iterator, Iterator, Iterator, Iterator, Iterator> be better? That way the purpose of the function becomes clearer from its return type. Apart from the name, it is not clear what this function does; it could do anything.

No, that wouldn't work. This function is supposed to produce an end-iterator compatible with the given iterator type (sorry, I misinterpreted what function you were referring to).

sithhell · 2015-03-27T20:39:59Z

I like the general idea of this, however I think the stencil3_iterator has a potential performance problem: dereferencing it involves 3 comparison operations. Compared to an equivalent "naive" implementation this sounds like quite a performance hit.
In addition, I am not sure if hpx::util is the proper namespace for that. As naming is hard, I don't have a better suggestion ...

sithhell · 2015-03-27T20:46:26Z

Unit tests and documentation would be nice too ;)

hkaiser · 2015-03-27T20:48:05Z

Yes, the performance has to be properly tested. I'll go and add a corresponding test. I'm not too concerned about the dereferencing (this has to be done in any case, even for the 'naive' case), but I agree that the comparison operation (there are only two of those, not three) might impact the performance. Measurements will show...

sithhell · 2015-03-27T20:59:16Z

Of course ... two comparisons per dereference, and the three in each loop have already been optimized. Sure, dereferencing per se is not the problem, the comparison hidden inside probably is.

sithhell · 2015-03-28T11:58:10Z

> Yes, the performance has to be properly tested. I'll go and add a
> corresponding test. I'm not too concerned about the dereferencing (this has
> to be done in any case, even for the 'naive' case), but I agree that the
> comparison operation (there are only two of those, not three) might impact
> the performance. Measurements will show...

I guess one possible way to avoid those unnecessary comparisons is that the
stencil iterator only deals with the interior region and the boundary
should be handled separately, at least that's what the current best
practices suggest.

hkaiser · 2015-03-28T12:32:18Z

> I guess one possible way to avoid those unnecessary comparisons is that the
> stencil iterator only deals with the interior region and the boundary
> should be handled separately, at least that's what the current best
> practices suggest.

I'll do some measurements.

hkaiser · 2015-03-28T22:50:40Z

The commit above (5e372cd) adds versions of both iterators which require the user to handle the boundary cases explicitly. It also adds tests.

eschnett · 2015-03-29T19:48:52Z

I want to suggest a slight generalization of this API. This generalization is probably not relevant for shared memory, but improves efficiency in many cases when using distributed memory.

The use case is that the iterator is not used for finite differencing, but for finite volumes or DG finite elements. In this case, each element of the data structure contains not just one value, but many -- for example, it could hold five coefficients of a Chebyshev expansion.

What is needed when calculating fluxes for one element is usually not the full information of the neighbouring elements, but only their boundary data. Thus I want to suggest to expand the API to allow for an additional transformation function to be called before creating the tuple that represents (prev, elem, next). Let's call the function b (for "extract boundary value"); the tuple should then contain (b(prev), elem, b(next)).

The efficiency gain comes from evaluating b on the remote locality where the respective element is already processed, and then transferring less data.

hkaiser · 2015-03-29T20:25:00Z

@eschnett: Thanks for those suggestions. I added the ability to supply your own transformer. For an example, see: https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/unit/util/stencil3_iterator.cpp#L40.

hkaiser · 2015-03-29T20:27:23Z

The implementation of hpx::util::stencil3_iterator is now almost as efficient as the plain explicit implementation of a 3-wide 1d stencil (see https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/performance/local/stencil3_iterators.cpp for some timing results). I think we can go ahead and merge this pull request now.

eschnett · 2015-03-29T20:33:41Z

That was quick. Thanks!

sithhell · 2015-03-30T06:20:27Z

I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should stop polluting this namespace unnecessarily, especially since this iterator is a very special case and more of an example of how to use the transformer functionality.

hkaiser · 2015-03-30T13:10:59Z

> I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should
> stop polluting this namespace unnecessarily, especially since this iterator is a very special case and
> more of an example of how to use the transformer functionality.

What do you suggest as an alternative?

eschnett · 2015-03-30T14:09:44Z

In my mind, a stencil3 iterator is a special case of a more generic operation that lives on the same footing as each of map, reduce, or mapreduce. It should be treated as such. If you modify the name to stencil_nearest_neighbour, then it is clear that this special case is rather important in practice and deserves to be part of a standard library. The namespace choices are thus either hpx::experimental or hpx::example, with the goal of putting it directly into hpx.

gentryx · 2015-03-31T04:36:51Z

Are there any plans to extend this beyond a 1D stencil? If so: what's the overhead (compared to an optimized implementation) you'd be able to accept?

sithhell · 2015-03-31T08:36:45Z

My suggestion would be to provide a generic iterator utility module, probably living under hpx::iterators. I don't think it is wise for us to include something like stencil or nearest neighbor iterators with HPX itself; this is clearly out of scope for HPX. As such, I think the stencil3_iterator should be merely an example while the other iterator utilities should live under hpx::iterators. Does this make sense?

hkaiser · 2015-03-31T13:01:38Z

@sithhell: I agree. My plan is to build a set of tools that allow views to be built efficiently (stencil3_iterator is just one example of doing this). The concept of views allows parallelization techniques to be integrated with container access. All of the current work is merely an experiment to find the right abstractions.

eschnett · 2015-03-31T13:33:16Z

@gentryx: Overhead? Why should there be an overhead? I expect the actual stencil operation to be cheap (otherwise overhead would not be an issue). Thus the stencil operation and the iterator calls should be inlined into the enclosing loop, which should then be SIMD-vectorized. I would not expect HPX to stand in the way of this.

In a multi-dimensional loop, one probably needs to employ loop blocking. I don't expect the compiler to make a good choice; the blocking size would be chosen by the user either with the loop or with the iterator's container. In this case, I expect the innermost loop to be inlined as described above, and the outermost loop to schedule each innermost loop as a separate thread. I don't know whether HPX's containers or iterators can do this already, but that should be the goal. In the end, it should run as efficiently as an OpenMP construct if the loop is perfectly regular, and faster if HPX can exploit any dynamic differences.

sithhell · 2015-03-31T13:42:19Z

@eschnett: That's exactly the goal of this exercise! Nicely put!

sithhell · 2015-03-31T14:27:49Z

On 31.03.2015 15:02, "Hartmut Kaiser" notifications@github.com wrote:

> @sithhell: I agree. My plan is to build a set of tools that allow views to
> be built efficiently (stencil3_iterator is just one example of doing
> this). The concept of views allows parallelization techniques to be
> integrated with container access. All of the current work is merely an
> experiment to find the right abstractions.

Nod. I think having those experiments live in the right namespace from the
start avoids future confusion and a potentially massive refactoring.

gentryx · 2015-03-31T15:02:17Z

@eschnett That's why I was asking. The assumption that the compiler will efficiently vectorize the code should not be taken for granted. Compilers easily get confused, especially with boundary cases. In 3D there are 2^6 = 64 different boundary cases. All can be resolved at compile time, so that there is (next to) no runtime overhead, but it's complicated. Very subtle changes in the iterator can severely affect performance. Things get worse if you want to ensure efficiency across a wide range of compilers and hardware platforms.

It's all doable (I'm doing it with LibGeoDecomp), but I'd suggest not reinventing the wheel if HPX library support for 3D stencils is planned. Then again, @sithhell didn't sound like this was the overall goal.

hkaiser · 2015-04-04T17:38:36Z

The latest commit solves all issues mentioned above.

sithhell · 2015-04-06T21:16:15Z

examples/quickstart/vector_stencil_operations.cpp

+#include <hpx/include/iostreams.hpp>
+
+#include <hpx/util/transform_iterator.hpp>
+#include <hpx/util/stencil3_iterator.hpp>


Looks like an oversight, this file doesn't seem to exist anymore.

hkaiser added 4 commits March 22, 2015 13:30

Snapshot of work on transform_iterator

f714a52

Removing unnecessary support for segmented iterators

0e327f4

Adding stencil3_iterator

f68e9e3

Merge branch 'master' into transform_iterator

bc8b07f

hkaiser added category: algorithms affecting: LSU category: core labels Mar 27, 2015

hkaiser added this to the 0.9.11 milestone Mar 27, 2015

hkaiser added the type: enhancement label Mar 27, 2015

sithhell reviewed Mar 27, 2015
Adding versions of transform_iterator and stencil_iterator for which the user has to handle the boundary cases explicitly. Adding full set of tests.

5e372cd

hkaiser added 3 commits March 28, 2015 18:00

Relaxing iterator category limitation for the base iterators used for transform_iterator and stencil3_iterator

6bae083

Simplify stencil3_iterator_nocheck

87de47c

Adding various performance tests

fc31a1e

hkaiser added 2 commits March 29, 2015 15:00

Using most efficient implementation as the default inside HPX

1b28473

Adding ability to customize transformer used in the stencil3_iterator

5a35a62

Remove stencil3_iterator header, moving it to benchmark application

19b9da7

sithhell reviewed Apr 6, 2015
Adapted code for deleted header file

a46c731

sithhell added a commit that referenced this pull request Apr 9, 2015

Merge pull request #1427 from STEllAR-GROUP/transform_iterator

472bcca

Adding stencil_iterator and transform_iterator

sithhell merged commit 472bcca into master Apr 9, 2015

sithhell deleted the transform_iterator branch April 9, 2015 06:08
