Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding stencil_iterator and transform_iterator #1427

Merged
merged 12 commits into from Apr 9, 2015

Conversation

Projects
None yet
4 participants
@hkaiser
Copy link
Member

commented Mar 27, 2015

This pull request proposes to add two new iterators: transform_iterator and stencil3_iterator.

The transform_iterator is very similar to boost::transform_iterator with the main difference of allowing to modify the iterator itself instead of the dereferenced value only.

The stencil3_iterator is an example of how the new transform_iterator can be used to generate a stencil (3 elements wide) on the fly, which allows to reuse the standard algorithms for problems relying on stencil computations:

std::vector<double> values = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
auto r = hpx::util::make_stencil3_range(
    values.begin(), values.end(), &values.back(), &values.front());

typedef typename std::iterator_traits<decltype(r.first)>::reference
    reference;

std::for_each(r.first, r.second, [](reference val)
{
    using hpx::util::get;
    hpx::cout
        << get<0>(val) << " "
        << get<1>(val) << " "
        << get<2>(val) << std::endl;
});

This will print:

9 0 1
0 1 2
...
7 8 9
8 9 0

template <typename StencilIterator, typename Iterator>
inline StencilIterator
make_stencil3_iterator(Iterator const& it)

This comment has been minimized.

Copy link
@sithhell

sithhell Mar 27, 2015

Member

What is the purpose of this function? Doesn't look like it is necessarily creating a stencil3 iterator.

This comment has been minimized.

Copy link
@hkaiser

hkaiser Mar 27, 2015

Author Member

The purpose of this function is to create a new stencil3_iterator from the given iterator (it) for the underlying range [begin, end) and a pair of iterators referring to the values to be used as the boundary elements (begin_val and end_val).

This comment has been minimized.

Copy link
@sithhell

sithhell Mar 27, 2015

Member

wouldn't a return type of stencil3_iterator<Iterator, Iterator, Iterator, Iterator, Iterator> be better? That way the purpose of that function gets clearer by it's return type. Apart from the name it is not clear what this function does, this function could do anything.

This comment has been minimized.

Copy link
@hkaiser

hkaiser Mar 27, 2015

Author Member

No, that wouldn't work. This function is supposed to produce an end-iterator compatible with the given iterator type (sorry, I misinterpreted what function you were referring to).

typedef typename std::iterator_traits<decltype(r.first)>::reference
reference;

std::for_each(r.first, r.second, [](reference val)

This comment has been minimized.

Copy link
@sithhell

sithhell Mar 27, 2015

Member

Minor nitpick: wouldn't std::for_each(std::begin(r), std::end(r), [](reference val) be better here?

This comment has been minimized.

Copy link
@hkaiser

hkaiser Mar 27, 2015

Author Member

Good point. Will fix.

This comment has been minimized.

Copy link
@hkaiser

hkaiser Mar 28, 2015

Author Member

Actually, decltype(r) === std::pair<Iter, Iter> which prevents us from using std::begin() or std::end()

This comment has been minimized.

Copy link
@sithhell

sithhell Mar 30, 2015

Member

I don't like that. This should really have first class range support. In this particular case, it should be easy to write:

for(auto const & val : r)
{
    using hpx::util::get;
    hpx::cout
         << get<0>(val) << " "
         << get<1>(val) << " "
         << get<2>(val) << std::endl;
}
};
};
};
}

This comment has been minimized.

Copy link
@sithhell

sithhell Mar 27, 2015

Member

What is the purpose of those?

This comment has been minimized.

Copy link
@hkaiser

hkaiser Mar 27, 2015

Author Member

Those function objects have been moved from the zip_iterator file to here. Initially I was planning to enable the stencil iterator to be usable in distributed contexts (for which I needed those function objects). Their purpose is to lift the transform all iterators wrapped by a zip_iterator in order to create a new zip_iterator wrapping the transformed iterators (see for instance: https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/hpx/util/zip_iterator.hpp#L579).

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 27, 2015

I like the general idea of this, however I think it the stencil3_iterator has a potential performance problem: Dereferencing it involves 3 comparision operations. Compared to an equivalent "naive" implementation this sounds like quite some performance hit.
In addition, I am not sure if hpx::util Is the proper namespace for that, as naming is hard, I don't have a better suggestion ...

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 27, 2015

Unit tests and documentation would be nice too ;)

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 27, 2015

Yes, the performance has to be properly tested. I'll go and add a corresponding test. I'm not too concerned about the dereferencing (this has to be done in any case, even for the 'naive' case), but I agree that the comparison operation (there are only two of those, not three) might impact the performance. Measurements will show...

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 27, 2015

Of course ... two comparisions per dereference and the three in each loop have already been optimized. Sure, dereferencing per se is not the problem, the comparison hidden inside probably is.

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 28, 2015

Yes, the performance has to be properly tested. I'll go and add a
corresponding test. I'm not too concerned about the dereferencing (this has
to be done in any case, even for the 'naive' case), but I agree that the
comparison operation (there are only two of those, not three) might impact
the performance. Measurements will show...

I guess one possible way to avoid those unnecessary comparisons is that the
stencil iterator only deals with the interior region and the boundary
should be handled separately, at least that's what the current best
practices suggest.

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2015

I guess one possible way to avoid those unnecessary comparisons is that the
stencil iterator only deals with the interior region and the boundary
should be handled separately, at least that's what the current best
practices suggest.

I'll do some measurements.

Adding versions of transform_iterator and stencil_iterator for which …
…the user has to handle the boundary cases explicitly. Adding full set of tests.
@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2015

The commit above (5e372cd) adds versions for both iterators which require to handle the boundary cases explicitly. It also adds tests.

@eschnett

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2015

I want to suggest a slight generalization of this API. This generalization is probably not relevant for shared memory, but improves efficiency in many cases when using distributed memory.

The use case is that the iterator is not used for finite differencing, but for finite volumes or DG finite elements. In this case, each element of the data structure contains not just one value, but many -- for example, it could hold five coefficients of a Chebyshev expansion.

What is needed when calculating fluxes for one element is usually not the full information of the neighbouring elements, but only their boundary data. Thus I want to suggest to expand the API to allow for an additional transformation function to be called before creating the tuple that represents (prev, elem, next). Let's call the function b (for "extract boundary value"); the tuple should then contain (b(prev), elem, b(next)).

The efficiency gain comes from evaluating b on the remote locality where the respective element is already processed, and then transferring less data.

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2015

@eschnett: Thanks for those suggestions. I added the ability to supply your own transformer. For an example, see: https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/unit/util/stencil3_iterator.cpp#L40.

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2015

The implementation of hpx::util::stencil3_iterator is now almost as efficient as the plain explicit implementation of a 3-wide 1d stencil (see https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/performance/local/stencil3_iterators.cpp for some timing results). I think we can go ahead and merge this pull request now.

@eschnett

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2015

That was quick. Thanks!

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 30, 2015

I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should stop polluting this namespace unnecessarily, especially since this iterator is a very special case and more of an example of how to use the transfomer functionality.

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 30, 2015

I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should
stop polluting this namespace unnecessarily, especially since this iterator is a very special case and
more of an example of how to use the transfomer functionality.

What do you suggest as an alternative?

@eschnett

This comment has been minimized.

Copy link
Contributor

commented Mar 30, 2015

In my mind, a stencil3 iterator is a special case of a more generic operation that lives on the same footing as each of map, reduce, or mapreduce. It should be treated as such. If you modify the name to stencil_nearest_neighbour, then it is clear that this special case is rather important in practice and deserves to be part of a standard library. The namespace choices are thus either hpx::experimental or hpx::example, with the goal of putting it directly into hpx.

@gentryx

This comment has been minimized.

Copy link
Member

commented Mar 31, 2015

Are there any plans to extend this beyond a 1D stencil? If so: what's the overhead (compared to an optimized implementation) you'd be able to accept?

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 31, 2015

My suggestion would be to provide a generic iterator utility module. Probably living under hpx::iterators. I don't think it is wise for us to include something like stencil or nearest neighbor iterators with HPX itself, this clearly is out of scope for HPX itself. As such, I think the stencil3_iterator should be merely an example while the other iterator utilities should live under hpx::iterators. Does this make sense?

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Mar 31, 2015

@sithhell: I agree. My plan is to build a set of tools which allow to efficiently build views (stencil3_iterator is just one example of doing this). The concept of views allows to integrate parallelization techniques with container access. All of the current work is merely some experiment to find the right abstractions.

@eschnett

This comment has been minimized.

Copy link
Contributor

commented Mar 31, 2015

@gentryx: Overhead? Why should there be an overhead? I expect the actual stencil operation to be cheap (otherwise overhead would not be an issue). Thus the stencil operation and the iterator calls should be inlined into the enclosing loop, which should then be SIMD-vectorized. I would not expect HPX to stand in the way of this.

In a multi-dimensional loop, one probably needs to employ loop blocking. I don't expect the compiler to make a good choice; the blocking size would be chosen by the user either with the loop or with the iterator's container. In this case, I expect the innermost loop to be inlined as described above, and the outermost loop to schedule each innermost loop as a separate thread. I don't know whether HPX's containers or iterators can do this already, but that should be the goal. In the end, it should run as efficiently as an OpenMP construct if the loop is perfectly regular, and faster if HPX can exploit any dynamic differences.

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 31, 2015

@eschnett: That's exactly the goal of this excercise! Nicely put!

@sithhell

This comment has been minimized.

Copy link
Member

commented Mar 31, 2015

Am 31.03.2015 15:02 schrieb "Hartmut Kaiser" notifications@github.com:

@sithhell: I agree. My plan is to build a set of tools which allow to
efficiently build views (stencil3_iterator is just one example of doing
this). The concept of views allows to integrate parallelization techniques
with container access. All of the current work is merely some experiment to
find the right abstractions.

nod I think having those experiments live in the right namespace from the
start avoids future confusions and potential massive refactoring.

@gentryx

This comment has been minimized.

Copy link
Member

commented Mar 31, 2015

@eschnett That's why I was asking. The assumption that the compiler will efficiently vectorize the code should not be taken for granted. Compilers easily get confused, especially with boundary cases. In 3D there are 2^6 = 64 different boundary cases. All can be resolved at compile time, so that there is (next to) no runtime overhead, but it's complicated. Very subtle changes in the iterator can severely affect performance. Things get worse if you want to ensure efficiency across a wide range of compilers and hardware platforms.

It's all doable (I'm doing it with LibGeoDecomp), but I'd suggest to not reinvent the wheel if HPX library support for 3D stencils was planned. Then again @sithhell didn't sound like this was the overall goal.

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Apr 4, 2015

The latest commit solves all issues mentioned above.

#include <hpx/include/iostreams.hpp>

#include <hpx/util/transform_iterator.hpp>
#include <hpx/util/stencil3_iterator.hpp>

This comment has been minimized.

Copy link
@sithhell

sithhell Apr 6, 2015

Member

Looks like an oversight, this file doesn't seem to exist anymore.

#include <hpx/util/lightweight_test.hpp>

#include <hpx/util/transform_iterator.hpp>
#include <hpx/util/stencil3_iterator.hpp>

This comment has been minimized.

Copy link
@sithhell

sithhell Apr 6, 2015

Member

Looks like an oversight, this file doesn't seem to exist anymore.

#include <hpx/util/lightweight_test.hpp>

#include <hpx/util/transform_iterator.hpp>
#include <hpx/util/stencil3_iterator.hpp>

This comment has been minimized.

Copy link
@sithhell

sithhell Apr 6, 2015

Member

Looks like an oversight, this file doesn't seem to exist anymore.

sithhell added a commit that referenced this pull request Apr 9, 2015

Merge pull request #1427 from STEllAR-GROUP/transform_iterator
Adding stencil_iterator and transform_iterator

@sithhell sithhell merged commit 472bcca into master Apr 9, 2015

1 check failed

ci/circleci Your tests failed
Details

@sithhell sithhell deleted the transform_iterator branch Apr 9, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.