Adding stencil_iterator and transform_iterator #1427
template <typename StencilIterator, typename Iterator>
inline StencilIterator
make_stencil3_iterator(Iterator const& it)
What is the purpose of this function? Doesn't look like it is necessarily creating a stencil3 iterator.
The purpose of this function is to create a new stencil3_iterator from the given iterator (it) for the underlying range [begin, end) and a pair of iterators referring to the values to be used as the boundary elements (begin_val and end_val).
Wouldn't a return type of stencil3_iterator<Iterator, Iterator, Iterator, Iterator, Iterator> be better? That way the purpose of the function becomes clearer from its return type. Apart from the name, it is not clear what this function does; it could do anything.
No, that wouldn't work. This function is supposed to produce an end-iterator compatible with the given iterator type (sorry, I misinterpreted what function you were referring to).
I like the general idea of this, however I think the stencil3_iterator has a potential performance problem: dereferencing it involves 3 comparison operations. Compared to an equivalent "naive" implementation this sounds like quite a performance hit.
Unit tests and documentation would be nice too ;)
Yes, the performance has to be properly tested. I'll go and add a corresponding test. I'm not too concerned about the dereferencing (this has to be done in any case, even for the 'naive' case), but I agree that the comparison operations (there are only two of those, not three) might impact the performance. Measurements will show...
Of course ... two comparisons per dereference and the three in each loop have already been optimized. Sure, dereferencing per se is not the problem, the comparison hidden inside probably is.
I guess one possible way to avoid those unnecessary comparisons is that the |
I'll do some measurements. |
…the user has to handle the boundary cases explicitly. Adding full set of tests.
The commit above (5e372cd) adds versions of both iterators which require the user to handle the boundary cases explicitly. It also adds tests.
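To make the trade-off discussed above concrete, here is a minimal sketch in plain standard C++. It is not the code from this pull request; `sum3_checked` and `sum3_explicit` are hypothetical names chosen for illustration. The first variant selects the boundary values with two comparisons per element, the second handles the first and last element outside the loop so that the interior loop is branch-free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): 3-point sums with the boundary
// check inside the loop. Every element pays for two comparisons.
std::vector<int> sum3_checked(std::vector<int> const& v, int left, int right)
{
    std::size_t const n = v.size();
    std::vector<int> out(n);
    for (std::size_t i = 0; i != n; ++i)
    {
        int const l = (i == 0) ? left : v[i - 1];       // comparison 1
        int const r = (i + 1 == n) ? right : v[i + 1];  // comparison 2
        out[i] = l + v[i] + r;
    }
    return out;
}

// Hypothetical helper (illustration only): same result, but the caller-side
// code treats the two boundary elements explicitly.
std::vector<int> sum3_explicit(std::vector<int> const& v, int left, int right)
{
    std::size_t const n = v.size();
    std::vector<int> out(n);
    if (n == 0)
        return out;
    if (n == 1)
    {
        out[0] = left + v[0] + right;
        return out;
    }
    out[0] = left + v[0] + v[1];                 // left boundary
    for (std::size_t i = 1; i + 1 != n; ++i)     // interior, no branches
        out[i] = v[i - 1] + v[i] + v[i + 1];
    out[n - 1] = v[n - 2] + v[n - 1] + right;    // right boundary
    return out;
}
```

Both functions produce the same output for any input; whether the second is measurably faster depends on the compiler and has to be benchmarked, as noted in the discussion.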
… transform_iterator and stencil3_iterator
I want to suggest a slight generalization of this API. This generalization is probably not relevant for shared memory, but improves efficiency in many cases when using distributed memory. The use case is that the iterator is not used for finite differencing, but for finite volumes or DG finite elements. In this case, each element of the data structure contains not just one value, but many -- for example, it could hold five coefficients of a Chebyshev expansion. What is needed when calculating fluxes for one element is usually not the full information of the neighbouring elements, but only their boundary data. Thus I want to suggest to expand the API to allow for an additional transformation function to be called before creating the tuple that represents The efficiency gain comes from evaluating |
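As a rough illustration of the suggestion above, the following standard C++ sketch projects each neighbour onto its boundary data before the stencil tuple is formed, so only two scalars (rather than two full elements) accompany the centre element. Everything here is an assumption made for illustration: `Element`, `left_face`, `right_face` and `make_stencil` are hypothetical names, and taking the first or last coefficient is only a stand-in for evaluating a real expansion at the element faces. It does not show the API added in this pull request.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical element type: five coefficients per cell.
struct Element
{
    std::array<double, 5> coeff;
};

// Hypothetical projections onto boundary data. Real code would evaluate the
// expansion at the faces; picking one coefficient is a placeholder.
inline double left_face(Element const& e) { return e.coeff.front(); }
inline double right_face(Element const& e) { return e.coeff.back(); }

// Builds the stencil tuple for an interior cell i
// (requires 0 < i and i + 1 < cells.size()).
// The neighbours are transformed before they enter the tuple.
inline std::tuple<double, Element const&, double>
make_stencil(std::vector<Element> const& cells, std::size_t i)
{
    return std::tuple<double, Element const&, double>(
        right_face(cells[i - 1]),   // boundary data of the left neighbour
        cells[i],                   // full centre element, by reference
        left_face(cells[i + 1]));   // boundary data of the right neighbour
}
```

In a distributed setting the point of such a projection would be that only the transformed values need to be communicated, not the complete neighbouring elements.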
@eschnett: Thanks for those suggestions. I added the ability to supply your own transformer. For an example, see: https://github.com/STEllAR-GROUP/hpx/blob/transform_iterator/tests/unit/util/stencil3_iterator.cpp#L40. |
The implementation of |
That was quick. Thanks! |
I am still not particularly happy that the stencil3 iterator lives in the util namespace. I think we should stop polluting this namespace unnecessarily, especially since this iterator is a very special case and more of an example of how to use the transformer functionality. |
What do you suggest as an alternative? |
In my mind, a |
Are there any plans to extend this beyond a 1D stencil? If so: what's the overhead (compared to an optimized implementation) you'd be able to accept? |
My suggestion would be to provide a generic iterator utility module. Probably living under hpx::iterators. I don't think it is wise for us to include something like stencil or nearest neighbor iterators with HPX itself, this clearly is out of scope for HPX itself. As such, I think the stencil3_iterator should be merely an example while the other iterator utilities should live under hpx::iterators. Does this make sense? |
@sithhell: I agree. My plan is to build a set of tools which make it possible to build views efficiently (stencil3_iterator is just one example of doing this). The concept of views makes it possible to integrate parallelization techniques with container access. All of the current work is merely an experiment to find the right abstractions. |
@gentryx: Overhead? Why should there be an overhead? I expect the actual stencil operation to be cheap (otherwise overhead would not be an issue). Thus the stencil operation and the iterator calls should be inlined into the enclosing loop, which should then be SIMD-vectorized. I would not expect HPX to stand in the way of this. In a multi-dimensional loop, one probably needs to employ loop blocking. I don't expect the compiler to make a good choice; the blocking size would be chosen by the user either with the loop or with the iterator's container. In this case, I expect the innermost loop to be inlined as described above, and the outermost loop to schedule each innermost loop as a separate thread. I don't know whether HPX's containers or iterators can do this already, but that should be the goal. In the end, it should run as efficiently as an OpenMP construct if the loop is perfectly regular, and faster if HPX can exploit any dynamic differences. |
@eschnett: That's exactly the goal of this exercise! Nicely put! |
On 31.03.2015 at 15:02, "Hartmut Kaiser" notifications@github.com wrote:
nod I think having those experiments live in the right namespace from the |
@eschnett That's why I was asking. The assumption that the compiler will efficiently vectorize the code should not be taken for granted. Compilers easily get confused, especially with boundary cases. In 3D there are 2^6 = 64 different boundary cases. All can be resolved at compile time, so that there is (next to) no runtime overhead, but it's complicated. Very subtle changes in the iterator can severely affect performance. Things get worse if you want to ensure efficiency across a wide range of compilers and hardware platforms. It's all doable (I'm doing it with LibGeoDecomp), but I'd suggest to not reinvent the wheel if HPX library support for 3D stencils was planned. Then again @sithhell didn't sound like this was the overall goal. |
The latest commit solves all issues mentioned above. |
#include <hpx/include/iostreams.hpp>

#include <hpx/util/transform_iterator.hpp>
#include <hpx/util/stencil3_iterator.hpp>
Looks like an oversight, this file doesn't seem to exist anymore.
Adding stencil_iterator and transform_iterator
This pull request proposes to add two new iterators: transform_iterator and stencil3_iterator.
The transform_iterator is very similar to boost::transform_iterator, with the main difference being that it allows modifying the iterator itself instead of only the dereferenced value.
The stencil3_iterator is an example of how the new transform_iterator can be used to generate a stencil (3 elements wide) on the fly, which makes it possible to reuse the standard algorithms for problems relying on stencil computations:
This will print: