Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding prefetching_iterator and related tests used for prefetching containers within lambda functions #2179

Closed
wants to merge 34 commits into from

Conversation

zkhatami88
Copy link
Contributor

prefetching_iterator is created within loop.hpp to prefetch all containers within lambda functions.
for_each.hpp is changed as well to work with loop_n within loop.hpp to call prefetching_iterator.
foreach_prefetching.cpp, foreach_prefetching_executors.cpp and foreach_tests_prefetching.hpp are related tests created for testing the prefetching_iterator performance.

@sithhell
Copy link
Member

The files you added miss a newline at the end. This needs to get fixed.
In addition, would it be possible to have the prefetch stuff in a seperate header instead of having it all in the for_loop header? Could we reuse it for other algorithms as well?

for(std::size_t i=0; i<vector_size; ++i) range[i] = i + begin;
it_begin = range.begin();
it_end = range.begin() + vector_size - 1;
chunk_size = p_factor * 64ul / sizeof(T);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where does the 64 come from?

@zkhatami88
Copy link
Contributor Author

zkhatami88 commented May 25, 2016

sithhell : Honestly we are thinking about having prefetching as a policy parameter, so it can be used by other algorithms too. Also, current prefetching could be apply easily on transform algorithm too.

chunk_size used for prefetching here, is determined based on the cache line size. On x86 cache lines are 64 bytes. As far as I know, there is not any command to find out cache line sizes on different architectures. So, I used 64 as a default for determining chunk_size here.

Instead of using _mm_prefetch, reading the next element in the container can be used for prefetching next chunk_size, but it is more expensive than using _mm_prefetch. Do you have any suggestion?

According to the range index, first I used boost::irange, but I found it to have a high overhead which prevents scaling. So I changed it to vector, and it scales much better.

I will try using array_view instead of initializer_list.

@zkhatami88
Copy link
Contributor Author

Can anyone please tell me why my pull request fails?
It builds completely fine, but for "$ docker run -v $PWD:/hpx -w /hpx ${IMAGE_NAME} ./build/bin/inspect --all --output=./build/hpx_inspect_report.html /hpx", it gives "exit code 1".

@hkaiser
Copy link
Member

hkaiser commented May 26, 2016

@sithhell
Copy link
Member

sithhell commented May 26, 2016

Am 25.05.2016 3:27 nachm. schrieb "Zahra Khatami" <notifications@github.com

:

sithhell: Honestly we are thing about having prefetching as a policy parameter, so it can be used by other algorithms too. Also, current prefetching could be apply easily on transform algorithm too.

It would be really nice to have it in its own headers then.

chunk_size used for prefetching here, is determined based on the cache line size. On x86 cache lines are 64 bytes. As far as I know, there is not any command to find out cache line sizes on different architectures. So, I used 64 as a default for determining chunk_size here.

Turn it into a configurable (through cmake) define).

Instead of using _mm_prefetch, reading the next element in the container can be used for prefetching next chunk_size, but it is more expensive than using _mm_prefetch. Do you have any suggestion?

As said. This works only for x86. It will not work on other platforms.
Either ignore it for now or figure out the intrinsics for other platforms
in addition to the fallback you describe.

According to the rage index, first I used boost::irange, but I found it to have a high overhead which prevents scaling. So I changed it to vector, and it scales much better.

Having a vector sounds wasteful. That's all I'm saying.

I will try using array_view instead of initializer_list.


You are receiving this because you commented.

Reply to this email directly or view it on GitHub

@@ -71,11 +72,13 @@ namespace hpx { namespace parallel { HPX_INLINE_NAMESPACE(v1)
[f, proj](std::size_t /*part_index*/,
InIter part_begin, std::size_t part_size) mutable
{
typedef typename util::loop_n_iterator_mapping<
InIter>::type iterator_type;
Copy link
Member

@hkaiser hkaiser Jun 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be combined with the loop_n<> template.

@hkaiser
Copy link
Member

hkaiser commented Jun 9, 2016

@zkhatami88 I will have a look


namespace hpx { namespace parallel { namespace util
{
///////////////////////////////////////////////////////////////////////////
namespace detail
{
///////////////////////////////////////////////////////////////////////
// Helper class to repeatedly call a function starting from a given
// Helper class to repeatedly call a function starting from a givena
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a typo here.

@hkaiser
Copy link
Member

hkaiser commented Jun 14, 2016

@zkhatami88 FWIW, I'm working on fixing things here, feel free to chime in.

@hkaiser
Copy link
Member

hkaiser commented Jun 19, 2016

@zkhatami88 I'd be happy to merge this once the conflicts with our master branch are resolved.

@zkhatami88
Copy link
Contributor Author

@hkaiser sure, that's a good idea. Also, the test is failed, I am going to fix it too.

@sithhell
Copy link
Member

sithhell commented Jun 19, 2016 via email

@zkhatami88
Copy link
Contributor Author

@sithhell Thank you so much for your comments. I fixed typo in loop.hpp and in prefetching.hpp, sequencer is implemented to have iteration over pefetching containers in each time step instead of having recursion.

@@ -21,7 +21,7 @@
#include <hpx/parallel/algorithms/detail/is_negative.hpp>
#include <hpx/parallel/util/detail/algorithm_result.hpp>
#include <hpx/parallel/util/foreach_partitioner.hpp>
#include <hpx/parallel/util/loop.hpp>
#include <hpx/parallel/util/prefetching.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not correct. We should not pull in the prefetch.hpp header into the for_each.hpp. If a user wants to use prefetching he/she should explicitly include an appropriate header file. Also, for_each depends on loop.hpp, so why did you remove it?

@zkhatami88
Copy link
Contributor Author

zkhatami88 commented Jun 19, 2016

@hkaiser loop.hpp is included in the header in prefetching.hpp, so by having prefetching.hpp in the headers in for_each.hpp, loop.hpp will be included as well.
But my main reason was that in my own test, when I wanted to include prefetching.hpp separately in main.cpp file, there were so many compilation error. After including prefetching.hpp in for_each.hpp, the problem solved. Also, with this way, user should not care of header. He/She can use prefetching or not using prefetching with the same headers.

@hkaiser
Copy link
Member

hkaiser commented Jun 19, 2016

@zkhatami88 for_each depends on loop.hpp, but not on prefetching.hpp, thus it should include the appropriate header. Also, we should avoid relying on implicit #includes as much as possible. Every header file should include the minimal but complete set of headers it depends on. We do this to a) make every header self-contained (which is checked by CircleCI), and b) minimize the compile time for each of the headers (which unfortunately we can't check currently).

If including prefetching.hpp in user code leads to compilation errors, those have to be fixed separately, but most likely your user code missed other includes in the first place.

@zkhatami88
Copy link
Contributor Author

@hkaiser oh ok. I am going to fix it and I will push another commit.

@@ -223,22 +226,15 @@ namespace hpx { namespace parallel { HPX_INLINE_NAMESPACE(v1)
sequential(ExPolicy, InIter first, InIter last, F && f,
Proj && proj)
{
typedef typename util::detail::loop<InIter>::type it_type;

return util::loop(first, last,
[&f, &proj](Iter curr)
{
f(hpx::util::invoke(proj, *curr));
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkaiser :
Test fails because of this line, that:

/hpx/hpx/parallel/algorithms/for_each.hpp:231:24: error: no viable conversion from returned value of type hpx::local_raw_vector_iterator<int, _gnu_cxx::_normal_iterator<int *, std::vector<int, std::allocator<int> > > > to function return type hpx::local_vector_iterator<int>

In all lambdas in this file, hpx::util::invoke(f, hpx::util::invoke(proj, *curr)); is used except this line, shouldn't we change f(hpx::util::invoke(proj, *curr)); to hpx::util::invoke(f, hpx::util::invoke(proj, *curr)); as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a look

@hkaiser
Copy link
Member

hkaiser commented Jun 25, 2016

This is superseded by #2228

@hkaiser hkaiser closed this Jun 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants