
Implement parallel::partition. #2778

Merged
merged 15 commits into master Aug 8, 2017

Conversation

4 participants
@taeguk
Member

taeguk commented Jul 21, 2017

This is related to #1141

Checklist

  • Implementation of parallel::partition.
  • Unit tests.
  • Benchmark code.
  • Determine the block size for sub-partitioning.
  • Refactor parallel::partition.
  • Fix inspection errors and compile errors in GCC.
  • Adapt to the Ranges TS.
  • Add more comments.

- [ ] Enable the user to control the block size. (see #2803)

Issues

  1. How do we determine the 'block size' used for sub-partitioning in the main parallel phase? => For now, I set the block size to 20000 based on my experiments and the paper I referred to. It may not be the ideal value; in the future we may find a block size that gives better performance.

Notes for future

  1. Why is the benchmark with the random access iterator tag somewhat slower than the benchmarks with the bidirectional or forward tags?
  2. The block_manager for the random access iterator tag is not very useful.

@hkaiser hkaiser added this to the 1.1.0 milestone Jul 21, 2017

@hkaiser hkaiser referenced this pull request Jul 21, 2017

Open

Implement N4409 on top of HPX #1141

40 of 41 tasks complete

@hkaiser hkaiser added this to Work in progress in Standard Algorithms Jul 21, 2017

taeguk added some commits Jul 25, 2017

Use our own implementation of swap_ranges instead of std::swap_ranges to ensure the way swap_ranges works in parallel::partition.
Refactor parallel::partition.
1. Remove a meaningless unnamed namespace.
2. Use HPX_CONCEPT_REQUIRES_ instead of explicit std::enable_if.
3. Put the code for the parallel partition into partition_helper.
@biddisco
Contributor

biddisco commented Jul 26, 2017

When working on various scan-based algorithms (and also parallel::sort), I came to the conclusion that it is a difficult problem to decide what the 'best' chunk/block size would be without first benchmarking the algorithm with a few test sizes.

My own view is that we should have a CMake-integrated set of tests for certain of the parallel algorithms (and others if desired) that runs those algorithms and generates a config.h file in the build directory, which is then picked up during the main build of HPX.

The idea would be that a user downloads HPX, builds it, and gets 'default' values for the block sizes etc., but if they run a helper application that is also compiled - with a name like hpx_platform_optimization(.exe) - then the algorithms will be run with some adaptive checking that detects the optimal (or at least a reasonable) value for certain params like block size, and dumps these out to the config file. The user can then recompile HPX and have some faith that it has been tweaked a bit to work well on that machine.
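A minimal sketch of the helper application biddisco describes: time a few candidate block sizes and dump the winner into a config header. Everything here is hypothetical, not an existing HPX API - benchmark_partition_ns, the file name hpx_platform_config.h, and the macro HPX_PARTITION_BLOCK_SIZE are invented for illustration, and std::partition stands in for the real parallel algorithm.

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <fstream>
    #include <numeric>
    #include <vector>

    // Hypothetical stand-in: run one partition pass and return its duration.
    // The real helper would run parallel::partition configured with block_size.
    long long benchmark_partition_ns(std::size_t block_size)
    {
        (void) block_size;    // the real helper would forward this value
        std::vector<int> data(1 << 24);
        std::iota(data.begin(), data.end(), 0);
        auto t0 = std::chrono::steady_clock::now();
        std::partition(data.begin(), data.end(),
            [](int x) { return x % 2 == 0; });
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<
            std::chrono::nanoseconds>(t1 - t0).count();
    }

    int main()
    {
        std::size_t best_size = 0;
        long long best_ns = 0;
        for (std::size_t size : {std::size_t(5000), std::size_t(10000),
                 std::size_t(20000), std::size_t(40000), std::size_t(80000)})
        {
            long long ns = benchmark_partition_ns(size);
            if (best_size == 0 || ns < best_ns)
            {
                best_ns = ns;
                best_size = size;
            }
        }
        // Dump the chosen value into a header the main build could pick up.
        std::ofstream out("hpx_platform_config.h");
        out << "#define HPX_PARTITION_BLOCK_SIZE " << best_size << "\n";
    }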

@hkaiser
Member

hkaiser commented on hpx/parallel/algorithms/partition.hpp in d853b1a Jul 26, 2017

This implementation does not allow the input ranges to overlap either. What is your use case?

@taeguk
Member

taeguk replied Jul 26, 2017

@hkaiser
My implementation is just a general implementation of std::swap_ranges; it does not allow overlapping ranges either. But it can be useful in specific situations and for specific objectives.

If dest precedes first, the range [first, last) can be successfully moved to the range [dest, dest + distance(first, last)). Otherwise, the range [first, last) cannot be moved to the range [dest, dest + distance(first, last)).

I want to use swap_ranges for the first case (dest precedes first). But the C++ standard doesn't guarantee how std::swap_ranges is implemented, so I implemented swap_ranges myself to guarantee that behavior.
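The property taeguk relies on only requires that the swaps proceed front-to-back. A minimal sketch of such a guaranteed-order swap_ranges (an illustration, not the exact helper added in this PR):

    #include <algorithm>

    // Swaps [first, last) with [dest, dest + distance(first, last)),
    // strictly front-to-back. Because each position is swapped before any
    // later position is read, the result is well-defined when dest < first,
    // even if the two ranges overlap.
    template <typename FwdIter1, typename FwdIter2>
    FwdIter2 sequential_swap_ranges(FwdIter1 first, FwdIter1 last, FwdIter2 dest)
    {
        for (/**/; first != last; ++first, ++dest)
            std::iter_swap(first, dest);
        return dest;
    }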

@hkaiser
Member

hkaiser replied Jul 26, 2017

Even if dest < first but dest + size > first, your algorithm will fail.

@taeguk
Member

taeguk replied Jul 26, 2017

@hkaiser No. See the example below.

Progress

1 1 2 2 2 3 3 // START
2 1 1 2 2 3 3 // after iteration 1
2 2 1 1 2 3 3 // after iteration 2
2 2 2 1 1 3 3 // after iteration 3
2 2 2 3 1 1 3 // after iteration 4
2 2 2 3 3 1 1 // after iteration 5, FINISH

[first, last) can be successfully moved to [dest, dest + size) when dest < first.
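The trace above can be reproduced directly. A small self-contained check, using the same front-to-back swap loop sketched earlier, with dest = begin() and first = begin() + 2 (overlapping ranges, dest < first):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<int> v = {1, 1, 2, 2, 2, 3, 3};
        auto dest = v.begin();                        // dest precedes first
        for (auto first = v.begin() + 2; first != v.end(); ++first, ++dest)
            std::iter_swap(first, dest);              // front-to-back swaps
        for (int x : v)
            std::cout << x << ' ';                    // prints: 2 2 2 3 3 1 1
    }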

@hkaiser
Member

hkaiser replied Jul 26, 2017

Sure, except that you're not moving things in one direction, you're swapping the values.

@hkaiser
Member

hkaiser replied Jul 26, 2017

Ok, I see now what you mean. All is well - sorry for the noise.

@hkaiser
Member

hkaiser commented on aae9fb6 Jul 26, 2017

We should definitely take the traits into account. This is a means for the user to influence things; I wouldn't like for that to no longer be possible.

@taeguk
Member

taeguk replied Jul 26, 2017

@hkaiser But the problem is that the max_chunks returned by traits::maximal_number_of_chunks(...) is very big, and it decreases performance very much.

In this case, the chunk size from the existing traits is unsuitable for use as the block size. In my opinion, the block size should not be influenced by the data count.

@hkaiser
Member

hkaiser replied Jul 26, 2017

In the end we need to have a means for the user to control the chunk sizes used. How do you suggest we do that?

@taeguk
Member

taeguk replied Jul 26, 2017

@hkaiser
One idea is to add something new to the executor parameters traits (struct executor_parameter_traits). For example, add a function get_block_size(...) to struct executor_parameter_traits { ... }. The problem is that this new feature would only be used by parallel::partition. (I think that is not good, because the executor parameter traits are a generic interface.) In fact, the new feature would also be used by the parallel algorithms that themselves use parallel::partition. And the user could confuse chunk size with block size.

Another idea is to add a parameter to the interface of parallel::partition. That violates the interface of the C++ standard, so it seems bad, but it is worth considering because it is a very simple and clear solution. However, because the block size would have to be propagated when other parallel algorithms use parallel::partition, it is a bad solution.
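To make the first idea concrete, a rough sketch of what such a hook might look like. get_block_size is hypothetical - it does not exist in HPX's executor parameter traits - and the default value mirrors the constant the PR currently hard-codes:

    #include <cstddef>
    #include <utility>

    // Hypothetical executor-parameters object carrying a partition block size.
    struct partition_block_size
    {
        explicit partition_block_size(std::size_t size = 20000)
          : size_(size)
        {}

        // The hook parallel::partition would query through the executor
        // parameter traits (invented for this sketch, not an HPX API).
        template <typename Executor>
        std::size_t get_block_size(Executor&&) const
        {
            return size_;
        }

        std::size_t size_;
    };

An algorithm-specific hook like this illustrates exactly what taeguk flags as problematic: it widens a generic interface for the sake of a single algorithm.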

@mcopik
Contributor

mcopik replied Jul 26, 2017

@taeguk Is there at least one chunker which implements maximal_number_of_chunks? Looking at the traits implementation, you should get 4 * cores. It should not be big.

The concept of a "block size" is itself quite generic; many algorithms are "blocked", especially in linear algebra. I think it's perfectly fine to have both a chunk size and a block size because they are intended to represent different concepts. As you noticed, the chunk size may depend on the number of elements. For many problems, the block size depends on the cache size and the CPU architecture.

@taeguk
Member

taeguk replied Jul 27, 2017

@mcopik Sorry, I meant not max_chunks, but the chunk size. As you say, maximal_number_of_chunks returns a small number. And so the chunk size is very big, because the chunk size is data count / max_chunks.
In fact, there is get_chunk_size(...) in the executor parameter traits. If I use that with execution::par, I get (num_tasks + 4 * cores - 1) / (4 * cores) as the chunk size. Anyway, using the chunk size as the block size in parallel::partition gives bad performance. The reason is that with a big block size, the remaining_blocks left over after the sub-partitioning done by each thread are very large, so the sequential code has to run for a long time.
In my opinion, the concept of "block size" is not generic. In parallel::partition, cache size and CPU architecture are not the only considerations in determining the block size. There is more to it. As I said above, a big block size makes the remaining_blocks bigger. The maximum value of the sum of the sizes of the remaining_blocks is block size * cores. The point is that the remaining_blocks are processed sequentially, so a big block size can decrease parallelism. But a very small block size is also bad, because it causes bad cache utilization and excessively many block fetches. Therefore, it is important to find an adequate block size. (If you want to know what the remaining_blocks are, see the snippet below.)

    call(ExPolicy && policy, FwdIter first, FwdIter last,
        Pred && pred, Proj && proj)
    {
        typedef util::detail::algorithm_result<
                ExPolicy, FwdIter
            > algorithm_result;
        typedef typename
            hpx::util::decay<ExPolicy>::type::executor_parameters_type
            parameters_type;

        try {
            if (first == last)
                return algorithm_result::get(std::move(first));

            std::size_t const cores = execution::processing_units_count(
                policy.executor(), policy.parameters());

            // TODO: Find a better block size.
            const std::size_t block_size = std::size_t(20000);
            block_manager<FwdIter> block_manager(first, last, block_size);

            std::vector<hpx::future<block<FwdIter>>>
                remaining_block_futures(cores);

            // Main parallel phase: perform sub-partitioning in each thread.
            for (std::size_t i = 0; i < remaining_block_futures.size(); ++i)
            {
                remaining_block_futures[i] = execution::async_execute(
                    policy.executor(),
                    [&block_manager, pred, proj]()
                    {
                        return partition_thread(block_manager, pred, proj);
                    });
            }

            // Wait for all sub-partitioning to finish.
            hpx::wait_all(remaining_block_futures);

            // Handle exceptions from the parallel phase.
            std::list<std::exception_ptr> errors;
            // TODO: Is it okay to use things in util::detail:: ?
            util::detail::handle_local_exceptions<ExPolicy>::call(
                remaining_block_futures, errors);

            std::vector<block<FwdIter>> remaining_blocks(
                remaining_block_futures.size());

            // Get the remaining blocks from the results of sub-partitioning.
            for (std::size_t i = 0; i < remaining_block_futures.size(); ++i)
                remaining_blocks[i] = remaining_block_futures[i].get();

            // Remove blocks that are empty.
            FwdIter boundary = block_manager.boundary();
            remaining_blocks.erase(std::remove_if(
                std::begin(remaining_blocks), std::end(remaining_blocks),
                [boundary](block<FwdIter> const& block) -> bool
                {
                    return block.empty();
                }), std::end(remaining_blocks));

            // Sort the remaining blocks so they are listed from left to right.
            std::sort(std::begin(remaining_blocks),
                std::end(remaining_blocks));

            // Collapse the remaining blocks into each other.
            collapse_remaining_blocks(remaining_blocks, pred, proj);

            // Merge the remaining blocks into one block
            // which is adjacent to the boundary.
            block<FwdIter> unpartitioned_block =
                merge_remaining_blocks(remaining_blocks,
                    block_manager.boundary(), first);

            // Perform a sequential partition on the unpartitioned range.
            FwdIter real_boundary = sequential_partition(
                unpartitioned_block.first, unpartitioned_block.last,
                pred, proj);

            return algorithm_result::get(std::move(real_boundary));
        }

Anyway, for the above reasons, I think the "block size of parallel::partition" is not a generic concept.
And I want to say that 'block size' and 'chunk size' may generally be used with the same meaning, but the "block size of parallel::partition" differs from the general meanings of 'block size' and 'chunk size' because it is determined with the remaining_blocks in mind. (Maybe that is confusing because of its naming.)
If we should add the "block size of parallel::partition" to the traits, using a more explicit name may be better, such as 'partition_block_size', 'block_size_for_partition', 'partition_chunk_size', or 'chunk_size_for_partition'.
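For concreteness, the arithmetic behind that bound, assuming a 16-core machine (the core count is an illustrative assumption; the formulas come from the comment above):

    Upper bound on the leftover that must be partitioned sequentially:
        sum(sizes of remaining_blocks) <= block_size * cores
    With the PR's block_size = 20000 on an assumed 16-core machine:
        20000 * 16 = 320000 elements at most in the sequential tail.
    If the default chunk size were reused as the block size for 10^8 elements:
        chunk_size = (num_tasks + 4*cores - 1) / (4*cores)
                   ~ 100000000 / 64 = 1562500
        1562500 * 16 = 25000000 elements - the sequential tail would dominate.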

@hkaiser
Member

hkaiser replied Jul 27, 2017

I find it very confusing to introduce both chunk_size and block_size. From the user's perspective these look very similar, even more so as (to the best of my knowledge) none of the algorithms would need both at the same time.

@taeguk
Member

taeguk commented Jul 26, 2017

@biddisco I think your idea is good. But if we decide to do that work, it should proceed in an individual PR.

@biddisco
Contributor

biddisco commented Jul 26, 2017

@taeguk Correct: any block/chunk size optimization would be orthogonal to the main algorithm development and contained in its own branch (preferably once the algorithms are all implemented and reliably tested).

@taeguk taeguk changed the title from [WIP] Implement parallel::partition. to Implement parallel::partition. Jul 27, 2017

@taeguk
Member

taeguk commented Jul 27, 2017

@hkaiser I'm all finished except for the feature that enables the user to control the block size.
One of the benchmarks is shown below.
(benchmark chart image)

@hkaiser
Member

hkaiser commented Aug 5, 2017

@taeguk What should we do with this? Any ideas how to re-introduce controlling the block size?

@taeguk
Member

taeguk commented Aug 7, 2017

@hkaiser Unfortunately, I was waiting for @mcopik's answer to my comment above.
In conclusion, I don't know what is better. I need more experience with and understanding of this.
I want to leave the block-size issue alone for now.
I'll resolve that issue after I implement some parallel algorithms.

I suggest merging this PR for now, and resolving that issue later.
I will take responsibility for resolving the block size issue even after GSoC is over.

@mcopik
Contributor

mcopik commented Aug 7, 2017

@taeguk The value of get_chunk_size depends on the actual chunk size provided by the user, doesn't it? Instead of using the static chunker with its specific method of computing the work size, the user could provide the block size by using a dynamic_chunk_size. You only need to mention in the documentation that this is the default way of controlling the algorithm's performance.

@hkaiser If we intend to use the chunk size as a generic interface for controlling all parallel algorithms, then the documentation needs an update. The docs should clearly specify which chunk types are expected and supported for a given algorithm. Furthermore, the parameters docs have to be updated, because right now chunk_size is described only as a way of controlling the parallelization of loop iterations.

We may even give some thought to renaming the chunker types. In this case it makes sense to use a dynamic chunk size, since it's the only one which allows the user to pass a block size directly to the algorithm. However, the name doesn't really describe the purpose: the block size here is not dynamic, it's completely static.
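On the caller's side, mcopik's suggestion would look roughly like this. A sketch, assuming the header name and assuming parallel::partition is taught to read the dynamic_chunk_size parameter as its block size - which is exactly the open question in this thread:

    #include <hpx/include/parallel_partition.hpp>
    #include <vector>

    void run(std::vector<int>& v)
    {
        using namespace hpx::parallel;

        // The user picks the work size; dynamic_chunk_size carries it.
        execution::dynamic_chunk_size block(20000);

        // Attach the parameter to the execution policy. partition would
        // have to be taught to use it as the sub-partitioning block size.
        partition(execution::par.with(block),
            v.begin(), v.end(),
            [](int x) { return x < 42; });
    }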

@hkaiser
Member

hkaiser commented Aug 7, 2017

> I want to leave the issue about block-size for now.

Ok, fair enough. Could you please create a ticket reminding us of this?

@hkaiser

hkaiser approved these changes Aug 7, 2017

LGTM, thanks!

@hkaiser hkaiser merged commit c721a35 into STEllAR-GROUP:master Aug 8, 2017

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details

@hkaiser hkaiser moved this from Work in progress to Merged to master in Standard Algorithms Aug 8, 2017

@hkaiser
Member

hkaiser commented Aug 9, 2017

@taeguk ping? Will you be able to look into the test failures on master? Please fix those problems as soon as possible to allow all tests to pass.

@taeguk
Member

taeguk commented Aug 9, 2017

@hkaiser Very sorry. I'm fixing it now. I'll send a PR within 2 hours.

@hkaiser
Member

hkaiser commented Aug 12, 2017

@taeguk there is another test error, caused by the partition_range test, which only shows up when using clang. Please see here for the details: http://rostam.cct.lsu.edu/builders/hpx_clang_3_9_1_boost_1_61_centos_x86_64_debug/builds/140/steps/build_unit_tests/logs/stdio

Would you be able to fix this, please?

@taeguk
Member

taeguk commented Aug 12, 2017

@hkaiser Oh, I'm very sorry for my mistake. It is an error caused by the omission of a 'const' keyword, the same as the earlier compile error.

@hkaiser
Member

hkaiser commented Aug 12, 2017

@taeguk no worries, that's what we have the tests for! Your work is absolutely appreciated!

@hkaiser hkaiser referenced this pull request Oct 29, 2017

Open

Adapt all parallel algorithms to Ranges TS #1668

21 of 38 tasks complete