Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement parallel::partition. #2778

Merged
merged 15 commits into from Aug 8, 2017
Merged

Conversation

taeguk
Copy link
Member

@taeguk taeguk commented Jul 21, 2017

This is related to #1141

Check Box

  • Implementation of parallel::partition.
  • Unit tests.
  • Benchmark codes.
  • Determine block size for sub-partitioning.
  • Refactor parallel::partition.
  • Fix inspection errors and compile errors in gcc.
  • Adapt to Ranges TS.
  • Add more comments.

- [ ] Enable that the user can control block size. (see #2803)

Issues

  1. How we determine 'block size' which is used for sub-partitioning in main parallel phrase. => For now, I determine block size to 20000 through my experiments and the paper I referred. But, it may be not perfect number for block size. In future, we may find better block size for get better performance.

Notes for future

  1. Why is benchmark with random access iterator tag somewhat slower than benchmarks with bidirectional or forward tag?
  2. The block_manager for random access iterator tag is not much useful.

@hkaiser hkaiser added this to the 1.1.0 milestone Jul 21, 2017
@hkaiser hkaiser mentioned this pull request Jul 21, 2017
47 tasks
@hkaiser hkaiser added this to Work in progress in Standard Algorithms Jul 21, 2017
@biddisco
Copy link
Contributor

When working on various scan based algorithms (and also parallel::sort), I came to the conclusion that it was a difficult problem to decide what the 'best' chunk/block size to use would be without first running the algorithm to benchmark it with a few test sizes.

My own view is that we should have a cmake integrated set of tests for certain of the parallel algorithms (and others if desired), that runs those algorithms and generates a config.h file in the build directory that is then picked up during the main build of hpx.

the idea would be that a user downloads hpx and builds it and gets 'default' values for the block sizes etc., but if they run a helper application that is also compiled - with a name like - hpx_platform_optimization(.exe) then the algorithms will be run with some adaptive checking that detects the optimal (or at least a reasonable) value for certain params like block size, and dumps these out to the config file. the user can then recompile hpx and have some faith that it has been tweaksed a bit to work well on that machine.

@taeguk
Copy link
Member Author

taeguk commented Jul 26, 2017

@biddisco I think that your idea is good. But, if we decide to do those works, those works should be progressed in individual PR.

@biddisco
Copy link
Contributor

@taeguk correct: Any block/chunk size optimization would be orthogonal to the main algorithm development and contained in its own branch (preferably once the algorithms are all implemented and reliably tested)

@taeguk taeguk changed the title [WIP] Implement parallel::partition. Implement parallel::partition. Jul 27, 2017
@taeguk
Copy link
Member Author

taeguk commented Jul 27, 2017

@hkaiser I'm all finished except the thing which enables the user to control the block size.
There is an one of benchmarks below.
image

@hkaiser
Copy link
Member

hkaiser commented Aug 5, 2017

@taeguk What should we do with this? Any ideas how to re-introduce controlling the block-size?

@taeguk
Copy link
Member Author

taeguk commented Aug 7, 2017

@hkaiser Unfortunatelly, I waited @mcopik 's answer to my comment above.
In conclusion, I don't know what is better. I need more experiences and understandings for that.
I want to leave the issue about block-size for now.
I'll resolve that issue after I implement some parallel algorithms.

I suggest merging this PR for now, and resolving that issue later.
I will take responsibility for resolving the issue of block size even though GSoC is over.

@mcopik
Copy link
Contributor

mcopik commented Aug 7, 2017

@taeguk The value get_chunk_size depends on the actual chunk size provided by the user, doesn't it? Instead of using the static chunker with its specific method of computing work size, the user could provide the value of block size by using a dynamic_chunk_size. You only need to mention in the documentation that this is the default way of controlling algorithm's performance.

@hkaiser If we intend to use chunk size as a generic interface for controlling all parallel algorithms, then the documentation needs an update. Docs should clearly specify which chunk types are expected and supported for a given algorithm. Furthermore, parameters docs have to be updated because right now the chunk_size is described only as a way of controlling the parallelization of loop iterations.

We may even give some thought to renaming chunker types. In this case, it makes sense to use a dynamic chunk size, since it's the only one which allows the user to pass a block size directly to the algorithm. However, the name doesn't really describe the purpose. The block size here is not dynamic, it's completely static.

@hkaiser
Copy link
Member

hkaiser commented Aug 7, 2017

I want to leave the issue about block-size for now.

Ok, fair enough. Could you please create a ticket reminding us of this?

Copy link
Member

@hkaiser hkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@hkaiser hkaiser merged commit c721a35 into STEllAR-GROUP:master Aug 8, 2017
@hkaiser hkaiser moved this from Work in progress to Merged to master in Standard Algorithms Aug 8, 2017
@hkaiser
Copy link
Member

hkaiser commented Aug 8, 2017

@hkaiser
Copy link
Member

hkaiser commented Aug 9, 2017

@taeguk ping? Will you be able to look into the test failures on master? Please fix those problems as soon as possible to allow for all tests to pass.

@taeguk
Copy link
Member Author

taeguk commented Aug 9, 2017

@hkaiser Very sorry. I'm now fixing. I'll send PR within 2 hours.

@hkaiser
Copy link
Member

hkaiser commented Aug 12, 2017

@taeguk there is another test-error caused by the partition_range test which only shows up when using clang. Please see here for the details: http://rostam.cct.lsu.edu/builders/hpx_clang_3_9_1_boost_1_61_centos_x86_64_debug/builds/140/steps/build_unit_tests/logs/stdio

Would you be able to fix this, please?

@taeguk
Copy link
Member Author

taeguk commented Aug 12, 2017

@hkaiser Oh, very very sorry to my mistake. That is an error which is caused by omission of 'const' keyword. It is same to past compile error.

@hkaiser
Copy link
Member

hkaiser commented Aug 12, 2017

@taeguk no worries, that's what we have the tests for! Your work is absolutely appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Standard Algorithms
  
Merged to master
Development

Successfully merging this pull request may close these issues.

None yet

4 participants