Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing performance regression introduced earlier #4111

Merged
merged 2 commits into from Sep 30, 2019

Conversation

@hkaiser
Copy link
Member

commented Sep 28, 2019

@tianyi93 Could you please verify whether this fixes your performance issue?

@biddisco

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2019

The diff here is quite large and mostly stuff moved around, with some changes for 128 bit atomic presence - can you summarize please what is the difference that causes a speed improvement - thanks.

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Sep 28, 2019

The diff here is quite large and mostly stuff moved around, with some changes for 128 bit atomic presence - can you summarize please what is the difference that causes a speed improvement - thanks.

This actual change is very small. The gist is we changed everything to lockfree::queue (from lockfree::deque), which has caused the issue. See here for the important patch: https://github.com/STEllAR-GROUP/hpx/pull/4111/files#diff-6dece8d84d34dc5dc3d0df6e67f26f7dR37-R69.

Everything else are changes caused by clang-format.

hkaiser added 2 commits Sep 28, 2019
- flyby typedef -> using
@hkaiser hkaiser force-pushed the fixing_fifo_performance branch from 357a8bc to 20d13ab Sep 28, 2019
@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Sep 28, 2019

The diff here is quite large and mostly stuff moved around, with some changes for 128 bit atomic presence - can you summarize please what is the difference that causes a speed improvement - thanks.

This actual change is very small. The gist is we changed everything to lockfree::queue (from lockfree::deque), which has caused the issue. See here for the important patch: https://github.com/STEllAR-GROUP/hpx/pull/4111/files#diff-6dece8d84d34dc5dc3d0df6e67f26f7dR37-R69.

Everything else are changes caused by clang-format.

Sorry for this. I went back and redid the patch separating the actual fix from the reformatting.

Copy link
Member

left a comment

LGTM.

@tianyi93

This comment has been minimized.

Copy link
Member

commented Sep 28, 2019

@hkaiser I did a small test showing that the performance issue has been fixed. I will do the whole benchmark run. Thanks!

@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Sep 28, 2019

I did a small test showing that the performance issue has been fixed.

@tianyi93 Thanks! I'll merge this as soon as the tests have cycled.

@msimberg

This comment has been minimized.

Copy link
Contributor

commented Sep 30, 2019

Have I understood correctly that boost::lockfree::deque is actually faster than boost::lockfree::queue as a FIFO queue? I admit I did just assume queue would be better than deque.

@hkaiser hkaiser merged commit 2760227 into master Sep 30, 2019
10 checks passed
10 checks passed
Codacy/PR Quality Review Up to standards. A positive pull request.
Details
build-and-test Workflow: build-and-test
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
pycicle daint-clang-oldest Build errors 0
Details
pycicle daint-clang-oldest Config errors 0
Details
pycicle daint-clang-oldest Test errors 0
Details
pycicle daint-gcc-oldest Build errors 0
Details
pycicle daint-gcc-oldest Config errors 0
Details
pycicle daint-gcc-oldest Test errors 0
Details
@hkaiser hkaiser deleted the fixing_fifo_performance branch Sep 30, 2019
@hkaiser

This comment has been minimized.

Copy link
Member Author

commented Sep 30, 2019

Have I understood correctly that boost::lockfree::deque is actually faster than boost::lockfree::queue as a FIFO queue? I admit I did just assume queue would be better than deque.

@msimberg yes, that seems to be the case. I was not expecting this either. We ran into the performance regression while testing hpxMP. There we mostly use hpx::apply to schedule threads which are then joined using hpx::lcos::local::latch. Both facilities are very lightweight, thus the effect of the used scheduler was very pronounced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.