Fix compilation with dynamic bitset for CPU masks #3566

msimberg · 2018-11-27T10:55:51Z

This is related to #3482. It fixes compilation with a dynamic bitset for the CPU masks. The default is unchanged.
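Below is a minimal sketch of why a second mask type can break compilation when the type is chosen at compile time. Code written for a fixed-size `std::bitset` may rely on semantics a dynamically sized bitset does not share. For example, a default-constructed `boost::dynamic_bitset` has size zero and must be sized before any bit is set. The sketch assumes the mask type is switched by a macro. `EXAMPLE_MAX_CPU_COUNT` and all helper names are hypothetical, not HPX's actual API. `std::vector<bool>` stands in for `boost::dynamic_bitset` so that only the standard library is needed.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Hypothetical switch: define EXAMPLE_MAX_CPU_COUNT for a fixed-size mask,
// leave it undefined for a mask whose size is chosen at runtime.
#if defined(EXAMPLE_MAX_CPU_COUNT)
using mask_type = std::bitset<EXAMPLE_MAX_CPU_COUNT>;
inline mask_type make_mask(std::size_t /*num_cpus*/) { return mask_type{}; }
inline std::size_t mask_size(mask_type const& m) { return m.size(); }
inline void set_bit(mask_type& m, std::size_t i) { m.set(i); }
inline bool test_bit(mask_type const& m, std::size_t i) { return m.test(i); }
inline bool any_bit(mask_type const& m) { return m.any(); }
#else
// Stand-in for a dynamic bitset: it must be sized explicitly before use.
using mask_type = std::vector<bool>;
inline mask_type make_mask(std::size_t num_cpus) { return mask_type(num_cpus, false); }
inline std::size_t mask_size(mask_type const& m) { return m.size(); }
inline void set_bit(mask_type& m, std::size_t i) { m.at(i) = true; }
inline bool test_bit(mask_type const& m, std::size_t i) { return m.at(i); }
inline bool any_bit(mask_type const& m) {
    for (bool b : m)
        if (b) return true;
    return false;
}
#endif

// Code written only against the helpers above compiles with either mask type.
inline mask_type mask_for_cores(std::size_t num_cpus, std::size_t first, std::size_t count) {
    mask_type m = make_mask(num_cpus);
    for (std::size_t i = first; i < first + count; ++i)
        set_bit(m, i);
    return m;
}
```

Routing every mask operation through a small set of free functions keeps the fixed-size and dynamic variants interchangeable. This also allows the default to stay unchanged while the dynamic option still builds.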

~~It changes the default cpu mask to dynamic_bitset. If this doesn't have a performance impact I think we should remove the other options and always use a dynamic bitset.~~

As far as I can tell, there are two places where this might have a performance impact:

- `local_queue_scheduler`: when `numa_sensitive != 0`, this scheduler operates on the CPU mask in `get_next_thread`. However, this could be done the same way as in `local_priority_queue_scheduler`, which does all NUMA-sensitive work in `on_start_thread` only.
- `shared_priority_queue_scheduler`: this one uses fixed-size arrays sized by `HPX_HAVE_MAX_CPU_COUNT` and `HPX_HAVE_MAX_NUMA_DOMAIN_COUNT`. I'd expect vectors to work just as well since we're not dynamically allocating them in a tight loop, but I haven't checked. Update: the shared priority scheduler doesn't work with this option and gives warnings to the user.
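The `local_priority_queue_scheduler` approach described above can be sketched as follows. This is a heavily simplified toy rather than HPX's scheduler interface. The class, its members and the `numa_domain_of` input are hypothetical, and it is single-threaded with no locking. The point is that the per-worker, NUMA-aware decision happens once in `on_start_thread`. After that, `get_next_thread` only walks a precomputed list of queue indices and never touches a CPU mask.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical toy scheduler: one FIFO queue per worker, no synchronization.
class toy_numa_scheduler {
public:
    // numa_domain_of[w] is the (assumed known) NUMA domain of worker w.
    explicit toy_numa_scheduler(std::vector<std::size_t> numa_domain_of)
      : domain_of_(std::move(numa_domain_of)),
        queues_(domain_of_.size()),
        steal_order_(domain_of_.size()) {}

    void push(std::size_t worker, int task) { queues_[worker].push_back(task); }

    // Called once per worker at startup: record which queues this worker may
    // steal from (same NUMA domain only). All topology work happens here.
    void on_start_thread(std::size_t worker) {
        auto& order = steal_order_[worker];
        order.clear();
        for (std::size_t w = 0; w != domain_of_.size(); ++w)
            if (w != worker && domain_of_[w] == domain_of_[worker])
                order.push_back(w);
    }

    // Hot path: own queue first, then the precomputed victims; no mask operations.
    std::optional<int> get_next_thread(std::size_t worker) {
        if (auto t = pop(worker)) return t;
        for (std::size_t victim : steal_order_[worker])
            if (auto t = pop(victim)) return t;
        return std::nullopt;
    }

private:
    std::optional<int> pop(std::size_t w) {
        if (queues_[w].empty()) return std::nullopt;
        int t = queues_[w].front();
        queues_[w].pop_front();
        return t;
    }

    std::vector<std::size_t> domain_of_;               // worker -> NUMA domain
    std::vector<std::deque<int>> queues_;              // per-worker task queues
    std::vector<std::vector<std::size_t>> steal_order_; // per-worker victims
};
```

With this layout, the cost of the mask representation, whether fixed-size or dynamic, is paid at startup rather than on every scheduling decision.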

hkaiser · 2018-11-27T12:49:49Z

On Knights Landing architectures (might not be representative, but anyways...) we have seen a massive speedup in the scheduler from letting the compiler vectorize (using AVX-512) the statically sized bitmasks. We should verify that dynamic_bitset will be properly vectorized (i.e. all cores up to 512 are handled by a single operation).
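To make the vectorization argument concrete, here is a sketch of a 512-bit mask stored as a fixed array of eight 64-bit words. Its AND loop has a compile-time trip count and no branches. That is the kind of loop an optimizing compiler targeting AVX-512 can typically reduce to a single 512-bit AND. `mask512`, `and_masks`, `set_cpu` and `test_cpu` are hypothetical names, not HPX's mask type. Whether vectorization actually happens depends on the compiler, the flags (for example `-O3 -mavx512f` with GCC or Clang) and the target, so the generated assembly should be inspected before relying on it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size 512-bit CPU mask: 8 x 64-bit words.
using mask512 = std::array<std::uint64_t, 8>;

// Fixed trip count (8) and no branches: a good candidate for auto-vectorization.
inline mask512 and_masks(mask512 const& a, mask512 const& b) {
    mask512 r{};
    for (std::size_t i = 0; i != r.size(); ++i)
        r[i] = a[i] & b[i];
    return r;
}

// Set bit `cpu` (valid range 0..511) in the mask.
inline void set_cpu(mask512& m, std::size_t cpu) {
    m[cpu / 64] |= std::uint64_t{1} << (cpu % 64);
}

// Test bit `cpu` (valid range 0..511) in the mask.
inline bool test_cpu(mask512 const& m, std::size_t cpu) {
    return (m[cpu / 64] >> (cpu % 64)) & 1u;
}
```

A dynamically sized bitset stores its word count at runtime. The compiler therefore cannot assume eight words, which is why its vectorization needs to be checked separately.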

msimberg · 2018-11-27T14:32:07Z

That's a good point. My hope is it wouldn't even need to be vectorized for good performance if the masks are never used in tight loops. Is that an unreasonable restriction?

hkaiser · 2018-11-27T15:39:39Z

@msimberg I fully agree that we should do some thorough performance analysis before removing the static stuff.

msimberg · 2018-12-07T15:13:30Z

I did a quick test and it seems like simple operations like bitwise AND do vectorize nicely, while something like `any()` doesn't (not sure how much sense that makes). As I said earlier, my intention would be to remove all uses of CPU masks in critical sections in the local queue scheduler. That leaves the question: @hkaiser, in what kind of hot loops did you, or would you like to, use CPU masks? If there's a clear use case I'd go ahead and benchmark, otherwise I won't bother.
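One plausible explanation for `any()` not vectorizing is an assumption, not something verified in this thread. A typical `any()` returns as soon as it finds a non-zero word, and that data-dependent early exit is hard for auto-vectorizers. A branch-free alternative ORs all words together and tests once at the end. The sketch below shows both forms over a hypothetical fixed-size word array (`words8`). It illustrates the idea and is not a benchmark. Whether the second form vectorizes on a given compiler would still need to be checked.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using words8 = std::array<std::uint64_t, 8>; // hypothetical 512-bit mask

// Early-exit version: stops at the first non-zero word (branchy).
inline bool any_early_exit(words8 const& m) {
    for (std::uint64_t w : m)
        if (w != 0) return true;
    return false;
}

// Branch-free version: OR-reduce every word, then test once at the end.
inline bool any_reduce(words8 const& m) {
    std::uint64_t acc = 0;
    for (std::uint64_t w : m)
        acc |= w;
    return acc != 0;
}
```

Both return the same result. The reduction always touches all eight words, which costs little for a 512-bit mask and removes the branch from the loop body.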

hkaiser · 2018-12-07T15:58:15Z

I'd like to test on KNL (or any other AVX512 platform) before removing the static bitmaps. In general however, I agree with your assessment.

msimberg · 2018-12-07T16:01:29Z

Sure, completely understand. I would also not completely remove them, just change the default. (Edit: I realized now that I first said I would remove the others if there's no slowdown, but there's no harm in keeping them as long as they're still tested.)

msimberg · 2019-02-20T09:49:50Z

Is there interest in having this in? If not I'd like to clean it up so that the dynamic bitset option at least works (but I wouldn't make it the default), or completely remove it if it's not going to be used or tested by anyone in any case. Don't want to leave this hanging around.

hkaiser · 2019-02-20T11:11:48Z

> Is there interest in having this in? If not I'd like to clean it up so that the dynamic bitset option at least works (but I wouldn't make it the default), or completely remove it if it's not going to be used or tested by anyone in any case. Don't want to leave this hanging around.

Let's get this in. I'd feel better if we left the other options in place for now, however.

msimberg · 2019-02-20T12:29:03Z

All right, I'll get it cleaned up.

msimberg · 2019-02-21T09:46:04Z

I changed one of the pycicle builders to use the dynamic bitset.

msimberg · 2019-03-14T13:36:56Z

This should be ready to go now.

hkaiser

Let's go ahead with this. Thanks!

msimberg added category: scheduler category: topology labels Nov 27, 2018

msimberg requested a review from biddisco November 27, 2018 10:55

msimberg force-pushed the dynamic-bitset-default branch from 1b8bc6c to a729c30 November 27, 2018 15:19

msimberg force-pushed the dynamic-bitset-default branch 6 times, most recently from d472197 to ea11416 November 30, 2018 14:40

msimberg changed the title ~~WIP: Use dynamic bitset by default for cpu mask~~ Use dynamic bitset by default for cpu mask Dec 11, 2018

msimberg changed the title ~~Use dynamic bitset by default for cpu mask~~ WIP: Use dynamic bitset by default for cpu mask Feb 20, 2019

msimberg force-pushed the dynamic-bitset-default branch from ea11416 to 0973222 February 21, 2019 09:38

msimberg mentioned this pull request Feb 22, 2019

Parallel executor latch #3659

Merged

msimberg force-pushed the dynamic-bitset-default branch 3 times, most recently from eff5e51 to 6f41a08 February 27, 2019 09:54

Fix use of dynamic bitset for CPU masks

e96bb99

msimberg force-pushed the dynamic-bitset-default branch from 6f41a08 to e96bb99 March 11, 2019 15:58

msimberg changed the title ~~WIP: Use dynamic bitset by default for cpu mask~~ Fix compilation with dynamic bitset for CPU masks Mar 12, 2019

msimberg added this to the 1.3.0 milestone Mar 14, 2019

hkaiser approved these changes Mar 14, 2019

msimberg merged commit 83c1a06 into STEllAR-GROUP:master Mar 15, 2019

msimberg deleted the dynamic-bitset-default branch March 15, 2019 12:49

msimberg mentioned this pull request Apr 1, 2019

Rename PAGE_SIZE to PAGE_SIZE_ because AppleClang #3759

Merged
