Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect pool usage masks setup in RP/thread manager #2878

Merged
merged 2 commits into from Sep 20, 2017

Conversation

@biddisco
Copy link
Contributor

commented Aug 31, 2017

Just a minor tweak to get the pool_usage_mask set correctly

@hkaiser
Copy link
Member

left a comment

Please add a test verifying the problem is fixed.

@@ -95,7 +95,7 @@ namespace hpx { namespace threads { namespace detail
auto const& rp = resource::get_partitioner();
for (std::size_t i = 0; i != pool_threads; ++i)
{
bit_or(used_processing_units_, rp.get_pu_mask(threads_offset + i));
used_processing_units_ |= rp.get_pu_mask(threads_offset + i);

This comment has been minimized.

Copy link
@hkaiser

hkaiser Aug 31, 2017

Member

This may not work for all configurations. I think the correct fix would be:

used_processing_units_ = 
    bit_or(used_processing_units_, rp.get_pu_mask(threads_offset + i));

This comment has been minimized.

Copy link
@biddisco

biddisco Sep 4, 2017

Author Contributor

No.

bit_or(a,b)

returns a boolean true or false depending on the two masks and so it only sets the lowest bit. We actually want the true bitwise or operation and the submitted patch is correct. if CPU's <= 64 then a uint64_t is bitwise or'ed (ok), if CPUs>64, then a bitset<blah> is or'ed.
I believe my patch is correct.

This comment has been minimized.

Copy link
@hkaiser

hkaiser Sep 4, 2017

Member

Ok, thanks for pointing this out. Still. there is a third option, namely if we use boost::dynamic_bitset. I'm not sure if operator|=() is provided in that case. IIRC, it provides only operator|().

This comment has been minimized.

Copy link
@biddisco

biddisco Sep 5, 2017

Author Contributor

Sorry, I was mistaken about the small CPU count version.
boost::dynamic_bitset has the != operator and it has the signature
dynamic_bitset& operator|=(const dynamic_bitset& b);
which is exactly what we want in this case.

This comment has been minimized.

Copy link
@hkaiser

hkaiser Sep 5, 2017

Member

Ok. Thanks for clarifying.

This comment has been minimized.

Copy link
@biddisco

biddisco Sep 5, 2017

Author Contributor

so we can merge then yes?

This comment has been minimized.

Copy link
@hkaiser

hkaiser Sep 5, 2017

Member

Sure, could we still add that test, please?

I believe the patch is good.

@sithhell

This comment has been minimized.

Copy link
Member

commented Sep 11, 2017

#2891 Does not specifically test for the used processing unit, however it uses the mask to test against the currently used processing units in the RP. Should we mark this as duplicate and consider the added test case as good enough to check for correctness of the mask?

@hkaiser

This comment has been minimized.

Copy link
Member

commented Sep 11, 2017

#2891 Does not specifically test for the used processing unit, however it uses the mask to test against the currently used processing units in the RP. Should we mark this as duplicate and consider the added test case as good enough to check for correctness of the mask?

How much effort would it be to add an explicit check for this to this test?

@biddisco biddisco added this to Work in Progress in resource partitioner/manager work Sep 12, 2017

@sithhell
Copy link
Member

left a comment

Unit test has been added. This looks good to me.

int main(int argc, char* argv[])
{
std::vector<std::string> cfg = {
"hpx.os_threads=4"

This comment has been minimized.

Copy link
@biddisco

biddisco Sep 15, 2017

Author Contributor

Are you sure this is a good idea? The pu mask will be 1 for each pu used. If you run this test on a single core machine, then asking for 4 threads will set the pu flag, but only once and not 4 times. So bits in pu mask will not match num threads.

This comment has been minimized.

Copy link
@sithhell

sithhell Sep 15, 2017

Member

Ok, the only thing that will fail in this scenario will the test on line 25 then... I guess having at least 4 PUs available is a prerequisite of this test (this is also true for the other RP test)

This comment has been minimized.

Copy link
@biddisco

biddisco Sep 17, 2017

Author Contributor

I think that's sensible. If the machine being tested on has < N cores, - just skip the test and return pass. There is no point testing unless N is useful.

@hkaiser
Copy link
Member

left a comment

LGTM, thanks!

@hkaiser hkaiser merged commit 2e05958 into master Sep 20, 2017

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details

@hkaiser hkaiser deleted the fix_rp_again branch Sep 20, 2017

@biddisco biddisco moved this from Work in Progress to completed in resource partitioner/manager work Oct 11, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.