Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing std::rand() to a better inbuilt PRNG generator. #3204

Merged
merged 7 commits into from Mar 13, 2018

Conversation

Anushi1998
Copy link
Contributor

Fixes #2954

Since std::rand is now deprecated in C++11, I have tried changing it to a better inbuilt random generator like std::mt19937 and std::random_device.

It requires changing 210 files and right now I have fixed only some 30 files to get some reviews and/or suggestions :)

@Anushi1998 Anushi1998 closed this Feb 28, 2018
@hkaiser
Copy link
Member

hkaiser commented Feb 28, 2018

@Anushi1998 Why did you close this?

@Anushi1998
Copy link
Contributor Author

@hkaiser I have squashed all commits now.I will add more to it asap

@hkaiser
Copy link
Member

hkaiser commented Mar 1, 2018

@Anushi1998 could you please not add more to this PR? It's already very big and adding to it makes it even more difficult to properly review things. Please create a new PR for additional changes.

.gitignore Outdated
@@ -12,3 +12,5 @@
/.dockerignore
.ycm_extra_conf.py
apex
config/
runtime/parcelset/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have these directories been added to the .gitignore file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has been accidentally committed.I will soon remove it

@@ -98,8 +99,10 @@ namespace hpx { namespace util
/// constructors and destructor
jenkins_hash() : seed_(0) {}

std::mt19937 gen{(unsigned int)std::random_device{}()};
Copy link
Member

@hkaiser hkaiser Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use direct initializations of member variables for now. We have not decided to allow this in HPX yet. Also, I don't think you want to invoke std::random_device for each instance of the jenkins_hash type. Creating and using std::random_device is a potentially very expensive operation.

@@ -791,9 +791,10 @@ void big_boot_barrier::wait_hosted(
, unassigned
, suggested_prefix);

std::srand(static_cast<unsigned>(util::high_resolution_clock::now()));
std::mt19937 gen(std::random_device{}());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the result returned from std::random_device here be sufficient as a random value for the first parcel-id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will change it :)

@@ -109,7 +109,7 @@ void run_benchmark(std::size_t vector_size, int test_count, IteratorTag)

std::cout << "* Running Benchmark..." << std::endl;

int rand_base = std::rand();
int rand_base = std::random_device{}();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One call to std::random_device should be sufficient. As said, its use is potentially very expensive.

@@ -185,7 +185,7 @@ int hpx_main(int argc, char **argv)
HPX_TEST(equality);

// Do moduler operation for avoiding overflow in ramdom_fill. (#2954)
int rand_base = std::rand() % 10000;
int rand_base = std::random_device{}() % 10000;
Copy link
Member

@hkaiser hkaiser Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't call std::random_device more than once per application. Also, please use a distribution instead of modulo arithmetic for the generated numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I then have to initialize seed globally using std::random_device.

@Anushi1998
Copy link
Contributor Author

@hkaiser Sure, I will create more PR than to add on these.But please note that there will be many PR since there are 210 files to be edited.

@hkaiser
Copy link
Member

hkaiser commented Mar 1, 2018

@hkaiser Sure, I will create more PR than to add on these.

Thanks!

But please note that there will be many PR since there are 210 files to be edited.

No worries, there is a sufficient number of electrons available to accommodate for a couple more PRs... ;-)

Copy link
Member

@hkaiser hkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot!

Copy link
Member

@taeguk taeguk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many places, std::random_device{}() is used for constructing std::mt19937.
std::random_device{}() generates different number sequences. So, we can't reproduce the problem in test if std::mt19937 is constructed with std::random_device{}().
So, std::mt19937 must be constructed with explicit seed like 'seed' of program option.

And this PR cannot fix #2954 because UB still can be occured:
https://github.com/Anushi1998/hpx/blob/3a1e3d8be803740487487f1a35a29d96740855dc/tests/regressions/parallel/stable_merge_2964.cpp#L31

@@ -34,7 +34,7 @@ const int random_fill_range = (std::min)(100000, RAND_MAX);
struct random_fill
{
random_fill()
: gen(std::rand()),
: gen(std::random_device{}()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.
The gen must be initialized with the 'seed' of program option, not with std::random_device{}().

And there are many same problems in other codes.

struct random_fill
{
random_fill(std::size_t random_range)
: gen(std::rand()),
: gen(std::random_device{}()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this file still uses std::srand() below... Thanks for noticing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hkaiser There is no use to srand because there is no call to rand but I will remove it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -25,10 +25,13 @@
#include <vector>

///////////////////////////////////////////////////////////////////////////////

std::mt19937 _rand(std::random_device{}());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@@ -25,10 +25,13 @@
#include <vector>

///////////////////////////////////////////////////////////////////////////////

std::mt19937 _rand(std::random_device{}());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@@ -28,10 +28,13 @@
#include "utils.hpp"

///////////////////////////////////////////////////////////////////////////////

std::mt19937 _rand(std::random_device{}());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@@ -27,10 +27,15 @@
#include "utils.hpp"

///////////////////////////////////////////////////////////////////////////////
unsigned int seed = (unsigned int)std::random_device{}();
std::mt19937 _rand(seed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_rand must be initialized with 'seed' of program option, not with the output of std::random_device{}().

@@ -31,7 +31,7 @@
struct random_fill
{
random_fill(std::size_t random_range)
: gen(std::rand()),
: gen(std::random_device{}()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@@ -31,7 +31,7 @@
struct random_fill
{
random_fill(std::size_t random_range)
: gen(std::rand()),
: gen(std::random_device{}()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@@ -31,7 +31,7 @@
struct random_fill
{
random_fill(std::size_t random_range)
: gen(std::rand()),
: gen(std::random_device{}()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@@ -29,7 +29,7 @@
struct random_fill
{
random_fill(std::size_t random_range)
: gen(std::rand()),
: gen(std::random_device{}()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem that gen generates different numbers even if same 'seed' is used in program option.

@Anushi1998
Copy link
Contributor Author

@taeguk Please can you elaborate more on

So, we can't reproduce the problem in test if std::mt19937 is constructed with std::random_device{}().So, std::mt19937 must be constructed with explicit seed like 'seed' of program option.

Meanwhile, I have tried to look at some better seeding option
randutils::auto_seed_128{}.base() or randutils::auto_seed_256{}.base()
and also
boost::random::mt19937 gen{boost::random::random_device{}};

And this PR cannot fix #2954 because UB still can be occured:

As far as I think the problem of UB is due to integer overflow.Can't we avoid it by restricting the random values to a narrower range?Please correct me if I am wrong.

@hkaiser
Copy link
Member

hkaiser commented Mar 3, 2018

@Anushi1998: the only thing left is to fix the issues reported by inspect: https://9684-4455628-gh.circle-artifacts.com/0/tmp/circle-artifacts.T3OlciG/hpx_inspect_report.html

@taeguk
Copy link
Member

taeguk commented Mar 3, 2018

@Anushi1998 The unit or benchmark tests are performed with random numbers. And we must can control the random numbers, that is, random number generator must be deterministic. So, std::mt19937 must be initialized with specific seed and we can control the seed value through program parameter. But, in your PR, std::mt19937 is initialized with std::random_device{}(). And std::random_device{}() produces non-deterministic values. So, we can't reproduce the situation when some tests are failed.

@taeguk
Copy link
Member

taeguk commented Mar 3, 2018

@Anushi1998

As far as I think the problem of UB is due to integer overflow.Can't we avoid it by restricting the random values to a narrower range?

Yes you're right. The problem is due to integer overflow. And we can avoid it by the way you suggested.
But, when only see struct random_fill, it's constructor take int as parameter. This means the constructor can take any numbers int can represent. And so, it means that there is potential UB problem. So I think that we should find more better solution. One candidate is using short for rand_base and using unsigned short for range in constructor of struct random_fill.
https://github.com/Anushi1998/hpx/blob/3a1e3d8be803740487487f1a35a29d96740855dc/tests/unit/parallel/algorithms/merge_tests.hpp#L97-L99

@Anushi1998
Copy link
Contributor Author

@taeguk

So, std::mt19937 must be initialized with specific seed and we can control the seed value through program parameter.

I think std::seed_seq generated from these program(machine) dependent entropy could be an option to consider

  • std::chrono::high_resolution_clock
  • getpid()
  • __builtin_readcyclecounter() ,and
  • Entropy from the Memory Configuration

But we need to discuss it with @hkaiser, he suggested to use std::random_device

@Anushi1998
Copy link
Contributor Author

Anushi1998 commented Mar 4, 2018

@taeguk

We can also consider generating an exception whenever someone tries to call a large rand_base and large range which could potentially lead to overflow in UB?Or, if these are generated randomly we should try to come up with some better until we find some values which can be assigned.

@hkaiser
Copy link
Member

hkaiser commented Mar 4, 2018

@taeguk std::random_device is the preferred way of seeding a random number generator (at least AFAIU). I don't see any contradiction in using it and still being able to seed things from the command line, I even think @Anushi1998 has already done that for most of the tests he has adapted. He just decided to do it differently for the performance tests, @Anushi1998 what was your rationale?

@Anushi1998
Copy link
Contributor Author

I think std::random_device is a good option but I request you to take a look here, they have some good points regarding random generators and how to use a better seed sequence instead of a single number seed used by me.I want your suggestion regarding this, should we consider this or not?
Also, I don't think that we should make random number generator deterministic (because otherwise, I don't see the point of being random) and I think the situation can be reproduced (by printing the seed value, which is already there in many programs) and then explicitly seeding mt19937.I am sorry if it doesn't make any sense and I have understood the words of taeguk wrongly.
I would like to hear any criticism/suggestion from @taeguk and @hkaiser

@taeguk
Copy link
Member

taeguk commented Mar 5, 2018

@Anushi1998 Sorry, I may misused the word 'deterministic' and 'non-deterministic'. What I just wanted to say is that we must be able to initialize mt19937 with the specific seed. But, there are some places where you just initialize mt19937 with std::random_device{}() like https://github.com/Anushi1998/hpx/blob/b59b8342ac36ec7afe58da9dcbfb73ec6d92792b/tests/performance/parallel_algorithms/local/benchmark_partition.cpp#L37 . What I meant is just that you should fix that because random_fill can generate different numbers even if the seed values which are passed by program option are same (L185).
I didn't mean that we must not use std::random_device. For creating default seed which is passed to mt19937, using std::random_device can be a good choice.
But, std::random_device don't work exactly on all platforms, so if you have another better option, I don't oppose that you use that better option.

@Anushi1998
Copy link
Contributor Author

Anushi1998 commented Mar 8, 2018

@taeguk

you should fix that because random_fill can generate different numbers even if the seed values which are passed by program option are same (L185).

Please let me know if I missed something you expected.

But, std::random_device don't work exactly on all platforms, so if you have another better option, I don't oppose that you use that better option.

I have not done that because I am waiting for @hkaiser review on this.

@hkaiser
Copy link
Member

hkaiser commented Mar 10, 2018

@Anushi1998 we discussed this on irc, but for the records: I think it's fine to use std::random_device to generate the seed as no portability is required for this. If we need to run the tests with a given seed we can do that in any case by providing the command line argument.

@Anushi1998
Copy link
Contributor Author

Anushi1998 commented Mar 10, 2018

Yes, I completely agree with it @hkaiser .Have you also reviewed latest commits?Please tell me if I have missed something as expected by @taeguk .

@Anushi1998
Copy link
Contributor Author

@taeguk and @hkaiser please let me know if it needs any more changes :)

@hkaiser
Copy link
Member

hkaiser commented Mar 10, 2018

I don't know what @taeguk expected, for myself, this looks good to go.

@Anushi1998
Copy link
Contributor Author

Can you please have a look @taeguk now.Is it what you expected?

@Anushi1998
Copy link
Contributor Author

Thanks, @taeguk for pointing that out.I hope it is now up to your expectation :)

if (vm.count("seed"))
seed = vm["seed"].as<unsigned int>();

std::cout << "using seed: " << seed << std::endl;
std::srand(seed);
Copy link
Contributor

@msimberg msimberg Mar 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If std::srand is removed seed is completely unused, no? Either std::srand should stay or seed (or a generator initialized with seed) should be passed one way or another down to the tests (which directly use std::rand now). Ideally it would be the latter as the goal is to get rid of std::rand.

What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msimberg I have not completed this file(see this still uses std::rand).I have only covered these 30 files.As soon as this PR merges I will file another PR for next 30 files(including this).

Also, this file commit has happened accidentally because I have started my work for another 30 files but seeing some issues regarding std::random_device I stopped and I have done hard reset to come back to the previous commit but my mistake this change still happened here.But I will soon fix this file in another PR as soon as this merges(I assure within a day or two).If this is not acceptable then I will fix it right away :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anushi1998 Okay, no problem in that case! Just make sure this change doesn't get included in this PR. Thanks for clarifying.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure @msimberg ,thanks :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msimberg I have reverted the changes.Thanks for figuring out :)

if (vm.count("seed"))
seed = vm["seed"].as<unsigned int>();

std::cout << "using seed: " << seed << std::endl;
std::srand(seed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anushi1998 Thanks for updating the file. Would you mind doing this one as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before doing that, please would you clarify me what purpose srand serves if there is no call to std::rand.Have a look at this discussion too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, good point. But if the test doesn't use any random numbers the seed is also not needed. As far as I know we don't generally add the seed option if the test doesn't require random numbers, so I would remove it completely, but I haven't been around for too long.

@hkaiser Did you mean to use rand somewhere in the test? If not, do you prefer to keep the seed option for consistency even if it's not used, or could it be removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has no purpose, please remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will do it :)

Added newline to end of the file.
@Anushi1998
Copy link
Contributor Author

Anushi1998 commented Mar 12, 2018

@hkaiser, @taeguk, and @msimberg I have updated the changes.Please let me know if there is some error.

Copy link
Member

@taeguk taeguk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you :)

@@ -135,12 +139,9 @@ double run_is_heap_until_benchmark_par_unseq(int test_count,
///////////////////////////////////////////////////////////////////////////////
int hpx_main(boost::program_options::variables_map& vm)
{
unsigned int seed = (unsigned int)std::time(nullptr);
if (vm.count("seed"))
seed = vm["seed"].as<unsigned int>();

Copy link
Contributor

@msimberg msimberg Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't _rand be seeded here again if seed is given on the command line? Same way you've updated benchmark_partition_copy.cpp? The same goes for benchmark_inplace_merge.cpp and benchmark_is_heap.cpp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the changes.Thanks for figuring out :)

@msimberg
Copy link
Contributor

Thanks @Anushi1998, looks good!

@hkaiser hkaiser merged commit 30abb4e into STEllAR-GROUP:master Mar 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of std::rand has UB risk in tests
4 participants