[MRG] Use RandomKit for random number generation #671

Closed
wants to merge 30 commits into base: master
3 participants

thesamovar commented Apr 12, 2016

Numpy uses a Mersenne Twister for random number generation, based on an open-source implementation that, as far as I can tell, should be fully compatible with our license. I started from this version: https://github.com/stefanv/randkit

So far I've updated C++ standalone to use this library. This entailed a few small changes to CPPStandaloneDevice, because randomkit is a C file rather than C++, and because on Windows you need to link against the advapi32 library. I haven't tested it on Linux yet, but all tests pass on Windows.

One issue is using it with OpenMP, since I believe the library is not thread-safe. However, I'm not sure that we're handling this correctly at the moment anyway, since we use rand(), which is also not thread-safe. Thoughts on this?

It should also be possible to use this library in weave and Cython, but there are a few issues. In order to use the library, you need to seed and maintain a fairly large amount of state (in the C struct rk_state). This is straightforward in C++ standalone but more of an issue for weave/cython because ideally you'd like to maintain the same state across code objects but this is non-trivial. Some options are:

  1. Since we're using an identical implementation to numpy, try to get a pointer to their rk_state and use this. I haven't worked out how to do this yet. I can get a copy of their rk_state (it's what is returned by RandomState.get_state essentially), but not a pointer to it. If we used this copy, we'd have to update it at the end of the loop (which we can do at a small cost). However, if the user was also accessing numpy's random numbers in their code, this would totally mess up the random state. If we could get a pointer to their state (which might be possible using the Cython implementation mtrand.pyx behind Numpy's RandomState), this would solve this problem. However, if numpy ever changes their RNG we might get into trouble here.
  2. We handle our own state separately from numpy's. We copy the pointer into a Python integer and pass that to weave and Cython, reinterpreting it as an appropriate pointer type. I think this should be possible, although it's a little fiddly to do. Advantages: we don't interfere with numpy at all, since we never use it. Disadvantages: fiddly, and a potential problem is that the numpy random state might get seeded with the same value as ours. Looking at the source code, it uses OS-level entropy functions to seed the RNG if they are available (which would work fine, because we would get a different and fully independent seed compared to numpy), but it falls back on using the time if they are not (which might easily lead to us having the same seed as numpy). We could potentially check for this by using RandomState.get_state and doing a byte-for-byte comparison, raising an error if the states are the same, but that's more fiddliness.
  3. We have a separate random state for each CodeObject. As above, if the seed value is coming from a good source of entropy then this is fine but if it falls back on using the system time we'd get the same seed for each CodeObject which would be a problem. A possible solution here would be for each CodeObject to generate a seed from a global RNG, but I have no idea if that's safe or not (probably not). You can see this approach in dev/ideas/randomkit/randkit_from_weave.py.
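For option 2, the pointer-into-an-integer trick can be sketched in pure Python with ctypes (purely illustrative, not Brian code; the struct below is a simplified stand-in for randomkit's rk_state):

```python
import ctypes

# Simplified stand-in for randomkit's rk_state struct (the real one
# holds the full 624-word Mersenne twister key plus bookkeeping).
class RKState(ctypes.Structure):
    _fields_ = [("key", ctypes.c_ulong * 624),
                ("pos", ctypes.c_int)]

state = RKState()
state.pos = 42

# Copy the pointer into a plain Python integer, which could be passed
# into generated weave/Cython code as an ordinary scalar...
addr = ctypes.addressof(state)

# ...and on the other side reinterpret the integer as a pointer again.
recovered = ctypes.cast(addr, ctypes.POINTER(RKState)).contents
print(recovered.pos)  # 42: both names refer to the same memory
```

Since `recovered` aliases `state`, mutations made inside the generated code would be visible to the Python side without any copying back.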
coveralls commented Apr 12, 2016

Coverage Status

Coverage decreased (-3.1%) to 90.478% when pulling 5c7dad8 on use_randomkit into 5e52644 on master.

thesamovar added some commits Apr 12, 2016

thesamovar commented Apr 13, 2016

On further reading, randomkit initialises the random state using OS-level cryptographic randomness functions, and it initialises the entire state of the Mersenne twister, not just a seed, so each new random state should be independent of the others. (An exception is raised if there isn't enough randomness available, but at least on Windows this didn't happen in my testing, even when initialising thousands of random states.) Given this, I went ahead and implemented support for Cython and weave. Tests are passing on my machine now; will see about Linux/Mac.

In terms of speed on the CUBA benchmark, connections are 2x faster for weave at large N (400k) but not much different at small N (<40k). Cython shows a more modest improvement of maybe around 10%. It may be possible to improve my implementation, though (both for weave and Cython).

From my point of view, if all tests pass, this is mergeable.
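As a toy illustration of why OS-entropy seeding gives independent streams (this uses Python's stdlib random as a stand-in; real randomkit fills the whole 624-word twister state from the OS entropy source, not just a single seed as here):

```python
import os
import random

def fresh_state():
    # Seed from OS-level entropy, loosely mimicking randomkit's
    # behaviour (simplified: randomkit initialises the whole
    # Mersenne twister state, not just one 64-bit seed).
    seed = int.from_bytes(os.urandom(8), "little")
    return random.Random(seed)

a, b = fresh_state(), fresh_state()
# Independent entropy means the two streams diverge immediately;
# a time-based fallback seed could instead make them identical.
print([a.randrange(2**31) for _ in range(3)])
print([b.randrange(2**31) for _ in range(3)])
```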

@thesamovar thesamovar changed the title from [WIP] Use RandomKit for random number generation to [MRG] Use RandomKit for random number generation Apr 13, 2016

thesamovar commented Apr 13, 2016

Actually I think this covers everything in the original post on #222. We did discuss a bit there the idea of having configurable RNGs and parallel implementations, but I'd be inclined to create a new issue for that and close #222 if and when we merge this.

thesamovar commented Apr 13, 2016

Hmm, there appear to be some build problems on both Windows and Linux (which is strange, because it's working fine on my computer).

mstimberg commented Apr 19, 2016

I did not yet look into this in detail, but I fixed a few issues that prevented the tests from running on my machine:

  • I'm not exactly sure why, but weave/cython did not like the relative paths to the directories with header files (even though we have used them before for stdint_compat.h I think...), I replaced them with absolute paths
  • I don't quite understand why, but the existence of brian2.random made the import of brian2 completely fail when started from my IDE (import random picked up our package and therefore everything referring to the random package from the standard library failed). I moved the package to brian2.utils.random for now.
  • the brianlib/randomkit/randomkit.c file was deleted in make clean on Linux
  • I needed an explicit type cast in randomkit.c for the compilation not to error out on standalone (for some reason it worked with weave; it must be an issue of compilation parameters)

Let's see whether this fixes the build problems on travis/appveyor.

thesamovar commented Apr 19, 2016

Great! Will be interested to see if this fixes things.

For the IDE: if you're using PyCharm and set brian2/brian2 as a source directory, I had the same problem. I think it adds that directory to the Python path (meaning it aliases Python's own random module). I changed it so that just the root brian2 directory was set as a source directory, and it was fine. The reason I put it in brian2.random was that I thought this would be where we collect files related to our bigger plans for RNGs in the future, but since we don't actually have that yet, utils is fine too.

coveralls commented Apr 20, 2016

Coverage Status

Coverage decreased (-0.06%) to 93.533% when pulling 45b9352 on use_randomkit into 5e52644 on master.

coveralls commented Apr 20, 2016

Coverage Status

Coverage increased (+0.04%) to 93.583% when pulling 0e0f550 on use_randomkit into 1a6e5fe on master.

thesamovar commented Apr 21, 2016

Could it be that it is trying to build randomkit.o to the Brian directory and that this is causing problems? e.g. it might be trying to write the file from multiple processes on linux using parallel testing? If so, maybe we need to copy randomkit.c to a tempfile or tempdir and build there? That's a bit annoying because it means having multiple copies of the same thing but I don't see a way to avoid it if this is the problem.

thesamovar commented Apr 21, 2016

Ah we also have too long filenames on Windows apparently:

CompileError: error: could not create 'c:\users\appveyor\appdata\local\temp\1\scipy-appveyor-wbaazg\python27_intermediate\compiler_e3b0c44298fc1c149afbf4c8996fb924\Release\Miniconda-x64\envs\appveyor_test\lib\site-packages\brian2-2.0rc0+git-py2.7-win-amd64.egg\brian2\utils\random\randomkit': The filename or extension is too long
mstimberg commented Apr 22, 2016

Oh, the long filenames thing is annoying, not sure we can do much about this :-/ Well, we could move it back to brian2.random instead of brian2.utils.random, it seems to be just below the limit then...

Could it be that it is trying to build randomkit.o to the Brian directory and that this is causing problems? e.g. it might be trying to write the file from multiple processes on linux using parallel testing?

Yes, I think the problem is that we are running the Cython tests in parallel and several processes try to build randomkit.o at the same time. It's a shame, because Cython is in general fine with being used in parallel. Actually, compiling the file over and over again is a bit of a waste, I think the correct thing would be to compile it during installation (i.e. it would be pre-compiled in the conda package) and then just link it. This would mean that you cannot simply start Brian from the source directory, but we could raise an error message telling the user to run python setup.py build_ext --inplace or something like that (which would also give them the Cython spike queue).
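A sketch of what the install-time compilation might look like in setup.py (module path, source location and the cython_extensions name are illustrative, not the actual Brian layout):

```python
import sys
from setuptools import Extension

# Sketch: build randomkit once at install time, next to the lines
# that already compile the Cython spike queue (paths illustrative).
randomkit_ext = Extension(
    "brian2.random.randomkit",
    sources=["brian2/random/randomkit/randomkit.c"],
    # randomkit needs advapi32 on Windows for its CryptGenRandom call
    libraries=["advapi32"] if sys.platform == "win32" else [],
)
# setup(..., ext_modules=cython_extensions + [randomkit_ext])
```

The generated code would then only need to link against the pre-built object instead of recompiling randomkit.c for every code object.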

thesamovar commented Apr 22, 2016

Compiling just once at install time does seem like a good idea, on the other hand the idea of making the installation more complicated worries me. Could we do something with a lock file so that even if it gets run from multiple processes it only needs to get built once? I seem to remember we did this for Cython at some point?

mstimberg commented Apr 22, 2016

Compiling just once at install time does seem like a good idea, on the other hand the idea of making the installation more complicated worries me.

I am not sure it would make the installation more complicated, it would be compiled using the standard C extension mechanism in setup.py, we'd add it to the lines already compiling the spike queue.

thesamovar commented Apr 22, 2016

Yeah but if those fail it still works, which wouldn't be the case here. Still, maybe it's the best way.

mstimberg commented Apr 22, 2016

Yeah but if those fail it still works, which wouldn't be the case here.

It would still work with pure numpy -- but if the compilation of randomkit.c fails, then I also think it is unlikely that weave/Cython works, right? And then, hopefully, most users are using the conda packages anyway...

thesamovar commented Apr 22, 2016

OK you make a good case. Let's do that then.

mstimberg commented Apr 22, 2016

OK you make a good case. Let's do that then.

Ok -- I won't work on this in the next days, so go ahead if you want to.

coveralls commented May 20, 2016

Coverage Status

Coverage decreased (-14.6%) to 78.921% when pulling d1131ca on use_randomkit into 1a6e5fe on master.

mstimberg commented May 23, 2016

I give up... As you can see from the glorious commit history (BTW: before merging I'll probably do a use_randomkit_clear branch getting rid of all those commits), I tried quite a few things to make the pre-compiled randomkit work as a shared library, but it still does not. Getting it to work for Linux was pretty straightforward, but for OS X I cannot get the runtime linking right with the standard distutils tools (similar to this problem: scipy/scipy#2082). Similarly for Windows: I managed to compile and link things correctly by hand in my Windows VM, but it does not quite work with the distutils.Extension mechanism. I think I could get both to work, but this would mean calling the compiler and other command-line tools by hand, and I don't think we want to write much platform-dependent code of that kind.

So that brings us back to your previous approach where randomkit.c is included in the source files. Actually, maybe we could do an intermediate solution: we'd include it in the source files if randomkit.o does not exist and otherwise include randomkit.o. Then again, not sure whether this works for OS X and Windows -- but I think it is simpler than what I tried before because it would be statically and not dynamically linked. This would not solve the Cython issue for multiple processes but maybe we could use the lock file approach there.

Um, wait: with the source_files approach it actually appears (judging from the output on travis) that randomkit.o ends up in the install directory of Brian 2. This is fine on our machines (and on the test servers), but if you install Brian 2 system-wide then this directory will not be writable... That is odd; shouldn't weave compile it from its cache directory?

thesamovar commented May 26, 2016

Thanks for all that work. What a nightmare.

I'm happy to give up on the shared compilation and include it in each file. This has the advantage that it will naturally work with Cython on parallel processes, and generally may simplify things. It does probably raise all compilation times a bit though, and seems a bit like it wastes quite a lot of effort on recompilation, but what can you do?

The other option is that we just don't include this for codegen, but only for standalone. I think that would be a shame because at least for weave it can give some fairly decent speed improvements. It also gives us the option of having more precise codegen-invariant control over random number generation in the future if we decide that's worth doing.

Which would you rather go for?

We also need a fix for OpenMP, either we have one separate rk_state for each thread (probably the best solution) or we put in a pragma block that forces the RNG to run in a single thread.

Oh, and for the file location: yeah I noticed that it was putting the .o files in weird places too. Maybe we should just explicitly copy randomkit.c to the temp directory that weave/cython uses? It's fairly easy to find out where that directory is (we already do this for cython).
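For the OpenMP fix, the one-rk_state-per-thread option transplanted into Python for illustration (each worker holds a private generator standing in for a per-thread rk_state, so no state is shared and no locking is needed):

```python
import random
import threading

results = {}

def worker(tid, seed):
    # Each thread owns its own generator instance, the analogue of
    # giving each OpenMP thread a private rk_state.
    rng = random.Random(seed)
    results[tid] = [rng.random() for _ in range(3)]

threads = [threading.Thread(target=worker, args=(i, 1000 + i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # four independent streams, one per thread
```

Whether naively seeded per-thread Mersenne twisters are statistically independent enough is exactly the concern raised below.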

mstimberg commented May 30, 2016

I'm happy to give up on the shared compilation and include it in each file. This has the advantage that it will naturally work with Cython on parallel processes, and generally may simplify things. It does probably raise all compilation times a bit though, and seems a bit like it wastes quite a lot of effort on recompilation, but what can you do?

How would you "include it in each file" in practice, do you mean to use the standard function mechanism, i.e. add the rk_gauss code to the code currently included for _randn? How would this work for Cython?

The other option is that we just don't include this for codegen, but only for standalone. I think that would be a shame because at least for weave it can give some fairly decent speed improvements. It also gives us the option of having more precise codegen-invariant control over random number generation in the future if we decide that's worth doing.

No, I agree that it is better to also use it for runtime, in general we don't want to go via Python for weave/Cython. One disadvantage of the new approach is that you can no longer reproduce random numbers with weave/Cython, I guess this qualifies as a regression that we'd rather avoid, no? Maybe we should finally deal with the seeds issue (see also #19, #234) -- no fancy guarantees for now, i.e. reproducible numbers only when using the same codegen target etc.?

We also need a fix for OpenMP, either we have one separate rk_state for each thread (probably the best solution) or we put in a pragma block that forces the RNG to run in a single thread.

I guess for a small number of threads this might be ok, but here is a quote from the wikipedia article: "Multiple Mersenne Twister instances that differ only in seed value (but not other parameters) are not generally appropriate for Monte-Carlo simulations that require independent random number generators, though there exists a method for choosing multiple sets of parameter values." There are variants of the Mersenne Twister algorithm for exactly that use case, but if we go down this rabbit hole then we should probably directly switch to something like Random123...

Oh, and for the file location: yeah I noticed that it was putting the .o files in weird places too. Maybe we should just explicitly copy randomkit.c to the temp directory that weave/cython uses? It's fairly easy to find out where that directory is (we already do this for cython).

Actually if we do that, could we not copy it for every compilation under a name that includes the UUID of the main source file? This way the Cython problem for parallel compilation would go away by itself.
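That per-compilation copy could be a small helper along these lines (a sketch of the proposal; the function name and layout are hypothetical):

```python
import os
import shutil
import tempfile
import uuid

def copy_randomkit(src_path, cache_dir):
    # Give every code object its own randomkit_<uuid>.c so parallel
    # Cython/weave builds never race on the same .c/.o target.
    unique = "randomkit_%s.c" % uuid.uuid4().hex
    dest = os.path.join(cache_dir, unique)
    shutil.copy(src_path, dest)
    return dest

# Demonstration with a dummy file standing in for randomkit.c
cache = tempfile.mkdtemp()
src = os.path.join(cache, "randomkit.c")
with open(src, "w") as f:
    f.write("/* dummy */")

a = copy_randomkit(src, cache)
b = copy_randomkit(src, cache)
print(a != b)  # True: two builds get distinct files
```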

mstimberg commented May 30, 2016

One more thought: given that this will likely take a bit more time, should we maybe do a new RC release soon with all the bug fixes so far (and maybe a few more that are easy to do) instead of blocking on this issue? In particular the memory leak has affected quite a few users I think.

thesamovar commented May 31, 2016

Will reply in more detail later, but briefly yes: let's absolutely do a new RC.

thesamovar commented May 31, 2016

I like your last suggestion, we make a copy of randomkit.c to randomkit_<uuid>.c and copy it to the weave/cython directory. Then we use the existing mechanisms that we already implemented for weave/cython. Maybe we could also make a cut down version of randomkit that only has the functions we actually use to reduce compilation times too?

For parallel processing, I guess in that case we should just have a single thread be responsible for generating the random numbers then? It's not an ideal solution, but it works and gives good-quality, if not highly efficient, results.

I wouldn't worry too much about reproducible random numbers for now. Getting it working and giving high quality random numbers seems like the most important thing. I'd postpone the work on reproducible random numbers to 2.1.

Is that enough to get going on for now or did I miss something important?

mstimberg commented Jun 1, 2016

I like your last suggestion, we make a copy of randomkit.c to randomkit_<uuid>.c and copy it to the weave/cython directory. Then we use the existing mechanisms that we already implemented for weave/cython. Maybe we could also make a cut down version of randomkit that only has the functions we actually use to reduce compilation times too?

Yep, sounds good.

For parallel processing, I guess in that case we should just have a single thread be responsible for generating the random numbers then? It's not an ideal solution but it works and gives good quality if not highly efficient results.

This is probably the safest solution for now.

I wouldn't worry too much about reproducible random numbers for now. Getting it working and giving high quality random numbers seems like the most important thing. I'd postpone the work on reproducible random numbers to 2.1.

I agree for standalone but I think for weave/cython (i.e. our default targets) this is a big regression that we can't simply introduce like that. Judging from previous questions on the mailing list, setting numpy's seed in the beginning is something that quite a few users do (e.g. to reproduce synaptic connectivity or to do frozen noise trials). By always setting the seed randomly in the C source code, these scripts would break and we couldn't even point them to an alternative that works (except for: "use the numpy target").

mstimberg commented Jun 1, 2016

Hmm, I wonder whether it would be very complicated to pass around our own rk_state instance to all codeobjects via the namespace? We could then initialize it from numpy's RandomState.get_state() in before_run and write the state back after_run.
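The state hand-off itself is simple at the Python level; the sketch below just uses numpy's public API to show the mechanics of the read-out/write-back idea, and incidentally the replay hazard that comes with copying rather than sharing the state:

```python
import numpy as np

np.random.seed(1234)
snapshot = np.random.get_state()  # before_run: copy the MT state out
# (a tuple ('MT19937', key_array, pos, has_gauss, cached_gaussian)
# whose fields map onto randomkit's rk_state struct)

first = np.random.rand(3)         # stand-in for the compiled code object

np.random.set_state(snapshot)     # restoring a *copy* replays the stream --
replayed = np.random.rand(3)      # identical draws: the double-drawing
                                  # hazard if Python code also consumes
                                  # random numbers during the run
```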

thesamovar commented Jun 1, 2016

Reusing numpy's state would probably be the best if we can work out how to do it. One issue with copying/writing back the state is that if we call out to python code that generates a random number then we would get the same number being generated which would be a really hard to debug error. If we could get a pointer directly to the rk_state that numpy creates, that would solve this problem.

mstimberg commented Jun 1, 2016

Reusing numpy's state would probably be the best if we can work out how to do it. One issue with copying/writing back the state is that if we call out to python code that generates a random number then we would get the same number being generated which would be a really hard to debug error.

Yes, this is indeed a nasty problem -- not a very common use case but as you say, really hard to debug when it occurs.

If we could get a pointer directly to the rk_state that numpy creates, that would solve this problem.

I had a quick look at it and it does not seem to be a great solution because there is no stable C API. We'd need different code for numpy 1.10 and numpy 1.11 and potentially it will break in the future again (see this discussion: http://numpy-discussion.10968.n7.nabble.com/mtrand-c-update-1-11-breaks-my-crappy-code-td42746.html).

I think we have various "not-great-but-ok" options:

  1. Work around the problem you mentioned by reading/writing the state not only before and after a run, but also before and after network operations
  2. Have an explicit command to seed our own randomkit state. Random numbers in network operations would then come from two independent MT streams, which is not optimal, but for just two streams it should be ok, I think. You can get reproducible random numbers by setting both numpy's seed and our own.
  3. Postpone the decision and for now implement the new randomkit approach only for standalone (i.e. go back to calls via Python for weave/cython)

I think 1 will be kind of ugly so I'd prefer 2 or 3. Of those, 3 would certainly be the least amount of work. It is true that the new solution is faster, but maybe the "call numpy.rand from weave/cython" approach could be improved as well by either using a buffer size that is reasonable for the problem at hand (e.g. the size of the NeuronGroup for a state updater) or simply a bigger buffer size than we currently have (IIRC it is currently 1024) -- or did we benchmark this already and it did not make any difference?

OK, agreed that if numpy are changing randomkit.c then that option is not really feasible.

Also agree that (2) or (3) are the best options and I'm more or less happy with either. I guess given that (3) is (a) theoretically better (just one MT stream), and (b) less work, we should probably go for that?

To improve the buffering, I think the best option would be a large buffer that is shared between all the different code objects. At the moment, each code object creates its own buffer. Creating a correctly sized buffer will work in some cases (e.g. NeuronGroup) but not in others (e.g. creating synapses where you won't necessarily know how many times you'll call rand()). If each code object has its own buffer, the worst case behaviour can be terrible (e.g. create 1024 when you only need one) or more frequently just bad (e.g. create 2048 when you needed 1025). However, if this buffer is shared between code objects then everything is nicely amortised and you can use a pretty big buffer. Technically, can we do this sharing? I guess we would need to create a length N array of doubles (where N=1024 or some larger buffer size) and a length 1 integer array (holding the offset into this buffer), and then pass pointers to these two arrays to each code object.
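In Python terms, the shared buffer plus offset would behave like the sketch below (names are illustrative; the real implementation would pass pointers to the two underlying arrays into each generated code object):

```python
import numpy as np

BUFFER_SIZE = 1024
_buffer = np.empty(BUFFER_SIZE)                    # length-N array of doubles
_offset = np.array([BUFFER_SIZE], dtype=np.int64)  # length-1 offset, starts "empty"

def shared_rand():
    """Draw one value, refilling the shared buffer only when it is
    exhausted, so the refill cost is amortised across all code objects."""
    if _offset[0] >= BUFFER_SIZE:
        _buffer[:] = np.random.rand(BUFFER_SIZE)
        _offset[0] = 0
    value = _buffer[_offset[0]]
    _offset[0] += 1
    return value
```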

mstimberg commented Jun 2, 2016

Also agree that (2) or (3) are the best options and I'm more or less happy with either. I guess given that (3) is (a) theoretically better (just one MT stream), and (b) less work, we should probably go for that?

Ok, let's do that then.

However, if this buffer is shared between code objects then everything is nicely amortised and you can use a pretty big buffer. Technically, can we do this sharing? I guess we would need to create a length N array of doubles (where N=1024 or some larger buffer size) and a length 1 integer array (holding the offset into this buffer), and then pass pointers to these two arrays to each code object.

I think that shouldn't be too difficult, this state could be stored in the device and passed on to all code objects (that may or may not make use of it depending on whether they use random numbers or not). I'll have a go at this, maybe we can do this quickly and include it in the next rc release -- it would be good if it gets a good amount of testing, I am a bit cautious of including this change shortly before the official 2.0 release.

mstimberg commented Jun 2, 2016

Um, given that this branch is far behind master and we are only going to keep the changes from a few commits (your initial work on standalone), I think it will be easier to create a new branch. I'll do that and open a new pull request when it is ready.

thesamovar commented Jun 2, 2016

Makes sense!

mstimberg commented Jun 2, 2016

Cleaned up branch in #706 -- let's continue the discussion there.

@mstimberg mstimberg closed this Jun 2, 2016

@mstimberg mstimberg removed the in progress label Jun 2, 2016

@mstimberg mstimberg deleted the use_randomkit branch May 4, 2017
