RFC: Add a method of randbool() that accepts an RNG as input #6014

Closed
wants to merge 1 commit into from

jperla

@jperla jperla commented Mar 2, 2014

No description provided.

@@ -334,19 +334,23 @@ bitpack{T,N}(A::AbstractArray{T,N}) = convert(BitArray{N}, A)

 ## Random ##

-function bitarray_rand_fill!(B::BitArray)
+function bitarray_rand_fill!(B::BitArray, random_function::Function)
Sponsor Member

Currently this is going to be slow, but it also seems odd to suddenly change from passing an AbstractRNG to passing a function.

Author

Oh, I was trying to avoid code duplication (I can just copy the whole method instead). Is there a better way to avoid this?

Author

Okay, yeah, it does seem to be much slower (roughly 4x to 7x in most of the runs below):

root# julia speed_test.jl 10                                                                                           
elapsed time: 0.007485235 seconds (139956 bytes allocated)                                                             
elapsed time: 0.009702895 seconds (279740 bytes allocated)                                                             
elapsed time: 0.000283116 seconds (0 bytes allocated)                                                                  
elapsed time: 0.001188023 seconds (160000 bytes allocated)                                                             
root# vi speed_test.jl                                                                                                 
root# julia speed_test.jl 10000                                                                                        
elapsed time: 0.022272985 seconds (139956 bytes allocated)                                                             
elapsed time: 0.128069668 seconds (25557260 bytes allocated)                                                           
elapsed time: 0.015917311 seconds (0 bytes allocated)                                                                  
elapsed time: 0.113222569 seconds (25437328 bytes allocated)   


root# cat speed_test.jl
N = ARGS[1]

a = BitArray(int64(N))

@time for i in 1:10000
    Base.bitarray_rand_fill!(a, true)  # the original function
end

@time for i in 1:10000
    Base.bitarray_rand_fill!(a, rand)  # with passed-in random function
end

@time for i in 1:10000
    Base.bitarray_rand_fill!(a, true)  # the original function
end

@time for i in 1:10000
    Base.bitarray_rand_fill!(a, rand)  # with passed-in random function
end

Author

The number of bytes allocated is curious. Why is this happening? @carlobaldassi

@jperla
Author

jperla commented Mar 9, 2014

@JeffBezanson updated to be faster

@JeffBezanson
Sponsor Member

This is becoming kind of a tangled mess of definitions.
Definitions like

randbool(r::MersenneTwister, dims::Dims) = rand!(r, BitArray(dims))

are too specific; it looks like they'd apply to any AbstractRNG, not just MersenneTwister.

I'm not sure about DefaultGlobalRNG; it introduces its own form of redundancy:

rand(::DefaultGlobalRNG) = rand()
rand() = rand(some_default)

Whatever some_default is could always be used instead of DefaultGlobalRNG. Also in theory (and in practice when people less familiar with the codebase are involved) you could define methods for DefaultGlobalRNG that make it behave differently from the actual default RNG, which would get very confusing.
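
A minimal sketch of the more general definition suggested here, written in current Julia (1.x) syntax rather than the 2014 syntax used in this thread. The name `randbits` is hypothetical and only illustrates dispatching on `AbstractRNG`; it is not the definition that was proposed in this pull request.

```julia
using Random

# Hypothetical helper: dispatch on AbstractRNG instead of MersenneTwister,
# so the same method serves every generator type.
randbits(r::AbstractRNG, dims::Dims) = rand!(r, BitArray(undef, dims))

# Works with a seeded MersenneTwister...
a = randbits(MersenneTwister(5), (5, 5))

# ...and with any other AbstractRNG, without a second definition.
b = randbits(RandomDevice(), (2, 3))
```

Because the signature names only the abstract type, no per-generator copies of the method are needed.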

@jperla
Author

jperla commented Mar 11, 2014

Okay I've cleaned up the definitions to be as general as they can be.

I also modified the definition of bitarray_rand_fill to avoid the DefaultGlobalRNG, and avoid code duplication. I'm not sure that using "nothing" is a good idea.

@JeffBezanson
Sponsor Member

Well, sooner or later we're going to have to decide how the default RNG is actually specified. We can't put a branch everywhere to call either rand() or rand(rng); that would be crazy.

For example const DefaultRNG = MersenneTwister(...). But I don't think we've decided whether to do this, or which random-related functions should take AbstractRNG arguments (I suppose all of them?).

@jperla
Author

jperla commented Mar 11, 2014

Unless I'm missing something, you can't just do "const DefaultRNG = MersenneTwister(...)" because we call directly into C code a lot, which wouldn't be able to use this constant anyway. I'm not sure how best to switch on this, so I put this flag in.

Yes, Stefan said it makes sense for them all to take RNGs.

@JeffBezanson
Sponsor Member

The C code doesn't need to use the constant. One can write

rand(r::MersenneTwister) = dsfmt_genrand_close_open(r.state)

const DefaultRNG = MersenneTwister(seed)

rand() = rand(DefaultRNG)
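
The same forwarding pattern can be sketched in current Julia (1.x) without touching the C layer. The names `DEFAULT_RNG` and `draw` are hypothetical stand-ins chosen so that nothing in Base is redefined; this is an illustration of the idea, not how `rand()` is actually implemented.

```julia
using Random

# Hypothetical names throughout; this does not redefine anything in Base.
const DEFAULT_RNG = MersenneTwister(42)   # one concrete, typed global generator

# All real work happens in the method that takes an explicit generator.
draw(r::AbstractRNG) = rand(r, Float64)

# The zero-argument form only forwards to the default; no branch is needed.
draw() = draw(DEFAULT_RNG)
```

Since `DEFAULT_RNG` is a `const` with a concrete type, the forwarding call can be resolved at compile time. Whether that is as fast as a dedicated global-state C function is exactly the open question raised in this thread.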

@JeffBezanson
Sponsor Member

I guess the only problem is if that has some overhead. If it is too much slower than the dsfmt_gv_genrand functions, I guess we'd have to use your DefaultGlobalRNG approach.

@jperla
Author

jperla commented Mar 11, 2014

We can do that (I'm sure you mean rand() = rand(DefaultRNG.state)), but then we won't be using any of the _gv_ functions (which may not matter, or may cost some speed).

@jperla
Author

jperla commented Mar 11, 2014

Yea

@jperla
Author

jperla commented Mar 11, 2014

I'm not crazy about it either, but I'm not sure about the fastest way to do this dispatching with C and julia mixed in (I don't know Julia or C as well as you guys).

It seems like the branching everywhere is unavoidable if we want these fast _gv_ calls where they are.

@jperla
Author

jperla commented Mar 11, 2014

I was trying to get a discussion about how to do DefaultRNG going with this WIP: #6110 .

I understand that MersenneTwister is an object with a seed state, but since you won't be passing it to methods anyway, you may want to avoid the duplication of the seed state since it could just go out of sync with the global RANDOM_SEED and seed in the dsfmt_gv_ C code. Then it just becomes an indicator that you're using the dsfmt_gv_ stuff.

@jperla
Author

jperla commented Mar 11, 2014

Yeah, I'm looking at adding an RNG to sprand() too. To avoid duplication, the best option is either this weird use_default_rng=true with a keyword arg, or the DefaultGlobalRNG.

I think DefaultGlobalRNG will make it easier for all users of rand() in core and in user libraries.

@JeffBezanson
Sponsor Member

Do you have any sense of how much faster the _gv_ functions are?

@@ -333,16 +333,21 @@ bitpack{T,N}(A::AbstractArray{T,N}) = convert(BitArray{N}, A)

 ## Random ##

-function bitarray_rand_fill!(B::BitArray)
+function bitarray_rand_fill!(B::BitArray, use_default_rng::Bool=true; rng=nothing)
     if length(B) == 0
Contributor

Unfortunately, Julia does not specialize on keyword arguments, so the actual type of rng is unknown when the compiler generates the code, which results in a big performance penalty.

Random number generating functions lie at the heart of many performance-critical applications, and therefore we should be very cautious about performance issues, and should ensure that no performance penalty is caused by failure of type inference.

I am wondering why you have to put rng as keyword argument. I think you can simply do

function bitarray_rand_fill!(B::BitArray, rng::AbstractRNG)
    ...
end
bitarray_rand_fill!(B::BitArray) = bitarray_rand_fill!(B, default_rng)

In this way, the type of each argument is known at compile time, and as a consequence we get much more performant code.
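
A runnable sketch of this positional-argument pattern in current Julia (1.x). The name `fill_random_bits!` is a hypothetical stand-in for `bitarray_rand_fill!`, the bit-by-bit loop is a simplification of the real chunk-wise fill, and `Random.default_rng()` (available in Julia 1.3 and later) is assumed in place of the `default_rng` placeholder above.

```julia
using Random

# Hypothetical stand-in for bitarray_rand_fill!. The generator is a positional
# argument, so the method is compiled for the concrete RNG type passed in.
function fill_random_bits!(B::BitArray, rng::AbstractRNG)
    for i in eachindex(B)
        B[i] = rand(rng, Bool)   # simplified: one bit at a time
    end
    return B
end

# Convenience method that forwards to the default generator.
fill_random_bits!(B::BitArray) = fill_random_bits!(B, Random.default_rng())

bits = fill_random_bits!(falses(64), MersenneTwister(1))
```

Keyword-argument handling in the compiler has changed since this 2014 discussion, so the penalty described above may not carry over to current versions; the positional form remains the conventional signature for RNG-taking functions.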

Author

Yea, that's the most straightforward way, but see above for discussions about the problems with default_rng.

Also, we'll have to move AbstractRNG over to another file, since it can't be used in bitarray.jl

Contributor

I still don't see the reason why this doesn't work.

Even if that is really the case, I would say it is still worth duplicating code to avoid a performance penalty in such a basic function, which is going to be used in performance-critical code.

Contributor

Also, I think we should first decide how to specify the default RNG. This would make everything much simpler. Testing use_default_rng in a tight inner loop is a performance killer and not a very elegant solution.

@jperla
Author

jperla commented Mar 11, 2014

I saw perf tests in the code that mention _gv_. I'll see if those are informative, or build some.

@jperla
Author

jperla commented Mar 11, 2014

Okay, it's just used in the micro benchmarks. I'll make a perf test and put it into test/perf.

@jperla
Author

jperla commented Mar 11, 2014

@JeffBezanson about 50% slower

elapsed time: 1.641044899 seconds (639983688 bytes allocated)                                                          
elapsed time: 2.339002073 seconds (799985952 bytes allocated)                                                          
elapsed time: 1.66162122 seconds (639983688 bytes allocated)                                                           
elapsed time: 2.444654077 seconds (799983688 bytes allocated)                                                          
elapsed time: 1.621061577 seconds (639983688 bytes allocated)                                                          
elapsed time: 2.407146114 seconds (799983688 bytes allocated)  
root# cat random/gv.jl                                                                                                 
# Testing the performance difference of using dsfmt_gv versus explicit randomness                                      
using Base.Test                                                                                                        
using Base.LibRandom                                                                                                   

include("../perfutil.jl")                                                                                              

N_ITER = 10000000                                                                                                      

dsfmt_gv_init_by_array([uint32(42)])                                                                                   
rng = MersenneTwister(uint32(42))                                                                                      

@assert rand(rng) == dsfmt_gv_genrand_close_open()                                                                     

@time for i in 1:N_ITER                                                                                                
    dsfmt_gv_genrand_close_open()                                                                                      
end                                                                                                                    

@time for i in 1:N_ITER                                                                                                
    dsfmt_genrand_close_open(rng.state)                                                                                
end                                                                                                                    

dsfmt_gv_init_by_array([uint32(42)])                                                                                   
rng = MersenneTwister(uint32(42))                                                                                      

@assert rand(rng) == dsfmt_gv_genrand_close_open()                                                                     

@time for i in 1:N_ITER                                                                                                
    dsfmt_gv_genrand_close_open()                                                                                      
end                                                                                                                    

@time for i in 1:N_ITER                                                                                                
    dsfmt_genrand_close_open(rng.state)                                                                                
end                                                                                                                    
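
As a quick check of the "about 50% slower" figure, the ratios can be computed from the six timings printed above. This is plain arithmetic on the reported numbers, not a new measurement. It assumes the lines alternate between the `_gv_` call and the explicit-state call, in the order the loops appear in the script; the variable names are hypothetical.

```julia
# Timings copied from the output above, in seconds.
# Assumption: odd lines are the _gv_ loop, even lines the explicit-state loop.
gv       = [1.641044899, 1.66162122, 1.621061577]    # dsfmt_gv_genrand_close_open()
explicit = [2.339002073, 2.444654077, 2.407146114]   # dsfmt_genrand_close_open(rng.state)

# Relative slowdown of the explicit-state call for each pair of runs.
slowdown = explicit ./ gv .- 1

# Print the slowdowns as percentages, rounded to one decimal place.
println(round.(100 .* slowdown; digits=1))
```

Under that pairing, each slowdown falls between roughly 42% and 49%, which is consistent with the estimate given.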

@ViralBShah
Member

How does this stand now? Would be nice to get it merged.

@ViralBShah
Member

Recent RNG changes by @rfourquet have implemented this:

julia> rand(MersenneTwister(5), Bool, 5, 5)
5x5 Array{Bool,2}:
 false   true  false  false   true
 false   true  false   true  false
  true   true   true   true  false
 false   true  false  false  false
  true  false  false  false  false

julia> randbool(MersenneTwister(5), 5, 5)
5x5 BitArray{2}:
 false  false  false  false   true
  true  false  false  false  false
  true  false   true   true   true
 false  false  false  false   true
 false  false  false  false  false
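
For readers on current Julia (1.x), a rough equivalent of the session above is sketched below. It assumes `bitrand` from the `Random` standard library as the later replacement for `randbool`; the exact values drawn may differ between Julia versions, so none are shown.

```julia
using Random

# Explicit generator passed as the first argument in both calls.
A = rand(MersenneTwister(5), Bool, 5, 5)   # dense Array{Bool,2}
B = bitrand(MersenneTwister(5), 5, 5)      # packed BitArray{2}

# Re-creating the generator with the same seed reproduces the same draws.
B_again = bitrand(MersenneTwister(5), 5, 5)
```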

@ViralBShah ViralBShah closed this Nov 22, 2014
Labels
domain:randomness Random number generation and the Random stdlib
4 participants