# Random Number Generator




## What does it mean to generate random numbers? Why do we need it?

There are many cases where we need to generate random numbers or draw random values from a distribution.

- draw randomly from a sample: pick a lottery number; draw survey samples
- resample a dataset (e.g., for bootstrapping)
- do numerical integration
- draw values from distributions to simulate a distribution (when do we use it?)
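
Two of the uses listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset `data`, the seed `2024`, and the integrand $x^2$ are made up for the example, and `Xoshiro` assumes Julia 1.7 or later.

```julia
using Random, Statistics

rng = Xoshiro(2024)                 # hypothetical seed (Xoshiro needs Julia 1.7+)

# Resampling: draw with replacement from a made-up dataset, as in bootstrapping
data = [2.1, 3.4, 1.9, 5.6, 4.2]
resample = rand(rng, data, length(data))

# Monte Carlo integration: the integral of x^2 over (0,1) equals the mean of U^2, U ~ uniform(0,1)
n = 100_000
estimate = mean(x -> x^2, rand(rng, n))

@show resample
@show estimate      # the exact value of the integral is 1/3
```

The estimate gets closer to 1/3 as `n` grows, which is the basic idea behind using random draws for integration.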


## Is it truly *random*? 

- truly random: you cannot repeat it
  - not good for reproducibility
- pseudorandom numbers
  - ("pseudo" means false or imitation; a pseudorandom sequence is not truly random underneath)
  - use an algorithm to generate numbers
  - usually requires a *seed*, from which the numbers are generated recursively
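
To make "seed" and "recursively" concrete, here is a toy linear congruential generator. It is a teaching sketch, not what Julia uses internally: the function name `toy_lcg` is hypothetical, and the constants are the classic "minimal standard" choices, picked only for illustration. The seed is assumed to lie between 1 and `m - 1`.

```julia
# A toy linear congruential generator (LCG), for illustration only.
function toy_lcg(seed::Integer, n::Integer)
    m = 2^31 - 1        # modulus
    a = 16807           # multiplier
    state = seed        # the seed is the initial state
    draws = Float64[]
    for _ in 1:n
        state = (a * state) % m   # the next state depends only on the previous state
        push!(draws, state / m)   # rescale to the interval (0, 1)
    end
    return draws
end

@show toy_lcg(1234, 3) == toy_lcg(1234, 3)   # same seed, same sequence
@show toy_lcg(1234, 3) == toy_lcg(5678, 3)   # different seed, different sequence
```

Because each number is a deterministic function of the previous state, rerunning with the same seed replays the sequence exactly, which is what makes pseudorandom numbers reproducible.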


## Random number generation vs. random number generator (RNG)

- *random number generator* (RNG); strictly speaking, an algorithmic one is a *pseudorandom number generator* (PRNG).

- Mersenne Twister algorithm 
  - takes its name from the Mersenne prime numbers (*roughly [mer-'sen]; the name is French*)

- xoshiro algorithm
  - based on xor (*exclusive or*; "xo"), shift ("shi"), and rotation ("ro") functions

- Lehmer algorithm


In [1]:
using Random                     # standard library; no need to Pkg.add

myrng1 = MersenneTwister(1234);  # create an RNG that may be used for task-specific purposes; "1234" is the seed
myrng2 = Xoshiro(1234);          # new in Julia 1.7; better; uses the Xoshiro256++ algorithm

# using Pkg; Pkg.add("StableRNGs")
using StableRNGs
myrng3 = StableRNG(1234)         # based on LehmerRNG 

StableRNGs.LehmerRNG(state=0x000000000000000000000000000009a5)

The line `myrng2 = Xoshiro(1234)` creates a random number generator (RNG) seeded with `1234`, but the line by itself does not put the RNG into effect. There are different ways to put an RNG into effect, each with its own purpose.

### Put random seeds in "global" scope using `Random.seed!(integer_here)`

Here, "global" means it is effective throughout the script.

In [2]:
Random.seed!(1234)    # seed Julia's default (global) RNG with 1234
Random.seed!(myrng1)  # reseed myrng1 with a fresh random seed; this does not make myrng1 the global RNG
Random.seed!(myrng2)  # same for myrng2
Random.seed!(MersenneTwister(1234)) |> display  # reseeds a temporary MersenneTwister; the seed 1234 is replaced (see the output)
Random.seed!(myrng1, 5678)  # reseed myrng1 with the seed 5678

MersenneTwister(0x3febcdbc4333b315e1cded9887c4142)

MersenneTwister(5678)
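
A quick way to see what seeding does is to reseed and compare the draws. The sketch below uses only `Random`; the variable names `first_run`, `second_run`, and `rng` are hypothetical.

```julia
using Random

# Seeding the global RNG twice with the same seed replays the same stream
Random.seed!(1234)
first_run = rand(3)
Random.seed!(1234)
second_run = rand(3)
@show first_run == second_run

# Reseeding an existing RNG object with an explicit seed
rng = MersenneTwister(1234)
Random.seed!(rng, 5678)
@show rand(rng, 3) == rand(MersenneTwister(5678), 3)
```

Both comparisons should print `true`: a seed pins down the whole sequence, whether it is applied to the global RNG or to a specific RNG object.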

###### lecture notes:

Which is Julia's default algorithm? How do you figure it out?
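
One way to check, sketched here with `Random.default_rng()` (available in recent Julia versions), is to ask Julia for the RNG it uses when none is passed explicitly:

```julia
using Random

rng = Random.default_rng()   # the RNG used when none is passed explicitly
@show VERSION
@show typeof(rng)
```

On Julia 1.7 and later this is expected to report `TaskLocalRNG`, which is based on Xoshiro256++; earlier versions report `MersenneTwister`. The help entry `?Random.default_rng` in the REPL gives the details for the installed version.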

In [3]:
# Let's see some examples.

Random.seed!(123)  # seed the global RNG (affect the global scope)

a1 = rand(4)    # a vector of random numbers from uniform(0,1)
a2 = rand(4,1)  # a 4×1 matrix (not a vector) of random numbers from uniform(0,1)
a3 = rand(4,2)  # a matrix of random numbers from uniform(0,1)
a4 = randn(4,3) # a matrix of random numbers from N(0,1)

4×3 Matrix{Float64}:
  0.124124   -1.17597    0.518744
  0.0321145  -0.138399  -0.525596
  0.232291   -0.790106   1.00069
 -1.26531     1.92639   -1.24574
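
Arrays like those above differ in shape and type, not only in values. A minimal sketch of how to inspect them with `size()` and `typeof()`, which is often the first step in debugging (the names `nrow` and `ncol` are hypothetical):

```julia
a1 = rand(4)       # Vector{Float64} with 4 elements
a2 = rand(4, 1)    # Matrix{Float64} with 4 rows and 1 column

@show typeof(a1)
@show typeof(a2)
@show size(a1)         # (4,)
@show size(a2)         # (4, 1)
@show size(a2, 1)      # number of rows: 4

nrow, ncol = size(a2)  # unpack the dimensions into two variables
@show nrow, ncol
```

Even when two arrays hold the same numbers, a vector and a one-column matrix are different objects, so checking the type and size first saves time.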

###### side notes:

**Why is there an exclamation mark ("!") at the end of some function names?**

- By convention, a function name ending with "!" means that the function modifies at least one of its arguments in place.
- Some functions have both "!" and non-"!" versions.


In [None]:
# Example

list1 = rand(4)
@show list1

# ---- stop 1 ----

sort(list1)  # returns a sorted copy without modifying "list1"
@show list1

# ---- stop 2 ----

sort!(list1) # sorts "list1" in place and returns it
@show list1

###### lecture notes:
- show `size(a2)`, `size(a2,1)`, `b1, b2 = size(a2)`, etc., introduce `typeof()`
  - important for debugging
  ```julia
  a1 = rand(4)
  a2 = rand(4,1)
  # the numbers are not the same; so... add an RNG and compare: still not the same; use typeof() to check
  ```


- global seed vs. task-specific seed; why a global random seed may not be enough for reproducibility
  - be careful about the "shared" RNG

In [27]:
# It would be better to show this script in VS Code.
# println("#############")

using Random
Random.seed!(123)

# axx = rand(10) # an uninvited extra draw, which runs on the global seed

a1 = rand(2) 
a2 = randn(2) 

@show a1;
@show a2;

# ---- stop 1 ----

# bxx = rand(10) # an uninvited extra draw, which runs on the global seed

b1 = rand(MersenneTwister(123), 2)
b2 = randn(MersenneTwister(123), 2)

@show b1;
@show b2;

# ---- stop 2 ----

myrng = Xoshiro(2333)   # for task-specific purposes; re-create it and the sequence starts over

# cxx = rand(11)          # an uninvited extra draw, which runs on the global RNG but not on myrng

c1 = rand(myrng, 2)
c2 = randn(myrng, 2)

@show c1; 
@show c2; 

c1 = [0.2923977715754691, 0.4166292994124188]
c2 = [0.2609962536607125, -0.2163406590182754]
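
The effect of an uninvited extra draw can be checked directly. In this sketch (variable names are hypothetical, and `Xoshiro` assumes Julia 1.7 or later), the same extra call to `rand(10)` changes what the global RNG returns next, but leaves a task-specific RNG untouched.

```julia
using Random

# Global RNG: an unrelated draw in between shifts the stream
Random.seed!(123)
x1 = rand(2)

Random.seed!(123)
extra = rand(10)      # an unrelated draw that also consumes the global stream
x2 = rand(2)

# Task-specific RNG: the unrelated global draw does not touch it
rng = Xoshiro(2333)
y1 = rand(rng, 2)

rng = Xoshiro(2333)
extra = rand(10)      # runs on the global RNG, not on rng
y2 = rand(rng, 2)

@show x1 == x2   # false
@show y1 == y2   # true
```

This is why a single global seed may not be enough: any code that draws from the shared RNG, including code added later, changes all subsequent draws.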


### Class Exercises

- Write code to convert `a1` (a vector) to a matrix (you may have to search for the method). 

- Write code to draw a set of 10,000 random numbers that are uniformly distributed in (-2,3). (Hint: Stretch $U(0,1)$ to fit the bounds of $U(-2,3)$.)  Show the mean and the standard deviation of the series. What are the theoretical mean and standard deviation of a $U(-2,3)$? Are your answers close to the theoretical values?

- Write code to draw a 10x2 matrix of random numbers from $N(2,3)$ which is a normal distribution with mean=2 and variance=3:

  - use `randn()`; (Hint: `randn()` generates N(0,1) random variables; you have to scale it to the appropriate mean and variance.)
  - use `rand()`. (Hint: `rand()` could take distributions as arguments. See the help file.)
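
The hints rely on shifting and scaling standard draws. The sketch below shows the idea with parameters that are deliberately different from the exercises; the values of `a`, `b`, `mu`, `sigma`, and the seed `42` are all made up for illustration.

```julia
using Random, Statistics

rng = Xoshiro(42)                 # hypothetical seed (Xoshiro needs Julia 1.7+)

# U(0,1) -> U(a,b): stretch by (b - a), then shift by a
a, b = 1.0, 5.0
u = a .+ (b - a) .* rand(rng, 10_000)

# N(0,1) -> N(mu, sigma^2): scale by the standard deviation, then shift by mu
mu, sigma = 10.0, 0.5
z = mu .+ sigma .* randn(rng, 10_000)

@show mean(u), std(u)    # theory: (a + b)/2 and (b - a)/sqrt(12)
@show mean(z), std(z)    # theory: mu and sigma
```

Note that the scaling factor for a normal variable is the standard deviation, not the variance. Likewise, `Normal(μ, σ)` in Distributions.jl takes the standard deviation as its second argument.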

Now that you have generated random numbers from a normal random variable, let's see how the generated values match the true distribution by drawing histograms.

In [1]:
# using Pkg; Pkg.add(["Distributions", "Plots", "Interact", "WebIO", "StatsPlots", "LaTeXStrings"])
using Distributions, Plots, Interact, WebIO, StatsPlots, LaTeXStrings

d = Normal(-1,2)

@manipulate for N in (100:100:5000)    
    histogram(rand(d,N), normalize=true,  bins=100)
    plot!(d)
end

# Question: What if I want to show the "exact" same graphs every time I run the code?
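
One possible answer to the question above is to draw the sample from an RNG that is created from a fixed seed every time. The sketch below is an assumption about how this could be done, not the only way: `draw_sample` and the seed `2024` are hypothetical, and it uses `randn` from `Random` so that it runs without the plotting packages.

```julia
using Random

# Draw N values from a normal distribution with mean -1 and standard deviation 2,
# using an RNG created from a fixed seed.
function draw_sample(N; seed = 2024)
    rng = Xoshiro(seed)              # a fresh RNG on every call (Julia 1.7+)
    return -1 .+ 2 .* randn(rng, N)  # shift and scale N(0,1) draws
end

@show draw_sample(500) == draw_sample(500)

# With Distributions and Plots loaded, the same idea inside the loop would be
#     histogram(rand(Xoshiro(2024), d, N), normalize=true, bins=100)
```

Since the RNG is rebuilt from the same seed on each call, the same `N` always produces the same sample and therefore the same histogram.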

## Other Comments

- Don't assume random numbers will be the same between Julia versions. See the [doc](https://docs.julialang.org/en/v1.5/stdlib/Random/) here. That is, if you run the same code `myrandom = rand(MersenneTwister(123), 10)` on different versions of Julia, you may get a different `myrandom`, even though you've specified a local RNG. This can be a problem because you may not be able to reproduce the exact results of your program after Julia is upgraded. So, at the very least, document the Julia version used to produce your results. (BTW, different operating systems and different types of CPUs may also influence numerical details. Documentation is important.)


- If you want random numbers to be the same between versions, use [StableRNGs](https://juliahub.com/ui/Packages/StableRNGs/fu6AW/1.0.0). For instance, `rng = StableRNG(seed::Integer)`.

  ```julia
  using StableRNGs, Test
  rng = StableRNG(123)
  A = randn(rng, 10, 10) # instead of randn(10, 10)
  @test inv(inv(A)) ≈ A  # a non-random matrix may be rank deficient and thus not invertible
  x = [1.1, 2.2, 3.1, 4.5, 5.3, 6.1, 4.4, 3.2, 2.9, 9.0] # any vector of length 10
  @test A \ (A*x) ≈ x   # another test using the RNG
  ```

- StableRNG is currently an alias for LehmerRNG, and implements a well-understood linear congruential generator (LCG); an LCG is not state of the art, but is fast and is believed to have reasonably good statistical properties.


- The StableRNG is not as good as MersenneTwister or Xoshiro, but it is simple and less prone to problems.


- Starting from Julia 1.7, the default RNG switched from MersenneTwister to Xoshiro (a much faster pseudo RNG that is easier to parallelize and has better statistical properties). Julia 1.7 also has a separate RNG object per task, which also changes the stream of random numbers. 


- Also note that due to performance improvements and improvements to numerical accuracy, exact bit patterns for floating point results are not guaranteed between versions.
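
Since results can depend on the Julia version and the machine, it helps to record them next to the output. A minimal sketch, where the name `run_info` and the seed value are hypothetical:

```julia
# Record the environment alongside the results
run_info = (julia_version = string(VERSION),
            os            = string(Sys.KERNEL),
            machine       = Sys.MACHINE,
            seed          = 123)   # hypothetical seed used for the run

println(run_info)
```

Saving this information with the results makes it easier to explain differences when the same script is rerun later on another setup.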


[//]: # "If students have learned Stata, ask some of them to do a presentation on DataFrames vs. Stata, also introducing DataFramesMeta (and something like that). Resources [here](https://dataframes.juliadata.org/stable/man/comparisons/), [here](https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_stata.html), [here](https://ahsmart.com/assets/pages/data-wrangling-with-data-frames-jl-cheat-sheet/DataFramesCheatSheet_v0.21_rev3.pdf), and [here](https://towardsdatascience.com/going-from-stata-to-pandas-706888525acf)."
