<img src="https://user-images.githubusercontent.com/50221806/85330638-a467b280-b489-11ea-8e64-7e7390afea32.png" style="float: left" width=500 />

# Programming for Data Analysis
***
## Assignment - numpy.random

#### Create a notebook that explains the use of the NumPy package including detailed explanations of at least five of the distributions provided for in the package.

#### There are four distinct tasks to be carried out in your Jupyter notebook
<ol>
    <li>Explain the overall purpose of the package.</li>
    <li>Explain the use of the “Simple random data” and “Permutations” functions.</li>
    <li>Explain the use and purpose of at least five “Distributions” functions.</li>
    <li>Explain the use of seeds in generating pseudorandom numbers.</li>
</ol>

## NumPy and numpy.random
***
### NumPy Background

`NumPy`, short for Numerical Python, is considered a foundational Python package for numerical computations [1]. The NumPy package was established by Travis Oliphant in 2005. It was meant as a successor to two earlier scientific Python libraries, Numeric and Numarray, with the goal of bringing a fragmented scientific computing community together around a single framework [2][3].

NumPy is an open-source external Python module that provides common mathematical and numerical routines as fast, pre-compiled functions for manipulating large arrays and matrices of numeric data [4]. NumPy is not part of the Python standard library, but it can be installed with a package manager such as `pip`. Alternatively, NumPy comes pre-installed with the `Anaconda distribution`, which is recommended as the simplest way to get started with scientific computing and data science [5].

Because it is an external package, NumPy must be imported with an import statement before any of its functions are accessible. This can be done as shown in the code below. The NumPy package is imported **as np** to save time when writing multiple commands: you only need to call np.x instead of numpy.x. The **numpy as np** convention is widely used, so other users reading the code will recognise it [4][6].

```python
import numpy as np

x = np.random.default_rng()
```

If you only require a sub-package or function from NumPy, you can import it directly into the current Python namespace using a from statement, as below. This allows you to call package.function without the np prefix. You can also import the function itself, as seen in the second example below [4][6].

```python
from numpy import random

x = random.default_rng()
```
or

```python
from numpy.random import default_rng

x = default_rng()
```

### NumPy's random package
***
NumPy's random sub-module is designed to generate pseudorandom numbers and sequences from different statistical distributions [7]. The numpy.random package has functions for efficiently generating arrays of sample values.

In the example below, the np.random.random function is used to create a single pseudorandom number in the half-open interval [0, 1) and assign it to the variable x as a one-element array. It can also be used to efficiently create an array of values: the second call creates an array with three rows and three columns and assigns it to the variable y. This is an improvement on Python's built-in random module, which only samples one value at a time [8].

```python
import numpy as np

x = np.random.random(size=1)
y = np.random.random(size=(3, 3))

print(x)
print(y)
```

```
[0.87079218]
[[0.25289999 0.84105557 0.60464578]
 [0.5603116  0.83397931 0.00876605]
 [0.3812025  0.12068529 0.43782015]]
```
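
As a rough illustration of the difference from the standard library, the sketch below draws the same number of values with Python's built-in random module and with NumPy. The seed value 42, the sample count and the variable names are assumptions chosen for this example only.

```python
import random

import numpy as np

n = 5  # number of samples to draw (arbitrary, for illustration)

# Built-in module: random.random() returns one float per call,
# so a loop or comprehension is needed to build a collection.
random.seed(42)
py_values = [random.random() for _ in range(n)]

# NumPy: a single call returns an array of the requested shape.
rng = np.random.default_rng(42)
np_values = rng.random(size=n)
np_grid = rng.random(size=(3, 3))

print(type(py_values), len(py_values))
print(type(np_values), np_values.shape)
print(np_grid.shape)
```

The NumPy calls return ndarray objects directly, so the results can be used in vectorised arithmetic without any conversion step.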


#### Pseudo random number generation
***
It is extremely difficult for computers and computer programs to generate truly random numbers. Instead, programs such as NumPy's random module create what are called `pseudorandom numbers`. These programs use algorithms with defined, deterministic behaviour to generate a value or range of values that seem random to an observer [8][9].
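
To make "deterministic behaviour" concrete, here is a minimal sketch of a linear congruential generator, one of the simplest pseudorandom algorithms. It is a toy for illustration only and is not the algorithm NumPy uses; the function name `lcg` and the constants (commonly quoted values from Numerical Recipes) are assumptions of this example.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudorandom floats in [0, 1) from a toy linear congruential generator."""
    values = []
    state = seed
    for _ in range(n):
        # Each new state is computed purely from the previous one.
        state = (a * state + c) % m
        values.append(state / m)
    return values


first = lcg(seed=1, n=3)
second = lcg(seed=1, n=3)
other = lcg(seed=2, n=3)

print(first == second)  # same starting state, same sequence
print(first == other)   # different starting state, different sequence
```

Because every value follows from the starting state, the output only looks random: anyone who knows the algorithm and the starting state can reproduce the whole sequence.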

NumPy's recommended pseudorandom number generator (PRNG) interface, `Generator`, uses O’Neill’s permutation congruential generator algorithm, `PCG64`, as the default method for generating random numbers [10]. Legacy versions of numpy.random, as well as Python's `stdlib random module`, use the Mersenne Twister algorithm, `MT19937` [11]. Historically, NumPy had a strict backwards compatibility policy for its random number generation functions. This restricted upgrades and improvements, as any change had to comply with the policy. More recent releases are not strictly compatible with previous versions, allowing changes such as the move to the PCG64 algorithm [12]. The PCG family of algorithms is considered faster, more efficient and less predictable than most other generators, including the Mersenne Twister [13].

As shown below, it is still possible to access the Mersenne Twister algorithm to generate random numbers, but in recent versions of NumPy the default algorithm is PCG64. When default_rng and Generator(PCG64()) are given the same seed value, they produce identical outputs.


```python
from numpy.random import Generator, PCG64, MT19937, default_rng

sd = 1234 # create common seed value

z = default_rng(sd) # Initialise a random number generator using numpy's default RNG
x = Generator(PCG64(sd)) # Initialise Generator object using PCG64 algorithm
y = Generator(MT19937(sd)) # Initialise Generator object using Mersenne Twister algorithm

print(z.random(size=3))
print(x.random(size=3))
print(y.random(size=3))
```

```
[0.97669977 0.38019574 0.92324623]
[0.97669977 0.38019574 0.92324623]
[0.12038356 0.40370142 0.87770263]
```
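
The same behaviour can be checked for a single generator type: re-creating a generator with the same seed replays the same sequence, while a different seed gives a different one. This is a minimal sketch, assuming the seed 1234 used in the example above; the variable names are illustrative.

```python
import numpy as np
from numpy.random import default_rng

seed = 1234  # any fixed integer works; 1234 matches the example above

a = default_rng(seed).random(size=3)
b = default_rng(seed).random(size=3)      # same seed: same values
c = default_rng(seed + 1).random(size=3)  # different seed: different values
d = default_rng().random(size=3)          # no seed: seeded from OS entropy

print(np.array_equal(a, b))
print(np.array_equal(a, c))
print(d)  # expected to change from run to run
```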


#### Seed
***




## References
***

1. McKinney, W., (2018). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition. p85
2. McKinney, W., (2018). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition. p86
3. numpy.org, (2020). About Numpy. https://numpy.org/doc/stable/about.html
4. Shell, M. S., (2019). An introduction to Numpy and Scipy. p2. https://sites.engineering.ucsb.edu/~shell/che210d/numpy.pdf
5. numpy.org, Installing Numpy. https://numpy.org/install/
6. numpy.org, (2020). NumPy: the absolute basics for beginners. https://numpy.org/devdocs/user/absolute_beginners.html
7. numpy.org, (2020). Random sampling (numpy.random). https://numpy.org/doc/stable/reference/random/index.html?
8. McKinney, W., (2018). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, Second Edition. p118
9. w3schools, (2020). w3schools.com. https://www.w3schools.com/python/numpy_random.asp
10. numpy.org, (2020). Random Generator. https://numpy.org/doc/stable/reference/random/generator.html
11. numpy.org, (2020). Legacy Random Generation. https://numpy.org/doc/stable/reference/random/legacy.html#numpy.random.RandomState
12. numpy.org, (2019). Random Number Generation Policy. https://numpy.org/neps/nep-0019-rng-policy.html
13. pcg-random.org, (2018). PCG, A Family of Better Random Number Generators. https://www.pcg-random.org/
    
    
   