# Programming for Data Analysis Assignment 2019

## Problem Statement
The assignment concerns the `numpy.random` package in Python.
Explain the use of the package including detailed explanations of at least five of the distributions provided for in the package.
There are four distinct tasks to be carried out in the Jupyter Notebook.
1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions. 
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.


## 1. Explain the overall purpose of the package.

### some quick notes here.
- what is the stated aim of the package?
- why would you want to generate random numbers?
- Python's built-in `random` module can also generate random numbers, so what is the difference?
- how are random numbers generated

## What is NumPy.random

`numpy.random` is a sub-package or module of the numpy package that generates random numbers. There are many different ways to generate random numbers which are in fact only pseudorandom numbers as computers do not actually generate random numbers.

Python has a built-in **random** module which implements pseudo-random number generators for various distributions. See [python docs/random library](https://docs.python.org/3/library/random.html). The distribution functions in this built-in `random` module return one value per call.
`numpy.random` is a sub-package or module of the NumPy package that supplements the `random` module's functions with functions that can efficiently generate whole arrays of sample values from various probability distributions, rather than one value at a time. For large samples this is much faster.


The numpy.random package is used to generate random numbers (or pseudo random numbers) so I guess the question first is why would you want to generate random numbers!

NumPy is a data manipulation module for python with tools that operate on arrays of numbers. Base python does not have an array data structure so NumPy allows the user to create and manipulate arrays of numbers.
NumPy is one of the most important packages for numerical and scientific computing in python and many packages that are used for data analytics are based on this package. 

NumPy functions can operate on large arrays of numbers and are therefore very useful for statistics and data analytics. NumPy is used for manipulating data in datasets that can be very large.

NumPy has a `random` package which can be used to create random samples from a dataset and this is something that is used frequently in data analysis and machine learning projects. Being able to randomly select elements also has many uses in applications such as gaming.

Numpy.random allows you to create random samples from a dataset. I'm guessing that the `numpy.random` package works on the multi-dimensional `ndarray` arrays that are the key data structure in the `NumPy` package and would therefore be far more efficient and faster than the raw python package.  

Given an input array of elements, random functions will allow you to select a random sample of elements from the array.

You can also create random samples with the NumPy random package which allows you to generate arrays of numbers from a particular probability distribution and this could be useful for creating a toy dataset in the absence of a real dataset or for testing or even just for learning!
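The two uses described above can be sketched as follows. This is only a minimal illustration: the array `data` is a made-up stand-in for a real dataset, and the mean of 50 and standard deviation of 5 are arbitrary values I chose for the toy data.

```python
import numpy as np

# A made-up stand-in for a real dataset: the integers 10 to 19.
data = np.arange(10, 20)

# Select 3 elements at random from the existing array.
subset = np.random.choice(data, size=3)
print(subset)

# Create a toy dataset: 100 draws from a normal distribution
# with mean 50 and standard deviation 5 (arbitrary values).
toy = np.random.normal(loc=50, scale=5, size=100)
print(toy[:5])
```
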

Python has a built-in module called `random` which is part of the base python and this module provides a number of tools for working with random numbers. NumPy's random module is aimed more at generating random series of data rather than the scalar values which are generated by the python random module.

Python's built-in `random` module implements pseudo-random number generators for various distributions. See [python docs/random library](https://docs.python.org/3/library/random.html). However, the distribution functions in this built-in `random` module return one value per call, so a loop is needed to build up a sequence of random numbers.
`numpy.random` provides functions that can efficiently generate whole arrays of sample values from various probability distributions in a single call, rather than one value at a time. For large samples this is much faster.
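A small sketch of the difference, assuming nothing beyond the standard library and NumPy: the built-in module needs a loop (here a list comprehension) to collect five floats, while `numpy.random` returns an array of five floats from one call.

```python
import random
import numpy as np

# Built-in module: one float per call, so a sequence needs a loop.
python_values = [random.random() for _ in range(5)]

# numpy.random: a single call returns a whole ndarray of floats.
numpy_values = np.random.random_sample(5)

print(type(python_values), len(python_values))
print(type(numpy_values), numpy_values.shape)
```
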

### numpy.random functions
There are many many functions in the numpy.random package that relate to generating random numbers from different probability distributions and therefore it is not really possible to know all of these without referring to the documentation. 
Plots can illustrate the difference between different probability distributions. 
See also the random generator functions: `RandomState`, `seed`, `get_state` and `set_state`.

### Generating random numbers
Numbers generated by either the built-in python `random` module or the `numpy.random` package are not actually random at all but *pseudorandom* numbers. This is because computers cannot actually generate random numbers but what they do is generate sequences that look like random numbers, so that someone else wouldn't be able to predict the next number generated without a key piece of information which is called the **seed**. If you know the seed then you can predict the next number to be generated in a sequence and therefore the numbers generated cannot be actually random as such.

When you do not supply a seed, one is chosen for you, typically from the operating system's source of random data where available or otherwise from the time on the computer when the code was run. This default seed is decided when the package is imported into Jupyter or a Python script. Sometimes you may want to recreate the exact same sequence of random numbers, maybe when testing code, demonstrating something or teaching, so that the results can be reproduced. If, instead of letting the seed default, you provide or **set** the seed, then the exact sequence can be reproduced.

Random (pseudo-random) numbers are drawn from a probability distribution. The numbers generated depend on the *seed* used and are generated according to some deterministic algorithm from that seed. 



### First a quick overview of the NumPy package

`NumPy` is a specialist Python package that works well with large arrays of numbers. Plain Python has no built-in matrix type, and emulating matrix operations with nested lists and loops is quite inefficient when compared to NumPy's capabilities. NumPy works easily with multi-dimensional arrays and performs matrix operations far more efficiently.
Matrix operations are very commonly used when analysing data.

While NumPy provides a computational foundation for working with numbers, it does not have the modelling or scientific functionality of other computational packages in Python but these other packages do use NumPy's array objects. NumPy is typically used through other packages. 

The NumPy package has many algorithms for dealing with numerical operations on arrays. Operations can be performed on NumPy arrays very quickly. 

NumPy can also be used to simulate data. While csv data is commonly imported into NumPy for analysis or into a package such as pandas that uses NumPy, it is often handy to be able to simulate data to analyse before the real data may be collected or available.

The [numpy quickstart tutorial](https://numpy.org/devdocs/user/quickstart.html) provides a good overview of the NumPy package.

[what is NumPy](https://docs.scipy.org/doc/numpy/user/whatisnumpy.html)

- NumPy is short for Numerical Python. 
- NumPy is one of the most important foundational packages for numerical computing in Python. 
- NumPy's main data structure is the `ndarray`, a homogeneous multidimensional array. An *ndarray* is also known by the alias *array*.
- A NumPy array is like a table of elements that are all of the same type. The elements which are usually numbers are indexed by a tuple of positive integers. Axes refer to the dimensions of a NumPy array.

- NumPy has many mathematical functions that avoid the need to use loops as they work on entire arrays of data.
- NumPy is designed for efficiency on large arrays of data which is partly due to the way NumPy stores data in a contiguous block of memory. NumPy's algorithms can operate on this memory without any type checking. NumPy arrays use much less memory than python's own built-in sequences. 
- NumPy arrays use *vectorisation*. This is where batch operations can be performed on arrays without using loops. 
- arithmetic operations are carried out *elementwise* on an array and result in a new array being created.
- some operations on arrays actually modify an array in place and do not create a new one. `+=`, `-=`
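The last two points can be checked with a short sketch. The arrays here are arbitrary examples of my own.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Elementwise arithmetic without a loop: a new array is created, a is unchanged.
b = a * 2
print(a, b)

# In-place operation: the existing array object is modified.
a_id = id(a)
a += 10
print(a, id(a) == a_id)
```
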
 


#### NumPy's ndarray object. 
NumPy's *ndarray* is an N-dimensional or multi-dimensional array object. It is a fast and flexible container for larger datasets in Python. Mathematical operations can be performed on entire arrays using the same kind of syntax to similar operations on scalar values.  
An ndarray can be created using the Numpy `array` function which takes any sequence-like objects such as lists, nested lists etc. Other NumPy functions create new arrays such as `zeros`, `ones`, `empty`, `arange`, `full` and `eye` among others. All these functions create an array object. 

Some key points about the ndarray objects include:
- The data in an *ndarray* must be homogeneous, that is all of its data elements must be of the same type.
- Arrays have a *ndim* attribute for the number of axes or dimensions of the array.
- Arrays have a *shape* attribute which is a tuple that indicates the size of the array in each dimension.
The length of the *shape* tuple is the number of axes that the array has.
- Arrays have a *dtype* attribute which is an object that describes the data type of the array.
- The `size` attribute is the total number of elements in the array.
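As a quick illustration of these attributes, here is a small sketch using `zeros`, one of the array creation functions mentioned above. The shape (2, 3) is just an example.

```python
import numpy as np

arr = np.zeros((2, 3))   # 2 rows and 3 columns, filled with 0.0

print(arr.ndim)    # 2
print(arr.shape)   # (2, 3)
print(arr.dtype)   # float64
print(arr.size)    # 6
```
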


## 2. Explain the use of the “Simple random data” and “Permutations” functions.

See [Random Sampling using numpy.random](https://docs.scipy.org/doc/numpy-1.16.1/reference/routines.random.html#random-sampling-numpy-random) of the numpy documentation which for NumPy version 1.16 has 4 main sections, of which Simple random data is the first section.

- Simple random data
- Permutations
- Distributions
- Random Generator


### Simple random data

In [None]:
! ls  # checking to see if the image is here in the folder

Here I am just experimenting with placing images in a Jupyter notebook. Using markdown instead of the html does not seem to allow resizing of the image.  

`![Simple random data](/Simple_random_data.png "title")`

Here is the png image using html resized using the attributes 
<img src="/Simple_random_data.png" width="250" alt="Simple random data" />


<img src="/Simple_random_data.png" width="400" alt="Simple random data" />


The aim of the assignment is to be able to give a good overview of the numpy.random functions in your own words without rehashing the documentation. First I will start by trying out some of the functions. All of the functions return a random value or values in some shape or form.
As the numpy package is being imported using the alias `np`, the numpy.random functions can be called using `np.random.`.
I will go down through the simple random data functions and see how they differ from one another. A first glance at the functions in the documentation shows that four of the functions return similar output.
These are the following four functions, which all return random floats in the half-open interval `[0.0, 1.0)`:

- `random_sample([size])` returns random floats in the half-open interval `[0.0, 1.0)`  
- `random([size])` also returns random floats in the half-open interval `[0.0, 1.0)`.
- `ranf([size])` again returns random floats in the half-open interval `[0.0, 1.0)`.
- `sample([size])` returns random floats in the half-open interval `[0.0, 1.0)`.

I will look at these and see how they differ.



In [2]:
import numpy as np
import numpy.random

## The `numpy.random.rand` function


The first function is `numpy.random.rand`. As the numpy package has been imported as the alias `np`, this function can be called as `np.random.rand`.
The function returns random values in a given shape. In most cases this is an array with the dimensions you give it, unless you call the function without any arguments, in which case a scalar value is returned. The shape is a tuple that shows the size of each dimension.

The returned array will contain a sequence of random values in a given shape according to the parameters you supplied. The random values will be drawn from the uniform distribution over the interval of `[0,1)` This means that values can be any number from 0.0 up to 1 but not including 1. 

To use this function you can simply pass some integers as arguments. These integers relate to the dimensions of the outputted array. If no argument is supplied then just a single float is returned. This is similar to the random function in the python random package.
If you do supply arguments, they should all be positive integers. 
The number of elements returned will depend on the arguments you provide. If you supply just a single integer n as an argument, then the array that is returned is a one-dimensional array containing n values.

With more than one argument you get a regular array: every row has the same number of elements and every column has the same number of elements, although the number of rows can differ from the number of columns.



### create n-dimensional arrays of numbers using np.random.rand

In [34]:
rand = np.random.rand(1)  # a single array with 1 element

print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")

This array 
[0.88094131] has 1 dimension(s), each dimension is of (1,) shape and the array has a total number of 1  elements


In [35]:
rand = np.random.rand(3)  # a single array with 3 elements
print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")

This array 
[0.34358959 0.72348845 0.00428284] has 1 dimension(s), each dimension is of (3,) shape and the array has a total number of 3  elements


In [33]:
rand =np.random.rand(3,2) ## array with 3 rows and 2 columns

print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")

This array 
[[0.52333099 0.29871224]
 [0.6886532  0.91809144]
 [0.47378375 0.83704568]] has 2 dimension(s), each dimension is of (3, 2) shape and the array has a total number of 6  elements


The array above is an outer array (the outer []) which contains 3 inner arrays, where each of these 3 arrays has 2 elements.
The number of elements is therefore 6 (3 times 2). Next we can add a third dimension.

In [37]:
rand =np.random.rand(1,2,3)

print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")


This array 
[[[0.71339858 0.23412145 0.22737493]
  [0.4300418  0.48989549 0.64521964]]] has 3 dimension(s), each dimension is of (1, 2, 3) shape and the array has a total number of 6  elements


In [38]:
rand =np.random.rand(4,1,2)
print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")

This array 
[[[0.81304438 0.46223173]]

 [[0.98140771 0.63803047]]

 [[0.00254183 0.77246516]]

 [[0.34976722 0.52527645]]] has 3 dimension(s), each dimension is of (4, 1, 2) shape and the array has a total number of 8  elements


In [42]:
np.random.rand()  ## without any arguments, a scalar is returned


0.9977713648973062

In [20]:
rand.dtype  # dtype of the array created in an earlier cell

dtype('float64')

In [9]:
import random
random.random()

0.4755400083182595

In [39]:
rand = np.random.rand(2,2,2)  # a 3-dimensional array
print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")

This array 
[[[0.33845413 0.62059565]
  [0.96648479 0.55447961]]

 [[0.32152959 0.29722913]
  [0.78100381 0.5256415 ]]] has 3 dimension(s), each dimension is of (2, 2, 2) shape and the array has a total number of 8  elements


In [40]:
rand = np.random.rand(2,4,3,5)  # a 4 dimensional object
print(f"This array \n{rand} has {rand.ndim} dimension(s), each dimension is of {rand.shape} shape and the array has a total number of {rand.size}  elements")

This array 
[[[[0.89731607 0.87895025 0.40603953 0.42666234 0.18024164]
   [0.64002035 0.52546547 0.63841065 0.25634664 0.40607073]
   [0.3705635  0.10193452 0.6748715  0.37995317 0.95465985]]

  [[0.52706375 0.36221299 0.47312481 0.33069046 0.5870162 ]
   [0.79813881 0.99343443 0.38498524 0.46679416 0.09165379]
   [0.39248591 0.3092447  0.56344326 0.89054391 0.0822224 ]]

  [[0.95759949 0.46000352 0.44129769 0.1399304  0.73881042]
   [0.09282019 0.90698317 0.81524609 0.25985015 0.70798261]
   [0.57522708 0.67117913 0.52426537 0.52990002 0.13490705]]

  [[0.46682989 0.20949353 0.68535718 0.69087224 0.38189439]
   [0.98833803 0.67508315 0.84745168 0.64524842 0.14192874]
   [0.39452834 0.9294604  0.77681828 0.456591   0.99474166]]]


 [[[0.04361531 0.78480648 0.65380604 0.01157926 0.79559855]
   [0.83773446 0.06284048 0.1923652  0.80726277 0.90917668]
   [0.70060176 0.90808602 0.66840433 0.84330836 0.72062868]]

  [[0.15540346 0.15252659 0.45052818 0.79727483 0.71088602]
   [0.58861008 0

`np.random.rand(2,4,3,5)` returns a 4-dimensional array where the first axis has 2 elements, the next axis has 4, the next has 3 and the last has 5, so this produces 2 * 4 * 3 * 5 = 120 random numbers.

## The `numpy.random.randint` function
Next up is the [numpy.random.randint()](https://docs.scipy.org/doc/numpy-1.16.1/reference/generated/numpy.random.randint.html#numpy-random-randint) function which will return random integers from the discrete uniform distribution in the interval from low (inclusive) to high (exclusive). 

There are four possible parameters you can give to this function but at least one is required. If only one parameter is supplied then this will generate a single integer from 0 up to that number minus 1 so `np.random.randint(10)` would result in a single integer that could be any integer from 0 up to 9. If a second integer is supplied to the function then the first integer is treated as the low parameter and the second is treated as the high parameter. `np.random.randint(5,10)` would output any integer between 5 and 9.

The first two relate to the range of values that could be drawn where low is the lowest integer to draw from the distribution, high is the integer after the last possible integer that could be drawn. 
The next numeric parameter is for the size of the resulting array. This can be a single integer or a tuple of integers for higher dimensions. This will determine the shape and size of the output. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn.

If an argument is provided for the size, then a size-shaped array of random integers from the appropriate distribution will be returned.


The last possible argument is an optional `dtype` parameter which determines the dtype of the integers in the resulting array. It is not always required, as the NumPy 1.16 documentation gives its default value as `np.int`.
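To make the parameters concrete, here is a minimal sketch that passes `low`, `high`, `size` and `dtype` together. The choice of `np.int8` is my own, picked only to show that the `dtype` argument changes the type of the result.

```python
import numpy as np

# low=0 (inclusive), high=10 (exclusive), a 2-by-3 array of 8-bit integers.
small = np.random.randint(0, 10, size=(2, 3), dtype=np.int8)

print(small)
print(small.dtype, small.shape)
```
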

The `random.random_integers` function is similar to `random.randint`, only for the closed interval [low, high], with 1 as the lowest value if high is omitted. According to the documentation it is also the one to use to generate uniformly distributed discrete non-integers.
Note that this function is now deprecated and the deprecation warning recommends calling `randint(low, high + 1)` instead.

### Create n-dimensional arrays of random numbers using np.random.randint


In [44]:
randint1d = np.random.randint(10, size=(2,2))
print(randint1d)
randint1d.ndim
randint1d.shape
randint1d.dtype

[[2 0]
 [9 2]]


dtype('int64')

In [None]:
print(f"This array \n{randint1d} has {randint1d.ndim} dimension(s)")
      
print(np.random.randint(10))
print(np.random.randint(10))
print(np.random.randint(10))

In [51]:
array1 = np.random.randint(5,10,2)
print(f"This array \n{array1} has {array1.ndim} dimension(s), each dimension is of {array1.shape} shape and the array has a total number of {array1.size}  elements")

This array 
[7 6] has 1 dimension(s), each dimension is of (2,) shape and the array has a total number of 2  elements


In [52]:
array1 = np.random.randint(5,10,3)
print(f"This array \n{array1} has {array1.ndim} dimension(s), each dimension is of {array1.shape} shape and the array has a total number of {array1.size}  elements")

This array 
[8 8 5] has 1 dimension(s), each dimension is of (3,) shape and the array has a total number of 3  elements


In [53]:
array1 = np.random.randint(5,10,(3,2))
print(f"This array \n{array1} has {array1.ndim} dimension(s), each dimension is of {array1.shape} shape and the array has a total number of {array1.size}  elements")

This array 
[[5 8]
 [5 6]
 [6 9]] has 2 dimension(s), each dimension is of (3, 2) shape and the array has a total number of 6  elements


In [54]:
array1 = np.random.randint(5,10,(2,3))
print(f"This array \n{array1} has {array1.ndim} dimension(s), each dimension is of {array1.shape} shape and the array has a total number of {array1.size}  elements")

This array 
[[6 6 8]
 [8 7 6]] has 2 dimension(s), each dimension is of (2, 3) shape and the array has a total number of 6  elements


If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. So here there are 2 times 3 times 4 = 24 samples drawn.

In [55]:
array1 = np.random.randint(5,10,(2,3,4))
print(f"This array \n{array1} has {array1.ndim} dimension(s), each dimension is of {array1.shape} shape and the array has a total number of {array1.size}  elements")

This array 
[[[9 5 9 7]
  [5 9 6 5]
  [6 5 8 6]]

 [[7 5 9 6]
  [6 6 8 5]
  [7 8 5 8]]] has 3 dimension(s), each dimension is of (2, 3, 4) shape and the array has a total number of 24  elements


In [48]:
print(np.random.randint(5,10))
print(np.random.randint(5,10))
print(np.random.randint(5,10))
print(np.random.randint(5,10))

6
6
5
5


In [49]:
print(np.random.randint(5,10,2))
print(np.random.randint(5,10,3))
print(np.random.randint(5,10,4))
print(np.random.randint(5,10,5))

[8 6]
[6 9 7]
[9 9 7 6]
[9 5 5 7 5]


#### one-dimensional array

In [7]:
randint = np.random.randint(5, size=10)
print(f"This array of random integers \n{randint} has {randint.ndim} dimension(s), each dimension is of {randint.shape} shape and the array has a total number of {randint.size}  elements")

This array of random integers 
[4 3 0 3 3 0 3 0 2 3] has 1 dimension(s), each dimension is of (10,) shape and the array has a total number of 10  elements


In [8]:
np.random.randint(5, size=10)

array([0, 1, 2, 4, 0, 4, 0, 4, 2, 1])

#### a two-dimensional array of random integers

In [6]:
randint = np.random.randint(5, size=(2, 4)) # Generate a 2 x 4 array of integers between 0 and 4, inclusive:
print(f"This array of random integers \n{randint} has {randint.ndim} dimension(s), each dimension is of {randint.shape} shape and the array has a total number of {randint.size}  elements")

This array of random integers 
[[1 3 0 0]
 [2 1 2 0]] has 2 dimension(s), each dimension is of (2, 4) shape and the array has a total number of 8  elements


In [10]:
randint = np.random.randint(5, size=(2, 3, 4)) # Generate a 3 dimensional array of integers
print(f"This array of random integers \n{randint} has {randint.ndim} dimension(s), each dimension is of {randint.shape} shape and the array has a total number of {randint.size}  elements")

This array of random integers 
[[[0 3 4 4]
  [4 2 3 4]
  [3 2 2 1]]

 [[1 3 2 1]
  [2 2 2 1]
  [1 4 1 3]]] has 3 dimension(s), each dimension is of (2, 3, 4) shape and the array has a total number of 24  elements


In [11]:
randint = np.random.randint(5, size=(2, 3, 4,5)) # Generate a 4 dimensional array of integers
print(f"This array of random integers \n{randint} has {randint.ndim} dimension(s), each dimension is of {randint.shape} shape and the array has a total number of {randint.size}  elements")

This array of random integers 
[[[[3 1 1 3 0]
   [0 2 3 4 0]
   [2 1 2 3 2]
   [0 0 0 4 0]]

  [[3 1 3 1 3]
   [2 1 4 0 2]
   [4 4 3 0 2]
   [1 3 2 4 0]]

  [[1 2 4 1 0]
   [1 0 3 2 3]
   [3 0 3 0 2]
   [0 2 2 2 0]]]


 [[[3 4 2 4 0]
   [2 3 4 0 2]
   [0 2 1 2 3]
   [0 2 4 0 0]]

  [[4 2 3 4 0]
   [4 4 1 0 3]
   [0 3 2 2 4]
   [3 4 1 1 1]]

  [[3 1 4 4 4]
   [2 2 0 2 1]
   [1 3 0 2 3]
   [2 2 1 3 0]]]] has 4 dimension(s), each dimension is of (2, 3, 4, 5) shape and the array has a total number of 120  elements


In [66]:
#np.random.random_integers(5)
np.random.randint(1, 5 + 1)
# DeprecationWarning: This function is deprecated. Please call randint(1, 5 + 1) instead

5

`random_integers` was similar to `randint`, only for the closed interval [low, high], with 1 as the lowest value if high is omitted.
It has now been deprecated, so the cell above uses the `randint(1, 5 + 1)` call suggested by the deprecation warning.

## The numpy.random.random_sample() function

This function will return random floats in the half-open interval `[0.0, 1.0).` The results are from the “continuous uniform” distribution over the stated interval. 

You can specify a different half-open interval `[a, b)` where b > a, by multiplying the output of `random_sample` by (b - a) and adding a to the result.


This function takes an optional size parameter which can be an integer or a tuple of integers. This will determine the shape of the output. If no parameter is provided, the default is None and this results in a single value.
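The rescaling rule can be sketched as below. The interval `[5, 10)` and the sample size of 1000 are arbitrary choices of mine for illustration.

```python
import numpy as np

a, b = 5, 10   # hypothetical interval [5, 10)

# Multiply by (b - a) and add a to move samples from [0, 1) to [a, b).
scaled = (b - a) * np.random.random_sample(1000) + a

print(scaled.min(), scaled.max())
```
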


In [78]:
np.random.random_sample() # returns a single value in the half-open interval `[0.0,1.0)`

0.11094619250190951

In [69]:
type(np.random.random_sample())

float

In [70]:
np.random.random_sample((5,)) # returns 5 values in the half-open interval `[0.0, 1.0)`

array([0.90269139, 0.97004898, 0.17451055, 0.9579638 , 0.80113805])

In [79]:
np.random.random_sample((5,2)) # returns a 5-by-2 array of values in the half-open interval `[0.0, 1.0)`

array([[0.74286262, 0.47160249],
       [0.61958262, 0.59635121],
       [0.78786201, 0.05145853],
       [0.10133839, 0.57374399],
       [0.17563653, 0.12555639]])

In [80]:
4 * np.random.random_sample((3, 2)) - 4 # returns a 3-by-2 array of random numbers from [-4, 0). 

array([[-0.87248641, -0.81824422],
       [-2.62801768, -2.8198481 ],
       [-0.77743175, -3.30258601]])

In [77]:
np.random.sample(2) # returns 2 values in the half-open interval `[0.0, 1.0)`

array([0.05443218, 0.52082727])

## The numpy.random.random() function
This function returns random floats in the half-open interval `[0.0, 1.0)`. The results are from the “continuous uniform” distribution over the stated interval. This seems to be similar to the `numpy.random.random_sample()` function above so what is the difference?
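One way to investigate the question is to reset the seed before calling each function and compare the results. This is only an experiment, not a statement from the documentation, and the seed value 0 is an arbitrary choice. If both functions draw from the same generator in the same way, the two arrays should match.

```python
import numpy as np

np.random.seed(0)                          # arbitrary seed
from_random = np.random.random(3)

np.random.seed(0)                          # reset to the same starting point
from_random_sample = np.random.random_sample(3)

print(from_random)
print(from_random_sample)
print(np.array_equal(from_random, from_random_sample))
```

When I reason through this, the comparison comes out as equal, which suggests that the two names give the same behaviour and differ only in name.
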



In [83]:
type(np.random.random())

float

In [84]:
type(np.random.random_sample())

float

In [None]:
np.random.random_sample()

type(np.random.random_sample())

np.random.random_sample((5,))

## 3. Explain the use and purpose of at least five “Distributions” functions.

show how random numbers can be drawn from different probability distributions. use some plots and statistics here to show the differences between the types of random numbers that would be generated from the different probability distributions.
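As a starting point for this section, here is a rough sketch that draws samples from three distributions and prints simple statistics for each. The choice of distributions, their parameters and the sample size are all my own assumptions for illustration, and plots could be added later.

```python
import numpy as np

n = 10_000   # arbitrary sample size

uniform = np.random.uniform(low=0, high=1, size=n)
normal = np.random.normal(loc=0, scale=1, size=n)
binomial = np.random.binomial(n=10, p=0.5, size=n)

# Compare the mean and standard deviation of each sample.
for name, sample in [("uniform", uniform), ("normal", normal), ("binomial", binomial)]:
    print(f"{name:>8}: mean={sample.mean():.3f}, std={sample.std():.3f}")
```
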

## 4. Explain the use of seeds in generating pseudorandom numbers.

I will show how to use a seed to generate a pseudorandom sequence of numbers that is reproducible.
First of all explain what a **seed** is in this context. 
Show that setting a seed will produce the same sequence of random numbers each time.
This means that the functions follow a very particular set of instructions to generate the so-called random numbers which is what **deterministic** means.
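A minimal sketch of this, using the `seed` function with an arbitrary seed value of 42: setting the same seed twice gives the same numbers, while carrying on without resetting gives the next numbers in the sequence.

```python
import numpy as np

np.random.seed(42)           # arbitrary seed value
first = np.random.rand(3)

np.random.seed(42)           # same seed again
second = np.random.rand(3)   # repeats the first three numbers

third = np.random.rand(3)    # no reset, so the sequence continues

print(first)
print(second)
print(third)
```
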

See the section https://docs.scipy.org/doc/numpy-1.16.0/reference/routines.random.html#random-generator



Random Generator.
- `RandomState([seed])`
- `seed([seed])`
- `get_state()`
- `set_state(state)`


`get_state` is used to capture the state of the generator at any time and returns a tuple representing its internal state.
`set_state` sets the internal state of the generator from the tuple and is used if for any reason you want to manually reset the internal state of the "Mersenne Twister". (The equivalent functions in Python's built-in `random` module are named `getstate` and `setstate`.)

The tuple returned from `get_state` can be passed to `set_state` to duplicate the generation from that moment.
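A short sketch of capturing and restoring the state with NumPy's `get_state` and `set_state`:

```python
import numpy as np

state = np.random.get_state()   # capture the generator's internal state
a = np.random.rand(3)

np.random.set_state(state)      # restore the captured state
b = np.random.rand(3)           # same numbers as a

print(a)
print(b)
```
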


### What exactly is a seed?

A random **seed** is a number that is used to initialise a pseudorandom number generator. This number does not need to be random. When you set the seed yourself, the default seed is ignored and the numbers are still generated in a pseudorandom manner, starting from your seed. If you reinitialise a random number generator with the same seed then the same sequence of numbers will be produced.

According to [statisticshowto](https://www.statisticshowto.datasciencecentral.com/random-seed-definition/), a random seed specifies the start point when a computer generates a random number. The random seed can be any number but it usually comes from the computer system's clock, which counts in seconds from January 1, 1970 (known as Unix time). This ensures that the same random sequence won't be repeated unless you actually want it to.


Computers generate random numbers in a *deterministic* way - that is by following a set of rules. Randomness can be imitated by specifying a set of rules to follow. The algorithms behind computer number generation are based on patterns which generate numbers that follow a particular probability distribution.

https://pynative.com/python-random-seed/
>Random number or data generated by Python’s random module is not truly random, it is pseudo-random(it is PRNG), i.e. deterministic. It produces the numbers from some value. This value is nothing but a seed value. i.e. The random module uses the seed value as a base to generate a random number.

>Generally, the seed value is the previous number generated by the generator. However, When the first time you use the random generator, there is no previous value. So by-default current system time is used as a seed value.



#### pseudo random number generator.
Random samples can be drawn with replacement or without replacement. With replacement means the number drawn is placed back in the pot and could be selected again.
Without replacement means that once a number is chosen it is not placed back in the pot, so it cannot be selected again in the same sample and there is no possibility of duplicates.
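The difference can be sketched with `np.random.choice` and its `replace` parameter. The "pot" of the numbers 1 to 10 is just an example of my own.

```python
import numpy as np

pot = np.arange(1, 11)   # the numbers 1 to 10

# With replacement: a number can be drawn more than once.
with_replacement = np.random.choice(pot, size=10, replace=True)

# Without replacement: each number can be drawn at most once.
without_replacement = np.random.choice(pot, size=10, replace=False)

print(with_replacement)
print(without_replacement)
```

Drawing all 10 numbers without replacement simply returns the pot in a shuffled order, so it contains no duplicates.
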

Python's random package and numpy.random are examples of pseudo random number generators. A true random number generator would involve hardware while pseudo random number generators involve software.
 
Computers can't really generate random numbers, and even humans have patterns when thinking of random numbers. Computers generate numbers that look random, so that someone else wouldn't be able to predict the next number generated.
We are really only looking at *pseudo* random numbers. If you do have a key piece of information - the **seed** - then you can predict the next random number. Unless you set it yourself, the seed typically comes from the computer, for example from the time (to the microsecond) when the code was run. The seed is decided when you import the function into Jupyter.


Example of **pi** from the lecture. Pi is the ratio of the circumference of a circle to its diameter and it has a decimal expansion that never ends and never repeats.
Therefore you will only ever see an approximation of pi because it has an unending expansion.
The digits never settle into a repeating pattern. While you might find repeated pairs of digits, these pairs do not appear periodically.
If you wanted to generate a random number between 1 and 10, you could go out to a point in the decimal expansion of pi and not tell anyone where you started. The position where you started from is the seed. If no-one knows where you start from, then no-one can predict where you go next. The seed is where you start at.
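The pi analogy can be turned into a toy sketch. This is only an illustration of the idea of a seed as a starting position and is not how NumPy generates numbers. The helper `digits_from_pi` is a hypothetical function of my own, and only the first 50 decimal digits of pi are hard-coded.

```python
# First 50 decimal digits of pi, hard-coded for illustration only.
PI_DIGITS = "14159265358979323846264338327950288419716939937510"

def digits_from_pi(seed, n):
    """Return n digits of pi starting at position `seed` (hypothetical helper)."""
    return [int(d) for d in PI_DIGITS[seed:seed + n]]

first = digits_from_pi(seed=10, n=5)
again = digits_from_pi(seed=10, n=5)   # same seed, same digits
other = digits_from_pi(seed=20, n=5)   # different seed, different digits

print(first, again, other)
```

Anyone who knows the starting position can reproduce the digits exactly, which is the sense in which the sequence is deterministic.
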

For various applications you will want to set the seed yourself, for example when testing code, so that the same (pseudo) random numbers are generated every time.
This tells Python not to choose a seed for itself at the start (from the time to the microsecond on the machine) and instead to start from the seed you provide, so that you get the very same output another time.

Pseudorandom number generators can be seeded, which makes them deterministic: the series of random values can be recreated and predicted. A seed is the starting point of the random number generation process. The computer's system time is usually used for the seed, and this is fed into an algorithm to generate some (pseudo) random values. There are times, however, when you need to be able to generate exactly the same sequence of random numbers, such as for testing, demonstrating or teaching.
You can then provide a seed to the process.
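
A minimal sketch of providing a seed, using the built-in `random` module. The seed value `2019` and the helper name `five_numbers` are hypothetical choices made for this illustration.

```python
import random

def five_numbers(seed):
    """Seed the generator, then return five random integers from 1 to 10."""
    random.seed(seed)
    return [random.randint(1, 10) for _ in range(5)]

first_run = five_numbers(2019)
second_run = five_numbers(2019)

# The same seed gives exactly the same sequence
print(first_run)
print(second_run)
print(first_run == second_run)  # True
```

Calling `random.seed()` with no argument goes back to letting Python choose the seed for itself.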



Python's built-in `random` module implements pseudorandom numbers for various distributions. See [python docs/random library](https://docs.python.org/3/library/random.html). However, most functions in the built-in `random` module return only one value at a time, so generating a sequence of random numbers needs a Python loop.
`numpy.random` provides functions that can efficiently generate whole arrays of sample values from various probability distributions rather than one value at a time. For large samples this is much faster.
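
The difference in usage can be sketched as follows. This assumes NumPy 1.17 or later for `default_rng`; the sample size `n = 5` is arbitrary, and no timings are claimed here.

```python
import random
import numpy as np

n = 5  # arbitrary sample size for the illustration

# Built-in module: one value per call, so a loop (here a list comprehension) is needed
python_values = [random.random() for _ in range(n)]

# NumPy: a single call returns a whole array of samples
rng = np.random.default_rng()
numpy_values = rng.random(n)

print(type(python_values), len(python_values))
print(type(numpy_values), numpy_values.shape)
```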





### How are random numbers generated?

According to [NumPy random module](https://numpy.org/devdocs/reference/random/index.html?highlight=random#module-numpy.random) 
>Numpy’s random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions.

It goes on to describe how 
- **BitGenerators** are objects that generate random numbers, typically unsigned integer words filled with sequences of either 32 or 64 random bits.
- **Generator** objects then transform these sequences of random bits from the BitGenerator into sequences of numbers that follow a specific probability distribution within a specified interval.
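
The two roles described above can be sketched as follows, assuming NumPy 1.17 or later. `PCG64` is one of the BitGenerators that NumPy provides; the seed `12345` and the distribution parameters are arbitrary values chosen for illustration.

```python
from numpy.random import Generator, PCG64

# The BitGenerator produces the raw stream of random bits
bit_generator = PCG64(12345)  # 12345 is an arbitrary seed

# random_raw returns the raw unsigned integers produced by the BitGenerator
raw_words = bit_generator.random_raw(3)

# The Generator turns those bits into samples from a chosen distribution
rng = Generator(bit_generator)
uniform_sample = rng.uniform(low=0.0, high=10.0, size=3)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=3)

print(raw_words)
print(uniform_sample)
print(normal_sample)
```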

See [numpy docs on random sampling](https://numpy.org/devdocs/reference/random/index.html?highlight=random#random-sampling-numpy-random) on the changes since NumPy version 1.17.0. This ties in with the differences in the documentation I noticed between versions 1.16 and 1.17.

>Since Numpy version 1.17.0 the Generator can be initialized with a number of different BitGenerators. It exposes many different probability distributions. See NEP 19 for context on the updated random Numpy number routines. The legacy RandomState random number routines are still available, but limited to a single BitGenerator.
For convenience and backward compatibility, a single RandomState instance’s methods are imported into the numpy.random namespace, see Legacy Random Generation for the complete list.
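
As a rough sketch of what this means in practice, the legacy functions in the `numpy.random` namespace and an explicit `RandomState` give the same values for the same seed, while the newer `Generator` is a separate interface. The seed `0` is arbitrary.

```python
import numpy as np
from numpy.random import RandomState, default_rng

# Legacy interface: functions in the numpy.random namespace share one hidden RandomState
np.random.seed(0)
legacy_global = np.random.rand(3)

# An explicit RandomState with the same seed reproduces those values
legacy_explicit = RandomState(0).rand(3)

# Newer interface (NumPy 1.17+): a Generator created with default_rng
new_values = default_rng(0).random(3)

print(np.array_equal(legacy_global, legacy_explicit))  # True
print(legacy_global)
print(new_values)
```

The `Generator` returned by `default_rng` uses a different BitGenerator from the legacy `RandomState`, so the same seed does not produce the same numbers across the two interfaces.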


### Pseudorandom numbers vs random numbers.
>Pseudorandom numbers are generated by an algorithm with deterministic behaviour based on the seed of the random number generator.


Numbers generated by either the built-in `random` module or the `numpy.random` package are not actually random at all but *pseudorandom* numbers. Computers cannot actually generate random numbers, but they do generate sequences that look random, so that someone else would not be able to predict the next number generated without a key piece of information. This is the **seed**. If you know the seed then you can predict the next number to be generated in a sequence, and therefore the numbers generated are not random as such. If you do not supply a seed, one is chosen for you, typically from the time (to the microsecond) on the computer or from a source of randomness provided by the operating system. This default seed is decided when the module is imported into Jupyter or a Python script.

The lecture demonstrated the example of **pi**, which is the ratio of the circumference of a circle to its diameter. Pi has a decimal expansion that never ends and never repeats. Therefore you only ever see an approximation of pi because it has an unending expansion.
The digits in pi never settle into a repeating pattern and, while you might find the same pair of digits more than once, these pairs do not appear periodically.
Therefore if you wanted to generate a random number between 1 and 10, you could go out to a point in the decimal expansion and not tell anyone where you started. The position where you started from is the seed. If no-one knows where you start from, then no-one can predict where you go next. If someone does know where in the decimal expansion you started from, then they could predict the next number in the "random" sequence.
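
The pi analogy can be sketched in a few lines of code. This is only a toy illustration of the idea of a seed as a starting position, not how `random` or `numpy.random` actually work; the function name `digits_from_pi` is made up for this example, and the values it returns are single digits from 0 to 9.

```python
# First 50 decimal digits of pi (after the "3.")
PI_DIGITS = "14159265358979323846264338327950288419716939937510"

def digits_from_pi(seed, count):
    """Return `count` digits of pi, starting at position `seed` (0-based)."""
    return [int(d) for d in PI_DIGITS[seed:seed + count]]

# Anyone who knows the seed can reproduce (and so predict) the sequence
print(digits_from_pi(seed=10, count=5))  # [8, 9, 7, 9, 3]
print(digits_from_pi(seed=10, count=5))  # [8, 9, 7, 9, 3]

# A different starting position gives a different sequence
print(digits_from_pi(seed=20, count=5))  # [2, 6, 4, 3, 3]
```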

For various applications, you may wish to set a seed so that the exact sequence of random numbers can be generated. For example when testing code or for teaching purposes where you want the output to be reproducible you can set the seed. This tells Python not to generate a random seed at the start (from the time to the microsecond on the machine) and instead you provide it with a seed to start from to get the very same output another time.
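
The same idea with `numpy.random`, as a sketch assuming NumPy 1.17 or later. The seed `123` is arbitrary. When `default_rng` is called without a seed, NumPy takes fresh entropy from the operating system instead.

```python
import numpy as np

# Two generators created with the same seed (123 is arbitrary)
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)

sample_a = rng_a.integers(low=1, high=11, size=5)  # integers from 1 to 10
sample_b = rng_b.integers(low=1, high=11, size=5)

# Same seed, same sequence
print(np.array_equal(sample_a, sample_b))  # True

# With no seed the output differs from run to run
unseeded = np.random.default_rng().integers(low=1, high=11, size=5)
print(unseeded)
```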

In summary, random (pseudorandom) numbers are drawn from a probability distribution. The numbers generated depend on the *seed* used and are generated according to some deterministic algorithm from that seed. 

### References
- Python for Data Analysis by Wes McKinney, chapter 4 "NumPy Basics: Arrays and Vectorised Computation" and Section 4.6
- Python Data Science Handbook by Jake VanderPlas
- [numpy quickstart tutorial](https://numpy.org/devdocs/user/quickstart.html)
- [NumPy random module](https://numpy.org/devdocs/reference/random/index.html?highlight=random#module-numpy.random)
- [python - random library](https://docs.python.org/3/library/random.html#module-random)
- [GitHub Flavoured Markdown](https://github.github.com/gfm/)

### The numpy.random package documentation on docs.scipy.org

I noticed that the documentation for the latest version, 1.17, does not seem to list the `numpy.random.rand()` function in the same way as the documentation for NumPy version 1.16 (or NumPy versions 1.15 and 1.14).

[numpy-1.15.1/reference.routines.random](https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html) has `numpy.random.rand(d0,d1,...,dn)`.
The [numpy/reference/random](https://docs.scipy.org/doc/numpy/reference/random) page has a "Random sampling (`numpy.random`)" section with a [quick start guide](https://docs.scipy.org/doc/numpy/reference/random/index.html#quick-start). It outlines some changes since the videos were made and refers to the **Generator** as a replacement for `RandomState`.

I'll have to go through the documentation and see what the changes are.

- https://docs.scipy.org/doc/numpy/reference/random/generated/numpy.random.Generator.random.html#numpy.random.Generator.random

Note that the documents under this URL refer to `numpy.random.Generator`, which is the newer interface introduced in version 1.17.
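
A small sketch comparing the two forms, assuming NumPy 1.17 or later. The legacy `numpy.random.rand` takes the dimensions as separate arguments, while `Generator.random` takes the shape as a single tuple. The shape `(2, 3)` is just an example.

```python
import numpy as np

# Legacy function: dimensions are passed as separate arguments
legacy_array = np.random.rand(2, 3)

# Newer Generator method: the shape is passed as a single tuple
rng = np.random.default_rng()
new_array = rng.random((2, 3))

print(legacy_array.shape)  # (2, 3)
print(new_array.shape)     # (2, 3)
```

Both calls return floats drawn uniformly from the interval from 0 (inclusive) to 1 (exclusive).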





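```python
import random

# Pick a random integer from 300 up to, but not including, 500
random.randrange(300, 500)
# Output from one run (the value changes each time): 472
```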