## Problem statement
The following assignment concerns the numpy.random package in Python [2]. You are required to create a Jupyter [5] notebook explaining the use of the package, including detailed explanations of at least five of the distributions provided for in the package. 

**There are four distinct tasks to be carried out in your Jupyter notebook:** 
1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions.
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.


## Q1: Explain the overall purpose of numpy.random package.

### 1.1. Contents

### Section 1:
#### 1.1. Contents
#### 1.2. Description of Numpy Random's Package
#### 1.3. Numpy Documentation and Updates
#### 1.4. Summarising the Random Package

### Section 2
#### 2.1. Simple Random Data
* What do the Simple Random Data Functions Do?

#### 2.2. Numpy Integers Function
* Formula
* Basic Use

#### 2.3. Numpy's Random Function
* Formula
* Basic Use
* Using Random Function in a Loop

#### 2.4. Choice and Bytes
* Choice Function
* Bytes Function

#### 2.5. Using Simple Random Data with Pandas

#### 2.6. Permutation Functions

##### 2.6.1. Functions
##### 2.6.2. Basic Use on Arrays
##### 2.6.3. The Distinction Between the Permutation Functions
##### 2.6.4. Using the Permutation Functions with Pandas Dataframes

### Section 3
#### 3.1. Overview of The Distribution Functions

#### 3.2. Distribution Functions

##### 3.2.1. The Normal Distribution Function
* Formula
* Significance
* Basic Use
* Visualising the Normal Distribution

##### 3.2.2. The Uniform Distribution
* Significance
* Basic Use
* The Correlation Between Data Size and the Presentation of Uniformity

##### 3.2.3. The Rayleigh Distribution Function
* Significance
* Basic Use

##### 3.2.4. The Triangular Distribution Function
* Formula
* History and Significance
* Basic Use
* Example Application
* Visualising Leemis' Research

##### 3.2.5. The Wald Distribution Function
* Significance
* Visualising the Inverse Gaussian Distribution
* Plotting the Distribution

### Section 4

#### 4.1. The Function of the Seed in Generating 'Random' Numbers

#### 4.2. The Generator and the BitGenerator
* What are they?
* PCG-64 

#### 4.3. The SeedSequence
* Introducing the New Seeding Method

#### 4.4. Basic Use of the Seed

----------------------------

### 1.2. Description of Numpy's Random Package

Numpy.random is a package within the Numpy library that provides for the creation of pseudo-random values. It is known as a Pseudo-Random Number Generator (PRNG). This is done through the use of a BitGenerator that passes a stream of bits to the Generator, which then converts these bits into random data using numpy.random functions. An important fact to note is that the values produced are not truly random. Rather, they are generated in a manner that is determined by the seed value, which can be set by the user.
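The determinism described above can be checked directly. The following is a minimal sketch, assuming NumPy 1.17 or later; the seed value 42 is an arbitrary choice made for illustration.

```python
import numpy as np

# Two Generators built from the same seed (42 is an arbitrary example value)
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Both produce exactly the same "random" floats
draw_a = rng_a.random(3)
draw_b = rng_b.random(3)
print(np.array_equal(draw_a, draw_b))

# The Generator wraps a BitGenerator, which supplies the stream of bits
print(type(rng_a.bit_generator).__name__)
```

Because the seed fully determines the stream, anyone re-running the notebook with the same seed should see the same values.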

There are three classes of functions used by the Generator to produce random numbers: Simple Random Data, Permutations and Distribution Functions. Each of these classes will be discussed in depth in Sections 2 and 3.

The generation of random numbers has broad-reaching applications throughout information technology, including but not limited to computational creativity, modeling and simulation, robotics and gaming (O'Neill, 2014). Wherever there is a statistical need for data, but it is difficult or impossible to collect the data, PRNGs can be used in combination with basic mathematical principles. PRNGs are also widely used in cryptography; however, Numpy is not intended to be used in this fashion. 

-------------------------------------------------

### 1.3. Numpy Documentation and Updates

With Version 1.17, Numpy changed the way the Random package works, and the documentation has been updated to reflect this. The Simple Random Data section of the new Generator interface drops functions that were essentially duplications: rand, randn, randint, random_integers, random_sample and ranf are not methods of the Generator (they belong to the legacy interface), and Simple Random Data now contains the integers, random, choice and bytes functions. 

In addition, a 'default_rng' constructor has been introduced for the Generator and associated functions. This constructor function will instantiate a Generator with Numpy's default BitGenerator. This is discussed in detail in Section 4.
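As a rough guide to the change, the sketch below places some legacy calls next to Generator calls that serve the same purpose. The pairings are shown for illustration only, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng()

# Legacy interface                          # Generator interface
legacy_ints = np.random.randint(0, 10, 5)
new_ints = rng.integers(0, 10, 5)

legacy_floats = np.random.random_sample(5)
new_floats = rng.random(5)

legacy_normal = np.random.randn(5)
new_normal = rng.standard_normal(5)

print(new_ints, new_floats, new_normal, sep="\n")
```

The Generator methods draw from a different underlying bit stream, so the two interfaces will not reproduce each other's values even when seeded with the same number.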

Version 1.17 saw no change to the Permutations section of the documentation. Similarly there is little change to how the Distribution functions work.

-------------------------------------------------

### 1.4. Summarising the Random Package
This Jupyter Notebook seeks to provide a detailed overview of the numpy.random package. Below is a table that provides key details on the functions that will be discussed in this project. The functions include all of those in the Simple Random Data and Permutations sections. Five Distribution functions are discussed in Section 3, which provides graphs that illustrate the relationship between these distributions and Numpy's functions for simulating them. I have included examples of outputs from these distribution functions below, although the small arrays in this table do not display the distributions as the graphs in Section 3 do.

|  Function Name | Parameters | What it does | Input Example &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | Output Example  |
|:---|:---|:---|:---| :---|
|  <font color='blue'>*Simple Random Data Functions*</font> |
| integers()| low, high, size, dtype and endpoint  | Generates random integers | rng.integers(5)  |4   | 
| random()  |size and dtype |Generates random floats of a given array | rng.random(5) |array([0.10582498, 0.22151441, 0.88883466, 0.46139778, 0.02401577])   |
|  choice() | a, size, replace, p, axis, shuffle | Return a random sample from a given array, for uniform or non-uniform probabilities |rng.choice(5, 3, p = [0.1, 0.1, 0.7, 0.1, 0] )  | array([2, 2, 2], dtype=int64)  | 
| bytes() | length| Return random bytes of a given length  |  rng.bytes(4) |  b'\x8a\xf9R\x8b' | 
| <font color='blue'>*Permutation Functions*</font> |
| shuffle() | x and axis | Shuffles an array in place (returns None) | rng.shuffle(x) ... *if x=np.arange(5)*| x is now array([0, 2, 4, 3, 1]) | 
| permutation() | x and axis | Returns a new variable, a copy of an array that has been permuted | rng.permutation([1, 4, 9, 12]) |  array([4,  9,  1, 12])
|  <font color='blue'>*Distribution Functions*</font> |
| wald()| mean, scale and size| Draws samples from the Inverse Gaussian Distribution| wald_1 = rng.wald(1, 1, 2) | array([0.79248305, 0.29285253])
| uniform()| low, high and size| Draw samples from the Uniform Distribution| rng.uniform(2, 3, 2)| array([2.91992209, 2.09892322])
| normal() | loc, scale and size| Draw random samples from a Normal Distribution| rng.normal(mu, sigma, 3)...*where mu, sigma= 0, 0.1* |  [0.00100176,  0.00519112, -0.10541133]
| rayleigh()| scale and size| Draw samples from a Rayleigh Distribution| rng.rayleigh(1, 2)| array([1.08590067, 2.03465331])
| triangular()| left, mode, right and size| Draw samples from the Triangular Distribution| rng.triangular(4, 6, 12, 3)| array([9.11802852, 5.05872803, 7.90736215])


------------------------------------------------------------------------

## Q2: Explain the use of the “Simple random data” and “Permutations” functions.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

### 2.1. Simple Random Data

#### What do the Simple Random Data Functions Do?
This section covers the Simple Random Data functions of the numpy.random package. The group contains 4 functions that return random values of various types. The four functions are:
1. integers
2. random
3. choice
4. bytes

Numpy's classification of its Random package into three groups (Simple Random Data, Permutations and Distributions) may not be immediately intuitive when first using the package. However, by practicing the use of functions from each of these groups, one can grasp the distinctions between them. The Permutation functions can take an ordered array as input and either shuffle it in place or output a new array that has been 'randomly' permuted. The Distribution functions all produce random data that reflect a particular statistical distribution. 

However, the functions in the Simple Random Data section are, as the name suggests, the simplest of the entire numpy.random package. In the case of the integers() and random() functions, the output differs by type (integer or floating point number). In the case of the choice() function, the random sample can be drawn with a specific probability for each element, providing a useful means to replicate real-life 'chance' scenarios, such as the simulation of a game of roulette. The bytes() function is limited to producing random bytes of a given length. These functions have broad use across probability and statistics. They are less concerned with producing data of a particular shape and are designed to provide the most basic statistical randomisation tools. 

--------------------------

### 2.2. Numpy Integers Function - numpy.random.Generator.integers

#### Formula

##### <font color='blue'>.default_rng().integers(low, high, size, dtype, endpoint)</font>
*where low and high are ints (or array-likes of ints), size is the output shape (int or tuple, optional), dtype is optional and endpoint is an optional bool that makes high inclusive* 

#### Basic Use

In [None]:
# Create a Generator with the constructor function, passing the
# seed value (30) so that the output is reproducible
rng = np.random.default_rng(30)
# Use integers function to set low (inclusive) and high (exclusive) values
rng.integers(18, 20)
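Because high is exclusive by default, the call above can only return 18 or 19. The sketch below illustrates this and shows the effect of the endpoint parameter; the sample size of 1000 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng()

# Default behaviour: high is exclusive, so only 18 and 19 can appear
exclusive = rng.integers(18, 20, size=1000)

# endpoint=True makes high inclusive, so 20 can appear as well
inclusive = rng.integers(18, 20, size=1000, endpoint=True)

print(np.unique(exclusive))
print(np.unique(inclusive))
```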

In [None]:
# Create a seeded Generator so that the output is reproducible
rng = np.random.default_rng(0)

# Produce a 2 by 3 array of random integers from 0 to 9
print(rng.integers(10, size=(2, 3)) )

# Produce a 4 by 5 array of random integers from 0 to 19
print("\n", rng.integers(20, size=(4, 5)) )

An easy way to understand the production of multi-dimensional arrays with the Integers function can be seen in the example below. The function is set to produce random integers from 0 to 119 (high is exclusive) in a 4-dimensional array shape. The size dimensions multiply together to give the number of values produced: 2 x 3 x 4 x 5 = 120. 

A useful way to understand the dimensional shape that is formed is to look at the last size dimension: this is the smallest dimensional unit, resulting in 5 columns. Working backwards through the size parameters, one can predict the shape of the array.

In [None]:
# Produce a 4-d array of integers from 0 to 119
print("\n", rng.integers(120, size=(2, 3, 4, 5)) )
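The shape reasoning above can be confirmed by inspecting the array's attributes. This is a small sketch; the variable name arr is hypothetical.

```python
import numpy as np

rng = np.random.default_rng()
arr = rng.integers(120, size=(2, 3, 4, 5))

print(arr.shape)        # (2, 3, 4, 5)
print(arr.ndim)         # 4
print(arr.size)         # 120
# Indexing the first two dimensions leaves one 4 by 5 block
print(arr[0, 0].shape)  # (4, 5)
```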

We will further explore the use of the integers function in Section 2.5.

-------------------------------------------------

### 2.3. Numpy's Random Function - numpy.random.Generator.random

#### Formula

#####  <font color='blue'>.default_rng.random(size(int or tuple), dtype, out)</font> 
*where size provides the output shape and dtype is optional*

#### Basic Use

Whilst the Integers function generates random integers, Numpy's Random function is designed to create random floating point numbers. 

This method perhaps has greater application in the Data Science world, considering that real-world data more often than not consists of floating point numbers. 

Below is an example of the random() function. 10 random floats are generated by simply calling rng.random(10). These numbers are then reshaped into a 5 by 2 array using the reshape method.

In [None]:
rng = np.random.default_rng()
float1 = rng.random(10)
float1.reshape(5, 2)
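The random() function always draws from the half-open interval [0.0, 1.0). To obtain floats in another range, the output can be scaled and shifted. The sketch below assumes an example range of 5 to 10; the names a, b and scaled are hypothetical.

```python
import numpy as np

rng = np.random.default_rng()

# Example bounds chosen for illustration
a, b = 5.0, 10.0

# Stretch [0, 1) to the width of the range, then shift it to start at a
scaled = (b - a) * rng.random(10) + a
print(scaled)
```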

#### Using Random Function in a Loop

The cell below calls the random function inside a loop.

In [None]:
# Initialize an empty array to hold the random numbers: random_floats
random_floats = np.empty(50)

# Generate random numbers by looping over range(50)
for i in range(50):
    random_floats[i] = rng.random()

print(random_floats)

The loop runs through 50 iterations to produce 50 floats.

------------------------

### 2.4. Choice and Bytes

The final two Simple Random Data Functions are Choice and Bytes. We will just briefly cover these functions.

#### Choice Function - numpy.random.Generator.choice

Choice is used to generate a random *sample* from a given array. The production of random data in this fashion has many applications such as simulating the rolling of a die, or the game roulette. 

Choice accomplishes this because it can produce a sample of random data with a different probability for each element.

Below is an example of the function in use. Notice that the first input, 10, means that values are drawn from np.arange(10) (0 to 9), and the second input, 4, is the size of the sample.

In [None]:
rng = np.random.default_rng()
choice_1 = rng.choice(10,4)
choice_1
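The die-rolling scenario mentioned above might be sketched as follows. The weights for the loaded die are invented for illustration, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng()
faces = np.arange(1, 7)

# A fair die: every face is equally likely
fair_rolls = rng.choice(faces, size=10)

# A loaded die: p gives one probability per face and must sum to 1
weights = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
loaded_rolls = rng.choice(faces, size=10, p=weights)

# replace=False draws without replacement, so no face repeats
distinct = rng.choice(faces, size=3, replace=False)

print(fair_rolls, loaded_rolls, distinct, sep="\n")
```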

#### Bytes Function - numpy.random.Generator.bytes

Bytes is the simplest of the Simple Random Data functions. It takes just 1 input, length, which is the number of random bytes to return. It is unique in the numpy.random package in that it returns a Python bytes object rather than a number or an array.

Example below with a byte input length of 4:

In [None]:
rng.bytes(4)
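The return value can be inspected with ordinary Python tools, as in this short sketch (the variable name raw is hypothetical):

```python
import numpy as np

rng = np.random.default_rng()
raw = rng.bytes(4)

print(type(raw))   # a built-in bytes object
print(len(raw))    # 4
print(list(raw))   # each byte as an integer from 0 to 255
print(raw.hex())   # the same bytes as 8 hexadecimal characters
```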

-------------------------------------------------

### 2.5. Using Simple Random Data with Pandas

A very important function of Numpy in general is its ability to compile data into arrays so that they can be used in datasets. This has broad application in statistics.

In this section I will create random data and assign it to two separate dataframes, flattening the arrays with Numpy's ravel() method. The dataframes will then be combined into 1 so that the data can be plotted.

The example will create a dataset that simulates a collection of data on consumers, the amount of money they spent on their computers and the OS that runs on their computers. 

This is a more complex task, but it is important because it highlights the application of Numpy in the ordering of datasets.

In [None]:
# Set row display to 100
pd.set_option('display.max_rows', 100)

# Set constructor
rng = np.random.default_rng()

# Variables: cost, age2 and OS

# Generate 60 random integers to represent the cost 
# of the PCs bought
cost = rng.integers(500, 1500, 60)
#print("cost:", cost)

# Generate another 60 integers to represent the buyers age
age2 = rng.integers(16, 70, 60)
#print("age2:", age2)

# Create a dataframe for the cost and age variables 
# and use ravel() to flatten the array
df1 = pd.DataFrame({'cost': cost.ravel(), 
                    'age2': age2.ravel()})

Note that the third variable will be a list of strings that will be used as the differential variable in bivariate plots.

Before it can be plotted, however, the dataframes must be aligned column-wise into a single dataframe.

In [None]:
# To simulate the OS that runs on each purchased computer, create a 
# list of 60 strings alternating between Windows and Mac
PC = ['Windows', 'Mac'] * 30

# Build a dataframe with a single 'OS' column from the list
df2 = pd.DataFrame({'OS': PC})

# Concatenate the dataframes column-wise (axis=1)
result = pd.concat([df2, df1], axis=1, join='inner')

Now plot the 'cost and age' variables in a seaborn scatterplot and set the hue parameter to 'OS'. 

In [None]:
# Create a scatterplot of cost against age and set hue to OS
scatter_choice = sns.scatterplot(x="cost", y="age2",
                                 hue="OS", data=result)
plt.legend()
plt.show()

We can also see the dataframe that has been produced.

In [None]:
print(result)

----------------------------

### 2.6. Permutation Functions

##### Definition (Webster's Dictionary)
*permute (pəˈmjuːt)*
*vb (tr)*

1. *to change the sequence of*
2. *(Mathematics) to subject to permutation [C14: from Latin permūtāre, from per- + mūtāre to change, alter] perˈmutable adj perˌmutaˈbility, perˈmutableness n perˈmutably adv*

According to corporatefinanceinstitute.com, a permutation is a mathematical technique that determines the number of possible arrangements in a set when the order of the arrangements matters. Common mathematical problems involve choosing only several items from a set of items with a certain order.

#### 2.6.1. Functions

There are two functions within Numpy's Permutations Section:
* The Shuffle function:
##### <font color='blue'>.default_rng.shuffle(x(array), axis)</font>
*where axis is an int and optional*

*and*
* The Permutation function:
##### <font color='blue'>.default_rng().permutation(x [int or array_like ints], axis[int, optional])</font>
*Where x is an int or an array-like (an int n is treated as np.arange(n)), and axis is the axis along which x is shuffled, defaulted to 0.*

Both of these functions permute a sequence in order to create random data. In particular, they both change the order of elements in an array. However, shuffle() permutes the array in place and returns None, whereas permutation() makes a copy of the array, permutes the copy and returns it as a new array.

#### 2.6.2. Basic Use on Arrays

In the example below, shuffle() is used on a 1-d array of the integers 0 to 9. 

The items within the array are rearranged:

In [None]:
rng = np.random.default_rng()

# Create a basic array of the integers 0 to 9
arr = np.arange(10)
# Use shuffle method to randomly switch between elements 
rng.shuffle(arr)
arr

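In the example below the permutation function rearranges a 2-dimensional array. For a multi-dimensional array, the function shuffles along the first axis only. 

Something that is important to understand when composing statements with the permutation functions is that, by default, they reorder the rows (sub-arrays) as whole units: the contents of each row stay together in their original order, and elements are neither moved between rows nor shuffled within them. 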

This is demonstrated by looking at the output of the following permutation function example:



In [None]:
# Create a more complicated array by reshaping a single array of 9 
# elements into a 3 by 3 array
arr = np.arange(9).reshape((3, 3))
# Use permutation to shuffle the order of the rows, 
# without changing the contents of each row.
arr2 = rng.permutation(arr)
arr2
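The row-wise behaviour can be verified with a short check. The sketch below also uses Generator.permuted, which is available from NumPy 1.20 onwards and is not otherwise covered in this notebook, to show how elements can be shuffled within each row instead. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(9).reshape((3, 3))

# permutation() reorders whole rows; each row's contents are untouched
rows_shuffled = rng.permutation(arr)
# Sorting the rows by their first element restores the original array
restored = rows_shuffled[np.argsort(rows_shuffled[:, 0])]
print(np.array_equal(restored, arr))

# permuted() with axis=1 shuffles the elements within each row independently
within_rows = rng.permuted(arr, axis=1)
# Every row still holds the same three values, in a possibly different order
print(np.array_equal(np.sort(within_rows, axis=1), arr))
```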

#### 2.6.3. The Distinction Between the Permutation Functions

The distinction between permutation() and shuffle() that was previously alluded to will now be defined.

Below, a 2-d array is created and assigned to variable 'a'. This variable is copied by the permutation function and a new variable is output, 'a_perm':

In [None]:
# Create a 3 by 3 array with arange() method
a = np.array(np.arange(9).reshape(3, 3) )
print('a:', a)

# Create a variable and assign it to the permutation of 'a'
a_perm = np.random.permutation(a)
# Note that the new variable contains the permutated version
# of 'a'
print('\n', 'a.permutation variable has been created:\n', a_perm)

# Note that original remains unchanged
print('\nOriginal a variable is not altered by\
\nnp.permutation:', a)

As can be seen above, the original variable 'a' remains unchanged with this method.

Shuffle() will now be used on a 4 by 4 array:

In [None]:
# Create a 4 by 4 array and assign to variable b
b = np.array(np.arange(16).reshape(4, 4) )
print('\n', 'b:', b)

# Create another variable and assign to the shuffle
# method of 'b'
b_shuffle = np.random.shuffle(b)

# Note that the new variable does not contain the shuffled
# values
print('\n', 'b.shuffle variable does not contain\
 shuffled values:', b_shuffle)

#Note that the original variable has instead been changed
print('\nOriginal b variable has been altered by\
 np.shuffle:\n', b)

#### 2.6.4. Using the Permutation Functions with Pandas Dataframes

We will now attempt to use more complex statements to develop Numpy arrays into dataframes, so that they can be altered by the permutation() method. 

Utilising Numpy and Pandas in conjunction is often desirable when working with datasets, especially when creating random numbers to test a dataset on.

In the following example a dataframe will be formed by creating two 1-d arrays and one 2-d array and flattening them with the ravel() method. 

The ultimate goal will be to create a number of scatterplots in Seaborn to test the permutation() method on. The effect of the function will be shown by creating 4 plots, with the first plot containing no permutations, and the final plot having every variable permuted.

In array 'c' below, notice how the last 2 rows of data contain a mix of the values 0, 1 and 2, whilst all others contain only 1 value replicated 3 times.

In [None]:
# Create 2 1-d arrays, the first 0-59 and the second 60-119
a =np.arange(start=0, stop=60, step=1)
b =np.arange(start=60, stop=120, step=1)

# Create an array of numbers 0, 1 and 2 to provide a 
# variable to distinguish rows of data when put in a plot
c = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], 
              [0, 0, 0], [0, 0, 0], [0, 0, 0], 
              [1, 1, 1], [1, 1, 1], [1, 1, 1], 
              [1, 1, 1], [1, 1, 1], [1, 1, 1],
              [2, 2, 2], [2, 2, 2], [2, 2, 2], 
              [2, 2, 2], [2, 2, 2], [2, 2, 2],
              [0, 2, 1], [1, 2, 2]])

# Create df1 using ravel(), this will contain no 
# permuted data
df1 = pd.DataFrame({'a': a.ravel(), 'b': b.ravel(), 
                    'c': c.ravel()})

# Use permutation() to reorder the rows of 'c'
c2 = rng.permutation(c)
# Create df2 with the permuted variable 'c2'
df2 = pd.DataFrame({'a': a.ravel(), 'b': b.ravel(), 
                    'c': c2.ravel()})

# Now permute 'b' and add b2 to another, df3, with 'c2'
b2 = rng.permutation(b)
df3 = pd.DataFrame({'a': a.ravel(), 'b': b2.ravel(), 
                    'c': c2.ravel()})

# Finally permute 'a' and add 'a2' to df4 with 'b2' and 'c2'
a2 = rng.permutation(a)
df4 = pd.DataFrame({'a': a2.ravel(), 'b': b2.ravel(), 
                    'c': c2.ravel()})


##### Plot on scatterplots: 
In the first plot below, for 'df1', 'a' and 'b' have been graphed against each other, with 'c' acting as the 'hue'. No variables have been permuted in this example. Notice the linear relationship between 'a' and 'b'. Additionally, the final few dots of varied shading in the top right represent the final rows of 'c', that contain a mix of values.  

In the second plot, 'df2', the values of 'c' have been permuted. This results in the shuffling of the hue values, but the points remain on the straight line drawn by 'a' and 'b'.

A big change is seen in the plot of 'df3'. Here, in addition to 'c', the values of 'b' have been permuted by the permutation() method. This in turn upsets the linear relationship between 'a' and 'b'. The plot now resembles a completely random dispersal of values.

The plot of 'df4', in which 'a' has also been permuted, is similarly dispersed. This effect displays the ability of the permutation functions to disturb arrays of data.

In [None]:

pd.set_option('display.max_rows', 200)
# Use subplots() method to create a 4,1 figure
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4,1,figsize=(10,20))

# Populate with data from dataframes, set hue to 'c'
sns.scatterplot(x="a", y="b", data=df1, hue='c', ax=ax1)
sns.scatterplot(x="a", y="b", data=df2, hue='c', ax=ax2)
sns.scatterplot(x="a", y="b", data=df3, hue='c',ax=ax3)
sns.scatterplot(x="a", y="b", data=df4, hue='c', ax=ax4)

# Add titles for each plot
ax1.set_title("df1 - no permutation")
ax2.set_title("df2 - c is permuted")
ax3.set_title("df3 - c and b are permuted")
ax4.set_title("df4 - all variables are permuted")

plt.show()

-------------------------------------------------

## Q3: Explain the use and purpose of at least five “Distributions” functions.

### 3.1. Overview of The Distribution Functions

The purpose of the largest set of functions in the numpy.random package, the Distribution Functions, is to give a specific statistical shape to the random data that is created. In comparison to the Simple Random Data or Permutations functions, which provide the basics of random number generation, these functions give the data a form that reflects common statistical properties. 

The various orders of data have been observed for centuries and each specific order has a recognised shape to it. These shapes have become known as Distributions. They often reflect natural phenomena and have been assigned mathematical formulas. Numpy has created a section that places random data into these various distributions, through the use of functions that contain the parameters from the distribution formulas. 

The utility of these functions is seen in simulating real-world situations with randomly produced data. These Distributions play a key role in models of analysis for sectors ranging from finance and stock markets, to electronic engineering and the study of radio waves.

In this section, Seaborn 'distplots' (univariate density plots with histograms) are utilised to simulate real-world phenomena using 5 of Numpy's Distribution Functions. 

**Below is a Seaborn distplot displaying the shape of 5 Distributions that will be inspected in this section:**

In [None]:
# Set rng as constructor
rng = np.random.default_rng()

# Plot 5 dist. functions on 1 graph to show their shape
sns.distplot(rng.rayleigh(8.0, size=1000), label="Rayleigh" )
sns.distplot(rng.uniform(14 ,24, 1000), label="Uniform") 
sns.distplot(rng.normal(29, 3.0, 1000), label="Normal")
sns.distplot(rng.triangular(31, 41, 52, 1000), label="Triangular")
sns.distplot(rng.wald(4, 5, 1000), label="Inverse Gaussian")
# sns.distplot(rng.power(3, 3))
plt.title("Distributions")
plt.legend()
plt.show()


-------------------------------------------------

## 3.2. Distribution Functions

### 3.2.1. The Normal Distribution Function - numpy.random.Generator.normal

#### Formula

##### <font color='blue'>np.random.default_rng().normal(mu, sigma, size(int or tuple))</font>    
*where mu = mean, or "centre" and sigma = standard deviation*

<img src="https://i2.wp.com/www.sharpsightlabs.com/wp-content/uploads/2018/12/numpy-random-normal-syntax-explanation.png?w=600&ssl=1" alt="Drawing" style="width: 400px;"/>

*Image from sharpsightlabs.com*       


#### Significance

Calling numpy.random.Generator.normal draws random samples from the Normal, or Gaussian, distribution. It is also known as the bell-curve; however, there are other distributions that resemble this shape, so Normal or Gaussian is preferred. The primary characteristic of the Normal Distribution is that the mean, the median and the mode are the same. The application of the Normal Distribution is not confined to statistics; rather, it has broad-reaching consequences for the natural sciences, such as Electronic Spectroscopy (Salman, 2017). Furthermore, it has found application in the study of psychological and physical phenomena (e.g. IQ scores and heartbeat).

A well-known application of the Normal Distribution is the Covid-19 healthcare capacity curve that was discussed broadly by governments around the world as a justification for imposing lockdown to reduce casualties of the virus. The model displays two Gaussian curves, one that has a high peak, representing high ICU demand and a model where demand is kept low by restrictive measures. The graph is presented below:


![](https://i.stack.imgur.com/sCwiV.png)

#### Basic Use
The parameter 'loc' requires a float giving the mean of the distribution. 'Scale' requires the standard deviation as a float. 'Size' can be an integer or a tuple of ints: a single integer sets the number of samples, whilst a tuple such as (m, n, k) draws *m * n * k* samples arranged in an array of that shape. 

An example of the basic use of the Normal Distribution function is seen below:

In [None]:
# Set the mean and standard deviation values:
mu, sigma = 20, 6
normal_1 = np.random.normal(mu, sigma, 10)

normal_1
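The two forms of the size parameter can be compared by checking the shape of the output. This is a small sketch using the Generator interface; the tuple (2, 3, 4) is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng()
mu, sigma = 20, 6

# An integer size returns a 1-d array with that many samples
single = rng.normal(mu, sigma, 10)
# A tuple size returns an array of that shape holding 2 * 3 * 4 samples
grid = rng.normal(mu, sigma, (2, 3, 4))

print(single.shape)  # (10,)
print(grid.shape)    # (2, 3, 4)
print(grid.size)     # 24
```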

We gather a greater idea of how the random.normal() function works if we graph a sample of normally distributed random data on a histogram. By plotting this histogram against the histogram of evenly spaced data created using the linspace() function, we can gain an understanding of the shape of the 'Bell-Curve' and, more importantly, how numpy.random recreates this naturally occurring phenomenon with random data. The graph below utilises Seaborn to illustrate this.

In [None]:
# Create 1000 data points evenly spaced between 5 and 10 to simulate a uniform dist.
b = np.linspace(5, 10, 1000)

# Draw one normal sample centred on each of these 1000 points
# (loc=b, scale defaults to 1) with random.normal()
c = np.random.normal(b)

# Create distplot
sns.distplot(c, label="Normal Distribution")
sns.distplot(b, label="Uniform Distribution")
plt.legend()
plt.title("Normal Distribution Vs Uniform Distribution")
plt.show()

#### Visualising the Normal Distribution
A company that produces exercise bikes wish to examine the life-time of the internal parts of the bike in hours. The department that has marketed the bike has pledged to customers that the bike will last up to 4500 hours of use, as such it is important for the manufacturers to understand how many of the parts will fail before 4500 hours of use. After carrying out analysis, they have determined that the mean number of hours that the parts will remain functional at is 3500 hours and the standard deviation is determined to be 500 hours. As stated above, only the mean and standard deviation must be known to create a useful probability density function and a histogram of this data.

Using Numpy's Random package, we can create a 1-dimensional array of random normal data reflecting observations made during the inspection of the exercise bike parts.

In the below example, loc is set to 3500, scale is set to 500 and size is set to 1000 (the minimum number of bike parts that must be examined to determine their lifespan).

In [None]:
mu, sigma = 3500, 500 # mean and standard deviation
# Draw 1000 normally distributed lifetimes, with the mean,
# standard deviation and size as inputs
lifetimes = np.random.default_rng().normal(mu, sigma, 1000)
# Create distplot using Seaborn and format
sns.distplot(lifetimes, axlabel="Use of Bike Parts (hrs)")
plt.title("Operational Life of Exercise Bike Parts")
plt.show()

*From the above graph, it can be seen that less than 5% of 1000 bike parts examined survived after 4500 hours of use.*
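The proportion read from the graph can also be estimated numerically. The sketch below counts the share of simulated parts that outlast 4500 hours; it assumes a larger sample of 100,000 (rather than the 1000 used above) so that the estimate is stable. Since 4500 hours is two standard deviations above the mean, the result should be close to the theoretical value of roughly 2.3%.

```python
import numpy as np

rng = np.random.default_rng()
mu, sigma = 3500, 500

# Simulate the lifetimes of 100,000 parts (sample size is an assumption)
lifetimes = rng.normal(mu, sigma, 100_000)

# Fraction of parts still working after 4500 hours
fraction_over = np.mean(lifetimes > 4500)
print(f"{fraction_over:.2%}")
```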

-------------------------------------------------

### 3.2.2. The Uniform Distribution - numpy.random.Generator.uniform

#### Parameters and Formula

##### <font color='blue'>np.random.default_rng().uniform(low, high, size(int or tuple))</font>
*where low is for the lower boundary of the output and high is for the higher boundary.*

#### Significance

The origin of the Continuous Uniform Distribution are difficult to pinpoint. It may be more worthwhile to investigate the origin of the concept of Equiprobability. Equiprobability is where the collection of a series of events all possess equil, or uniform, probability of occurring. This developed both from the philosophical concept of Equipossibility, but also from the more grounded Equiprobability application of rolling dice. It is intuitive that it is equally ikely for a rolling die to land on any one of its six faces. This sort of uniformity is particularly useful in computer programming, where creating uniform probability is highly desirable. It is from this aspect of Equiprobability that Numpy utilises the random.uniform distribution.

The Uniform Distribution, in contrast to the Normal Distribution, does not have much expression in the natural world. As previusly stated, the uniform distribution is present in the rolling of dice, it is also seen in a deck of cards, where each card is equally likely to occur. Unsurprisingly, it is in the generation of pseudo-random numbers, through alogrithms, such as Numpy, that the Uniform Distribution is applied with most benefit. Numpy's Simple Random Data functions all create data that is uniformly distributed.

Uniformity is therefore Numpy's default behaviour for simple random data. A related idea can be seen in the comparison between the Uniform and Normal graphs in the previous section, where np.linspace produced 1000 evenly spaced points without specifically being requested to do so. Note, however, that linspace spaces its points deterministically; it does not sample them at random.

#### Basic Use

Uniform() takes 3 parameters: a low value, a high value (exclusive) and size.

A basic format for inputting these parameters is seen below:

In [None]:
rng = np.random.default_rng()
uni_1 = rng.uniform(5, 10, 10)
print(uni_1)
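
The boundaries described above can be checked directly. The following is a minimal sketch, assuming an arbitrary seed of 0 (used only so the result is repeatable) and a larger sample than in the cell above. It confirms that the values stay inside the half-open interval from low to high and that the sample mean sits near the midpoint.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
low, high = 5, 10
sample = rng.uniform(low, high, 100_000)

# In practice every value falls in the half-open interval [low, high)
print(sample.min() >= low, sample.max() < high)

# The sample mean should sit near the midpoint (low + high) / 2 = 7.5
print(sample.mean())
```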

#### The Correlation Between Data Size and the Presentation of Uniformity

In the example below, the distplot created for 'uni_50000' shows a distribution that is very close to uniform. 50000 random numbers is seemingly more than enough to present a plot of uniform appearance. As there are fewer random values created for 'uni_100' and 'uni_1000', the uniformity is less obvious. This is because the uniform function draws random numbers in the given range, and with fewer data points the random variation makes the distribution appear non-uniform. Nevertheless, all three variables have been drawn from a uniform distribution.

In [None]:
# Create 3 samples of sizes 100, 1000 and 50,000 respectively
uni_100 = rng.uniform(10, 30, 100)
uni_1000 = rng.uniform(40, 70, 1000)
uni_50000 = rng.uniform(80, 110, 50000)
# Add all three to a single distplot
sns.distplot(uni_100, label="uni_100")
sns.distplot(uni_1000, label="uni_1000")
sns.distplot(uni_50000, label="uni_50000")
plt.legend()
plt.show()
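
The same effect can be measured without a plot. The sketch below is illustrative only: `relative_spread` is a hypothetical helper, and the seed and the choice of 10 bins are arbitrary assumptions. It counts how many values land in each bin and reports how far the counts stray from perfectly even, relative to the expected count per bin.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for repeatability only

def relative_spread(n, bins=10):
    """Spread of histogram bin counts relative to the expected count per bin."""
    counts, _ = np.histogram(rng.uniform(0, 1, n), bins=bins, range=(0, 1))
    return counts.std() / (n / bins)

# A smaller value means the bins are filled more evenly
spreads = {n: relative_spread(n) for n in (100, 1000, 50_000)}
for n, spread in spreads.items():
    print(n, round(spread, 3))
```

The spread should shrink as the sample grows, which is why the largest sample looks flattest in the distplot.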


------------------------------------------------------------

### 3.2.3. The Rayleigh Distribution Function - numpy.random.Generator.rayleigh

#### Significance

This distribution was first conceptualised by John Rayleigh in 1887. Since then it has become widely used in various realms of science. It is used in communications to model the pathways of signals to receivers. It has also found widespread use in the modelling of wind speed and in the measure of light radiation. However, this project will look at Rayleigh's original use of the model in terms of wave height and how this can be modelled with Numpy.


#### Formula

##### <font color='blue'>np.random.default_rng().rayleigh(scale, size(int or tuple of ints))</font>
*where scale=mode (defaulted to 1) and size determines shape (defaulted to a single value)*


Rayleigh showed that the distribution could be used to reflect the amplitude of wave heights. The distribution of random wave heights could be described with any of the following formulas:

In [None]:
from IPython.display import Image
Image(r'waveheights.png')


The random values of H in the above formula can be discovered once one of the following measurements is known:

* Hmode = modal or most common wave height
* Hmean = mean or average wave height
* HRMS = root-mean-square wave height.

An even better way to visualise the elements of this equation is found in the image below, where the variables are graphed over the profile of a wave:

<img src="https://images.slideplayer.com/19/5892025/slides/slide_11.jpg" alt="Drawing" style="width: 600px;"/>

#### Basic Use

The Rayleigh function takes an input that represents the mode. It is defaulted to 1. The second input represents the output size.

In the simple example below, scale (mode) is not set, so the plot displays a mode of 1 as expected. The wave shape can be seen in that the right slope of the curve stretches out slightly across the x-axis.

In [None]:
rng = np.random.default_rng()
sns.distplot(rng.rayleigh(size=1000), hist=True)
plt.show()
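
The three wave-height measurements listed earlier are linked by standard properties of the Rayleigh distribution: the mean is the mode multiplied by sqrt(pi / 2), and the root-mean-square is the mode multiplied by sqrt(2). The sketch below checks these against simulated data. The modal wave height of 1.5 metres and the seed are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
h_mode = 1.5                    # hypothetical modal wave height, in metres
heights = rng.rayleigh(scale=h_mode, size=200_000)

h_mean = heights.mean()
h_rms = np.sqrt(np.mean(heights ** 2))

print(h_mean, h_mode * np.sqrt(np.pi / 2))  # sample mean vs. theoretical mean
print(h_rms, h_mode * np.sqrt(2))           # sample RMS vs. theoretical RMS
```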

-------------------------------------------------

### 3.2.4. The Triangular Distribution Function - numpy.random.Generator.triangular

#### Formula

##### <font color='blue'>np.random.default_rng().triangular(left, mode, right, size(int or tuple of ints))</font>
*where left <= mode <= right, left is the lower limit, right is the upper limit and mode is the peak of the distribution.*

<img src="https://www.nde-ed.org/GeneralResources/Uncertainty/Graphics/Triangle_pdf.jpg" alt="Drawing" style="width: 500px;"/>


#### History and Significance

The development of the Triangular Distribution dates back to the 18th Century mathematician Thomas Simpson.

<img src="https://mathshistory.st-andrews.ac.uk/Biographies/Simpson/thumbnail.jpg" alt="Drawing" style="width: 200px;"/>

According to Seal (1949), Simpson envisioned a distribution that could mathematically represent the method practised by astronomers of taking the mean of a number of observations and using this to reduce the defects arising from human error and instrumental limits. Kotz (2004) explains that Simpson stated that the triangular distribution should be employed to limit errors in observations to within ± 1 and that this was the first time a continuous (or symmetric) probability law was introduced. This situates the Triangular Distribution amongst the first continuous distributions of the 18th Century.

Today, the Triangular Distribution finds application primarily for populations of data that are scarce and is used in a subjective manner, rather than as a rigid and well-defined tool of analysis. It is understandable, therefore, that it has its roots in the attempt by Simpson to mitigate the errors in observations made in an age when data was significantly more difficult to collect. The distribution is commonly used in audio dithering and in project management, where it is used to model events within an interval determined by a minimum and maximum value.

#### Basic Use

Below is a simple example of generating data from the Triangular Distribution. Notice how minimal the inputs are. This example will provide us with a basic understanding of the parameters and will generate just 1 number.

In [None]:
# Set the constructor
rng = np.random.default_rng()

# Create random-valued variables to represent the triangular() inputs
lower = rng.integers(22, 25)
mode = rng.integers(25, 28)
upper = rng.integers(29, 32)

# Pass variables into function
rng.triangular(lower, mode, upper)

*In basic use as seen above, it is important to ensure that lower <= mode <= upper, with lower < upper.*
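
The ordering requirement is enforced by Numpy itself. The short sketch below uses illustrative values only. It shows one valid call and one call where the mode lies below the lower limit, which Numpy rejects with a ValueError.

```python
import numpy as np

rng = np.random.default_rng()

# Valid: lower limit <= mode <= upper limit
print(rng.triangular(22, 25, 30))

# Invalid: the mode lies below the lower limit, so Numpy rejects the call
try:
    rng.triangular(25, 22, 30)
except ValueError as err:
    print("Rejected:", err)
```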

#### Example Application
Mathematician Lawrence Leemis, whilst working for NASA in the 1980s, was asked to estimate the time that would be required to do some maintenance work on a thruster for the as yet unbuilt International Space Station. Leemis consulted with a senior engineer within NASA, who had experience on projects where similar maintenance work was carried out on satellites in low-Earth orbit. From this consultation, Leemis came up with a minimum, a mode and a maximum amount of time taken to perform the maintenance work. The figures were 4 hours, 6 hours and 12 hours respectively. As such, Leemis' data could be modelled with a Triangular Distribution.

The Triangular Distribution requires very little input data to carry out a simulation; just the three inputs that Leemis used in the above example will suffice.

#### Visualising Leemis' Research

Utilising Leemis' investigation as an example, we can generate random figures using the triangular() method of Numpy's Generator.
We will also be required to input the number of observations we wish to randomly generate. In order to create a useful graph, we will use a figure that is much higher than the number of maintenance missions likely to have been carried out in low-Earth orbit. By using an input of 10000, we are presented with a probability density function in a triangular formation.

Below is a density plot that simulates Leemis' research on thruster maintenance:

In [None]:
# Set constructor function as 'rng'
rng = np.random.default_rng()
# Generate a random sample using triangular function and inputs
# for low, mode, high and number of datapoints.
tri1 = rng.triangular(4, 6, 12, 10000)
# Plot a distplot with Seaborn
sns.distplot(tri1, axlabel="Maintenance Time(hrs)")
plt.title("Triangular Distribution: Time for Thruster Repair")
plt.show()

The probability density is represented on the y-axis and the time taken is represented on the x-axis.

In the above code, 10000 pseudo-random values have been created, each representing a hypothetical NASA maintenance mission. The peak of the distribution is at 6 hours, with a slope in either direction to the minimum and maximum time limits.

If the maximum amount of time an astronaut could spend on a maintenance mission was 10 hours, the randomly generated data and the graph above suggest that there is roughly an 8.3% probability (4/48 for this triangle) of exceeding this limit.
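
The chance of exceeding a 10-hour limit can be worked out in two ways: exactly, from the area of the right-hand tail of the triangle, and approximately, by counting simulated missions. The sketch below does both. The seed and the sample size of one million are arbitrary choices, made only to get a stable estimate.

```python
import numpy as np

low, mode, high = 4, 6, 12
limit = 10

# Exact tail probability for a limit at or above the mode:
# P(X > x) = (high - x)^2 / ((high - low) * (high - mode))
exact = (high - limit) ** 2 / ((high - low) * (high - mode))

# Monte Carlo estimate from simulated repair times
rng = np.random.default_rng(3)  # arbitrary seed
sim = rng.triangular(low, mode, high, 1_000_000)
estimate = np.mean(sim > limit)

print(exact)     # 4 / 48, i.e. about 0.083
print(estimate)
```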

--------------------------------------------------------------------------

### 3.2.5. The Wald Distribution Function - numpy.random.Generator.wald

#### Formula

##### <font color='blue'>np.random.default_rng().wald(mean, scale, size(int or tuple of ints))</font>
*where the mean must be >0, scale must be >0 and size determines shape (defaulted to a single value)*

#### Significance

The Wald Distribution is more often known as the Inverse Gaussian Distribution or Inverse Normal. The Wald has a similar shape to the Weibull Distribution but is easier to predict, since the former takes two parameters while the latter takes three. The Inverse Gaussian distribution has its origins in Schrödinger's study of Brownian motion. However, it also holds applications in modelling stock returns and interest rate processes and in electrical engineering (Sato and Inoue, 1994).

#### Visualising the Inverse Gaussian Distribution

Numpy's Wald function will simulate random data in a right-skewed distribution bounded at 0. In the image below, 3 variations of the Inverse Gaussian distribution are seen.
* The blue line is where mean and scale are both input as 1.
* The red line distributes data at mean 1 and scale 10, resulting in a higher peak.
* The black line has mean at 2 and scale at 4, resulting in the distribution being spaced out along the x-axis.  

![](https://www.vosesoftware.com/riskwiki/images/image2c33.gif)

#### Plotting the Distribution

Similar to how we investigated the basics of the triangular() method, we will now create a basic graph by creating a replica of the density plot seen above.

In [None]:
# Set constructor function as 'rng'
rng = np.random.default_rng()

# Generate random samples using the wald function and inputs
# for mean, scale and size
wald_1 = rng.wald(1, 1, 1000)
wald_2 = rng.wald(1, 10, 1000)
wald_3 = rng.wald(2, 4, 1000)

# Plot all three curves on one distplot with Seaborn.
# 'label' feeds the legend; repeating 'axlabel' would
# overwrite the x-axis label on each call.
sns.distplot(wald_1, label="Inv.Gaussian: 1,1",
             color='blue', hist=False)
sns.distplot(wald_2, label="Inv.Gaussian: 1,10",
             color='red', hist=False)
sns.distplot(wald_3, label="Inv.Gaussian: 2,4",
             color='black', hist=False)
plt.title("Wald Distribution of 3 Basic Variables")
plt.legend()
plt.show()
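
The effect of the two parameters can also be checked numerically. For the Inverse Gaussian distribution the expected value equals the mean parameter and the variance equals mean³ / scale, so a larger scale gives a tighter, taller curve. The sketch below compares these with sample statistics. The seed and the sample size of 200,000 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed

for mean, scale in [(1, 1), (1, 10), (2, 4)]:
    sample = rng.wald(mean, scale, 200_000)
    print(
        f"mean={mean}, scale={scale}: "
        f"sample mean {sample.mean():.3f}, "
        f"sample variance {sample.var():.3f} "
        f"(theory {mean ** 3 / scale:.3f}), "
        f"minimum {sample.min():.4f}"
    )
```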

---------------------------------------------------------------------------------------------

## Q.4. Explain the use of seeds in generating pseudorandom numbers.

### 4.1. The Function of the Seed in Generating 'Random' Numbers

The Seed in Numpy's Random package is the entity at the heart of the pseudo-random number generation process. In reality, it is simply a single value. The Seed value is the reference point that initialises the 'BitGenerator' and so determines which numbers are provided to the 'Generator'. The Seed fixes an internal state of the BitGenerator that is very difficult to determine without knowing the Seed value. The user can, however, set the seed value themselves and in doing so remove the veil of mystery surrounding numpy.random.

In order to understand the basic function of the Seed it might be a good time to utilise a cheesy metaphor drawn from pop-culture!

Think of the Seed as...

![](https://www.nsxprime.com/photopost/data/1032/THE_MATRIX_RELOADED-0.jpg)

##### ... The Keymaker from the Matrix series...

It might seem cliché to make a reference to the Matrix series when describing an aspect of Data Science, but the comparison may be useful in translating the operations of the Seed into a digestible form.

In the film, the Keymaker is a computer program within the simulation known as the Matrix, which has been created by machines that have taken over the Earth and have sedated humans by plugging them into the Matrix. Neo is a human who has been unplugged from the Matrix.

The Keymaker is a character of importance in the film, as he and only he possesses unique keys for doors that Neo, 'The One', must travel through. By finding the Keymaker, one can use his keys and in some sense cheat the Matrix.

| - | - | - | - |
|:---: | :---: | :---: | :---: |
| <font color='blue'>*The Matrix*</font> | | | |
| Key → | The Keymaker (Program) → | The Oracle (High-level program) → | The Source (CPU) |
| <font color='blue'>*The Seeding Process*</font> | | | |
| Seed/Seed Value → | PCG-64 → | Generator → | A Random Number is produced |

The Seed in a similar sense is a glimpse of the internal state of the generator. Like obtaining one of the Keymaker's keys, users can set the Seed value and, therefore, control the random numbers that are generated. The PRNG system, PCG-64, is similar to the Keymaker, as the system takes the Seed value, uses it to produce a stream of bits and passes that stream to the Generator. The Generator, a program with a different purpose, takes the stream of bits and translates it into a random distribution for use by the user. Similarly, the Oracle is a program with a different purpose to the Keymaker, and one that Neo must use after the Keymaker's death. This chain eventually leads to the Source or, in the case of our metaphor and in reality for Numpy, the production of a random number.

-------------------------------------------------



### 4.2. The Generator and the BitGenerator

#### What are they?

With Numpy's Version 1.17 release, the package has been divided into two sections: the Generator and the BitGenerator. In order to understand how the seed operates within Numpy, it is also necessary to understand how these two components work.

The Generator is the 'user-facing' umbrella of methods for drawing random numbers from a variety of distributions. Therefore, it encompasses the Simple Random Data, Permutation and Distribution functions. Before Version 1.17, Numpy utilised a Generator known as RandomState.

The BitGenerator uses an efficient algorithm to produce a 'stream' of random bits to pass to the Generator, which in turn shapes them into a distribution. The Generator, therefore, is instantiated with a BitGenerator. Since Version 1.17 this process occurs when the user calls numpy.random.default_rng. The default_rng function is a 'constructor' that creates a BitGenerator and uses it to instantiate the Generator.

The complicated workings of the BitGenerator make it difficult for the user to comprehend how it operates and its relationship to the interface of Numpy. By investigating the particular Pseudo-Random Number Generator (PRNG) that Numpy has adopted in its latest releases, the PCG-64 (Permuted Congruential Generator, 64-bit), we can understand more about the BitGenerator's role in numpy.random.
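
The relationship between the two parts can be seen by building the same generator in two ways. The sketch below, with an arbitrary seed of 4, uses the default_rng constructor and then pairs a Generator with a PCG64 BitGenerator by hand. Both start from the same state, so they produce the same draws.

```python
import numpy as np
from numpy.random import Generator, PCG64

rng_a = np.random.default_rng(4)  # constructor picks the default BitGenerator
rng_b = Generator(PCG64(4))       # the same pairing, written out explicitly

print(type(rng_a).__name__)                # Generator
print(type(rng_a.bit_generator).__name__)  # PCG64

# Both generators start from the same state, so their draws match
print(np.array_equal(rng_a.random(5), rng_b.random(5)))  # True
```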

#### PCG-64 

PCG is a family of PRNGs that seeks to address a perceived shortcoming of previous algorithms, which fall into one pitfall or another when it comes to random number generation. Some PRNGs use encryption to keep the internal state of the generator secret, but these programs use a lot of space and are slow in operation. Other PRNGs are fast but do not offer any methods to conceal the internal state. One of the creators of PCG-64 has explained that this new PRNG is designed to pass the output of a fast, well-understood “medium quality” random number generator to an efficient permutation function (O'Neill, 2014). The random number generator used in the first stage of this process is known as a Linear Congruential Generator (LCG). Therefore, the stream of random bits produced by the LCG is further 'randomised' by a special function before it is passed to the Generator.
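
The two-stage idea can be illustrated with a toy generator written in plain Python. This is only a sketch of the principle, modelled on the small 32-bit variant of PCG; it is not Numpy's PCG64 implementation, and the class name `ToyPCG32` and its methods are hypothetical. The first stage advances an LCG, and the second stage scrambles the old state with a xorshift followed by a rotation.

```python
MASK64 = (1 << 64) - 1
MULTIPLIER = 6364136223846793005  # a commonly used 64-bit LCG multiplier

class ToyPCG32:
    """Illustrative LCG-plus-permutation generator; NOT Numpy's PCG64."""

    def __init__(self, seed, stream=0):
        self.inc = ((stream << 1) | 1) & MASK64  # the increment must be odd
        self.state = 0
        self.next_uint32()
        self.state = (self.state + seed) & MASK64
        self.next_uint32()

    def next_uint32(self):
        old = self.state
        # Stage 1: advance the linear congruential generator
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Stage 2: permute the old state (xorshift, then a rotation
        # whose amount is taken from the top bits of the state)
        xorshifted = (((old >> 18) ^ old) >> 27) & 0xFFFFFFFF
        rot = old >> 59
        rotated = (xorshifted >> rot) | (xorshifted << ((-rot) & 31))
        return rotated & 0xFFFFFFFF

gen = ToyPCG32(seed=4)
print([gen.next_uint32() for _ in range(3)])
```

Creating a second `ToyPCG32(seed=4)` reproduces the same numbers, which is the behaviour that seeding relies on.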

-------------------------------------------------

### 4.3. The SeedSequence

#### Introducing the New Seeding Method

To refine what was discussed in the previous section: PCG-64 is the BitGenerator, and the Seed is the value from which its initial internal state is derived. Looking at the parameters for the PCG64 class in Numpy, we can see how the seed plays a central role in how the BitGenerator works:

##### <font color='blue'>np.random.PCG64(seed={None, int, array_like[ints], SeedSequence}, optional)</font>

If the seed is set to 'None', 'fresh' entropy will be pulled from the OS. However, if the seed is set to an integer, or an array of integers, then it will be passed to the SeedSequence class, which derives the initial state of the BitGenerator.
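
Both routes can be tried out directly. In the sketch below, an integer seed and an explicit SeedSequence built from the same integer give identical generators. An unseeded SeedSequence exposes the entropy it gathered, which can be saved and used to replay the same stream later. The seed value 4 and the variable names are illustrative only.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

# An integer seed is passed through SeedSequence behind the scenes,
# so these two generators start from the same state
rng_int = Generator(PCG64(4))
rng_ss = Generator(PCG64(SeedSequence(4)))
print(rng_int.integers(0, 100, 5))
print(rng_ss.integers(0, 100, 5))

# With no seed, entropy is gathered from the OS; it can be inspected and reused
ss = SeedSequence()
print(ss.entropy)
original = Generator(PCG64(ss))
replay = Generator(PCG64(SeedSequence(ss.entropy)))
print(np.array_equal(original.random(3), replay.random(3)))
```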

The legacy methods, now discouraged by Numpy, employ a reseeding method for the BitGenerator. That is to say, if the user wishes to change the seed, they merely change the seed of the existing BitGenerator. Since Numpy 1.17, users are advised not to reseed, but rather to create a new BitGenerator with a new seed.
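
The difference between the two styles can be sketched as follows. The seed value and variable names are illustrative. In the legacy style a single RandomState object is reseeded in place, whereas in the newer style a fresh Generator is built whenever a known starting point is needed.

```python
import numpy as np

# Legacy style: one RandomState object is reseeded in place
legacy = np.random.RandomState(4)
first = legacy.randint(0, 10, 5)
legacy.seed(4)                        # reseed the same object
second = legacy.randint(0, 10, 5)
print(np.array_equal(first, second))  # True

# Recommended style: build a fresh Generator for each seed instead
rng_first = np.random.default_rng(4)
rng_again = np.random.default_rng(4)
new_first = rng_first.integers(0, 10, 5)
new_again = rng_again.integers(0, 10, 5)
print(np.array_equal(new_first, new_again))  # True
```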

-------------------------------------------------

### 4.4. Basic Use of the Seed

With the legacy model, RandomState, the seed was set by calling the np.random.seed() function and passing in the seed value you wish to set.

This has been replaced by the default_rng Generator constructor. To seed the Generator, all you have to do is pass the value into the constructor as the parameter.

A simple example of this is seen below:

In [None]:
# Create constructor and enter seed value
rng = np.random.default_rng(4)

# Create variable and assign to integers function
seeded = rng.integers(10)
seeded

A seed value of 4 will always produce the integer 7 when rng.integers(10) is called. This works with all functions:

In [None]:
rng = np.random.default_rng(4)

# Bytes Function Example
seeded2 = rng.bytes(8)
print(seeded2)

# Random example
seeded3 = rng.random(1)
print(seeded3)

# Choice Example
# Create array of 1-6
array_4 = np.arange(1, 7, 1)
# Use choice() to pick one at random
seeded4 = rng.choice(array_4)
print(seeded4)

##### Repeat:

In [None]:
rng = np.random.default_rng(4)
seeded2 = rng.bytes(8)
print(seeded2)
seeded3 = rng.random(1)
print(seeded3)
array_4 = np.arange(1, 7, 1)
seeded4 = rng.choice(array_4)
print(seeded4)

The outputs are the same **every time**.
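
Rather than comparing printed output by eye, the repeatability can be confirmed in code. The sketch below uses a hypothetical helper, `draw`, that seeds a new Generator and returns a few values from different functions. Two calls with the same seed are compared, and then a call with a different seed.

```python
import numpy as np

def draw(seed):
    """Return a few values from a Generator seeded with `seed`."""
    rng = np.random.default_rng(seed)
    return rng.random(3), rng.integers(0, 10, 3), rng.normal(0, 1, 3)

# Same seed: every array matches. Different seed: the stream changes.
same = all(np.array_equal(a, b) for a, b in zip(draw(4), draw(4)))
differs = not np.array_equal(draw(4)[0], draw(5)[0])
print(same, differs)
```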

And that's it! 

Despite how difficult it is to understand how the seed works, it is very simple to use.

------------------------------------