![numpy_logo_icon_168073.png](attachment:numpy_logo_icon_168073.png)

***

# NumPy random assignment
# Author : Michelle O'Connor (Student ID : G00398975)

## Assignment :
Create a Jupyter notebook explaining the use of the NumPy random package, including
detailed explanations of at least five of the distributions provided for in the package.

<br>
The four tasks to be carried out in the Jupyter notebook are:

1. Explain the overall purpose of the package.
2. Explain the use of the “Simple random data” and “Permutations” functions.
3. Explain the use and purpose of at least five “Distributions” functions.
4. Explain the use of seeds in generating pseudorandom numbers.

## 1. Purpose of the NumPy Random package

#### NumPy Random package

NumPy is a Python library and is used for working with arrays. 
NumPy is short for "Numerical Python" [1]

Because NumPy functions operate on numbers, they are especially useful for data science, statistics, and machine learning.

One common task in data analysis, statistics, and related fields is taking random samples of data.
You’ll see random samples in probability, Bayesian statistics, machine learning, and other subjects. Random samples are very common in data-related fields.
https://www.r-craft.org/r-news/how-to-use-numpy-random-choice/

NumPy random is a module in the NumPy library. <br> It provides a way of creating random samples within NumPy: the module contains simple random data generation methods, permutation functions, distribution functions, and random generator functions. [2]


What is a Random Number? <br>
Random number does NOT mean a different number every time. Random means something that can not be predicted logically. 

Pseudo Random and True Random.
Computers work on programs, and programs are a definitive set of instructions. This means there must be some algorithm to generate a random number as well.

If there is a program to generate a random number, its output can be predicted, so it is not truly random.

Random numbers generated through a generation algorithm are called pseudo random.
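
To make this concrete, here is a minimal sketch of a pseudo random generator, a linear congruential generator. The function name `lcg` and its constants are illustrative assumptions (commonly quoted textbook values), and this is not the algorithm NumPy uses. It only shows that the same starting value always produces the same "random" sequence.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo random integers from a simple linear congruential generator."""
    values = []
    state = seed
    for _ in range(n):
        # Each new value is computed purely from the previous one
        state = (a * state + c) % m
        values.append(state)
    return values

first_run = lcg(seed=42, n=5)
second_run = lcg(seed=42, n=5)

print(first_run)
print(first_run == second_run)  # True: same starting value, same sequence
```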

Can we make truly random numbers?

Yes. In order to generate a truly random number on our computers we need to get the random data from some outside source. This outside source is generally our keystrokes, mouse movements, data on the network, etc.

We do not need truly random numbers unless they are related to security (e.g. encryption keys) or randomness is the basis of the application (e.g. digital roulette wheels). [3]

#### PCG64 vs Mersenne Twister

NumPy random is continuously updated, but one update worth noting happened in version 1.17: the default PRNG changed from the Mersenne Twister to PCG64.

The Generator provides access to a wide range of distributions, and serves as a replacement for RandomState. The main difference between the two is that Generator relies on an additional BitGenerator to manage state and generate the random bits, which are then transformed into random values from useful distributions. The default BitGenerator used by Generator is PCG64. The BitGenerator can be changed by passing an instantiated BitGenerator to Generator.
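
As a short sketch of the relationship between Generator and BitGenerator (assuming NumPy 1.17 or later), the cell below builds one Generator with the default BitGenerator and one explicitly on top of the Mersenne Twister. The variable names are my own.

```python
import numpy as np

# default_rng() builds a Generator on top of the default BitGenerator
rng_default = np.random.default_rng()
print(type(rng_default.bit_generator).__name__)   # PCG64

# A Generator can also be built explicitly on a chosen BitGenerator,
# here the older Mersenne Twister (MT19937)
rng_mt = np.random.Generator(np.random.MT19937())
print(type(rng_mt.bit_generator).__name__)        # MT19937

# Both expose the same distribution methods
print(rng_default.random(3))
print(rng_mt.random(3))
```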

PCG is a family of simple fast space-efficient statistically good algorithms for random number generation.

https://docs.w3cub.com/numpy~1.17/random/generator

Why the change?


The image below compares a selection of random number generators on performance and on the time taken to derive the random samples. PCG's results are significantly better than the Mersenne Twister's (MT on the image) on both counts.

In this comparison, PCG's performance is more than double that of the Mersenne Twister, and the time PCG64 takes to derive the random sample is about half that of the Mersenne Twister.

![prng_perf.jpg](attachment:prng_perf.jpg)

Both the Mersenne Twister and PCG64 are open source software, with a permissive license. 

Overall though, PCG64 is faster.

The Mersenne Twister also has some known weaknesses:

- Predictable: after 624 outputs, we can completely predict its output.
- Not particularly space efficient.
- While jump-ahead is possible, algorithms to do so are slow to compute.
- Fails some statistical tests, with as few as 45,000 numbers.

https://thompsonsed.co.uk/random-number-generators-for-c-performance-tested

![PCG64%20vs%20Mersenne%20Twister.JPG](attachment:PCG64%20vs%20Mersenne%20Twister.JPG)

> Advantages of PCG64  
-Easy to use, and yet it's very flexible and offers powerful features.  
-Fast, and can occupy very little space.  
-It has a small code size.  
-Its performance in statistical tests is excellent.   
-It's much less predictable and thus more secure than most generators.  
-It's open source software, with a permissive license.  

> Advantages of Mersenne Twister  
-It's open source software, with a permissive license.  
-Produces 32-bit or 64-bit numbers (thus usable as a source of random bits).  
-Passes most statistical tests.  
-Has an extremely long period of $2^{19937} - 1$.  
-It is one of the most widely used RNGs, and is the default random number generator for Python.  


The Mersenne Twister is, however, not a silver bullet for all PRNG
needs. While a variant, CryptMT, exists, it is normally not
cryptographically secure, disallowing its use where security is a major
concern such as password encryption or gambling. It also requires a
large state buffer of 2.5 KiB, and has mediocre throughput by modern
standards, meaning it shouldn’t be used on hardware with too little buffer
or in situations where large streams of random numbers are needed.


https://www.pcg-random.org/index.html

#### rng = np.random.default_rng()

default_rng is the recommended constructor for the random number class Generator.   
The function numpy.random.default_rng will instantiate a Generator with NumPy's default BitGenerator.   
rng = np.random.default_rng() will be the basis of the NumPy random code in this notebook. 

https://docs.w3cub.com/numpy~1.17/random/generator

## 2. Simple Random data and Permutations functions

### Simple Random data

Simple random data has 4 methods 
<br>-Integers
<br>-Random
<br>-Choice
<br>-Bytes

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#### Integers

The Integers method generates random integers in the shape defined by size from low (inclusive) to high (exclusive) in the discrete uniform distribution. Optionally, we can also set the dtype to another integer type such as np.int32, with np.int64 being the default value.
One thing to note is that when we don't set the high argument, the numbers will be generated in the range [0, low), that is from 0 up to but not including low.<BR>
https://towardsdatascience.com/a-cheat-sheet-on-generating-random-numbers-in-numpy-5fe95ec2286

random.Generator.integers(low, high=None, size=None, dtype=np.int64, endpoint=False)

*low, high = None* : The start (inclusive) and end (exclusive) of the range you want your selection from. If you enter just one number, it is used as the (exclusive) end and the start defaults to zero. <br>

*size=None* : The number of results you would like and the shape you require, for example if you want 2 rows of 4 numbers enter size = (2, 4). If you do not enter a value, a single number is returned. <br>

*dtype=np.int64* : The desired dtype of the result. The default is a 64-bit integer, which can hold values from -9223372036854775808 to 9223372036854775807.<br>

*endpoint=False* : The default of False excludes the high number of the range from your selection; if you want this number included, enter endpoint=True.<br>

In [None]:
# Generate 5 random integers from 0 to 9 (high is exclusive, so pass 10)
rng = np.random.default_rng()
rng.integers(10, size=5)

In [None]:
# Generate 6 rows of 4 random numbers from 1 to 19
rng = np.random.default_rng()
rng.integers(1, 20, size=(6,4), endpoint=False)

In [None]:
# Generate 6 rows of 4 random numbers from 1 to 20
rng = np.random.default_rng()
rng.integers(1, 20, size=(6,4), endpoint=True)

To show the simple random integers function visually, I'll use a scatterplot. Here we see 400 random points, with each coordinate drawn from the integers 1 to 99.

In [None]:
rng = np.random.default_rng()
x = rng.integers(1, 100, size=(400), endpoint=False)
y = rng.integers(1, 100, size=(400), endpoint=False)
plt.scatter(x, y)
plt.show()
# https://www.codingem.com/scatter-plots-in-python/

Using a histogram we can see the count of each number selected, overall an even distribution of numbers. 

In [None]:
rng = np.random.default_rng()
x = rng.integers(1, 100, size=(400), endpoint=True)
sns.histplot(x, color="gold")
plt.show()

#### Random

The Random method generates random floats in the shape defined by size in the half-open interval [0.0, 1.0), which is a continuous uniform distribution.

random.Generator.random(size=None, dtype=np.float64, out=None)

*size=None* : The number of results you would like and the shape you require, for example if you want 2 rows of 4 numbers enter size = (2, 4). If you do not enter a value, a single float is returned. <br>

*dtype=np.float64* : The desired dtype of the result; the options are float64 or float32. The default is a 64-bit float, however if you have memory restrictions you could use float32, as this uses half the memory and is typically quicker to generate.

*out=None* : An optional existing array in which to place the result, instead of creating a new array. If size is also given, it must match the shape of out, and the dtype of out must match the dtype requested. <br>
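
A small sketch of the out parameter follows. The name `buffer` and the fill value of -1.0 are my own choices, used only to make it visible that the existing array is overwritten.

```python
import numpy as np

rng = np.random.default_rng()

# Pre-allocate an array; the fill value -1.0 makes it easy to see it was overwritten
buffer = np.full(4, -1.0)

# The random floats are written into `buffer` instead of a newly created array
rng.random(out=buffer)
print(buffer)
```

Reusing one array in this way can avoid repeated memory allocation when the same amount of random data is needed many times.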

In [None]:
# Generate 1 random float number
rng = np.random.default_rng()
rng.random()

In [None]:
# Generate 2 rows of 4 random float numbers with dtype float32
rng = np.random.default_rng()
rng.random(size=(2,4), dtype=np.float32)

In [None]:
rng = np.random.default_rng()
rng.random(size=(2,4))

To visualise, I'll use a scatterplot and histogram. Both show a generally even spread of numbers between 0 and 1.

In [None]:
rng = np.random.default_rng()
x = rng.random(size=(1000))
y = rng.random(size=(1000))
plt.scatter(x, y, c='r')
plt.show()
# https://www.codingem.com/scatter-plots-in-python/

In [None]:
rng = np.random.default_rng()
x = rng.random(size=(1000), dtype=np.float64,out=None)
sns.histplot(x, color="teal")
plt.show()

#### Choice

The Choice method generates a random sample from a given 1-D array specified by the argument a. However, if a is set as an int, the method will run as if a were an ndarray generated from np.arange(a). <br>
https://towardsdatascience.com/a-cheat-sheet-on-generating-random-numbers-in-numpy-5fe95ec2286

random.Generator.choice(a, size=None, replace=True, p=None, axis=0, shuffle=True)

*a* : {array_like, int}
If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated from np.arange(a).

*size* : The number of results you would like and the shape you require. If a has more than one dimension, the size shape is inserted into the axis dimension. If you do not enter a value, a single item is returned.

*replace* : The default is True, meaning that a value of a can be selected multiple times. If you don't want a value to be selected multiple times, enter replace=False.

*p* : The probabilities associated with each entry in a. The probabilities must sum to 1 (100%). If not entered, the default is a uniform distribution over all entries in a.

*axis* : The axis of a along which the selection is made. The default of 0 selects by row.

*shuffle* : Only relevant when sampling without replacement. The default of shuffle=True shuffles the sample; shuffle=False gives a speedup if the order of the sample does not matter.
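
The sketch below uses an actual array for a rather than an int, to show the p and axis parameters. The list `colours` and the array `table` are made-up data for illustration.

```python
import numpy as np

rng = np.random.default_rng()

# Choose from a list of labels, with a different probability for each
colours = ["red", "green", "blue"]
picks = rng.choice(colours, size=10, p=[0.6, 0.3, 0.1])
print(picks)

# Choose 2 whole rows (axis=0) from a 2-D array, without repeating a row
table = np.arange(12).reshape(4, 3)
rows = rng.choice(table, size=2, replace=False, axis=0)
print(rows)
```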

In [None]:
# Generate 3 numbers between 0 and 4 where the number cannot be repeated
rng = np.random.default_rng()
rng.choice(5, 3, replace=False)

In [None]:
# Generate 2 rows of 8 numbers between 0 and 4
rng = np.random.default_rng()
rng.choice(5, size =(2,8))

In [None]:
# Generate 80 numbers between 0 and 9
rng = np.random.default_rng()
rng.choice(10, 80)


Visualise<br>
I'll use a histogram to visualise the random generation of 8000 numbers from 0 to 9. Here we see the split of the results.

In [None]:
rng = np.random.default_rng()
x = rng.choice(10, 8000)
sns.histplot(x, color="gold")
plt.show()

Let's bring in probability into the mix, here we assign the probability of results to each number

In [None]:
# Generate 2 rows of 8 numbers between 0 and 4, applying a probability to the numbers as follows
# (0=50%, 1=10%, 2=0%, 3=10% & 4=30%). Assigning 0% to 2 means it will never be generated.
rng = np.random.default_rng()
rng.choice(5, size =(2,8), p=[0.5,0.1, 0, 0.1, 0.3])

Visualise<br>
Using a histogram again, let's visualise the random generation of 8000 numbers from 0 to 4, this time entering the probabilities as follows:<br>
0 50% <br>
1 10% <br>
2 0 <br>
3 10% <br>
4 30% <br>

In [None]:
rng = np.random.default_rng()
x = rng.choice(5, size =(8000), p=[0.5,0.1, 0, 0.1, 0.3])
sns.histplot(x, color="skyblue")
plt.show()

As I selected a 0% probability for the number 2, we see on the histogram that there was no result for it. For the number 0, we selected a probability of 50%, so roughly 50% of the results are 0.
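
The same comparison can be made numerically. This is a sketch that counts the draws with np.bincount and compares the observed proportions with the probabilities passed in; the variable names are my own.

```python
import numpy as np

rng = np.random.default_rng()
p = [0.5, 0.1, 0, 0.1, 0.3]
x = rng.choice(5, size=8000, p=p)

# Count how often each number 0..4 was drawn and convert to a proportion
proportions = np.bincount(x, minlength=5) / x.size

for number, (expected, observed) in enumerate(zip(p, proportions)):
    print(f"{number}: expected {expected:.2f}, observed {observed:.3f}")
```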

#### Bytes

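random.Generator.bytes(length)

The Bytes method generates a random sample of bytes. <br>
Note that the bytes come from the same pseudorandom generator as the other methods, so they are not suitable for cryptographic use such as generating salts, keys or initialization vectors. Functions designed for that purpose, such as PHP's random_bytes (linked below) or Python's secrets module, draw on a cryptographically secure source instead.

*length* : the number of random bytes you want returned 

https://www.php.net/manual/en/function.random-bytes.php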


In [None]:
np.random.default_rng().bytes(7)
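
NumPy's documentation states that its generators are not suitable for security or cryptographic purposes and points to Python's secrets module instead. The sketch below contrasts the two; the variable names are my own.

```python
import secrets
import numpy as np

rng = np.random.default_rng()

# Pseudorandom bytes: fine for simulations and test data
sim_bytes = rng.bytes(7)
print(sim_bytes, len(sim_bytes))

# Bytes from the operating system's secure source: use these for keys or salts
key = secrets.token_bytes(16)
print(len(key))
```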

### Permutations

The Permutations function has 3 methods 
    <br>Shuffle 
    <br>Permutation
    <br>Permuted

Both the shuffle and permutation methods are used to permute a sequence randomly. The major difference between them is that the shuffle() method modifies the sequence in-place and returns None, while the permutation() method returns a new, permuted ndarray of the same shape and leaves the original unchanged. Let's see some examples below.
One thing to note is that when we permute a multi-dimensional array, by default it only works along the first axis, as shown in the last examples of each section. In other words, the rows are reordered but the contents of each row stay the same.<br>
https://towardsdatascience.com/a-cheat-sheet-on-generating-random-numbers-in-numpy-5fe95ec2286
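
The in-place versus copy difference described above can be checked directly. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.arange(10)

# permutation() returns a new array and leaves `arr` untouched
permuted_copy = rng.permutation(arr)
print(arr)             # still 0 to 9 in order
print(permuted_copy)

# shuffle() reorders `arr` itself and returns None
result = rng.shuffle(arr)
print(result)          # None
print(arr)
```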

#### Shuffle

In [None]:
# Shuffle numbers from 0 to 9
rng = np.random.default_rng()
arr = np.arange(10)
rng.shuffle(arr)
arr
# https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.shuffle.html#numpy.random.Generator.shuffle

random.Generator.shuffle(x, axis=0)

*x*: the array or list to be shuffled

*axis* : the axis which x is shuffled along. Default is 0. It is only supported on ndarray objects.



In [None]:
# To shuffle the numbers from 100 to 200
rng = np.random.default_rng()
arr = np.arange(100,201)
rng.shuffle(arr)
arr

In [None]:
# To shuffle the numbers from 1 to 20 into 5 rows of 4 numbers. 
# To begin with the numbers are split into the 5 groups.
rng = np.random.default_rng()
arr = np.arange(1,21).reshape((5, 4))
print("original array\n", arr)

In [None]:
# The default is to shuffle by row, default is axis=0
rng.shuffle(arr)
print("Shuffled list by row\n", arr)

In [None]:
# Taking the same example, to shuffle by column, enter axis = 1 
rng.shuffle(arr, axis=1)
print("Shuffled list by column\n", arr)

#### Permutation

random.Generator.permutation(x, axis=0)

*x*: If x is an integer, randomly permute np.arange(x). If x is an array, make a copy and shuffle the elements randomly

*axis* : the axis which x is shuffled along. Default is 0.

In [None]:
# To randomly permute a sequence from 0 to 9
rng = np.random.default_rng()
print("Original list", np.arange(10))
print ("Shuffled List", rng.permutation(10))

In [None]:
# To randomly permute a given list of numbers
rng = np.random.default_rng()
values = [1001, 234, 751, 814, 96]  # avoid the name `list`, which would shadow the built-in
print("Original list", values)
arr = rng.permutation(values)
print("Shuffled list", arr)

In the case of a two-dimensional array, axis=0 will, in effect, rearrange the rows of the array, and axis=1 will rearrange the columns. For example

In [None]:
# To permute the numbers from 0 to 14 arranged as 3 rows of 5 numbers each
# To begin with, the numbers are split into the 3 rows.
# The default is to shuffle by row, default is axis=0
rng = np.random.default_rng()
original = np.arange(0, 15).reshape(3, 5)
print("Original list\n", original)
shuffled = rng.permutation(original)
print("Shuffled list\n", shuffled)

In [None]:
# To permute the numbers from 0 to 14 arranged as 3 rows of 5 numbers each
# To shuffle by column, enter axis = 1
rng = np.random.default_rng()
original = np.arange(0, 15).reshape(3, 5)
print("Original list\n", original)
shuffled = rng.permutation(original, axis=1)
print("Shuffled list\n", shuffled)
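
The third method listed at the start of this section, Permuted, is not covered by the cells above. As a hedged sketch (assuming NumPy 1.20 or later, where Generator.permuted is available), the cell below contrasts it with permutation: permutation moves whole rows or columns, while permuted shuffles each slice along the axis independently.

```python
import numpy as np

rng = np.random.default_rng()
original = np.arange(0, 15).reshape(3, 5)

# permutation(axis=1) moves whole columns, so every row is rearranged the same way
by_column = rng.permutation(original, axis=1)

# permuted(axis=1) shuffles each row independently of the others
independent = rng.permuted(original, axis=1)

print("Original\n", original)
print("permutation, axis=1\n", by_column)
print("permuted, axis=1\n", independent)
```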

## 3. Use and purpose of 6 Distributions functions

Distributions has 36 methods, which can be viewed on the NumPy page <br>
[Click to view full list of Distributions](https://numpy.org/doc/stable/reference/random/generator.html)

I will discuss a sample of 6 Distributions:
1. Normal Distribution
2. Binomial Distribution
3. Geometric Distribution
4. Logistic Distribution 
5. Pareto Distribution
6. Multinomial Distribution

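#### Normal Distribution

Normal Distribution is a Continuous Distribution.

random.Generator.normal(loc=0.0, scale=1.0, size=None)

*loc* - mean, where the peak is. Default 0.

*scale* - standard deviation, the spread of the distribution. Default 1.

*size* - The shape of the returned array.

In [None]:
import matplotlib.pyplot as plt

# Two sets of points whose x and y coordinates are drawn from normal distributions
x1 = rng.normal(20, 1, size=50)
x2 = rng.normal(40, 10, size=20)
y1 = rng.normal(20, 1, size=50)
y2 = rng.normal(10, 2, size=20)

fig, ax = plt.subplots(2, figsize=(10, 6))
ax[0].scatter(x=x1, y=y1, color='r')
ax[0].set_xlabel("x: Mean = 20, Std Dev = 1")
ax[0].set_ylabel("y: Mean = 20, Std Dev = 1")

ax[1].scatter(x=x2, y=y2, color='g')
ax[1].set_xlabel("x: Mean = 40, Std Dev = 10")
ax[1].set_ylabel("y: Mean = 10, Std Dev = 2")

plt.show()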

#### Binomial Distribution

Binomial Distribution is a Discrete Distribution.

It describes the outcome of binary scenarios (two outcomes), e.g. toss of a coin, it will either be head or tails.

random.Generator.binomial(n, p, size=None)

*n* - number of trials.

*p* - probability of success in each trial (e.g. 0.5 for getting heads on the toss of a coin).

*size* - The shape of the returned array.

https://www.w3schools.com/python/numpy/numpy_random_binomial.asp

Difference Between Normal and Binomial Distribution<br>
The main difference is that the normal distribution is continuous whereas the binomial is discrete. However, when the number of trials n is large enough, the binomial distribution looks very similar to a normal distribution with loc = n × p and scale = √(n × p × (1 − p)).
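
As a rough check of this similarity, the sketch below draws from a binomial distribution with many trials and from a normal distribution with loc = n × p and scale = √(n × p × (1 − p)), then compares the sample means and standard deviations. The values n = 1000 and p = 0.5 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng()
n, p = 1000, 0.5

binomial_draws = rng.binomial(n=n, p=p, size=10_000)

# Matching normal distribution: loc = n*p, scale = sqrt(n*p*(1-p))
loc = n * p
scale = np.sqrt(n * p * (1 - p))
normal_draws = rng.normal(loc=loc, scale=scale, size=10_000)

print("binomial mean, std:", binomial_draws.mean(), binomial_draws.std())
print("normal   mean, std:", normal_draws.mean(), normal_draws.std())
```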

In [None]:
# Flip the coin 7 times. As it is a fair coin, there is a 50:50 chance of getting heads, so enter a probability of 0.5
x = rng.binomial(n=7, p=0.5)
# the result will show how many times heads appeared in the 7 flips
x

In [None]:
from numpy import random
# Repeat the experiment 10 times
x = rng.binomial(n=7, p=0.5, size=10)
# the result will show how many times heads appeared for each of the experiments (7 flips per experiment)
x

In [None]:
# Plot the result 

sns.displot(rng.binomial(n=7, p=0.5, size=10), kde=False, color="pink")

plt.show()

The plot shows how many times heads was achieved in each of the 10 experiments of 7 flips. 

#### Geometric Distribution

Geometric Distribution is a Discrete Distribution.

The geometric probability distribution gives the probabilities of a discrete random variable X, which represents the number of trials it takes for an event to happen for the first time.

Bernoulli trials are experiments with one of two outcomes: success or failure (an example of such an experiment is flipping a coin). The geometric distribution models the number of trials that must be run in order to achieve success. It is therefore supported on the positive integers, k = 1, 2, ....

random.Generator.geometric(p, size=None)

*p* - probability of success in each individual trial (e.g. 0.5 for getting heads on the toss of a coin).

*size* - The shape of the returned array.

https://vitalflux.com/geometric-distribution-explained-with-python-examples/

In [None]:
x = rng.geometric(p=0.50, size=10)
x

In [None]:
#plotting 1000 sample from 
#different geometric distribution
# https://www.alphacodingskills.com/numpy/numpy-geometric-distribution.php

size = 1000
sns.ecdfplot(rng.geometric(0.2, size))
sns.ecdfplot(rng.geometric(0.5, size))
sns.ecdfplot(rng.geometric(0.8, size))

plt.legend(["p = 0.2", 
            "p = 0.5", 
            "p = 0.8"])
plt.show()

The plot shows how many trials need to happen before we get the desired result. We need to run fewer trials with the 80% probability than with the 20% probability, as success happens quicker with a higher probability. 
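
The effect of p can also be seen in the averages. For a geometric distribution the theoretical mean number of trials until the first success is 1/p, and the sketch below compares this with the sample mean for the three probabilities used in the plot.

```python
import numpy as np

rng = np.random.default_rng()

for p in (0.2, 0.5, 0.8):
    draws = rng.geometric(p, size=100_000)
    # The theoretical average number of trials until the first success is 1/p
    print(f"p = {p}: sample mean {draws.mean():.2f}, expected {1 / p:.2f}")
```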

#### Logistic Distribution

Logistic Distribution is used to describe growth.

It is used extensively in machine learning in logistic regression, neural networks etc.
The logistic distribution helps to describe the statistical growth of data, and is known for predicting how growth will happen from certain input data. It is continuous in nature. It resembles the normal distribution in certain ways, but it has heavier tails. It is applied in the fields of physics, logistic regression, hydrology, and also machine learning and neural networks.
https://morioh.com/p/a48980a45b4d

random.Generator.logistic(loc=0.0, scale=1.0, size=None)

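*loc* - mean, where the peak is. Default 0.

*scale* - scale parameter, which controls the spread (flatness) of the distribution. It is proportional to, but not equal to, the standard deviation: standard deviation = scale × π / √3. Default 1.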

*size* - The shape of the returned array.

https://www.w3schools.com/python/numpy/numpy_random_logistic.asp
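
One point worth checking in code: for the logistic distribution the scale parameter is not itself the standard deviation. The standard deviation equals scale × π / √3, and the sketch below compares that formula with the standard deviation of a large sample. The values loc = 20 and scale = 5 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng()
scale = 5

draws = rng.logistic(loc=20, scale=scale, size=200_000)

# For the logistic distribution: standard deviation = scale * pi / sqrt(3)
expected_std = scale * np.pi / np.sqrt(3)

print("sample mean :", draws.mean())
print("sample std  :", draws.std())
print("expected std:", expected_std)
```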

In [None]:
# With a mean of 20 and a scale of 5, return 10 random numbers 
x = rng.logistic(loc=20, scale=5, size=10)
x

The outcome shows us that the results are centred around 20, with roughly half of them expected in the 15 to 25 range (5 either side of 20) and the rest further out.

In [None]:
# Using the default of a mean of 0 and a scale of 1, visualise the results of 1000 random numbers 
sns.histplot(rng.logistic(size=1000), kde=True, color="green")
plt.show()

Again we see that the results cluster around the mean, with outliers in the tails beyond this range.

In [None]:
# Plot 3 logistic distributions together to see the impact of loc and scale
m1 = rng.logistic(loc=0, scale=1, size=100)
m2 = rng.logistic(loc=5, scale=2, size=100)
m3 = rng.logistic(loc=10, scale=3, size=100)
sns.kdeplot(x=m1, color="blue", label="Mean 0, Scale 1")
sns.kdeplot(x=m2, color="red", label="Mean 5, Scale 2")
sns.kdeplot(x=m3, color="green", label="Mean 10, Scale 3")
plt.title('Logistic Distribution', size=14)
plt.ylabel('Density')
plt.legend() 
plt.show()

The outcome shows us that where the mean is 0 and the scale is 1, the majority of the results are within a tight range. However, where the mean is 10 and the scale is 3, the results are more spread out. 

#### Pareto Distribution

A distribution following Pareto's law or the 80-20 rule, i.e. an 80-20 distribution (20% of the factors cause 80% of the outcomes).

random.Generator.pareto(a, size=None)

It has two parameters:

*a* - shape parameter.

*size* - The shape of the returned array.

https://www.w3schools.com/python/numpy/numpy_random_pareto.asp

In [None]:
# Draw 2 rows of 3 random numbers from a Pareto distribution with shape a=2
x = rng.pareto(a=2, size=(2, 3))
print(x)

In [None]:
sns.histplot(rng.pareto(a=2, size=10), kde=False, color="Purple")
plt.show()

The outcome shows that most of the results are small values close to 0, while a small share of the results are much larger, giving the distribution its long right-hand tail.
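
To connect the samples with the 80-20 idea, here is a hedged sketch. According to the NumPy documentation, pareto draws from the Pareto II (Lomax) distribution, which starts at 0, and the classical Pareto distribution is obtained by adding 1 and multiplying by a scale m. The value m = 1.0 below is an assumption for illustration. With a = 2 the split will not be exactly 80-20 (that corresponds to a shape of roughly 1.16), and the share varies from run to run.

```python
import numpy as np

rng = np.random.default_rng()
a, m = 2, 1.0   # shape and minimum value (m is an assumption for illustration)

# Classical Pareto samples: shift NumPy's output by 1 and scale by m
samples = (rng.pareto(a, size=10_000) + 1) * m

# Share of the overall total that belongs to the largest 20% of the samples
ordered = np.sort(samples)[::-1]
top_20_share = ordered[:2_000].sum() / ordered.sum()

print("smallest sample:", samples.min())
print("share held by top 20%:", round(top_20_share, 3))
```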

#### Multinomial Distribution

Multinomial distribution is a generalisation of binomial distribution.

It describes the outcomes of scenarios with more than two possible results, unlike the binomial where the outcome must be one of only two, e.g. blood type of a population, dice roll outcome.

It has three parameters:

*n* - number of experiments (e.g. the number of times the dice is rolled).

*pvals* - list of probabilities of the outcomes (e.g. [1/6, 1/6, 1/6, 1/6, 1/6, 1/6] for a dice roll).

*size* - The shape of the returned array.

https://www.w3schools.com/python/numpy/numpy_random_multinomial.asp
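
The size parameter repeats the whole experiment. In the sketch below, each row is one experiment of 60 throws of a fair dice, so every row adds up to 60. The choice of 3 experiments is arbitrary.

```python
import numpy as np

rng = np.random.default_rng()
pvals = [1/6] * 6

# 3 separate experiments, each consisting of 60 throws of a fair dice
experiments = rng.multinomial(n=60, pvals=pvals, size=3)
print(experiments)

# Every row counts the faces for one experiment, so each row adds up to 60
print(experiments.sum(axis=1))
```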

In [None]:
diceroll = rng.multinomial(n=60, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
diceroll
# Shows how many times it landed on 1, 2, 3, 4, 5 & 6 in total for the 60 throws of the dice

In [None]:
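# One bar per face of the dice, showing how many of the 60 throws landed on it
plt.bar(np.arange(1, 7), diceroll)
plt.ylabel("Number of throws")
plt.show()

Each bar shows how many of the 60 throws landed on that face. With a fair dice we expect about 10 throws per face, but the actual counts vary from run to run.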

## 4. Seeds in generating pseudorandom numbers

The purpose of the seed is to allow the user to "lock" the pseudo-random number generator, to allow replicable analysis.

How does the seed work?<br>
The seed sets the starting state of the pseudo-random number generator, so that it generates the same random numbers on multiple executions of the code, on the same machine or on different machines (for a specific seed value). If no seed is provided, np.random.default_rng() pulls fresh, unpredictable entropy from the operating system, so each run starts from a different state.<br>
https://www.geeksforgeeks.org/random-seed-in-python/

If you provide the same input, the algorithm will produce the same output.

https://towardsdatascience.com/stop-using-numpy-random-seed-581a9972805f


In [None]:
# Run the random function 
rng.random(size=(2,4))

In [None]:
# Run the same random function a second time
rng.random(size=(2,4))

Running the random function twice produces 2 completely different sets of results. 
Let's set the seed this time and run the random function twice.

In [None]:
# Setting the seed to 10
rng = np.random.default_rng(10)
rng.random(size=(2,4))

In [None]:
# Run a second time
rng = np.random.default_rng(10)
rng.random(size=(2,4))

Both times the random function runs, it produces the same result. Let's try it with a different seed number.

In [None]:
# Setting the seed to 2021
rng = np.random.default_rng(2021)
rng.random(size=(2,4))

In [None]:
# Run a second time
rng = np.random.default_rng(2021)
rng.random(size=(2,4))

Again the same results are produced when the function is run twice. 
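
Rather than comparing the printed arrays by eye, the same point can be checked in code. A minimal sketch using the two seeds from the cells above:

```python
import numpy as np

# Two generators created with the same seed produce identical numbers
first = np.random.default_rng(2021).random(size=(2, 4))
second = np.random.default_rng(2021).random(size=(2, 4))
print(np.array_equal(first, second))   # True

# A different seed gives a different set of numbers
other = np.random.default_rng(10).random(size=(2, 4))
print(np.array_equal(first, other))    # False
```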

The important thing about using a seed for a pseudo-random number generator is that it makes the code repeatable.

As noted earlier, pseudo-random number generators operate by a deterministic process: if you give a pseudo-random number generator the same input, you'll get the same output.

This can actually be a good thing! There are times when you really want your "random" processes to be repeatable. Code that has well defined, repeatable outputs is good for testing.

Essentially, we use a NumPy random seed when we need to generate pseudo-random numbers in a repeatable way.

A NumPy random seed also makes your code easier to share: creating pseudo-random numbers this way leads to repeatable output, which is good for testing and code sharing.

Seeded pseudo-random numbers are used in:
- Probability and statistics
- Random sampling
- Machine learning: splitting datasets into training and test sets requires random sampling
- Deep learning
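
As an illustration of the training and test split mentioned above, here is a minimal sketch of a repeatable 80/20 split. The data, the seed value and the 80/20 ratio are all made up for the example, and dedicated libraries offer their own helpers for this.

```python
import numpy as np

data = np.arange(100, 120)          # 20 made-up observations

rng = np.random.default_rng(2021)   # fixed seed so the split is repeatable
shuffled_index = rng.permutation(len(data))

# First 80% of the shuffled positions for training, the rest for testing
cut = int(0.8 * len(data))
train = data[shuffled_index[:cut]]
test = data[shuffled_index[cut:]]

print("train:", train)
print("test :", test)
```

Because the seed is fixed, rerunning the cell gives the same split, which makes results easier to compare and share.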

https://www.sharpsightlabs.com/blog/numpy-random-seed/

A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. 
The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG's seed (which may include truly random values). Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.

PRNGs are central in applications such as simulations (e.g. for the Monte Carlo method), electronic games (e.g. for procedural generation), and cryptography. Cryptographic applications require the output not to be predictable from earlier outputs, and more elaborate algorithms, which do not inherit the linearity of simpler PRNGs, are needed.

Good statistical properties are a central requirement for the output of a PRNG. In general, careful mathematical analysis is required to have any confidence that a PRNG generates numbers that are sufficiently close to random to suit the intended use. John von Neumann cautioned about the misinterpretation of a PRNG as a truly random generator, and joked that "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."

https://en.wikipedia.org/wiki/Pseudorandom_number_generator

Algorithmic random number generators are everywhere, used for all kinds of tasks, from simulation to computational creativity.

But despite their widespread use, the odds are that you're using a flawed random number generator.


Mersenne Twister<br>
This RNG is one of the most widely used and highly recommended RNGs. It is provided in C++11 as mt19937 and mt19937_64, and is also the default random number generator for Python. (See the Mersenne Twister Wikipedia page for more details about the points listed below.)

Positive Qualities
- Produces 32-bit or 64-bit numbers (thus usable as source of random bits)
- Passes most statistical tests

Neutral Qualities
- Inordinately huge period of $2^{19937} - 1$
- 623-dimensionally equidistributed
- Period can be partitioned to emulate multiple streams

Negative Qualities
- Fails some statistical tests, with as few as 45,000 numbers.
- Predictable: after 624 outputs, we can completely predict its output.
- Generator state occupies 2504 bytes of RAM; in contrast, an extremely usable generator with a huger-than-anyone-can-ever-use period can fit in 8 bytes of RAM.
- Not particularly fast.
- Not particularly space efficient. The generator uses 20000 bits to store its internal state (20032 bits on 64-bit machines), but has a period of only $2^{19937}$, a factor of $2^{63}$ (or $2^{95}$) fewer than an ideal generator of the same size.
- Uneven in its output; the generator can get into “bad states” that are slow to recover from.
- Seedings that only differ slightly take a long time to diverge from each other; seeding must be done carefully to avoid bad states.
- While jump-ahead is possible, algorithms to do so are slow to compute (i.e., require several seconds) and rarely provided by implementations.

https://www.pcg-random.org/other-rngs.html#id28

PCG 

Positive Qualities
- Fast (if multiplication is fast)
- Uses a small amount of memory (although the extended generators allow it to use an arbitrary amount of additional memory to extend the period)
- Good for producing b-bit numbers for any b (or a stream of random bits)
- Easily passes empirical statistical tests (and offers better statistical performance than any of the above generators)
- pcg32 offers a $2^{64}$ period and $2^{63}$ distinct random streams, but arbitrarily large periods are possible (e.g., the library provides pcg32_k16384 which has a period of $2^{524352}$, which is vastly larger than the computational capacity of the universe)
- Uniform
- Extended generation scheme provides k-dimensional equidistribution for arbitrary k
- Challenging to predict
- Offers jump ahead and distance-between-states

Neutral Qualities
- Can perform party tricks, like creating a generator which will produce exactly 3,141,592,653,589,793,238,463 random numbers, and then suddenly output a Zip file containing Hamlet.

Negative Qualities
- New
- Although it is less trivial to predict than many mainstream generators, that does not mean it should be considered cryptographically secure. Exactly how hard it is to break different members of the PCG family is unknown at this time.


What's Wrong with Your Current RNG<br>
Most random number generators in widespread use today have one of the following problems:

- Not Actually Random: Behaving like a true and unbiased source of randomness seems like a fundamental requirement that any random number generator ought to satisfy, yet many RNGs fail statistical tests for randomness.
- Predictable & Insecure: Many RNGs can be predicted after observing a small amount of their output. If you use random numbers as a way to ensure fairness or unpredictability, that's a problem.
- Mediocre Performance: Many RNGs are either slow or require a relatively large amount of memory.
- Lack Useful Features: Most popular RNGs don't provide useful features like “jump ahead”.

Sure, some RNGs are bad, but I'm using a good one, right?<br>
Unless you're using a very esoteric RNG, odds are that the RNG you're using is flawed in one way or another. If you're using the Mersenne Twister, arc4random, ChaCha20, Unix's drand48, Unix random, Unix rand, XorShift*, RanQ1, or several others there are flaws you might want to know about.


The PCG family combines properties not previously seen together in the same generation scheme:

- It's really easy to use, and yet it's very flexible and offers powerful features (including some that allow you to perform silly party tricks).
- It's very fast, and can occupy very little space.
- It has small code size.
- Its performance in statistical tests is excellent (see the PCG paper for full details).
- It's much less predictable and thus more secure than most generators.
- It's open source software, with a permissive license (the Apache license).

https://www.pcg-random.org/index.html

When the Mersenne Twister made his first appearance in 1997 it was a powerful example of how linear maps
on F2 could be used to generate pseudorandom numbers. In particular, the easiness with which generators
with long periods could be defined gave the Mersenne Twister a large following, in spite of the fact that such
long periods are not a measure of quality, and they require a large amount of memory. Even at the time of
its publication, several defects of the Mersenne Twister were predictable, but they were somewhat obscured
by other interesting properties. Today the Mersenne Twister is the default generator in C compilers, the
Python language, the Maple mathematical computation system, and in many other environments. Nonetheless,
knowledge accumulated in the last 20 years suggests that the Mersenne Twister has, in fact, severe defects, and
should never be used as a general-purpose pseudorandom number generator. Many of these results are folklore,
or are scattered through very specialized literature. This paper surveys these results for the non-specialist,
providing new, simple, understandable examples, and it is intended as a guide for the final user, or for language
implementors, so that they can take an informed decision about whether to use the Mersenne Twister or not.
https://arxiv.org/pdf/1910.06437.pdf

References:

Numpy image : https://icon-icons.com/icon/numpy-logo/168073 <br>
PCG image : https://www.pcg-random.org/index.html

Purpose of the NumPy random package:

[1] https://www.w3schools.com/python/numpy/default.asp <br>
[2] https://www.javatpoint.com/numpy-random <br>
[3] https://www.w3schools.com/python/numpy/numpy_random.asp <br>

Use of Simple random data and Permutations functions: 

[4] https://www.w3schools.com/python/numpy/default.asp <br>
[5] https://www.javatpoint.com/numpy-random <br>
[6] https://www.w3schools.com/python/numpy/numpy_random.asp <br>


Use and purpose of 5 Distributions functions:

[7] https://www.w3schools.com/python/numpy/default.asp <br>
[8] https://www.javatpoint.com/numpy-random <br>
[9] https://www.w3schools.com/python/numpy/numpy_random.asp <br>

Use of seeds in generating pseudorandom numbers: 

[10] https://www.w3schools.com/python/numpy/default.asp <br>
[11] https://www.javatpoint.com/numpy-random <br>
[12] https://www.w3schools.com/python/numpy/numpy_random.asp <br>
