# The NumPy.Random Package - an overview

##### This notebook gives a detailed overview of the Random package, found in the NumPy library, including examples of its use.

## Purpose
The NumPy.Random package is a sub-package of the NumPy library, a library in the Python programming language. The NumPy library provides functions and data structures that are useful for numerical processing for data analytics purposes. A common need in data analytics is generating random samples of data. The NumPy.Random package exists for this purpose. 

In this notebook, the primary uses of the NumPy.Random package are explained. Some functions in the package will be explained in detail, as will some distributions that the package can generate. The generation of random numbers using seeds will also be explained. 

In [1]:
# Import NumPy, Matplotlib and Seaborn for use in the notebook
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

## Functions
What follows is an explanation of some basic functions that can be found in the NumPy.Random package.
### Simple Random Data
Four functions in NumPy.Random that can be used to generate random data are:
- integers
- random
- choice
- bytes

Examples of their use are given below.

#### Integers
Given a single integer argument, this function returns a random integer from 0 up to but not including that value. If two limits are given (low and high), the returned integer lies from low up to but not including high. If a size is specified, an array of random integers with that shape is returned. The results are uniformly distributed.

In [6]:
# The integers function, like a lot of functions, is a method of the Generator object returned by numpy.random.default_rng()
# For ease of use, the Generator is assigned here to the variable rndm.
rndm = np.random.default_rng()

# With just one argument, a random integer from 0 up to but not including the input is returned.
rndm.integers(8)

1

In [None]:
# In this example, the function will output an array with 5 items as specified in the size argument. 
# With low and high arguments specified, integers from 1 to 99 are selected (high is excluded by default).
rndm.integers(low=1, high=100, size=5)

The endpoint argument controls whether the upper limit itself can be chosen; it is excluded by default. The datatype of the output can also be specified.

In [None]:
# Arrays with multiple dimensions can be specified, and the datatype can be changed from the default signed 64 bit integer.
# Here the datatype has been changed to an unsigned 16 bit integer.
# The upper limit is included when endpoint = True.
rndm.integers(low=0, high=2, size=(3, 4), endpoint=True, dtype=np.uint16)
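The effect of the endpoint argument is easiest to see by comparing the range of values drawn with and without it. The following is a minimal sketch; the seed and the sample size are arbitrary choices made only so that the result is repeatable.

```python
import numpy as np

# Arbitrary seed, used only to make this sketch repeatable.
rng = np.random.default_rng(12345)

# Default behaviour: high is excluded, so values fall in 1..5.
exclusive = rng.integers(low=1, high=6, size=1000)

# endpoint=True: high is included, so values fall in 1..6 (like a six-sided die).
inclusive = rng.integers(low=1, high=6, size=1000, endpoint=True)

print(exclusive.min(), exclusive.max())
print(inclusive.min(), inclusive.max())
```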

#### Random
This function works similarly to the integers function, but deals with floats. If no argument is specified, it returns a single float from 0 up to but not including 1.

In [None]:
# No argument specified
rndm.random()

The first argument, size, sets the shape of the array that the results are returned in. The results are uniformly distributed.

In [None]:
# Random floats from 0 up to but not including 1 are returned in a one-dimensional array with 15 items
rndm.random(15)

Unlike with the integers function, upper and lower limits cannot be specified. Because of this, the output is usually multiplied by another number (and shifted if needed) to get random floats over a different range.

In [None]:
# The following returns a 3x4 array of floats from 0 up to but not including 5.
5 * rndm.random((3, 4))
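The same scaling idea extends to any interval by multiplying by the width of the interval and adding the lower limit. The sketch below assumes a hypothetical interval of 2.5 to 7.0 and an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed for repeatability

# Hypothetical interval chosen for illustration.
low, high = 2.5, 7.0

# random() gives floats in [0, 1); scaling by the width and shifting by low
# maps them onto [low, high).
scaled = low + (high - low) * rng.random((3, 4))
print(scaled)
```

The Generator also has a uniform function, `rng.uniform(low, high, size)`, that performs this scaling directly.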

#### Choice
This function takes an array or an integer. If given an array, it returns a random sample of the items in that array. If given an integer n, it samples from the integers 0 up to but not including n, as if the input were np.arange(n).

In [None]:
rndm.choice(12)

In [None]:
rndm.choice([2, 4, 6, 8])

Similar to the integers and random functions, the size of the output can be specified. The probability of each entry in the input array being chosen can be specified using the p argument and passing in an array with the probabilities. Whether or not an item is returned to the pool of possible choices after it is selected is determined by the replace argument, which is True by default. If set to False, each item can only be chosen once.

In [None]:
# With replacement
rndm.choice(5, size=3, p=[0.1, 0.1, 0.2, 0.4, 0.2])

In [None]:
# Without replacement. Each item can appear at most once, so the less probable items show up more often than they would with replacement.
rndm.choice(5, size=3, p=[0.1, 0.1, 0.2, 0.4, 0.2], replace=False)
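One way to check how the p argument behaves is to draw a large sample with replacement and compare the observed frequencies with the probabilities passed in. This is a sketch only; the seed and the sample size of 100,000 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed for repeatability
p = [0.1, 0.1, 0.2, 0.4, 0.2]

# With replacement: observed frequencies should be close to p.
draws = rng.choice(5, size=100_000, p=p)
freq = np.bincount(draws, minlength=5) / draws.size
print(freq)

# Without replacement: the three returned items are always distinct.
sample = rng.choice(5, size=3, p=p, replace=False)
print(sample)
```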

For arrays with more than one dimension, the function selects along the first axis (whole rows) by default. Setting axis=1 selects whole columns instead. The shuffle argument is True by default and shuffles the sample when sampling without replacement. Setting shuffle=False can improve performance.

In [None]:
# axis=1 selects one whole column, for example [0, 1] or [2, 3]
twoDarrcol = [[0, 2, 4, 6, 8],[1, 3, 5, 7, 9]]
rndm.choice(twoDarrcol, axis=1, replace=False, shuffle=False)

In [None]:
# axis=0 selects one whole row
twoDarrrow = [[0, 2, 4, 6, 8],[1, 3, 5, 7, 9]]
rndm.choice(twoDarrrow, axis=0, replace=False, shuffle=False)

#### Bytes
This function takes an integer as an argument and returns a Python bytes object filled with random bytes. The number of bytes is determined by the input argument.

In [None]:
rndm.bytes(15)

The number of bytes can be checked using the len() function.

In [None]:
byt_out = rndm.bytes(15)
print(len(byt_out))
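Because the result is a bytes object rather than text, printing it directly can look garbled. A small sketch of two more readable views follows; the seed is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(99)  # arbitrary seed for repeatability

raw = rng.bytes(15)

# The result is a built-in bytes object.
print(type(raw))

# hex() shows each byte as two hexadecimal digits, so 15 bytes give 30 characters.
print(raw.hex())

# Iterating over bytes gives integers in the range 0..255.
print(list(raw))
```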

### Permutation Functions
In mathematics, a permutation is an arrangement of the members of a set in a particular linear order.
The NumPy.Random package has functions that randomly change the order of items in an array. These are:
* shuffle
* permutation
* permuted

#### Shuffle
This function takes a mutable sequence such as an array and shuffles its contents in place: the original variable is modified and nothing is returned. For multi-dimensional arrays, the axis argument specifies whether whole rows (0) or whole columns (1) are reordered.

In [7]:
# arange() returns an array of the integers from 0 up to but not including the input integer.
it = np.arange(9)
rndm.shuffle(it)
print(it)

[3 4 2 5 8 0 6 7 1]
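The in-place behaviour can be confirmed by capturing what shuffle returns. This is a minimal sketch with an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed for repeatability

items = np.arange(9)
result = rng.shuffle(items)

# shuffle works in place, so it returns None.
print(result)

# The original array now holds the same values in a new order.
print(items)
print(np.sort(items))
```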


In [None]:
# With axis=0 whole rows are reordered; the contents of each row stay in order.
twodarrRows = np.arange(32).reshape((4, 8))
rndm.shuffle(twodarrRows, axis=0)
print(twodarrRows)


In [15]:
twodarr = np.arange(32).reshape((4, 8))
rndm.shuffle(twodarr, axis=1)
print(twodarr)

[[ 5  2  7  1  3  6  0  4]
 [13 10 15  9 11 14  8 12]
 [21 18 23 17 19 22 16 20]
 [29 26 31 25 27 30 24 28]]


#### Permutation
This function returns a random permutation of its input, which can be an integer or an array. The permutation is returned as a new array, and the input itself is left unchanged.

In [8]:
# The function can accept an integer, in this case, 8.
# This will return the integers from 0 up to but not including 8 in a random order
rndm.permutation(8)

array([1, 3, 4, 7, 6, 5, 0, 2])

In [9]:
# The function can also accept a list and return it with the items rearranged
rndm.permutation([23, 7, 1, 83, 31, 6, 301])

array([  7,  23,  83,   6,  31, 301,   1])
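Unlike shuffle, permutation does not modify its input. The sketch below, using an arbitrary seed, prints the original array after the call to show that it is unchanged.

```python
import numpy as np

rng = np.random.default_rng(11)  # arbitrary seed for repeatability

original = np.array([23, 7, 1, 83, 31, 6, 301])
shuffled = rng.permutation(original)

# The input keeps its order; the result is a separate, reordered copy.
print(original)
print(shuffled)
```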

#### Permuted
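This function takes only arrays; unlike permutation, it does not accept an integer. Unlike shuffle, which moves whole rows or columns together, permuted shuffles each slice along the given axis independently, so with axis=0 every column is reordered separately. By default it returns a shuffled copy and leaves the input unchanged. The output array can be specified by the user using the out argument, and passing the input array as out shuffles it in place.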

In [17]:
in_arr = np.arange(32).reshape(4, 8)
out_arr = rndm.permuted(in_arr, axis=0)
print(in_arr, out_arr)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30 31]] [[ 0 17 26 27 28 29 30  7]
 [ 8  1 10 11 12  5  6 15]
 [16  9 18  3  4 21 14 31]
 [24 25  2 19 20 13 22 23]]


In [18]:
rndm.permuted(in_arr, axis=0, out=in_arr)
print(in_arr, out_arr)

[[16 25  2  3 12 13 22 15]
 [24  1 18 11 20 21 30 31]
 [ 0  9 26 27 28  5  6 23]
 [ 8 17 10 19  4 29 14  7]] [[ 0 17 26 27 28 29 30  7]
 [ 8  1 10 11 12  5  6 15]
 [16  9 18  3  4 21 14 31]
 [24 25  2 19 20 13 22 23]]
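The difference between shuffle and permuted along the same axis can be checked directly. The sketch below uses an arbitrary seed and hypothetical variable names: after shuffle, every row is still one of the original rows, while after permuted each column holds its original values in an independent order.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed for repeatability
grid = np.arange(32).reshape(4, 8)

# shuffle moves whole rows together (in place, so work on a copy).
rows_moved = grid.copy()
rng.shuffle(rows_moved, axis=0)

# permuted reorders each column independently and returns a new array.
cols_mixed = rng.permuted(grid, axis=0)

# Every row produced by shuffle is an intact row of the original grid.
original_rows = {tuple(row) for row in grid.tolist()}
rows_intact = all(tuple(row) in original_rows for row in rows_moved.tolist())

# Each column produced by permuted still holds the same values as before;
# the original columns are already sorted, so sorting restores the grid.
columns_preserved = np.array_equal(np.sort(cols_mixed, axis=0), grid)

print(rows_intact, columns_preserved)
```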


## Distributions
A probability distribution is a mathematical function that describes the possible values a variable can take and how frequently each value occurs. For example, a histogram of the heights of members of a population would show that most of the measurements cluster around an average (mean) and become less frequent at taller and shorter heights. The function describing the curve that the measurement frequencies follow is the probability distribution. In this case it would most likely be a "normal" or "Gaussian" distribution.

Probability distributions can be **discrete** (data can be only certain values) or **continuous** (data can be any value within a range). Different mathematical functions describe the different probability distributions that can exist, depending on the circumstances of data selection and properties of the data. The NumPy.Random package has functions that are capable of randomly sampling data according to many different probability distributions. In this notebook, examples of five different probability distribution functions will be demonstrated using NumPy.Random to sample the data and Seaborn to visualise it.
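The cells in this section call the distribution functions through np.random directly. The same distributions are also available as methods of the Generator returned by default_rng(). The sketch below samples the five distributions that way and compares each sample mean with its theoretical value; the seed and the sample size of 100,000 are arbitrary assumptions, and the parameters mirror those used in this section.

```python
import math
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for repeatability
n = 100_000                      # large sample so the means settle down

normal = rng.normal(loc=0, scale=1, size=n)    # theoretical mean 0, std 1
uniform = rng.uniform(-1, 0, size=n)           # theoretical mean -0.5
weibull = rng.weibull(5, size=n)               # theoretical mean gamma(1 + 1/5)
binomial = rng.binomial(36, 1 / 6, size=n)     # theoretical mean 36 * 1/6 = 6
poisson = rng.poisson(2, size=n)               # theoretical mean and variance 2

print(normal.mean(), normal.std())
print(uniform.mean())
print(weibull.mean(), math.gamma(1 + 1 / 5))
print(binomial.mean())
print(poisson.mean(), poisson.var())
```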



### Normal Distribution

In [None]:
# Draw 400 samples from a normal distribution with mean 0 and standard deviation 1
res_arr = np.random.normal(loc=0, scale=1, size=(400,))
plt.hist(res_arr)
sns.displot(res_arr, kind='kde')
plt.show()

### Uniform Distribution

In [None]:
# Draw 1000 samples spread evenly from -1 up to but not including 0
uni_arr = np.random.uniform(-1, 0, 1000)
plt.hist(uni_arr)
sns.displot(uni_arr, kind='kde')
plt.show()

### Weibull Distribution

In [None]:
# Draw 400 samples from a Weibull distribution with shape parameter 5
shape = 5
wei_arr = np.random.weibull(shape, 400)
plt.hist(wei_arr)
sns.displot(wei_arr, kind='kde')
plt.show()

### Binomial Distribution

In [None]:
# Each of the 400 samples counts the successes in 36 trials with success probability 1/6
bi_arr = np.random.binomial(36, 1/6, 400)
plt.hist(bi_arr)
sns.displot(bi_arr, kind='kde')
plt.show()

### Poisson Distribution

In [None]:
# Draw 4000 samples from a Poisson distribution with an expected rate of 2 events per interval
poi_arr = np.random.poisson(2, 4000)
plt.hist(poi_arr)
sns.displot(poi_arr, kind='kde')
plt.show()

## Random Number Generation (RNG)

The ability to generate sequences of random numbers is useful for many applications, including security, statistical sampling, cryptography, gambling and gaming. True random number generation is difficult to achieve without relying on random natural phenomena such as radioactive decay and thermal fluctuations. Without such a source, numbers can instead be generated using what are called **pseudorandom** methods. These require an **algorithm** to generate the sequence of numbers and a **seed** to initialise the process. The result can simulate random occurrences, but if the seed and algorithm are known, the whole sequence can be reproduced, hence it is called "pseudo" random as opposed to "true" random.
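To make the roles of the algorithm and the seed concrete, the sketch below implements a linear congruential generator, one of the simplest pseudorandom algorithms. It is an illustration only and is not the algorithm NumPy uses; the function name `lcg` is hypothetical, and the constants are the ones published in Numerical Recipes.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` pseudorandom floats in [0, 1) from a linear congruential generator."""
    state = seed
    values = []
    for _ in range(count):
        # Each new state depends only on the previous state and fixed constants.
        state = (a * state + c) % m
        values.append(state / m)
    return values

first = lcg(seed=1, count=5)
again = lcg(seed=1, count=5)
other = lcg(seed=2, count=5)

print(first == again)  # same seed and algorithm: identical sequence
print(first == other)  # different seed: different sequence
```

Anyone who knows the constants and the seed can regenerate the full sequence, which is exactly why such output is called pseudorandom.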

### Using Numpy for RNG

Generator and BitGenerator are classes within the NumPy.Random package that are used in combination to generate pseudorandom numbers and sample from statistical distributions. A BitGenerator produces a stream of random bits, which a Generator then transforms into samples from probability distributions like those described above. The default BitGenerator is **PCG64**, which can be seen by inspecting the Generator returned by the function default_rng().
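The sketch below shows how a seed makes the output repeatable and how the BitGenerator behind a Generator can be inspected. The seed value 42 is an arbitrary assumption; any integer works the same way.

```python
import numpy as np

# Two Generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# The BitGenerator in use is exposed through the bit_generator attribute.
print(type(rng_a.bit_generator).__name__)

a = rng_a.integers(0, 100, size=5)
b = rng_b.integers(0, 100, size=5)
print(np.array_equal(a, b))

# The two classes can also be combined explicitly.
rng_c = np.random.Generator(np.random.PCG64(42))
c = rng_c.integers(0, 100, size=5)
print(np.array_equal(a, c))
```

Calling default_rng() with no seed draws fresh entropy from the operating system, so the results differ from run to run.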

## References
* Distributions: https://www.bmc.com/blogs/numpy-statistical-functions/
* Numpy simple random functions: https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.default_rng
* Random number generation: https://techlinkcenter.org/technologies/hardware-based-random-number-generator-leverages-the-variations-in-materials/37920ab6-513f-440f-ac3a-a518016a652f
* BitGenerator and Generator: https://numpy.org/doc/stable/reference/random/index.html