# The numpy.random package

***

This notebook will discuss the numpy.random package in Python.

It will focus on:
* Explaining the overall use of the package.
* Explaining the use of "Simple random data" and "Permutations" functions.
* Explaining the use and purpose of 5 "Distributions" functions.
* Explaining the use of seeds in generating pseudorandom numbers.

***

## Overview of numpy.random

***

1. Overview of NumPy and what it's used for
2. Arrays
3. Overview of numpy.random

From NumPy's documentation:

"NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.

The NumPy library contains multidimensional array and matrix data structures. It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. NumPy can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices." [1]

So what does this mean?

"NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside them. While a Python list can contain different data types within a single list, all of the elements in a NumPy array should be homogeneous. The mathematical operations that are meant to be performed on arrays would be extremely inefficient if the arrays weren’t homogeneous.

Why use NumPy?

NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further." []
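As a small illustration of the homogeneity and memory points quoted above, the sketch below builds an array from mixed numeric values and then sets a data type explicitly. The variable names are made up for this example.

```python
import numpy as np

# A Python list can hold values of different types.
mixed = [1, 2.5, "three"]

# A NumPy array stores a single dtype; ints mixed with floats are upcast.
arr = np.array([1, 2, 3.5])
print(arr.dtype)        # float64

# The dtype can be specified explicitly to control memory use.
small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)   # 1 (bytes per element)
print(small.nbytes)     # 3 (bytes for the whole array)
```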

"An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array dtype.

An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension." []

"You might occasionally hear an array referred to as a “ndarray,” which is shorthand for “N-dimensional array.” An N-dimensional array is simply an array with any number of dimensions. You might also hear 1-D, or one-dimensional array, 2-D, or two-dimensional array, and so on. The NumPy ndarray class is used to represent both matrices and vectors. A vector is an array with a single dimension (there’s no difference between row and column vectors), while a matrix refers to an array with two dimensions. For 3-D or higher dimensional arrays, the term tensor is also commonly used." []
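The indexing, dimension and shape ideas described above can be sketched as follows. The array `grid` is just an illustrative name, not something taken from the NumPy documentation.

```python
import numpy as np

# A 2-D array (a matrix) holding the values 0..11 in 3 rows and 4 columns.
grid = np.arange(12).reshape(3, 4)

print(grid.ndim)    # 2, the number of dimensions
print(grid.shape)   # (3, 4), the size along each dimension

# Indexing by a tuple of nonnegative integers: row 1, column 2.
print(grid[1, 2])   # 6

# Indexing by booleans: keep only the elements greater than 8.
print(grid[grid > 8])   # [ 9 10 11]

# Indexing by a list of integers: select rows 0 and 2.
print(grid[[0, 2]])

# A vector is simply an array with a single dimension.
vector = np.array([1, 2, 3])
print(vector.ndim)  # 1
```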

"There are 6 general mechanisms for creating arrays:

1. Conversion from other Python structures (i.e. lists and tuples)
2. Intrinsic NumPy array creation functions (e.g. arange, ones, zeros, etc.)
3. Replicating, joining, or mutating existing arrays
4. Reading arrays from disk, either from standard or custom formats
5. Creating arrays from raw bytes through the use of strings or buffers
6. Use of special library functions (e.g., random)" []
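A rough sketch of several of these mechanisms is shown below. Reading from disk is left out to keep the example self-contained, and all variable names are illustrative.

```python
import numpy as np

# Conversion from another Python structure (a list).
from_list = np.array([1, 2, 3])

# Intrinsic NumPy array creation functions.
zeros = np.zeros((2, 3))
evens = np.arange(0, 10, 2)

# Joining existing arrays.
joined = np.concatenate([from_list, evens])

# Creating an array from raw bytes.
from_bytes = np.frombuffer(b"\x01\x02\x03", dtype=np.uint8)

# Using a special library function, here numpy.random.
rng = np.random.default_rng()
random_values = rng.random(3)   # three floats in the interval [0, 1)

print(joined)
print(from_bytes)
print(random_values)
```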

So, what is this special library, numpy.random?

"Numpy's random number routines produce pseudo random numbers using combinations of a BitGenerator to create sequences and a Generator to use those sequences to sample from different statistical distributions" [1]. 

The NumPy documentation introduces three new terms here: pseudo random numbers, a BitGenerator and a Generator. Let's look at each of these in turn to understand, in layman's terms, what numpy.random is actually doing.

First of all, what is a pseudo random number?

To explain this, we first need to understand what "random" means in computing.

This is explained very eloquently on W3Schools:

"Random number does NOT mean a different number every time. Random means something that can not be predicted logically. Computers work on programs, and programs are definitive sets of instructions. [Therefore]... it means there must be some algorithm to generate a random number as well. If there is a program to generate random number[s] it can be predicted, thus it is not truly random. Random numbers generated through a generation algorithm are called pseudo random" [2].
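To make the idea of an algorithm producing "random" numbers concrete, here is a toy linear congruential generator. This is only an illustration: `lcg` is a hypothetical helper written for this notebook, its constants are a commonly cited textbook choice, and it is not the algorithm NumPy uses.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Toy linear congruential generator, for illustration only."""
    values = []
    state = seed
    for _ in range(count):
        # Each new value is computed deterministically from the previous one.
        state = (a * state + c) % m
        values.append(state)
    return values


first = lcg(seed=42, count=5)
second = lcg(seed=42, count=5)
other = lcg(seed=43, count=5)

print(first)
print(first == second)   # True: the same starting point gives the same sequence
print(first == other)    # False: a different starting point gives a different one
```

The numbers look unpredictable, but anyone who knows the algorithm and the starting value can reproduce them exactly, which is what makes them pseudo random.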

Now that we understand what a pseudo random number is, what are the BitGenerator and Generator that the documentation says we need in order to produce them?

"BitGenerators: Objects that generate random numbers. These are typically unsigned integer words filled with sequences of either 32 or 64 random bits.

Generators: Objects that transform sequences of random bits from a BitGenerator into sequences of numbers that follow a specific probability distribution (such as uniform, Normal or Binomial) within a specified interval.

Since Numpy version 1.17.0 the Generator can be initialized with a number of different BitGenerators. It exposes many different probability distributions." []
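The relationship between the two objects can be sketched as below, assuming NumPy 1.17 or later. `PCG64` and `MT19937` are two of the BitGenerators that ship with NumPy, and the seed value 12345 is arbitrary.

```python
from numpy.random import Generator, PCG64, MT19937, default_rng

# A Generator wraps a BitGenerator; the BitGenerator supplies the random bits.
rng_pcg = Generator(PCG64(12345))
rng_mt = Generator(MT19937(12345))

# The Generator turns those bits into samples from a chosen distribution.
uniform_sample = rng_pcg.random(3)                        # uniform on [0, 1)
normal_sample = rng_mt.normal(loc=0.0, scale=1.0, size=3)
binomial_sample = rng_pcg.binomial(n=10, p=0.5, size=3)

print(uniform_sample)
print(normal_sample)
print(binomial_sample)

# default_rng() is a convenience constructor that picks a BitGenerator for you.
rng = default_rng()
print(type(rng.bit_generator).__name__)
```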


# Functions in numpy.random

***

## Simple random data

URL: https://numpy.org/doc/stable/reference/random/legacy.html#simple-random-data

***

* rand
* randn
* randint
* random_integers
* random_sample
* choice
* bytes
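A brief sketch of most of these legacy functions follows. `random_integers` is left out because the NumPy documentation marks it as deprecated in favour of `randint`. The variable names and argument values are illustrative.

```python
import numpy as np

uniform_grid = np.random.rand(2, 3)        # uniform floats in [0, 1), shape (2, 3)
normal_values = np.random.randn(3)         # samples from the standard normal
dice = np.random.randint(1, 7, size=5)     # integers from 1 to 6 (upper bound excluded)
samples = np.random.random_sample(2)       # uniform floats in [0, 1)
picks = np.random.choice(["a", "b", "c"], size=4)   # sampling with replacement
raw = np.random.bytes(4)                   # 4 random bytes

print(uniform_grid)
print(normal_values)
print(dice)
print(samples)
print(picks)
print(raw)
```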

## Permutations

URL: https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html

https://numpy.org/doc/stable/reference/random/index.html#random-quick-start

https://www.w3schools.com/python/numpy/numpy_random_permutation.asp

***

* shuffle
* permutation

The permutation function is used to "randomly permute a sequence, or return a permuted range. If [it] is a multi-dimensional array, it is only shuffled along its first index." []

For example:

https://numpy.org/doc/stable/reference/random/legacy.html#permutations

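```python
from numpy.random import default_rng

# Create a Generator and permute a list of ten values.
rng = default_rng()
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(rng.permutation(values))
```

Sample output (it will differ from run to run):

```
[ 2  3  8 10  5  9  7  1  6  4]
```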
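The difference between `shuffle` and `permutation`, and the first-index behaviour for multi-dimensional arrays, can be sketched like this. The variable names are illustrative.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng()

values = np.arange(1, 11)
original = values.copy()

# permutation returns a shuffled copy and leaves its input untouched.
permuted = rng.permutation(values)
print(np.array_equal(values, original))   # True

# shuffle reorders the array in place and returns None.
result = rng.shuffle(values)
print(result)                             # None
print(values)

# For a 2-D array only the rows are reordered; each row stays intact.
table = np.arange(9).reshape(3, 3)
permuted_table = rng.permutation(table)
print(permuted_table)

# Passing an integer n permutes the range 0..n-1.
permuted_range = rng.permutation(5)
print(permuted_range)
```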


***

## Distributions

***

numpy.random provides many functions for drawing random samples from probability distributions. This notebook is going to look at 5 of them in depth.

https://numpy.org/doc/stable/reference/random/legacy.html#distributions

https://numpy.org/doc/stable/reference/random/legacy.html#functions-in-numpy-random
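As a first taste of what these functions do, the sketch below draws samples from five distributions using a Generator. The choice of these five, and all parameter values, are assumptions made for illustration; they are not necessarily the five covered in the rest of the notebook.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng()
size = 1000

# Each call returns `size` samples drawn from the named distribution.
draws = {
    "uniform": rng.uniform(low=0.0, high=10.0, size=size),
    "normal": rng.normal(loc=0.0, scale=1.0, size=size),
    "binomial": rng.binomial(n=10, p=0.5, size=size),
    "poisson": rng.poisson(lam=3.0, size=size),
    "exponential": rng.exponential(scale=2.0, size=size),
}

# Summarise each sample; the exact figures change from run to run.
for name, sample in draws.items():
    print(f"{name:12s} mean={sample.mean():.3f} std={sample.std():.3f}")
```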

# Seeds in numpy.random

***

https://numpy.org/doc/stable/reference/random/legacy.html#seeding-and-state

* get_state
* set_state
* seed

https://numpy.org/doc/stable/reference/random/generator.html#random-generator
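The sketch below shows how seeding makes pseudorandom output reproducible, first with the legacy functions listed above and then with a Generator. The seed value 42 is arbitrary.

```python
import numpy as np
from numpy.random import default_rng

# Legacy interface: re-seeding restarts the same sequence.
np.random.seed(42)
a = np.random.rand(3)
np.random.seed(42)
b = np.random.rand(3)
print(np.array_equal(a, b))   # True

# get_state and set_state save and restore the generator's internal state.
state = np.random.get_state()
c = np.random.rand(3)
np.random.set_state(state)
d = np.random.rand(3)
print(np.array_equal(c, d))   # True

# Generator interface: pass the seed to default_rng instead.
e = default_rng(42).random(3)
f = default_rng(42).random(3)
print(np.array_equal(e, f))   # True
```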

# References

***

[1] https://numpy.org/doc/stable/reference/random/index.html

[2] https://www.w3schools.com/python/numpy/numpy_random.asp

### End