# Random sampling
Click [here](https://datahub.berkeley.edu/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fberkeley-physics%2FPython-Tutorials&urlpath=tree%2FPython-Tutorials%2F3+-+Specific+topics%2FRandom+sampling.ipynb&branch=master) to open this notebook in the DataHub.

## Learning objectives
By the end of this tutorial, you will be able to:
- Sample from a uniform random distribution
- Sample from a discrete distribution
- Sample from a normal distribution
- Sample from a multivariate Gaussian distribution with a given covariance matrix

## Relevant documentation
- [Numpy `random.Generator`](https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator)

## Technical note
This tutorial uses the `Generator` interface (`numpy.random.default_rng`), which was added in NumPy 1.17. If the environment you are running this on has an older NumPy version, you will get the error `AttributeError: module 'numpy.random' has no attribute 'default_rng'` in the first code cell of the "Uniform distributions" section. If you get such an error, run the cell below, which upgrades NumPy to the newest version, and then restart the kernel (under the "Kernel" menu option above).

In [None]:
!pip install --upgrade numpy

## Uniform distributions
NumPy contains a "random" number generator, which uses a complex algorithm (you can read about it in the docs) to deterministically generate a pseudo-random sequence of independent uniformly-distributed bits, based on a given _seed_. This allows you to obtain seemingly random numbers that can be later recreated.

To obtain an object representing an RNG, call `numpy.random.default_rng`, passing your seed. If you don't set the seed explicitly, NumPy pulls fresh, unpredictable entropy from the operating system instead, so each run produces a different sequence.

In [1]:
import numpy as np

rng = np.random.default_rng(56735) #use your own seed (or remove it)
type(rng)

numpy.random._generator.Generator
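
To see what "can be later recreated" means in practice, here is a minimal sketch. The names `rng_a`, `rng_b` and `rng_c` are chosen for illustration, and the seed is simply the example seed from the cell above.

```python
import numpy as np

# Two generators built from the same seed
rng_a = np.random.default_rng(56735)
rng_b = np.random.default_rng(56735)

draws_a = rng_a.random(3)
draws_b = rng_b.random(3)

# Identical seeds give identical sequences
print(np.array_equal(draws_a, draws_b))  # True

# Without a seed, the generator is initialised from OS entropy,
# so these numbers will differ from run to run
rng_c = np.random.default_rng()
print(rng_c.random(3))
```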

You can obtain random floats between 0 and 1 by calling the `random` method on the RNG object.

In [None]:
rng.random(5)

Fill in the following function to scale and shift an array `randoms` of random floats between 0 and 1 so that they follow a uniform distribution between any two given endpoints $a<b$:

In [None]:
def uniform_distribution(randoms, a, b):
    return 

You can use the `hist` function in matplotlib to plot a histogram.

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

plt.hist(uniform_distribution(rng.random(2000),-3,5))
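
If you get stuck, one possible way to fill in `uniform_distribution` is sketched below. It is only one of several valid solutions: it stretches the unit interval to width $b-a$ and then shifts it to start at $a$.

```python
import numpy as np

def uniform_distribution(randoms, a, b):
    # Scale [0, 1) to width (b - a), then shift so the interval starts at a
    return a + (b - a) * np.asarray(randoms)

rng = np.random.default_rng(56735)
samples = uniform_distribution(rng.random(2000), -3, 5)

# All samples should lie between the two endpoints
print(samples.min(), samples.max())
```

NumPy also provides this directly as `rng.uniform(a, b, size)`, which is usually the more convenient choice outside of an exercise.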

## Discrete distributions
To sample from a uniform distribution of integers, use the `integers` method:

In [None]:
rng.integers(10,size=20)

You can also map this to equiprobable choices from a given array:

In [None]:
rng.choice(["red","blue","green"],30)

There are several other methods that might be useful: see the docs for a list of these. 

What if we wanted to sample from a non-uniform distribution? For example, suppose we wanted to get 0 with 90% probability and 1 with 10% probability. We can re-use the `random` method (which returns floats between 0 and 1), as shown below.

In [None]:
bools = rng.random(50) > 0.9
ints = bools.astype(int)
ints
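
As a quick sanity check of this thresholding trick, the sketch below repeats it with a much larger sample (the sample size of 100,000 is an arbitrary choice) and measures the fraction of ones, which should come out close to 0.1.

```python
import numpy as np

rng = np.random.default_rng(56735)

# rng.random() > 0.9 is True for roughly 10% of the draws
ints = (rng.random(100_000) > 0.9).astype(int)

# The mean of a 0/1 array is the fraction of ones
print(ints.mean())
```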

Using a similar technique, fill in the following function that, given an array of random floats (between 0 and 1) and an array of probabilities $p_0,p_1,p_2,...$ (summing to 1), returns an array of random integers in which the integer $i$ appears with probability $p_i$.

In [None]:
def discrete_distribution(randoms, probs):
    x = np.zeros(randoms.size, dtype=int)
    
    return x

discrete_distribution(rng.random(50),[0.1,0.6,0.3])
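
One possible solution, offered as a sketch rather than the intended answer, splits the interval from 0 to 1 into consecutive slices of widths $p_0,p_1,p_2,...$ and finds which slice each random float falls into. It uses `np.cumsum` and `np.searchsorted` instead of the pre-allocated array `x` from the skeleton above.

```python
import numpy as np

def discrete_distribution(randoms, probs):
    randoms = np.asarray(randoms)
    # Upper edge of each outcome's slice of [0, 1): p0, p0+p1, ...
    edges = np.cumsum(probs)
    # For each float, count how many edges are less than or equal to it
    x = np.searchsorted(edges, randoms, side="right")
    # Guard against the last edge rounding to slightly below 1
    return np.minimum(x, len(probs) - 1)

rng = np.random.default_rng(56735)
probs = [0.1, 0.6, 0.3]
samples = discrete_distribution(rng.random(100_000), probs)

# Empirical frequencies should be close to probs
freqs = np.bincount(samples, minlength=len(probs)) / samples.size
print(freqs)
```

For everyday use, `rng.choice(len(probs), size=n, p=probs)` draws from the same distribution without any manual bookkeeping.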

## Gaussian distributions
Typically, the strategy for sampling from a non-uniform continuous distribution is to transform uniform deviates in such a way that their probability distribution becomes the desired one. For several commonly-used distributions, these transformations have been implemented by NumPy (see the docs). To sample from the standard normal distribution (Gaussian centred at 0 with unit variance), use the `standard_normal` method.

In [None]:
randoms = rng.standard_normal(10000)
plt.hist(randoms, bins=50)
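
To illustrate the general idea of transforming uniform deviates, the sketch below uses inverse transform sampling for an exponential distribution. This is a textbook example chosen for illustration, not necessarily how NumPy implements its own samplers, and the value of `rate` is a hypothetical parameter. If $u$ is uniform on $[0,1)$ and $F(x)=1-e^{-\lambda x}$ is the target cumulative distribution, then $x=F^{-1}(u)=-\ln(1-u)/\lambda$ follows the target distribution.

```python
import numpy as np

rng = np.random.default_rng(56735)
rate = 2.0  # hypothetical rate parameter (lambda), chosen for illustration

u = rng.random(100_000)
# Apply the inverse CDF; log1p(-u) computes ln(1 - u) accurately for small u
samples = -np.log1p(-u) / rate

# The mean of an exponential distribution is 1 / rate
print(samples.mean())
```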

You can change the mean and standard deviation by scaling and shifting the random numbers you obtain, or by using the `normal` method.

In [None]:
randoms = rng.normal(100,20,10000)
plt.hist(randoms, bins=50)
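
The scale-and-shift alternative can be sketched as follows, assuming the same mean of 100 and standard deviation of 20 as in the cell above. If $z$ is standard normal, then $\mu+\sigma z$ is normal with mean $\mu$ and standard deviation $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(56735)
mu, sigma = 100, 20

# Multiply by sigma to set the spread, then add mu to set the centre
scaled = mu + sigma * rng.standard_normal(10000)

# Sample mean and standard deviation should be close to mu and sigma
print(scaled.mean(), scaled.std())
```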

To sample from a multivariate Gaussian distribution with a given mean vector and covariance matrix, use the `multivariate_normal` method.

In [None]:
mean = [1,-2]
cov = [[1,-0.3],[-0.3,0.5]]
randoms = rng.multivariate_normal(mean, cov, 20000)
plt.hist2d(randoms[:,0], randoms[:,1], bins=50)
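
As a check on the result, the following sketch estimates the mean and covariance from the samples and compares them with the inputs. Each row of the returned array is one two-dimensional sample, hence `axis=0` and `rowvar=False`.

```python
import numpy as np

rng = np.random.default_rng(56735)
mean = [1, -2]
cov = [[1, -0.3], [-0.3, 0.5]]
samples = rng.multivariate_normal(mean, cov, 20000)

print(samples.shape)  # (20000, 2)

# Average over samples (rows) to estimate the mean vector
est_mean = samples.mean(axis=0)
# rowvar=False tells np.cov that columns are variables and rows are observations
est_cov = np.cov(samples, rowvar=False)

print(est_mean)
print(est_cov)
```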