# Welcome to the statistics and uncertainty workshop at readr

Since we'll be using them all day, let's import NumPy and Matplotlib.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Random variables

## A Monte-Carlo method for simulating random variables

To start working with random variables, we need to build some computational models of them. Let's start by writing a function that generates samples from a given probability density function (PDF).

We do this using rejection sampling: generate random points uniformly in a box that encloses the probability density function, and reject any that lie above the curve. The x-coordinates of the accepted points are then distributed according to the PDF.
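
A short argument for why the accepted points follow the PDF. Write p(x) for the (possibly unnormalised) PDF and assume the box is [a, b) × [0, h) with 0 <= p(x) <= h on [a, b). A proposal's x is uniform with density 1/(b - a), and the point is accepted with probability p(x)/h, so by Bayes' rule:

```latex
f_{\text{accepted}}(x)
= \frac{\frac{1}{b-a}\cdot\frac{p(x)}{h}}{\int_a^b \frac{1}{b-a}\cdot\frac{p(t)}{h}\,dt}
= \frac{p(x)}{\int_a^b p(t)\,dt}.
```

The constants 1/(b - a) and 1/h cancel. This is why the PDF does not need to be normalised for rejection sampling.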

Begin by writing a function that generates a random sample somewhere in the space (xmin <= x < xmax, ymin <= y < ymax).

*Hint: look up the documentation for `numpy.random.random`.*

In [None]:
def uniform_sample(xmin, xmax, ymin, ymax):
    # Scale and shift a point from [0, 1) x [0, 1) to [xmin, xmax) x [ymin, ymax)
    return np.random.random(2)*np.array([xmax-xmin, ymax-ymin]) + np.array([xmin, ymin])

Let's test it on the region x in [-10, 10), y in [0, 5).

In [None]:
nsamples = 1000
samples = np.zeros([nsamples,2])
for i in range(nsamples):
    samples[i]=uniform_sample(-10, 10, 0, 5)

plt.scatter(samples[:,0],samples[:,1])
plt.show()
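
As an aside, the loop above calls `uniform_sample` once per point, but NumPy can draw all the points in one call. The sketch below uses NumPy's `Generator` API (`np.random.default_rng`), which the rest of this notebook does not use. The function name `uniform_samples` is hypothetical.

```python
import numpy as np

def uniform_samples(n, xmin, xmax, ymin, ymax, rng=None):
    """Draw n points uniformly from [xmin, xmax) x [ymin, ymax) in one call.

    Hypothetical vectorised counterpart of uniform_sample; returns an (n, 2) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    # low/high broadcast against size=(n, 2): column 0 is x, column 1 is y
    return rng.uniform(low=[xmin, ymin], high=[xmax, ymax], size=(n, 2))

points = uniform_samples(1000, -10, 10, 0, 5, rng=np.random.default_rng(0))
print(points.shape)  # (1000, 2)
```

The resulting array can be passed to `plt.scatter(points[:, 0], points[:, 1])` exactly like `samples` above.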

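Now let's write a function that draws one random sample from a given probability density function. It keeps proposing points until one lands under the curve, then returns that point's x-coordinate.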

In [None]:
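def sample_randv(pdf, xmin, xmax, ymin, ymax):
    # Use ymin = 0 and make sure pdf(x) <= ymax on [xmin, xmax), or the samples will be biased
    while True:
        # Propose a point uniformly in the box
        sample = uniform_sample(xmin, xmax, ymin, ymax)
        # Accept it if it lies under the curve
        if pdf(sample[0]) > sample[1]:
            break

    return sample[0]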
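
`sample_randv` proposes one point at a time. The batch version sketched below proposes many points per round and keeps the ones under the curve. It assumes that `pdf` accepts NumPy arrays. That holds for `norm_pdf`, but not for the `if`-based functions later in the notebook. `sample_randv_batch` is a hypothetical name, not part of the workshop code.

```python
import numpy as np

def sample_randv_batch(pdf, n, xmin, xmax, ymax, rng=None):
    """Draw n samples by rejection sampling from an unnormalised, vectorised pdf.

    Hypothetical batch counterpart of sample_randv. Uses the box
    [xmin, xmax) x [0, ymax) and assumes 0 <= pdf(x) <= ymax there.
    """
    rng = np.random.default_rng() if rng is None else rng
    batches = []
    n_accepted = 0
    while n_accepted < n:
        # Propose n points uniformly in the box
        x = rng.uniform(xmin, xmax, size=n)
        y = rng.uniform(0, ymax, size=n)
        # Keep the x-coordinates of the points that fall under the curve
        kept = x[pdf(x) > y]
        batches.append(kept)
        n_accepted += kept.size
    # Concatenate the accepted values and trim to exactly n
    return np.concatenate(batches)[:n]

gauss = lambda x: np.exp(-x**2 / 2)  # vectorised, unnormalised, peak height 1
xs = sample_randv_batch(gauss, 10_000, -10, 10, 1, rng=np.random.default_rng(1))
print(xs.mean(), xs.var())  # close to 0 and 1
```

Proposing n points per round is a simple choice. With an acceptance rate p, each round yields about n·p samples, so roughly 1/p rounds are needed.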

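Next, write a normal probability density function centered at zero with a variance of 1. Rejection sampling doesn't need a normalised PDF, so you can drop the 1/sqrt(2*pi) factor and keep the peak height at 1, which fits the sampling box used below: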

In [None]:
norm_pdf = lambda x: np.exp(-x**2/2)

Let's test it!

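*Note: the cells below sample from the box [-10, 10) × [0, 1). Rejection sampling is most efficient when the PDF fills this box. It is only correct if the PDF stays inside the box, so keep the peak at or below 1 and make the tails negligible outside [-10, 10).*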
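
The box also sets the sampler's efficiency: the fraction of proposals accepted equals the area under the PDF divided by the area of the box. For the unnormalised Gaussian exp(-x²/2), the area is sqrt(2π) ≈ 2.51 and the box has area 20 × 1. We should therefore expect about 12.5% of proposals to be accepted. The quick check below assumes NumPy's `default_rng` generator.

```python
import numpy as np

rng = np.random.default_rng(2)
n_proposals = 200_000
# Propose points uniformly in the box [-10, 10) x [0, 1)
x = rng.uniform(-10, 10, size=n_proposals)
y = rng.uniform(0, 1, size=n_proposals)
# A proposal is accepted when it lies under the unnormalised Gaussian
accepted = np.exp(-x**2 / 2) > y

empirical_rate = accepted.mean()
predicted_rate = np.sqrt(2 * np.pi) / (20 * 1)  # area under pdf / box area
print(empirical_rate, predicted_rate)  # both close to 0.125
```

At this rate, each call to `sample_randv` needs about 8 proposals on average. A tighter box would waste fewer of them.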

In [None]:
NormSamples = []
for i in range(10000):
    NormSamples.append(sample_randv(norm_pdf, -10, 10, 0, 1))

print('Mean: {}'.format(np.mean(NormSamples)))
print('Variance: {}'.format(np.var(NormSamples)))

plt.hist(NormSamples, bins='auto')
plt.show()

## Functions of random variables

We can apply functions to random variables just as we can to ordinary variables. Transforming a random variable <b>X</b> through a function <b>f</b> produces a new random variable <b>Y = f(X)</b>, which in general has a different probability density function.

Try applying the function <b>f(X) = X^2</b> to the normal samples and compare the result with the PDF we 'derived' (found on Wikipedia) in the slides. You may need to adjust the vertical scaling of the derived function to make it overlap the histogram.
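
For reference, here is the standard change-of-variables argument for Y = X^2 with X ~ N(0, 1). It uses the normalised PDF f_X(x) = e^{-x^2/2}/sqrt(2π) and the symmetry f_X(-x) = f_X(x). The slides' own derivation may be presented differently.

```latex
\begin{aligned}
F_Y(y) &= P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \qquad y > 0,\\
f_Y(y) &= \frac{dF_Y}{dy} = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}} = \frac{f_X(\sqrt{y})}{\sqrt{y}} = \frac{e^{-y/2}}{\sqrt{2\pi y}}.
\end{aligned}
```

This is the chi-squared distribution with one degree of freedom. Its mean is 1 and its variance is 2, so the printed sample statistics should land close to those values.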

In [None]:
YSamples = np.array(NormSamples)**2

fYDerived = lambda x: np.exp(-x/2)/np.sqrt(2*np.pi*x)
    
print('Mean: {}'.format(np.mean(YSamples)))
print('Variance: {}'.format(np.var(YSamples)))

nbins = 100

xrange = np.arange(0.1,10,0.01)
plt.plot(xrange, 14*nbins*fYDerived(xrange))  # 14*nbins: hand-tuned vertical scale to match the histogram counts

plt.hist(YSamples, bins=nbins)
plt.show()
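
The scale factor does not have to be tuned by hand. In a histogram of counts, the expected count in a bin of width Δ centred near y is approximately N·Δ·f_Y(y). The sketch below computes that factor from the bin edges that `plt.hist` returns. To keep the cell self-contained, it draws the normal samples with NumPy's `default_rng` instead of `sample_randv`.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
y_samples = rng.standard_normal(10_000) ** 2   # Y = X^2 with X ~ N(0, 1)

f_y = lambda y: np.exp(-y / 2) / np.sqrt(2 * np.pi * y)

counts, edges, _ = plt.hist(y_samples, bins=100)
bin_width = edges[1] - edges[0]                # bins=100 gives equal-width bins
scale = len(y_samples) * bin_width             # expected count ≈ N * width * pdf

y_grid = np.linspace(0.1, 10, 500)
plt.plot(y_grid, scale * f_y(y_grid))
plt.show()
```

Alternatively, `plt.hist(..., density=True)` normalises the histogram itself, so the PDF can be plotted with no scale factor at all.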

## Linear functions of random variables

For linear functions, things are much simpler. A linear function of a normal random variable is still normal, so by rescaling and shifting our samples we can get a normal distribution with any mean and variance. Apply the function <b>f = m*X + b</b> to <b>X</b> with m = 0.2 and b = 3.
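
Because expectation is linear, E[mX + b] = m·E[X] + b and Var(mX + b) = m²·Var(X). For a standard normal X with m = 0.2 and b = 3, this predicts a mean of 3 and a variance of 0.04. The sketch below checks the prediction using NumPy's own normal generator (rather than `sample_randv`), so it runs on its own.

```python
import numpy as np

m, b = 0.2, 3.0
rng = np.random.default_rng(4)
x = rng.standard_normal(100_000)   # X ~ N(0, 1)
y = m * x + b                      # Y = mX + b

predicted_mean = m * 0.0 + b       # m * E[X] + b
predicted_var = m**2 * 1.0         # m^2 * Var(X)
print(y.mean(), predicted_mean)    # sample mean close to 3
print(y.var(), predicted_var)      # sample variance close to 0.04
```

The shift b moves the mean but leaves the variance unchanged. Only the scale m affects the spread.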

In [None]:
YSamples = 0.2*np.array(NormSamples)+3
    
print('Mean: {}'.format(np.mean(YSamples)))
print('Variance: {}'.format(np.var(YSamples)))

plt.hist(YSamples, bins='auto')
plt.show()

# The central limit theorem

Write a handful of different probability density functions, each with a mean of roughly zero and a variance of order 1. Don't worry about normalisation, but keep their values between 0 and 1 (the height of the sampling box) and make them vanish outside [-10, 10).
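
Before sampling, you can check a candidate's mean and variance numerically: weight a fine grid of x-values by the PDF and take the first two moments. The sketch below uses `triangle`, which is written with `np.clip` as an assumed, vectorised equivalent of the triangle function `f3` defined next. The helper name `pdf_moments` is hypothetical. For this triangle, the exact mean is 0 and the exact variance is 1/6.

```python
import numpy as np

def pdf_moments(pdf, xmin=-10.0, xmax=10.0, n=200_001):
    """Mean and variance of an unnormalised, vectorised pdf on [xmin, xmax]."""
    x = np.linspace(xmin, xmax, n)
    w = pdf(x)
    # On a uniform grid the spacing dx cancels between numerator and denominator
    mean = np.sum(x * w) / np.sum(w)
    var = np.sum((x - mean) ** 2 * w) / np.sum(w)
    return mean, var

triangle = lambda x: np.clip(1 - np.abs(x), 0, None)  # same shape as f3
print(pdf_moments(triangle))  # mean ≈ 0, variance ≈ 1/6
```

Dividing by `np.sum(w)` is what makes normalisation unnecessary, mirroring the way rejection sampling ignores overall scale.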

In [None]:
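def f1(x):
    # Unnormalised standard normal
    return norm_pdf(x)

def f2(x):
    # Uniform on [-1, 1]
    if x>1 or x<-1:
        return 0
    else:
        return 1

def f3(x):
    # Triangle peaked at 0, zero outside [-1, 1]
    y = 1-np.abs(x)
    if y<0:
        return 0
    else:
        return y

def f4(x):
    # V-shape |x| on [-1, 1], zero elsewhere
    y = np.abs(x)
    if y>1:
        return 0
    else:
        return y

def f5(x):
    # Cosine bump: cos(x) on [-pi/2, pi/2], zero elsewhere
    y = np.cos(x)
    if y<0 or x>np.pi or x<-np.pi:
        return 0
    else:
        return y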

Let's test them! The cell below samples `f4`; swap in any of the other functions to check them.

In [None]:
samples = []
for i in range(10000):
    samples.append(sample_randv(f4, -10, 10, 0, 1))

print('Mean: {}'.format(np.mean(samples)))
print('Variance: {}'.format(np.var(samples)))

plt.hist(samples, bins='auto')
plt.show()

What happens if a random variable is a linear combination of different random variables? Let's see!

We'll build 20 random variables. Each one draws samples from a randomly chosen PDF above and multiplies every sample by a random weight between 0 and 2. Taking the element-wise mean across these 20 sample sets (one value per draw) gives samples of a new random variable: the average of the 20, which is a linear combination of them.
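
How wide should the averaged distribution be? Assume the 20 variables are independent. Then the mean Z = (1/n) Σ w_i X_i has variance Var(Z) = (1/n²) Σ w_i² Var(X_i), which shrinks roughly like 1/n. The self-contained sketch below checks this prediction. Instead of the random mix of PDFs, it uses a uniform distribution on [-1, 1) for every X_i; this is the shape of `f2`, with variance 1/3.

```python
import numpy as np

rng = np.random.default_rng(5)
n_vars, n_samples = 20, 20_000

weights = 2 * rng.random(n_vars)                  # w_i in [0, 2), like fWeight in the cell below
x = rng.uniform(-1, 1, size=(n_vars, n_samples))  # X_i ~ Uniform[-1, 1), Var(X_i) = 1/3
z = (weights[:, None] * x).mean(axis=0)           # Z = (1/n) * sum_i w_i X_i, one value per draw

predicted_var = np.sum(weights**2 / 3) / n_vars**2
print(z.var(), predicted_var)                     # should agree to within a few percent
```

The same formula applies to the mixed case if each Var(X_i) is replaced by the variance of the PDF that variable actually used.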



In [None]:
funcs = [f1,f2,f3,f4,f5]

NRandomFunctions = 20
NSamples = 2000

samples = []
for i in range(NRandomFunctions):
    fsamples = []
    fWeight = 2*np.random.random()
    iFunction = np.random.randint(len(funcs))
    for j in range(NSamples):
        fsamples.append(fWeight*sample_randv(funcs[iFunction], -10, 10, 0, 1))
    samples.append(fsamples)
    
sampleArray = np.array(samples)
samples = sampleArray.mean(axis=0)

In [None]:
print('Mean: {}'.format(np.mean(samples)))
print('Variance: {}'.format(np.var(samples)))

xrange = np.arange(-0.5,0.5,0.01)

plt.hist(samples, bins='auto')
# Gaussian with the samples' variance; 160 is a hand-tuned height to match the histogram counts
plt.plot(xrange, 160*np.exp(-xrange**2/(2*np.var(samples))))
plt.show()
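
A rough quantitative check of how Gaussian the averaged samples are, beyond eyeballing the overlay, is to compare the fraction of samples within ±1σ and ±2σ of the mean against the normal values of about 68.3% and 95.4%. The helper name `coverage` is hypothetical. It works on any 1-D sample array, such as `samples` above. The demo below uses a stand-in: an average of 20 uniform variables.

```python
import numpy as np

def coverage(samples, k):
    """Fraction of samples within k standard deviations of the sample mean."""
    samples = np.asarray(samples)
    return np.mean(np.abs(samples - samples.mean()) < k * samples.std())

rng = np.random.default_rng(6)
demo = rng.uniform(-1, 1, size=(20, 20_000)).mean(axis=0)  # stand-in for `samples`
for k, normal_value in [(1, 0.683), (2, 0.954)]:
    print(k, coverage(demo, k), normal_value)
```

A single uniform variable gives only about 57.7% within ±1σ. Averaging 20 of them moves this close to the normal value, which is the central limit theorem at work.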