# Randomness and Reproducibility

> In this notebook, we cover randomness and its uses in Python, and use seeds to make an analysis reproducible. We also demonstrate how to generate random variables from a probability distribution and how to draw random samples from a population.

- toc: true 
- badges: true
- comments: true
- author: Chanseok Kang
- categories: [Python, Coursera, Visualization]
- image: 

In [2]:
import random
import numpy as np

## What is Randomness?

At the beginning of this week's lectures, we touched on why randomness matters when performing statistical inference from samples of a population. If units are selected completely at random, our estimates of means, proportions, and totals are unbiased: on average, they equal the population values.

In Python, we refer to randomness as the ability to generate data, strings, or, more generally, numbers at random.

However, when conducting an analysis, it is also important to consider reproducibility. If we are creating random data, how can we make the analysis reproducible?

We do this by using **pseudo-random number generators (PRNGs)**. A PRNG starts from an initial value, known as the seed, and then uses a deterministic algorithm to generate a pseudo-random sequence from it.

This means that we can replicate the output of a random number generator in Python simply by knowing which seed was used.
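To make the idea of "seed plus algorithm" concrete, here is a minimal sketch of a toy linear congruential generator. It is only an illustration: the function name `lcg` is hypothetical, the constants are the ones popularised by *Numerical Recipes*, and Python's own `random` module uses the Mersenne Twister rather than this scheme.

```python
def lcg(seed, n):
    """Toy linear congruential generator: return n floats in [0, 1)."""
    # Illustrative constants (Numerical Recipes); not what Python's random uses.
    a, c, m = 1664525, 1013904223, 2**32
    state = seed
    values = []
    for _ in range(n):
        state = (a * state + c) % m  # deterministic update of the internal state
        values.append(state / m)     # scale the integer state to [0, 1)
    return values


first_run = lcg(seed=1234, n=3)
second_run = lcg(seed=1234, n=3)
other_seed = lcg(seed=4321, n=3)

print(first_run == second_run)  # True: same seed, same sequence
print(first_run == other_seed)  # False: a different seed gives a different sequence
```

Nothing in the update rule is random: the whole sequence is determined by the seed.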

We can demonstrate seed-based reproducibility with the functions in Python's standard library module [`random`](https://docs.python.org/3/library/random.html).

### Setting a Seed and Generating Random Numbers

In [3]:
random.seed(1234)  # fix the seed so the draws below are repeatable

random.random()  # first draw, a float in [0.0, 1.0)

0.9664535356921388

In [4]:
random.random()

0.4407325991753527

In [5]:
random.seed(1234)  # resetting the seed restarts the sequence
random.random()  # same value as the first draw in In [3]

0.9664535356921388
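The cells above reseed the module-level generator, which is shared by every caller of `random`. As a hedged alternative, the sketch below uses separate `random.Random` instances so that each generator carries its own state. The variable names are illustrative and not part of the original notebook.

```python
import random

rng_a = random.Random(1234)  # independent generator with its own state
rng_b = random.Random(1234)  # second generator, same seed

draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
print(draws_a == draws_b)  # True: identical seeds give identical sequences

# Saving and restoring the state replays a draw without reseeding.
state = rng_a.getstate()
next_value = rng_a.random()
rng_a.setstate(state)
replayed_value = rng_a.random()
print(next_value == replayed_value)  # True
```

Using a dedicated generator keeps other code that calls `random` from shifting the sequence your analysis depends on.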

## Random Numbers from Real-Valued Distributions

### Uniform

In [6]:
random.uniform(0, 1)

0.4407325991753527
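The value above matches the second draw in `In [4]`. That is consistent with `In [5]` having reset the seed and consumed one draw, so `random.uniform(0, 1)` received the second number in the sequence. The sketch below checks the relationship the `random` documentation describes, `a + (b - a) * random()`. The bounds `a` and `b` are hypothetical values chosen for illustration, and the exact equality relies on CPython's implementation.

```python
import random

a, b = 2.0, 5.0  # hypothetical bounds, for illustration only

random.seed(1234)
u = random.uniform(a, b)  # one draw from Uniform(a, b)

random.seed(1234)
r = random.random()       # the same underlying draw from [0.0, 1.0)

# uniform() rescales the underlying draw to the interval [a, b].
print(u == a + (b - a) * r)
```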

In [10]:
unifNumbers = [random.uniform(0, 1) for _ in range(100)]
unifNumbers[:5]

[0.10492879371292163,
 0.9151463405506053,
 0.49261849780239064,
 0.6356612096039463,
 0.821002397522168]

### Normal

In [11]:
mu = 0
sigma = 1

random.normalvariate(mu, sigma)

1.626094987171871

In [12]:
mu = 5
sigma = 2

random.normalvariate(mu, sigma)

2.5884118640245717

In [13]:
normNumbers = [random.normalvariate(mu, sigma) for _ in range(100)]
normNumbers[:5]

[4.745035300333395,
 -0.26607739126651087,
 6.080060473690979,
 7.257861602356256,
 3.6995005838953183]
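As a quick sanity check, the sample mean and standard deviation of many draws should land close to `mu` and `sigma`. The following is a minimal sketch under assumptions: the seed `2024` and the sample size of 10,000 are arbitrary choices, not values from the original notebook.

```python
import random
import statistics

rng = random.Random(2024)  # arbitrary seed, so the check is repeatable
mu, sigma = 5, 2

draws = [rng.normalvariate(mu, sigma) for _ in range(10_000)]

sample_mean = statistics.mean(draws)
sample_sd = statistics.stdev(draws)  # sample standard deviation (n - 1 denominator)

# Both values should be close to mu = 5 and sigma = 2, but not exactly equal.
print(round(sample_mean, 2), round(sample_sd, 2))
```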

## Random Sampling from a Population

From the lecture, we know that **Simple Random Sampling (SRS)** has the following properties:

* Start with a known list of *N* population units, and randomly select *n* units from the list
* Every unit has an **equal probability of selection =** $\frac{n}{N}$
* All possible samples of size *n* are equally likely
* Estimates of means, proportions, and totals based on SRS are **unbiased** (meaning they are equal to the population values on average)
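The equal-probability property can be checked empirically. The sketch below assumes a small hypothetical population of *N* = 10 units and samples of *n* = 3, then counts how often each unit is selected. The seed and the number of trials are arbitrary.

```python
import random
from collections import Counter

rng = random.Random(42)  # arbitrary seed, so the simulation is repeatable
N, n = 10, 3             # hypothetical population and sample sizes
units = list(range(N))
trials = 20_000

counts = Counter()
for _ in range(trials):
    # random.sample draws n distinct units, i.e. without replacement
    counts.update(rng.sample(units, n))

inclusion_freq = {unit: counts[unit] / trials for unit in units}

print(n / N)  # theoretical selection probability
print(min(inclusion_freq.values()), max(inclusion_freq.values()))
```

Every unit's observed frequency should sit close to $\frac{n}{N} = 0.3$, with small differences due to simulation noise.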

In [15]:
mu = 0
sigma = 1

# Simulated population of 10,000 units drawn from a standard normal distribution
population = [random.normalvariate(mu, sigma) for _ in range(10000)]

In [17]:
# Two simple random samples of 500 units each, drawn without replacement
sampleA = random.sample(population, 500)
sampleB = random.sample(population, 500)

In [18]:
np.mean(sampleA)

-0.007751737484381486

In [19]:
np.mean(sampleB)

-0.030050990136551485

In [20]:
np.std(sampleA)

0.9828571981089269

In [21]:
np.std(sampleB)

0.9782533949229204

In [22]:
# Means of 100 independent simple random samples of size 1,000
means = [np.mean(random.sample(population, 1000)) for _ in range(100)]

In [23]:
np.mean(means)

-0.003968800335502866

In [25]:
# Standard deviations of 100 independent samples (np.std defaults to ddof=0)
stds = [np.std(random.sample(population, 1000)) for _ in range(100)]

np.mean(stds)

0.9949989707904869
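The sampling cells above do not set a seed, so rerunning them gives slightly different numbers. One way to make the whole experiment reproducible is to wrap it in a function that owns its generator. This is a sketch under assumptions: `sampling_experiment` and its parameter names are hypothetical, and it uses the standard library's `statistics.fmean` in place of `np.mean` to stay self-contained.

```python
import random
import statistics


def sampling_experiment(seed, pop_size=10_000, sample_size=1_000, repeats=100):
    """Build a population, draw repeated SRS samples, and return two means."""
    rng = random.Random(seed)  # all randomness flows from this single seed
    population = [rng.normalvariate(0, 1) for _ in range(pop_size)]
    sample_means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.fmean(population), statistics.fmean(sample_means)


run_1 = sampling_experiment(seed=1234)
run_2 = sampling_experiment(seed=1234)
print(run_1 == run_2)  # True: the same seed reproduces the entire experiment

pop_mean, mean_of_means = run_1
print(abs(pop_mean - mean_of_means))  # small: sample means centre on the population mean
```

The small gap between the two means illustrates the unbiasedness property of SRS, and the identical reruns show how a seed makes the result repeatable.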