# Basic Probability

## Random Variable
A random variable, usually written X, is a variable whose possible values are numerical outcomes of a random phenomenon. There are two types of random variables, discrete and continuous.

The range space of a discrete RV is a discrete (finite or countable) set, while that of a continuous RV is a continuum. Discrete RVs are described by a probability mass function (pmf), whereas continuous RVs are described by a probability density function (pdf).

Probability theory is built on random variables, and all definitions and examples are best understood in this framework.

Symbols: $X$ for the RV, $x$ for a value it takes, $f_{X}(x)$ for its pdf ($P_{X}(x)$ for the pmf of a discrete RV).

#### Cdf
The cdf (cumulative distribution function) of a random variable X is denoted by $F_{X}(x)$.

The cdf at x is the probability that, if we sample/measure X, its value will be less than or equal to x:
**$ F_{X}(x) = P(X \le x ) $**

The cdf can be calculated as follows:

Discrete: $ F_{X}(x) = \sum_{k \le x} P_X(k) $ and continuous: $ F_{X}(x) = \int_{-\infty}^{x} f_X(t) dt $
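Both formulas can be checked numerically. The sketch below assumes a fair six-sided die for the discrete case and a standard normal for the continuous case. SciPy's `stats.norm.cdf` serves only as a reference value for the numerical integral.

```python
import numpy as np
from scipy import integrate, stats

# Discrete case: fair six-sided die, P_X(k) = 1/6 for k = 1, ..., 6 (assumed example)
pmf = np.full(6, 1 / 6)
cdf_die = np.cumsum(pmf)  # cdf_die[k - 1] = F_X(k) = sum of P_X(j) for j <= k


def cdf_by_integration(x):
    """Continuous case: F_X(x) = integral of the N(0, 1) pdf from -inf to x."""
    value, _abs_err = integrate.quad(stats.norm.pdf, -np.inf, x)
    return value


print(cdf_die[2])               # F_X(3) = P(X <= 3) for the die
print(cdf_by_integration(1.0))  # numerical integral of the pdf
print(stats.norm.cdf(1.0))      # SciPy's cdf, for comparison
```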

**Quantile:** $ q_{X}(t) = F_{X}^{-1}(t) $. The quantile function is the inverse of the cdf: it gives the value x below which the probability is t, i.e. the x with $P(X \le x) = t$. It is useful in simulations (e.g. inverse-transform sampling) and in MCMC methods.
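Quantiles matter for simulation because of inverse-transform sampling: if uniform draws on $[0, 1)$ are passed through $F_X^{-1}$, the results are distributed as X. The sketch below uses an exponential distribution with rate `lam` as an assumed example, because its quantile has the closed form $-\ln(1 - t)/\lambda$. The helper `quantile_exp` is hypothetical. SciPy's `stats.expon.ppf` (its quantile function) is used as a cross-check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
lam = 2.0  # rate of the exponential distribution (illustrative choice)


def quantile_exp(t, lam):
    """Hypothetical helper: solve F(x) = 1 - exp(-lam * x) = t for x."""
    return -np.log1p(-t) / lam


# Inverse-transform sampling: push uniform draws through the quantile function
u = rng.uniform(size=100_000)
samples = quantile_exp(u, lam)

print(samples.mean())  # should be close to 1 / lam = 0.5
print(quantile_exp(0.5, lam), stats.expon.ppf(0.5, scale=1 / lam))  # median, two ways
```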

#### Multiple Random variables
For two random variables $X$ and $Y$, taking values $x$ and $y$, the relevant densities are:

a. Joint pdf: $ f_{X,Y}(x,y) $, the density of X and Y taking the values x and y together. For discrete RVs this is the joint probability $P(X = x, Y = y)$.

b. Marginal pdf: $ f_{X}(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\, dy $, the pdf of X alone, regardless of the value of Y (e.g. when Y has not been observed). It is obtained from the joint pdf by integrating out y.

c. Conditional pdf: $ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_{Y}(y)} $, the pdf of X given that Y has taken the value y. The denominator normalises the result so that it integrates to 1 over x.

Similarly, these definitions extend to more than two random variables.
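For discrete RVs the integrals become sums, so the three definitions are easy to see on a small table. The sketch below uses a made-up joint pmf (the numbers are hypothetical). Each marginal comes from summing the joint over the other variable. The conditional pmf of X given Y = y divides the joint by the marginal of Y.

```python
import numpy as np

# Hypothetical joint pmf P(X = x, Y = y): rows are x in {0, 1}, columns are y in {0, 1, 2}
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.15, 0.15]])

p_x = joint.sum(axis=1)  # marginal of X: sum over y
p_y = joint.sum(axis=0)  # marginal of Y: sum over x

# Conditional pmf of X given Y = y: divide each column of the joint by P(Y = y)
p_x_given_y = joint / p_y

print(p_x)                # marginal distribution of X
print(p_x_given_y[:, 0])  # P(X = x | Y = 0) for x = 0, 1
```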


## Expectation, Variance, Moments

Expectations are weighted averages of functions of random variables: the values being averaged are the values of the function, and the weights are given by the pdf of the random variable.

The **expected value** of a function $g(X)$ of a continuous random variable X with pdf $f_X(x)$ is:

$ \mathbb{E} [g(X)] = \int_{-\infty}^{+\infty} g(x)f_{X}(x) dx  $

This means that if X is an RV, then $ \mathbb{E} [g(X)]$ is the long-run average of the transformed value $g(X)$ when X is sampled many times. Note:

1. Any function of an RV is also an RV, hence $g(X)$ is also an RV.
2. The pdf acts as the weight because values with higher density are more likely to occur.
3. Averaging $g$ over many samples of X therefore gives a useful estimate of $ \mathbb{E} [g(X)]$.

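Despite the name, it is not a value we should expect to observe. For example, a fair die has $\mathbb{E}[X] = 3.5$, which can never be rolled. So it should never be read as "the value the RV is expected to take"; it is referred to as the expectation value.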
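The sketch below illustrates point 3 by comparing the integral definition with a Monte Carlo average. Purely for illustration, it assumes $X \sim N(1, 2^2)$ and $g(X) = X^2$, for which the exact answer is $\sigma^2 + \mu^2 = 5$.

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0  # X ~ N(1, 2^2), an assumed example


def g(x):
    """Example transformation g(X) = X^2 (illustrative choice)."""
    return x ** 2


# Definition: integral of g(x) f_X(x) over the real line
exact, _ = integrate.quad(lambda x: g(x) * stats.norm.pdf(x, mu, sigma), -np.inf, np.inf)

# Monte Carlo estimate: average g over many samples of X
samples = rng.normal(mu, sigma, size=200_000)
estimate = g(samples).mean()

print(exact, estimate)  # both should be close to sigma^2 + mu^2 = 5
```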

#### Means

$ \mu_{X} = \mathbb{E} [X] = \int_{-\infty}^{+\infty} xf_{X}(x) dx $, is the mean or first moment of X.

$ \mathbb{E} [X^r] = \int_{-\infty}^{+\infty} x^rf_{X}(x) dx $, is the $r^{th}$ moment
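The moment integrals can be evaluated numerically with `scipy.integrate.quad`. The sketch below assumes an exponential distribution with rate `lam = 2`, whose $r^{th}$ moment is $r!/\lambda^r$. It compares the numerical result against the `moment` method of SciPy's frozen distribution. `raw_moment` is a hypothetical helper.

```python
import numpy as np
from scipy import integrate, stats

lam = 2.0                         # rate of the exponential (illustrative choice)
dist = stats.expon(scale=1 / lam)  # frozen Exp(lam) distribution


def raw_moment(r):
    """Hypothetical helper: E[X^r] = integral of x^r f_X(x) over the support [0, inf)."""
    value, _ = integrate.quad(lambda x: x ** r * dist.pdf(x), 0, np.inf)
    return value


for r in (1, 2, 3):
    # For Exp(lam) the r-th moment is r! / lam^r
    print(r, raw_moment(r), dist.moment(r))
```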

#### Variance

The variance of any RV X is its second central moment, defined by:

$ Var(X) = \mathbb{E} [(X - \mathbb{E}[X])^2] = \int_{-\infty}^{+\infty} (x - \mu_{X})^2 f_X(x)dx $

Expanding the square and using linearity of expectation (with $\mu_X = \mathbb{E}[X]$ a constant): $Var(X) = \mathbb{E} [X^2 - 2X\mu_X + \mu_X^2] = \mathbb{E}[X^2] - 2\mu_X\mathbb{E}[X] + \mu_X^2 = \mathbb{E}[X^2] - \mu_X^2$, hence

$Var(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ (important), or equivalently $Var(X) = \int_{-\infty}^{+\infty} x^2f_X(x)dx - (\mu_{X})^2 $
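The shortcut formula can be checked exactly with Python's `fractions` module, so no rounding is involved. The sketch assumes a fair six-sided die as the example. Both routes give $Var(X) = 35/12$.

```python
from fractions import Fraction

# Fair six-sided die: P_X(k) = 1/6 for k = 1, ..., 6 (assumed example)
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * k for k in values)                          # E[X]
second_moment = sum(p * k ** 2 for k in values)            # E[X^2]
var_definition = sum(p * (k - mean) ** 2 for k in values)  # E[(X - mu_X)^2]
var_shortcut = second_moment - mean ** 2                   # E[X^2] - mu_X^2

print(mean, var_definition, var_shortcut)  # 7/2 35/12 35/12
```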

#### Multiple Random Variables

Expectation Value of the product of two random variables: $ \mathbb{E} [XY] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} xy f_{X,Y}(x,y) dx dy $

Covariance of X,Y:

$\sigma_{X,Y} = cov(X,Y) = \mathbb{E}[(X-\mu_{X})(Y-\mu_{Y})] = \mathbb{E}[XY] - \mu_{X} \mu_{Y} $

$ \rho(X,Y) = correlation(X,Y) = \frac{cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{\sigma_{X,Y}}{\sigma_{X} \sigma_{Y}} $. The correlation is the covariance normalised to lie in $[-1, 1]$; it measures how strongly the two random variables vary together linearly.
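The sketch below computes covariance and correlation from samples using the formulas above. The model $Y = 0.8X + 0.6Z$, with X and Z independent standard normals, is an assumption. It is chosen so that $Var(Y) = 1$ and $\rho(X, Y) = 0.8$. `np.corrcoef` is used as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model: Y = 0.8 X + 0.6 Z with X, Z independent N(0, 1)
x = rng.normal(size=100_000)
y = 0.8 * x + 0.6 * rng.normal(size=100_000)

# Sample versions of cov(X, Y) = E[XY] - mu_X mu_Y and rho = cov / (sigma_X sigma_Y)
cov_xy = np.mean(x * y) - x.mean() * y.mean()
rho_xy = cov_xy / (x.std() * y.std())

print(cov_xy, rho_xy)           # both near 0.8 for this model
print(np.corrcoef(x, y)[0, 1])  # NumPy's estimate, for comparison
```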

#### Independence:

Two random variables X and Y are independent if and only if $ f_{X,Y}(x,y) = f_{X}(x) f_{Y}(y) $ for all x, y. (Note: having zero correlation does not imply independence.)

Equivalently, the conditional distribution of X does not depend on the value of Y: $f_{X|Y}(x|y) = f_X(x)$ whenever $f_Y(y) > 0$. In the discrete case, $P(X = x \mid Y = y_i) = P(X = x \mid Y = y_j)$ for all $i, j$, i.e. observing Y tells us nothing about X.

A set of multiple random variables $\{X_1, X_2, \ldots, X_n \}$ is independent if $ f(x_1, \ldots, x_n) = \prod_{j = 1}^{n} f_j(x_j) $, where $f_j$ is the marginal pdf of $X_j$.
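The standard counterexample to "zero correlation implies independence" can be checked numerically. Assume $X \sim N(0, 1)$ and $Y = X^2$. The correlation is zero because $\mathbb{E}[X^3] = 0$, yet the joint probabilities do not factorise.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed example: X ~ N(0, 1) and Y = X^2, so Y is a deterministic function of X
x = rng.normal(size=200_000)
y = x ** 2

# cov(X, X^2) = E[X^3] - E[X] E[X^2] = 0 for a symmetric X, so the correlation is ~0
print(np.corrcoef(x, y)[0, 1])

# Yet the factorisation fails: P(|X| > 1, Y > 1) = P(|X| > 1), not P(|X| > 1) * P(Y > 1)
p_joint = np.mean((np.abs(x) > 1) & (y > 1))
p_product = np.mean(np.abs(x) > 1) * np.mean(y > 1)
print(p_joint, p_product)  # about 0.32 vs about 0.10
```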



## Random Sample

Let X be a random variable with pdf $f_X(x)$. A random sample of size n from X is a set of random variables $\{X_1, X_2, \ldots, X_n \}$ obtained by sampling (measuring) X n times.

$X_i = i^{th}$ random sample (an RV), $x_i = $ value of the $i^{th}$ random sample.

A set of random samples is said to be **independent and identically distributed (iid)** if:
$ f(x_1, ... x_n) = \prod_{i = 1}^{n} f_{X}(x_i) $


## Properties of Expectation and Variance

1. $\mathbb{E}[aX+b] = a\mathbb{E}[X] + b $

Expectation of a sum of two RVs (note how expectations involving several RVs are calculated with the joint pdf):

2. $\mathbb{E}[X+Y] = \mathbb{E}[X]+\mathbb{E}[Y] $: the expectation of a sum is the sum of the expectations, whether or not X and Y are independent.

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x+y) f_{X,Y}(x,y)\, dx\, dy = \int_{-\infty}^{\infty} x \left( \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy \right) dx + \int_{-\infty}^{\infty} y \left( \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx \right) dy = \int_{-\infty}^{\infty} x f_{X}(x)\, dx + \int_{-\infty}^{\infty} y f_{Y}(y)\, dy $

3. $\mathbb{E}[XY] = \mathbb{E}[X] \mathbb{E}[Y]$, if X,Y are independent RVs

4. $Var(a) = 0$: the variance of a constant is zero.

5. $Var(aX+b) = a^2 Var(X)$, as $\mathbb{E}[(aX + b - \mathbb{E}[aX + b])^2] = \mathbb{E}[(aX - a\mathbb{E}[X])^2] = a^2 \mathbb{E}[(X - \mathbb{E}[X])^2]$

Variance of a sum of two RVs:

6. $ Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y)$, as

$ Var(X+Y) = \mathbb{E}[(X+Y - \mathbb{E}[X+Y])^2] = \mathbb{E}[((X- \mu_{X}) + (Y-\mu_{Y}))^2] = \mathbb{E}[(X-\mu_{X})^2] + \mathbb{E}[(Y-\mu_{Y})^2] + 2\mathbb{E}[(X-\mu_{X})(Y-\mu_{Y})] = Var(X) + Var(Y) + 2Cov(X,Y) $
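The sketch below checks properties 1, 2, 5 and 6 by simulation. The correlated pair $Y = X + Z$ (X, Z independent standard normals) is an assumed example with $Cov(X, Y) = 1$, so $Var(X+Y) = 1 + 2 + 2 = 5$. Sample statistics computed with `np.mean` and `np.var` (which divides by n by default) satisfy these four properties exactly, up to floating-point error. Only the comparison with the true value 5 is approximate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Assumed correlated pair: Y = X + Z with X, Z independent N(0, 1), so Cov(X, Y) = 1
x = rng.normal(size=n)
y = x + rng.normal(size=n)
a, b = 3.0, 5.0

print(np.mean(a * x + b), a * x.mean() + b)   # property 1
print(np.mean(x + y), x.mean() + y.mean())    # property 2, even though X and Y are dependent
print(np.var(a * x + b), a ** 2 * np.var(x))  # property 5

# Property 6: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y); the true value here is 5
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)
```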

7. For random samples: if $ \{ X_1, X_2, \ldots, X_n \} $ are **iid**, then $\mathbb{E}[X_1+X_2+ \cdots + X_n ] = n\mu_{X} $ and $ Var(X_1 + X_2 + \cdots + X_n) = n\sigma_{X}^2 $

The expectation value of the sample mean $ \bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_i $ is $\mathbb{E}[ \bar{X}] = \mu_{X} $

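The variance of the sample mean is $ Var(\bar{X}) = \frac{\sigma_{X}^2}{n} $,

as $Var(\bar{X}) = Var\left(\frac{1}{n}(X_1 + X_2 + \cdots + X_n) \right) = \frac{1}{n^2} Var(X_1 + X_2 + \cdots + X_n) = \frac{n}{n^2} Var(X) = \frac{\sigma_{X}^2}{n}$, using property 5 and then property 7 (all covariances vanish because the samples are independent).

Note: in the important case we will study, MCMC, the samples will not be independent, so this formula for $Var(\bar{X})$ does not apply directly.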
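The sketch below checks $\mathbb{E}[\bar{X}] = \mu_X$ and $Var(\bar{X}) = \sigma_X^2/n$ for iid samples by simulation. It assumes $X \sim N(0, 2^2)$ and $n = 25$, so $\sigma_X^2/n = 0.16$. Every row of the array is an independent random sample, which is the assumption that fails for MCMC.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 2.0          # standard deviation of X (illustrative choice)
n = 25               # size of each random sample
n_repeats = 100_000  # number of independent random samples

# Each row is one iid random sample {X_1, ..., X_n} with X ~ N(0, sigma^2)
samples = rng.normal(0.0, sigma, size=(n_repeats, n))
sample_means = samples.mean(axis=1)

print(sample_means.mean())                 # close to mu_X = 0
print(sample_means.var(), sigma ** 2 / n)  # close to sigma^2 / n = 0.16
```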

## 3 Fundamental Laws

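#### Weak Law of Large Numbers (WLLN)

If $X_1, X_2, \ldots$ are iid with finite mean $\mu_X$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ converges to $\mu_X$ in probability: for every $\epsilon > 0$, $P(|\bar{X}_n - \mu_X| > \epsilon) \to 0$ as $n \to \infty$.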


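#### Strong Law of Large Numbers (SLLN)

If $X_1, X_2, \ldots$ are iid with finite mean $\mu_X$, the sample mean $\bar{X}_n$ converges to $\mu_X$ almost surely: $P\left(\lim_{n \to \infty} \bar{X}_n = \mu_X\right) = 1$. This is a stronger statement than the weak law.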
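A single simulated path can illustrate (though not prove) the laws of large numbers: the running sample mean $\bar{X}_n$ settles towards $\mu_X$ as n grows. The sketch assumes rolls of a fair six-sided die, so $\mu_X = 3.5$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed example: rolls of a fair six-sided die (values 1..6), so mu_X = 3.5
rolls = rng.integers(1, 7, size=100_000)

# Running sample mean X_bar_n for n = 1, 2, ..., len(rolls)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running_mean[n - 1])  # drifts towards 3.5 as n grows
```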


#### Central Limit Theorem (CLT)
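The probability distribution of the sample mean $\bar{X}$ of n iid samples of an RV $X$ with finite variance $\sigma_X^2$ tends to a normal distribution with mean $\mu_{X}$ and variance $ \frac{\sigma_{X}^2}{n} $ as the sample size n grows. More precisely, $\frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}}$ converges in distribution to $N(0, 1)$. When the higher moments of X exist, the skewness and excess kurtosis of $\bar{X}$ (the measures that distinguish it from a normal) tend to zero as $n \to \infty$.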


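```python
# Take a double normal (assumed here: an equal-weight mixture of N(-2, 1) and N(2, 1))
# and make histograms of the means of random samples of size 1, 2, 5, 10
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

rng = np.random.default_rng(0)
n_means = 10_000  # number of sample means per histogram

# Set up the figure
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
for n in [1, 2, 5, 10]:
    # each row is one random sample of size n from the double normal
    samples = rng.normal(loc=rng.choice([-2.0, 2.0], size=(n_means, n)), scale=1.0)
    ax.hist(samples.mean(axis=1), bins=100, density=True, histtype="step", label=f"n = {n}")

# CLT prediction for n = 10: mean 0, variance sigma^2 / n with sigma^2 = 1 + 2^2 = 5
x = np.linspace(-4, 4, 400)
ax.plot(x, stats.norm.pdf(x, scale=np.sqrt(5 / 10)), "k--", label="CLT normal, n = 10")
ax.set_xlabel("sample mean")
ax.set_ylabel("density")
ax.legend()
plt.show()
```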



