### Random variables

:::{admonition} What you need to know

- Summing independent random variables yields another random variable called the **sample sum**; dividing it by the number of terms gives the **sample mean**. The sample mean is itself random and differs from the population mean (the expectation), which is the exact quantity we want to approximate by sampling.
- The **Law of Large Numbers** states that as the number of samples $N$ grows, the sample mean approaches the population mean, with a standard deviation falling off as $N^{-1/2}$.
- The **Central Limit Theorem (CLT)** tells us that the sum of a large number of independent and identically distributed random variables with well-defined (finite) means and variances is approximately Gaussian, regardless of the distribution of the individual variables.
- The **random walk** model describes the erratic, unpredictable motion of atoms and molecules, providing a fundamental model for diffusion processes and molecular motion in fluids.
- The number of steps to the right (or left) taken by a 1D random walker follows a binomial probability distribution. By the CLT, the binomial distribution in the large-$N$ limit is well approximated by a Gaussian with the same mean and variance.
:::

### Introducing random variables


- **A random variable X** is a variable whose value depends on the outcome of a random phenomenon. Formally, $X(\omega)$ is a function that assigns a number to each possible outcome $\omega$ in the sample space $\Omega$.
    - For a coin toss, $\Omega=\{H,T\}$, and we can define $X(H)=+1$ and $X(T)=-1$. Every time the experiment is done, X returns either +1 or -1. We can also define functions of random variables, e.g., every time X=+1, we earn 25 cents.

- Random variables are classified into two main types: **discrete and continuous.**

    - **Discrete Random Variable:** It takes a finite or countable number of distinct values. Discrete random variables are used to model scenarios where outcomes can be counted, such as the number of particles emitted by a radioactive source in a given time interval or the number of photons hitting a detector in a certain period.

    - **Continuous Random Variable:** It can take any value within a continuous range. These variables describe quantities that can vary smoothly, such as the position of a particle in space, the velocity of a molecule in a gas, or the kinetic energy of a molecule.
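
To make the coin-toss example concrete, here is a minimal sketch in plain Python. The dictionary `X`, the seed, and the rule that tails earns nothing are assumptions made for illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# The random variable X maps each outcome in the sample space {H, T} to a number
X = {"H": +1, "T": -1}

# Perform the experiment 10 times: draw an outcome, then evaluate X on it
outcomes = [random.choice("HT") for _ in range(10)]
values = [X[w] for w in outcomes]

# A function of the random variable: earn 25 cents whenever X = +1
# (earning nothing for X = -1 is an assumption made for this illustration)
earnings = [25 if x == +1 else 0 for x in values]

print(outcomes)
print(values)
print(sum(earnings), "cents earned")
```

Note that `X` itself is a fixed, deterministic mapping; the randomness comes entirely from which outcome is drawn.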

### **Probability Distribution of a Random Variable**

- For any random variable $ X $, we are interested in finding the probability distribution over its possible values $ x $, denoted as $ p_X(x) $.
- It is important to distinguish between:
  - $ x $, which represents a specific value the variable can take (e.g., $ 1,2, \dots, 6 $ for a die).
  - $ X $, which is the random variable itself, generating values $ x $ according to the probability distribution $ p_X(x) $, often abbreviated as $ p(x) $.


### **Expectation**

- The expectation $ E[x] $ represents the theoretical mean, distinguishing it from the sample mean computed in simulations. 
- For instance, consider the difference between:
  - The average height of people computed from a small sample of cities.
  - The true mean height of the entire world population.
- As the sample size increases, the sample mean converges to the expectation.
- Expectation can apply to:
  - The variable itself (mean).
  - Any function of the variable (e.g., squared deviation for variance).


$$
E[x] = \int x \cdot p(x)dx = \mu
$$

$$
E[f(x)] = \int f(x) \cdot p(x)dx
$$

- We often use shorthand notation for mean $\mu$
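
As a quick illustration of the difference between the expectation and a sample mean, the sketch below uses a fair six-sided die, where the integral becomes a sum over the six values. The seed and the sample sizes are arbitrary choices, not part of the original notes.

```python
import numpy as np

# Fair six-sided die: possible values and their probabilities
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Exact expectation: sum over x * p(x) (discrete analogue of the integral)
exact_mean = np.sum(values * probs)

# Sample means from simulated rolls of increasing size
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
for n in (10, 1000, 100000):
    rolls = rng.choice(values, size=n, p=probs)
    print(f"n={n:>6d}  sample mean={rolls.mean():.3f}  exact={exact_mean:.3f}")
```

The exact value never changes, while the sample mean fluctuates around it and settles down as the sample grows.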

### Variance (Fluctuation)

- Variance measures the expected squared deviation from the mean, quantifying the spread in values of $ x $.

$$
V[x] = E[(x-E[x])^2] = E[x^2] - E[x]^2 = \sigma^2
$$

- We often use the shorthand notation $\sigma^2$ for the variance, and $\sigma$ for the standard deviation.
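
The two expressions for the variance can be checked against each other numerically. This is a small sketch for a fair die, using sums in place of integrals; the variable names are illustrative.

```python
import numpy as np

# Fair six-sided die: possible values and their probabilities
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mean = np.sum(values * probs)

# Definition: expected squared deviation from the mean, E[(x - E[x])^2]
var_def = np.sum((values - mean) ** 2 * probs)

# Shortcut: E[x^2] - E[x]^2
var_short = np.sum(values**2 * probs) - mean**2

print(var_def, var_short)  # both equal 35/12 up to rounding
```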


### Sum of Two Random Variables

- Consider the sum of two random variables, such as:
  - The sum of numbers obtained from rolling two dice.
  - The sum of two coin flips (e.g., heads = 1, tails = 0).
- The sum of random variables is itself a random variable:


$$
X = X_1 + X_2
$$

- Expectation is always a **linear operator**, which follows from the definition of expectation and the linearity of integration:

$$
E[X_1 + X_2] = E[X_1] + E[X_2]
$$

- However, variance is **not** generally a linear operator:

$$
V[X_1 + X_2] = E\left[(X_1 + X_2 - E[X_1 + X_2])^2\right] 
$$

Using the linearity of expectation, $E[X_1 + X_2] = E[X_1] + E[X_2]$, inside the square:

$$
V[X_1 + X_2] = E\left[(X_1 - E[X_1] + X_2 - E[X_2])^2\right]
$$

- Defining the **mean-subtracted variables**:

$$
Y_i = X_i - E[X_i]
$$

we rewrite the variance expression in terms of $Y_i$:

$$
E[(Y_1 + Y_2)^2] = E[Y_1^2] + E[Y_2^2] + 2E[Y_1 Y_2]
$$

Since $ V[X_i] = E[Y_i^2] $, this simplifies to:

$$
V[X_1 + X_2] = V[X_1] + V[X_2] + 2E[Y_1 Y_2]
$$

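- In the special case where $X_1$ and $X_2$ are **independent**, the expectation of the product factorizes, and the cross-term vanishes because each mean-subtracted variable has zero expectation:

$$
E[Y_1 Y_2] = E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1 - E[X_1]] \cdot E[X_2 - E[X_2]] = 0
$$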

Thus, for independent variables:

$$
V[X_1 + X_2] = V[X_1] + V[X_2]
$$

This result is **fundamental** in statistical mechanics, probability theory, and the sciences, as it explains why variances add for independent random variables.
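
The role of the cross-term can be seen numerically. The sketch below compares two independent dice with the fully correlated case $X_2 = X_1$; the seed, sample size, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
n = 200_000

x1 = rng.integers(1, 7, size=n)  # first die (values 1..6)
x2 = rng.integers(1, 7, size=n)  # second die, drawn independently

# Independent case: the variance of the sum is close to the sum of variances
var_sum_indep = np.var(x1 + x2)
var_add_indep = np.var(x1) + np.var(x2)
print(f"independent: V[X1+X2]={var_sum_indep:.3f}  V[X1]+V[X2]={var_add_indep:.3f}")

# Fully correlated case X2 = X1: the cross-term 2 E[Y1 Y2] must be kept
x2c = x1
cross = np.mean((x1 - x1.mean()) * (x2c - x2c.mean()))  # sample estimate of E[Y1 Y2]
var_sum_corr = np.var(x1 + x2c)
var_rule_corr = np.var(x1) + np.var(x2c) + 2 * cross
print(f"correlated:  V[X1+X2]={var_sum_corr:.3f}  with cross-term={var_rule_corr:.3f}")
```

In the correlated case the variance of the sum is about twice the naive sum of variances, which is exactly what the cross-term accounts for.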



### Sum of N Random Variables

- Consider a sequence of **independent and identically distributed (i.i.d.)** random variables, $X_1, X_2, \ldots$. Examples include coin tosses, the random displacement of a molecule, or the random placement of a molecule in the left vs. right corner of a box. Being identically distributed means every term has the same **mean** $\mu$ and **variance** $\sigma^2$.

- The sum of $n$ random variables, denoted by $S_n$, is called a **sample sum**, and its average, $M_n = \frac{1}{n}S_n$, is called a **sample mean**.
- Our goal is to understand how these summed or averaged quantities behave as a function of the sample size $n$.

$$S_n = \sum_{i=1}^n X_i$$

- Expectation is always a linear operator; variance in general is not. For **independent** random variables, however, variances add. To show this, we define the mean-subtracted random variable $Y_i = X_i - \mu$, which has zero expectation, $E[Y_i] = 0$.
- Because the random variables are independent, all cross-terms vanish, $E[Y_i Y_j] = E[Y_i]E[Y_j] = 0$ for $i \neq j$, and all the self-terms $i=j$ equal the variance, $E[Y^2_i] = V[X_i]=\sigma^2$.

$$E[S_n] = \sum_{i=1}^n E[X_i] = n\mu$$

$$V[S_n] = E[(S_n - n\mu)^2] = E\Big[\Big(\sum_i Y_i\Big)^2\Big] = \sum_i \sum_j E[Y_i Y_j] = \sum_i E[Y^2_i] = n\sigma^2$$
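
A short simulation can be used to check that both the mean and the variance of the sample sum grow linearly with $n$. The sketch below uses fair dice as the i.i.d. variables; the seed and the number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily
mu, sigma2 = 3.5, 35 / 12       # exact mean and variance of one fair die
trials = 20_000                 # number of independent sample sums per n

for n in (1, 10, 100):
    # Each row is one realization of X_1..X_n; summing along the row gives S_n
    S = rng.integers(1, 7, size=(trials, n)).sum(axis=1)
    print(f"n={n:>3d}  E[S_n]={S.mean():8.2f} (exact {n * mu:7.2f})  "
          f"V[S_n]={S.var():8.2f} (exact {n * sigma2:7.2f})")
```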


### Law of Large Numbers (LLN)

- Applying the previous results to the sample mean $M_n=\frac{1}{n}S_n$, we arrive at the **Law of Large Numbers (LLN)**, which states that as the number of independent and identically distributed (i.i.d.) random variables increases, their sample mean converges to the true expected value.

$$
M_n \to \mu
$$

$$E[M_n] = \mu, \qquad V[M_n] = \frac{1}{n^2}V[S_n] = \frac{\sigma^2}{n} \to 0$$

**Implication:**
- The sample mean provides a reliable estimate of $\mu$ for large $n$.
- The variance of $M_n$ decreases as $1/n$, meaning fluctuations (the standard deviation) shrink as $1/\sqrt{n}$.
- This justifies ensemble averaging in statistical mechanics, ensuring macroscopic observables (e.g., temperature, pressure) are stable and predictable.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Number of tosses per run and number of independent runs
N, runs = int(1e5), 30

# Store fractions of heads for each trial in each run
fractions = np.zeros((runs, N))

# Simulate coin tosses
for run in range(runs):
    # Generate coin tosses (0 for tails, 1 for heads)
    tosses = np.random.randint(2, size=N)
    # Calculate cumulative sum to get the number of heads up to each trial
    cum_heads = np.cumsum(tosses)
    # Calculate fraction of heads up to each trial
    fractions[run, :] = cum_heads / np.arange(1, N+1)

# Plotting
plt.figure(figsize=(14, 8))

# Plot all runs with low opacity
for run in range(runs):
    plt.plot(fractions[run, :], color='grey', alpha=0.3)

# Highlight first run
plt.semilogx(fractions[0, :], color='blue', linewidth=2, label='Highlighted Run')

# Expected value line
plt.axhline(y=0.5, color='red', linestyle='--', label='Expected Value (0.5)')
plt.xlabel('Number of Trials')
plt.ylabel('Fraction of Heads')
plt.title('Law of Large Numbers: Fraction of Heads in Coin Tossing')
plt.legend()
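
The $1/\sqrt{n}$ shrinkage of fluctuations can also be checked directly, without plotting. The following is a sketch that estimates the standard deviation of the sample mean of fair coin tosses (heads = 1, tails = 0) for several $n$; drawing the number of heads from a binomial distribution is a shortcut used here instead of simulating each toss, and the seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, chosen arbitrarily
sigma = 0.5                     # standard deviation of a single fair coin toss (0 or 1)
runs = 2000                     # independent repetitions used to estimate the spread

for n in (100, 400, 1600):
    # Number of heads in n tosses, divided by n, gives one sample mean per run
    M = rng.binomial(n, 0.5, size=runs) / n
    print(f"n={n:>5d}  std[M_n]={M.std():.4f}  sigma/sqrt(n)={sigma / np.sqrt(n):.4f}")
```

Each fourfold increase in $n$ should roughly halve the spread of the sample mean.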

### The Central Limit Theorem  (CLT)

- The **Central Limit Theorem** asserts that the probability density function (**PDF**) of a sum of $n$ i.i.d. random variables approaches a Gaussian distribution with mean $n\mu$ and variance $n\sigma^2$ as $n$ grows. Note that the CLT assumes that the **mean and variance**, $\mu$ and $\sigma^2$, **are finite**. Thus, the CLT does not hold for certain power-law distributed random variables whose variance diverges.

$$S_n = X_1 +X_2+...+X_n \rightarrow N(n\mu, n\sigma^2)$$

$$p(s) = \frac{1}{(2\pi  n\sigma^2)^{1/2}}e^{-\frac{(s-n\mu)^2}{2 n\sigma^2}}$$

- If we subtract the mean from the sample sum and divide by its standard deviation, we get a variable that follows the standard normal distribution.

$$Z_n = \frac{S_n - n\mu}{\sqrt{n}\sigma} \rightarrow N(0, 1)$$

In [None]:
from scipy.stats import norm

# Number of coin tosses in each experiment, number of experiments
N, runs = 100, 1000

# Simulate coin tosses: N rows (tosses), runs columns (experiments)
tosses = np.random.randint(2, size=(N, runs))

# Sample mean of each experiment (average over the N tosses in each column)
M = np.mean(tosses, axis=0)

# Standardize the sample means using their empirical mean and standard deviation
z = (M - M.mean()) / np.std(M)

# Plot the distribution of the standardized sample means
plt.figure()
plt.hist(z, density=True, bins=30)
plt.title('Distribution of Standardized Sample Means of Coin Tosses')
plt.xlabel('Standardized Sample Mean z')
plt.ylabel('Density')

zs = np.linspace(z.min(), z.max(), 1000)
plt.plot(zs, norm.pdf(zs),'k', label='mean=0, var=1')
plt.legend()
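
The CLT is not specific to coin tosses. As a sketch, the block below sums strongly skewed exponential random variables and standardizes the sums with the exact $\mu$ and $\sigma$ rather than empirical estimates. The choice of the exponential distribution, the seed, and the sizes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)  # fixed seed, chosen arbitrarily
n, runs = 200, 10_000
mu, sigma = 1.0, 1.0            # exact mean and std of the exponential(1) distribution

# Each row holds n i.i.d. exponential variables; row sums are the sample sums S_n
S = rng.exponential(scale=1.0, size=(runs, n)).sum(axis=1)

# Standardize with the exact mean n*mu and standard deviation sqrt(n)*sigma
Z = (S - n * mu) / (np.sqrt(n) * sigma)

# For a standard normal, about 68.3% of the probability lies within |z| < 1
frac = np.mean(np.abs(Z) < 1)
print(f"mean={Z.mean():.3f}  var={Z.var():.3f}  fraction within 1 sigma={frac:.3f}")
```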

#### Law of large numbers and random walk

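Applying these formulas to a 1D random walk, in which each step is $+1$ (right) with probability $\theta$ and $-1$ (left) with probability $1-\theta$, we get the mean and variance of a single step: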

$$E[X_1] = \theta \cdot 1 + (1-\theta) \cdot (-1) = 2\theta-1$$ 

$$V[X_1] = E[X^2_1] -  E[X_1]^2 = \theta \cdot 1^2+ (1-\theta) (-1)^2 - (2\theta-1)^2 = 4 \theta(1-\theta)$$

Since the steps of a random walker are independent, we obtain the mean and variance of the total displacement $x$ after $N$ steps by multiplying the mean and variance of a single step, $\sigma^2_1 = V[X_1]$, by $N$:

$$E[x]=N(2\theta -1)$$

$$V[x]=N\sigma^2_1 = 4N\theta (1-\theta)$$ 

The variance of the mean $\bar{x} = x/N$ would then be:

$$V[\bar{x}] = \frac{4\theta (1-\theta)}{N}$$ 
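
These random-walk formulas can be checked with a short simulation. In this sketch the number of right steps is drawn from a binomial distribution, so the displacement is $x = 2 n_{\text{right}} - N$; the values of `theta`, `N`, the number of walkers, and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)  # fixed seed, chosen arbitrarily
theta, N, walkers = 0.7, 1000, 20_000

# Number of steps to the right for each walker (binomially distributed)
n_right = rng.binomial(N, theta, size=walkers)

# Total displacement: (+1) * n_right + (-1) * (N - n_right)
x = 2 * n_right - N

print(f"E[x]: sampled={x.mean():.2f}  exact={N * (2 * theta - 1):.2f}")
print(f"V[x]: sampled={x.var():.2f}  exact={4 * N * theta * (1 - theta):.2f}")
```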

### Random numbers in python

- The [**numpy.random**](https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html) module provides fast random number generators implemented in low-level C code.
- The [**scipy.stats**](https://docs.scipy.org/doc/scipy/reference/stats.html) module has an extensive library of statistical distributions and tools for statistical analysis.

- First, we take a look at the most widely used random number functions of numpy, also called standard random numbers. These are rand (uniform random numbers on the interval from 0 to 1) and randn (standard normal random numbers with mean 0 and variance 1).

- Code that uses random numbers gives different results on every run. If you want the code to reproduce the same result, you can fix the seed: ``` np.random.seed(8376743)```

- To convert samples of a random variable into a probability distribution, we need to generate a large enough sample and then histogram it via ```np.histogram```, or histogram and visualize in one shot via ```plt.hist()```
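
The effect of fixing the seed, and the conversion of samples into a normalized histogram, can be sketched as follows. The sample size and bin count are arbitrary choices.

```python
import numpy as np

# Same seed -> same sequence of random numbers
np.random.seed(8376743)
a = np.random.rand(3)

np.random.seed(8376743)  # resetting the seed restarts the same sequence
b = np.random.rand(3)

print(np.array_equal(a, b))  # True

# Histogram a larger sample; density=True normalizes the bar areas to sum to 1
sample = np.random.randn(10_000)
density, edges = np.histogram(sample, bins=20, density=True)
area = np.sum(density * np.diff(edges))
print(area)
```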

In [None]:
import numpy as np
import matplotlib.pyplot as plt

X = np.random.rand(30)

print(X)
plt.plot(X)

In [None]:
counts, edges = np.histogram(X, range=(0,1), bins=20)

counts, edges

In [None]:
plt.hist(X, density=True)

### Probability distributions

In [None]:
def rnplot(r):
    '''Convenience function for making quick two-panel plot showing 
    a line plot for the sequence of random numbers (RN)
    a histogram plot of the probability density of random numbers 
    '''
    
    fig, ax = plt.subplots(ncols=2) 

    ax[0].plot(r,  color='blue', label='trajectory')
    ax[1].hist(r,  density=True, color='red',  label = 'histogram')
    
    
    ax[0].set_xlabel('Samples of RN')
    ax[0].set_ylabel('Values of RN')
    
    ax[1].set_xlabel('Values of RN')
    ax[1].set_ylabel('Probability Density')

    fig.legend();
    fig.tight_layout()

### Uniform

**Probability distribution**

$$p(x| a, b)=\begin{cases}
{\frac {1}{b-a}}&\mathrm {for} \ a\leq x\leq b,\\[8pt]0&\mathrm {for} \ x<a\ \mathrm {or} \ x>b
\end{cases}
$$

- $E[x] = \frac{1}{2}(a+b)$

- $V[x] = \frac{1}{12}(a-b)^2$

**Random Variable**

- $U(a, b)$ modeled by ```np.random.uniform(a,b, size=(N, M))```
- $U(0, 1)$ modeled by ```np.random.rand(N, M, P, ...)```

In [None]:
np.random.rand(5)

In [None]:
r = np.random.rand(200) 

rnplot(r)

### Binomial

#### Probability Mass Function

$$P(n |p, N) =  \frac{N!}{(N-n)! n!}p^n (1-p)^{N-n}$$

- $E[n] = Np$
- $V[n] = Np(1-p)$


**Random Variable**

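- $B(N, p)$ modeled by ```np.random.binomial(n, p, size)```, where the argument `n` is the number of trials (called $N$ in the formula above) and each returned value is a number of successes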

In [None]:
np.random.binomial(n=10, p=0.6, size=20)

In [None]:
r = np.random.binomial(n=10, p=0.6, size=2000) 

rnplot(r)
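
As a numerical sanity check, the sample mean and variance of binomial random numbers can be compared with $Np$ and $Np(1-p)$. This is a sketch with the same parameters as the cell above; the seed and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)  # fixed seed, chosen arbitrarily
N, p = 10, 0.6                  # number of trials and success probability

# Each entry is the number of successes in N trials
r = rng.binomial(N, p, size=200_000)

print(f"mean: sampled={r.mean():.3f}  N*p={N * p:.3f}")
print(f"var:  sampled={r.var():.3f}  N*p*(1-p)={N * p * (1 - p):.3f}")
```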

### Gaussian

$$P(x |\mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

- $E[x] = \mu$
- $V[x] = \sigma^2$

**Random Variable**

- $N(\mu, \sigma^2)$ modeled by ```np.random.normal(loc,scale, size=(N, M))```, where `loc` is the mean $\mu$ and `scale` is the standard deviation $\sigma$
- $N(0, 1)$ modeled by ```np.random.randn(N, M, P, ...)```

In [None]:
# For a standard normal with sigma=1, mu=0
r = np.random.randn(200)

# For a more general Gaussian with mu=2, sigma=5
rgauss = np.random.normal(loc=2., scale=5, size=200) 

rnplot(r)
rnplot(rgauss)

### Transforming random variables

- Changing from a random variable $X$ to a new variable $Y=f(X)$, with $f$ monotonic, relates the two probability densities through a Jacobian factor. Think of it as a factor accounting for the change in spacing between points (or volumes), so that probability is conserved: $p(x)dx=p(y)dy$

$$p(y) = p(x)\cdot \Big|\frac{dx}{dy}\Big|$$


- For a shift $Y=X+a$ we have $p_Y(y)=p_X(y-a) \cdot 1$, and for a scaling $Y=aX$ with $a \neq 0$ we have $p_Y(y) = p_X(y/a)\cdot \frac{1}{|a|}$.
- These transformations mean that adding a constant shifts the average by that constant, $E[X+a]=E[X]+a$, while leaving the variance unchanged. When the original variable is multiplied by a constant factor $a$, the variance becomes $a^2$ times larger: $V[aX] = a^2V[X]$.
- As an example we can use standard uniform and standard normal to carry out the following transformations:

$$N(\mu, \sigma^2) = \mu+\sigma\cdot N(0,1)$$

$$U(a, b) = a + (b-a)\cdot U(0,1)$$
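
Shifting and scaling standard random numbers can be checked numerically. The sketch below builds a general Gaussian from standard normal samples and a uniform variable on $[a, b)$ from standard uniform samples, the latter by scaling with $b-a$ and shifting by $a$. The parameter values and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)  # fixed seed, chosen arbitrarily
n = 200_000
mu, sigma = 2.0, 5.0            # target mean and standard deviation of the Gaussian
a, b = -1.0, 3.0                # target interval of the uniform variable

g = mu + sigma * rng.standard_normal(n)  # N(mu, sigma^2) built from N(0, 1)
u = a + (b - a) * rng.random(n)          # U(a, b) built from U(0, 1)

print(f"Gaussian: mean={g.mean():.3f} (mu={mu}), var={g.var():.3f} (sigma^2={sigma**2})")
print(f"Uniform:  mean={u.mean():.3f} ((a+b)/2={(a + b) / 2}), "
      f"var={u.var():.3f} ((b-a)^2/12={(b - a) ** 2 / 12:.3f})")
```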

### Exact vs sampled probability distributions

- [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html) contains a large number of probability distributions. Explore the examples there.

In [None]:
from scipy.stats import binom, norm, poisson

In [None]:
xmin, xmax, npoints = -5, 5, 1000

# Grid of x values; retstep=True also returns the exact grid spacing dx
x, dx = np.linspace(xmin, xmax, npoints, retstep=True)

px = norm.pdf(x)

print('normalization', np.sum(px*dx))

In [None]:
r = np.random.randn(100)

plt.hist(r,  density=True, label=f'Sampled mean={r.mean():.2f}, var={r.var():.2f}');


plt.plot(x, px,'k', label='Exact mean=0, var=1')

plt.legend(loc=2)
plt.ylabel('$p(x)$')
plt.xlabel('$x$')

### Problems

#### Problem 1 Binomial as generator of Gaussian and Poisson distributions

- Show that in the limit of large $N$ the binomial distribution tends to a Gaussian. Show this by expanding the logarithm of the binomial distribution, $\log p(n)$, in a power series around its maximum and showing that terms beyond quadratic can be ignored.

- In the limit $N\rightarrow \infty$ with very small $p \rightarrow 0$, such that $\lambda =pN=const$, there is another distribution that better approximates the binomial distribution: $p(k)=\frac{\lambda^k}{k!}e^{-\lambda}$, where $k$ is the number of events. It is known as the Poisson distribution. <br>
The Poisson distribution is an excellent approximation for the probabilities of rare events, such as infrequently firing neurons in the brain, radioactive decay events of plutonium, or rains in the desert. <br>  Derive the Poisson distribution by taking the limit $p\rightarrow 0$ in the binomial distribution.

- Using numpy and matplotlib, plot the binomial probability distribution
against the Gaussian and Poisson distributions for different values of N=(10,100,1000,10000). <br>
- For N=10000, make four plots with the following values:
p=0.0001, 0.001, 0.01, 0.1. You can use the subplot functionality to make a pretty 4-column plot. (See the plotting module.)

```python
fig, ax =  plt.subplots(nrows=1, ncols=4)
ax[0].plot()
ax[1].plot()
ax[2].plot()
ax[3].plot()
```