# Random Vector

## Learning Objectives

- Understand what **random vectors** are  
- Learn about **joint PMF, CDF, and PDFs**  
- Compute and interpret **covariance and correlation matrices**  
- Differentiate between **uncorrelated** and **independent** random variables  
- Understand **jointly normal** and **multivariate distributions**  
- Explore **multinomial** and **Dirichlet** distributions  
- Apply **Central Limit Theorem (CLT)** to multivariate data  

---

## Table of Contents
1. [Definition](#Definition)  
2. [Joint PMF](#JointPMF)  
3. [Joint CDF](#JointCDF)  
4. [Covariance Matrix](#CovarianceMatrix)  
5. [Characteristics of Covariance Matrix](#CharacteristicsCovMatrix)  
6. [Correlation Matrix](#CorrelationMatrix)  
7. [Cross Covariance and Correlation Matrices](#CrossCovCor)  
8. [Jointly Normal Random Variables](#JointlyNormal)  
9. [Independent and Identically Distributed (i.i.d.)](#IID)  
10. [Central Limit Theorem (CLT)](#CLT)  
11. [Normal Distribution](#NormalDistribution)  
12. [Multinomial Distribution](#Multinomial)  
13. [Dirichlet Distribution](#Dirichlet)  
14. [Jointly Normal Distribution](#JointlyNormal)  

---

## 1. Definition <a name="Definition"></a>

A **random vector** is a vector of random variables:
$$
\mathbf{X} = [X_1, X_2, ..., X_n]^T
$$

Each $X_i$ is a random variable.  
We can represent multivariate data (e.g., height & weight) as a random vector.


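```python
import numpy as np

# Example: 2D random vector
X = np.random.normal(0, 1, 1000)              # first component: standard normal draws
Y = 0.5 * X + np.random.normal(0, 1, 1000)    # second component: depends on X plus noise

vector = np.vstack((X, Y)).T                  # shape (1000, 2); each row is one draw of [X, Y]
vector[:5]
```

Output (the values change from run to run because no seed is set):

```
array([[-0.95099252, -0.90776692],
       [ 0.39456696,  0.18504736],
       [ 0.637566  ,  1.9986147 ],
       [ 0.54805266, -0.1523023 ],
       [-0.37731781, -0.61862094]])
```

---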

## 2. Joint PMF <a name="JointPMF"></a>

For **discrete random variables**, the **Joint Probability Mass Function (PMF)** gives the probability of two variables taking specific values:

$$
P(X = x_i, Y = y_j)
$$

It describes the joint behavior of two discrete random variables.  
The sum of all joint probabilities equals 1:

$$
\sum_i \sum_j P(X = x_i, Y = y_j) = 1
$$
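As a small illustration, a joint PMF of two discrete variables can be stored as a table. The numbers below are hypothetical, chosen only so that they sum to 1; rows correspond to values of $X$ and columns to values of $Y$.

```python
import numpy as np

# Hypothetical joint PMF: rows index X in {0, 1}, columns index Y in {0, 1, 2}
joint_pmf = np.array([
    [0.10, 0.20, 0.10],
    [0.25, 0.15, 0.20],
])

total = joint_pmf.sum()              # sum over all (x_i, y_j) pairs
marginal_x = joint_pmf.sum(axis=1)   # P(X = x_i): sum over the values of Y
marginal_y = joint_pmf.sum(axis=0)   # P(Y = y_j): sum over the values of X

print(total)
print(marginal_x)
print(marginal_y)
```

Summing the table along one axis gives the marginal PMF of the other variable, and the grand total is 1.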

---

## 3. Joint CDF <a name="JointCDF"></a>

The **Joint Cumulative Distribution Function (CDF)** gives the probability that two variables are simultaneously less than or equal to given values:

$$
F_{X,Y}(x, y) = P(X \le x, \, Y \le y)
$$

It provides a complete description of the joint distribution for both **discrete** and **continuous** random variables.

Properties:
- $0 \le F_{X,Y}(x, y) \le 1$
- Non-decreasing in both arguments
- $\lim_{x, y \to \infty} F_{X,Y}(x, y) = 1$
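For discrete variables, the joint CDF can be obtained from a joint PMF table by accumulating probabilities along both axes. The sketch below assumes a hypothetical PMF table whose rows and columns are sorted by increasing value of $X$ and $Y$.

```python
import numpy as np

# Hypothetical joint PMF: rows index X in {0, 1}, columns index Y in {0, 1, 2}
joint_pmf = np.array([
    [0.10, 0.20, 0.10],
    [0.25, 0.15, 0.20],
])

# joint_cdf[i, j] = P(X <= x_i, Y <= y_j): cumulative sum over rows, then columns
joint_cdf = joint_pmf.cumsum(axis=0).cumsum(axis=1)

print(joint_cdf)
print(joint_cdf[-1, -1])   # value at the largest x and y
```

The table is non-decreasing along each axis, and its last entry equals 1, matching the properties listed above.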

---

## 4. Covariance Matrix <a name="CovarianceMatrix"></a>

The **Covariance Matrix** collects the variances and pairwise covariances of the components of a random vector.  
For a random vector $\mathbf{X} = [X_1, X_2, ..., X_n]^T$ with mean vector $\mu = E[\mathbf{X}]$, it is defined as:

$$
\Sigma = E[(\mathbf{X} - \mu)(\mathbf{X} - \mu)^T]
$$

Where:
- $\Sigma_{ii} = Var(X_i)$ (variance of each variable)
- $\Sigma_{ij} = Cov(X_i, X_j)$ (covariance between variables)

Properties:
- **Symmetric:** $\Sigma = \Sigma^T$
- **Positive semi-definite:** $v^T \Sigma v \ge 0$ for all $v$
- **Diagonal entries** show individual variances
- **Off-diagonal entries** show how variables change together
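The definition can be checked on simulated data. This is a minimal sketch, assuming the same construction as the 2D example in Section 1 (the seed and sample size are arbitrary choices): the sample covariance is computed directly from the definition and compared with `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is reproducible
n = 10_000
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
data = np.column_stack((x1, x2))    # shape (n, 2), one observation per row

# Sample analogue of E[(X - mu)(X - mu)^T], using the unbiased 1/(n-1) factor
mu_hat = data.mean(axis=0)
centered = data - mu_hat
sigma_hat = centered.T @ centered / (n - 1)

# NumPy's built-in estimator; rowvar=False means columns are the variables
sigma_np = np.cov(data, rowvar=False)

print(sigma_hat)
print(np.allclose(sigma_hat, sigma_np))
```

Under this construction the true covariance matrix is $\begin{bmatrix} 1 & 0.5 \\ 0.5 & 1.25 \end{bmatrix}$, so the estimate should land close to those values.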

## 5. Characteristics of Covariance Matrix <a name="CharacteristicsCovMatrix"></a>

The **covariance matrix** provides information about how each pair of random variables is related to each other. Here are the key characteristics:

- **Symmetry**: The covariance matrix is always symmetric because $Cov(X_i, X_j) = Cov(X_j, X_i)$.
- **Diagonal entries**: Represent the variances of the random variables.
- **Off-diagonal entries**: Represent the covariances between pairs of random variables.

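Every covariance matrix is **positive semi-definite**, so all of its eigenvalues are non-negative. This reflects the fact that $v^T \Sigma v = Var(v^T \mathbf{X}) \ge 0$ for every vector $v$. The matrix is **positive definite** (all eigenvalues strictly positive) exactly when no non-trivial linear combination of the variables is constant.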
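The eigenvalues of a covariance matrix show whether it is strictly positive definite or only semi-definite. The sketch below uses two hypothetical simulated pairs: one where each variable has its own noise, and one where the second variable is an exact multiple of the first.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(0, 1, n)

# Case 1: x2 carries independent noise, so no linear combination is constant
x2 = 0.5 * x1 + rng.normal(0, 1, n)
eig_full = np.linalg.eigvalsh(np.cov(np.vstack((x1, x2))))

# Case 2: x3 = 2 * x1 exactly, so the combination 2*x1 - x3 has zero variance
x3 = 2.0 * x1
eig_degenerate = np.linalg.eigvalsh(np.cov(np.vstack((x1, x3))))

print(eig_full)         # both eigenvalues strictly positive
print(eig_degenerate)   # smallest eigenvalue is zero up to rounding error
```

`np.linalg.eigvalsh` is used because the covariance matrix is symmetric; it returns the eigenvalues in ascending order.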

---

## 6. Correlation Matrix <a name="CorrelationMatrix"></a>

The **Correlation Matrix** is a normalized form of the covariance matrix. It measures the strength of the linear relationship between the variables.

The elements of the correlation matrix are given by:

$$
\rho_{ij} = \frac{Cov(X_i, X_j)}{\sqrt{Var(X_i)Var(X_j)}}
$$

Properties:
- Diagonal entries are always 1 (the correlation of a variable with itself is always 1).
- The range of values for off-diagonal entries is between -1 and 1, where:
  - 1 indicates perfect positive linear correlation.
  - -1 indicates perfect negative linear correlation.
  - 0 indicates no linear correlation.
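The normalization can be carried out by dividing each covariance entry by the product of the two standard deviations. The following sketch, on hypothetical simulated data, does this by hand and compares the result with `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(0, 1, 2_000)
x2 = 0.5 * x1 + rng.normal(0, 1, 2_000)
data = np.vstack((x1, x2))            # shape (2, n), one variable per row

sigma = np.cov(data)                  # sample covariance matrix
std = np.sqrt(np.diag(sigma))         # standard deviations from the diagonal
corr = sigma / np.outer(std, std)     # rho_ij = Cov_ij / (std_i * std_j)

corr_np = np.corrcoef(data)           # NumPy's built-in correlation matrix

print(corr)
print(np.allclose(corr, corr_np))
```

The diagonal of `corr` is 1 and the off-diagonal entries lie in $[-1, 1]$, as stated in the properties above.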

---

## 7. Cross Covariance and Correlation Matrices <a name="CrossCovCor"></a>

**Cross Covariance Matrix** measures the covariance between two sets of random variables (e.g., $\mathbf{X}$ and $\mathbf{Y}$):

$$
\Sigma_{XY} = E[(\mathbf{X} - \mu_X)(\mathbf{Y} - \mu_Y)^T]
$$

**Cross Correlation Matrix** normalizes each entry of the cross covariance matrix by the standard deviations of the two variables involved:

$$
[\rho_{XY}]_{ij} = \frac{Cov(X_i, Y_j)}{\sqrt{Var(X_i)Var(Y_j)}}
$$
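A sample cross covariance matrix can be estimated by centering both sets of variables and averaging the outer products. In the sketch below, the mixing matrix `A` and the noise model are hypothetical, chosen so that the true cross covariance equals `A`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
X = rng.normal(size=(n, 2))               # samples of a 2-dimensional vector X
A = np.array([[1.0, 0.0, -1.0],
              [0.0, 2.0,  0.5]])          # hypothetical mixing matrix
Y = X @ A + rng.normal(size=(n, 3))       # samples of a 3-dimensional vector Y

# Sample analogue of E[(X - mu_X)(Y - mu_Y)^T]; result has shape (2, 3)
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
sigma_xy = Xc.T @ Yc / (n - 1)

# Entry-wise normalization by the standard deviations of X_i and Y_j
rho_xy = sigma_xy / np.outer(X.std(axis=0, ddof=1), Y.std(axis=0, ddof=1))

print(sigma_xy.round(2))
print(rho_xy.round(2))
```

Unlike a covariance matrix, the cross covariance matrix is generally not square or symmetric: here it has one row per component of $\mathbf{X}$ and one column per component of $\mathbf{Y}$.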

---

## 8. Jointly Normal Random Variables <a name="JointlyNormal"></a>

A set of random variables is said to be **jointly normal** if every linear combination of these variables is also normally distributed.

For a **jointly normal** distribution:
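- The **mean vector** and **covariance matrix** together fully determine the joint distribution.
- The **marginal distributions** of the individual variables are also normal. The converse does not hold: normal marginals alone do not imply joint normality.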
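Jointly normal samples can be drawn with NumPy's `Generator.multivariate_normal`. The mean vector and covariance matrix below are hypothetical values picked for illustration; the sketch checks that the sample statistics recover them.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])                 # hypothetical mean vector
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])             # hypothetical covariance matrix

samples = rng.multivariate_normal(mu, sigma, size=50_000)   # shape (50000, 2)

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)

print(sample_mean)   # close to mu
print(sample_cov)    # close to sigma
```

Each column of `samples` is a draw from the corresponding marginal, which is normal with mean `mu[i]` and variance `sigma[i, i]`.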

---
## 9. Independent and Identically Distributed (i.i.d.) <a name="IID"></a>

**Independent and Identically Distributed** (i.i.d.) random variables are those that are both:
- **Independent**: The joint distribution factorizes into the product of the marginals, so knowing the value of one variable gives no information about the others.
- **Identically Distributed**: All variables follow the same probability distribution.

For example, repeated flips of a fair coin are i.i.d.: the flips are independent of one another, and each flip has the same distribution (50% heads, 50% tails).
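The coin-flip example can be simulated as a rough sanity check. This sketch assumes a fair coin encoded as 0/1 and looks at two simple summaries: the frequency of heads and the correlation between consecutive flips.

```python
import numpy as np

rng = np.random.default_rng(5)
flips = rng.integers(0, 2, size=100_000)   # 1 = heads, 0 = tails, each with probability 0.5

heads_freq = flips.mean()                  # should be close to 0.5

# Correlation between each flip and the next one; close to 0 for independent flips
lag1_corr = np.corrcoef(flips[:-1], flips[1:])[0, 1]

print(heads_freq)
print(lag1_corr)
```

A near-zero correlation is consistent with independence but does not prove it, since zero correlation only rules out linear dependence.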

---

## 10. Central Limit Theorem (CLT) <a name="CLT"></a>

The **Central Limit Theorem (CLT)** states that the standardized sum (or average) of a large number of independent, identically distributed random variables with finite variance converges in distribution to a **normal distribution** as the number of variables increases, regardless of the shape of the original distribution.

Mathematically:

$$
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty
$$

Where:
- $\bar{X}_n$ is the sample mean of $n$ observations
- $\mu$ is the population mean
- $\sigma$ is the population standard deviation

The CLT is fundamental in statistics because it allows us to use normal distribution approximations for large sample sizes, even if the original data is not normally distributed.
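A short simulation can illustrate the theorem. The sketch below assumes an Exponential(1) population, which is strongly skewed and has mean 1 and standard deviation 1; the sample size and number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 200, 20_000
mu, sigma = 1.0, 1.0        # mean and standard deviation of Exponential(scale=1)

# Each row is one sample of size n; standardize each sample mean
draws = rng.exponential(scale=1.0, size=(trials, n))
z = (draws.mean(axis=1) - mu) / (sigma / np.sqrt(n))

z_mean, z_std = z.mean(), z.std()
frac_within_1 = np.mean(np.abs(z) < 1.0)   # about 0.683 for a standard normal

print(z_mean, z_std, frac_within_1)
```

Even though individual draws are far from normal, the standardized sample means have mean near 0, standard deviation near 1, and roughly 68% of their mass within one unit of zero.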

---

## 11. Normal Distribution <a name="NormalDistribution"></a>

The **Normal Distribution** (also called the Gaussian distribution) is the most widely used probability distribution in statistics. Its density is the familiar **bell-shaped curve**, and it is defined by two parameters:
- **Mean** ($\mu$)
- **Variance** ($\sigma^2$)

The probability density function (PDF) of a normal distribution is:

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$
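The density can be coded directly from the formula. In this sketch the helper name `normal_pdf` and the parameter values are illustrative; a simple Riemann sum checks that the density integrates to approximately 1.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x (scalar or array)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Riemann sum over a wide grid as a numerical check of the total area
x = np.linspace(-20, 20, 40_001)
dx = x[1] - x[0]
area = np.sum(normal_pdf(x, mu=1.0, sigma=2.0)) * dx

peak = normal_pdf(0.0)     # height of the standard normal at its mean, 1/sqrt(2*pi)

print(area)
print(peak)
```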

---

## 12. Multinomial Distribution <a name="Multinomial"></a>

The **Multinomial Distribution** is a generalization of the **Binomial Distribution**. It gives the joint probability of the category counts in $n$ independent trials, each of which falls into one of $k$ categories with fixed probabilities. It is useful for categorical outcomes such as the face counts from repeated rolls of a die.

The probability mass function (PMF) for the multinomial distribution is:

$$
P(X_1 = x_1, X_2 = x_2, ..., X_k = x_k) = \frac{n!}{x_1! x_2! ... x_k!} p_1^{x_1} p_2^{x_2} ... p_k^{x_k}
$$

Where:
- $n$ is the total number of trials
- $p_1, p_2, ..., p_k$ are the probabilities of each category, with $\sum_{i=1}^{k} p_i = 1$
- $x_1, x_2, ..., x_k$ are the counts of each category, with $\sum_{i=1}^{k} x_i = n$
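The PMF can be evaluated with the standard library alone. The helper name `multinomial_pmf` and the probabilities below are hypothetical; the sketch evaluates one outcome and checks that the probabilities of all outcomes for $n = 4$ add up to 1.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X_1 = counts[0], ..., X_k = counts[k-1]) with n = sum(counts) trials."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)          # exact integer division at every step
    return coef * prod(p**x for p, x in zip(probs, counts))

probs = [0.5, 0.3, 0.2]                # hypothetical category probabilities

# 4 trials with counts (2, 1, 1): 4!/(2! 1! 1!) * 0.5^2 * 0.3 * 0.2
p_example = multinomial_pmf([2, 1, 1], probs)

# Sum the PMF over every count vector (a, b, c) with a + b + c = 4
n = 4
total = sum(
    multinomial_pmf([a, b, n - a - b], probs)
    for a in range(n + 1)
    for b in range(n + 1 - a)
)

print(p_example)   # 12 * 0.015 = 0.18, up to floating-point rounding
print(total)
```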

---

## 13. Dirichlet Distribution <a name="Dirichlet"></a>

The **Dirichlet Distribution** is a multivariate generalization of the **Beta Distribution**. It is often used as a prior distribution in **Bayesian inference** when modeling probabilities of multiple categories or events. 

The probability density function (PDF) of the Dirichlet distribution is:

$$
f(x_1, x_2, ..., x_k) = \frac{1}{B(\alpha)} \prod_{i=1}^{k} x_i^{\alpha_i - 1}
$$

Where:
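- $x_i \ge 0$ and $\sum_{i=1}^{k} x_i = 1$ (the support is the probability simplex)
- $\alpha_i > 0$ are the concentration parameters of the distribution
- $B(\alpha) = \frac{\prod_{i=1}^{k} \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}$ is the multivariate Beta function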

The Dirichlet distribution is widely used in **topic modeling** and **Bayesian mixture models**.
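As an illustration, Dirichlet samples can be drawn with NumPy's `Generator.dirichlet`, and the density can be evaluated from the formula using the normalizing constant $B(\alpha) = \prod_i \Gamma(\alpha_i) / \Gamma(\sum_i \alpha_i)$. The parameter vector and the helper name `dirichlet_pdf` are hypothetical choices for this sketch.

```python
import numpy as np
from math import gamma

alpha = np.array([2.0, 3.0, 5.0])          # hypothetical concentration parameters
rng = np.random.default_rng(7)
samples = rng.dirichlet(alpha, size=50_000)

row_sums = samples.sum(axis=1)             # every sample lies on the simplex
sample_mean = samples.mean(axis=0)
expected_mean = alpha / alpha.sum()        # E[X_i] = alpha_i / sum(alpha)

def dirichlet_pdf(x, alpha):
    """Dirichlet density at a point x on the simplex."""
    B = np.prod([gamma(a) for a in alpha]) / gamma(alpha.sum())
    return np.prod(x ** (alpha - 1)) / B

density = dirichlet_pdf(np.array([0.2, 0.3, 0.5]), alpha)

print(sample_mean, expected_mean)
print(density)
```

Each sample is itself a probability vector, which is why the Dirichlet is a natural prior over the category probabilities of a multinomial.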

---

## 14. Jointly Normal Distribution <a name="JointlyNormal"></a>

A set of random variables is **jointly normal** if every linear combination of these variables follows a normal distribution.

For example, if $\mathbf{X} = [X_1, X_2]^T$ is jointly normal with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$, then for any constants $a_1$ and $a_2$ (not both zero):

$$
a_1 X_1 + a_2 X_2 \sim N(\mu_1 a_1 + \mu_2 a_2, \sigma_1^2 a_1^2 + \sigma_2^2 a_2^2 + 2a_1 a_2 Cov(X_1, X_2))
$$

The joint normality property helps in simplifying multivariate analysis and is widely used in regression analysis, portfolio theory, and other fields.
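The mean and variance of a linear combination can be checked by simulation. In matrix form they are $a^T \mu$ and $a^T \Sigma a$; the values of `mu`, `sigma`, and `a` below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0])                 # hypothetical means
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])             # hypothetical covariance matrix
a = np.array([3.0, -1.0])                  # coefficients a_1, a_2

samples = rng.multivariate_normal(mu, sigma, size=100_000)
combo = samples @ a                        # a_1 * X_1 + a_2 * X_2 for every draw

theory_mean = a @ mu                       # 3*1 + (-1)*(-2) = 5
theory_var = a @ sigma @ a                 # 9*2 + 1*1 + 2*3*(-1)*0.6 = 15.4

print(combo.mean(), theory_mean)
print(combo.var(), theory_var)
```

The quadratic form `a @ sigma @ a` expands to exactly the three variance terms in the formula above, with the covariance term appearing twice.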