<img src="./images/banner.png" width="800">

# Random Variables and Probability Theory Fundamentals

Welcome to the fascinating world of random variables! 🎲🔢


A **random variable** is a variable whose possible values are numerical outcomes of a random phenomenon. Formally, it is a function that assigns a numerical value to each outcome of an experiment.


> 💡 Think of it as a magical box that spits out numbers, but you can't predict exactly which number you'll get!


Random variables come in two flavors:

1. **Discrete Random Variables**: These take on countable values.
   - Example: Number of heads in 10 coin flips (0, 1, 2, ..., 10)

2. **Continuous Random Variables**: These can take any value within a range.
   - Example: The exact time it takes you to read this sentence (3.14159... seconds)
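To make the two flavors concrete, here is a minimal simulation sketch using Python's standard `random` module. The seed, the assumed reading-time range of 2 to 5 seconds, and the variable names are illustrative choices, not part of the definitions above.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible (arbitrary choice)

# Discrete: number of heads in 10 fair coin flips, always an integer in 0..10.
num_heads = sum(rng.random() < 0.5 for _ in range(10))

# Continuous: a reading time drawn uniformly from [2, 5] seconds (assumed range).
reading_time = rng.uniform(2.0, 5.0)

print("heads in 10 flips:", num_heads)
print("reading time (s):", reading_time)
```

The first value can only land on one of eleven integers, while the second can land anywhere in an interval.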


Let $(\Omega, \mathcal{F}, P)$ be a probability space. A random variable $X$ is a function:

$X: \Omega \rightarrow \mathbb{R}$

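such that for every Borel subset $B$ of $\mathbb{R}$, the preimage of $B$ under $X$ is an event in $\mathcal{F}$:

$\{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}$

This measurability condition ensures that $P(X \in B)$ is well defined.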
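The formal definition can be illustrated on a small finite sample space, where every subset is an event and measurability holds automatically. The sketch below assumes three fair coin flips as the experiment; the names `omega`, `X`, and `prob_X_in` are hypothetical helpers chosen for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space: all sequences of 3 coin flips, each equally likely.
omega = list(product("HT", repeat=3))
prob = {outcome: Fraction(1, len(omega)) for outcome in omega}

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

def prob_X_in(B):
    """P(X in B): total probability of the preimage {w : X(w) in B}."""
    return sum(prob[w] for w in omega if X(w) in B)

print(prob_X_in({2}))     # 3/8
print(prob_X_in({0, 1}))  # 1/2
```

Here $X$ is literally a function on outcomes, and probabilities about $X$ are computed by measuring preimages in $\Omega$.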


Random variables are the building blocks of probability theory and statistics. They allow us to:

- Model uncertain outcomes
- Analyze patterns in data
- Make predictions about future events


In machine learning, we often deal with datasets that can be thought of as realizations of random variables. Understanding random variables is crucial for:

- Feature engineering
- Model selection
- Uncertainty quantification


Can you think of three examples each for discrete and continuous random variables in real life? Write them down!


In the next section, we'll dive deeper into discrete random variables and their properties. Get ready to roll some dice! 🎲

**Table of contents**<a id='toc0_'></a>    
- [Discrete Random Variables](#toc1_)    
  - [Probability Mass Function (PMF)](#toc1_1_)    
- [Continuous Random Variables](#toc2_)    
  - [Probability Density Function (PDF)](#toc2_1_)    
- [Cumulative Distribution Functions (CDF)](#toc3_)    
  - [Examples and Interpretations](#toc3_1_)    
- [Joint, Marginal, and Conditional Probability](#toc4_)    
  - [Joint Probability Distributions](#toc4_1_)    
  - [Marginal Distributions](#toc4_2_)    
  - [Conditional Distributions](#toc4_3_)    
- [Independence and Conditional Independence](#toc5_)    
  - [Independence Between Random Variables](#toc5_1_)    
  - [Conditional Independence](#toc5_2_)    
  - [Importance in Probabilistic Modeling and Machine Learning](#toc5_3_)    
- [Conclusion and Further Resources](#toc6_)    


## <a id='toc1_'></a>[Discrete Random Variables](#toc0_)

A **discrete random variable** is a random variable that can take on a countable number of distinct values. These values are often (but not always) integers.

Key characteristics:
- Countable set of possible values
- Each value has a non-negative probability
- Sum of probabilities over all possible values equals 1

Examples of discrete random variables:
1. **Number of heads in coin tosses** 🪙
   - Possible values: 0, 1, 2, ..., n (where n is the number of tosses)

2. **Number of customers in a queue** 👥
   - Possible values: 0, 1, 2, ...

3. **Number of defective items in a batch** 🏭
   - Possible values: 0, 1, 2, ..., N (where N is the batch size)


### <a id='toc1_1_'></a>[Probability Mass Function (PMF)](#toc0_)


The **Probability Mass Function** (PMF) of a discrete random variable $X$ is a function that gives the probability that $X$ takes on exactly the value $x$:

$p_X(x) = P(X = x)$


Properties of PMF:
1. Non-negativity: $p_X(x) \geq 0$ for all $x$
2. Sum to 1: $\sum_{x} p_X(x) = 1$
3. Probability of an event: $P(X \in A) = \sum_{x \in A} p_X(x)$


Examples of PMFs:

1. **Bernoulli Distribution** (coin flip)

   $p_X(x) = \begin{cases} 
   p & \text{if } x = 1 \\
   1-p & \text{if } x = 0
   \end{cases}$

   Where $p$ is the probability of success (e.g., getting heads).

2. **Binomial Distribution** (number of successes in n trials)

   $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$

   Where:
   - $n$ is the number of trials
   - $k$ is the number of successes
   - $p$ is the probability of success on each trial
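As a quick check of the binomial formula, the following sketch evaluates the PMF with `math.comb` from the standard library. The choice of $n = 10$ and $p = 0.5$ is an assumption made to match the coin-flip example; `binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # 10 fair coin flips (illustrative values)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)              # PMF property: probabilities sum to 1
p_five_heads = pmf[5]         # 252 / 1024
p_at_most_two = sum(pmf[:3])  # P(X in {0, 1, 2}) = (1 + 10 + 45) / 1024

print(total, p_five_heads, p_at_most_two)
```

The last line uses the third PMF property: the probability of an event is the sum of the PMF over the values in that event.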


**Quick Exercise** 💡

Imagine you're rolling a fair six-sided die. 

1. What are the possible values of this discrete random variable?
2. What is the PMF for this random variable?
3. What's the probability of rolling an even number?

(Try to solve these before moving on to the next section!)


---

Next up, we'll explore the continuous cousins of discrete random variables. Get ready to dive into the world of smooth curves and integrals!

## <a id='toc2_'></a>[Continuous Random Variables](#toc0_)

A **continuous random variable** can take on any value within a range of numbers. Unlike discrete random variables, continuous ones can assume an uncountably infinite number of values.

Key characteristics:
- Can take any value within a range
- Probability of any exact value is zero
- Probabilities are calculated over intervals

Examples of continuous random variables:
1. **Time until next bus arrival** ⏱️
   - Range: Any positive real number

2. **Height of individuals** 📏
   - Range: Typically between 0 and 3 meters (for humans)

3. **Temperature at a specific location** 🌡️
   - Range: Any real number (in appropriate units)


### <a id='toc2_1_'></a>[Probability Density Function (PDF)](#toc0_)


The **Probability Density Function** (PDF) for a continuous random variable $X$ is a function $f_X(x)$ that describes the relative likelihood of $X$ taking on a value near $x$. Its values are densities, not probabilities, so $f_X(x)$ can be greater than 1.


Properties of PDF:
1. Non-negativity: $f_X(x) \geq 0$ for all $x$
2. Total area under the curve equals 1: $\int_{-\infty}^{\infty} f_X(x) dx = 1$
3. Probability over an interval: $P(a \leq X \leq b) = \int_{a}^{b} f_X(x) dx$

Note: $P(X = x) = 0$ for any single point $x$


Examples of PDFs:

1. **Uniform Distribution**

   $f_X(x) = \begin{cases} 
   \frac{1}{b-a} & \text{for } a \leq x \leq b \\
   0 & \text{otherwise}
   \end{cases}$

   Where $a$ and $b$ are the lower and upper bounds of the distribution.

2. **Normal (Gaussian) Distribution**

   $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2}$

   Where:
   - $\mu$ is the mean
   - $\sigma$ is the standard deviation
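The PDF properties can be checked numerically for the normal distribution. This is a rough sketch using a midpoint-rule integrator written for the example (the `integrate` helper, the integration limits of $\pm 10$, and the step count are all assumptions); a numerical library would normally be used instead.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Total area: the tails beyond +/-10 standard deviations are negligible.
total_area = integrate(normal_pdf, -10, 10)

# Probability of landing within two standard deviations of the mean.
within_two_sigma = integrate(normal_pdf, -2, 2)

print(round(total_area, 6))        # 1.0
print(round(within_two_sigma, 4))  # 0.9545
```

Probabilities come from areas under the curve, which is why a single point, having zero width, carries zero probability.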


**Quick Exercise** 💡

Consider a uniform distribution between 0 and 1:

1. Sketch the PDF of this distribution.
2. What is the probability that a random value from this distribution is between 0.3 and 0.7?
3. Why is the probability of getting exactly 0.5 equal to zero?


---

In the next section, we'll explore Cumulative Distribution Functions, which provide a unified way to work with both discrete and continuous random variables. Stay tuned!

## <a id='toc3_'></a>[Cumulative Distribution Functions (CDF)](#toc0_)

The **Cumulative Distribution Function** (CDF) of a random variable $X$ is a function that gives the probability that $X$ will take a value less than or equal to $x$. It's defined for both discrete and continuous random variables.


For a random variable $X$, the CDF $F_X(x)$ is defined as:

- $F_X(x) = P(X \leq x)$

- For discrete random variables:
$F_X(x) = \sum_{t \leq x} p_X(t)$

- For continuous random variables:
$F_X(x) = \int_{-\infty}^x f_X(t) dt$


Properties of CDF:
1. **Monotonically non-decreasing**: If $a < b$, then $F_X(a) \leq F_X(b)$
2. **Right-continuous**: $\lim_{x \to a^+} F_X(x) = F_X(a)$
3. **Limits**: $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$
4. **Probability of an interval**: $P(a < X \leq b) = F_X(b) - F_X(a)$


Relationship between CDF and PMF/PDF is given by:
- For discrete random variables:
$p_X(x) = F_X(x) - F_X(x^-)$, where $F_X(x^-) = \lim_{t \to x^-} F_X(t)$ is the left-hand limit of the CDF at $x$ (the size of the jump at $x$)

- For continuous random variables:
$f_X(x) = \frac{d}{dx}F_X(x)$ (when the derivative exists)
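For the continuous case, the relationships above can be sketched with the normal distribution, whose CDF has a closed form in terms of the error function `math.erf`. The evaluation points and the finite-difference step `h` below are arbitrary illustrative choices.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f_X(x) for a normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Interval property: P(a < X <= b) = F(b) - F(a).
a, b = 0.5, 2.0
interval_prob = normal_cdf(b) - normal_cdf(a)

# Derivative property: a central difference of the CDF approximates the PDF.
x, h = 0.5, 1e-5
slope = (normal_cdf(x + h) - normal_cdf(x - h)) / (2 * h)

print(round(interval_prob, 4))
print(abs(slope - normal_pdf(x)) < 1e-6)  # True
```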


### <a id='toc3_1_'></a>[Examples and Interpretations](#toc0_)


1. **Discrete Example: Fair Die Roll**

   CDF for rolling a fair six-sided die:

   $F_X(x) = \begin{cases}
   0 & \text{if } x < 1 \\
   \frac{1}{6} & \text{if } 1 \leq x < 2 \\
   \frac{2}{6} & \text{if } 2 \leq x < 3 \\
   \frac{3}{6} & \text{if } 3 \leq x < 4 \\
   \frac{4}{6} & \text{if } 4 \leq x < 5 \\
   \frac{5}{6} & \text{if } 5 \leq x < 6 \\
   1 & \text{if } x \geq 6
   \end{cases}$

   Interpretation: $F_X(3.5) = \frac{3}{6} = 0.5$ means there's a 50% chance of rolling 3 or less.

2. **Continuous Example: Uniform Distribution on [0, 1]**

   CDF for uniform distribution on [0, 1]:

   $F_X(x) = \begin{cases}
   0 & \text{if } x < 0 \\
   x & \text{if } 0 \leq x < 1 \\
   1 & \text{if } x \geq 1
   \end{cases}$

   Interpretation: $F_X(0.7) = 0.7$ means there's a 70% chance of getting a value less than or equal to 0.7.
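The step-shaped die CDF above can be reproduced by summing the PMF over all values up to $x$. This sketch uses exact fractions; `pmf` and `cdf` are hypothetical names for this example.

```python
from fractions import Fraction

# PMF of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F_X(x): sum of p_X(t) over all possible values t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

print(cdf(3.5))  # 1/2
print(cdf(0))    # 0
print(cdf(6))    # 1

# The jump of the CDF at a possible value recovers the PMF there.
print(cdf(3) - cdf(2.999))  # 1/6
```

Between possible values the CDF stays flat, so `cdf(3)` and `cdf(3.5)` agree.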


**Quick Exercise** 💡

Consider the CDF of a normal distribution with mean $\mu=0$ and standard deviation $\sigma=1$:

1. Sketch this CDF.
2. What does $F_X(0)$ represent?
3. How would you use the CDF to find $P(-1 < X \leq 1)$?


---

Next, we'll explore joint, marginal, and conditional probabilities, which are crucial for understanding relationships between multiple random variables. Get ready to level up your probability skills! 🚀

## <a id='toc4_'></a>[Joint, Marginal, and Conditional Probability](#toc0_)

When dealing with multiple random variables, we need to understand how they interact and relate to each other. This is where joint, marginal, and conditional probabilities come into play.


### <a id='toc4_1_'></a>[Joint Probability Distributions](#toc0_)


A **joint probability distribution** describes the probability that two or more random variables simultaneously take particular values (or fall in particular sets).


For discrete random variables $X$ and $Y$:
- Joint PMF:
    - $p_{X,Y}(x,y) = P(X=x \text{ and } Y=y)$

For continuous random variables $X$ and $Y$:
- Joint PDF:
    - $f_{X,Y}(x,y)$, where $P((X,Y) \in A) = \iint_A f_{X,Y}(x,y) dx dy$


Properties:
1. Non-negative: $p_{X,Y}(x,y) \geq 0$ or $f_{X,Y}(x,y) \geq 0$
2. Sum or integral over all possibilities equals 1


### <a id='toc4_2_'></a>[Marginal Distributions](#toc0_)


**Marginal distributions** are single-variable distributions derived from joint distributions.


- For discrete random variables:
    - $p_X(x) = \sum_y p_{X,Y}(x,y)$

- For continuous random variables:
    - $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) dy$


Interpretation: Marginal distributions "sum out" or "integrate out" the other variables.


### <a id='toc4_3_'></a>[Conditional Distributions](#toc0_)


A **conditional distribution** gives the probability distribution of a random variable, given that another random variable has taken on a specific value.


- For discrete random variables:
$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}$, where $p_Y(y) > 0$

- For continuous random variables:
$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$, where $f_Y(y) > 0$


Key relationship:
- $p_{X,Y}(x,y) = p_{X|Y}(x|y) \cdot p_Y(y)$ (discrete case)
- $f_{X,Y}(x,y) = f_{X|Y}(x|y) \cdot f_Y(y)$ (continuous case)


**Example:**
Let's consider a simple discrete case:
- $X$: Number of cars sold (0, 1, or 2)
- $Y$: Weather (Sunny or Rainy)


Joint probability table:

|       | Y=Sunny | Y=Rainy |
|-------|---------|---------|
| X=0   | 0.1     | 0.2     |
| X=1   | 0.3     | 0.1     |
| X=2   | 0.2     | 0.1     |


1. Joint probability: $P(X=1 \text{ and } Y=\text{Sunny}) = 0.3$
2. Marginal probability: $P(X=1) = 0.3 + 0.1 = 0.4$
3. Conditional probability: $P(X=1|Y=\text{Sunny}) = \frac{0.3}{0.1+0.3+0.2} = 0.5$
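The three calculations above can be reproduced from the table with a few lines of Python. This is a minimal sketch that stores the joint PMF in a dictionary and uses exact fractions; the helper names `marginal_x` and `conditional_x_given_y` are hypothetical.

```python
from fractions import Fraction as F

# Joint PMF from the table: keys are (cars sold, weather).
joint = {
    (0, "Sunny"): F(1, 10), (0, "Rainy"): F(2, 10),
    (1, "Sunny"): F(3, 10), (1, "Rainy"): F(1, 10),
    (2, "Sunny"): F(2, 10), (2, "Rainy"): F(1, 10),
}
assert sum(joint.values()) == 1  # a valid joint PMF sums to 1

def marginal_x(x):
    """p_X(x): sum the joint PMF over all values of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_x_given_y(x, y):
    """p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)."""
    p_y = sum(p for (_, yi), p in joint.items() if yi == y)
    return joint[(x, y)] / p_y

print(joint[(1, "Sunny")])                # 3/10
print(marginal_x(1))                      # 2/5
print(conditional_x_given_y(1, "Sunny"))  # 1/2
```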


**Quick Exercise** 💡

Using the joint probability table above:

1. Calculate the marginal distribution for Y.
2. Find $P(Y=\text{Rainy}|X=0)$.
3. Are X and Y independent? Why or why not?


---

Next, we'll dive into the concepts of independence and conditional independence, which are crucial for understanding relationships between random variables and for many machine learning algorithms. Stay tuned! 🔗

## <a id='toc5_'></a>[Independence and Conditional Independence](#toc0_)

Understanding independence and conditional independence is crucial in probability theory and machine learning. These concepts help us simplify complex problems and make informed assumptions about relationships between variables.


### <a id='toc5_1_'></a>[Independence Between Random Variables](#toc0_)


Two random variables $X$ and $Y$ are **independent** if knowing the value of one does not change the probability distribution of the other.


Mathematically, for discrete random variables:
$P(X=x \text{ and } Y=y) = P(X=x) \cdot P(Y=y)$ for all $x$ and $y$

For continuous random variables:
$f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ for all $x$ and $y$

Equivalent definitions:
- $P(Y|X) = P(Y)$ or $P(X|Y) = P(X)$, whenever the conditioning event has positive probability
- $F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y)$ (for CDFs)

**Example:**
Rolling two fair dice. The outcome of one die doesn't affect the other.
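The two-dice example can be verified by brute-force enumeration of all 36 outcomes. The sketch below checks the product rule for every pair of values; as a contrast it also tests the first die against the sum of both dice, a pairing added here only for illustration. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls
p = Fraction(1, 36)

def prob(event):
    """Probability that a predicate on the outcome holds."""
    return sum(p for o in outcomes if event(o))

def independent(f, g):
    """Check P(f=a, g=b) == P(f=a) * P(g=b) for all value pairs (a, b)."""
    f_vals = {f(o) for o in outcomes}
    g_vals = {g(o) for o in outcomes}
    return all(
        prob(lambda o: f(o) == a and g(o) == b)
        == prob(lambda o: f(o) == a) * prob(lambda o: g(o) == b)
        for a in f_vals
        for b in g_vals
    )

def first(o):
    return o[0]

def second(o):
    return o[1]

def total(o):
    return o[0] + o[1]

print(independent(first, second))  # True
print(independent(first, total))   # False
```

The second check fails because, for instance, a sum of 12 is impossible when the first die shows 1.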


### <a id='toc5_2_'></a>[Conditional Independence](#toc0_)


Random variables $X$ and $Y$ are **conditionally independent** given $Z$ if, once we know $Z$, knowledge of $X$ provides no additional information about $Y$ (and vice versa).


- Mathematically:
    - $P(X=x, Y=y | Z=z) = P(X=x | Z=z) \cdot P(Y=y | Z=z)$ for all $x$, $y$, and $z$

- Equivalent definition:
    - $P(X|Y,Z) = P(X|Z)$ or $P(Y|X,Z) = P(Y|Z)$

**Example:**
Let $X$ be "carrying an umbrella", $Y$ be "wearing sunglasses", and $Z$ be "weather". $X$ and $Y$ might be dependent (e.g., people rarely wear both), but given $Z$, they become conditionally independent.
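The umbrella and sunglasses example can be made concrete with a small hypothetical model. All probabilities below are made-up numbers chosen for illustration; the joint distribution is built so that $X$ and $Y$ are conditionally independent given $Z$, and the code then checks that they are still dependent once $Z$ is marginalized out.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, for illustration only.
p_z = {"rain": F(3, 10), "sun": F(7, 10)}           # P(Z = z)
p_umbrella = {"rain": F(8, 10), "sun": F(1, 10)}    # P(X = 1 | Z = z)
p_sunglasses = {"rain": F(1, 20), "sun": F(7, 10)}  # P(Y = 1 | Z = z)

def bern(p, v):
    """Probability that a Bernoulli(p) variable equals v."""
    return p if v == 1 else 1 - p

# Joint built as P(z) * P(x | z) * P(y | z).
joint = {
    (x, y, z): p_z[z] * bern(p_umbrella[z], x) * bern(p_sunglasses[z], y)
    for x, y, z in product((0, 1), (0, 1), p_z)
}

def prob(pred):
    return sum(p for key, p in joint.items() if pred(*key))

p_x = prob(lambda x, y, z: x == 1)               # 31/100
p_y = prob(lambda x, y, z: y == 1)               # 101/200
p_xy = prob(lambda x, y, z: x == 1 and y == 1)   # 61/1000
marginally_independent = p_xy == p_x * p_y       # False

# For binary variables, checking the (1, 1) cell is enough for each z.
conditionally_independent = all(
    prob(lambda x, y, zz: x == 1 and y == 1 and zz == z) / p_z[z]
    == (prob(lambda x, y, zz: x == 1 and zz == z) / p_z[z])
    * (prob(lambda x, y, zz: y == 1 and zz == z) / p_z[z])
    for z in p_z
)

print(marginally_independent, conditionally_independent)  # False True
```

With these numbers $P(X=1, Y=1)$ is well below $P(X=1) \cdot P(Y=1)$, which matches the intuition that people rarely carry an umbrella and wear sunglasses at the same time.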


### <a id='toc5_3_'></a>[Importance in Probabilistic Modeling and Machine Learning](#toc0_)


1. **Simplification of Joint Distributions:**
   Independence allows us to factorize joint distributions, reducing computational complexity. If $X_1, \dots, X_n$ are mutually independent, then
   $P(X_1, X_2, \dots, X_n) = P(X_1) \cdot P(X_2) \cdots P(X_n)$

2. **Naive Bayes Classifier:**
   Assumes conditional independence of features given the class, simplifying the model.

3. **Bayesian Networks:**
   Graphical models that encode conditional independence relationships among variables.

4. **Dimensionality Reduction:**
   Independent Component Analysis (ICA) seeks to find independent underlying factors.

5. **Feature Selection:**
   Independent features are often preferred to avoid redundancy.

6. **Markov Chain Monte Carlo (MCMC) Methods:**
   Samplers such as Gibbs sampling leverage conditional independence so that each variable can be drawn from a simpler conditional distribution.
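To see why factorization matters (item 1 above), compare how many numbers are needed to specify a joint distribution over $n$ binary variables with and without an independence assumption. This is a minimal sketch; the function names and the three marginal probabilities are hypothetical values picked for the example.

```python
from itertools import product
from math import prod

def free_parameters_full_joint(n):
    """A full joint table over n binary variables has 2**n cells, minus 1
    because the probabilities must sum to 1."""
    return 2**n - 1

def free_parameters_independent(n):
    """Under mutual independence, one number P(X_i = 1) per variable suffices."""
    return n

print(free_parameters_full_joint(20))   # 1048575
print(free_parameters_independent(20))  # 20

# Hypothetical marginals P(X_i = 1) for three independent binary variables.
marginals = [0.2, 0.5, 0.9]

def joint_prob(values):
    """P(X_1 = v_1, ..., X_n = v_n) as a product of marginals."""
    return prod(p if v == 1 else 1 - p for p, v in zip(marginals, values))

# The factorized joint is still a valid distribution: it sums to 1.
total = sum(joint_prob(v) for v in product((0, 1), repeat=len(marginals)))
print(round(total, 10))  # 1.0
```

Naive Bayes applies the same idea conditionally on the class label, which is what keeps its parameter count linear in the number of features.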


**Quick Exercise** 💡

Consider three events:
- A: It's raining
- B: The grass is wet
- C: The sprinkler is on

1. Are B and C independent?
2. Are B and C conditionally independent given A?
3. How might understanding these relationships help in building a probabilistic model for predicting wet grass?


---

Understanding independence and conditional independence is key to building effective probabilistic models. In the next section, we'll wrap up this lecture with a summary and some practical applications. Stay curious! 🧠🔍

## <a id='toc6_'></a>[Conclusion and Further Resources](#toc0_)


Let's recap the key points:

1. We explored discrete and continuous random variables
2. We learned about Probability Mass Functions (PMFs) and Probability Density Functions (PDFs)
3. We studied Cumulative Distribution Functions (CDFs) and their properties
4. We delved into joint, marginal, and conditional probabilities
5. We examined independence and conditional independence


These concepts form the backbone of probability theory and are crucial for understanding more advanced topics in statistics and machine learning.


In the world of data science and machine learning:
- Random variables help us model uncertainty
- Probability distributions are used in various algorithms (e.g., Naive Bayes, Gaussian Processes)
- Understanding independence is crucial for feature selection and model assumptions
- Joint and conditional probabilities are fundamental to Bayesian inference


To deepen your understanding, check out these resources:

1. [Introduction to Probability](https://www.edx.org/course/introduction-to-probability-0) - A comprehensive course on edX
2. [Probabilistic Graphical Models](https://www.coursera.org/specializations/probabilistic-graphical-models) - Coursera specialization by Stanford University
3. [Think Stats](https://greenteapress.com/wp/think-stats-2e/) - Free book on probability and statistics for programmers
4. [Probability & Statistics for Machine Learning & Data Science](https://www.youtube.com/playlist?list=PLblh5JKOoLUK0FLuzwntyYI10UQFUhsY9) - YouTube playlist by StatQuest
5. [Seeing Theory](https://seeing-theory.brown.edu/) - A visual introduction to probability and statistics


Remember, practice is key! Try to apply these concepts to real-world problems and datasets.