# Random Variables and Distributions (Core Concepts for Machine Learning)

This notebook explains the foundational probability concepts used throughout machine learning, with clear intuition and ML-oriented interpretations.

---

## 1. Random Variable

A **random variable (RV)** assigns a number to each outcome of a random process.

In machine learning:
- Data is assumed to be generated by an unknown random process.
- Features and labels are modeled as random variables.

**Examples**
- Height of a person
- Pixel intensity
- Class label (spam / not spam)

Random variables allow us to represent **uncertainty mathematically**.
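To make the "outcome to number" idea concrete, here is a minimal sketch using only the standard library. The outcome names and the 0/1 encoding are assumptions chosen for illustration, not something fixed by the definition.

```python
import random

# Hypothetical outcome space for a spam filter; the 0/1 encoding is an assumption.
encode = {"not spam": 0, "spam": 1}

rng = random.Random(0)  # fixed seed so the draw is repeatable
outcomes = [rng.choice(list(encode)) for _ in range(8)]  # raw outcomes of the random process
x = [encode[o] for o in outcomes]                        # realized values of the random variable

print(outcomes)
print(x)
```

The random process produces the raw outcomes; the random variable is the mapping `encode` that turns them into numbers a model can work with.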

---

## 2. Discrete vs Continuous Random Variables

### Discrete Random Variable

A random variable is **discrete** if it takes **countable values**.

**Examples**
- Class labels: {0, 1}
- Number of clicks
- Word counts in NLP

**Key property**
- All possible values can be enumerated, even if the list is infinite (for example, counts 0, 1, 2, ...).

**In ML**
- Classification outputs
- Topic IDs
- Cluster labels

---

### Continuous Random Variable

A random variable is **continuous** if it can take values from a continuous interval.

**Examples**
- Temperature
- Rainfall amount
- Height, weight, sensor readings

**Key property**
- Probability at a single point is zero.
- Probabilities are defined over intervals.

**In ML**
- Input features
- Regression targets
- Latent variables
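The key property above (zero probability at a single point) can be illustrated by shrinking an interval around a point. This is a sketch that assumes a standard normal variable and uses the standard library's `statistics.NormalDist`; any continuous distribution would behave the same way.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed standard normal, for illustration only

# P(1 - eps <= X <= 1 + eps) shrinks toward 0 as the interval narrows
half_widths = [1.0, 0.1, 0.01, 0.001]
probs = [X.cdf(1.0 + eps) - X.cdf(1.0 - eps) for eps in half_widths]

for eps, p in zip(half_widths, probs):
    print(f"eps={eps:<6} P={p:.6f}")
```

As the half-width `eps` goes to zero, the interval probability goes to zero as well, which is why probabilities for continuous variables are stated over intervals.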

---

## 3. Probability Mass Function (PMF)

The **PMF** is used for **discrete random variables**.

### Definition
$$ P(X = x) = p(x) $$

### Properties
- $0 \le p(x) \le 1$
- $\sum_x p(x) = 1$

### Interpretation
- Gives the probability of each discrete outcome.

### ML Interpretation
- Softmax outputs
- Class prior probabilities
- Label distributions

**Example**
$$ P(Y=0)=0.2,\quad P(Y=1)=0.5,\quad P(Y=2)=0.3 $$

Classification losses such as **cross-entropy** compare the model's predicted PMF with the true label distribution.
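As a small sketch of how a loss uses a PMF, the snippet below checks the two PMF properties for the example values and computes the cross-entropy against a one-hot label. The helper name `cross_entropy` is illustrative, not a library function.

```python
import math

pmf = [0.2, 0.5, 0.3]  # P(Y=0), P(Y=1), P(Y=2) from the example above

# PMF properties: each value in [0, 1], and the values sum to 1
assert all(0.0 <= p <= 1.0 for p in pmf)
assert math.isclose(sum(pmf), 1.0)

def cross_entropy(predicted_pmf, true_class):
    """Cross-entropy against a one-hot label: -log of the probability given to the true class."""
    return -math.log(predicted_pmf[true_class])

for k in range(len(pmf)):
    print(f"true class {k}: loss = {cross_entropy(pmf, k):.4f}")
```

The loss is smallest when the true class is the one the PMF considers most likely (class 1 here, with loss of about 0.69).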

---

## 4. Probability Density Function (PDF)

The **PDF** is used for **continuous random variables**.

### Definition
$$ f(x) \ge 0,\quad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

### Important Note
- $f(x)$ is **not a probability**; it is a density and can be greater than 1.
- Probability is obtained by integration:

$$ P(a \le X \le b) = \int_a^b f(x)\,dx $$

### Why This Matters
$$ P(X=x) = 0 $$
for continuous variables.
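The following sketch illustrates both points numerically. It assumes a narrow Gaussian ($\sigma = 0.1$, chosen only so that the density exceeds 1) and uses a hand-written trapezoidal rule; the helper `integrate` is illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=0.1)  # assumed narrow Gaussian so the density exceeds 1

def integrate(f, a, b, n=10_000):
    """Trapezoidal rule: approximate the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

peak = X.pdf(0.0)                        # density at the mean, larger than 1
area = integrate(X.pdf, -1.0, 1.0)       # total area over +/- 10 sigma, close to 1
p_interval = integrate(X.pdf, 0.0, 0.1)  # P(0 <= X <= 0.1) by integration
p_from_cdf = X.cdf(0.1) - X.cdf(0.0)     # the same probability from the CDF

print(f"peak density = {peak:.4f}")
print(f"total area   = {area:.6f}")
print(f"P(0<=X<=0.1) = {p_interval:.6f} (integration), {p_from_cdf:.6f} (CDF)")
```

The density value at a point can be large, yet the area under the curve is still 1 and interval probabilities stay between 0 and 1.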

### ML Interpretation
- Gaussian Naive Bayes
- Gaussian Mixture Models
- Density-based anomaly detection

The PDF answers:
> “How plausible is this value under the model?”

---

## 5. Cumulative Distribution Function (CDF)

The **CDF** applies to both discrete and continuous random variables.

### Definition
$$ F(x) = P(X \le x)$$

### Properties
- Non-decreasing
- $0 \le F(x) \le 1$
- $\lim_{x\to-\infty} F(x)=0,\quad \lim_{x\to\infty} F(x)=1$

### Relation to PDF
$$ F(x) = \int_{-\infty}^{x} f(t)\,dt $$

For a discrete variable, the integral becomes a sum of the PMF: $F(x) = \sum_{t \le x} p(t)$.

### Interpretation
- Accumulated probability up to a value.

### ML Interpretation
- Threshold-based decisions
- Percentiles and quantiles
- Confidence estimation

**Example**
$$ F(1.0) = 0.84 $$

Meaning: 84% of the probability mass lies at or below 1.0.
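Percentiles and quantiles come from inverting the CDF. The sketch below assumes the example refers to a standard normal variable (an assumption; the text does not name the distribution) and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed standard normal

p = X.cdf(1.0)           # F(1.0): probability mass at or below 1.0
x_back = X.inv_cdf(p)    # quantile function (inverse CDF) recovers 1.0
median = X.inv_cdf(0.5)  # 50th percentile
p95 = X.inv_cdf(0.95)    # 95th percentile

print(f"F(1.0) = {p:.4f}")
print(f"inverse CDF of F(1.0) = {x_back:.4f}")
print(f"median = {median:.4f}, 95th percentile = {p95:.4f}")
```

The CDF maps a value to a probability, and the quantile function maps a probability back to a value, so the two calls undo each other.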

---

## 6. How These Concepts Fit Together in ML

### Classification
- Labels → Discrete random variable
- Model output → PMF
- Loss → compares true label with predicted PMF

### Regression / Feature Modeling
- Features → Continuous random variables
- Model assumption → PDF
- Loss → negative log-likelihood
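One common instance of the PDF-to-loss link is a Gaussian noise assumption, where the negative log-likelihood reduces to squared error plus a constant. The sketch below assumes a fixed noise level `sigma = 1`; the targets and predictions are made-up numbers.

```python
import math

def gaussian_nll(y, y_pred, sigma=1.0):
    """Negative log of the Gaussian density N(y | y_pred, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - y_pred) ** 2 / (2 * sigma**2)

y_true = [1.0, 2.0, 3.0]  # hypothetical regression targets
y_pred = [1.1, 1.8, 3.5]  # hypothetical model predictions

nll = sum(gaussian_nll(y, p) for y, p in zip(y_true, y_pred))
sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
const = len(y_true) * 0.5 * math.log(2 * math.pi)  # does not depend on the predictions

print(f"NLL             = {nll:.6f}")
print(f"0.5*SSE + const = {0.5 * sse + const:.6f}")
```

Because the constant does not depend on the predictions, minimizing this negative log-likelihood is the same as minimizing the sum of squared errors.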

### Decision Making
- Scores → Random variables
- Thresholds → CDF-based probabilities
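As a sketch of a CDF-based threshold, suppose the scores of negative examples follow a standard normal distribution (an assumption made only for this illustration). The probability that a negative example exceeds a threshold is then one minus the CDF at that threshold, and a target rate can be met by inverting the CDF.

```python
from statistics import NormalDist

# Assumption: scores for the negative class follow N(0, 1)
neg_scores = NormalDist(mu=0.0, sigma=1.0)

def false_positive_rate(threshold):
    """P(score > threshold) for a negative example, i.e. 1 - F(threshold)."""
    return 1.0 - neg_scores.cdf(threshold)

target_fpr = 0.05                                 # hypothetical tolerance
threshold = neg_scores.inv_cdf(1.0 - target_fpr)  # invert the CDF to hit the target

print(f"threshold = {threshold:.4f}")
print(f"false positive rate at threshold = {false_positive_rate(threshold):.4f}")
```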

---

## Summary

- **Random Variable**: models uncertainty
- **Discrete RV + PMF**: probability of exact outcomes
- **Continuous RV + PDF**: probability density (integrate over an interval to get a probability)
- **CDF**: accumulated probability

These concepts form the mathematical backbone of probabilistic machine learning models.


---

### 1. Discrete random variable (PMF)

Think of this as class probabilities in classification.

```python
import numpy as np

# Discrete random variable: class labels
classes = np.array([0, 1, 2])    # e.g., three classes
pmf = np.array([0.2, 0.5, 0.3])  # P(Y = class)

# Sample labels according to the PMF (unseeded, so each run differs)
samples = np.random.choice(classes, size=20, p=pmf)

print("Sample class labels:", samples)
print("PMF:")
for c, p in zip(classes, pmf):
    print(f"P(Y={c})={p}")
```

```
Sample class labels: [1 2 1 1 2 2 2 2 2 1 1 1 1 1 0 0 1 1 1 1]
PMF:
P(Y=0)=0.2
P(Y=1)=0.5
P(Y=2)=0.3
```


#### ML link:
* This is exactly what a softmax output represents.
* Cross-entropy loss assumes a categorical PMF.
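To show how a softmax layer produces a categorical PMF, here is a minimal standard-library sketch. The logits are hypothetical, chosen as the logs of the PMF above so that the output reproduces it.

```python
import math

def softmax(logits):
    """Map real-valued scores to a categorical PMF (non-negative, sums to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: logs of the PMF used above
logits = [math.log(0.2), math.log(0.5), math.log(0.3)]
probs = softmax(logits)

# Adding the same constant to every logit leaves the PMF unchanged
shifted = softmax([z + 10.0 for z in logits])

print([round(p, 4) for p in probs])
print([round(p, 4) for p in shifted])
```

Only differences between logits matter, which is why subtracting the maximum before exponentiating is safe.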

### 2. Continuous random variable (PDF)

This mirrors feature distributions assumed in many models (Naive Bayes, GMMs).

```python
import numpy as np
from scipy.stats import norm

# Continuous random variable: a feature
mu = 0.0
sigma = 1.0

x = np.linspace(-4, 4, 200)   # grid of feature values
pdf = norm.pdf(x, mu, sigma)  # density evaluated on the grid

print("PDF at x=0:", norm.pdf(0, mu, sigma))
```

```
PDF at x=0: 0.3989422804014327
```


#### ML link:
* Gaussian Naive Bayes models each feature, within each class, using a Gaussian PDF.
* Density estimation underlies many anomaly-detection methods: points in low-density regions are flagged as anomalies.

### 3. Cumulative Distribution Function (CDF)

CDFs are used for probability thresholds, quantiles, and decision rules.

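```python
# Reuses x, mu, sigma, and norm from the previous cell
cdf = norm.cdf(x, mu, sigma)  # CDF evaluated on the grid
print("P(X<= 1):", norm.cdf(1, mu, sigma))
```

```
P(X<= 1): 0.8413447460685429
```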


#### ML link:
* Used in probabilistic decision-making.
* Helps convert raw scores into probabilities or confidence levels.

### 4. Discrete vs Continuous together (real ML-style example)

Binary classification with a continuous feature.

```python
# Binary class (discrete)
y = np.array([0, 1])

# Feature distributions conditioned on class
mu_0, sigma_0 = -1, 1
mu_1, sigma_1 = 1, 1

# New data point
x_new = 0.5

# Likelihoods (PDFs)
p_x_given_0 = norm.pdf(x_new, mu_0, sigma_0)
p_x_given_1 = norm.pdf(x_new, mu_1, sigma_1)

# Class priors (PMF)
p_0, p_1 = 0.6, 0.4

# Posterior (Bayes rule, unnormalized)
posterior_0 = p_x_given_0 * p_0
posterior_1 = p_x_given_1 * p_1

print("Posterior scores:")
print("Class 0:", posterior_0)
print("Class 1:", posterior_1)
print("Predicted class:", 0 if posterior_0 > posterior_1 else 1)
```

```
Posterior scores:
Class 0: 0.07771055739953504
Class 1: 0.1408261307057198
Predicted class: 1
```


#### ML link:
* This is Gaussian Naive Bayes in its simplest form.
* Shows how PMF + PDF combine to make predictions.
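The scores above are unnormalized, so they do not sum to 1. Dividing by their sum (the evidence, $p(x)$) turns them into a proper posterior PMF over the classes. The sketch below repeats the same numbers using the standard library's `statistics.NormalDist` so that it runs on its own; the dictionary layout is just one way to organize it.

```python
from statistics import NormalDist

x_new = 0.5
priors = {0: 0.6, 1: 0.4}  # class priors (PMF)
likelihoods = {
    0: NormalDist(mu=-1.0, sigma=1.0).pdf(x_new),  # p(x | class 0)
    1: NormalDist(mu=1.0, sigma=1.0).pdf(x_new),   # p(x | class 1)
}

joint = {c: likelihoods[c] * priors[c] for c in priors}  # unnormalized posterior scores
evidence = sum(joint.values())                           # p(x)
posterior = {c: joint[c] / evidence for c in joint}      # normalized posterior PMF

for c, p in posterior.items():
    print(f"P(Y={c} | x={x_new}) = {p:.4f}")
```

Normalization does not change the predicted class, but it gives probabilities (roughly 0.36 and 0.64 here) that can be interpreted and thresholded.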

### Key intuition for ML
* Discrete RV + PMF → labels, class probabilities, softmax
* Continuous RV + PDF → features, likelihoods, density estimation
* CDF → thresholds, confidence, decision boundaries