# Probability & Statistics for Machine Learning

## **Introduction**

Probability and statistics are the backbone of machine learning. They help us analyze data, make predictions, and measure uncertainty. This document introduces the core concepts from scratch in a structured and easy-to-understand way.

---

## **1️⃣ Basics of Statistics**

### **1.1 What is Statistics?**

Statistics is the science of collecting, organizing, analyzing, and interpreting data.

🔹 **Example:** If we collect data about students' exam scores, statistics helps us understand their performance trends.

### **1.2 Types of Statistics**

1️⃣ **Descriptive Statistics** – Summarizes and describes data.
2️⃣ **Inferential Statistics** – Draws conclusions or makes predictions about a population based on a sample of data.

---

## **2️⃣ Descriptive Statistics**

### **2.1 Measures of Central Tendency**

Used to find the center of a dataset.

✔ **Mean (Average)**: Sum of all values divided by the number of values.

- **Example**: Scores = [80, 85, 90], Mean = (80+85+90)/3 = 85

✔ **Median**: Middle value when the data is arranged in order. With an even number of values, it is the average of the two middle values.

- **Example**: Scores = [10, 20, 30, 40, 50], Median = 30

✔ **Mode**: Most frequently occurring value.

- **Example**: Scores = [10, 20, 20, 30], Mode = 20
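
The three measures above can be checked with Python's built-in `statistics` module. This is a minimal sketch that reuses the example scores from this section; the even-length list `[10, 20, 30, 40]` is an extra illustration that does not appear in the examples above.

```python
import statistics

# Mean: sum of the values divided by how many there are
mean_score = statistics.mean([80, 85, 90])

# Median: middle value of the sorted data
median_odd = statistics.median([10, 20, 30, 40, 50])
# With an even count, the median is the average of the two middle values
median_even = statistics.median([10, 20, 30, 40])

# Mode: most frequently occurring value
mode_score = statistics.mode([10, 20, 20, 30])

print(mean_score, median_odd, median_even, mode_score)
```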

---

### **2.2 Measures of Spread (Variability)**

Used to understand how widely the data is spread around its center.

✔ **Range**: Difference between the highest and lowest value.

- **Example**: Scores = [10, 50], Range = 50 - 10 = 40

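✔ **Variance (σ²)**: Average squared difference from the mean. Dividing by the number of values n gives the population variance; when estimating from a sample, we divide by n − 1 instead.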

✔ **Standard Deviation (σ)**: Square root of variance, showing how much values deviate from the mean.

- **Example**: If σ is high, values are spread out; if low, values are close to the mean.
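
As a rough illustration of these measures, the sketch below computes them with the `statistics` module. The scores are hypothetical. Note that `pvariance`/`pstdev` divide by n (population), while `variance`/`stdev` divide by n − 1 (sample).

```python
import statistics

scores = [70, 80, 90]  # hypothetical exam scores

# Range: highest value minus lowest value
value_range = max(scores) - min(scores)

# Population variance and standard deviation (divide by n)
pop_variance = statistics.pvariance(scores)
pop_std = statistics.pstdev(scores)

# Sample variance and standard deviation (divide by n - 1)
sample_variance = statistics.variance(scores)
sample_std = statistics.stdev(scores)

print(value_range, pop_variance, pop_std, sample_variance, sample_std)
```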

---

## **3️⃣ Probability Basics**

### **3.1 What is Probability?**

Probability measures how likely an event is to happen.

✔ **Formula** (when all outcomes are equally likely):
$P(\text{Event}) = \frac{\text{Favorable Outcomes}}{\text{Total Outcomes}}$

✔ **Example**: Rolling a die 🎲

- Probability of rolling a **3** = 1/6 (3 is one of 6 equally likely faces)
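
The counting formula can be sketched in a few lines of Python. The helper name `probability` is made up for this illustration, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Favorable outcomes / total outcomes, assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]

p_three = probability({3}, die)       # rolling a 3
p_even = probability({2, 4, 6}, die)  # rolling an even number

print(p_three, p_even)
```

Using `Fraction` keeps the result exact instead of a rounded decimal.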

### **3.2 Types of Probability**

1️⃣ **Theoretical Probability** – Based on mathematical reasoning.
2️⃣ **Empirical Probability** – Based on experiments or past data.
3️⃣ **Subjective Probability** – Based on intuition or experience.
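
To see how theoretical and empirical probability relate, the sketch below simulates die rolls and compares the observed frequency of a 3 with the theoretical value 1/6. The function name, the number of rolls, and the seed are arbitrary choices for illustration.

```python
import random

def empirical_probability(target, n_rolls, seed=0):
    """Estimate P(die shows target) from n_rolls simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls

theoretical = 1 / 6
estimate = empirical_probability(target=3, n_rolls=60_000)

print(f"theoretical={theoretical:.4f}, empirical={estimate:.4f}")
```

With more rolls, the empirical estimate tends to move closer to the theoretical value.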

---

## **4️⃣ Probability Distributions**

### **4.1 What is a Probability Distribution?**

A probability distribution describes how probabilities are assigned to different possible outcomes of a random experiment.

👉 **Example:** Tossing a coin

- Probability of Heads (H) = 0.5
- Probability of Tails (T) = 0.5

### **4.2 Role of Probability Distributions in ML**

Probability distributions play a key role when we analyze a dataset in ML. We check the distribution of the dataset to understand its type, which helps in deciding the appropriate model and preprocessing steps.

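- **Example:** If data roughly follows a normal distribution, standardization (subtracting the mean and dividing by the standard deviation) is a natural choice for feature scaling.
- **Example:** If data is highly right-skewed and positive, a log transformation can reduce the skew and bring the data closer to a normal shape.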
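
Both preprocessing steps can be sketched with the standard library alone. The data and the helper names `standardize` and `skewness` are hypothetical; skewness is computed here as the mean of the cubed z-scores, one common definition.

```python
import math
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    return statistics.mean([z ** 3 for z in standardize(values)])

# Hypothetical right-skewed feature (each value doubles the previous one)
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(v) for v in raw]

z_scores = standardize(raw)
print("skew before log:", round(skewness(raw), 3))
print("skew after log: ", round(skewness(logged), 3))
```

A skewness near 0 indicates a roughly symmetric distribution, so the drop after the log transform is the effect described above.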

### **4.3 Discrete Probability Distributions**

✔ **Binomial Distribution** – Models the number of successes in a fixed number of trials.

- **Example**: Probability of getting **2 heads** in **3 coin flips**.
- **ML Use Case**: Models binary outcomes in classification problems (e.g., Spam vs. Not Spam), where each prediction is a success/failure trial.
- **Probability Formula:**
  $$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
  - **n**: Total number of trials
  - **k**: Number of successes
  - **p**: Probability of success in a single trial
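
The binomial formula translates directly into code. This sketch uses `math.comb` for the binomial coefficient and applies it to the coin-flip example above; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 3 fair coin flips: 3 * 0.25 * 0.5
p_two_heads = binomial_pmf(k=2, n=3, p=0.5)

# The probabilities over all possible k should add up to 1
total = sum(binomial_pmf(k, 3, 0.5) for k in range(4))

print(p_two_heads, total)
```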

✔ **Poisson Distribution** – Models the number of rare events in a fixed interval of time or space.

- **Example**: Number of customers arriving at a shop per hour.
- **ML Use Case**: Used to model count data in anomaly detection (e.g., flagging an unusually high number of transactions in an interval as possible fraud).
- **Probability Formula:**
  $$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$$
  - **λ (lambda)**: Average number of occurrences in the interval
  - **k**: Exact number of occurrences we want to find the probability for
  - **e**: Euler's number (~2.718)
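
A minimal sketch of the Poisson formula, assuming a hypothetical shop that averages 4 customers per hour (the rate is made up for illustration):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k): probability of exactly k events when the average is lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 4  # hypothetical average: 4 customers per hour

p_exactly_two = poisson_pmf(2, lam)
# Probability of at most 2 customers in an hour
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))

print(round(p_exactly_two, 4), round(p_at_most_two, 4))
```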

### **4.4 Continuous Probability Distributions**

✔ **Normal Distribution (Bell Curve)** – A symmetric, bell-shaped distribution that many natural phenomena approximately follow.

- **Example**: Heights of students in a school.
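- **ML Use Case**: Many methods assume normality; for example, classical linear regression assumes normally distributed errors (residuals) when computing confidence intervals and p-values.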
- **Probability Density Function:**
  $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
  - **μ (mu)**: Mean of the distribution
  - **σ (sigma)**: Standard deviation
  - **π (pi)**: Mathematical constant (~3.1416)
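
The density function can be written out by hand and compared against `statistics.NormalDist` from the standard library (Python 3.8+). The mean of 170 cm and standard deviation of 5 cm are hypothetical values for student heights.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

mu, sigma = 170, 5  # hypothetical heights in cm

manual = normal_pdf(175, mu, sigma)
builtin = NormalDist(mu, sigma).pdf(175)

# Share of students expected within one standard deviation of the mean
heights = NormalDist(mu, sigma)
within_one_sigma = heights.cdf(mu + sigma) - heights.cdf(mu - sigma)

print(manual, builtin, round(within_one_sigma, 4))
```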

✔ **Exponential Distribution** – Models the time between events in a Poisson process.

- **Example**: Time between incoming calls at a call center.
- **ML Use Case**: Used in survival analysis and failure prediction.
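
For an event rate λ, the exponential density is λe^(−λt) and the probability that the wait is at most t is 1 − e^(−λt). The sketch below assumes a hypothetical call center that receives 2 calls per minute on average.

```python
from math import exp

def exponential_pdf(t, rate):
    """Density of the waiting time t between events at the given rate."""
    return rate * exp(-rate * t) if t >= 0 else 0.0

def exponential_cdf(t, rate):
    """P(T <= t): probability the next event arrives within time t."""
    return 1 - exp(-rate * t) if t >= 0 else 0.0

rate = 2  # hypothetical: 2 calls per minute on average

# Probability the next call arrives within 30 seconds (0.5 minutes)
p_within_half_minute = exponential_cdf(0.5, rate)
mean_wait = 1 / rate  # average time between calls, in minutes

print(round(p_within_half_minute, 4), mean_wait)
```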

### **4.5 Use Cases of Probability Distributions in ML**

| Distribution | Used In | ML Application |
|-------------|--------|---------------|
| Binomial | Classification | Spam vs. Not Spam |
| Poisson | Anomaly Detection | Fraud Detection |
| Normal | Feature Scaling | Regression Models |
| Exponential | Survival Analysis | Predicting Failure Rates |


## **5️⃣ Inferential Statistics**

### **5.1 Hypothesis Testing in Machine Learning**

Hypothesis testing is used in ML to validate assumptions and compare models.

✔ **Example**: Checking if a new model performs better than an existing one.

- **Null Hypothesis (H₀)**: No significant difference exists.
- **Alternative Hypothesis (H₁)**: A significant difference exists.

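✔ **p-value**: The probability of observing a result at least as extreme as the one measured, assuming H₀ is true. By convention, if **p < 0.05**, we reject H₀.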
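
To make the p-value concrete, here is a sketch of an exact permutation test, one simple way to compare two models without distributional assumptions. It is an illustrative choice, not the only valid test, and the accuracy scores are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    indices = range(len(pooled))
    extreme = total = 0
    # Try every way of relabeling the pooled scores into two groups
    for chosen in combinations(indices, len(a)):
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical accuracy over 5 validation folds for each model
old_model = [0.80, 0.82, 0.79, 0.81, 0.80]
new_model = [0.86, 0.88, 0.85, 0.87, 0.86]

p_value = permutation_test(old_model, new_model)
print("reject H0" if p_value < 0.05 else "fail to reject H0", p_value)
```

The p-value here is the fraction of relabelings that produce a difference at least as large as the observed one, which matches the definition of "at least as extreme under H₀".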

### **5.2 Types of Hypothesis Tests in ML**

1️⃣ **T-Test** – Compares means of two groups (used in A/B testing for ML models).
2️⃣ **Chi-Square Test** – Tests independence between categorical variables.
3️⃣ **ANOVA (Analysis of Variance)** – Compares means of more than two groups.
4️⃣ **Kolmogorov-Smirnov Test** – Checks if a dataset follows a specific distribution.
5️⃣ **Mann-Whitney U Test** – Non-parametric test for comparing two distributions.
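
As an illustration of the first test in this list, the sketch below computes the statistic for Welch's t-test, a variant of the t-test that does not assume equal variances. It stops at the statistic and degrees of freedom: turning them into a p-value needs the t-distribution CDF, which the standard library does not provide. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical accuracy scores for two model variants in an A/B test
variant_a = [0.80, 0.82, 0.79, 0.81, 0.80]
variant_b = [0.86, 0.88, 0.85, 0.87, 0.86]

t_stat, dof = welch_t(variant_a, variant_b)
print(round(t_stat, 2), round(dof, 2))
```

A large absolute t value relative to the degrees of freedom suggests the difference in means is unlikely under H₀.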

### **5.3 Confidence Intervals**

A confidence interval gives a range of plausible values for an unknown quantity, such as a population mean.

✔ **Example**: If we estimate the average height as **170 cm ± 5 cm** at the 95% confidence level, we are **95% confident** that the true average height is between **165 cm and 175 cm**.
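
A confidence interval for a mean can be sketched as mean ± z × (standard deviation / √n). This version uses the normal approximation (z ≈ 1.96 for 95%); for small samples a t-distribution critical value would give a slightly wider interval. The heights are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    center = mean(sample)
    margin = z * stdev(sample) / sqrt(len(sample))
    return center - margin, center + margin

# Hypothetical sample of heights in cm
heights = [162, 168, 171, 175, 169, 173, 166, 170, 174, 172]

low, high = mean_confidence_interval(heights)
print(f"95% CI for the average height: {low:.1f} cm to {high:.1f} cm")
```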

---


