## Probability for AI / ML

### 1. What is Probability (ML Perspective)

Probability = measure of uncertainty

**In ML:**
- Classification → probability of class
- Regression → uncertainty in prediction
- Bayesian models → probability is the core

`Basic Definition`

$P(A) = \frac{\text{Favorable outcomes}}{\text{Total outcomes}}$ (valid when all outcomes are equally likely)

`Range:`

$0 \leq P(A) \leq 1$

### 2. Types of Events (Important for Questions)
- **Independent**
  - Meaning: One event doesn’t affect another
  - ML Relevance: Naive Bayes assumes features are conditionally independent given the class

- **Dependent**
  - Meaning: One event affects another
  - ML Relevance: Conditional probability

- **Mutually Exclusive**
  - Meaning: Events cannot occur together
  - ML Relevance: Class labels in single-label classification

- **Exhaustive**
  - Meaning: Covers all possible outcomes
  - ML Relevance: Probabilities over a mutually exclusive and exhaustive set of outcomes sum to 1

### 3. Complementary Rule (Very Important)

$P(A^c) = 1 - P(A)$

**Used when:**
- The probability of the opposite event is easier to compute (e.g. "at least one" problems)

**Example (Interview-style):**
- If $P(\text{spam}) = 0.2$
- Then $P(\text{not spam}) = 1 - 0.2 = 0.8$
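
The complement rule pays off most in "at least one" questions. A minimal sketch, reusing the spam probability of 0.2 from the example and assuming (an added assumption, not from the notes) that emails are independent; `prob_at_least_one` is an illustrative helper name:

```python
def prob_at_least_one(p_single: float, n: int) -> float:
    """P(at least one success in n independent trials), via the complement rule."""
    p_none = (1.0 - p_single) ** n  # P(no success in any of the n trials)
    return 1.0 - p_none


p_spam = 0.2
p_not_spam = 1.0 - p_spam  # complement of "spam"

# Probability that at least one of 3 independent emails is spam: 1 - 0.8^3
p_any_spam = prob_at_least_one(p_spam, 3)
print(round(p_not_spam, 3), round(p_any_spam, 3))
```

Computing "no spam at all" and subtracting from 1 avoids summing the probabilities of exactly one, two, and three spam emails.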

### 4. Addition Rule
`Case 1: Mutually Exclusive`

$P(A \cup B) = P(A) + P(B)$

`Case 2: Not Mutually Exclusive`

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

**Used in:**
- Multi-class probabilities
- Overlapping feature events
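
Both cases of the addition rule can be checked by enumerating a small sample space. A sketch assuming a fair six-sided die (an illustrative example, not from the notes), with exact arithmetic via `fractions.Fraction`:

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # fair six-sided die (assumed example)


def prob(event: set) -> Fraction:
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}
greater_than_3 = {4, 5, 6}
one = {1}

# Case 2: events overlap on {4, 6}, so the intersection is subtracted once
union_overlapping = prob(even | greater_than_3)
by_rule_overlapping = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

# Case 1: {1} and "even" share no outcome, so the probabilities simply add
union_exclusive = prob(one | even)
by_rule_exclusive = prob(one) + prob(even)

print(union_overlapping, by_rule_overlapping)  # 2/3 2/3
print(union_exclusive, by_rule_exclusive)      # 2/3 2/3
```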

### 5. Multiplication Rule
`Independent Events`

$P(A \cap B) = P(A) \times P(B)$

`Dependent Events`

$P(A \cap B) = P(A) \times P(B|A)$

> Naive Bayes assumes the features are conditionally independent given the class.
> That’s why it multiplies the per-feature probabilities.
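
The difference between the two forms shows up when drawing cards. A sketch assuming a standard 52-card deck with 4 aces (an illustrative example, not from the notes): drawing with replacement keeps the events independent, drawing without replacement makes the second draw depend on the first.

```python
from fractions import Fraction

p_first_ace = Fraction(4, 52)

# Independent events: the card is put back, so the second draw is unaffected
p_two_aces_with_replacement = p_first_ace * p_first_ace  # P(A) * P(B)

# Dependent events: one ace is gone, so P(B|A) = 3/51
p_second_ace_given_first = Fraction(3, 51)
p_two_aces_without_replacement = p_first_ace * p_second_ace_given_first  # P(A) * P(B|A)

print(p_two_aces_with_replacement)     # 1/169
print(p_two_aces_without_replacement)  # 1/221
```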

### 6. Conditional Probability (Core for ML)

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

$P(B|A) = \frac{P(A \cap B)}{P(A)}$

**Meaning:**
Probability of A given that B has occurred (defined only when $P(B) > 0$)

**ML Example**
- Probability of disease given symptoms
- Probability of class given features
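
In practice, conditional probabilities are often estimated from counts. A minimal sketch on made-up patient records (the data and the helper name `conditional_probability` are hypothetical, for illustration only):

```python
# Hypothetical records: (has_symptom, has_disease)
records = (
    [(True, True)] * 30
    + [(True, False)] * 20
    + [(False, True)] * 10
    + [(False, False)] * 140
)


def conditional_probability(rows, event_a, event_b) -> float:
    """Estimate P(A|B) = P(A and B) / P(B) from counts."""
    n_b = sum(1 for r in rows if event_b(r))
    if n_b == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    n_a_and_b = sum(1 for r in rows if event_a(r) and event_b(r))
    return n_a_and_b / n_b


def has_symptom(r):
    return r[0]


def has_disease(r):
    return r[1]


p_disease_given_symptom = conditional_probability(records, has_disease, has_symptom)
p_symptom_given_disease = conditional_probability(records, has_symptom, has_disease)
print(p_disease_given_symptom)  # 0.6  (30 of the 50 patients with the symptom)
print(p_symptom_given_disease)  # 0.75 (30 of the 40 patients with the disease)
```

The two results differ, which illustrates that $P(A|B)$ and $P(B|A)$ are not interchangeable.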

### 7. Law of Total Probability

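$P(A) = \sum_{i=1}^{n} P(A|B_i)P(B_i)$

where $B_1, \dots, B_n$ are mutually exclusive and exhaustive (they partition the sample space).

**Used when:**
- Data is split into groups / classes
- Computing the evidence $P(B)$ in the denominator of Bayes’ Theorem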
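
The sum is a weighted average of the per-group conditional probabilities. A sketch with made-up numbers for three classes (the values and the function name `total_probability` are assumptions for illustration):

```python
def total_probability(conditionals, priors) -> float:
    """P(A) = sum_i P(A|B_i) * P(B_i), where the B_i partition the sample space."""
    if len(conditionals) != len(priors):
        raise ValueError("need one conditional probability per group")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("group probabilities must sum to 1 (exhaustive partition)")
    return sum(c * p for c, p in zip(conditionals, priors))


class_priors = [0.5, 0.3, 0.2]     # P(B_i), hypothetical
p_a_given_class = [0.1, 0.4, 0.8]  # P(A|B_i), hypothetical

p_a = total_probability(p_a_given_class, class_priors)
print(round(p_a, 2))  # 0.33 = 0.05 + 0.12 + 0.16
```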

### 8. Bayes’ Theorem (MOST IMPORTANT)

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

**ML Meaning**
- **P(A) — Prior**
  - Meaning: Initial probability before observing data
- **P(B | A) — Likelihood**
  - Meaning: Probability of data given the hypothesis
- **P(A | B) — Posterior**
  - Meaning: Updated probability after observing data
- **P(B) — Evidence**
  - Meaning: Total probability of observing the data

**Used in:**
- Naive Bayes
- Spam filtering
- Medical diagnosis
- Recommendation systems
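
A worked medical-diagnosis example makes the four terms concrete. A minimal sketch with assumed numbers (1% prevalence, 95% sensitivity, 5% false-positive rate; none of these come from the notes), where the evidence is expanded with the law of total probability:

```python
def bayes_posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(A|B) for hypothesis A and observation B.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    # Evidence P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return likelihood * prior / evidence


# A = patient has the disease, B = test is positive (hypothetical numbers)
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with an accurate test, the posterior stays low because the prior is small: most positive results come from the much larger healthy group.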

### 9. Random Variables

A random variable maps outcomes → numbers.

**Types**
- Discrete → countable values, finite or countably infinite (Binomial, Poisson)
- Continuous → any value in an interval, uncountably many (Normal)

> ML models work with random variables, not raw events.

### 10. Mean, Variance & Standard Deviation
`Mean (Expectation)`

$E(X) = \mu = \sum_x x \cdot P(x)$ for discrete $X$; for continuous $X$ with density $f$, $E(X) = \int x \, f(x) \, dx$

**Used in:**
- Central tendency
- Feature scaling

`Variance`

$Var(X) = \sigma^2 = E[(X - \mu)^2]$

**Measures:**
- Spread of values around the mean

`Standard Deviation`

$\sigma = \sqrt{Var(X)}$

**Used in:**
- Normalization (z-score: $z = \frac{x - \mu}{\sigma}$)
- Outlier detection (e.g. flagging points several $\sigma$ from the mean)
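
The three formulas can be applied directly to a probability mass function. A sketch assuming a fair six-sided die as the discrete random variable (an illustrative choice), using `Fraction` so the mean and variance come out exact:

```python
import math
from fractions import Fraction


def expectation(pmf):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - mu)^2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def std_dev(pmf) -> float:
    """sigma = sqrt(Var(X))."""
    return math.sqrt(variance(pmf))


die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die (assumed example)

print(expectation(die))        # 7/2
print(variance(die))           # 35/12
print(round(std_dev(die), 3))  # 1.708
```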

### 11. Probability Distributions (VERY IMPORTANT)

#### 1. Binomial Distribution

**Used when:**
- Yes / No outcomes
- Fixed number of trials $n$
- Trials are independent with the same success probability $p$

$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$

**ML usage:**
- Bernoulli Naive Bayes (Bernoulli is the Binomial with $n = 1$)
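
The PMF translates directly into code with `math.comb`. A minimal sketch (the function name `binomial_pmf` is an illustrative choice):

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 2 heads in 4 fair coin flips: C(4, 2) * 0.5^4
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The probabilities over k = 0..n sum to 1 (up to floating-point rounding)
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```

With `n = 1` the same function gives the Bernoulli distribution: `binomial_pmf(1, 1, p)` is `p` and `binomial_pmf(0, 1, p)` is `1 - p`.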

#### 2. Uniform Distribution

- All values equally likely

**Used in:**
- Random initialization
- Sampling

#### 3. Normal Distribution (MOST USED)

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

**Used in:**
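- Linear Regression (errors are often assumed to be Gaussian)
- Logistic Regression: no normality assumption on the features or errors; a Gaussian prior on the weights corresponds to L2 regularization
- Noise modeling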
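
The density formula can be written out with the standard library only. A sketch (the name `normal_pdf` and its defaults are illustrative choices); note that the result is a density, not a probability, so it can exceed 1 for small $\sigma$:

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the Normal(mu, sigma^2) distribution at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardized distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


peak = normal_pdf(0.0)  # highest point of the standard normal, 1 / sqrt(2*pi)
print(round(peak, 4))   # 0.3989

# The density is symmetric around the mean
left, right = normal_pdf(-1.0), normal_pdf(1.0)
```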

### 12. Central Limit Theorem (CLT)

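For i.i.d. samples from any distribution with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size $n$ grows.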

**Why ML loves it:**
- Motivates Gaussian assumptions (e.g. noise that is a sum of many small independent effects)
- Enables statistical inference
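
A quick simulation shows the effect. A sketch that draws from an exponential distribution (a strongly skewed choice, assumed here for illustration; its mean and variance are both 1 at rate 1) and looks at the averages of many samples of size `n`:

```python
import random
import statistics


def sample_means(n: int, num_samples: int, seed: int = 0) -> list:
    """Return the means of num_samples samples, each of n Exponential(1) draws."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


means = sample_means(n=50, num_samples=2000)

# CLT prediction: the means cluster around 1 with spread close to 1 / sqrt(50)
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2), round(1 / 50**0.5, 2))
```

Plotting a histogram of `means` should give a roughly bell-shaped curve even though the individual draws are heavily right-skewed.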