# `What is Probability?`

In simplest terms, probability is a measure of the likelihood that a particular event will occur. It is a fundamental concept in statistics and is used to make predictions and informed decisions in a wide range of disciplines, including science, engineering, medicine, economics, and social sciences.

Probability is usually expressed as a number between 0 and 1, inclusive:

*   A probability of 0 means that an event will not happen.
*   A probability of 1 means that an event will certainly happen.
*   A probability of 0.5 means that an event is as likely to happen as not to happen; over many repeated trials, it occurs about half the time.

> # `Probability Terminology`

<details>
<summary>Click to expand</summary>

* **Experiment** → A process with uncertain results (e.g., rolling dice).
* **Trial** → One performance of an experiment.
* **Outcome** → A single possible result (e.g., rolling a 4).
* **Sample Space (S)** → Set of all outcomes. Example: $S = \{1,2,3,4,5,6\}$.
* **Event (E)** → Subset of the sample space (e.g., rolling an even number).
* **Random Experiment** → An experiment that can be repeated under the same conditions but whose outcome cannot be predicted in advance.
* **Random Variable (RV)** → A variable whose value is determined by the outcome of a random experiment.

  * **Discrete RV** → Countable values (e.g., number of heads).
  * **Continuous RV** → Uncountably many values in an interval (e.g., height, time).
* **Probability (P)** → Likelihood of an event happening.
* **Mutually Exclusive Events** → Events that cannot happen together.
* **Independent Events** → Events that don’t affect each other’s probability.
* **Joint Probability** → Probability of two events together ($P(A \cap B)$).
* **Marginal Probability** → Probability of one event, ignoring others.
* **Conditional Probability** → Probability of event A given B happened:

  $$
  P(A|B) = \frac{P(A \cap B)}{P(B)}
  $$
* **Theoretical Probability** → Based on logic or formulas.
* **Empirical Probability** → Based on experimental data.
* **Bayes’ Theorem** → Updates probability when new information is given.
* **Law of Large Numbers** → With more trials, empirical probability ≈ theoretical probability.
* **Expected Value (Mean)** → Long-term average of a random variable.
* **Variance** → Spread of random variable around its mean.
* **Standard Deviation** → Square root of variance; measures spread in the same units as the random variable.

</details>

> # `Types of Events in Probability`

<details>
<summary>Click to expand</summary>

### 1. **Simple Event**

* An event with only **one outcome**.
* Example: Rolling a die → event “getting 5” = {5}.

---

### 2. **Compound Event**

* An event with **more than one outcome**.
* Example: “Getting an even number” on a die = {2, 4, 6}.

---

### 3. **Sure (Certain) Event**

* An event that **always happens**.
* Example: When tossing a coin, “getting heads or tails”.

---

### 4. **Impossible Event**

* An event that **can never happen**.
* Example: Rolling a 7 on a standard six-sided die.

---

### 5. **Mutually Exclusive Events**

* Events that **cannot occur together**.
* Example: Rolling a die → “getting 2” and “getting 5”.

---

### 6. **Exhaustive Events**

* A set of events that **covers the whole sample space**.
* Example: On a die → {1}, {2}, {3}, {4}, {5}, {6}.

---

### 7. **Independent Events**

* Occurrence of one event **does not affect** the other.
* Example: Tossing 2 coins → result of coin 1 does not affect coin 2.

---

### 8. **Dependent Events**

* Occurrence of one event **affects** the probability of the other.
* Example: Drawing 2 cards from a deck **without replacement**.

---

### 9. **Complementary Events**

* Two events that are **opposites**: they cannot happen together, and together they make up the whole sample space.
* Example: “Getting a head” and “Not getting a head”.
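
The set-based event types above can be checked directly with Python sets. This is a minimal sketch assuming a single roll of a six-sided die; the variable names are illustrative. Independence and dependence are statements about probabilities rather than about set membership, so they are not shown here.

```python
# Sample space for one roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}

simple_event = {5}              # exactly one outcome
compound_event = {2, 4, 6}      # more than one outcome
sure_event = S                  # every outcome: always happens
impossible_event = {7} & S      # 7 is not in S, so this is the empty set

get_2, get_5 = {2}, {5}
singletons = [{k} for k in S]

is_mutually_exclusive = get_2.isdisjoint(get_5)   # no shared outcome
is_exhaustive = set().union(*singletons) == S     # together they cover S
complement = S - compound_event                   # "not even" = {1, 3, 5}

print(len(simple_event), len(compound_event), impossible_event, complement)
```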

---

**Memory Tip:**
Think of events as “scenarios.” The type depends on:

* How many outcomes it has (simple/compound).
* Whether it can happen (sure/impossible).
* How events relate (mutually exclusive/independent/dependent).

---
</details>

> # `Empirical Probability vs Theoretical Probability`

<details>
<summary>Click to expand</summary>

###  **1. Theoretical Probability**

* Based on **mathematical reasoning** (logic, formulas, equally likely outcomes).
* No actual experiment needed.
* Formula:

  $$
  P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}
  $$
* **Example:** Toss a coin → probability of getting heads = 1/2 = 0.5.
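
The theoretical formula can be sketched in a few lines of Python, assuming every outcome in the sample space is equally likely (the formula does not apply otherwise). `theoretical_probability` is a hypothetical helper name, and `Fraction` keeps the result exact.

```python
from fractions import Fraction

def theoretical_probability(event, sample_space):
    """P(E) = favorable / total; valid only when all outcomes are equally likely."""
    sample_space = set(sample_space)
    favorable = set(event) & sample_space
    return Fraction(len(favorable), len(sample_space))

coin = {"H", "T"}
print(theoretical_probability({"H"}, coin))  # 1/2

die = range(1, 7)
print(theoretical_probability({6}, die))     # 1/6
```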

---

###  **2. Empirical Probability**

* Based on **actual experiments or observations**.
* Uses collected data to estimate probability.
* Formula:

  $$
  P(E) = \frac{\text{Number of times event occurred}}{\text{Total number of trials}}
  $$
* **Example:** Toss a coin 100 times → heads appeared 47 times.
  Empirical probability of heads = 47/100 = 0.47.
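
As a rough sketch, the empirical formula is a count over recorded trials. The helper name `empirical_probability` is illustrative, and the list below simply rebuilds the 47-heads-in-100 example (the order of tosses does not affect the result).

```python
def empirical_probability(observations, event):
    """Fraction of recorded trials whose outcome falls in `event`."""
    if not observations:
        raise ValueError("need at least one trial")
    hits = sum(1 for outcome in observations if outcome in event)
    return hits / len(observations)

# The 100 tosses from the example: 47 heads and 53 tails
tosses = ["H"] * 47 + ["T"] * 53
print(empirical_probability(tosses, {"H"}))  # 0.47
```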

---

###  Key Differences

| Feature                 | **Theoretical Probability**              | **Empirical Probability**                     |
| ----------------------- | ---------------------------------------- | --------------------------------------------- |
| **Basis**               | Pure logic, formulas                     | Actual experiments/data                       |
| **Accuracy**            | Exact (ideal case)                       | Approximate, improves with more trials        |
| **Need for Experiment** | Not required                             | Required                                      |
| **Example**             | Probability of rolling a 6 on a die = 1/6 | Rolling a die 60 times, 12 sixes → 12/60 = 0.2 |

---

**Connection:**

* The **Law of Large Numbers** says that as the number of trials increases, **empirical probability approaches theoretical probability**.
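
The Law of Large Numbers can be watched in a short simulation. This sketch assumes a fair die simulated with Python's `random` module, using an arbitrary fixed seed so runs are reproducible. The error against 1/6 tends to shrink as `n` grows, though not necessarily at every checkpoint.

```python
import random

def running_estimates(n_trials, checkpoints, seed=0):
    """Simulate fair-die rolls and record the empirical P(six) at each checkpoint."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    sixes = 0
    estimates = {}
    for i in range(1, n_trials + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        if i in checkpoints:
            estimates[i] = sixes / i
    return estimates

theoretical = 1 / 6
for n, est in running_estimates(100_000, {10, 100, 1_000, 10_000, 100_000}).items():
    print(f"n={n:>6}: empirical={est:.4f}  |error|={abs(est - theoretical):.4f}")
```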

---
</details>

> # `Random Variable`

<details>
<summary>Click to expand</summary>

### **Definition**

A **Random Variable (RV)** is a variable that takes **numerical values** based on the outcome of a **random experiment**.

Formally, it is a function $X: S \to \mathbb{R}$ that maps each outcome in the sample space (like a coin-toss result) to a number.
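
To make the "function from outcomes to numbers" idea concrete, here is a minimal sketch assuming three fair coin tosses. `X` counts heads in an outcome, and pushing the uniform probability on outcomes through `X` yields its PMF. The names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 2**3 = 8 equally likely sequences of three fair coin tosses
sample_space = list(product("HT", repeat=3))

def X(outcome):
    """The random variable: maps an outcome such as ('H', 'T', 'H') to its number of heads."""
    return outcome.count("H")

# Push the uniform probability on outcomes through X to get X's PMF
counts = Counter(X(o) for o in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}
print(pmf)
```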

---

###  **Types of Random Variables**

1. **Discrete Random Variable (DRV)**

   * Takes **countable values** (finite or countably infinite).
   * Examples:

     * Number of heads in 3 coin tosses → {0, 1, 2, 3}.
     * Rolling a die → {1, 2, 3, 4, 5, 6}.

2. **Continuous Random Variable (CRV)**

   * Takes **uncountably infinite values** (any real number in an interval).
   * Examples:

     * Height of a student → [140 cm, 200 cm].
     * Time to run a marathon → [2.0 h, 6.5 h].

---

### **Probability Distribution of a Random Variable**

* For **discrete RVs**, we use **PMF (Probability Mass Function)**.
* For **continuous RVs**, we use **PDF (Probability Density Function)**.
* The distribution tells us how probabilities are assigned to values.

---
### **Mean (Expected Value) of a Random Variable**

Represents the **long-run average** value.

* Discrete:

  $$
  E[X] = \sum x_i \cdot P(x_i)
  $$
* Continuous:

  $$
  E[X] = \int x \cdot f(x) \, dx
  $$

---

### **Variance of a Random Variable**

Measures **spread** of values around the mean $\mu = E[X]$.

$$
Var(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2
$$

---

### **Example (Coin Toss)**

* Toss a fair coin. Let $X$ = number assigned to outcome:

  * Head = 1, Tail = 0.
* Then:

  * $P(X=1) = 0.5$, $P(X=0) = 0.5$.
  * $E[X] = (0)(0.5) + (1)(0.5) = 0.5$.
  * $Var(X) = E[X^2] - (E[X])^2 = (0^2)(0.5)+(1^2)(0.5) - (0.5)^2 = 0.25$.
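
The coin-toss calculation can be sketched with a PMF stored as a dictionary `{value: probability}`. This representation and the helper names `expectation` and `variance` are assumptions for illustration; the variance uses the shortcut $E[X^2] - (E[X])^2$.

```python
def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x) over the support of a discrete PMF."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: x ** 2) - mean ** 2

coin = {0: 0.5, 1: 0.5}  # Tail = 0, Head = 1
print(expectation(coin), variance(coin))  # 0.5 0.25
```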

---

**Why Random Variables matter in ML?**

* Features (X) and target (Y) are modeled as random variables.
* Distributions of RVs → guide us in choosing models (e.g., Gaussian noise assumptions in linear regression, Bernoulli-distributed labels in logistic regression).

---

</details>

> # `Probability Distribution of a Random Variable`

<details>
<summary>Click to expand</summary>

### **Definition**

A **probability distribution** describes how probabilities are distributed across the values of a random variable.
It answers: *“What values can the random variable take, and with what likelihood?”*

---

## **1. Discrete Probability Distribution**

* For **Discrete Random Variables (DRV)**.
* Described using **Probability Mass Function (PMF)**.
* Properties:

  1. $P(X = x_i) \geq 0$
  2. $\sum P(X = x_i) = 1$
* Example: Tossing a fair coin →

  * $P(X=0) = 0.5$, $P(X=1) = 0.5$.
* Common distributions:

  * **Bernoulli, Binomial, Poisson, Geometric**.

---

## **2. Continuous Probability Distribution**

* For **Continuous Random Variables (CRV)**.
* Described using **Probability Density Function (PDF)**.
* Properties:

  1. $f(x) \geq 0$ for all $x$.
  2. $\int_{-\infty}^{\infty} f(x) dx = 1$.
* Probability for a range:

  $$
  P(a \leq X \leq b) = \int_a^b f(x) dx
  $$
* Example: Heights of students are often modeled as approximately Normal.
* Common distributions:

  * **Normal, Exponential, Uniform, Log-normal**.
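
The range formula $P(a \leq X \leq b) = \int_a^b f(x)\,dx$ can be checked numerically. This sketch assumes an Exponential distribution with a rate of 2.0, chosen only for illustration, and uses a simple midpoint-rule integrator rather than any particular library. It compares the numeric integral with the closed form $F(b) - F(a)$ and also checks that the total mass is close to 1.

```python
import math

def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

LAM = 2.0  # assumed rate parameter, chosen only for illustration

def pdf(x):
    """Density of an Exponential(LAM) random variable (zero for x < 0)."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

a, b = 0.5, 1.5
numeric = integrate(pdf, a, b)
exact = math.exp(-LAM * a) - math.exp(-LAM * b)  # F(b) - F(a) with F(x) = 1 - exp(-LAM x)
total = integrate(pdf, 0.0, 20.0)                # should be close to 1 (property 2)
print(f"P({a} <= X <= {b}): numeric={numeric:.6f}, exact={exact:.6f}; total mass={total:.6f}")
```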

---

## **3. Cumulative Distribution Function (CDF)**

* Works for **both discrete & continuous** variables.
* Definition:

  $$
  F(x) = P(X \leq x)
  $$
* Example: Fair die → $F(3) = P(X \leq 3) = \frac{3}{6} = 0.5$.

---

## **Example (Dice Roll)**

Let $X$ = outcome of rolling a fair die.

* **PMF:**

  $$
  P(X = k) = \frac{1}{6}, \quad k = 1,2,3,4,5,6
  $$

* **CDF:**

  $$
  F(3) = P(X \leq 3) = \frac{3}{6} = 0.5
  $$
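
A minimal sketch of the die's PMF and CDF, assuming a fair six-sided die and exact `Fraction` arithmetic. The CDF adds up the PMF over all support points at or below `x`, so it is defined for every real `x` (a step function), not only for 1 to 6.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die: P(X = k) = 1/6

# PMF properties: non-negative and sums to 1
assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def cdf(x):
    """F(x) = P(X <= x): add up the PMF over all support points <= x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

print(cdf(3), cdf(0), cdf(6), cdf(3.5))  # 1/2 0 1 1/2
```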

---

## **Why it matters in ML?**

* Many ML models assume data comes from specific distributions.
* Example:

  * **Logistic regression** assumes Bernoulli distribution for labels.
  * **Naive Bayes** uses conditional probability distributions.
  * **Gaussian assumptions** underlie Linear Discriminant Analysis (LDA).

---

**Summary:**

* PMF → Discrete RV.
* PDF → Continuous RV.
* CDF → Both, cumulative probabilities.

---
</details>

> # `Mean (Expected Value) of a Random Variable`

<details>
<summary>Click to expand</summary>

###  **Definition**

The **mean of a random variable** is the **long-run average value** of the random variable after many repetitions of the experiment.

* Think of it as the **center of gravity** of the probability distribution.
* Notation: $E[X]$ or $\mu$.

---

## **1. Discrete Random Variable (DRV)**

If $X$ is a discrete random variable with possible values $x_1, x_2, …, x_n$ and probabilities $P(X = x_i)$:

$$
E[X] = \sum_{i=1}^{n} x_i \cdot P(X = x_i)
$$

**Example:** Toss a fair coin, let $X=1$ if Head, $X=0$ if Tail.

* $P(X=1)=0.5, \, P(X=0)=0.5$.

$$
E[X] = (0)(0.5) + (1)(0.5) = 0.5
$$

---

## **2. Continuous Random Variable (CRV)**

If $X$ is continuous with **Probability Density Function (PDF)** $f(x)$:

$$
E[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx
$$

**Example:** Suppose $X \sim U(0,1)$ (Uniform distribution from 0 to 1).

* PDF: $f(x)=1$ for $0 \leq x \leq 1$.

$$
E[X] = \int_0^1 x \cdot 1 \, dx = \frac{1}{2}
$$
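
As a numerical sanity check of $E[X] = 1/2$ for $X \sim U(0,1)$, this sketch approximates the integral with a midpoint sum and, separately, averages many simulated draws (a Monte Carlo estimate). The seed and sample sizes are arbitrary choices for illustration.

```python
import random

def f(x):
    """PDF of U(0, 1): 1 on [0, 1], 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

# Midpoint-rule approximation of E[X] = integral of x * f(x) dx over [0, 1]
n = 1_000
h = 1.0 / n
integral_estimate = h * sum((i + 0.5) * h * f((i + 0.5) * h) for i in range(n))

# Monte Carlo alternative: the sample mean of many U(0, 1) draws also approximates E[X]
rng = random.Random(42)
draws = 100_000
mc_estimate = sum(rng.random() for _ in range(draws)) / draws

print(integral_estimate, mc_estimate)
```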

---

## **Interpretation in Machine Learning**

* **Expected value = prediction baseline**.
* Example:

  * In regression, the mean of the target variable $Y$ is the natural baseline predictor when no features are used (it minimizes the expected squared error).
  * In Gaussian Naive Bayes, the class-conditional mean of each feature is one of the parameters of its likelihood.

---

**Summary (Short Notes):**

* Discrete: $E[X] = \sum x_i P(x_i)$.
* Continuous: $E[X] = \int x f(x) dx$.
* Represents **average outcome** over many trials.

---
</details>

> # `Variance of a Random Variable`

<details>
<summary>Click to expand</summary>

###  **Definition**

The **variance** measures how far the values of a random variable spread around the mean: it is the **expected squared deviation** from the mean.

* Symbol: $Var(X)$ or $\sigma^2$.
* Formula idea: *average of squared deviations from the mean*.

---

## **1. Discrete Random Variable (DRV)**

If $X$ takes values $x_1, x_2, …, x_n$ with probabilities $P(X = x_i)$:

$$
Var(X) = \sum_{i=1}^{n} (x_i - \mu)^2 \cdot P(X = x_i)
$$

Alternative formula (very useful!):

$$
Var(X) = E[X^2] - (E[X])^2
$$
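
The two discrete formulas should always agree. A quick exact check with a fair six-sided die, used here only as an illustrative example, shows both giving 35/12.

```python
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair six-sided die

mu = sum(x * p for x, p in die.items())                           # E[X] = 7/2
var_definition = sum((x - mu) ** 2 * p for x, p in die.items())   # sum (x - mu)^2 P(x)
var_shortcut = sum(x ** 2 * p for x, p in die.items()) - mu ** 2  # E[X^2] - (E[X])^2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```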

---

## **2. Continuous Random Variable (CRV)**

If $X$ has PDF $f(x)$:

$$
Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x) \, dx
$$

or equivalently,

$$
Var(X) = E[X^2] - (E[X])^2
$$

---

## **3. Example (Coin Toss, Bernoulli)**

Let $X \sim \text{Bernoulli}(p)$ (Head = 1, Tail = 0).

* $E[X] = p$.
* $Var(X) = p(1-p)$.

If fair coin (p=0.5):

$$
Var(X) = 0.5 \cdot (1 - 0.5) = 0.25
$$

---

## **Interpretation in Machine Learning**

* Variance shows **uncertainty & spread** in data.
* Example:

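  * Features with very different variances (scales) often need standardization, especially for distance-based or gradient-based models.
  * Bias-variance tradeoff (here "variance" means the model's sensitivity to the training data): high-variance models (like deep trees) tend to overfit, while high-bias, low-variance models (like linear regression on nonlinear data) may underfit.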

---

**Summary (short notes)**

* $Var(X) = E[(X - \mu)^2]$.
* Shortcut: $Var(X) = E[X^2] - (E[X])^2$.
* Small variance → values close to mean.
* Large variance → values spread out.

---
</details>

> # `Venn Diagrams in Probability`

<details>
<summary>Click to expand</summary>

###  **Definition**

A **Venn diagram** is a visual tool (using circles inside a rectangle) to represent **sample space** and the relationships between events.

* Rectangle = **Sample space (S)** (all possible outcomes).
* Circle(s) = **Events** (subsets of the sample space).
* Overlap = **Intersection of events**.

---

## **1. Basic Structures**

* **Single Event (A)**: One circle inside the sample space.
* **Two Events (A, B):**

  * **Union (A ∪ B):** Area covered by A or B or both.
  * **Intersection (A ∩ B):** Overlap (A *and* B).
  * **Complement (A′):** Area outside A.
  * **Difference (A − B):** Area in A but not in B.

---

## **2. Types of Relationships**

* **Mutually Exclusive Events:**
  $A ∩ B = ∅$ (no overlap).
  Example: Tossing a die → A: even number, B: odd number.
* **Independent Events:**
  $P(A ∩ B) = P(A) \cdot P(B)$.
  (They don’t influence each other).
* **Exhaustive Events:**
  Union of events = entire sample space.

---

## **3. Example: Tossing a Die**

Let:

* A = event {even numbers} = {2, 4, 6}.

* B = event {prime numbers} = {2, 3, 5}.

* **Union:** $A ∪ B = \{2,3,4,5,6\}$.

* **Intersection:** $A ∩ B = \{2\}$.

* **Complement:** $A' = \{1,3,5\}$.

These relationships are neatly shown in a Venn diagram with two overlapping circles.

---

## **4. Use in Probability**

* Helps compute probabilities like:

  $$
  P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  $$
* Visualizing **conditional probability** and **Bayes’ theorem**.
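
The die example's Venn regions map directly onto Python set operators. This sketch assumes equally likely outcomes, so `P` is a simple count ratio (an illustrative helper). It also checks the inclusion-exclusion formula: the overlap is counted twice in $P(A) + P(B)$, so it is subtracted once.

```python
from fractions import Fraction

S = set(range(1, 7))  # rolling a fair die
A = {2, 4, 6}         # even numbers
B = {2, 3, 5}         # prime numbers

def P(event):
    return Fraction(len(event), len(S))  # equally likely outcomes

# union, intersection, complement A', difference A - B
print(A | B, A & B, S - A, A - B)

# Inclusion-exclusion: subtract the double-counted overlap once
assert P(A | B) == P(A) + P(B) - P(A & B)
```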

---

**Summary (Notebook Notes):**

* Venn diagram = visual representation of **events & probabilities**.
* Useful for: Union, Intersection, Complement, Conditional probability.
* Special event relations: Mutually Exclusive, Independent, Exhaustive.

---

</details>

> # `Contingency Tables in Probability`

<details>
<summary>Click to expand</summary>

###  **Definition**

A **contingency table** (also called a cross-tabulation or cross-tab) is a table that displays the **joint frequency distribution** of two (or more) categorical variables.

* Rows = categories of one variable.
* Columns = categories of another variable.
* Cells = counts or probabilities of joint outcomes.

It’s a structured way to compute **joint, marginal, and conditional probabilities**.

---

## **1. Structure of a 2×2 Contingency Table**

|               | Event B (Yes) | Event B (No) | Row Total    |
| ------------- | ------------- | ------------ | ------------ |
| Event A (Yes) | a             | b            | a + b        |
| Event A (No)  | c             | d            | c + d        |
| Column Total  | a + c         | b + d        | N (=a+b+c+d) |

* $a, b, c, d$ = frequencies (or probabilities if normalized).
* $N$ = total number of trials.

---

## **2. Types of Probabilities from the Table**

* **Joint Probability:**
  Probability of A and B happening together.

  $$
  P(A \cap B) = \frac{a}{N}
  $$

* **Marginal Probability:**
  Probability of a single event, regardless of the other.

  $$
  P(A) = \frac{a+b}{N}, \quad P(B) = \frac{a+c}{N}
  $$

* **Conditional Probability:**
  Probability of one event given the other.

  $$
  P(A|B) = \frac{a}{a+c}, \quad P(B|A) = \frac{a}{a+b}
  $$

---

## **3. Example: Medical Test**

Suppose a medical test is done on 100 people:

* 20 people have the disease, and test positive (True Positive).
* 5 people don’t have the disease but test positive (False Positive).
* 10 people have the disease but test negative (False Negative).
* 65 people don’t have the disease and test negative (True Negative).

|            | Test Positive | Test Negative | Total |
| ---------- | ------------- | ------------- | ----- |
| Disease    | 20            | 10            | 30    |
| No Disease | 5             | 65            | 70    |
| Total      | 25            | 75            | 100   |

* Joint: $P(\text{Disease} \cap \text{Positive}) = 20/100 = 0.20$.
* Marginal: $P(\text{Disease}) = 30/100 = 0.30$.
* Conditional: $P(\text{Positive} \mid \text{Disease}) = 20/30 \approx 0.67$.
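
The three probabilities can be read off the table programmatically. This sketch stores the medical-test counts in a dictionary keyed by (row, column); "healthy" is just an illustrative label for the "No Disease" row, and the helper names are hypothetical.

```python
# Counts from the medical-test table: (disease status, test result) -> count
table = {
    ("disease", "pos"): 20, ("disease", "neg"): 10,
    ("healthy", "pos"): 5,  ("healthy", "neg"): 65,
}
N = sum(table.values())

def joint(row, col):
    return table[(row, col)] / N

def marginal_row(row):
    return sum(c for (r, _), c in table.items() if r == row) / N

def marginal_col(col):
    return sum(c for (_, k), c in table.items() if k == col) / N

def conditional_col_given_row(col, row):
    """P(col | row) = P(row and col) / P(row)."""
    return joint(row, col) / marginal_row(row)

print(joint("disease", "pos"))                                # 0.2
print(marginal_row("disease"))                                # 0.3
print(round(conditional_col_given_row("pos", "disease"), 2))  # 0.67
```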

---

## **4. Use in Probability & ML**

* Classifier evaluation (like **confusion matrices** in ML).
* Chi-square test for independence.
* Computing sensitivity, specificity, precision, recall.
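
Reading the same medical-test table as a confusion matrix, the usual classifier metrics are conditional probabilities computed from its cells. This is a minimal sketch using the example's counts.

```python
# Cells of the medical-test table, read as a confusion matrix
tp, fn = 20, 10  # disease:    test positive / test negative
fp, tn = 5, 65   # no disease: test positive / test negative

sensitivity = tp / (tp + fn)  # recall = P(positive | disease)
specificity = tn / (tn + fp)  # P(negative | no disease)
precision = tp / (tp + fp)    # P(disease | positive)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} precision={precision:.3f}")
```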

---

**Summary (Notebook Notes):**

* Contingency table organizes **joint, marginal, conditional probabilities**.
* Key tool in hypothesis testing, independence checks, and ML model evaluation.
* Structure is very similar to a **confusion matrix** in classification.

---
</details>

> # `Joint Probability`

<details>
<summary>Click to expand</summary>

### What is Joint Probability?

* Joint probability measures the chance that **two (or more) events happen at the same time**.
* Notation:

  $$
  P(A \cap B) \quad \text{or sometimes} \quad P(A,B)
  $$

  means "the probability that both event A and event B occur."

---

### Formula (for two events A and B)

$$
P(A \cap B) = \frac{\text{Number of outcomes in both A and B}}{\text{Total number of outcomes in sample space}}
$$

If A and B are **independent**, then:

$$
P(A \cap B) = P(A) \cdot P(B)
$$

---

### Example

Suppose we toss a **fair die** 🎲.

* Event A = getting an even number = {2, 4, 6} → $P(A) = 3/6 = 1/2$
* Event B = getting a number > 3 = {4, 5, 6} → $P(B) = 3/6 = 1/2$
* Both A and B = {4, 6} → 2 outcomes → $P(A \cap B) = 2/6 = 1/3$

Notice: $P(A) \cdot P(B) = (1/2)(1/2) = 1/4 \neq 1/3$.
So here A and B are **not independent**.
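
The same check can be done with sets, assuming a fair die so that each outcome has probability 1/6. The names `P`, `A`, and `B` are illustrative.

```python
from fractions import Fraction

S = set(range(1, 7))
A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3

def P(event):
    return Fraction(len(event), len(S))

joint = P(A & B)                   # 1/3
product_of_marginals = P(A) * P(B)  # 1/4
print(joint, product_of_marginals, joint == product_of_marginals)  # 1/3 1/4 False
```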

---

### Joint Probability in a Contingency Table

If we survey **100 students**:

* 40 like tea 
* 50 like coffee 
* 20 like both tea and coffee

Then:

$$
P(\text{Tea ∩ Coffee}) = \frac{20}{100} = 0.20
$$

---

</details>

> # `Marginal Probability`

<details>
<summary>Click to expand</summary>

##  What is Marginal Probability?

* **Marginal probability** is the probability of a **single event happening**, regardless of the outcome of other events.
* It’s called “marginal” because in a contingency table, you find it by summing along the **margins** (rows/columns).

---

### Formula

If you have two events **A** and **B**:

$$
P(A) = P(A \cap B) + P(A \cap B')
$$

$$
P(B) = P(A \cap B) + P(A' \cap B)
$$

More generally, for random variables $X$ and $Y$: $P(X = x) = \sum_{y} P(X = x, Y = y)$.

Basically, add up the joint probabilities across the other variable.

---

### Example 1 (Simple)

Roll a fair die.

* Event A = getting an even number.

$$
P(A) = \frac{3}{6} = \frac{1}{2}
$$

Here we didn’t care about any other event — just the marginal probability of A.

---

###  Example 2 (Contingency Table)

Survey of **100 students**:

|           | Coffee  | No Coffee    | Total |
| --------- | -------- | ------------ | ----- |
| Tea      | 20       | 20           | 40    |
| No Tea   | 30       | 30           | 60    |
| **Total** | 50       | 50           | 100   |

* Joint probability: $P(\text{Tea ∩ Coffee}) = 20/100 = 0.20$
* **Marginal probability of Tea**:

$$
P(\text{Tea}) = 40/100 = 0.40
$$

* **Marginal probability of Coffee**:

$$
P(\text{Coffee}) = 50/100 = 0.50
$$
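
Summing along the margins is easy to express in code. This sketch stores the tea/coffee counts as a nested list (rows = tea yes/no, columns = coffee yes/no), converts them to joint probabilities, and then sums rows and columns.

```python
# Tea/coffee survey table: rows = tea (yes/no), columns = coffee (yes/no)
counts = [
    [20, 20],  # tea:    coffee, no coffee
    [30, 30],  # no tea: coffee, no coffee
]
N = sum(map(sum, counts))

joint = [[c / N for c in row] for row in counts]  # P(Tea = i, Coffee = j)
p_tea = [sum(row) for row in joint]               # sum across columns -> row margin
p_coffee = [sum(col) for col in zip(*joint)]      # sum down rows -> column margin

print(p_tea, p_coffee)
```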

---

So:

* **Joint probability** → chance of A *and* B.
* **Marginal probability** → chance of just A (or just B), ignoring the other event.

---
</details>

> # `Conditional Probability`

<details>
<summary>Click to expand</summary>

##  What is Conditional Probability?

* It’s the probability of one event happening **given that another event has already happened**.
* Notation:

$$
P(A \mid B)
$$

(read: probability of A given B)

---

###  Formula

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)} , \quad P(B) > 0
$$

Meaning: we zoom into the world where B has happened, then ask: *what fraction of those cases also satisfy A?*

---

###  Example 1 (Cards ♥️♣️♦️♠️)

Pick a card from a deck.

* Event A: card is a king.
* Event B: card is a heart.
* $P(A \cap B) = 1/52$ (only king of hearts).
* $P(B) = 13/52 = 1/4$.

$$
P(A \mid B) = \frac{1/52}{1/4} = \frac{1}{13}
$$

So if we already know the card is a heart, chance it’s a king is 1/13.
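
The card example can be verified by enumerating a standard 52-card deck, assuming every card is equally likely. Deck construction and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards

def P(predicate):
    """Classical probability of the event described by `predicate` on one draw."""
    return Fraction(sum(1 for card in deck if predicate(card)), len(deck))

def is_king(card):
    return card[0] == "K"

def is_heart(card):
    return card[1] == "hearts"

def is_king_of_hearts(card):
    return is_king(card) and is_heart(card)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_king_given_heart = P(is_king_of_hearts) / P(is_heart)
print(p_king_given_heart)  # 1/13
```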

---

### Example 2 (Contingency Table — Students & Coffee)

From before:

|           | Coffee  | No Coffee  | Total |
| --------- | -------- | ------------ | ----- |
| Tea      | 20       | 20           | 40    |
| No Tea  | 30       | 30           | 60    |
| **Total** | 50       | 50           | 100   |

* Joint probability: $P(\text{Tea ∩ Coffee}) = 20/100 = 0.20$
* Marginal probability: $P(\text{Coffee}) = 50/100 = 0.50$
* **Conditional probability**:

$$
P(\text{Tea} \mid \text{Coffee}) = \frac{P(\text{Tea ∩ Coffee})}{P(\text{Coffee})}
= \frac{0.20}{0.50} = 0.40
$$

So: among coffee drinkers, 40% also drink tea.

---

**Key relationship**:

$$
P(A \cap B) = P(A \mid B) \cdot P(B)
$$

---

</details>

> # `Independent Events`

<details>
<summary>Click to expand</summary>

##  What are Independent Events?

Two events **A** and **B** are **independent** if the occurrence of one does **not affect** the probability of the other.

Mathematically:

$$
P(A \mid B) = P(A)
$$

$$
P(B \mid A) = P(B)
$$

This means knowing that **B** happened gives no extra information about **A**.

---

### Multiplication Rule for Independent Events

If **A** and **B** are independent:

$$
P(A \cap B) = P(A) \cdot P(B)
$$

---

### Example 1 (Coin Toss + Die Roll)

* Event A: Toss a coin → Heads (P = 0.5)
* Event B: Roll a die → 6 (P = 1/6)

Are they independent? Yes: the coin toss doesn’t affect the die roll.

So:

$$
P(A \cap B) = P(A) \cdot P(B) = 0.5 \times \frac{1}{6} = \frac{1}{12}
$$

---

### Example 2 (Cards)

Pick 1 card from a deck.

* Event A: card is a spade. (P = 13/52 = 1/4)
* Event B: card is a king. (P = 4/52 = 1/13)

Is A independent of B?

* Only 1 card is both spade and king → P(A ∩ B) = 1/52.
* P(A)·P(B) = (1/4)·(1/13) = 1/52 = P(A ∩ B), so A and B are independent.

---

###  Non-Independent Example

Pick **2 cards without replacement**.

* Event A: First card is a heart.
* Event B: Second card is a heart.

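Here the probabilities **change** after the first draw. If the first card is a heart, only 12 hearts remain among 51 cards, so $P(B \mid A) = 12/51$, whereas $P(B) = 13/52 = 1/4$. Since $P(B \mid A) \neq P(B)$, A and B are **not independent**.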
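
Enumerating every ordered pair of distinct cards, under the assumption that all 52 × 51 draws without replacement are equally likely, confirms both numbers. Ranks are encoded as 0 to 12 because only the suit matters here.

```python
from fractions import Fraction
from itertools import permutations, product

SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(range(13), SUITS))  # ranks encoded 0..12; only the suit matters here

# Every ordered pair of two different cards: 52 * 51 equally likely draws without replacement
draws = list(permutations(deck, 2))

first_heart = [d for d in draws if d[0][1] == "hearts"]        # event A
both_hearts = [d for d in first_heart if d[1][1] == "hearts"]  # A and B
second_heart = [d for d in draws if d[1][1] == "hearts"]       # event B

p_b = Fraction(len(second_heart), len(draws))               # P(B)
p_b_given_a = Fraction(len(both_hearts), len(first_heart))  # P(B | A)
print(p_b, p_b_given_a)  # 1/4 4/17
```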

---
</details>

> # `Mutually Exclusive Events`

<details>
<summary>Click to expand</summary>

##  What are Mutually Exclusive Events?

Two events **A** and **B** are **mutually exclusive** (or disjoint) if they **cannot happen at the same time**.

Mathematically:

$$
P(A \cap B) = 0
$$

* If A happens, B **cannot** happen, and vice versa.

---

### Example 1 (Die Roll)

* Event A = rolling an even number → {2, 4, 6}

* Event B = rolling an odd number → {1, 3, 5}

* Can A and B happen together? No → P(A ∩ B) = 0

* So A and B are **mutually exclusive**.

---

###  Example 2 (Coin Toss)

* Event A = heads

* Event B = tails

* Both cannot happen on a single toss → **mutually exclusive**.

---

### Important Note

* **Mutually exclusive ≠ Independent**

  * Independent → events don’t affect each other.
  * Mutually exclusive → events **cannot** happen together.

In fact, if two mutually exclusive events both have non-zero probability:

$$
P(A \mid B) = 0 \neq P(A)
$$

→ They are **dependent**, not independent.

---

### Key Formula for Mutually Exclusive Events

$$
P(A \cup B) = P(A) + P(B)
$$

(Intersection is zero, so no subtraction needed)
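
A small check with the even/odd die example, assuming equally likely outcomes, confirms both facts from this section: the simple addition rule holds, and $P(A \mid B) = 0 \neq P(A)$, so the events are dependent.

```python
from fractions import Fraction

S = set(range(1, 7))
A = {2, 4, 6}  # even
B = {1, 3, 5}  # odd

def P(event):
    return Fraction(len(event), len(S))

assert not (A & B)              # disjoint: P(A and B) = 0
assert P(A | B) == P(A) + P(B)  # addition rule needs no correction term
p_a_given_b = P(A & B) / P(B)   # 0, while P(A) = 1/2, so A and B are dependent
print(P(A | B), p_a_given_b, P(A))  # 1 0 1/2
```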

---

</details>

> # `Bayes’ Theorem`

<details>
<summary>Click to expand</summary>

### **Definition**

Bayes’ Theorem allows us to **update the probability of an event** based on new evidence.

* It’s the foundation of **Bayesian statistics**.
* Formula:

$$
P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}, \quad P(B) > 0
$$

Where:

* $P(A)$ = prior probability of A (before observing B)
* $P(B \mid A)$ = likelihood of observing B if A is true
* $P(A \mid B)$ = posterior probability of A given B
* $P(B)$ = total probability of B

---

### Law of Total Probability

If $A_1, A_2, ..., A_n$ are mutually exclusive events that together cover the sample space (a partition of $S$), then:

$$
P(B) = \sum_{i=1}^{n} P(B \mid A_i) \cdot P(A_i)
$$

---

### Example 1: Medical Test

Suppose:

* 1% of the population has the disease → $P(D) = 0.01$
* The test detects the disease 99% of the time (sensitivity) → $P(+ \mid D) = 0.99$
* The false positive rate is 5% → $P(+ \mid \neg D) = 0.05$

We want: $P(D \mid +)$ (probability of disease **given positive test**)

$$
P(D \mid +) = \frac{P(+ \mid D) \cdot P(D)}{P(+ \mid D) P(D) + P(+ \mid \neg D) P(\neg D)}
$$

$$
P(D \mid +) = \frac{0.99 \cdot 0.01}{0.99 \cdot 0.01 + 0.05 \cdot 0.99} \approx 0.167
$$

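Even with a positive test, the chance of actually having the disease is only about 16.7%. Because the disease is rare, false positives from the healthy 99% ($0.05 \cdot 0.99 = 0.0495$) outnumber true positives ($0.99 \cdot 0.01 = 0.0099$). This is the effect of the prior in a **Bayes’ update**.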
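
The calculation above can be sketched as a small function. $P(+)$ comes from the law of total probability over the partition $\{D, \neg D\}$. The function and parameter names are illustrative.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(D | +) via Bayes' theorem, with P(+) from the law of total probability."""
    p_pos = sensitivity * prior + false_positive_rate * (1 - prior)  # P(+)
    return sensitivity * prior / p_pos

p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 3))  # 0.167
```

Changing `prior` while keeping the test characteristics fixed shows how strongly the base rate drives the posterior.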

---

### Example 2: Machine Learning

* **Naive Bayes classifier** uses Bayes’ Theorem:

$$
P(Class \mid Features) = \frac{P(Features \mid Class) \cdot P(Class)}{P(Features)}
$$

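* Assumes features are **conditionally independent given the class**, so $P(Features \mid Class)$ factors into a product of per-feature likelihoods, which simplifies computation.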

---

### Key Takeaways

1. Bayes’ Theorem connects **prior knowledge** with **new evidence**.
2. Useful in medical diagnosis, spam filtering, predictive modeling.
3. Requires careful calculation of **likelihoods** and **priors**.

---
</details>