# Contents

[Constructing a Probability Distribution for a Random Variable](#constructing-a-probability-distribution-for-a-random-variable)

[Probability with a Discrete Random Variable](#probability-with-a-discrete-random-variable)

[Mean, Variance, and Standard Deviation of a Discrete Random Variable](#mean-variance-and-standard-deviation-of-a-discrete-random-variable)

[Transforming Random Variables](#transforming-random-variables)

[Mean of Sum and Difference of Random Variables](#mean-of-sum-and-difference-of-random-variables)

[Variance of Sum and Difference of Random Variables](#variance-of-sum-and-difference-of-random-variables)

[Deriving the Variance of the Difference](#deriving-the-variance-of-the-difference-why-we-add)

[Analyzing the Distribution of the Sum/Difference of Normal Variables](#analyzing-the-distribution-of-the-sumdifference-of-normal-variables)

[Binomial Variables](#binomial-variables)

[The 10% Rule for Assuming "Independence"](#the-10-rule-for-assuming-independence)

[Calculating Binomial Probabilities (Generalizing k successes in n attempts)](#calculating-binomial-probabilities-generalizing-k-successes-in-n-attempts)

[Free Throw Binomial Probability Distribution](#free-throw-binomial-probability-distribution)

[Mean, Variance, and Standard Deviation of a Binomial Variable](#mean-variance-and-standard-deviation-of-a-binomial-variable)

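[Geometric Random Variables](#geometric-random-variables)

[Probability for a Geometric Random Variable](#probability-for-a-geometric-random-variable)

[Cumulative Geometric Probability](#cumulative-geometric-probability)

[Geometric Distribution Mean and Standard Deviation](#geometric-distribution-mean-and-standard-deviation)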

## Constructing a Probability Distribution for a Random Variable

#### Theory
First, what is a **Random Variable**? It's a variable whose value is a numerical outcome of a random process. We typically denote it with a capital letter, like `X`.

A **Discrete Random Variable** is one that can only take on a finite or countably infinite number of distinct values (e.g., the numbers on a die: 1, 2, 3, 4, 5, 6; the number of emails you get in an hour: 0, 1, 2, ...).

A **Probability Distribution** for a discrete random variable is essentially a table, graph, or formula that links each possible value of the random variable with its probability of occurring.

For a distribution to be valid, it must follow two rules:
1.  The probability for every value of `X` must be between 0 and 1 (inclusive).
    *   `0 ≤ P(X=x) ≤ 1`
2.  The sum of all the probabilities for all possible values of `X` must equal 1.
    *   `Σ P(X=x) = 1`
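The two rules can be checked mechanically. Below is a minimal sketch using a hypothetical helper, `is_valid_distribution`. It relies on Python's `fractions.Fraction`, so the sum can be compared with 1 exactly instead of up to floating-point rounding. The last example shows why both rules are needed: its probabilities add up to 1, yet it is still not a valid distribution.

```python
from fractions import Fraction


def is_valid_distribution(probs):
    """Return True if `probs` ({value: P(X = value)}) satisfies both rules."""
    values = list(probs.values())
    rule_1 = all(0 <= p <= 1 for p in values)   # each probability lies in [0, 1]
    rule_2 = sum(values) == 1                    # probabilities sum to exactly 1
    return rule_1 and rule_2


# One fair six-sided die: each face has probability 1/6
die = {x: Fraction(1, 6) for x in range(1, 7)}
# Sums to 5/4, so it breaks rule 2
broken = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# Sums to 1, but contains a negative value (and one above 1), so it breaks rule 1
negative = {0: Fraction(-1, 4), 1: Fraction(5, 4)}

print(is_valid_distribution(die))       # True
print(is_valid_distribution(broken))    # False
print(is_valid_distribution(negative))  # False
```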

#### Calculation Example
Let's construct a probability distribution for the number of heads when we flip **three** fair coins.

1.  **Define the Random Variable:**
    *   Let `X` = the number of heads that appear.

2.  **List all possible outcomes (the Sample Space):**
    *   HHH
    *   HHT, HTH, THH
    *   HTT, THT, TTH
    *   TTT
    *   Total number of equally likely outcomes = 8.

3.  **Link each outcome to a value of the random variable `X`:**
    *   `X=3`: (HHH) - 1 outcome
    *   `X=2`: (HHT, HTH, THH) - 3 outcomes
    *   `X=1`: (HTT, THT, TTH) - 3 outcomes
    *   `X=0`: (TTT) - 1 outcome

4.  **Calculate the probability for each value of `X`:**
    *   `P(X=3)` = (Number of ways to get 3 heads) / (Total outcomes) = 1/8
    *   `P(X=2)` = 3/8
    *   `P(X=1)` = 3/8
    *   `P(X=0)` = 1/8

5.  **Create the Probability Distribution Table:**

| `x` (Number of Heads) | `P(X=x)` |
| :---: | :---: |
| 0 | 1/8 = 0.125 |
| 1 | 3/8 = 0.375 |
| 2 | 3/8 = 0.375 |
| 3 | 1/8 = 0.125 |
| **Total** | **8/8 = 1.0** |

This table is the probability distribution. It fulfills both rules: all probabilities are between 0 and 1, and their sum is 1.
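As a cross-check, the same table can be built by listing the sample space in code rather than by hand. This is a minimal sketch that assumes the coins are fair and independent, so all 8 outcomes are equally likely. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every equally likely outcome of flipping three fair coins, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each value of X = number of heads
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P(X={x}) = {p} = {float(p):.3f}")
print("Total:", sum(distribution.values()))
```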

#### Real-Life Usage
*   **Retail:** A bookstore manager can create a probability distribution for the number of copies of a bestseller sold per day. This helps in managing inventory and deciding when to reorder.
*   **Call Centers:** A company can model the number of calls received per hour. This distribution helps determine how many operators need to be staffed at different times of the day to meet service level goals.

***

## Probability with a Discrete Random Variable

#### Theory
Once you have a probability distribution, you can use it to find the probability of various events. This usually involves identifying the relevant values of the random variable `X` and summing their probabilities.

You can answer questions like:
*   The probability of an **exact** value: `P(X=a)`
*   The probability of being **less than** a value: `P(X < a)`
*   The probability of being **less than or equal to** a value: `P(X ≤ a)`
*   The probability of being **greater than** a value: `P(X > a)`
*   The probability of being **greater than or equal to** a value: `P(X ≥ a)`

#### Calculation Example
Let's use the probability distribution we just created for flipping three coins.

| `x` | `P(X=x)` |
| :---: | :---: |
| 0 | 0.125 |
| 1 | 0.375 |
| 2 | 0.375 |
| 3 | 0.125 |

**Question 1: What is the probability of getting exactly two heads?**
*   This is `P(X=2)`.
*   We just read it from the table: `P(X=2) = 0.375`.

**Question 2: What is the probability of getting *fewer than* two heads?**
*   This is `P(X < 2)`, which means `X` can be 0 or 1.
*   We add the probabilities for those values: `P(X < 2) = P(X=0) + P(X=1)`.
*   `P(X < 2) = 0.125 + 0.375 = 0.500`.

**Question 3: What is the probability of getting *at least one* head?**
*   This is `P(X ≥ 1)`, which means `X` can be 1, 2, or 3.
*   We add the probabilities: `P(X ≥ 1) = P(X=1) + P(X=2) + P(X=3)`.
*   `P(X ≥ 1) = 0.375 + 0.375 + 0.125 = 0.875`.
*   **Shortcut using the complement:** `P(X ≥ 1) = 1 - P(X < 1) = 1 - P(X=0) = 1 - 0.125 = 0.875`.
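Each question above follows the same pattern: pick the values of `X` that satisfy the condition, then add up their probabilities. Here is a minimal sketch of that pattern. `prob` is a hypothetical helper that takes the condition as a function, and exact fractions keep the results free of rounding.

```python
from fractions import Fraction

# Three-coin distribution from the table above (X = number of heads)
dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def prob(event):
    """Sum P(X = x) over every value x for which event(x) is True."""
    return sum(p for x, p in dist.items() if event(x))


p_exactly_two = prob(lambda x: x == 2)                   # Question 1
p_fewer_than_two = prob(lambda x: x < 2)                 # Question 2
p_at_least_one = prob(lambda x: x >= 1)                  # Question 3, direct sum
p_at_least_one_complement = 1 - prob(lambda x: x == 0)   # Question 3, complement

print(p_exactly_two, p_fewer_than_two, p_at_least_one, p_at_least_one_complement)
```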

#### Real-Life Usage
*   **Finance:** An analyst has a probability distribution for the number of days in a five-day trading week on which a stock's price drops. They can use it to calculate the probability that the stock drops on `more than 3` days, which helps them assess risk: `P(Drops > 3) = P(Drops=4) + P(Drops=5)`.
*   **Manufacturing:** A factory manager has a distribution for the number of defective items in a batch of 100. They can calculate the probability of a batch having `2 or fewer` defects (`P(Defects ≤ 2)`), which might be a condition for shipping the batch to a customer.

***

### Python Code Illustration

```python
import pandas as pd

# --- Part 1: Constructing a Probability Distribution ---
print("--- Part 1: Constructing a Probability Distribution ---")

# Let's model the sum of rolling two 6-sided dice.
# The random variable X is the sum.
# Possible values for X are 2, 3, 4, ..., 12.

# There are 6x6 = 36 total possible outcomes.
# We'll create a dictionary mapping the sum (value of X) to its probability.
prob_map = {
    2: 1/36,  # (1,1)
    3: 2/36,  # (1,2), (2,1)
    4: 3/36,  # (1,3), (2,2), (3,1)
    5: 4/36,  # (1,4), (2,3), (3,2), (4,1)
    6: 5/36,  # ...and so on
    7: 6/36,
    8: 5/36,
    9: 4/36,
    10: 3/36,
    11: 2/36,
    12: 1/36,
}

# A pandas Series is a great way to represent a probability distribution.
prob_dist = pd.Series(prob_map)

print("Probability Distribution for the Sum of Two Dice:")
print(prob_dist)
print("\nVerifying the rules:")
# Rule 2: the probabilities must sum to 1
print(f"Sum of all probabilities: {prob_dist.sum():.2f}")
# Rule 1: every probability must lie between 0 and 1 (inclusive)
print(f"All probabilities between 0 and 1: {prob_dist.between(0, 1).all()}\n")


# --- Part 2: Probability with a Discrete Random Variable ---
print("--- Part 2: Using the Probability Distribution ---")

# Let's use the distribution we just created to answer some questions.

# Question 1: What is the probability that the sum is exactly 5?
p_exact_5 = prob_dist[5]
print(f"P(Sum = 5) = {p_exact_5:.4f}")

# Question 2: What is the probability that the sum is 10 or more?
# P(Sum >= 10) = P(Sum=10) + P(Sum=11) + P(Sum=12)
p_10_or_more = prob_dist[prob_dist.index >= 10].sum()
print(f"P(Sum >= 10) = {p_10_or_more:.4f}")

# Question 3: What is the probability that the sum is less than 5?
# P(Sum < 5) = P(Sum=2) + P(Sum=3) + P(Sum=4)
p_less_than_5 = prob_dist[prob_dist.index < 5].sum()
print(f"P(Sum < 5) = {p_less_than_5:.4f}")

# Question 4: What is the probability the sum is between 6 and 8, inclusive?
# P(6 <= Sum <= 8) = P(Sum=6) + P(Sum=7) + P(Sum=8)
p_6_to_8 = prob_dist[(prob_dist.index >= 6) & (prob_dist.index <= 8)].sum()
print(f"P(6 <= Sum <= 8) = {p_6_to_8:.4f}")
```

```
--- Part 1: Constructing a Probability Distribution ---
Probability Distribution for the Sum of Two Dice:
2     0.027778
3     0.055556
4     0.083333
5     0.111111
6     0.138889
7     0.166667
8     0.138889
9     0.111111
10    0.083333
11    0.055556
12    0.027778
dtype: float64

Verifying the rules:
Sum of all probabilities: 1.00
All probabilities between 0 and 1: True

--- Part 2: Using the Probability Distribution ---
P(Sum = 5) = 0.1111
P(Sum >= 10) = 0.1667
P(Sum < 5) = 0.1667
P(6 <= Sum <= 8) = 0.4444
```


## Mean, Variance, and Standard Deviation of a Discrete Random Variable

### 1. The Mean (or Expected Value)

#### Theory
The **mean** of a discrete random variable is its long-run average value. It's a weighted average of all possible values the variable can take, where the weights are the probabilities of each value.

The more formal term for the mean is **Expected Value**, denoted as `E(X)` or by the Greek letter `μ` (mu). It is the value that the average outcome settles toward as the random experiment is repeated more and more times.

The formula is:
**E(X) = μ = Σ [x * P(X=x)]**

This means you take each possible value of `x`, multiply it by its probability, and then sum all those products together.

#### Calculation Example
Let's use our probability distribution for flipping three coins, where `X` is the number of heads.

| `x` (Value) | `P(X=x)` (Probability) |
| :---: | :---: |
| 0 | 0.125 |
| 1 | 0.375 |
| 2 | 0.375 |
| 3 | 0.125 |

Now, we add a column for `x * P(X=x)`:

| `x` | `P(X=x)` | `x * P(X=x)` |
| :---: | :---: | :---: |
| 0 | 0.125 | 0 * 0.125 = 0.0 |
| 1 | 0.375 | 1 * 0.375 = 0.375 |
| 2 | 0.375 | 2 * 0.375 = 0.750 |
| 3 | 0.125 | 3 * 0.125 = 0.375 |
| **Total** | **1.0** | **Sum = 1.5** |

So, the **Mean (Expected Value)** is **1.5**.

**Interpretation:** It's impossible to get 1.5 heads on a single trial. However, if you were to repeat the experiment of flipping three coins thousands of times and record the number of heads each time, the average of all those results would be extremely close to 1.5.
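Both the formula and the long-run interpretation can be checked in a few lines. The sketch below first computes `E(X)` exactly from the table. It then simulates the three-coin experiment many times, assuming fair coins; the seed is fixed so the run is reproducible. The simulated average lands near 1.5, but not exactly on it.

```python
import random
from fractions import Fraction

dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# E(X) = sum of x * P(X = x), computed exactly
expected = sum(x * p for x, p in dist.items())
print("E(X) =", expected, "=", float(expected))

# Long-run check: repeat "flip three fair coins" many times and average the head counts
random.seed(0)  # fixed seed so the run is reproducible
n_repeats = 100_000
total_heads = 0
for _ in range(n_repeats):
    total_heads += sum(random.random() < 0.5 for _ in range(3))  # heads in this trial
print("Average over", n_repeats, "repetitions:", total_heads / n_repeats)
```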

#### Real-Life Usage
*   **Lotteries & Gambling:** The expected value of a lottery ticket is the average amount you would win or lose per ticket over a very large number of plays. It is almost always negative, showing that, on average, you lose money. For a lottery with a single prize, `E(Winnings) = (Prob of Winning * Prize) - (Cost of Ticket)`.
*   **Insurance:** An insurance company calculates the expected payout for a policyholder. They must set the premium higher than this expected value to be profitable.

---

### 2. The Variance and Standard Deviation

#### Theory
While the mean tells you the center of a distribution, the **variance** and **standard deviation** tell you about its **spread** or **variability**.
*   A **low** standard deviation means the outcomes tend to be very close to the mean.
*   A **high** standard deviation means the outcomes are spread out over a wider range.

**Variance (`Var(X)` or `σ²`)**: The variance is the probability-weighted average of the squared deviations from the mean. Squaring the deviations makes them all non-negative, so positive and negative deviations cannot cancel out, and it gives more weight to larger deviations.
The formula is:
**Var(X) = σ² = Σ [(x - μ)² * P(X=x)]**

**Standard Deviation (`SD(X)` or `σ`)**: The standard deviation is simply the **square root of the variance**. The main reason we use it is that it brings the measure of spread back into the same units as the random variable, making it more interpretable.
The formula is:
**SD(X) = σ = √Var(X)**

#### Calculation Example
Let's continue with our three-coin flip example. We already know the mean `μ = 1.5`.

1.  **Calculate the Variance (`σ²`)**:
    We'll build our table out further.

| `x` | `P(X=x)` | `x - μ` (Deviation) | `(x - μ)²` (Squared Dev) | `(x - μ)² * P(X=x)` |
| :---: | :---: | :---: | :---: | :---: |
| 0 | 0.125 | 0 - 1.5 = -1.5 | (-1.5)² = 2.25 | 2.25 * 0.125 = 0.28125 |
| 1 | 0.375 | 1 - 1.5 = -0.5 | (-0.5)² = 0.25 | 0.25 * 0.375 = 0.09375 |
| 2 | 0.375 | 2 - 1.5 = 0.5 | (0.5)² = 0.25 | 0.25 * 0.375 = 0.09375 |
| 3 | 0.125 | 3 - 1.5 = 1.5 | (1.5)² = 2.25 | 2.25 * 0.125 = 0.28125 |
| | | | **Total (Variance)** | **Sum = 0.75** |

The **Variance** is **0.75**.

2.  **Calculate the Standard Deviation (`σ`)**:
    *   `σ = √Variance = √0.75 ≈ 0.866`

**Interpretation:** The mean number of heads is 1.5, and the standard deviation is 0.866. This tells us that a "typical" or "standard" outcome will deviate from the mean by about 0.866 heads.
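The table above translates directly into code. Here is a minimal sketch that reuses the three-coin distribution. It computes the variance exactly from its definition, then takes the square root to get the standard deviation.

```python
import math
from fractions import Fraction

dist = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mu = sum(x * p for x, p in dist.items())  # 3/2

# Var(X) = sum of (x - mu)^2 * P(X = x)
variance = sum((x - mu) ** 2 * p for x, p in dist.items())
# SD(X) = square root of the variance (back in units of "heads")
std_dev = math.sqrt(variance)

print("Var(X) =", variance, "=", float(variance))
print(f"SD(X)  = {std_dev:.3f}")
```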

#### Real-Life Usage
*   **Investing:** The standard deviation of a stock's daily returns is a direct measure of its **volatility** or **risk**. A low-risk utility stock typically has a low standard deviation, while a speculative tech stock typically has a much higher one.
*   **Quality Control:** A manufacturer wants the length of a part to be 10 cm. The mean length might be exactly 10 cm, but if the standard deviation is high, it means many parts are too long or too short. A low standard deviation indicates a consistent and reliable manufacturing process.

***

### Python Code Illustration

```python
import pandas as pd
import numpy as np

# Let's use our probability distribution for the sum of two dice.
# The random variable X is the sum.
prob_map = {
    2: 1/36, 3: 2/36, 4: 3/36, 5: 4/36, 6: 5/36, 7: 6/36,
    8: 5/36, 9: 4/36, 10: 3/36, 11: 2/36, 12: 1/36,
}
prob_dist = pd.Series(prob_map)

print("Probability Distribution for the Sum of Two Dice:")
print(prob_dist)
print("-" * 40)


# --- 1. Calculate the Mean (Expected Value) ---
# E(X) = Σ [x * P(X=x)]
# This is a weighted average. In pandas, we can get this by
# multiplying the series by its index and then summing the result.
mean_expected_value = (prob_dist * prob_dist.index).sum()

print(f"Mean (Expected Value) of the distribution: {mean_expected_value:.4f}\n")
print("Interpretation: If you roll two dice millions of times, the average sum would be very close to 7.")
print("-" * 40)


# --- 2. Calculate the Variance and Standard Deviation ---
# Var(X) = Σ [(x - μ)² * P(X=x)]
# Step 1: Calculate the squared deviations from the mean (x - μ)²
squared_deviations = (prob_dist.index - mean_expected_value)**2

# Step 2: Multiply the squared deviations by their probabilities and sum up
variance = (squared_deviations * prob_dist).sum()

# Step 3: Standard deviation is the square root of the variance
std_deviation = np.sqrt(variance)

print(f"Variance (σ²) of the distribution: {variance:.4f}")
print(f"Standard Deviation (σ) of the distribution: {std_deviation:.4f}\n")
print(f"Interpretation: A typical outcome when rolling two dice is expected to be about {std_deviation:.2f} away from the mean of 7.")
```

```
Probability Distribution for the Sum of Two Dice:
2     0.027778
3     0.055556
4     0.083333
5     0.111111
6     0.138889
7     0.166667
8     0.138889
9     0.111111
10    0.083333
11    0.055556
12    0.027778
dtype: float64
----------------------------------------
Mean (Expected Value) of the distribution: 7.0000

Interpretation: If you roll two dice millions of times, the average sum would be very close to 7.
----------------------------------------
Variance (σ²) of the distribution: 5.8333
Standard Deviation (σ) of the distribution: 2.4152

Interpretation: A typical outcome when rolling two dice is expected to be about 2.42 away from the mean of 7.
```


## Transforming Random Variables

#### Theory
Transforming a random variable means applying a mathematical operation to it. The most common type is a **linear transformation**, which involves **shifting** (adding or subtracting a constant) and/or **scaling** (multiplying or dividing by a constant).

Let's say we have a random variable `X` with a known mean `μ_X` and variance `σ²_X`.
We create a new random variable `Y` by applying the transformation:
**Y = a + bX**
*   `a` represents the **shift** (a constant being added).
*   `b` represents the **scale** (a constant `X` is being multiplied by).

The key question is: What are the new mean (`μ_Y`), variance (`σ²_Y`), and standard deviation (`σ_Y`) of our new variable `Y`?

---

### Impact of Transforming Random Variables

### 1. The Impact of Shifting (Adding a Constant `a`)

Imagine taking your entire probability distribution and just sliding it along the number line.

*   **Effect on the Mean:** If you add a constant `a` to every value of `X`, the mean of the distribution also shifts by `a`.
    *   **E(X + a) = E(X) + a**
*   **Effect on the Variance and Standard Deviation:** Shifting the distribution does **not** change its spread or shape. The distances between the values remain the same. Therefore, the variance and standard deviation are unchanged.
    *   **Var(X + a) = Var(X)**
    *   **SD(X + a) = SD(X)**



### 2. The Impact of Scaling (Multiplying by a Constant `b`)

Imagine stretching or compressing your distribution.

*   **Effect on the Mean:** If you multiply every value of `X` by a constant `b`, the mean is also scaled by `b`.
    *   **E(bX) = b * E(X)**
*   **Effect on the Variance and Standard Deviation:** This is the most important rule to remember.
    *   The **standard deviation** is scaled by the **absolute value** of `b`. It's always positive because spread can't be negative.
        *   **SD(bX) = |b| * SD(X)**
    *   The **variance** is scaled by **b²**. Since variance is in squared units, its scaling factor must also be squared.
        *   **Var(bX) = b² * Var(X)**



### 3. Combining Shifting and Scaling (The General Case)

When we combine both transformations in `Y = a + bX`:

*   **New Mean:** The mean is affected by both scaling and shifting.
    *   **E(Y) = E(a + bX) = a + b * E(X)**
*   **New Variance:** The variance is only affected by scaling.
    *   **Var(Y) = Var(a + bX) = b² * Var(X)**
*   **New Standard Deviation:** The standard deviation is only affected by scaling.
    *   **SD(Y) = SD(a + bX) = |b| * SD(X)**

#### Calculation Example
Let's use our previous example of rolling two dice, where `X` is the sum. We found:
*   Mean `μ_X = 7.0`
*   Variance `σ²_X ≈ 5.833`
*   Standard Deviation `σ_X ≈ 2.415`

Imagine a casino game where the payout (`Y`) is calculated as: **"You get $5 for every point on the dice, but you have to pay $10 to play."**

So, our new random variable is `Y = -10 + 5X`.
*   The shift `a` is -10.
*   The scale `b` is 5.

Let's find the mean, variance, and standard deviation of the payout without re-calculating the entire distribution.

1.  **Calculate the Mean Payout `E(Y)`:**
    *   `E(Y) = a + b * E(X)`
    *   `E(Y) = -10 + 5 * (7.0) = -10 + 35 = 25`
    *   The expected net payout of this game is $25 per play.

2.  **Calculate the Variance of the Payout `Var(Y)`:**
    *   `Var(Y) = b² * Var(X)`
    *   `Var(Y) = 5² * (35/6) = 25 * 5.8333 ≈ 145.83` (the exact variance of the dice sum is 35/6)

3.  **Calculate the Standard Deviation of the Payout `SD(Y)`:**
    *   `SD(Y) = |b| * SD(X)`
    *   `SD(Y) = |5| * (2.4152) ≈ 12.08`
    *   (Or, `SD(Y) = √Var(Y) = √145.83 ≈ 12.08`)

So, the payout has a mean of $25 and a standard deviation of about $12.08, which measures the risk (spread) of the payouts.

#### Real-Life Usage
*   **Unit Conversion:** Converting temperatures from Celsius (`C`) to Fahrenheit (`F`) uses the formula `F = 32 + 1.8*C`. If you know the mean and standard deviation of daily temperatures in Celsius, you can instantly find the mean and standard deviation in Fahrenheit using these rules.
*   **Finance:** If a stock's return `X` has a known mean and variance, and you invest $5000 in it while paying a fixed management fee of $50, your net gain in dollars is `Y = -50 + 5000X`. The rules give the mean and risk (standard deviation) of that gain directly.
*   **Statistics (Z-scores):** The process of standardization (calculating a Z-score) is a direct application of transformation!
    *   `Z = (X - μ) / σ = (-μ/σ) + (1/σ)X`
    *   Here, `a = -μ/σ` and `b = 1/σ`. This transformation always results in a new distribution with a mean of 0 and a standard deviation of 1.
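Standardization can be checked numerically on the two-dice distribution. In this sketch, the formula `(6 - |x - 7|) / 36` is just a compact way to write the same probabilities listed in the earlier dice table. The code applies `Z = (X - μ) / σ` to every value and recomputes the mean and standard deviation. Both come out as 0 and 1, up to floating-point rounding.

```python
import math

# Sum of two dice: P(X = x) = (6 - |x - 7|) / 36 for x = 2..12 (same as the dice table)
dist = {x: (6 - abs(x - 7)) / 36 for x in range(2, 13)}

mu = sum(x * p for x, p in dist.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

# Standardize: Z = (X - mu) / sigma, i.e. a = -mu/sigma and b = 1/sigma
z_dist = {(x - mu) / sigma: p for x, p in dist.items()}

z_mean = sum(z * p for z, p in z_dist.items())
z_sd = math.sqrt(sum((z - z_mean) ** 2 * p for z, p in z_dist.items()))
print(f"mean of Z = {z_mean:.6f}, SD of Z = {z_sd:.6f}")
```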

***

### Python Code Illustration

```python
import pandas as pd
import numpy as np

# Let's start with our probability distribution for the sum of two dice (X).
prob_map = {
    2: 1/36, 3: 2/36, 4: 3/36, 5: 4/36, 6: 5/36, 7: 6/36,
    8: 5/36, 9: 4/36, 10: 3/36, 11: 2/36, 12: 1/36,
}
prob_dist_X = pd.Series(prob_map)

# Original Mean and Variance (as calculated before)
mean_X = (prob_dist_X * prob_dist_X.index).sum()
variance_X = (((prob_dist_X.index - mean_X)**2) * prob_dist_X).sum()
std_dev_X = np.sqrt(variance_X)

print("--- Original Random Variable X (Sum of Dice) ---")
print(f"Mean (μ_X): {mean_X:.4f}")
print(f"Variance (σ²_X): {variance_X:.4f}")
print(f"Standard Deviation (σ_X): {std_dev_X:.4f}")
print("-" * 50)


# Define the transformation for our casino game Y = -10 + 5X
a = -10  # Shift
b = 5    # Scale

# --- Method 1: The Long Way (Transform the entire distribution) ---
# Create the new random variable Y by applying the transformation to the values of X
# The probabilities for each corresponding outcome remain the same.
prob_dist_Y = pd.Series(prob_dist_X.values, index=(a + b * prob_dist_X.index))

# Calculate the mean and variance of Y directly
mean_Y_direct = (prob_dist_Y * prob_dist_Y.index).sum()
variance_Y_direct = (((prob_dist_Y.index - mean_Y_direct)**2) * prob_dist_Y).sum()
std_dev_Y_direct = np.sqrt(variance_Y_direct)

print("--- Method 1: Direct Calculation from New Distribution Y ---")
print(f"New Mean (μ_Y): {mean_Y_direct:.4f}")
print(f"New Variance (σ²_Y): {variance_Y_direct:.4f}")
print(f"New Standard Deviation (σ_Y): {std_dev_Y_direct:.4f}")
print("-" * 50)


# --- Method 2: The Shortcut (Using the Transformation Rules) ---
# Apply the rules to the original mean and variance
mean_Y_rules = a + b * mean_X
variance_Y_rules = (b**2) * variance_X
std_dev_Y_rules = abs(b) * std_dev_X

print("--- Method 2: Applying the Shortcut Rules ---")
print(f"New Mean (μ_Y) = a + b*μ_X = {a} + {b}*{mean_X:.2f} = {mean_Y_rules:.4f}")
print(f"New Variance (σ²_Y) = b²*σ²_X = {b**2}*{variance_X:.2f} = {variance_Y_rules:.4f}")
print(f"New Standard Deviation (σ_Y) = |b|*σ_X = |{b}|*{std_dev_X:.2f} = {std_dev_Y_rules:.4f}")
print("-" * 50)

print("As you can see, both methods yield the exact same results!")
```

```
--- Original Random Variable X (Sum of Dice) ---
Mean (μ_X): 7.0000
Variance (σ²_X): 5.8333
Standard Deviation (σ_X): 2.4152
--------------------------------------------------
--- Method 1: Direct Calculation from New Distribution Y ---
New Mean (μ_Y): 25.0000
New Variance (σ²_Y): 145.8333
New Standard Deviation (σ_Y): 12.0761
--------------------------------------------------
--- Method 2: Applying the Shortcut Rules ---
New Mean (μ_Y) = a + b*μ_X = -10 + 5*7.00 = 25.0000
New Variance (σ²_Y) = b²*σ²_X = 25*5.83 = 145.8333
New Standard Deviation (σ_Y) = |b|*σ_X = |5|*2.42 = 12.0761
--------------------------------------------------
As you can see, both methods yield the exact same results!
```


### Mean of Sum and Difference of Random Variables

This is the most intuitive part of combining random variables.

#### Theory
Let's say you have two random variables, `X` and `Y`, with means `E(X) = μ_X` and `E(Y) = μ_Y`.

**Rule for the Sum:** The mean of the sum of two random variables is simply the sum of their individual means.
*   **E(X + Y) = E(X) + E(Y) = μ_X + μ_Y**

**Rule for the Difference:** The mean of the difference of two random variables is the difference of their individual means.
*   **E(X - Y) = E(X) - E(Y) = μ_X - μ_Y**

These two rules are beautifully simple and work **regardless of whether the variables are independent or not**.

#### Calculation Example
A student's commute involves a train ride (`X`) and a walk (`Y`).
*   The average train ride time is `μ_X = 20` minutes.
*   The average walk time is `μ_Y = 8` minutes.

**What is the average total commute time?**
*   Let `T = X + Y`.
*   `E(T) = E(X) + E(Y) = 20 + 8 = 28` minutes.

**On average, how much longer is the train ride than the walk?**
*   Let `D = X - Y`.
*   `E(D) = E(X) - E(Y) = 20 - 8 = 12` minutes.
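To see that the mean rules do not need independence, here is a minimal sketch with two strongly *dependent* variables. `X` is one fair die roll and `Y = 7 - X`; on a standard die, opposite faces sum to 7, so `Y` is the face pointing down. The expectations are computed exactly over the six equally likely outcomes.

```python
from fractions import Fraction

# One fair die roll X, and a dependent variable Y = 7 - X (the opposite face).
# Each of the six outcomes has probability 1/6.
outcomes = [(x, 7 - x) for x in range(1, 7)]
p = Fraction(1, 6)

e_x = sum(x * p for x, _ in outcomes)
e_y = sum(y * p for _, y in outcomes)
e_sum = sum((x + y) * p for x, y in outcomes)
e_diff = sum((x - y) * p for x, y in outcomes)

print(e_x, e_y)              # both 7/2
print(e_sum == e_x + e_y)    # E(X + Y) = E(X) + E(Y), even though Y depends on X
print(e_diff == e_x - e_y)   # E(X - Y) = E(X) - E(Y)
```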

***

### Variance of Sum and Difference of Random Variables

This is the most important—and most counter-intuitive—part of combining random variables.

#### Theory
Let's take the same two random variables, `X` and `Y`, with variances `Var(X) = σ²_X` and `Var(Y) = σ²_Y`.

For these rules to apply, we must assume that **`X` and `Y` are independent**.

**Rule for the Sum:** The variance of the sum of two independent random variables is the sum of their individual variances.
*   **Var(X + Y) = Var(X) + Var(Y) = σ²_X + σ²_Y**

**Rule for the Difference:** The variance of the difference of two independent random variables is **also the sum** of their individual variances.
*   **Var(X - Y) = Var(X) + Var(Y) = σ²_X + σ²_Y**

**Why do we always add?** This is a crucial point. Variance is a measure of uncertainty or variability. Whether you are adding or subtracting the variables, you are combining two sources of uncertainty. The difference between two uncertain values is *more* variable, not less, than the individual values. Uncertainty adds to uncertainty.

#### Calculation Example
Let's continue with the commute example.
*   The standard deviation of the train ride is `σ_X = 3` minutes, so `Var(X) = 3² = 9`.
*   The standard deviation of the walk is `σ_Y = 1` minute, so `Var(Y) = 1² = 1`.
*   Assume the train time and walk time are independent.

**What is the variance and standard deviation of the total commute time (`T = X + Y`)?**
*   `Var(T) = Var(X) + Var(Y) = 9 + 1 = 10`
*   Standard Deviation `SD(T) = √10 ≈ 3.16` minutes.

**What is the variance and standard deviation of the difference in times (`D = X - Y`)?**
*   `Var(D) = Var(X) + Var(Y) = 9 + 1 = 10` (We still add!)
*   Standard Deviation `SD(D) = √10 ≈ 3.16` minutes.

So, while the average total time is 28 minutes and the average difference is 12 minutes, both of these new random variables have the same spread (standard deviation of 3.16 minutes).
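The commute times are continuous, so they cannot be enumerated directly. As a stand-in, this sketch swaps in two independent fair dice, for which every `(x, y)` pair is equally likely and the variances can be computed exactly. It shows that `X + Y` and `X - Y` have the same variance, and that both equal `Var(X) + Var(Y)`. `mean_var_pairs` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product


def mean_var_pairs(values_probs):
    """Mean and variance from a list of (value, probability) pairs."""
    mu = sum(v * p for v, p in values_probs)
    var = sum((v - mu) ** 2 * p for v, p in values_probs)
    return mu, var


die = list(range(1, 7))
p_pair = Fraction(1, 36)  # independent fair dice: every (x, y) pair is equally likely

_, var_x = mean_var_pairs([(x, Fraction(1, 6)) for x in die])
_, sum_var = mean_var_pairs([(x + y, p_pair) for x, y in product(die, die)])
_, diff_var = mean_var_pairs([(x - y, p_pair) for x, y in product(die, die)])

print("Var(X) = Var(Y) =", var_x)
print("Var(X + Y) =", sum_var)
print("Var(X - Y) =", diff_var)
```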

***

### Deriving the Variance of the Difference (Why We Add)

This section provides a brief mathematical proof for the rule `Var(X - Y) = Var(X) + Var(Y)`.

1.  **Start with the definition of variance:** `Var(Z) = E[ (Z - E[Z])² ]`. Let our variable `Z = X - Y`.
2.  **Substitute `Z`:** `Var(X - Y) = E[ ((X - Y) - E[X - Y])² ]`
3.  **Use the rule for the mean of a difference** (`E[X - Y] = E[X] - E[Y]`):
    `= E[ ((X - Y) - (E[X] - E[Y]))² ]`
4.  **Rearrange the terms inside the brackets** to group `X` terms and `Y` terms:
    `= E[ ( (X - E[X]) - (Y - E[Y]) )² ]`
5.  **Expand the square** (like `(a - b)² = a² - 2ab + b²`):
    `= E[ (X-E[X])² - 2(X-E[X])(Y-E[Y]) + (Y-E[Y])² ]`
6.  **Split the expectation across the terms:**
    `= E[(X-E[X])²] + E[(Y-E[Y])²] - 2 * E[(X-E[X])(Y-E[Y])]`
7.  **Recognize the terms:**
    *   The first term is the definition of `Var(X)`.
    *   The second term is the definition of `Var(Y)`.
    *   The third term `E[(X-E[X])(Y-E[Y])]` is the definition of the **covariance** of X and Y.
8.  **Apply the independence assumption:** For independent variables, the covariance is 0. So, the entire third term becomes zero.
9.  **Final Result:** This leaves us with `Var(X - Y) = Var(X) + Var(Y)`.
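Step 8 is where independence matters. This sketch shows what goes wrong without it, using an extreme dependent case chosen for illustration: `Y = X` for a single fair die. Then `X - Y` is always 0, so its variance is 0. The full expression with the covariance term from step 7 matches that, while `Var(X) + Var(Y)` alone does not.

```python
from fractions import Fraction

# Fully dependent case: roll one fair die and set Y = X
outcomes = [(x, x) for x in range(1, 7)]
p = Fraction(1, 6)

e_x = sum(x * p for x, _ in outcomes)
e_y = sum(y * p for _, y in outcomes)
var_x = sum((x - e_x) ** 2 * p for x, _ in outcomes)
var_y = sum((y - e_y) ** 2 * p for _, y in outcomes)
cov_xy = sum((x - e_x) * (y - e_y) * p for x, y in outcomes)  # the third term in step 7

# Var(X - Y) straight from the definition in step 1
var_diff = sum(((x - y) - (e_x - e_y)) ** 2 * p for x, y in outcomes)

print("Var(X - Y) directly:        ", var_diff)
print("Var(X) + Var(Y) - 2Cov(X,Y):", var_x + var_y - 2 * cov_xy)
print("Var(X) + Var(Y) alone:      ", var_x + var_y)
```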

***

### Analyzing the Distribution of the Sum/Difference of Normal Variables

This is a special and very powerful case that combines all the previous rules.

#### Theory
If you have two **independent** random variables, `X` and `Y`, and both follow a **Normal distribution**, then their sum or difference will **also follow a Normal distribution**.

*   `X ~ N(μ_X, σ_X)` (X is Normal with mean `μ_X` and std dev `σ_X`)
*   `Y ~ N(μ_Y, σ_Y)` (Y is Normal with mean `μ_Y` and std dev `σ_Y`)

To define the new Normal distribution for the sum (`T = X+Y`) or difference (`D = X-Y`), we just need to find its mean and standard deviation using the rules we've learned.

**Distribution of the Sum (`T = X + Y`):**
*   **Mean:** `μ_T = μ_X + μ_Y`
*   **Standard Deviation:** `σ_T = √(σ²_X + σ²_Y)`
*   So, `T ~ N(μ_X + μ_Y,  √(σ²_X + σ²_Y))`

**Distribution of the Difference (`D = X - Y`):**
*   **Mean:** `μ_D = μ_X - μ_Y`
*   **Standard Deviation:** `σ_D = √(σ²_X + σ²_Y)` (Remember to add variances!)
*   So, `D ~ N(μ_X - μ_Y,  √(σ²_X + σ²_Y))`

This is incredibly useful because it allows us to answer probabilistic questions about the combined variable (e.g., "What is the probability that the total commute time exceeds 30 minutes?").
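Here is a sketch of the commute question above, using Python's standard-library `statistics.NormalDist`. It reuses the commute numbers from earlier: train `μ_X = 20`, `σ_X = 3`; walk `μ_Y = 8`, `σ_Y = 1`. It also *assumes* both times are independent and Normal; the commute example itself only gives means and standard deviations.

```python
import math
from statistics import NormalDist

# Commute example: train X and walk Y, ASSUMED independent and Normal
mu_x, sd_x = 20, 3
mu_y, sd_y = 8, 1

# T = X + Y is Normal with mean mu_x + mu_y and SD sqrt(sd_x^2 + sd_y^2)
total = NormalDist(mu_x + mu_y, math.sqrt(sd_x**2 + sd_y**2))
# D = X - Y is Normal with mean mu_x - mu_y and the same SD (variances still add)
diff = NormalDist(mu_x - mu_y, math.sqrt(sd_x**2 + sd_y**2))

p_over_30 = 1 - total.cdf(30)   # P(T > 30): total commute exceeds 30 minutes
p_walk_longer = diff.cdf(0)     # P(D < 0): the walk takes longer than the train ride
print(f"P(T > 30) = {p_over_30:.4f}")
print(f"P(D < 0)  = {p_walk_longer:.6f}")
```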

***

### Python Code Illustration

This code simulates the process and compares the simulated results with the predictions from the rules above.

```python
import numpy as np

# --- Setup: Define the two independent Normal random variables ---
# Let X be the time to pack a box
mu_X = 120  # seconds
sigma_X = 10  # seconds

# Let Y be the time to label the box
mu_Y = 30   # seconds
sigma_Y = 5   # seconds

print("--- Variable Definitions ---")
print(f"Packing Time (X): Mean={mu_X}, Std Dev={sigma_X}")
print(f"Labeling Time (Y): Mean={mu_Y}, Std Dev={sigma_Y}\n")

# --- Step 1: Simulate the process thousands of times ---
# No seed is set, so the simulated values will differ slightly from run to run.
num_simulations = 100000
X_samples = np.random.normal(mu_X, sigma_X, num_simulations)
Y_samples = np.random.normal(mu_Y, sigma_Y, num_simulations)

# Create the sum and difference variables from the samples
T_samples = X_samples + Y_samples  # Total time
D_samples = X_samples - Y_samples  # Time difference

# --- Step 2: Calculate the results from the simulated data ---
print("--- Results from Live Simulation ---")
print(f"Total Time (T = X+Y): Mean={np.mean(T_samples):.2f}, Std Dev={np.std(T_samples):.2f}")
print(f"Time Difference (D = X-Y): Mean={np.mean(D_samples):.2f}, Std Dev={np.std(D_samples):.2f}\n")

# --- Step 3: Calculate the results using our theoretical rules ---
# For the Sum (T)
mu_T = mu_X + mu_Y
var_T = sigma_X**2 + sigma_Y**2
sigma_T = np.sqrt(var_T)

# For the Difference (D)
mu_D = mu_X - mu_Y
var_D = sigma_X**2 + sigma_Y**2  # Variances still add!
sigma_D = np.sqrt(var_D)

print("--- Results from Theoretical Rules ---")
print(f"Total Time (T): Mean={mu_T:.2f}, Std Dev={sigma_T:.2f}")
print(f"Time Difference (D): Mean={mu_D:.2f}, Std Dev={sigma_D:.2f}\n")

print("Note how the simulated results are extremely close to the theoretical predictions.")
```

```
--- Variable Definitions ---
Packing Time (X): Mean=120, Std Dev=10
Labeling Time (Y): Mean=30, Std Dev=5

--- Results from Live Simulation ---
Total Time (T = X+Y): Mean=149.98, Std Dev=11.20
Time Difference (D = X-Y): Mean=90.02, Std Dev=11.16

--- Results from Theoretical Rules ---
Total Time (T): Mean=150.00, Std Dev=11.18
Time Difference (D): Mean=90.00, Std Dev=11.18

Note how the simulated results are extremely close to the theoretical predictions.
```


### The Binomial Distribution

### Binomial Variables

#### Theory
A **binomial random variable** (`X`) counts the number of successes in a fixed number of independent trials. For a situation to be classified as a binomial setting, it must meet four conditions, which can be remembered with the acronym **BINS**:

*   **B**inary: Each trial can only have two outcomes, which we label "success" or "failure". (e.g., heads/tails, defective/not defective, made shot/missed shot).
*   **I**ndependent: The outcome of one trial has no influence on the outcome of any other trial. (This is where the 10% rule comes in; see the next section.)
*   **N**umber: There is a fixed number of trials, `n`.
*   **S**ame Probability: The probability of success, `p`, is the same for each trial. The probability of failure is `1-p`.

If a random variable `X` follows these conditions, we write `X ~ Bin(n, p)`.

### The 10% Rule for Assuming "Independence"

#### Theory
The "Independent" condition is the hardest one to meet in real life, especially when sampling from a population. If you sample **without replacement**, you are technically violating independence. For example, if you draw a card from a deck and do not put it back, the probabilities for the next draw change.

However, the **10% Rule** (also called the 10% Condition) states that if your sample size `n` is no more than 10% of the total population size `N` (i.e., `n ≤ 0.10N`), the change in probability from one trial to the next is so small that we can safely *assume* the trials are independent.

#### Calculation Example
*   **Violates Independence:** A bag contains 10 red and 10 blue marbles (Population N=20). You draw `n=5` marbles. Since 5 is more than 10% of 20 (5 > 2), you cannot use the binomial model. The probability changes significantly with each draw.
*   **Meets 10% Rule:** An assembly line produces 50,000 chips (Population N=50,000). You randomly select `n=100` chips to check for defects. Since 100 is much less than 10% of 50,000 (100 < 5000), you can treat the trials as independent and use the binomial model, even though you are sampling without replacement.
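
The two cases above can be checked numerically. The sketch below is a minimal illustration using only the Python standard library: it tests the 10% condition and compares the binomial probability with the exact probability for sampling without replacement (the hypergeometric formula, which this section does not otherwise cover). The helper names are hypothetical, and the 2% defect rate for the chips (1,000 defective out of 50,000) is an assumption made purely for illustration.

```python
from math import comb

def meets_ten_percent_rule(n, N):
    """True if the sample size n is at most 10% of the population size N."""
    return n <= 0.10 * N

def binomial_pmf(k, n, p):
    """P(X = k) if the n draws were independent, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def without_replacement_pmf(k, N, K, n):
    """Exact P(X = k) when drawing n items without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Marble bag: N = 20 marbles, K = 10 red, n = 5 draws, X = number of red marbles drawn
marble_rule = meets_ten_percent_rule(5, 20)
marble_binomial = binomial_pmf(5, 5, 0.5)            # pretends the draws are independent
marble_exact = without_replacement_pmf(5, 20, 10, 5)

# Chips: N = 50,000, n = 100; the 2% defect rate (K = 1,000) is a hypothetical value
chip_rule = meets_ten_percent_rule(100, 50_000)
chip_binomial = binomial_pmf(0, 100, 0.02)
chip_exact = without_replacement_pmf(0, 50_000, 1_000, 100)

print(f"Marbles: 10% rule met? {marble_rule}")
print(f"  P(all 5 red): binomial={marble_binomial:.4f}, exact={marble_exact:.4f}")
print(f"Chips: 10% rule met? {chip_rule}")
print(f"  P(no defects): binomial={chip_binomial:.4f}, exact={chip_exact:.4f}")
```

For the marble bag the binomial answer is roughly double the exact one, while for the chips the two agree to within a fraction of a percentage point, which is what the 10% rule predicts.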

#### Real-Life Usage
This rule is the foundation of most political polling and large-scale quality control. It allows statisticians to use the much simpler binomial calculations on data that is technically dependent.

---

### Calculating Binomial Probabilities (Generalizing *k* successes in *n* attempts)

#### Theory
Once we've established a binomial setting, we often want to find the probability of getting exactly `k` successes in `n` trials.

Let's think about one specific outcome: getting `k` successes and then `n-k` failures. Because the trials are independent, we multiply the individual probabilities, so this sequence has probability `p^k * (1-p)^(n-k)`.

However, there are usually many different ways to arrange `k` successes among `n` trials. The number of ways is given by the **combination formula**, "n choose k":
**C(n, k) = n! / [k! * (n-k)!]**

The **Binomial Probability Formula** combines these two parts:
**P(X = k) = C(n, k) * p^k * (1-p)^(n-k)**
This reads: "(The number of ways to arrange the successes) * (The probability of any one of those specific arrangements)".
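
To see why the combination count is the right multiplier, the sketch below lists every possible success/failure sequence by brute force, keeps those with exactly `k` successes, and compares the total with the formula. The values `n=5`, `k=2`, `p=0.3` and the function names are illustrative assumptions, not taken from the text.

```python
from itertools import product
from math import comb

def binomial_pmf(k, n, p):
    """Binomial probability formula: C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def enumerate_arrangements(k, n, p):
    """Go through all 2**n success/failure sequences and add up those with exactly k successes."""
    count = 0
    total_probability = 0.0
    for outcome in product("SF", repeat=n):  # e.g. ('S', 'F', 'S', 'F', 'F')
        if outcome.count("S") == k:
            count += 1
            # Independent trials: multiply p for each success and (1-p) for each failure
            seq_prob = 1.0
            for trial in outcome:
                seq_prob *= p if trial == "S" else (1 - p)
            total_probability += seq_prob
    return count, total_probability

n, k, p = 5, 2, 0.3  # illustrative values
count, brute_force = enumerate_arrangements(k, n, p)
print(f"Arrangements found: {count}, C({n}, {k}) = {comb(n, k)}")
print(f"Brute force: {brute_force:.6f}, formula: {binomial_pmf(k, n, p):.6f}")
```

Every qualifying sequence has the same probability, so the brute-force sum is just `C(n, k)` copies of `p^k * (1-p)^(n-k)`.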

---

### Free Throw Binomial Probability Distribution

This is a classic example that ties everything together. Let's say a basketball player shoots **n=4** free throws, and their probability of making any given shot is **p=0.75**.
Let `X` = the number of made shots. This is a binomial setting: `X ~ Bin(n=4, p=0.75)`.

Let's find the probability of making exactly **k=3** shots.
*   `P(X=3) = C(4, 3) * (0.75)³ * (0.25)¹`
*   `C(4, 3)` = 4! / (3! * 1!) = 4
*   `P(X=3)` = 4 * (0.421875) * (0.25) = **0.421875**

We can do this for all possible values of `k` (0, 1, 2, 3, 4) to construct the full probability distribution:

| `k` (Made Shots) | `P(X=k)` Calculation | Probability |
| :---: | :--- | :---: |
| 0 | C(4,0) * (0.75)⁰ * (0.25)⁴ | 0.0039 |
| 1 | C(4,1) * (0.75)¹ * (0.25)³ | 0.0469 |
| 2 | C(4,2) * (0.75)² * (0.25)² | 0.2109 |
| 3 | C(4,3) * (0.75)³ * (0.25)¹ | 0.4219 |
| 4 | C(4,4) * (0.75)⁴ * (0.25)⁰ | 0.3164 |
| **Total** | | **1.0000** |
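
The rounded table sums to 1.0000, but the probabilities are actually exact fractions with denominator 4⁴ = 256. A small sketch using Python's `fractions.Fraction` (an illustrative choice, not part of the original example) reproduces the table without rounding:

```python
from fractions import Fraction
from math import comb

n = 4
p = Fraction(3, 4)  # 0.75 written as an exact fraction

# P(X = k) = C(n, k) * p^k * (1-p)^(n-k), kept exact with Fraction arithmetic
distribution = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in distribution.items():
    print(f"P(X={k}) = {prob} ≈ {float(prob):.4f}")
print("Total:", sum(distribution.values()))
```

The exact values are 1/256, 12/256, 54/256, 108/256 and 81/256 (printed in lowest terms), and they add up to exactly 1.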

---

### Mean, Variance, and Standard Deviation of a Binomial Variable

#### Theory
While you can use the general formulas for discrete random variables to find the mean and variance, the binomial distribution has simple, powerful shortcuts.

**Expected Value (Mean) of a Binomial Variable (`μ`)**:
The formula is intuitively simple:
**μ = n * p**
Each trial succeeds with probability `p`, so on average a proportion `p` of the `n` trials are successes, giving an expected count of `n` times `p`.

**Variance (`σ²`) and Standard Deviation (`σ`) of a Binomial Variable**:
The formulas for spread are also straightforward:
*   **Variance: σ² = n * p * (1-p)**
*   **Standard Deviation: σ = √[n * p * (1-p)]**

#### Calculation Example (Continuing the free throw example)
For our player with `n=4` and `p=0.75`:

*   **Mean (Expected Value):**
    *   `μ = 4 * 0.75 = 3`
    *   **Interpretation:** If this player shoots many sets of 4 free throws, the average number made per set will be close to 3.

*   **Variance:**
    *   `σ² = 4 * 0.75 * (0.25) = 0.75`

*   **Standard Deviation:**
    *   `σ = √0.75 ≈ 0.866`
    *   **Interpretation:** In a typical set of 4 free throws, the number of made shots will differ from the mean of 3 by about 0.866 shots.
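
As a check that the shortcuts agree with the general definitions for a discrete random variable, `μ = Σ k·P(X=k)` and `σ² = Σ (k−μ)²·P(X=k)`, the sketch below computes both for the free throw distribution using exact fractions. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

n = 4
p = Fraction(3, 4)
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# General definitions for any discrete random variable
mean_general = sum(k * prob for k, prob in pmf.items())
var_general = sum((k - mean_general) ** 2 * prob for k, prob in pmf.items())

# Binomial shortcuts
mean_shortcut = n * p
var_shortcut = n * p * (1 - p)

print(f"Mean:     general={mean_general}, shortcut={mean_shortcut}")
print(f"Variance: general={var_general}, shortcut={var_shortcut}")
print(f"Std dev:  {sqrt(var_shortcut):.3f}")
```

Both routes give a mean of exactly 3 and a variance of exactly 3/4; the shortcut simply avoids summing over every value of `k`.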

***

### Python Code Illustration


```python
import pandas as pd
from scipy.stats import binom

# --- Binomial Variables and Probability Calculations ---
# Let's model our free throw shooter example.
n = 4     # Number of trials (shots)
p = 0.75  # Probability of success (making a shot)

# Question 1: What is the probability of making EXACTLY k=3 shots?
k = 3
# The .pmf() method stands for "Probability Mass Function" and calculates P(X=k)
prob_k_3 = binom.pmf(k, n, p)
print("--- Binomial Probability Calculation ---")
print(f"For a Bin(n={n}, p={p}) variable, the probability of k={k} successes is: {prob_k_3:.4f}\n")


# --- Constructing the Full Binomial Distribution ---
print("--- Full Binomial Distribution Table ---")
# We can create a list of all possible values of k
k_values = list(range(n + 1))

# Calculate the probability for each k
probabilities = [binom.pmf(k, n, p) for k in k_values]

# Display the distribution in a pandas DataFrame for clarity
dist_df = pd.DataFrame({'k (Made Shots)': k_values, 'P(X=k)': probabilities})
print(dist_df)
print("-" * 40)


# --- Mean, Variance, and Standard Deviation Shortcuts ---
print("--- Mean and Standard Deviation ---")
# We can use the shortcut formulas directly:
mean_formula = n * p
variance_formula = n * p * (1 - p)
std_dev_formula = variance_formula**0.5  # square root

print("Calculated using the shortcut formulas:")
print(f"Mean (μ = n*p): {mean_formula:.4f}")
print(f"Variance (σ² = n*p*(1-p)): {variance_formula:.4f}")
print(f"Standard Deviation (σ): {std_dev_formula:.4f}\n")

# The scipy library also has built-in functions for these, which use the same formulas
mean_scipy = binom.mean(n, p)
variance_scipy = binom.var(n, p)
std_dev_scipy = binom.std(n, p)

print("Calculated using scipy's built-in functions:")
print(f"Mean: {mean_scipy:.4f}")
print(f"Variance: {variance_scipy:.4f}")
print(f"Standard Deviation: {std_dev_scipy:.4f}")
```


```
--- Binomial Probability Calculation ---
For a Bin(n=4, p=0.75) variable, the probability of k=3 successes is: 0.4219

--- Full Binomial Distribution Table ---
   k (Made Shots)    P(X=k)
0               0  0.003906
1               1  0.046875
2               2  0.210938
3               3  0.421875
4               4  0.316406
----------------------------------------
--- Mean and Standard Deviation ---
Calculated using the shortcut formulas:
Mean (μ = n*p): 3.0000
Variance (σ² = n*p*(1-p)): 0.7500
Standard Deviation (σ): 0.8660

Calculated using scipy's built-in functions:
Mean: 3.0000
Variance: 0.7500
Standard Deviation: 0.8660
```


### Geometric Random Variables

#### Theory
While a binomial variable counts the number of successes in a *fixed* number of trials, a **geometric random variable** (`X`) counts the number of trials required **until you get your first success**.

For a situation to be a geometric setting, it must meet these conditions (very similar to binomial):

*   **B**inary: Each trial results in "success" or "failure".
*   **I**ndependent: The outcome of one trial does not affect the next.
*   **F**irst Success: The variable of interest (`X`) is the number of the trial on which the *first* success occurs.
*   **S**ame Probability: The probability of success, `p`, is the same for each trial.

The possible values of a geometric random variable `X` are 1, 2, 3, ..., with no upper limit. It is possible, though increasingly unlikely, to wait an arbitrarily long time for the first success.
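
The definition can be made concrete with a short simulation: repeat independent trials until the first success and record the trial number. This is a minimal sketch using Python's `random` module; the function name, the seed, and the choice `p = 1/6` (a fair die landing on 6) are illustrative assumptions.

```python
import random

def trials_until_first_success(p, rng):
    """Repeat independent trials with success probability p; return the trial number of the first success."""
    trial = 1
    while rng.random() >= p:  # a draw below p counts as a success
        trial += 1
    return trial

rng = random.Random(0)  # fixed seed so the run is reproducible
p = 1 / 6               # e.g., rolling a fair die until the first 6
waits = [trials_until_first_success(p, rng) for _ in range(100_000)]

print("Smallest observed X:", min(waits))
print("Largest observed X:", max(waits))
print("Share of runs needing more than 20 rolls:", sum(w > 20 for w in waits) / len(waits))
```

The smallest value is always 1, and the largest observed value is typically far above the average, which illustrates the unbounded support.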

#### Real-Life Usage
*   **Quality Control:** An inspector checks items from a production line. `X` is the number of items they inspect until they find the first defective one.
*   **Gaming:** A player opens loot boxes with a 5% chance of getting a rare item. `X` is the number of boxes they open to get their first rare item.
*   **Biology:** A scientist is trying to produce a certain genetic mutation. `X` is the number of attempts needed to achieve the first successful mutation.

---

### Probability for a Geometric Random Variable

#### Theory
This answers the question: "What is the probability that the first success occurs on exactly the `k`-th trial?"

The logic is very straightforward. For the first success to be on trial `k`, you must have `k-1` consecutive failures, followed immediately by one success.

*   The probability of failure is `(1-p)`.
*   The probability of `k-1` failures in a row is `(1-p)^(k-1)`.
*   The probability of the one success is `p`.

The **Geometric Probability Formula** is therefore:
**P(X = k) = (1-p)^(k-1) * p**

#### Calculation Example
Let's say you are rolling a standard six-sided die until you get a 6. The probability of success `p = 1/6`.
What is the probability that your first 6 appears on the **4th roll** (`k=4`)?

*   This means you must have 3 failures (not a 6) and then 1 success (a 6).
*   `P(Failure) = 1 - 1/6 = 5/6`.
*   `P(X = 4) = (5/6)³ * (1/6)`
*   `P(X = 4) = (125/216) * (1/6) = 125/1296 ≈ 0.0965`

There is about a 9.65% chance that it will take you exactly 4 rolls to get your first 6.
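
The formula translates directly into code. The sketch below (an illustrative helper, using `fractions.Fraction` for exact arithmetic) reproduces the die-rolling result and shows how the probabilities shrink by a factor of `(1-p)` with each additional trial.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) * p: k-1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

p = Fraction(1, 6)  # probability of rolling a 6
prob = geometric_pmf(4, p)
print(f"P(X=4) = {prob} ≈ {float(prob):.4f}")

# Each extra trial multiplies the probability by (1-p) = 5/6
for k in range(1, 6):
    print(k, f"{float(geometric_pmf(k, p)):.4f}")
```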

---

### Cumulative Geometric Probability

#### Theory
This answers the question: "What is the probability that the first success occurs **on or before** the `k`-th trial?" This is written as `P(X ≤ k)`.

You could calculate this the long way: `P(X=1) + P(X=2) + ... + P(X=k)`. But this is tedious.

The much easier way is to use the **complement rule**.
*   The complement of "getting at least one success by trial `k`" is "getting **no successes** in the first `k` trials".
*   In other words, the complement is having `k` consecutive failures.
*   The probability of `k` failures is `(1-p)^k`.

The **Cumulative Geometric Probability Formula** is:
**P(X ≤ k) = 1 - (1-p)^k**

#### Calculation Example
Using our die-rolling example (`p=1/6`), what is the probability that you get your first 6 **on or before the 4th roll**? (`P(X ≤ 4)`)

*   Using the formula: `P(X ≤ 4) = 1 - (1 - 1/6)⁴ = 1 - (5/6)⁴`
*   `P(X ≤ 4) = 1 - (625/1296) = 671/1296 ≈ 0.5177`

There is about a 51.8% chance you'll have rolled your first 6 by the fourth try.
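
The long way and the complement rule must give the same answer. A short sketch with exact fractions (helper names are illustrative) confirms that `P(X=1) + ... + P(X=4)` equals `1 - (5/6)⁴`:

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) * p."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """P(X <= k) via the complement rule: 1 - P(the first k trials all fail)."""
    return 1 - (1 - p) ** k

p = Fraction(1, 6)
long_way = sum(geometric_pmf(j, p) for j in range(1, 5))  # P(X=1) + ... + P(X=4)
shortcut = geometric_cdf(4, p)

print(f"Sum of P(X=1..4): {long_way}")
print(f"1 - (5/6)^4:      {shortcut} ≈ {float(shortcut):.4f}")
```

Both lines print `671/1296`, matching the hand calculation.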

---

### Geometric Distribution Mean and Standard Deviation

#### Theory
Like the binomial distribution, the geometric distribution has simple shortcuts for its mean (expected value) and standard deviation.

**Mean (Expected Value) `μ`**:
This formula is very intuitive. If an event has probability `p` of happening on each trial, you'd expect to wait `1/p` trials for it to occur.
**μ = 1/p**

**Variance `σ²` and Standard Deviation `σ`**:
These formulas measure the spread, or variability, in the waiting time.
*   **Variance: σ² = (1-p) / p²**
*   **Standard Deviation: σ = √[(1-p) / p²]**

#### Calculation Example
For our die-rolling experiment (`p=1/6`):

*   **Mean (Expected Value):**
    *   `μ = 1 / (1/6) = 6`
    *   **Interpretation:** On average, you would expect to wait 6 rolls to get your first 6.

*   **Variance:**
    *   `σ² = (1 - 1/6) / (1/6)² = (5/6) / (1/36) = (5/6) * 36 = 30`

*   **Standard Deviation:**
    *   `σ = √30 ≈ 5.477`
    *   **Interpretation:** While the average waiting time is 6 rolls, a typical result will deviate from that average by about 5.5 rolls. A standard deviation almost as large as the mean tells us that the waiting time is highly variable.
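
The shortcut formulas can be checked against the general definitions `μ = Σ k·P(X=k)` and `σ² = Σ k²·P(X=k) − μ²`. Because the geometric distribution has infinitely many values, the sketch below truncates the sums at an assumed cutoff `K_MAX = 2000`, where `(5/6)^k` is negligibly small.

```python
from math import isclose, sqrt

p = 1 / 6
q = 1 - p

# Truncate the infinite sums at a k where (1-p)^k is negligible
K_MAX = 2000
pmf = [(k, q ** (k - 1) * p) for k in range(1, K_MAX + 1)]

mean_series = sum(k * prob for k, prob in pmf)
var_series = sum(k * k * prob for k, prob in pmf) - mean_series**2

print(f"Mean:     series={mean_series:.6f}, formula 1/p={1 / p:.6f}")
print(f"Variance: series={var_series:.6f}, formula (1-p)/p^2={q / p**2:.6f}")
print(f"Std dev:  {sqrt(var_series):.3f}")
print("Matches formulas:", isclose(mean_series, 1 / p) and isclose(var_series, q / p**2))
```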

***

### Python Code Illustration
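The `scipy.stats.geom` distribution is a convenient tool for this. Note that `scipy` defines the geometric variable the same way we do: as the number of *trials* up to and including the first success (support `k = 1, 2, 3, ...`). So `geom.pmf(k, p)` gives `P(X=k)` directly and `geom.cdf(k, p)` gives `P(X ≤ k)`. (Some other software, such as R's `dgeom`, instead counts the number of *failures* before the first success.)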
