### Quiz 1
A marketing team is running two parallel campaigns:

- Campaign A promotes a new feature update

- Campaign B promotes a discount offer

From a small test group of $6$ users:

- Users who engaged with Campaign A: `{'Alice', 'Bob', 'Charlie', 'David'}`

- Users who engaged with Campaign B: `{'Charlie', 'Eve', 'Frank', 'Bob'}`

Use the general addition rule:

$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$

Where:

- $P(A\cup B)$ is the probability of engaging with at least one campaign

- $P(A)$ and $P(B)$ are probabilities of engaging with each campaign individually

- $P(A\cap B)$ is the probability of engaging with both campaigns

Write a Python function `probability_union(set1, set2, total_population)` to calculate the probability that a randomly selected user from this test group engaged with at least one of the campaigns.

In [None]:
# Users who engaged with each campaign
likes_a = {'Alice', 'Bob', 'Charlie', 'David'}
likes_b = {'Charlie', 'Eve', 'Frank', 'Bob'}

---

### Solution 1: The General Addition Rule

#### **1. The Concept: Union of Events**
This problem deals with the **Union** of two sets. In probability, the union ($A \cup B$) represents the event that **either** A occurs, **or** B occurs, **or both** occur.

However, we cannot simply add the probability of A and B together. If we did ($P(A) + P(B)$), we would double-count the people who did both. The **Inclusion-Exclusion Principle** corrects this by subtracting the overlap.
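
The double-counting can be seen directly with exact fractions. The following is a minimal sketch, assuming the two sets and the group size of 6 from the quiz statement; the variable names are illustrative and not part of the original solution.

```python
from fractions import Fraction

# Sets copied from the quiz statement; 6 is the stated size of the test group.
campaign_a = {'Alice', 'Bob', 'Charlie', 'David'}
campaign_b = {'Charlie', 'Eve', 'Frank', 'Bob'}
group_size = 6

p_a = Fraction(len(campaign_a), group_size)
p_b = Fraction(len(campaign_b), group_size)
p_both = Fraction(len(campaign_a & campaign_b), group_size)

naive_sum = p_a + p_b            # counts Bob and Charlie twice
corrected = p_a + p_b - p_both   # inclusion-exclusion removes the overlap
direct = Fraction(len(campaign_a | campaign_b), group_size)  # count the union directly

print(naive_sum)   # 4/3, which is not a valid probability
print(corrected)   # 1
print(direct)      # 1
```

The naive sum exceeds 1, which signals that some users were counted twice; the corrected value agrees with counting the union directly.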

#### **2. The Formula**
$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$

#### **3. Why is this the correct approach?**
The question asks for the probability of a user engaging with **"at least one"** campaign. The phrase "at least one" is the hallmark of a Union problem. Since some users (Bob, Charlie) are in both lists, the events are **not mutually exclusive**, so the subtraction term $P(A \cap B)$ is required.

#### **4. Probability Math**
* **Total Population ($N$):** 6
* **Event A (Engaged A):** {Alice, Bob, Charlie, David} $\rightarrow Count = 4$
    * $P(A) = \frac{4}{6}$
* **Event B (Engaged B):** {Charlie, Eve, Frank, Bob} $\rightarrow Count = 4$
    * $P(B) = \frac{4}{6}$
* **Intersection $A \cap B$ (Engaged Both):** {Bob, Charlie} $\rightarrow Count = 2$
    * $P(A \cap B) = \frac{2}{6}$

**Calculation:**
$$
P(A \cup B) = \frac{4}{6} + \frac{4}{6} - \frac{2}{6} = \frac{6}{6} = 1.0
$$

**Answer:** 100% of the users engaged with at least one campaign.

In [None]:
def probability_union(set1, set2, total_population):
    # Calculate individual probabilities
    p_a = len(set1) / total_population
    p_b = len(set2) / total_population

    # Calculate intersection probability (Overlaps)
    intersection_count = len(set1.intersection(set2))
    p_intersection = intersection_count / total_population

    # Apply Formula: P(A) + P(B) - P(Intersection)
    p_union = p_a + p_b - p_intersection
    return p_union

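# Test: the population is the full test group of 6 users (given in the problem),
# not the size of the union, which would always yield a probability of 1.
total_users = 6
print(f"P(A U B): {probability_union(likes_a, likes_b, total_users):.2f}")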

P(A U B): 1.00


In [None]:
len(likes_a.union(likes_b))

6

In [None]:
likes_a.intersection(likes_b)

{'Bob', 'Charlie'}

### Quiz 2
A weather analytics company reports the following forecast for a city:

- There is a $40$% chance of rain tomorrow.

- If it rains, the chance of thunder is $60$%.

- If it doesn't rain, the chance of thunder is just $5$% (e.g., due to dry thunderstorms).

Write a Python function `weather_thunder_probability(p_rain, p_thunder_given_rain, p_thunder_given_no_rain)` that calculates:

1. The probability of both rain and thunder using the multiplication rule:

$$
P(A \cap B) = P(A) \times P(B \mid A)
$$

Where:

- $P(A \cap B)$: Probability of both A and B occurring  
- $P(A)$: Probability of A  
- $P(B \mid A)$: Probability of B given A

2. The probability of thunder overall, whether or not it rains, using the law of total probability:

$$
P(B) = P(A) \times P(B \mid A) + (1 - P(A)) \times P(B \mid A')
$$

Where:

- $P(B)$: Overall probability of B  
- $P(A)$: Probability of A  
- $1 - P(A)$: Probability of A not occurring (i.e., $P(A')$)  
- $P(B \mid A)$: Probability of B given A  
- $P(B \mid A')$: Probability of B given not A

Return both results rounded to two decimal places.

---

### Solution 2: Multiplication Rule & Law of Total Probability

#### **1. The Concept: Total Probability**
This problem explores how to find the probability of a downstream event (Thunder) that depends on an upstream event (Rain). Because Thunder can happen in two distinct scenarios (during Rain OR during No Rain), we must calculate the probability of both scenarios separately and sum them up. This is the **Law of Total Probability**.

#### **2. The Formulas**
* **Multiplication Rule (for one path):** $P(A \cap B) = P(A) \times P(B \mid A)$
* **Law of Total Probability (sum of paths):** $P(B) = P(A \cap B) + P(A' \cap B)$

#### **3. Why is this the correct approach?**
The problem gives us conditional probabilities ($P(\text{Thunder} \mid \text{Rain})$). We cannot simply say "Thunder is 60%" because that 60% only applies *if* it rains. We must weight that 60% by the 40% chance that it actually rains.

#### **4. Probability Math**
**Given:**
* $P(Rain) = 0.40$
* $P(\text{No Rain}) = 1 - 0.40 = 0.60$
* $P(Thunder \mid Rain) = 0.60$
* $P(Thunder \mid \text{No Rain}) = 0.05$

**Step A: Intersection (Rain AND Thunder)**
$$
P(Rain \cap Thunder) = 0.40 \times 0.60 = 0.24
$$

**Step B: Intersection (No Rain AND Thunder)**
$$
P(\text{No Rain} \cap Thunder) = 0.60 \times 0.05 = 0.03
$$

**Step C: Total Probability (Overall Thunder)**
$$
P(Thunder) = 0.24 + 0.03 = 0.27
$$
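
As a sanity check on the arithmetic above, the two-stage process can be simulated. This is a rough sketch rather than part of the original solution: the function name `simulate_thunder`, the trial count, and the fixed seed are all assumptions chosen for illustration.

```python
import random

def simulate_thunder(p_rain, p_thunder_given_rain, p_thunder_given_no_rain,
                     trials=100_000, seed=0):
    """Estimate P(Rain and Thunder) and P(Thunder) by repeated sampling."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rain_and_thunder = 0
    thunder = 0
    for _ in range(trials):
        # Stage 1: does it rain?
        rains = rng.random() < p_rain
        # Stage 2: thunder, with a probability that depends on stage 1
        p_thunder = p_thunder_given_rain if rains else p_thunder_given_no_rain
        if rng.random() < p_thunder:
            thunder += 1
            if rains:
                rain_and_thunder += 1
    return rain_and_thunder / trials, thunder / trials

est_joint, est_total = simulate_thunder(0.40, 0.60, 0.05)
print(f"P(Rain and Thunder) ~ {est_joint:.3f}, P(Thunder) ~ {est_total:.3f}")
```

The estimates should land close to the analytic values of 0.24 and 0.27, with small differences due to sampling noise.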

In [None]:
def weather_thunder_probability(p_rain, p_thunder_given_rain, p_thunder_given_no_rain):
    # 1. Path 1: Rain AND Thunder
    p_rain_and_thunder = p_rain * p_thunder_given_rain

    # 2. Path 2: No Rain AND Thunder
    p_no_rain = 1 - p_rain
    p_no_rain_and_thunder = p_no_rain * p_thunder_given_no_rain

    # 3. Total Probability: Sum of both paths
    p_thunder_total = p_rain_and_thunder + p_no_rain_and_thunder

    return round(p_rain_and_thunder, 2), round(p_thunder_total, 2)

# Test
print(weather_thunder_probability(0.40, 0.60, 0.05))

### Quiz 3
You are working with a customer service analytics team that reviews user interactions logged as tuples of the form `(issue_type, channel)`. The company wants to analyse event relationships from this dataset:

```
interactions = [
    ('Billing', 'Email'), ('Technical', 'Phone'),
    ('Billing', 'Phone'), ('General', 'Chat'),
    ('Technical', 'Chat'), ('General', 'Phone'),
    ('Billing', 'Chat'), ('Technical', 'Email'),
    ('Billing', 'Phone'), ('General', 'Email')
]
```
The team is investigating whether there’s a meaningful relationship between:

- Users who report Billing issues

- Users who contact via the Phone channel

Write a function `check_billing_phone_overlap(data)` that:

- Determines whether any record matches both Billing issue and Phone channel

- Compares the actual overlap with the expected overlap (product of individual proportions)

- Returns whether:

  - The events are disjoint (no record matches both):
$$
P(A \cap B) = 0
$$

This means:
- The occurrence of one event **excludes** the occurrence of the other.
- The events have **no overlap** in the sample space.

  - The overlap is what you’d expect if the events were unrelated (i.e., independent):
$$
P(A \cap B) = P(A) \times P(B)
$$

This means:
- The probability of both events occurring equals the product of their individual probabilities.
- The events can **still overlap**, but their co-occurrence is purely by chance, not influence.

In [None]:
# Example interaction: (issue_type, channel)
interactions = [
    ('Billing', 'Email'), ('Technical', 'Phone'),
    ('Billing', 'Phone'), ('General', 'Chat'),
    ('Technical', 'Chat'), ('General', 'Phone'),
    ('Billing', 'Chat'), ('Technical', 'Email'),
    ('Billing', 'Phone'), ('General', 'Email')
]

---

### Solution 3: Disjoint vs. Independent Events

#### **1. The Concept: Independence**
This problem asks us to determine if the relationship between "Billing" issues and "Phone" usage is random or correlated.
* **Disjoint (Mutually Exclusive):** They never happen together ($P(A \cap B) = 0$).
* **Independent:** They happen together exactly as often as random chance predicts ($P(A \cap B) = P(A) \times P(B)$).
* **Dependent:** They happen together *more* or *less* often than random chance predicts.
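
The three cases above can be expressed as a small decision rule. This is a sketch under stated assumptions: the helper `classify_relationship` and its tolerance are hypothetical and only illustrate the definitions; they are not the function the quiz asks for.

```python
import math

def classify_relationship(p_a, p_b, p_joint, tol=1e-9):
    """Label two events as 'disjoint', 'independent', or 'dependent'."""
    # Disjoint: the events never occur together.
    if math.isclose(p_joint, 0.0, abs_tol=tol):
        return 'disjoint'
    # Independent: the joint probability equals the product of the marginals.
    if math.isclose(p_joint, p_a * p_b, abs_tol=tol):
        return 'independent'
    # Otherwise the events co-occur more or less often than chance predicts.
    return 'dependent'

print(classify_relationship(0.5, 0.5, 0.25))  # independent
print(classify_relationship(0.3, 0.2, 0.0))   # disjoint
print(classify_relationship(0.5, 0.5, 0.40))  # dependent
```

The disjoint check runs first, so an event with probability zero (which satisfies both definitions) is reported as disjoint. Floating-point values are compared with a tolerance rather than `==`.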

#### **2. Why is this the correct approach?**
The problem asks us to compare the **actual** intersection (what really happened) with the **expected** intersection (what would happen if the variables were totally unconnected). This comparison is the standard test for statistical independence.

#### **3. Probability Math**
**Total Records:** 10

* **Step A: Marginal Probabilities**
    * Count(Billing) = 4 $\rightarrow P(Billing) = 0.4$
    * Count(Phone) = 4 $\rightarrow P(Phone) = 0.4$

* **Step B: Actual Joint Probability**
    * Count(Billing AND Phone) = 2
    * $P(Actual) = \frac{2}{10} = 0.2$

* **Step C: Expected Joint Probability (if Independent)**
    * $P(Expected) = P(Billing) \times P(Phone)$
    * $P(Expected) = 0.4 \times 0.4 = 0.16$

**Conclusion:**
Since $0.2 \neq 0$, they are **Not Disjoint**.
Since $0.2 \neq 0.16$, they are **Not Independent** (Dependent). In this sample, Billing issues co-occur with the Phone channel more often than independence would predict.

In [None]:
def check_billing_phone_overlap(data):
    total = len(data)

    # 1. Marginal Probabilities
    p_billing = sum(1 for x in data if x[0] == 'Billing') / total
    p_phone = sum(1 for x in data if x[1] == 'Phone') / total

    # 2. Actual Joint Probability
    p_actual_joint = sum(1 for x in data if x[0] == 'Billing' and x[1] == 'Phone') / total

    # 3. Expected Joint Probability (Independence Benchmark)
    p_expected_joint = p_billing * p_phone

    print(f"Actual Joint Prob: {p_actual_joint}")
    print(f"Expected Joint Prob (if indep): {p_expected_joint:.2f}")

    return {
        'is_disjoint': p_actual_joint == 0,
        'is_independent': abs(p_actual_joint - p_expected_joint) < 0.001
    }

print(check_billing_phone_overlap(interactions))

### Quiz 4
A retail company is analysing its transaction data to understand the distribution of sales across different regions. Each transaction is stored as a tuple in the format: `(region, product_category)`

From a dataset of 10 recent transactions, the company wants to calculate the marginal probability that a randomly selected transaction came from the North region.

$$
P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}
$$

Where:

- $P(A)$: Probability of a transaction being from the "North" region

- Favorable outcomes: Transactions from "North"

- Total outcomes: All transactions in the dataset

Write a Python function `marginal_probability(data, condition_type, condition_value)` to calculate the marginal probability based on a given condition (e.g., region = `'North'`).

**Hint:**

Marginal probability is the probability of a single event occurring, irrespective of other variables.

In [None]:
transactions = [
    ('North', 'Electronics'), ('South', 'Clothing'),
    ('North', 'Clothing'), ('East', 'Electronics'),
    ('South', 'Electronics'), ('West', 'Clothing'),
    ('North', 'Clothing'), ('West', 'Electronics'),
    ('South', 'Electronics'), ('East', 'Clothing')
]

---

### Solution 4: Marginal Probability

#### **1. The Concept: Marginal Probability**
Marginal probability is the probability of an event occurring **irrespective of other variables**. Imagine a table with rows for "Regions" and columns for "Products". The totals are written in the **margins** of the table.

#### **2. The Formula**
$$
P(A) = \frac{\text{Count}(A)}{\text{Total Population}}
$$
In more formal terms, if we had multiple variables, we would sum the joint probabilities over all possible values of the other variable: $P(X = x) = \sum_{y} P(X = x, Y = y)$.
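
The summation form can be illustrated on the quiz data by first building the joint distribution and then summing out the product category. This is a minimal sketch; the names `joint` and `marginal_region` are illustrative and not part of the original solution.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Transactions copied from the quiz: (region, product_category)
transactions = [
    ('North', 'Electronics'), ('South', 'Clothing'),
    ('North', 'Clothing'), ('East', 'Electronics'),
    ('South', 'Electronics'), ('West', 'Clothing'),
    ('North', 'Clothing'), ('West', 'Electronics'),
    ('South', 'Electronics'), ('East', 'Clothing')
]

n = len(transactions)

# Joint distribution P(region, category) as exact fractions
joint = {pair: Fraction(count, n) for pair, count in Counter(transactions).items()}

# Marginalize: sum the joint probabilities over every product category
marginal_region = defaultdict(Fraction)
for (region, _category), p in joint.items():
    marginal_region[region] += p

print(marginal_region['North'])  # 3/10
```

Summing over the categories gives the same value as counting North transactions directly, which is why the simpler counting formula suffices here.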

#### **3. Why is this a Marginal Probability problem?**
The question asks for the probability of a transaction being from the **North** region. It **does not care** if the transaction was for Electronics or Clothing. It requires us to "marginalize out" (ignore) the product category and look only at the region total.

#### **4. Probability Math**
* **Total Outcomes ($N$):** 10 transactions.
* **Favorable Outcomes ($A$):** Transactions where Region == 'North'.
    1. ('North', 'Electronics')
    2. ('North', 'Clothing')
    3. ('North', 'Clothing')
    * Count = 3

**Calculation:**
$$
P(North) = \frac{3}{10} = 0.3
$$

In [None]:
def marginal_probability(data, index, value):
    total = len(data)
    # We sum up the count irrespective of the other items in the tuple
    count = sum(1 for item in data if item[index] == value)
    return count / total

# Index 0 is Region, Value is 'North'
print(f"Marginal Probability (North): {marginal_probability(transactions, 0, 'North')}")

### Quiz 5

A hospital's data team is analysing patient visits to understand how different age groups use various services. Each patient record is stored as a tuple: `(age_group, visit_type)`

They want to calculate the joint probability that a randomly selected patient is both a Senior and visited the Emergency department.

$$
P(A \cap B) = \frac{\text{Number of outcomes where both A and B occur}}{\text{Total number of outcomes}}
$$

Where:

- $P(A\cap B)$: Probability that both conditions (Senior and Emergency) are satisfied

- **Numerator**: Number of such records

- **Denominator**: Total number of records

Write a Python function `joint_probability(data, value1, value2)` to compute the joint probability from the dataset of `(age_group, visit_type)`.



In [None]:
# Hospital visit records: (age_group, visit_type)
hospital_data = [
    ('Senior', 'Emergency'), ('Adult', 'Routine'),
    ('Adult', 'Emergency'), ('Child', 'Routine'),
    ('Senior', 'Routine'), ('Adult', 'Emergency'),
    ('Child', 'Emergency'), ('Senior', 'Emergency')
]

---

### Solution 5: Joint Probability

#### **1. The Concept: Joint Probability**
Joint probability measures the likelihood of two events happening **at the same time** (simultaneously). In set theory, this corresponds to the **Intersection** ($A \cap B$).

#### **2. The Formula**
$$
P(A \cap B) = \frac{\text{Count}(A \text{ AND } B)}{\text{Total Population}}
$$

#### **3. Why is this a Joint Probability problem?**
The question uses the keyword **"Both"** and asks for "Senior" AND "Emergency". It is not asking "Given that they are a Senior..." (Conditional) nor is it asking "What is the chance of being a Senior regardless of visit type" (Marginal). It requires the specific combination of both attributes.

#### **4. Probability Math**
* **Total Records:** 8
* **Condition:** Age='Senior' AND Visit='Emergency'.
* **Scanning Data:**
    1. ('Senior', 'Emergency') $\rightarrow$ Yes
    2. ('Adult', 'Routine') $\rightarrow$ No
    3. ('Adult', 'Emergency') $\rightarrow$ No
    4. ('Child', 'Routine') $\rightarrow$ No
    5. ('Senior', 'Routine') $\rightarrow$ No (Matches Senior, but not Emergency)
    6. ('Adult', 'Emergency') $\rightarrow$ No
    7. ('Child', 'Emergency') $\rightarrow$ No
    8. ('Senior', 'Emergency') $\rightarrow$ Yes
    * **Favorable Count:** 2

**Calculation:**
$$
P(Senior \cap Emergency) = \frac{2}{8} = 0.25
$$
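
The same joint probability can also be reached through the multiplication rule, $P(A \cap B) = P(A) \times P(B \mid A)$. The sketch below checks that both routes agree on the quiz data; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Hospital visit records copied from the quiz: (age_group, visit_type)
hospital_data = [
    ('Senior', 'Emergency'), ('Adult', 'Routine'),
    ('Adult', 'Emergency'), ('Child', 'Routine'),
    ('Senior', 'Routine'), ('Adult', 'Emergency'),
    ('Child', 'Emergency'), ('Senior', 'Emergency')
]

n = len(hospital_data)
seniors = [r for r in hospital_data if r[0] == 'Senior']
senior_emergency = [r for r in seniors if r[1] == 'Emergency']

p_senior = Fraction(len(seniors), n)                                       # 3/8
p_emergency_given_senior = Fraction(len(senior_emergency), len(seniors))  # 2/3
p_joint_direct = Fraction(len(senior_emergency), n)                       # 1/4

# Multiplication rule: marginal times conditional equals the joint
print(p_senior * p_emergency_given_senior == p_joint_direct)  # True
```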

In [None]:
def joint_probability(data, val1, val2):
    total = len(data)
    # Count where both conditions in the tuple match
    count = sum(1 for x in data if x[0] == val1 and x[1] == val2)
    return count / total

print(f"Joint Probability: {joint_probability(hospital_data, 'Senior', 'Emergency')}")

### Quiz 6
An academic analytics team is studying student performance based on gender. The dataset stores each student record as a tuple: `(gender, passed)`

Where `passed = True` means the student passed, and `False` means the student failed.

The team wants to compute the probability that a student passed given that the student is female.

$$
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
$$
Where:

- $P(B\mid A)$: Probability of event B occurring given that event A has occurred

- $P(A\cap B)$: Joint probability of both A and B occurring

- $P(A)$: Marginal probability of event A

Write a Python function `conditional_probability(data, given_key, given_value, target_key, target_value)` that calculates conditional probabilities for any given condition.

Use it to compute the probability that a student passed given that the student is female.

In [None]:
# Student data: (gender, passed)
student_data = [
    ('Male', True), ('Female', False), ('Female', True),
    ('Male', True), ('Female', True), ('Male', False),
    ('Female', True), ('Male', True)
]

---

### Solution 6: Conditional Probability

#### **1. The Concept: Reducing the Sample Space**
Conditional probability asks: "If we restrict our view to *only* a specific subset of the population (e.g., Females), what is the probability of the event (Passed)?"
Key Phrase: **"Given that..."**

#### **2. The Formula**
$$
P(B \mid A) = \frac{\text{Count}(A \cap B)}{\text{Count}(A)}
$$
Note that the denominator is no longer the Total Population ($N$). It is now the count of the sub-group ($A$).
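
Because the denominator is the size of the conditioning group, swapping the roles of the two events generally changes the answer. The following sketch illustrates this on the quiz data; the helper `conditional` is hypothetical, and raising an error for an empty subgroup is a design choice made here, not something the quiz specifies.

```python
from fractions import Fraction

# Student data copied from the quiz: (gender, passed)
student_data = [
    ('Male', True), ('Female', False), ('Female', True),
    ('Male', True), ('Female', True), ('Male', False),
    ('Female', True), ('Male', True)
]

def conditional(data, given_idx, given_val, target_idx, target_val):
    """Return P(target | given) as an exact fraction."""
    subset = [r for r in data if r[given_idx] == given_val]
    if not subset:
        raise ValueError("conditioning event never occurs in the data")
    hits = sum(1 for r in subset if r[target_idx] == target_val)
    return Fraction(hits, len(subset))

# Denominator = number of female students (4)
print(conditional(student_data, 0, 'Female', 1, True))  # 3/4
# Denominator = number of students who passed (6)
print(conditional(student_data, 1, True, 0, 'Female'))  # 1/2
```

The numerator is the same in both calls (3 female students who passed); only the denominator changes.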

#### **3. Why is this Conditional?**
The question does not ask for the probability that a random student passed. It asks for the probability that a student passed **given** they are female. We must ignore all Male students for this calculation.

#### **4. Probability Math**
* **Identify the Condition ($A$):** Gender = 'Female'.
    * Records matching 'Female': 4
    * *Denominator = 4*
* **Identify the Target within Condition ($A \cap B$):** 'Female' AND 'Passed'.
    * Records matching both: 3
    * *Numerator = 3*

**Calculation:**
$$
P(Passed \mid Female) = \frac{3}{4} = 0.75
$$

In [None]:
def conditional_probability(data, given_idx, given_val, target_idx, target_val):
    # 1. Filter: Reduce the universe to only the 'Given' condition
    subset = [x for x in data if x[given_idx] == given_val]

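    # Guard against division by zero: P(B | A) is undefined when A never occurs,
    # so returning 0 here is a convention rather than a true probability.
    if not subset:
        return 0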

    # 2. Count: How many in this subset match the target?
    target_count = sum(1 for x in subset if x[target_idx] == target_val)

    # 3. Calculate Ratio
    return target_count / len(subset)

# Index 0 is Gender (Given 'Female'), Index 1 is Passed (Target True)
print(f"P(Passed | Female): {conditional_probability(student_data, 0, 'Female', 1, True)}")

### Quiz 7
A quality control inspector at a manufacturing plant selects two items from a batch of 100 products for inspection. The batch contains:

1. 10 defective items

2. 90 non-defective items

- The inspector draws one item at random, does not replace it, and then draws a second item.

- The team wants to compute the probability that the first item is defective and the second is non-defective.

$$P(A\cap B)=P(A)\times P(B\mid A)$$

Where:

- $P(A)$: Probability that the first item is defective

- $P(B\mid A)$: Probability that the second item is non-defective, given the first was defective

- $P(A\cap B)$: Probability that the first item is defective and the second is non-defective

Write a function `compound_event_probability(total, defective, non_defective)` to calculate the compound probability using the given conditions (without replacement).

---

### Solution 7: Compound Probability (Dependent Events)

#### **1. The Concept: Sampling Without Replacement**
This problem deals with **Dependent Events**. When you draw a card from a deck (or a product from a batch) and *do not put it back*, the probabilities for the next draw change. The denominator decreases, and the numerator might change depending on what was picked first.

#### **2. The Formula**
$$
P(First \cap Second) = P(First) \times P(Second \mid First)
$$

#### **3. Why is this Dependent?**
Since the inspector "does not replace" the first item, the total number of items drops from 100 to 99 for the second draw. The probability of the second event therefore depends on the outcome of the first: had the first item been non-defective, only 89 non-defective items would remain instead of 90.

#### **4. Probability Math**
* **Start:** 10 Defective, 90 Non-Defective. Total = 100.

* **Event A (First is Defective):**
    * $P(A) = \frac{10}{100} = 0.1$

* **Event B (Second is Non-Defective, GIVEN First was Defective):**
    * Items remaining: 99.
    * Non-defective items remaining: 90 (since we picked a defective one first).
    * $P(B \mid A) = \frac{90}{99} \approx 0.909$

* **Compound Probability:**
    $$
    P(A \cap B) = \frac{10}{100} \times \frac{90}{99} = \frac{1}{10} \times \frac{10}{11} = \frac{1}{11}
    $$
    $$
    P(A \cap B) \approx 0.0909
    $$
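
The result can be cross-checked by brute force, since every ordered pair of distinct items is equally likely. This is an illustrative sketch; labelling items 0 to 9 as the defective ones is an arbitrary assumption that does not affect the count.

```python
from fractions import Fraction
from itertools import permutations

# 100 items; by assumption the first 10 labels are the defective ones.
batch = ['D'] * 10 + ['N'] * 90

# All ordered draws of two distinct items (sampling without replacement)
ordered_pairs = list(permutations(range(len(batch)), 2))

# Draws where the first item is defective and the second is non-defective
favorable = sum(1 for first, second in ordered_pairs
                if batch[first] == 'D' and batch[second] == 'N')

print(len(ordered_pairs))                       # 9900
print(favorable)                                # 900
print(Fraction(favorable, len(ordered_pairs)))  # 1/11
```

The enumeration gives $\frac{900}{9900} = \frac{1}{11}$, matching the multiplication rule.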

In [None]:
def compound_event_probability(total, defective, non_defective):
    # 1. P(A): Probability first is defective
    p_first_defective = defective / total

    # 2. P(B | A): Probability second is non-defective given first was defective
    # Total reduced by 1 because of 'without replacement'
    total_remaining = total - 1
    # Non-defectives count is unchanged
    p_second_non_defective = non_defective / total_remaining

    # 3. Multiply
    return p_first_defective * p_second_non_defective

print(f"Compound Probability: {compound_event_probability(100, 10, 90):.4f}")

### Quiz 8
A healthcare team is analysing responses from a health behavior survey. Each entry includes a person’s gender and whether they are a smoker.

From the data, they want to understand patterns in smoking behavior and answer the following:

- What proportion of the survey participants are smokers?

- What proportion of the participants are female and also smokers?

- Among female participants, what proportion are smokers?

1. Probability of a Specific Condition

$$
P(A) = \frac{\text{Number of individuals with A}}{\text{Total number of individuals}}
$$

2. Probability of Two Conditions Happening Together

$$
P(A \cap B) = \frac{\text{Number of individuals with both A and B}}{\text{Total number of individuals}}
$$

3. Probability of One Condition Among a Group

$$
P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{\text{Number with both A and B}}{\text{Number with A}}
$$

Write three Python functions and round the results to $2$ decimal places:

- `get_marginal(data, key, value)`

- `get_joint(data, key1, val1, key2, val2)`

- `get_conditional(data, given_key, given_val, target_key, target_val)`

In [None]:
# Survey dataset
survey_data = [
    {'gender': 'Male', 'smoker': True},
    {'gender': 'Female', 'smoker': False},
    {'gender': 'Female', 'smoker': True},
    {'gender': 'Male', 'smoker': True},
    {'gender': 'Female', 'smoker': True}
]

---

### Solution 8: Review of Probability Types

#### **1. The Concepts**
This quiz reviews the three pillars of basic probability:
1.  **Marginal:** One event, Total Population. (e.g., "What proportion are smokers?")
2.  **Joint:** Two events, Total Population. (e.g., "What proportion are female smokers?")
3.  **Conditional:** One event, Sub-group Population. (e.g., "Among females, what proportion are smokers?")

#### **2. Probability Math**
**Data** as (gender, smoker), with M/F for gender and T/F for smoker:
1. (M, T)
2. (F, F)
3. (F, T)
4. (M, T)
5. (F, T)

Total $N = 5$.

* **Part 1: Marginal (Smoker = True)**
    * Count(True) = 4 (Rows 1, 3, 4, 5)
    * $P(S) = \frac{4}{5} = 0.80$

* **Part 2: Joint (Female AND Smoker)**
    * Count(F and T) = 2 (Rows 3, 5)
    * $P(F \cap S) = \frac{2}{5} = 0.40$

* **Part 3: Conditional (Smoker | Female)**
    * Condition: Female. Subset = {Row 2, Row 3, Row 5}. Count = 3.
    * Target in Subset: Smoker. Count = 2 (Rows 3, 5).
    * $P(S \mid F) = \frac{2}{3} \approx 0.67$

In [None]:
def get_marginal(data, key, value):
    count = sum(1 for x in data if x[key] == value)
    return round(count / len(data), 2)

def get_joint(data, key1, val1, key2, val2):
    count = sum(1 for x in data if x[key1] == val1 and x[key2] == val2)
    return round(count / len(data), 2)

def get_conditional(data, given_key, given_val, target_key, target_val):
    subset = [x for x in data if x[given_key] == given_val]
    if not subset:
        return 0.0
    count = sum(1 for x in subset if x[target_key] == target_val)
    return round(count / len(subset), 2)

print(f"Marginal: {get_marginal(survey_data, 'smoker', True)}")
print(f"Joint: {get_joint(survey_data, 'gender', 'Female', 'smoker', True)}")
print(f"Conditional: {get_conditional(survey_data, 'gender', 'Female', 'smoker', True)}")