In [1]:
import pandas as pd
import numpy as np

### CASE STUDY 1: Customer Purchases

> Concept: Basic Probability, Conditional Probability, Independence

> Scenario: You work for an e-commerce company. You have data on customers’ gender, whether they saw an ad, and whether they made a purchase.

> **Practice Questions:**

**1. Easy:** Find the probability a customer made a purchase.

**2. Intermediate:** Find $P(\text{Purchased} \mid \text{Saw Ad})$.

**3. Hard:** Check if ‘seeing the ad’ and ‘purchasing’ are independent events.

**4. Advanced:** Given that a randomly selected customer saw the ad, what is the probability that the customer is female and made a purchase?  
*Express as* $P(\text{Female} \cap \text{Purchased} \mid \text{Saw Ad})$.

**5. Advanced:** Among customers who did **not** see the ad, what is the probability that they made a purchase?  
*Express as* $P(\text{Purchased} \mid \text{Did Not See Ad})$.

**6. Challenge:** What is the odds ratio of purchasing for customers who saw the ad versus those who did not see the ad?  
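*Hint:* $\text{Odds ratio} = \dfrac{P(\text{Purchased} \mid \text{Saw Ad}) \,/\, \bigl(1 - P(\text{Purchased} \mid \text{Saw Ad})\bigr)}{P(\text{Purchased} \mid \text{Did Not See Ad}) \,/\, \bigl(1 - P(\text{Purchased} \mid \text{Did Not See Ad})\bigr)}$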

**7. Challenge:** If two customers are selected at random (with replacement), what is the probability that both are male and both made a purchase?

**8. Expert:** Using a chi-squared test, determine if there is a statistically significant association between seeing the ad and making a purchase.

In [2]:
customer_purchase_data = pd.read_excel("C:/Users/leste/Downloads/Probability_Case_Studies.xlsx", sheet_name="Customer_Purchases")

In [3]:
customer_purchase_data

Unnamed: 0,Customer_ID,Gender,Saw_Ad,Purchased
0,1,Male,Yes,Yes
1,2,Female,No,No
2,3,Male,Yes,No
3,4,Male,Yes,No
4,5,Male,No,No
...,...,...,...,...
195,196,Female,Yes,No
196,197,Female,No,No
197,198,Female,Yes,Yes
198,199,Male,No,No


> Solution 1

In [4]:
customer_purchase_data['Purchased'].value_counts()

No     141
Yes     59
Name: Purchased, dtype: int64

In [5]:
Customer_purchased = customer_purchase_data.query('Purchased == "Yes"')['Purchased'].count()
Customer_purchased

59

In [6]:
Total_customer_count = customer_purchase_data['Customer_ID'].count()
Total_customer_count

200

In [7]:
Probability_of_purchase = Customer_purchased / Total_customer_count
Probability_of_purchase

0.295
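The same probability can be read off in one step, because the mean of a boolean Series is the fraction of `True` values. A minimal sketch, assuming a stand-in column rebuilt from the `value_counts()` output above (the variable names `purchased` and `p_purchase` are illustrative, not from the notebook):

```python
import pandas as pd

# Stand-in column rebuilt from the value_counts() output above (59 "Yes", 141 "No").
# In the notebook itself this would be customer_purchase_data['Purchased'].
purchased = pd.Series(["Yes"] * 59 + ["No"] * 141, name="Purchased")

# The mean of a boolean Series is the share of True values, i.e. P(Purchased).
p_purchase = (purchased == "Yes").mean()
print(p_purchase)
```

This avoids computing the numerator and denominator in separate cells, at the cost of being slightly less explicit about the counts.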

> Solution 2

In [8]:
Customer_saw_ad_purchase = customer_purchase_data.query('Saw_Ad == "Yes" and Purchased == "Yes"')['Purchased'].count()
Customer_saw_ad_purchase

37

In [9]:
Customer_saw_ad = customer_purchase_data.query('Saw_Ad == "Yes"')['Saw_Ad'].count()
Customer_saw_ad

114

In [10]:
Probability_of_saw_ad_purchase = Customer_saw_ad_purchase / Customer_saw_ad
Probability_of_saw_ad_purchase

0.32456140350877194
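A row-normalized cross-tabulation gives every conditional probability of `Purchased` given `Saw_Ad` at once. The sketch below assumes a stand-in frame rebuilt from the counts printed in this notebook (114 saw the ad and 37 of them purchased; 86 did not see it and 22 of them purchased); the names `rows`, `df`, and `cond` are illustrative.

```python
import pandas as pd

# Stand-in data rebuilt from the notebook's printed counts (an assumption,
# not the original spreadsheet): (Saw_Ad, Purchased) pairs.
rows = (
    [("Yes", "Yes")] * 37 + [("Yes", "No")] * 77
    + [("No", "Yes")] * 22 + [("No", "No")] * 64
)
df = pd.DataFrame(rows, columns=["Saw_Ad", "Purchased"])

# normalize="index" divides each row by its row total, so each row holds
# P(Purchased = column label | Saw_Ad = row label).
cond = pd.crosstab(df["Saw_Ad"], df["Purchased"], normalize="index")
p_purchase_given_ad = cond.loc["Yes", "Yes"]
print(cond)
```

With the real data, `df` would simply be `customer_purchase_data`.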

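> Solution 3

1. Compute the product of the marginal probabilities, $P(\text{Saw Ad}) \cdot P(\text{Purchased})$.

2. Compute the probability of both events happening together, $P(\text{Saw Ad} \cap \text{Purchased})$.

> If 1 and 2 are equal, the events are independent. Equivalently, $P(\text{Purchased} \mid \text{Saw Ad}) = P(\text{Purchased})$, which is the comparison the cells below make.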

In [11]:
Probability_of_saw_ad = Customer_saw_ad / Total_customer_count
Probability_of_saw_ad

0.57

In [12]:
Probability_of_purchase

0.295

In [13]:
Probability_of_saw_ad_purchase

0.32456140350877194

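> The two events are not independent in this sample.

> $P(\text{Purchased}) = 0.295$ is not equal to $P(\text{Purchased} \mid \text{Saw Ad}) \approx 0.325$, so purchases were more frequent among customers who saw the ad. This suggests the ad may have influenced purchasing; whether the gap is statistically significant is the subject of question 8.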
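The product form of the independence check can be sketched directly from the counts printed in this notebook. The variable names are illustrative, and the counts are copied from the outputs above rather than recomputed from the spreadsheet.

```python
# Counts copied from the notebook outputs.
n_total = 200      # all customers
n_saw_ad = 114     # Saw_Ad == "Yes"
n_purchased = 59   # Purchased == "Yes"
n_both = 37        # Saw_Ad == "Yes" and Purchased == "Yes"

p_ad = n_saw_ad / n_total
p_purchase = n_purchased / n_total
p_joint = n_both / n_total

# Under independence the joint probability would equal the product of the marginals.
p_product = p_ad * p_purchase
print(f"P(Saw Ad and Purchased)  = {p_joint:.4f}")
print(f"P(Saw Ad) * P(Purchased) = {p_product:.4f}")
```

The joint probability (about 0.185) is larger than the product (about 0.168), which agrees with the conditional-probability comparison.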

"""
ODDS RATIO — CONCEPTUAL LOGIC

The odds ratio (OR) measures how strongly an exposure or condition 
is associated with an outcome.

It compares the odds of the outcome occurring in one group 
to the odds of it occurring in another group.

------------------------------------------------------------
General 2x2 setup:

                Outcome = Yes     Outcome = No
Group 1 (Exposed)        a               b
Group 2 (Unexposed)      c               d

------------------------------------------------------------
Step 1: Compute odds for each group

    Odds(Group 1) = a / b
    Odds(Group 2) = c / d

Step 2: Compute the odds ratio

    OR = (a/b) / (c/d)
       = (a * d) / (b * c)

------------------------------------------------------------
Interpretation:

- OR = 1  → No association between exposure and outcome
- OR > 1  → Exposure increases the odds of the outcome
- OR < 1  → Exposure decreases the odds of the outcome

------------------------------------------------------------
Key Logic:

1. The "odds" quantify how likely an event is to happen versus not happen
   within each group.

2. The "odds ratio" compares these odds across two groups to determine 
   the strength and direction of association.

3. One condition (the outcome) is held constant, while comparing 
   how group membership affects the odds of that outcome.
"""
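Applied to this case study, the 2x2 setup can be filled in from counts that appear in the notebook outputs (114 saw the ad and 37 of them purchased; 86 did not and 22 of them purchased). This is a sketch under that assumption, with `a`, `b`, `c`, `d` following the table in the note above.

```python
# 2x2 counts derived from the notebook outputs (exposure = saw the ad).
a = 37          # saw ad, purchased
b = 114 - 37    # saw ad, did not purchase
c = 22          # did not see ad, purchased
d = 86 - 22     # did not see ad, did not purchase

odds_ad = a / b        # odds of purchasing among customers who saw the ad
odds_no_ad = c / d     # odds of purchasing among customers who did not

# Cross-product form; algebraically the same as odds_ad / odds_no_ad.
odds_ratio = (a * d) / (b * c)
print(round(odds_ratio, 3))
```

The result is roughly 1.4, so in this sample the odds of purchasing are higher for customers who saw the ad.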


> Solution 4 

In [32]:
Female_and_purchased_also_ad = customer_purchase_data.query('Purchased == "Yes" and Gender == "Female" and Saw_Ad == "Yes"')['Purchased'].count()
Female_and_purchased_also_ad

17

In [33]:
People_didnt_see_ad = customer_purchase_data.query('Saw_Ad == "No"')['Saw_Ad'].count()
People_didnt_see_ad

86

In [34]:
# P(Female and Purchased | Saw Ad): the denominator is the number of customers who saw the ad (total minus those who did not)
Female_and_purchased_also_ad / (Total_customer_count - People_didnt_see_ad)

0.14912280701754385
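Written out, the calculation above is the definition of conditional probability with the counts from this notebook substituted in (17 female purchasers who saw the ad, 114 customers who saw the ad, 200 customers in total):

$$
P(\text{Female} \cap \text{Purchased} \mid \text{Saw Ad})
= \frac{P(\text{Female} \cap \text{Purchased} \cap \text{Saw Ad})}{P(\text{Saw Ad})}
= \frac{17/200}{114/200}
= \frac{17}{114} \approx 0.149
$$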

> Solution 5

In [35]:
Purchased_and_didnt_see_ad = customer_purchase_data.query('Purchased == "Yes" and Saw_Ad == "No"')['Purchased'].count()
Purchased_and_didnt_see_ad

22

In [30]:
People_saw_ad = customer_purchase_data.query('Saw_Ad == "Yes"')['Saw_Ad'].count()
People_saw_ad

114

In [31]:
# P(Purchased | Did Not See Ad): condition on the customers who did not see the ad (total minus those who saw it)
Purchased_and_didnt_see_ad / (Total_customer_count - People_saw_ad)

0.2558139534883721
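As a consistency check, the two conditional probabilities should recombine into the overall purchase probability through the law of total probability. A minimal sketch using the counts printed in this notebook (variable names are illustrative):

```python
# Counts copied from the notebook outputs.
n_total = 200
n_saw_ad, n_no_ad = 114, 86
n_buy_ad, n_buy_no_ad = 37, 22

p_buy_given_ad = n_buy_ad / n_saw_ad          # P(Purchased | Saw Ad)
p_buy_given_no_ad = n_buy_no_ad / n_no_ad     # P(Purchased | Did Not See Ad)

# Law of total probability: weight each conditional by the size of its group.
p_buy = (
    p_buy_given_ad * (n_saw_ad / n_total)
    + p_buy_given_no_ad * (n_no_ad / n_total)
)
print(round(p_buy, 3))
```

The weighted sum should match the value of `Probability_of_purchase` from Solution 1.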