🧪 Chi-Square Test for Independence: Gender vs Product Preference
This project tests whether a customer's gender is associated with their preferred product (Product A or Product B), using the Chi-Square Test for Independence.
________________________________________
📌 Objective
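To determine whether product preference depends on gender. If a significant association is found, businesses could use it to target their marketing at the right audience more effectively. The test compares two hypotheses:
•	Null hypothesis (H0): gender and preferred product are independent.
•	Alternative hypothesis (H1): gender and preferred product are not independent.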
________________________________________
📁 Dataset Description
The dataset contains one row per customer with the following columns:
•	Customer ID: A numeric identifier for each customer
•	Gender: The gender of the customer (Male or Female)
•	Preferred Product: The product they chose (Product A or Product B)
The sample contains 12 customers.


In [5]:
# Step 1: Import necessary libraries
import pandas as pd
import numpy as np
from scipy.stats import chi2

In [13]:
# Step 2: Load the dataset
df = pd.read_csv("product.csv")
df.head()

   Customer ID Gender Preferred Product
0            1   Male         Product A
1            2   Male         Product A
2            3   Male         Product B
3            4   Male         Product A
4            5   Male         Product B
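Because pd.crosstab treats every distinct label as its own category, a stray value such as "male" or "Product A " (with a trailing space) would silently add an extra row or column to the contingency table. A minimal sketch of a label check follows. `check_survey` is a hypothetical helper, the column names are assumed to match the df.head() output, and it runs on the five rows displayed above (inlined as a string so the example is self-contained) rather than on the full product.csv.

```python
import io

import pandas as pd

# The five rows displayed by df.head(); the full product.csv has 12 rows.
CSV_SAMPLE = """Customer ID,Gender,Preferred Product
1,Male,Product A
2,Male,Product A
3,Male,Product B
4,Male,Product A
5,Male,Product B
"""


def check_survey(df):
    """Hypothetical helper: confirm the columns the chi-square steps use
    exist, and return any category labels that are not expected."""
    missing = {"Gender", "Preferred Product"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    bad_gender = set(df["Gender"]) - {"Male", "Female"}
    bad_product = set(df["Preferred Product"]) - {"Product A", "Product B"}
    return bad_gender | bad_product


df = pd.read_csv(io.StringIO(CSV_SAMPLE))
print(check_survey(df))  # an empty set means every label is recognized
```

On the real file, the same call would be `check_survey(pd.read_csv("product.csv"))`, ideally together with a check that `len(df) == 12`.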


In [17]:
# Step 3: Create a contingency table
# Count how many customers fall into each (gender, product) combination
contingency_table = pd.crosstab(df["Gender"], df["Preferred Product"])
contingency_table

Preferred Product  Product A  Product B
Gender                                 
Female                     2          4
Male                       4          2
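Row proportions make the pattern in these counts easier to read. The sketch below rebuilds a per-customer DataFrame from the four counts in the table (a hypothetical reconstruction: row order and customer IDs are not taken from product.csv) and uses the `normalize="index"` option of `pd.crosstab`, which divides each cell by its row total.

```python
import pandas as pd

# Hypothetical reconstruction: one row per customer, matching the observed
# counts (Female: 2 A, 4 B; Male: 4 A, 2 B). Not the original row order.
rows = (
    [("Female", "Product A")] * 2
    + [("Female", "Product B")] * 4
    + [("Male", "Product A")] * 4
    + [("Male", "Product B")] * 2
)
df = pd.DataFrame(rows, columns=["Gender", "Preferred Product"])

# normalize="index" turns each row into shares that sum to 1,
# i.e. the fraction of each gender preferring each product.
proportions = pd.crosstab(df["Gender"], df["Preferred Product"], normalize="index")
print(proportions.round(3))
```

In this sample, two thirds of the women prefer Product B and two thirds of the men prefer Product A. The chi-square test asks whether a split like this could plausibly arise by chance with only 12 customers.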


In [29]:
# Step 4: Calculate row and column totals
row_totals = contingency_table.sum(axis=1)
col_totals = contingency_table.sum(axis=0)
grand_total = contingency_table.values.sum()
print(row_totals)
print(col_totals)
print(grand_total)

Gender
Female    6
Male      6
dtype: int64
Preferred Product
Product A    6
Product B    6
dtype: int64
12
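`pd.crosstab` can also produce the row, column, and grand totals in a single call through its `margins` option. The sketch reuses a hypothetical reconstruction of the data built from the observed counts; it is not the original product.csv.

```python
import pandas as pd

# Hypothetical reconstruction with the same counts as the observed table.
rows = (
    [("Female", "Product A")] * 2
    + [("Female", "Product B")] * 4
    + [("Male", "Product A")] * 4
    + [("Male", "Product B")] * 2
)
df = pd.DataFrame(rows, columns=["Gender", "Preferred Product"])

# margins=True appends a "Total" row and a "Total" column.
table = pd.crosstab(
    df["Gender"], df["Preferred Product"], margins=True, margins_name="Total"
)
print(table)
```

The appended "Total" row and column must be dropped again (for example with `table.iloc[:-1, :-1]`) before the table is used to compute expected frequencies; otherwise the totals would be treated as extra categories.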


In [33]:
# Step 5: Compute expected frequencies manually
expected_frequencies = np.outer(row_totals, col_totals) / grand_total
expected_frequencies

array([[3., 3.],
       [3., 3.]])
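Step 5 applies the standard expected-count formula for a test of independence. Writing R_i for the total of row i, C_j for the total of column j, and N for the grand total:

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
E_{\text{Female},\,A} = \frac{6 \times 6}{12} = 3
```

Because every row and column total here is 6, all four cells share the same expected count of 3. `np.outer(row_totals, col_totals)` forms every product R_i C_j in one call before the division by N.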

In [37]:
# Step 6: Compute the Chi-Square statistic manually
chi_square_statistic = ((contingency_table - expected_frequencies) ** 2 / expected_frequencies).sum().sum()
chi_square_statistic

1.3333333333333333
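Written out with observed counts O_ij and expected counts E_ij = 3, taking the cells in the order Female/A, Female/B, Male/A, Male/B, the statistic from Step 6 is:

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
       = \frac{(2-3)^2}{3} + \frac{(4-3)^2}{3} + \frac{(4-3)^2}{3} + \frac{(2-3)^2}{3}
       = \frac{4}{3} \approx 1.33
```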

In [41]:
# Step 7: Compute p-value manually
dof = (contingency_table.shape[0] - 1) * (contingency_table.shape[1] - 1)
p_value = chi2.sf(chi_square_statistic, df=dof)  # survival function: P(X >= statistic), same as 1 - cdf
p_value

0.24821307898992362
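As a cross-check, SciPy's `chi2_contingency` performs Steps 5 to 7 in one call. By default it applies Yates' continuity correction to tables with one degree of freedom, so `correction=False` is needed to reproduce the manual statistic. The sketch also shows two equivalent ways of obtaining the upper-tail probability.

```python
import math

import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed counts: rows = Female, Male; columns = Product A, Product B.
observed = np.array([[2, 4], [4, 2]])

# correction=False disables Yates' continuity correction, which SciPy
# applies by default when the table has one degree of freedom.
stat, p, dof, expected = chi2_contingency(observed, correction=False)
print(stat, p, dof)  # same values as the manual Steps 5-7
print(expected)

# The survival function returns the upper tail directly and avoids the
# precision loss of 1 - cdf when the p-value is very small.
p_sf = chi2.sf(stat, df=dof)

# With one degree of freedom, chi-square is the square of a standard
# normal variable, so the upper tail also equals erfc(sqrt(x / 2)).
p_erfc = math.erfc(math.sqrt(stat / 2))

# SciPy's default (Yates-corrected) result, for comparison.
stat_yates, p_yates, _, _ = chi2_contingency(observed)
print(p_sf, p_erfc, stat_yates, p_yates)
```

With the correction left on, the statistic shrinks to 1/3 and the p-value grows, so the corrected test is even further from significance.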

In [43]:
# Step 8: Display observed and expected frequencies
print("Observed Frequency Table:")
print(contingency_table)

print("\nExpected Frequencies (Calculated Manually):")
expected_df = pd.DataFrame(expected_frequencies, columns=contingency_table.columns, index=contingency_table.index)
print(expected_df)

# Step 9: Display the test statistic, p-value, and degrees of freedom
print("\nManual Chi-square Statistic:", round(chi_square_statistic, 2))
print("P-value (Calculated Manually):", round(p_value, 4))
print("Degrees of Freedom:", dof)

Observed Frequency Table:
Preferred Product  Product A  Product B
Gender                                 
Female                     2          4
Male                       4          2

Expected Frequencies (Calculated Manually):
Preferred Product  Product A  Product B
Gender                                 
Female                   3.0        3.0
Male                     3.0        3.0

Manual Chi-square Statistic: 1.33
P-value (Calculated Manually): 0.2482
Degrees of Freedom: 1
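The chi-square p-value is a large-sample approximation. A widely used rule of thumb asks for an expected count of at least 5 in every cell, and here all four expected counts are 3. Fisher's exact test is a common alternative for small 2x2 tables; it is not part of the original analysis, and the sketch below is offered only as a robustness check.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Observed counts: rows = Female, Male; columns = Product A, Product B.
observed = np.array([[2, 4], [4, 2]])

# Count cells whose expected frequency falls below the rule-of-thumb 5.
_, _, _, expected = chi2_contingency(observed, correction=False)
small_cells = int((expected < 5).sum())
print("cells with expected count < 5:", small_cells)

# Fisher's exact test uses the exact (hypergeometric) distribution of
# 2x2 tables with these margins instead of the chi-square approximation.
odds_ratio, p_exact = fisher_exact(observed, alternative="two-sided")
print("odds ratio:", odds_ratio, "exact p-value:", round(p_exact, 4))
```

For this table the exact two-sided p-value is 524/924, about 0.567, well above 0.05, so the conclusion in Step 10 does not change.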


In [45]:
# Step 10: Interpretation at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("\nWe reject the null hypothesis. Product preference is associated with gender.")
else:
    print("\nWe fail to reject the null hypothesis. The data show no statistically significant association between gender and product preference.")


We fail to reject the null hypothesis. The data show no statistically significant association between gender and product preference.
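Failing to reject the null hypothesis is not evidence that gender and product preference are unrelated: with only 12 customers, the test has limited power to detect a moderate association. An effect-size measure such as Cramér's V (not computed in the original analysis) describes how strong the observed association is, alongside the p-value. A minimal sketch:

```python
import math

import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = Female, Male; columns = Product A, Product B.
observed = np.array([[2, 4], [4, 2]])
stat, p, dof, _ = chi2_contingency(observed, correction=False)

# Cramér's V = sqrt(chi2 / (N * (min(rows, cols) - 1))), ranging from 0
# (no association) to 1. For a 2x2 table it equals |phi|.
n = observed.sum()
k = min(observed.shape) - 1
cramers_v = math.sqrt(stat / (n * k))
print(round(cramers_v, 3))
```

Here V = sqrt((4/3) / 12) = 1/3, reflecting the two-thirds versus one-third split within each gender. A larger sample would be needed to tell whether an association of that size holds in the wider customer base.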
