3) Sales of products in four different regions are tabulated for males and females. Determine whether the male-female buyer ratios are similar across regions.

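| Observed Values | East | West | North | South |
|-----------------|------|------|-------|-------|
| Males           | 50   | 142  | 131   | 70    |
| Females         | 435  | 1523 | 1356  | 750   |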


# 1. Business Problem

### 1.1 Objective
To analyze and determine whether there is a significant difference in the male-female buyer ratios across four regions (East, West, North, South) to guide business decisions and marketing strategies.

### 1.2 Constraints
- Ensure data accuracy and integrity
- Interpret results at a 5% significance level (α = 0.05)
- Handle missing or inconsistent data appropriately
- Maintain computational efficiency for large datasets

In [26]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import chi2_contingency

In [4]:
# Load the dataset
BuyerRatio = pd.read_csv("BuyerRatio.csv")
BuyerRatio

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750
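
If `BuyerRatio.csv` is not at hand, the same table can be rebuilt inline. This is a minimal sketch that copies the counts from the output above; the name `buyer_ratio_inline` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Counts copied from the displayed table; this stands in for BuyerRatio.csv.
buyer_ratio_inline = pd.DataFrame(
    {
        "Observed Values": ["Males", "Females"],
        "East": [50, 435],
        "West": [142, 1523],
        "North": [131, 1356],
        "South": [70, 750],
    }
)
print(buyer_ratio_inline)
```

Assigning this frame to `BuyerRatio` should let the remaining cells run unchanged, assuming the CSV holds exactly these two rows.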


# 2. Data Pre-processing

In [5]:
# Display the first rows of the dataset
BuyerRatio.head()

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [6]:
BuyerRatio.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Observed Values  2 non-null      object
 1   East             2 non-null      int64 
 2   West             2 non-null      int64 
 3   North            2 non-null      int64 
 4   South            2 non-null      int64 
dtypes: int64(4), object(1)
memory usage: 212.0+ bytes


In [7]:
# Check for missing values
BuyerRatio.isnull().sum()

Observed Values    0
East               0
West               0
North              0
South              0
dtype: int64

In [8]:
# Ensure correct data types
BuyerRatio.dtypes

Observed Values    object
East                int64
West                int64
North               int64
South               int64
dtype: object

In [9]:
# Validate data consistency
{col: BuyerRatio[col].unique() for col in BuyerRatio.columns}

{'Observed Values': array(['Males', 'Females'], dtype=object),
 'East': array([ 50, 435], dtype=int64),
 'West': array([ 142, 1523], dtype=int64),
 'North': array([ 131, 1356], dtype=int64),
 'South': array([ 70, 750], dtype=int64)}

In [12]:
# Handle missing values (if any)
if BuyerRatio.isnull().sum().sum() > 0:
    BuyerRatio.fillna(0, inplace=True)
BuyerRatio

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [11]:
# Check for duplicate rows
print("Duplicate Rows:", BuyerRatio.duplicated().sum())

Duplicate Rows: 0


In [16]:
# Drop duplicate rows, if any
BuyerRatio.drop_duplicates(inplace=True)
BuyerRatio

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [17]:
# Ensure column names are standardized
BuyerRatio.columns = [col.strip().lower().replace(" ", "_") for col in BuyerRatio.columns]
BuyerRatio

Unnamed: 0,observed_values,east,west,north,south
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [18]:
# Verify data after cleaning
BuyerRatio.head()

Unnamed: 0,observed_values,east,west,north,south
0,Males,50,142,131,70
1,Females,435,1523,1356,750


# 3. Model Building

In [23]:
# 3.1 Extract the observed counts (every column after the label column) as a 2x4 contingency table
observed = BuyerRatio.iloc[:, 1:].values
observed

array([[  50,  142,  131,   70],
       [ 435, 1523, 1356,  750]], dtype=int64)
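
Before running a formal test, it can help to look at the ratios directly. The sketch below recomputes the male share of buyers per region from the observed counts; the array is re-entered by hand so the snippet runs on its own, and the names `regions`, `column_totals`, and `male_share` are illustrative.

```python
import numpy as np

# Observed counts: row 0 = males, row 1 = females; columns = east, west, north, south.
observed = np.array([[50, 142, 131, 70],
                     [435, 1523, 1356, 750]])
regions = ["east", "west", "north", "south"]

column_totals = observed.sum(axis=0)        # total buyers per region
male_share = observed[0] / column_totals    # fraction of buyers who are male

for region, share in zip(regions, male_share):
    print(f"{region:>5}: {share:.3f}")
```

The shares fall between roughly 8.5% and 10.3%, so the question for the test is whether a spread of that size is larger than chance alone would produce.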

In [24]:
# 3.2 Model: chi-square test of independence, chosen because gender and region are both categorical
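
For reference, the hypotheses and the test statistic can be written out as follows. The symbols $O_{ij}$, $E_{ij}$, $R_i$, $C_j$, and $N$ are notation introduced here, not taken from the notebook.

$$
H_0:\ \text{the male-female buyer ratio is the same in all four regions}
\qquad
H_1:\ \text{the ratio differs in at least one region}
$$

$$
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\text{dof} = (2 - 1)(4 - 1) = 3
$$

Here $O_{ij}$ is the observed count for gender $i$ in region $j$, $R_i$ is the row total, $C_j$ is the column total, and $N$ is the grand total. For example, with 393 male buyers overall, 485 buyers in the East, and 4457 buyers in total, the expected number of male buyers in the East is $393 \times 485 / 4457 \approx 42.77$.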


In [27]:
# 3.3 Perform the chi-square test
chi2_stat, p_value, dof, expected = chi2_contingency(observed)


In [29]:
# 3.4 Model Evaluation
# Display results
print("Chi-square Statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:")
print(expected)


Chi-square Statistic: 1.595945538661058
p-value: 0.6603094907091882
Degrees of Freedom: 3
Expected Frequencies:
[[  42.76531299  146.81287862  131.11756787   72.30424052]
 [ 442.23468701 1518.18712138 1355.88243213  747.69575948]]
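
As a cross-check, the expected frequencies and the statistic can be recomputed by hand from the row and column totals. This is a sketch under the assumption that no continuity correction is involved (SciPy applies Yates' correction only when there is one degree of freedom); the `*_manual` names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts: row 0 = males, row 1 = females; columns = east, west, north, south.
observed = np.array([[50, 142, 131, 70],
                     [435, 1523, 1356, 750]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its upper-tail p-value.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

print(expected_manual)
print(chi2_manual, dof_manual, p_manual)
```

The values should agree with the `chi2_contingency` output above. Every expected count is also well above 5, which is the usual rule of thumb for the chi-square approximation to be reliable.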


In [30]:
# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: male-female buyer ratios differ significantly across regions.")
else:
    print("Fail to reject the null hypothesis: no significant evidence that male-female buyer ratios differ across regions.")

Fail to reject the null hypothesis: no significant evidence that male-female buyer ratios differ across regions.
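
The same decision can be reached by comparing the statistic with the critical value of the chi-square distribution instead of comparing the p-value with α. The sketch below hard-codes the statistic and degrees of freedom reported above; `critical_value` and `reject_null` are illustrative names.

```python
from scipy.stats import chi2

alpha = 0.05
dof = 3
chi2_stat = 1.595945538661058   # statistic reported by chi2_contingency above

# Upper 5% point of the chi-square distribution with 3 degrees of freedom (about 7.815).
critical_value = chi2.ppf(1 - alpha, dof)

# Reject the null hypothesis only if the statistic exceeds the critical value.
reject_null = chi2_stat > critical_value
print(f"critical value = {critical_value:.3f}, reject null: {bool(reject_null)}")
```

Because the statistic is far below the critical value, both decision rules lead to the same conclusion.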


# 4. Result - Business Impact

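- The chi-square test (χ² ≈ 1.60, 3 degrees of freedom, p ≈ 0.66) gives no evidence at the 5% significance level that male-female buyer ratios differ across the East, West, North, and South regions
- Region-specific, gender-targeted marketing is therefore not supported by this data; the gender mix of buyers can be treated as comparable across regions
- The same proportions can inform inventory planning and customer segmentation until new data suggests otherwise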
