5.) Fantaloons sales managers commented that the percentage of males versus females walking into the store differs based on the day of the week. Analyze the data and determine whether there is evidence at the 5% significance level to support this hypothesis.

### 1.1 Objective
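To determine whether the proportion of males versus females walking into the store differs by day of the week. The data records two day categories, weekdays and weekend.

Null hypothesis (H0): the male/female proportion is the same on weekdays and weekends.

Alternative hypothesis (H1): the male/female proportion differs between weekdays and weekends.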

### 1.2 Constraints
Analyze the data at a 5% significance level (α = 0.05)

Ensure accurate data preprocessing and cleaning

Handle missing values and duplicates appropriately

Ensure interpretability of statistical results


In [2]:
# Import necessary libraries
import pandas as pd
from scipy.stats import chi2_contingency

# 2. Data Pre-processing

In [15]:
#Load Dataset
Fantaloons = pd.read_csv("Fantaloons.csv")
Fantaloons

Unnamed: 0,Weekdays,Weekend
0,Male,Female
1,Female,Male
2,Female,Male
3,Male,Female
4,Female,Female
...,...,...
420,,
421,,
422,,
423,,


In [16]:
# Preview the first five rows of the dataset
Fantaloons.head()

Unnamed: 0,Weekdays,Weekend
0,Male,Female
1,Female,Male
2,Female,Male
3,Male,Female
4,Female,Female


In [17]:
Fantaloons.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 425 entries, 0 to 424
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Weekdays  400 non-null    object
 1   Weekend   400 non-null    object
dtypes: object(2)
memory usage: 6.8+ KB


In [18]:
# Check for missing values
Fantaloons.isnull().sum()

Weekdays    25
Weekend     25
dtype: int64

In [19]:
# Handle missing values (if any)
if Fantaloons.isnull().sum().sum() > 0:
    Fantaloons.fillna('Unknown', inplace=True)


In [20]:
# Verify no missing values remain
Fantaloons.isnull().sum()

Weekdays    0
Weekend     0
dtype: int64
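
Filling missing cells with a placeholder such as 'Unknown' turns them into an extra category that has to be excluded again later. If the missing cells are fully empty trailing rows, as the preview of rows 420 to 423 suggests, one alternative is to drop rows that are empty in every column. A minimal sketch on a small made-up frame (the `toy` data is hypothetical and is not the Fantaloons file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for Fantaloons.csv: three observed rows
# followed by two rows that are empty in every column.
toy = pd.DataFrame({
    "Weekdays": ["Male", "Female", "Female", np.nan, np.nan],
    "Weekend": ["Female", "Male", "Male", np.nan, np.nan],
})

# Count rows where every column is missing.
fully_empty = toy.isnull().all(axis=1).sum()

# Drop only the fully empty rows; partially filled rows would be kept.
toy_clean = toy.dropna(how="all").reset_index(drop=True)

print("Fully empty rows:", fully_empty)
print("Rows kept:", len(toy_clean))
```

With `how="all"`, a row that has a value in at least one column is retained, so this choice only removes padding rows and leaves any partially recorded observations for a separate decision.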

In [21]:
# Check for duplicate rows and remove them
print("Duplicate Rows Before Removal:", Fantaloons.duplicated().sum())
Fantaloons.drop_duplicates(inplace=True)
print("Duplicate Rows After Removal:", Fantaloons.duplicated().sum())

Duplicate Rows Before Removal: 420
Duplicate Rows After Removal: 0
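
Each row here appears to record one visitor, so identical rows are likely repeated observations rather than data-entry duplicates. In the run above, 420 of 425 rows were flagged, which leaves only five rows for the test. The sketch below, on hypothetical toy data, shows how `drop_duplicates` changes category counts in this kind of dataset:

```python
import pandas as pd

# Hypothetical toy data: each row is assumed to be one observed visitor.
toy = pd.DataFrame({
    "weekdays": ["Male", "Female", "Female", "Female", "Male", "Female"],
    "weekend": ["Female", "Male", "Male", "Female", "Female", "Male"],
})

deduplicated = toy.drop_duplicates()

# Category counts before and after removing repeated rows.
counts_before = toy["weekdays"].value_counts()
counts_after = deduplicated["weekdays"].value_counts()

print("Rows before:", len(toy), "after:", len(deduplicated))
print("Female (weekdays) before:", counts_before["Female"], "after:", counts_after["Female"])
```

Because a chi-square test works on counts, keeping the repeated rows is usually what preserves the information the test needs. Whether that applies here depends on how the file was collected, which the source does not state.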


In [22]:
# Standardize column names
Fantaloons.columns = [col.strip().lower().replace(" ", "_") for col in Fantaloons.columns]

In [23]:
# Display cleaned data
print("Cleaned Data Head:\n", Fantaloons.head())

Cleaned Data Head:
     weekdays  weekend
0       Male   Female
1     Female     Male
4     Female   Female
17      Male     Male
400  Unknown  Unknown


In [33]:
# Verify column names
print("Column Names:", Fantaloons.columns)

Column Names: Index(['weekdays', 'weekend'], dtype='object')


In [34]:
# Print all column names to verify
print("Column Names in Dataset:", Fantaloons.columns.tolist())

# Display a few rows to inspect the dataset structure
print(Fantaloons.head())

Column Names in Dataset: ['weekdays', 'weekend']
    weekdays  weekend
0       Male   Female
1     Female     Male
4     Female   Female
17      Male     Male
400  Unknown  Unknown


In [36]:
# Display unique values for each column to identify gender columns
for col in Fantaloons.columns:
    print(f"Unique values in '{col}':", Fantaloons[col].unique())



Unique values in 'weekdays': ['Male' 'Female' 'Unknown']
Unique values in 'weekend': ['Female' 'Male' 'Unknown']


In [37]:
# Select the gender columns by position (both columns hold gender labels)
gender_columns = Fantaloons.columns[:2]  # Assumes the first two columns represent gender
print("Assumed Gender Columns:", gender_columns)

Assumed Gender Columns: Index(['weekdays', 'weekend'], dtype='object')


In [38]:
# Convert gender data to numeric values (1 for Male, 0 for Female).
# Any other label, such as 'unknown', has no mapping and becomes NaN.
for col in gender_columns:
    Fantaloons[col] = Fantaloons[col].str.strip().str.lower().map({'male': 1, 'female': 0})

In [39]:
# Check conversion success
for col in gender_columns:
    print(f"Converted {col} Values:", Fantaloons[col].unique())

Converted weekdays Values: [ 1.  0. nan]
Converted weekend Values: [ 0.  1. nan]


In [42]:
# Create a contingency table of counts per day type.
# value_counts() skips NaN, so the former 'Unknown' row is excluded.
contingency_table = pd.DataFrame({
    'Weekdays': Fantaloons[gender_columns[0]].value_counts(),
    'Weekend': Fantaloons[gender_columns[1]].value_counts()
}).T
contingency_table = contingency_table.fillna(0).astype(int)
print("Contingency Table:\n", contingency_table)

Contingency Table:
           0.0  1.0
Weekdays    2    2
Weekend     2    2


In [43]:
# Perform the chi-square test of independence
# (SciPy applies Yates' continuity correction to 2x2 tables by default)
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)

In [44]:
# Display results
print("Chi-square Statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

Chi-square Statistic: 0.0
p-value: 1.0
Degrees of Freedom: 1
Expected Frequencies:
 [[2. 2.]
 [2. 2.]]
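
For reference, the values reported above follow from the usual Pearson formula. With row totals $R_i = 4$, column totals $C_j = 4$, and $N = 8$ taken from the contingency table printed earlier:

$$
E_{ij} = \frac{R_i \, C_j}{N} = \frac{4 \cdot 4}{8} = 2,
\qquad
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 4 \cdot \frac{(2 - 2)^2}{2} = 0,
\qquad
\text{dof} = (2 - 1)(2 - 1) = 1
$$

Every observed count equals its expected count, so the statistic is exactly 0 and the continuity correction has nothing to adjust.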


In [45]:
# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the Null Hypothesis: The percentage of males vs females varies based on the day of the week.")
else:
    print("Fail to Reject the Null Hypothesis: No significant difference in male vs female percentage across days of the week.")

Fail to Reject the Null Hypothesis: No significant difference in male vs female percentage across days of the week.
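
Before relying on this decision, it can help to check the expected counts, since a common rule of thumb treats the chi-square approximation as doubtful when any expected count is below 5. The sketch below reuses the table printed in the run above. Using Fisher's exact test as a fallback is an assumption of this sketch and is not part of the original analysis:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Contingency table as printed in the run above.
observed = np.array([[2, 2], [2, 2]])

chi2_stat, p_value, dof, expected = chi2_contingency(observed)

# Rule of thumb: flag the test when any expected count is below 5.
small_expected = bool((expected < 5).any())
print("Any expected count below 5:", small_expected)

if small_expected:
    # Fisher's exact test is a common alternative for small 2x2 tables.
    odds_ratio, exact_p = fisher_exact(observed)
    print("Fisher exact p-value:", exact_p)
```

With only eight counted observations in the table, neither test has much power to detect a difference, so the result says more about the table size than about the store's traffic.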


# 4. Result - Business Impact

If p-value < 0.05 (significant difference found):

- Targeted marketing: design gender-specific promotions for different days.
- Optimized staffing: adjust staff based on gender traffic patterns.

If p-value ≥ 0.05 (no significant difference found):

- Uniform strategy: implement consistent marketing and staffing across all days.