# 4. Business Problem
Telecall uses 4 centers around the globe to process customer order forms and audits a certain % of those forms. Any error in an order form renders it defective, and the form must be reworked before processing. The manager wants to check whether the defective % varies by center. Analyze the data at the 5% significance level and help the manager draw appropriate inferences.
File: CustomerOrderForm.csv


# Business Objective:
 Determine whether the defect rate is significantly different across the 4 centers.



# Constraints:
1. Data should be categorical (Defective/Error Free).
2. The expected frequency should be ≥ 5 in each cell of the contingency table for the Chi-Square Test to be valid.


In [2]:
import pandas as pd
import numpy as np
import scipy
from scipy import stats
# scipy.stats provides statistical functions and a variety of statistical tests
from statsmodels.stats import descriptivestats as sd
# provides descriptive statistics tools, including the sign test
from statsmodels.stats.weightstats import ztest
# used for conducting z-tests on datasets
import pylab


In [5]:
data=pd.read_csv("CustomerOrderForm.csv")
data.head()

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free


In [6]:
data.describe()

Unnamed: 0,Phillippines,Indonesia,Malta,India
count,300,300,300,300
unique,2,2,2,2
top,Error Free,Error Free,Error Free,Error Free
freq,271,267,269,280


In [10]:
# Convert categorical data into numerical counts
table = data.apply(pd.Series.value_counts)  # Creates a contingency table
print(table)

            Phillippines  Indonesia  Malta  India
Error Free           271        267    269    280
Defective             29         33     31     20
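
Before running the test, it can help to look at the defective % per center directly. The following is a minimal sketch that rebuilds the contingency table from the counts printed above (rather than re-reading the CSV) and converts them to percentages; the variable name `defect_rate` is only illustrative.

```python
import pandas as pd

# Counts copied from the contingency table printed above.
table = pd.DataFrame(
    {
        "Phillippines": [271, 29],
        "Indonesia": [267, 33],
        "Malta": [269, 31],
        "India": [280, 20],
    },
    index=["Error Free", "Defective"],
)

# Share of audited forms that were defective, per center, in percent.
defect_rate = (table.loc["Defective"] / table.sum(axis=0) * 100).round(2)
print(defect_rate)
```

The rates range from roughly 6.7% (India) to 11% (Indonesia); the Chi-Square Test checks whether a spread of this size could plausibly arise by chance.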


In [12]:
# Perform Chi-Square Test
chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)

# Print Results
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print("Expected Frequencies Table:")
print(expected)


Chi-Square Statistic: 3.858960685820355
P-value: 0.2771020991233135
Degrees of Freedom: 3
Expected Frequencies Table:
[[271.75 271.75 271.75 271.75]
 [ 28.25  28.25  28.25  28.25]]
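
The expected frequencies above can be reproduced by hand, which also lets us check the second constraint (every expected count ≥ 5). This is a small sketch assuming the observed counts from the contingency table; `expected_manual` and `assumption_ok` are illustrative names, not part of the original notebook.

```python
import numpy as np

# Observed counts copied from the contingency table above
# (rows: Error Free, Defective; columns: Phillippines, Indonesia, Malta, India).
observed = np.array([[271, 267, 269, 280],
                     [29, 33, 31, 20]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 4)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected_manual = row_totals * col_totals / grand_total

# The chi-square approximation is usually considered adequate
# when every expected count is at least 5.
assumption_ok = bool((expected_manual >= 5).all())

print(expected_manual)
print("All expected counts >= 5:", assumption_ok)
```

Because every center audited 300 forms, each column has the same expected counts, and the smallest of them (28.25) is well above 5.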


# Interpretation:
H0: The proportion of defective order forms is the same across all 4 centers.

H1: The proportion of defective order forms differs for at least one center.

p-value = 0.277 > 0.05 ==> we fail to reject H0.

Conclusion:

At the 5% significance level, there is no statistically significant difference in the defective % across the 4 centers (Chi-Square statistic = 3.859, degrees of freedom = 3).
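
The decision can also be cross-checked against the critical value of the chi-square distribution. The sketch below assumes the statistic and degrees of freedom printed by the test cell, and uses `scipy.stats.chi2` to recover the 5% critical value (about 7.815 for 3 degrees of freedom) and the p-value.

```python
from scipy import stats

alpha = 0.05                   # significance level from the problem statement
chi2_stat = 3.858960685820355  # Chi-Square statistic printed by the test cell
dof = 3                        # degrees of freedom printed by the test cell

# Critical value: the statistic must exceed this to reject H0 at level alpha.
critical_value = stats.chi2.ppf(1 - alpha, dof)

# Upper-tail probability of the statistic; should match the printed p-value.
p_value = stats.chi2.sf(chi2_stat, dof)

reject_h0 = bool(chi2_stat > critical_value)

print(f"Critical value: {critical_value:.3f}")
print(f"P-value: {p_value:.4f}")
print("Reject H0:", reject_h0)
```

Both routes agree: the statistic is below the critical value and the p-value is above 0.05, so the data do not show that the defective % differs by center.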
