4.) Telecall uses four centers around the globe to process customer order forms. The company audits a certain percentage of these forms. Any error in an order form renders it defective, and the form must be reworked before processing. The manager wants to check whether the defective percentage varies by center. Analyze the data at the 5% significance level and help the manager draw appropriate inferences.

# 1. Business Problem

### 1.1 Objective
To determine whether the defective percentage of customer order forms varies across the four telecall centers globally.


### 1.2 Constraints
- Ensure data accuracy and consistency
- Analyze at a 5% significance level (α = 0.05)
- Handle missing or inconsistent data appropriately
- Ensure interpretability of statistical results

In [1]:
# Import necessary libraries
import pandas as pd
from scipy.stats import chi2_contingency


In [4]:
#Load the Dataset
CustomerOrderform = pd.read_csv("CustomerOrderform.csv")
CustomerOrderform

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free
...,...,...,...,...
310,,,,
311,,,,
312,,,,
313,,,,


# 2. Data Pre-processing



In [5]:
# Preview the first five rows of the dataset
CustomerOrderform.head()

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free


In [6]:
CustomerOrderform.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 315 entries, 0 to 314
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Phillippines  300 non-null    object
 1   Indonesia     300 non-null    object
 2   Malta         300 non-null    object
 3   India         300 non-null    object
dtypes: object(4)
memory usage: 10.0+ KB


In [8]:
# Check for missing values
CustomerOrderform.isnull().sum()


Phillippines    15
Indonesia       15
Malta           15
India           15
dtype: int64

In [11]:
# Inspect column data types (the labels are stored as strings, dtype object)
CustomerOrderform.dtypes

Phillippines    object
Indonesia       object
Malta           object
India           object
dtype: object

In [12]:
# Validate data consistency
{col: CustomerOrderform[col].unique() for col in CustomerOrderform.columns}

{'Phillippines': array(['Error Free', 'Defective', nan], dtype=object),
 'Indonesia': array(['Error Free', 'Defective', nan], dtype=object),
 'Malta': array(['Defective', 'Error Free', nan], dtype=object),
 'India': array(['Error Free', 'Defective', nan], dtype=object)}

In [13]:
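# Handle missing values: fill the 15 empty entries per column with the
# placeholder label 'Unknown' (later mapped to 0, i.e. treated as not defective)
if CustomerOrderform.isnull().sum().sum() > 0:
    CustomerOrderform.fillna('Unknown', inplace=True)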

In [16]:
# Verify no missing values remain
CustomerOrderform.isnull().sum()

Phillippines    0
Indonesia       0
Malta           0
India           0
dtype: int64
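
Filling blanks with a placeholder keeps the empty rows in the table, and the later numeric conversion counts them as non-defective. An alternative, sketched below under the assumption that the 15 missing entries per column all sit in fully blank rows, is to drop those rows instead. The `toy` frame is hypothetical and only illustrates the `dropna(how="all")` call.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the audit table: two real rows plus two blank rows.
toy = pd.DataFrame({
    "Phillippines": ["Error Free", "Defective", np.nan, np.nan],
    "Indonesia": ["Error Free", "Error Free", np.nan, np.nan],
    "Malta": ["Defective", "Error Free", np.nan, np.nan],
    "India": ["Error Free", "Defective", np.nan, np.nan],
})

# Remove only the rows in which every center is missing.
audited = toy.dropna(how="all")

# Anything still missing would be a partially blank row that needs its own decision.
remaining_missing = int(audited.isnull().sum().sum())

print(audited.shape)
print(remaining_missing)
```

If `remaining_missing` were non-zero on the real file, the blanks would not be confined to empty rows and this shortcut would not apply.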

In [17]:
# Check for duplicate rows
CustomerOrderform.duplicated().sum()


304

In [18]:
# Drop repeated rows (this removes 304 of the 315 rows, leaving 11 distinct outcome combinations)
CustomerOrderform.drop_duplicates(inplace=True)

In [19]:
# Verify duplicates are removed
CustomerOrderform.duplicated().sum()

0
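
If each row is read as one audited form per center (an assumption about how the file was assembled), identical rows are separate observations rather than redundant records, and the defect counts depend on keeping all of them. A minimal sketch of counting labels per center without removing repeats is shown below. The `toy` data and the names `long_form` and `counts` are hypothetical.

```python
import pandas as pd

# Hypothetical audited forms: one row per form slot, one column per center.
toy = pd.DataFrame({
    "phillippines": ["Error Free", "Error Free", "Defective", "Error Free"],
    "indonesia": ["Error Free", "Defective", "Defective", "Error Free"],
    "malta": ["Defective", "Error Free", "Error Free", "Error Free"],
    "india": ["Error Free", "Error Free", "Error Free", "Error Free"],
})

# Reshape to one (center, status) pair per audited form.
long_form = toy.melt(var_name="center", value_name="status")

# Cross-tabulate: rows are status labels, columns are centers.
counts = pd.crosstab(long_form["status"], long_form["center"])

print(counts)
```

The resulting table has the 2 x 4 shape that `chi2_contingency` expects and reflects every audited form.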

In [20]:
# Ensure column names are standardized
CustomerOrderform.columns = [col.strip().lower().replace(" ", "_") for col in CustomerOrderform.columns]

In [21]:
# Verify data after cleaning
print("Cleaned Data Head:\n", CustomerOrderform.head())

Cleaned Data Head:
   phillippines   indonesia       malta       india
0   Error Free  Error Free   Defective  Error Free
1   Error Free  Error Free  Error Free   Defective
2   Error Free   Defective   Defective  Error Free
3   Error Free  Error Free  Error Free  Error Free
6   Error Free   Defective  Error Free  Error Free


In [22]:
# Convert defect status to numeric values (1 for defective, 0 for any other label, including 'Unknown')
def map_defects(value):
    return 1 if value.strip().lower() == 'defective' else 0

CustomerOrderform = CustomerOrderform.applymap(map_defects)


In [24]:
# Verify conversion
CustomerOrderform.head()

Unnamed: 0,phillippines,indonesia,malta,india
0,0,0,1,0
1,0,0,0,1
2,0,1,1,0
3,0,0,0,0
6,0,1,0,0


In [25]:
print("Unique Values After Conversion:\n", {col: CustomerOrderform[col].unique() for col in CustomerOrderform.columns})

Unique Values After Conversion:
 {'phillippines': array([0, 1], dtype=int64), 'indonesia': array([0, 1], dtype=int64), 'malta': array([1, 0], dtype=int64), 'india': array([0, 1], dtype=int64)}
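
The same conversion can also be written column-wise with pandas string methods, which avoids `DataFrame.applymap` (deprecated in pandas 2.1 and later in favour of `DataFrame.map`). The following is a sketch on hypothetical data. The helper name `to_flags` is an assumption, not part of the notebook.

```python
import pandas as pd

# Hypothetical labels with inconsistent case and stray whitespace.
toy = pd.DataFrame({
    "malta": ["Defective", " error free", "Unknown"],
    "india": ["Error Free", "DEFECTIVE ", "Error Free"],
})

def to_flags(column: pd.Series) -> pd.Series:
    """Return 1 where the cleaned label equals 'defective', else 0."""
    cleaned = column.astype(str).str.strip().str.lower()
    return cleaned.eq("defective").astype(int)

# Apply the helper to each column in turn.
flags = toy.apply(to_flags)

print(flags)
```

As in the cell above, any label other than "defective" maps to 0, so placeholder values are counted as non-defective.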


# 3. Model Building

In [32]:
# 3.1 Build the 2 x 4 contingency table
# Rows: defective and non-defective counts; columns: the four centers
observed = pd.DataFrame({
    'Defective': CustomerOrderform.sum(axis=0),
    'Non_Defective': CustomerOrderform.shape[0] - CustomerOrderform.sum(axis=0)
}).T.values

In [33]:
# Ensure no zero rows/columns to prevent chi2_contingency errors
observed = observed[:, observed.sum(axis=0) > 0]

In [27]:
# 3.2 Model(s) - Chi-square test chosen for categorical data comparison
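
For reference, the test can be written out as follows. The notation is an assumption introduced here: $p_j$ is the defective proportion at center $j$, $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$ and column $j$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the grand total.

```latex
H_0:\; p_1 = p_2 = p_3 = p_4
\qquad
H_1:\; \text{at least one } p_j \text{ differs}

\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i\, C_j}{N},
\qquad
\text{df} = (r-1)(c-1) = (2-1)(4-1) = 3
```

With two outcome rows and four centers, the degrees of freedom come to 3.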

In [34]:
# Perform the chi-square test
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

In [35]:
# 3.4 Model Evaluation
# Display results
print("Chi-square Statistic:", chi2_stat)
print("p-value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:")
print(expected)

Chi-square Statistic: 0.419047619047619
p-value: 0.9362801961345747
Degrees of Freedom: 3
Expected Frequencies:
[[3.5 3.5 3.5 3.5]
 [7.5 7.5 7.5 7.5]]
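
The expected frequencies follow from the table margins alone, and a common rule of thumb asks for every expected count to be at least 5 before relying on the chi-square approximation. The sketch below recomputes them by hand and counts the cells under that threshold. The `observed` table is hypothetical: its individual counts are not from the dataset and were chosen only so that the margins (11 rows per center, 14 defectives in total) reproduce the expected values printed above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 4 table (rows: defective, non-defective; columns: centers).
observed = np.array([[4, 3, 5, 2],
                     [7, 8, 6, 9]])

# Expected count for each cell = row total * column total / grand total.
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
manual_expected = np.outer(row_totals, col_totals) / observed.sum()

# Compare with the expected table returned by SciPy.
_, _, _, expected = chi2_contingency(observed)

# Count the cells below the rule-of-thumb threshold of 5.
small_cells = int((expected < 5).sum())

print(manual_expected)
print(small_cells)
```

Here all four defective cells have an expected count of 3.5, so the approximation is weaker than it would be with larger counts.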


In [36]:
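# Interpretation
# H0: the defective proportion is the same at all four centers
# H1: at least one center has a different defective proportion
alpha = 0.05
if p_value < alpha:
    print("Reject the Null Hypothesis: the defective percentage differs significantly across centers.")
else:
    print("Fail to Reject the Null Hypothesis: no significant evidence that the defective percentage varies across centers.")

Fail to Reject the Null Hypothesis: no significant evidence that the defective percentage varies across centers.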
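
A p-value says nothing about how large any difference might be, so an effect-size measure such as Cramér's V can complement the decision. The sketch below computes it from the statistic and expected table printed earlier. The variable names are assumptions, and the total count `n` is inferred from the sum of the expected frequencies.

```python
import math

import numpy as np

# Values copied from the printed test output.
chi2_stat = 0.419047619047619
expected = np.array([[3.5, 3.5, 3.5, 3.5],
                     [7.5, 7.5, 7.5, 7.5]])

# The grand total equals the sum of the expected frequencies.
n = expected.sum()
rows, cols = expected.shape

# Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1))).
cramers_v = math.sqrt(chi2_stat / (n * min(rows - 1, cols - 1)))

print(round(cramers_v, 3))
```

Values near 0 indicate a weak association between center and defect status, which agrees with the failure to reject the null hypothesis.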


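# 4. Result - Business Impact
- The chi-square test gives χ² ≈ 0.419 with 3 degrees of freedom and p ≈ 0.936, well above α = 0.05, so the null hypothesis is not rejected: the data show no statistically significant difference in defective percentage across the four centers.
- With no center standing out, the audit gives no basis for targeting a single center; quality improvements would apply to the order-form process across all centers.
- The expected defective counts (3.5 per center) fall below the usual threshold of 5, so the chi-square approximation is rough and the result should be read with caution.
- Continued auditing helps monitor defect rates, optimize operational efficiency, and reduce rework costs.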