# 1. Business Problem
A F&B manager wants to determine whether there is any significant difference 
in the diameter of the cutlet between two units. 
A randomly selected sample of cutlets was collected from both units and measured. 
Analyze the data and draw inferences at 5% significance level. 
Please state the assumptions and tests that you carried out to check validity of the assumptions.
File: Cutlets.csv

# 1.1 Objective
The goal is to determine whether there is a significant difference in the diameter of cutlets 
between two production units. 
This will help the F&B manager assess consistency in the manufacturing process.

# 1.2 Constraints
The sample size is limited, which may reduce statistical power.
The assumptions of normality and equal variance need to be validated 
before applying statistical tests.

In [16]:
import pandas as pd
import numpy as np
import scipy
from scipy import stats
# scipy.stats provides statistical functions and a variety of statistical tests
from statsmodels.stats import descriptivestats as sd
# provides descriptive statistics tools, including the sign test
from statsmodels.stats.weightstats import ztest
# used for conducting z-tests on datasets
import pylab


In [17]:
data = pd.read_csv("Cutlets.csv")

In [18]:
data.head()

     Unit A    Unit B
0     6.809    6.7703
1    6.4376    7.5093
2    6.9157      6.73
3    7.3012    6.7878
4    7.4488    7.1522


In [19]:
data.describe()

          Unit A     Unit B
count       35.0       35.0
mean    7.019091   6.964297
std     0.288408   0.343401
min       6.4376      6.038
25%       6.8315     6.7536
50%       6.9438     6.9399
75%      7.28055      7.195
max       7.5169     7.5459


In [20]:
data.columns=["Unit A","Unit B"]

# Normality Test

In [21]:

data.isnull().sum()


Unit A    16
Unit B    16
dtype: int64

In [22]:
data.dropna(inplace=True)


In [23]:
data.isnull().sum()

Unit A    0
Unit B    0
dtype: int64

In [26]:
print('shapiro_A: ',stats.shapiro(data['Unit A']))

shapiro_A:  ShapiroResult(statistic=0.9649456489968531, pvalue=0.31997821996861)


In [27]:
print('shapiro_B: ',stats.shapiro(data['Unit B']))

shapiro_B:  ShapiroResult(statistic=0.9727301795873082, pvalue=0.5225029843840996)


# Normality Results

The Shapiro-Wilk test has the null hypothesis H0: the data come from a normal distribution.

1. Unit A: p-value = 0.3200 (> 0.05) → Fail to reject H0 (no evidence against normality).
2. Unit B: p-value = 0.5225 (> 0.05) → Fail to reject H0 (no evidence against normality).

Since neither Unit A nor Unit B shows a significant departure from normality, we can proceed with a 2-sample t-test.
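
The decision rule used here can be written out explicitly. The following is a minimal sketch, not part of the original notebook: the helper name `decide` and the constant `ALPHA` are illustrative, and the p-values are copied from the Shapiro-Wilk output above.

```python
ALPHA = 0.05  # significance level given in the problem statement


def decide(p_value, alpha=ALPHA):
    """Return the test decision for a p-value at the given significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# p-values copied from the Shapiro-Wilk results reported above
shapiro_p = {"Unit A": 0.31997821996861, "Unit B": 0.5225029843840996}

decisions = {unit: decide(p) for unit, p in shapiro_p.items()}
for unit, decision in decisions.items():
    print(f"{unit}: p = {shapiro_p[unit]:.4f} -> {decision}")
```

The same rule applies unchanged to any other test in this analysis that is evaluated at the 5% level.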

In [28]:
# Variance Test

In [30]:
levene_test=scipy.stats.levene(data['Unit A'],data['Unit B'])
print('levene_test(Variance): ',levene_test)

levene_test(Variance):  LeveneResult(statistic=0.6650897638632386, pvalue=0.4176162212502553)


In [31]:
# Levene's test for equality of variances
# H0: the variances of Unit A and Unit B are equal
# H1: the variances are unequal
# p-value = 0.4176 > 0.05, so we fail to reject H0; the equal-variance assumption is reasonable
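
For intuition about what the Levene statistic measures, here is a minimal pure-Python sketch of its median-centered form, which is the default (`center='median'`) in `scipy.stats.levene`. The function name `levene_statistic` and the toy samples are illustrative assumptions, not part of the notebook. The sketch returns only the statistic W and does not compute a p-value.

```python
from statistics import mean, median


def levene_statistic(*groups):
    """Median-centered Levene statistic W for two or more samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group's median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Spread of the group-level mean deviations around the grand mean
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    # Spread of the individual deviations around their group mean
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy samples (made up for illustration, not the cutlet data)
same_spread = levene_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [11.0, 12.0, 13.0, 14.0, 15.0])
diff_spread = levene_statistic([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 3.0, 5.0, 7.0, 9.0])
print(same_spread, diff_spread)
```

A shift in location alone leaves W at zero, while a difference in spread increases it. A small W, as in the notebook's result, is therefore consistent with equal variances.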

# 2 Sample T-test

In [33]:
TwoSampleTtest=scipy.stats.ttest_ind(data['Unit A'],data['Unit B'])
print("Two sample T test: ",TwoSampleTtest)

Two sample T test:  TtestResult(statistic=0.7228688704678063, pvalue=0.4722394724599501, df=68.0)


<b>Interpreting the Two-Sample t-test Results</b>

t-statistic = 0.7229
p-value = 0.4722 (> 0.05)
df (degrees of freedom) = 68
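
As a cross-check, the t-statistic and degrees of freedom listed above can be reproduced from the summary statistics in the `data.describe()` output using the pooled-variance (equal variance) formula, which is what `ttest_ind` uses by default. This is a sketch for verification only. The function name `pooled_t` is illustrative, and because the inputs are the rounded values from `describe()`, the result matches the reported statistic only approximately.

```python
import math

# Summary statistics copied from the data.describe() output
n_a, mean_a, sd_a = 35, 7.019091, 0.288408
n_b, mean_b, sd_b = 35, 6.964297, 0.343401


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Equal-variance two-sample t-statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between the two sample means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


t_stat, df = pooled_t(n_a, mean_a, sd_a, n_b, mean_b, sd_b)
print(f"t = {t_stat:.3f}, df = {df}")
```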

<b>Conclusion:</b>
Since the p-value (0.4722) is greater than 0.05, we fail to reject the null hypothesis.

This means there is no statistically significant difference, at the 5% level, between the mean diameters of cutlets from Unit A and Unit B.

In simple words, the data provide no evidence that the average cutlet diameter differs between the two units.