## Cutlets Case

In this case the hypotheses are as follows:

Null (H0) - The mean cutlet dimensions of the two units are the same.

Alternate (H1) - There is a significant difference between the mean cutlet dimensions of the two units.

If the p-value is < 0.05, we reject the null hypothesis in favour of the alternate; otherwise we fail to reject the null hypothesis.
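The decision rule can be written as a small helper. This is a minimal sketch; the function name `decide` and its `alpha` parameter are illustrative and not part of the original notebook.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    # A large p-value does not prove H0; it only means the data give
    # insufficient evidence against it.
    return "fail to reject H0"


print(decide(0.03))
print(decide(0.47))
```

The wording "fail to reject" is deliberate: a p-value above the threshold does not show that the null hypothesis is true.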

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('Cutlets.csv')
df

Unnamed: 0,Unit A,Unit B
0,6.809,6.7703
1,6.4376,7.5093
2,6.9157,6.73
3,7.3012,6.7878
4,7.4488,7.1522
5,7.3871,6.811
6,6.8755,7.2212
7,7.0621,6.6606
8,6.684,7.2402
9,6.8236,7.0503


In [3]:
import scipy.stats
# ttest_ind returns the t statistic (not a z score) and the two-sided p-value
tstat, pval = scipy.stats.ttest_ind(df['Unit A'], df['Unit B'])

In [4]:
pval

0.4722394724599501

Since the p-value > 0.05, we fail to reject the null hypothesis: there is no statistically significant evidence of a difference between the cutlet dimensions of the two units.
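By default `ttest_ind` pools the two variances, i.e. it assumes both units have equal variance. A possible sanity check, sketched here, is to run Levene's test and the Welch variant (`equal_var=False`) alongside the pooled test. The arrays below hold only the ten rows displayed in the output above, not the full `Cutlets.csv`, so the p-values will not match the one reported for the whole file.

```python
import numpy as np
from scipy import stats

# Only the ten rows displayed above, NOT the full Cutlets.csv
unit_a = np.array([6.809, 6.4376, 6.9157, 7.3012, 7.4488,
                   7.3871, 6.8755, 7.0621, 6.684, 6.8236])
unit_b = np.array([6.7703, 7.5093, 6.73, 6.7878, 7.1522,
                   6.811, 7.2212, 6.6606, 7.2402, 7.0503])

# Levene's test: H0 is that both samples have equal variance
levene_stat, levene_p = stats.levene(unit_a, unit_b)

# Pooled-variance t-test (the default) and Welch's unequal-variance t-test
pooled = stats.ttest_ind(unit_a, unit_b)
welch = stats.ttest_ind(unit_a, unit_b, equal_var=False)

print('Levene p-value:', levene_p)
print('Pooled t-test p-value:', pooled.pvalue)
print('Welch t-test p-value:', welch.pvalue)
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ, so the two p-values are usually close.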

## LabTAT Case

Since more than two groups are to be compared, a one-way ANOVA test is used here. The hypotheses are:

Null (H0) - There is no significant difference between the means of the groups.

Alternate (H1) - The mean of at least one group is different from the rest.

In [5]:
df = pd.read_csv('LabTAT.csv')
df

Unnamed: 0,Laboratory 1,Laboratory 2,Laboratory 3,Laboratory 4
0,185.35,165.53,176.70,166.13
1,170.49,185.91,198.45,160.79
2,192.77,194.92,201.23,185.18
3,177.33,183.00,199.61,176.42
4,193.41,169.57,204.63,152.60
...,...,...,...,...
115,178.49,170.66,193.80,172.68
116,176.08,183.98,215.25,177.64
117,202.48,174.54,203.99,170.27
118,182.40,197.18,194.52,150.87


In [6]:
import scipy.stats as stats
pval = stats.f_oneway(df.iloc[:, 0], df.iloc[:, 1], df.iloc[:, 2], df.iloc[:, 3])[1]

if pval < 0.05:
    print('Reject null hypothesis. There are some differences between the laboratories')
else:
    print('Fail to reject null hypothesis. No significant difference between the laboratories')

Reject null hypothesis. There are some differences between the laboratories


In [7]:
pval

2.1156708949992414e-57

Since the p-value is much less than 0.05, we reject the null hypothesis, which means the mean of at least one laboratory is different from the others.
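To make clear what `f_oneway` computes, the F statistic can be rebuilt from the between-group and within-group sums of squares. The sketch below uses synthetic samples (not the `LabTAT.csv` data); the helper name `one_way_anova` and the group means are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic samples for four hypothetical groups, NOT the LabTAT.csv data
groups = [rng.normal(loc=m, scale=15.0, size=30)
          for m in (178.0, 178.0, 200.0, 164.0)]


def one_way_anova(samples):
    """Compute the one-way ANOVA F statistic and p-value by hand."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = np.concatenate(samples).mean()
    # Variation of the group means around the grand mean
    ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability of the F distribution
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value


f_manual, p_manual = one_way_anova(groups)
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
print(p_manual, p_scipy)
```

A significant ANOVA result only says that at least one group mean differs; it does not identify which one, so a post-hoc pairwise comparison would be needed for that.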

## Product Sales Case

Here, the male-to-female buyer proportions are to be compared across 4 regions. Both variables (gender and region) are categorical, so the Chi-Square test is used. The hypotheses are:

Null (H0): There is no significant difference in the male-to-female buyer proportions across the 4 regions.

Alternate (H1): There is a significant difference in the male-to-female buyer proportions across the 4 regions.

In [36]:
df = pd.read_csv('BuyerRatio.csv', index_col=0)
df

Observed Values,East,West,North,South
Males,50,142,131,70
Females,435,1523,1356,750


In [37]:
chisquare_result = stats.chi2_contingency(df)
chisquare_result

Chi2ContingencyResult(statistic=1.595945538661058, pvalue=0.6603094907091882, dof=3, expected_freq=array([[  42.76531299,  146.81287862,  131.11756787,   72.30424052],
       [ 442.23468701, 1518.18712138, 1355.88243213,  747.69575948]]))

In [38]:
if (chisquare_result.pvalue > 0.05):
    print('P-value is > 0.05. Fail to reject the null hypothesis: no significant difference in the male-to-female buyer proportions across the 4 regions.')
else:
    print('P-value is <= 0.05. Reject the null hypothesis: the male-to-female buyer proportions differ across the 4 regions.')

P-value is > 0.05. Fail to reject the null hypothesis: no significant difference in the male-to-female buyer proportions across the 4 regions.
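The `expected_freq` array and the statistic returned by `chi2_contingency` can be reproduced by hand from the row and column totals of the observed table. The following is a sketch using the counts shown in the table above; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Observed counts from the table above (rows: Males, Females)
observed = np.array([[50, 142, 131, 70],
                     [435, 1523, 1356, 750]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / observed.sum()

chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi2_stat, dof)

print(expected)
print(chi2_stat, dof, p_value)
```

No continuity correction is involved here, because `chi2_contingency` applies Yates' correction only when there is a single degree of freedom.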


## Customer Order Case

Here, the proportions of defective to error-free order forms are to be compared across 4 countries. Both variables (form status and country) are categorical, so the Chi-Square test is used. The hypotheses are:

Null (H0): There is no significant difference in the defective proportions across the 4 countries.

Alternate (H1): There is a significant difference in the defective proportions across the 4 countries.

In [8]:
df = pd.read_csv('CustomerOrderForm.csv')
df

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free
...,...,...,...,...
295,Error Free,Error Free,Error Free,Error Free
296,Error Free,Error Free,Error Free,Error Free
297,Error Free,Error Free,Defective,Error Free
298,Error Free,Error Free,Error Free,Error Free


In [25]:
df['Phillippines'].value_counts()

Error Free    271
Defective      29
Name: Phillippines, dtype: int64

In [26]:
df['Indonesia'].value_counts()

Error Free    267
Defective      33
Name: Indonesia, dtype: int64

In [27]:
df['Malta'].value_counts()

Error Free    269
Defective      31
Name: Malta, dtype: int64

In [28]:
df['India'].value_counts()

Error Free    280
Defective      20
Name: India, dtype: int64

In [29]:
# Counts copied from the value_counts() outputs above; row '0' = Error Free, row '1' = Defective
data = {'Phillippines': {'0': 271, '1': 29}, 'Indonesia': {'0': 267, '1': 33}, 'Malta': {'0': 269, '1': 31}, 'India': {'0': 280, '1': 20}}
data

{'Phillippines': {'0': 271, '1': 29},
 'Indonesia': {'0': 267, '1': 33},
 'Malta': {'0': 269, '1': 31},
 'India': {'0': 280, '1': 20}}

In [30]:
data = pd.DataFrame(data)
data

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,271,267,269,280
1,29,33,31,20
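Instead of typing the counts by hand, the contingency table could also be built directly from the raw columns. The sketch below uses `melt` and `pd.crosstab` on a small made-up frame with the same column layout; the rows are hypothetical, not taken from `CustomerOrderForm.csv`.

```python
import pandas as pd

# Hypothetical miniature of the order-form data (made-up rows)
sample = pd.DataFrame({
    'Phillippines': ['Error Free', 'Error Free', 'Defective', 'Error Free'],
    'Indonesia':    ['Error Free', 'Defective', 'Defective', 'Error Free'],
    'Malta':        ['Defective', 'Error Free', 'Error Free', 'Error Free'],
    'India':        ['Error Free', 'Error Free', 'Error Free', 'Error Free'],
})

# Reshape to one (country, status) pair per row, then count the pairs
long_form = sample.melt(var_name='country', value_name='status')
table = pd.crosstab(long_form['status'], long_form['country'])

# crosstab sorts the columns alphabetically; restore the original order
table = table.reindex(columns=sample.columns)
print(table)
```

The row labels stay as `Defective` and `Error Free`, which avoids having to remember what `'0'` and `'1'` stand for, and a category that never occurs in a column is counted as 0.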


In [33]:
chisquare_result = stats.chi2_contingency(data)
chisquare_result

Chi2ContingencyResult(statistic=3.858960685820355, pvalue=0.2771020991233135, dof=3, expected_freq=array([[271.75, 271.75, 271.75, 271.75],
       [ 28.25,  28.25,  28.25,  28.25]]))

In [34]:
if (chisquare_result.pvalue > 0.05):
    print('P-value is > 0.05. Fail to reject the null hypothesis: no significant difference in the proportion of defective items across the 4 countries.')
else:
    print('P-value is <= 0.05. Reject the null hypothesis: the proportion of defective items differs across the 4 countries.')

P-value is > 0.05. Fail to reject the null hypothesis: no significant difference in the proportion of defective items across the 4 countries.
