## Q1. An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests that you carried out to check the validity of the assumptions.

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import norm

In [2]:
cutlet_data = pd.read_csv('Cutlets.csv')

In [3]:
cutlet_data

Unnamed: 0,Unit A,Unit B
0,6.809,6.7703
1,6.4376,7.5093
2,6.9157,6.73
3,7.3012,6.7878
4,7.4488,7.1522
5,7.3871,6.811
6,6.8755,7.2212
7,7.0621,6.6606
8,6.684,7.2402
9,6.8236,7.0503


### Solution:

* H0: The mean cutlet diameter is the same in both units (μA = μB)
* Ha: The mean cutlet diameter differs between the two units (μA ≠ μB)

## 2 Sample T Test can be applied
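The two-sample t-test assumes that the two samples are independent, that each is roughly normally distributed, and (for the pooled version that `stats.ttest_ind` runs by default) that the two variances are equal. The following is a minimal sketch of how these assumptions could be checked with the Shapiro-Wilk and Levene tests, using the 35 diameters per unit printed in this notebook. The 0.05 cut-off for the checks and the fallback to Welch's t-test are assumptions of this sketch, not steps from the original analysis.

```python
from scipy import stats

# Diameters copied from the Unit A / Unit B series printed in this notebook
unit_a = [
    6.8090, 6.4376, 6.9157, 7.3012, 7.4488, 7.3871, 6.8755,
    7.0621, 6.6840, 6.8236, 7.3930, 7.5169, 6.9246, 6.9256,
    6.5797, 6.8394, 6.5970, 7.2705, 7.2828, 7.3495, 6.9438,
    7.1560, 6.5341, 7.2854, 6.9952, 6.8568, 7.2163, 6.6801,
    6.9431, 7.0852, 6.7794, 7.2783, 7.1561, 7.3943, 6.9405,
]
unit_b = [
    6.7703, 7.5093, 6.7300, 6.7878, 7.1522, 6.8110, 7.2212,
    6.6606, 7.2402, 7.0503, 6.8810, 7.4059, 6.7652, 6.0380,
    7.1581, 7.0240, 6.6672, 7.4314, 7.3070, 6.7478, 6.8889,
    7.4220, 6.5217, 7.1688, 6.7594, 6.9399, 7.0133, 6.9182,
    6.3346, 7.5459, 7.0992, 7.1180, 6.6965, 6.5780, 7.3875,
]

alpha = 0.05  # assumed cut-off for the assumption checks

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
_, p_norm_a = stats.shapiro(unit_a)
_, p_norm_b = stats.shapiro(unit_b)

# Levene test: H0 = both samples have equal variances
_, p_var = stats.levene(unit_a, unit_b)

# Use the pooled t-test only if equal variances are not rejected,
# otherwise fall back to Welch's t-test (equal_var=False)
equal_var = bool(p_var >= alpha)
t_stat, p_t = stats.ttest_ind(unit_a, unit_b, equal_var=equal_var)

print('Shapiro p-values (A, B):', p_norm_a, p_norm_b)
print('Levene p-value:', p_var)
print('t statistic, p-value:', t_stat, p_t)
```

A Shapiro or Levene p-value above 0.05 means the corresponding assumption is not rejected; it does not prove that the assumption holds.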

In [4]:
cutlet_data.dtypes

Unit A    float64
Unit B    float64
dtype: object

In [5]:
cutlet_data['Unit A'].mean()

7.01909142857143

In [6]:
cutlet_data['Unit B'].mean()

6.964297142857142

In [7]:
unit_A = pd.Series(cutlet_data['Unit A'])
unit_A

0     6.8090
1     6.4376
2     6.9157
3     7.3012
4     7.4488
5     7.3871
6     6.8755
7     7.0621
8     6.6840
9     6.8236
10    7.3930
11    7.5169
12    6.9246
13    6.9256
14    6.5797
15    6.8394
16    6.5970
17    7.2705
18    7.2828
19    7.3495
20    6.9438
21    7.1560
22    6.5341
23    7.2854
24    6.9952
25    6.8568
26    7.2163
27    6.6801
28    6.9431
29    7.0852
30    6.7794
31    7.2783
32    7.1561
33    7.3943
34    6.9405
Name: Unit A, dtype: float64

In [8]:
unit_B = pd.Series(cutlet_data['Unit B'])
unit_B

0     6.7703
1     7.5093
2     6.7300
3     6.7878
4     7.1522
5     6.8110
6     7.2212
7     6.6606
8     7.2402
9     7.0503
10    6.8810
11    7.4059
12    6.7652
13    6.0380
14    7.1581
15    7.0240
16    6.6672
17    7.4314
18    7.3070
19    6.7478
20    6.8889
21    7.4220
22    6.5217
23    7.1688
24    6.7594
25    6.9399
26    7.0133
27    6.9182
28    6.3346
29    7.5459
30    7.0992
31    7.1180
32    6.6965
33    6.5780
34    7.3875
Name: Unit B, dtype: float64

In [9]:
# Two-sample (independent) t-test; equal variances are assumed by default
t_stat, pval = stats.ttest_ind(a=unit_A, b=unit_B)
print(t_stat)
print(pval)

0.7228688704678063
0.4722394724599501


In [10]:
# Compare the p-value with the 5% significance level
if pval < 0.05:
    print('Reject the null hypothesis: there is a significant difference in the diameter of the cutlet between the 2 units')
else:
    print('Do not reject the null hypothesis: there is no significant evidence of a difference in the diameter of the cutlet between the 2 units')

Do not reject the null hypothesis: there is no significant evidence of a difference in the diameter of the cutlet between the 2 units


## Q2. A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports of the laboratories on their preferred list. They collected a random sample and recorded TAT for reports of 4 laboratories. TAT is defined as the time from sample collection to report dispatch. Analyze the data and determine whether there is any difference in average TAT among the different laboratories at the 5% significance level.


In [11]:
lab_data = pd.read_csv('LabTAT.csv')

In [12]:
lab_data

Unnamed: 0,Laboratory 1,Laboratory 2,Laboratory 3,Laboratory 4
0,185.35,165.53,176.70,166.13
1,170.49,185.91,198.45,160.79
2,192.77,194.92,201.23,185.18
3,177.33,183.00,199.61,176.42
4,193.41,169.57,204.63,152.60
...,...,...,...,...
115,178.49,170.66,193.80,172.68
116,176.08,183.98,215.25,177.64
117,202.48,174.54,203.99,170.27
118,182.40,197.18,194.52,150.87


### Solution:

* Null hypothesis H0: The mean TAT is the same for all 4 laboratories
* Alternate hypothesis Ha: At least one laboratory has a different mean TAT

## ANOVA test can be applied
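One-way ANOVA assumes independent observations, approximately normal data within each group, and similar variances across groups. As a hedged sketch, these checks could be run per laboratory as shown here. Only the nine rows visible in the notebook preview of `LabTAT.csv` are used, so the printed numbers are illustrative and will not match the full-data result; the name `lab_preview` is introduced for this example only.

```python
import pandas as pd
from scipy import stats

# Only the nine rows shown in the notebook preview (rows 0-4 and 115-118);
# the elided rows are not reproduced, so results are illustrative only.
lab_preview = pd.DataFrame({
    'Laboratory 1': [185.35, 170.49, 192.77, 177.33, 193.41, 178.49, 176.08, 202.48, 182.40],
    'Laboratory 2': [165.53, 185.91, 194.92, 183.00, 169.57, 170.66, 183.98, 174.54, 197.18],
    'Laboratory 3': [176.70, 198.45, 201.23, 199.61, 204.63, 193.80, 215.25, 203.99, 194.52],
    'Laboratory 4': [166.13, 160.79, 185.18, 176.42, 152.60, 172.68, 177.64, 170.27, 150.87],
})

samples = [lab_preview[col] for col in lab_preview.columns]

# Shapiro-Wilk per laboratory: H0 = the TAT values are normally distributed
normality_p = {col: stats.shapiro(lab_preview[col]).pvalue for col in lab_preview.columns}

# Levene across all laboratories: H0 = the variances are equal
_, levene_p = stats.levene(*samples)

# One-way ANOVA on the same groups
f_value, p_value = stats.f_oneway(*samples)

print('Shapiro p-values:', normality_p)
print('Levene p-value:', levene_p)
print('F value, p-value:', f_value, p_value)
```

With the full file loaded as `lab_data`, the same calls could be applied by replacing `lab_preview` with `lab_data`.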

In [13]:
fvalue, pvalue = stats.f_oneway(lab_data['Laboratory 1'], lab_data['Laboratory 2'], lab_data['Laboratory 3'], lab_data['Laboratory 4'])
print(fvalue, pvalue)

118.70421654401437 2.1156708949992414e-57


In [14]:
# Compare the p-value with the 5% significance level
if pvalue < 0.05:
    print('Reject the null hypothesis: at least one laboratory has a different average TAT')
else:
    print('Do not reject the null hypothesis: there is no significant evidence of a difference in average TAT')

Reject the null hypothesis: at least one laboratory has a different average TAT


## Q3. Sales of products in four different regions are tabulated for males and females. Find if male-female buyer ratios are similar across regions.



In [15]:
buyer_ratio_data = pd.read_csv('BuyerRatio.csv')
buyer_ratio_data

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [16]:
buyer_ratio_data.dtypes

Observed Values    object
East                int64
West                int64
North               int64
South               int64
dtype: object

In [19]:
buyer_ratio_data.describe()

Unnamed: 0,East,West,North,South
count,2.0,2.0,2.0,2.0
mean,242.5,832.5,743.5,410.0
std,272.236111,976.514465,866.205807,480.832611
min,50.0,142.0,131.0,70.0
25%,146.25,487.25,437.25,240.0
50%,242.5,832.5,743.5,410.0
75%,338.75,1177.75,1049.75,580.0
max,435.0,1523.0,1356.0,750.0


### SOLUTION:

* H0: The male-female buyer proportions are equal across all regions
* Ha: The male-female buyer proportions are not equal across all regions

## CHI-SQUARE Test can be applied

In [22]:
del buyer_ratio_data['Observed Values']
buyer_ratio_data

Unnamed: 0,East,West,North,South
0,50,142,131,70
1,435,1523,1356,750


In [25]:
chi2_score,pval,dof,expected_table = stats.chi2_contingency(buyer_ratio_data)

In [26]:
print('chi2 value: ',round(chi2_score,5))
print('p value: ',round(pval,5))
print('degree of freedom: ',dof)
print('expected table: \n',expected_table)

chi2 value:  1.59595
p value:  0.66031
degree of freedom:  3
expected table: 
 [[  42.76531299  146.81287862  131.11756787   72.30424052]
 [ 442.23468701 1518.18712138 1355.88243213  747.69575948]]
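To make the expected table less of a black box, here is a small sketch that recomputes it by hand from the observed counts: each expected count is (row total × column total) / grand total, and the statistic is the sum of (observed − expected)² / expected. The variable names are introduced for this example only. With 3 degrees of freedom no continuity correction is applied by `chi2_contingency`, so the manual values should agree with the output above.

```python
import numpy as np
from scipy import stats

# Observed counts from BuyerRatio.csv (rows: males, females; columns: East, West, North, South)
observed = np.array([[50, 142, 131, 70],
                     [435, 1523, 1356, 750]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)
grand_total = observed.sum()

# Expected counts under independence of gender and region
expected = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution
p_manual = stats.chi2.sf(chi2_manual, dof)

print('expected table:\n', expected)
print('chi2 value:', round(chi2_manual, 5))
print('p value:', round(p_manual, 5))
print('degree of freedom:', dof)
```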


In [28]:
# Compare the p-value with the 5% significance level
if pval < 0.05:
    print('Reject the null hypothesis: the male-female buyer proportions are not equal across regions')
else:
    print('Do not reject the null hypothesis: there is no significant evidence that the male-female buyer proportions differ across regions')

Do not reject the null hypothesis: there is no significant evidence that the male-female buyer proportions differ across regions


## Q4. TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and it has to be reworked before processing. The manager wants to check whether the defective % varies by center. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.

In [38]:
from scipy.stats import norm
from scipy.stats import chi2_contingency

In [39]:
cust_data = pd.read_csv('Costomer+OrderForm.csv')

In [40]:
cust_data

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free
...,...,...,...,...
295,Error Free,Error Free,Error Free,Error Free
296,Error Free,Error Free,Error Free,Error Free
297,Error Free,Error Free,Defective,Error Free
298,Error Free,Error Free,Error Free,Error Free


In [41]:
cust_data.Phillippines.value_counts()

Error Free    271
Defective      29
Name: Phillippines, dtype: int64

In [42]:
cust_data.Indonesia.value_counts()

Error Free    267
Defective      33
Name: Indonesia, dtype: int64

In [43]:
cust_data.Malta.value_counts()

Error Free    269
Defective      31
Name: Malta, dtype: int64

In [44]:
cust_data.India.value_counts()

Error Free    280
Defective      20
Name: India, dtype: int64

In [45]:
# Make a contingency table
obs_table=np.array([[271,267,269,280],[29,33,31,20]])
obs_table

array([[271, 267, 269, 280],
       [ 29,  33,  31,  20]])
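Typing the counts in by hand works, but the table can also be derived from the data so it stays in sync with the file. A minimal sketch follows, assuming a stand-in DataFrame `cust_demo` rebuilt from the `value_counts()` outputs above (the row order inside it is hypothetical, since the full CSV is not reproduced here).

```python
import pandas as pd
from scipy import stats

# Stand-in for the CSV: defective counts taken from the value_counts() outputs above,
# 300 audited forms per center; the order of rows is hypothetical.
defective_counts = {'Phillippines': 29, 'Indonesia': 33, 'Malta': 31, 'India': 20}
n_forms = 300
cust_demo = pd.DataFrame({
    center: ['Defective'] * n_def + ['Error Free'] * (n_forms - n_def)
    for center, n_def in defective_counts.items()
})

# Count each category per column, then fix the row order explicitly
counts = cust_demo.apply(lambda col: col.value_counts())
obs_table = counts.loc[['Error Free', 'Defective']].to_numpy()

# Chi-square test of independence on the derived table
chi2_score, pval, dof, expected_table = stats.chi2_contingency(obs_table)

print(obs_table)
print('chi2 value:', round(chi2_score, 5))
print('p value:', round(pval, 5))
```

With the real file, the same two lines that build `counts` and `obs_table` could be applied to `cust_data` directly.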

### Solution:

* H0: The proportion of defective order forms is the same across all centers
* Ha: At least one center has a different proportion of defective order forms

## CHI-SQUARE TEST

In [49]:
chi2_score,pval,dof,expected_table = stats.chi2_contingency(observed=obs_table)
print('chi2 value: ',round(chi2_score,5))
print('p value: ',round(pval,5))
print('degree of freedom: ',dof)
print('expected table: \n',expected_table)

chi2 value:  3.85896
p value:  0.2771
degree of freedom:  3
expected table: 
 [[271.75 271.75 271.75 271.75]
 [ 28.25  28.25  28.25  28.25]]


In [50]:
# Compare the p-value with the 5% significance level
if pval < 0.05:
    print('Reject the null hypothesis: at least one center has a different proportion of defective order forms')
else:
    print('Do not reject the null hypothesis: there is no significant evidence that the defective % varies by center')

Do not reject the null hypothesis: there is no significant evidence that the defective % varies by center
