## Hypothesis Testing Exercise

### An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and the tests that you carried out to check the validity of the assumptions.
Minitab File: Cutlets.mtw


In [1]:
import pandas as pd
from scipy import stats

In [2]:
# Load the data
data = pd.read_csv("C:\\Users\\SHUBHAM GARKAL\\Downloads\\Cutlets.csv")

In [3]:
# The data has two columns, 'Unit A' and 'Unit B'
unit1_data = data['Unit A']
unit2_data = data['Unit B']

In [4]:
# Assumptions of the two-sample t-test:
# 1. The two samples are independent.
# 2. Both samples are randomly selected.
# 3. Diameters in each unit are approximately normally distributed.
# 4. Both populations have similar variances (homogeneity of variances). We test this below.

# Levene's test for equality of variances
statistic, p_value = stats.levene(unit1_data, unit2_data)
print("Levene's test p-value:", p_value)

Levene's test p-value: 0.4176162212502553
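The t-test also assumes that the diameters in each unit are roughly normally distributed, and the question asks for the tests used to check the assumptions. A minimal sketch of a Shapiro-Wilk check follows; the helper name `shapiro_check` and the simulated samples are illustrative assumptions, not the cutlet data.

```python
import numpy as np
from scipy import stats


def shapiro_check(samples, alpha=0.05):
    """Hypothetical helper: run Shapiro-Wilk on each named sample.

    Returns {name: (p_value, looks_normal)}, where looks_normal is True
    when the test fails to reject normality at level alpha.
    """
    results = {}
    for name, values in samples.items():
        _, p = stats.shapiro(values)
        results[name] = (p, p > alpha)
    return results


# Simulated diameters (made-up numbers, for illustration only)
rng = np.random.default_rng(0)
samples = {
    "Unit A": rng.normal(loc=7.0, scale=0.3, size=35),
    "Unit B": rng.normal(loc=6.9, scale=0.3, size=35),
}
for name, (p, ok) in shapiro_check(samples).items():
    print(f"{name}: Shapiro-Wilk p = {p:.3f}, consistent with normality at 5%: {ok}")
```

On the notebook's data this could be called as `shapiro_check({"Unit A": unit1_data, "Unit B": unit2_data})`.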


In [5]:
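# Levene's p-value (0.418) > 0.05, so we fail to reject equal variances
# and use the pooled-variance (Student's) two-sample t-test

# Perform two-sample t-test (equal_var=True is the default, stated explicitly)
t_statistic, p_value = stats.ttest_ind(unit1_data, unit2_data, equal_var=True)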

print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: 0.7228688704678063
p-value: 0.4722394724599501
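With `equal_var=True`, `stats.ttest_ind` computes the pooled-variance statistic t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2)), where s_p^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2), with n1 + n2 - 2 degrees of freedom. The sketch below reproduces it by hand on made-up numbers to show what the reported statistic means; `pooled_t` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def pooled_t(a, b):
    """Hypothetical helper: pooled-variance two-sample t-test (two-sided)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance built from the unbiased (ddof=1) sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, p, df


# Made-up diameters, for illustration only
a = [7.1, 6.9, 7.3, 7.0, 6.8, 7.2]
b = [6.9, 7.0, 6.7, 6.8, 7.1, 6.6]
t, p, df = pooled_t(a, b)
print(f"t = {t:.4f}, p = {p:.4f}, df = {df}")
print(stats.ttest_ind(a, b))  # should agree with the manual result
```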


In [6]:
# Conclusion
alpha = 0.05
if p_value < alpha:
    print("Reject null hypothesis: There is a significant difference in the diameter of cutlets between the two units.")
else:
    print("Fail to reject null hypothesis: There is no significant difference in the diameter of cutlets between the two units.")

Fail to reject null hypothesis: There is no significant difference in the diameter of cutlets between the two units.


### A hospital wants to determine whether there is any difference in the average Turn Around Time (TAT) of reports from the laboratories on its preferred list. It collected a random sample and recorded the TAT for reports from 4 laboratories. TAT is defined as the time from sample collection to report dispatch.
Analyze the data and determine whether there is any difference in average TAT among the different laboratories at the 5% significance level.
Minitab File: LabTAT.mtw


In [8]:
import pandas as pd
from scipy import stats

In [9]:
# Load the data
df = pd.read_csv("C:\\Users\\SHUBHAM GARKAL\\Downloads\\LabTAT.csv")

In [10]:
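# Split out each lab's TAT column. Each is kept as a one-column DataFrame,
# which is why f_oneway below returns length-1 arrays instead of scalars.
lab1 = pd.DataFrame(df.iloc[:, 0])
lab2 = pd.DataFrame(df.iloc[:, 1])
lab3 = pd.DataFrame(df.iloc[:, 2])
lab4 = pd.DataFrame(df.iloc[:, 3])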

In [11]:
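# Perform one-way ANOVA (assumes independent random samples, roughly normal
# TAT within each lab, and equal variances across labs)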
statistic, p_value = stats.f_oneway(lab1, lab2, lab3, lab4)

In [12]:
# Print the results
print("F-statistic:", statistic)
print("P-value:", p_value)

F-statistic: [118.70421654]
P-value: [2.11567089e-57]
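`f_oneway` compares between-group to within-group variability: F = (SS_between / (k - 1)) / (SS_within / (N - k)), where k is the number of labs and N the total number of reports. A small hand computation on made-up TAT values (not the LabTAT data) shows the pieces; `one_way_f` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def one_way_f(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and p-value by hand."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p = stats.f.sf(f, k - 1, n_total - k)
    return f, p


# Made-up TAT values for four labs, for illustration only
labs = [
    [178, 185, 170, 190, 182],
    [175, 180, 168, 172, 179],
    [200, 195, 205, 198, 210],
    [160, 165, 158, 170, 162],
]
f, p = one_way_f(*labs)
print(f"F = {f:.3f}, p = {p:.3g}")
print(stats.f_oneway(*labs))  # should match the manual result
```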


In [13]:
# Interpret the results
alpha = 0.05  # Significance level
if p_value < alpha:
    print("There is significant evidence to reject the null hypothesis.")
    print("There is a difference in average TAT among the different laboratories.")
else:
    print("There is no significant evidence to reject the null hypothesis.")
    print("There is no significant difference in average TAT among the different laboratories.")

There is significant evidence to reject the null hypothesis.
There is a difference in average TAT among the different laboratories.
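Rejecting H0 in a one-way ANOVA only says that at least one lab's mean TAT differs; it does not say which. A post-hoc test such as Tukey's HSD compares every pair while controlling the family-wise error rate. The sketch below uses `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later) on made-up values, not the LabTAT data.

```python
import numpy as np
from scipy import stats

# Made-up TAT values for four labs, for illustration only
labs = {
    "Lab 1": [178, 185, 170, 190, 182],
    "Lab 2": [175, 180, 168, 172, 179],
    "Lab 3": [200, 195, 205, 198, 210],
    "Lab 4": [160, 165, 158, 170, 162],
}
names = list(labs)
res = stats.tukey_hsd(*labs.values())

alpha = 0.05
significant_pairs = []
# res.pvalue is a k x k matrix; only the upper triangle is needed
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        p = res.pvalue[i, j]
        flag = "differ" if p < alpha else "no significant difference"
        print(f"{names[i]} vs {names[j]}: p = {p:.4f} ({flag})")
        if p < alpha:
            significant_pairs.append((names[i], names[j]))
```

On the notebook's data, the same call could be applied as `stats.tukey_hsd(*(df.iloc[:, i] for i in range(4)))`.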


### Sales of products in four different regions are tabulated for males and females. Determine whether the male-female buyer ratios are similar across regions.
|         | East | West | North | South |
|---------|------|------|-------|-------|
| Males   | 50   | 142  | 131   | 70    |
| Females | 550  | 351  | 480   | 350   |

H0: The male-female proportions are the same in all regions.

Ha: The male-female proportions are not all the same.

1. Compute the p-value of the chi-square test of independence.
2. If p-value < alpha, reject the null hypothesis.


In [16]:
import pandas as pd
from scipy.stats import chi2_contingency

In [17]:
# Load the data
data = pd.read_csv("C:\\Users\\SHUBHAM GARKAL\\Downloads\\BuyerRatio.csv")
data

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [18]:
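# Observed counts as given in the problem statement (rows: Males, Females;
# columns: East, West, North, South). The Females row in the loaded
# BuyerRatio.csv (435, 1523, 1356, 750) differs from these values.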
observed = [[50, 142, 131, 70],
            [550, 351, 480, 350]]

In [19]:
# Perform chi-square test
chi2_stat, p_val, _, _ = chi2_contingency(observed)

In [20]:
# Define the significance level (alpha)
alpha = 0.05

In [21]:
# Print p-value
print("p-value:", p_val)

p-value: 2.682172557281901e-17
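`chi2_contingency` builds the expected count for each cell as E = (row total * column total) / grand total and sums (O - E)^2 / E over all cells, with (rows - 1) * (columns - 1) = 3 degrees of freedom for this 2 x 4 table (no continuity correction is applied when there is more than one degree of freedom). The sketch below reproduces that on the table from the problem statement; `chi_square_by_hand` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats
from scipy.stats import chi2_contingency


def chi_square_by_hand(observed):
    """Hypothetical helper: Pearson chi-square test of independence by hand."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts under independence of gender and region
    expected = row_totals * col_totals / observed.sum()
    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p = stats.chi2.sf(chi2, dof)
    return chi2, p, dof, expected


# Counts from the problem statement (rows: Males, Females; columns: East, West, North, South)
observed = [[50, 142, 131, 70],
            [550, 351, 480, 350]]
chi2, p, dof, expected = chi_square_by_hand(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3e}")
```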


In [22]:
# Check if p-value is less than alpha
if p_val < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Reject Null Hypothesis
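The test says the male-female split is not the same in every region, but not where it differs. A simple descriptive follow-up (an illustration, not a formal test) is to compare the male share of buyers in each region, computed from the problem-statement table.

```python
import pandas as pd

# Counts from the problem statement table
table = pd.DataFrame(
    {"East": [50, 550], "West": [142, 351], "North": [131, 480], "South": [70, 350]},
    index=["Males", "Females"],
)
# Share of male buyers in each region (column-wise proportions)
male_share = table.loc["Males"] / table.sum(axis=0)
# Overall male share across all regions, for reference
overall = table.loc["Males"].sum() / table.values.sum()
print(male_share.round(3))
print(f"Overall male share: {overall:.3f}")
```

Regions whose share sits far from the overall figure are the ones contributing most to the chi-square statistic.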


### TeleCall uses 4 centers around the globe to process customer order forms. They audit a certain % of the customer order forms. Any error in an order form renders it defective, and the form has to be reworked before processing. The manager wants to check whether the defective % varies by center. Please analyze the data at the 5% significance level and help the manager draw appropriate inferences.
Minitab File: CustomerOrderForm.mtw



In [23]:
import pandas as pd
from scipy.stats import chi2_contingency

In [24]:
# Load the data
data = pd.read_csv("C:\\Users\\SHUBHAM GARKAL\\Downloads\\Costomer+OrderForm.csv")

In [25]:
# Count the occurrences of Defective and Error Free for each center
center_counts = data.apply(pd.Series.value_counts)
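`data.apply(pd.Series.value_counts)` counts each label ("Error Free", "Defective") per column and aligns the results on the union of labels. If a center happened to have no defective forms, its cell would be NaN rather than 0, which would break the chi-square computation; filling with 0 guards against that. A toy illustration with made-up labels for two hypothetical centers:

```python
import pandas as pd

# Toy audit results for two hypothetical centers (made-up data)
toy = pd.DataFrame({
    "Center A": ["Error Free", "Defective", "Error Free", "Error Free"],
    "Center B": ["Error Free", "Error Free", "Error Free", "Error Free"],
})
# Count labels per column; Center B has no "Defective", so that cell is NaN
counts = toy.apply(pd.Series.value_counts)
# Replace missing counts with 0 so the result is a valid contingency table
counts = counts.fillna(0).astype(int)
print(counts)
```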

In [26]:
# Perform chi-square test for independence
chi2, p, dof, expected = chi2_contingency(center_counts)

In [27]:
# Output the results
print("Chi-square statistic:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected frequencies table:")
print(pd.DataFrame(expected, index=center_counts.index, columns=center_counts.columns))

Chi-square statistic: 3.858960685820355
p-value: 0.2771020991233135
Degrees of freedom: 3
Expected frequencies table:
            Phillippines  Indonesia   Malta   India
Error Free        271.75     271.75  271.75  271.75
Defective          28.25      28.25   28.25   28.25
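The chi-square approximation is usually considered reliable when every expected count is at least 5 (a common rule of thumb). Here the smallest expected count is 28.25, so the condition holds. A minimal check using `scipy.stats.contingency.expected_freq`, shown on made-up tables; `expected_counts_ok` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats.contingency import expected_freq


def expected_counts_ok(observed, min_expected=5.0):
    """Hypothetical helper: True if every expected cell count >= min_expected."""
    expected = expected_freq(np.asarray(observed))
    return bool((expected >= min_expected).all()), expected


# Made-up counts: rows are Error Free / Defective, columns are centers
ok, expected = expected_counts_ok([[290, 280, 285, 270],
                                   [10, 20, 15, 30]])
print(ok)
print(expected)
```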


In [28]:
# Interpret the results
alpha = 0.05
if p < alpha:
    print("Reject the null hypothesis: There is evidence that the defective percentage varies by center.")
else:
    print("Fail to reject the null hypothesis: There is not enough evidence at the 5% level that the defective percentage varies by center.")

Fail to reject the null hypothesis: There is not enough evidence at the 5% level that the defective percentage varies by center.
