# Unpaired T test

A human resources specialist at a technology company wants to compare overtime across teams. To investigate whether the overtime of the software development team differs from that of the test team, she randomly selected 17 employees from each team and recorded their average weekly overtime in hours. The data are below.

test_team=[6.2, 7.1, 1.5, 2,3 , 2, 1.5, 6.1, 2.4, 2.3, 12.4, 1.8, 5.3, 3.1, 9.4, 2.3, 4.1]
software_team=[2.3, 2.1, 1.4, 2.0, 8.7, 2.2, 3.1, 4.2, 3.6, 2.5, 3.1, 6.2, 12.1, 3.9, 2.2, 1.2 ,3.4]

Using this information, test at the 0.05 significance level whether the overtime of the two teams differs. Before running the test, check the relevant assumptions, then comment on the results.

In [14]:
import numpy as np
from scipy import stats

def check_normality(data):
    # Shapiro-Wilk test; H0: the data is normally distributed
    test_stat_normality, p_value_normality=stats.shapiro(data)
    print("p value:%.4f" % p_value_normality)
    if p_value_normality <0.05:
        print("Reject null hypothesis >> The data is not normally distributed")
    else:
        print("Fail to reject null hypothesis >> The data is normally distributed")

In [15]:
def check_variance_homogeneity(group1, group2):
    test_stat_var, p_value_var= stats.levene(group1,group2)
    print("p value:%.4f" % p_value_var)
    if p_value_var <0.05:
        print("Reject null hypothesis >> The variances of the samples are different.")
    else:
        print("Fail to reject null hypothesis >> The variances of the samples are same.")

In [11]:
test_team=np.array([6.2,  7.1,  1.5,  2,3 ,  2,  1.5,  6.1,  2.4,  2.3, 12.4,  1.8,  5.3,  3.1, 9.4,  2.3, 4.1])
developer_team=np.array([2.3,  2.1,  1.4,  2.0, 8.7,  2.2,  3.1,  4.2,  3.6, 2.5,  3.1,  6.2, 12.1,  3.9,  2.2, 1.2 ,3.4])

H0 : The data is normally distributed.
H1 : The data is not normally distributed.

In [16]:
check_normality(test_team)
check_normality(developer_team)

p value:0.0046
Reject null hypothesis >> The data is not normally distributed
p value:0.0005
Reject null hypothesis >> The data is not normally distributed


H0 : The variances of the samples are the same.
    
H1 : The variances of the samples are different.

In [18]:
check_variance_homogeneity(test_team, developer_team)

p value:0.5410
Fail to reject null hypothesis >> The variances of the samples are same.


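Both samples fail the normality check, so the unpaired t test is not appropriate here. The nonparametric Mann-Whitney U test is used instead.

H0 : The overtime of the two teams follows the same distribution.
    
H1 : The overtime of the two teams follows different distributions.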

In [19]:
u_stat, pvalue = stats.mannwhitneyu(test_team, developer_team, alternative="two-sided")
print("p-value:%.4f" % pvalue)
if pvalue <0.05:
    print("Reject null hypothesis")
else:
    print("Fail to reject null hypothesis")

p-value:0.8226
Fail to reject null hypothesis


### At the 0.05 significance level, there is no statistically significant difference between the overtime of the two teams.
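To make concrete what the Mann-Whitney U statistic measures, here is a minimal sketch that counts, over all pairs of one test-team value and one developer-team value, how often each side is larger (ties count as half). The helper name `mann_whitney_u` is hypothetical and uses only plain Python; it illustrates the statistic and is not how `stats.mannwhitneyu` is implemented, and it does not compute a p-value.

```python
# Overtime data copied from the example above.
test_team = [6.2, 7.1, 1.5, 2, 3, 2, 1.5, 6.1, 2.4, 2.3, 12.4, 1.8, 5.3, 3.1, 9.4, 2.3, 4.1]
developer_team = [2.3, 2.1, 1.4, 2.0, 8.7, 2.2, 3.1, 4.2, 3.6, 2.5, 3.1, 6.2, 12.1, 3.9, 2.2, 1.2, 3.4]


def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairwise wins of x over y and of y over x.

    A tie between a value in x and a value in y counts as half a win
    for each side. Hypothetical helper for illustration only.
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    # The two statistics always sum to the number of pairs.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


u_test, u_dev = mann_whitney_u(test_team, developer_team)
n_pairs = len(test_team) * len(developer_team)
print("U (test team):", u_test)
print("U (developer team):", u_dev)
print("Share of pairs won by the test team: %.3f" % (u_test / n_pairs))
```

A share close to 0.5 means neither team tends to have larger values than the other, which is consistent with the large p-value reported above.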

# Paired T test

H0: The mean pre-test and post-test values are equal.
    
HA: The mean pre-test and post-test values are not equal.

In the example below, the paired values are the mileage of the same cars before and after switching to a different engine oil.

In [20]:
# Importing library
import scipy.stats as stats

# pre holds the mileage before
# applying the different engine oil
pre = [30, 31, 34, 40, 36, 35,
       34, 30, 28, 29]

# post holds the mileage after
# applying the different engine oil
post = [30, 31, 32, 38, 32, 31,
        32, 29, 28, 30]

# Performing the paired sample t-test
stats.ttest_rel(pre, post)

Ttest_relResult(statistic=2.584921310565987, pvalue=0.029457853822895275)

The test statistic is 2.585 and the corresponding two-sided p-value is 0.029.

### Since the p-value (0.029) is less than 0.05, we reject the null hypothesis.
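The paired t statistic can also be reproduced by hand, which shows that the test reduces to a one-sample t test on the per-pair differences: t = mean(d) / (sd(d) / sqrt(n)). The sketch below uses only the Python standard library; the variable names are illustrative, and it computes the statistic only, not the p-value.

```python
import math
import statistics

# Mileage before and after the oil change, copied from the example above.
pre = [30, 31, 34, 40, 36, 35, 34, 30, 28, 29]
post = [30, 31, 32, 38, 32, 31, 32, 29, 28, 30]

# Per-pair differences: the paired test works on these, not on the raw groups.
diffs = [before - after for before, after in zip(pre, post)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)

# t statistic with n - 1 degrees of freedom.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
print("mean difference:", mean_diff)
print("t statistic: %.4f" % t_stat)
```

The value agrees with the statistic returned by `stats.ttest_rel(pre, post)` above.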

# One sample Z test

Example: Suppose we want to test whether girls, on average, score higher than 600 on the SAT verbal section, so the null hypothesis is that the mean score is 600 and the alternative is that it is greater than 600. Suppose we also know that the standard deviation of girls' SAT verbal scores is 100. We draw a random sample of 32 girls and record their marks. Finally, we set the significance level alpha (⍺) to 0.05.

In [21]:
import math 
import numpy as np
from statsmodels.stats.weightstats import ztest
from scipy.stats import norm

sample_marks = [650,730,510,670,480,800,690,530,590,620,710,670,640,780,650,490,800,600,510,700,750,340,650,987,345,654,500,900,867,450,324,435]

# Method 1 : Using Z-score

sample_mean = np.mean(sample_marks)
sample_size = len(sample_marks)  # count all observations, including any zeros
population_mean = 600
population_std = 100
alpha = 0.05
z_score = (sample_mean-population_mean)/(population_std/math.sqrt(sample_size))
critical_value = 1.645 # one-sided critical value at alpha = 0.05, from z table
if(z_score<critical_value):
    print('Fail to reject the null hypothesis.')
else:
    print('Reject the null hypothesis in favor of the alternative.')
    
    
# Method 2: Using built in function of ztest
# Note: statsmodels estimates the standard deviation from the sample
# instead of using the known population value

ztest_score, pval = ztest(sample_marks,value=population_mean,alternative='larger')
print('Z-test Score:',ztest_score,'\nP-value:',pval)
if(pval>alpha):
    print('Fail to reject the null hypothesis.')
else:
    print('Reject the null hypothesis in favor of the alternative.')
 
 
# Method 3: Creating a function 
# Named manual_ztest so it does not shadow the statsmodels ztest imported above

def manual_ztest(x,mu,sigma,n):
    deno = sigma/math.sqrt(n)
    z = (x-mu)/deno
    p = 2*(1-norm.cdf(abs(z)))  # two-sided p-value
    return z,p
  
s_mean = np.mean(sample_marks)
p_mean = 600
p_std = 100
s_size = len(sample_marks)

manual_ztest(s_mean,p_mean,p_std,s_size)

manual_ztest(641,600,100,20)

Fail to reject the null hypothesis.
Z-test Score: 0.8987357385558196 
P-value: 0.18439671803663865
Fail to reject the null hypothesis.


(1.8335757415498277, 0.06671699590108493)
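Since the example states a one-sided alternative and a known population standard deviation, the matching p-value is the upper-tail probability of the z-score. The sketch below is one way to compute it with only the Python standard library (`statistics.NormalDist`); the variable names are illustrative and the approach is an assumption about what the example intends, not a replacement for the methods above.

```python
import math
from statistics import NormalDist, mean

sample_marks = [650, 730, 510, 670, 480, 800, 690, 530, 590, 620, 710, 670,
                640, 780, 650, 490, 800, 600, 510, 700, 750, 340, 650, 987,
                345, 654, 500, 900, 867, 450, 324, 435]
population_mean = 600
population_std = 100  # known population standard deviation from the example
alpha = 0.05

n = len(sample_marks)
standard_error = population_std / math.sqrt(n)
z_score = (mean(sample_marks) - population_mean) / standard_error

# One-sided (upper-tail) p-value for the alternative "mean > 600".
p_value = 1 - NormalDist().cdf(z_score)

# Critical value computed from alpha instead of read from a z table.
critical_value = NormalDist().inv_cdf(1 - alpha)

print("z-score: %.4f" % z_score)
print("one-sided p-value: %.4f" % p_value)
print("critical value: %.4f" % critical_value)
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Comparing `p_value` with `alpha` and comparing `z_score` with `critical_value` are equivalent decisions for this one-sided test.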

# Two sample z test

Example: Suppose we want to test whether girls, on average, score more than 10 marks higher than boys. Suppose we also know that the standard deviation is 100 for girls' scores and 90 for boys' scores. We draw random samples of 32 girls and 32 boys and record their marks. Finally, we set the significance level alpha (⍺) to 0.05.

In [22]:
import math 
import numpy as np
from statsmodels.stats.weightstats import ztest

sample_marks1 = [650,730,510,670,480,800,690,530,590,620,710,670,640,780,650,490,800,600,510,700,750,340,650,987,345,654,500,900,867,450,324,435]
sample_marks2 = [630,720,462,631,440,783,673,519,543,579,677,649,632,768,615,463,781,563,488,650,670,780,640,654,510,654,899,760,234,657,789,324]

sample_mean1 = np.mean(sample_marks1)
sample_mean2 = np.mean(sample_marks2)
sample_size1 = len(sample_marks1)  # count all observations, including any zeros
sample_size2 = len(sample_marks2)
population_mean_diff = 10
population_std1 = 100
population_std2 = 90
alpha = 0.05

# Method 1: Using built in function of ztest

z,p = ztest(x1=sample_marks1,x2=sample_marks2,value=population_mean_diff,alternative='larger')
print('Z-score:',z,'\nP-value:',p)

if(p>alpha):
    print('Fail to reject the null hypothesis.')
else:
    print('Reject the null hypothesis in favor of the alternative.')
    
# Method 2: Calculating Z-score  
# Uses the known population standard deviations, so the value differs from Method 1

zscore = ((sample_mean1-sample_mean2)-(population_mean_diff))/(math.sqrt((population_std1**2/sample_size1)+(population_std2**2/sample_size2)))
critical_value = 1.645 # one-sided critical value at alpha = 0.05, from z table

if(zscore<critical_value):
    print('Fail to reject the null hypothesis.')
else:
    print('Reject the null hypothesis in favor of the alternative.')

Z-score: -0.11110075484097665 
P-value: 0.5442317749137925
Fail to reject the null hypothesis.
Fail to reject the null hypothesis.
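The z-score printed above comes from the statsmodels `ztest`, which estimates the variance from the two samples, while Method 2 plugs in the known population standard deviations and only prints a decision. As a hedged sketch, the Method 2 z-score and its one-sided p-value can be computed with the Python standard library as follows; the variable names are illustrative.

```python
import math
from statistics import NormalDist, mean

sample_marks1 = [650, 730, 510, 670, 480, 800, 690, 530, 590, 620, 710, 670,
                 640, 780, 650, 490, 800, 600, 510, 700, 750, 340, 650, 987,
                 345, 654, 500, 900, 867, 450, 324, 435]
sample_marks2 = [630, 720, 462, 631, 440, 783, 673, 519, 543, 579, 677, 649,
                 632, 768, 615, 463, 781, 563, 488, 650, 670, 780, 640, 654,
                 510, 654, 899, 760, 234, 657, 789, 324]
population_mean_diff = 10
population_std1 = 100  # known standard deviation, girls
population_std2 = 90   # known standard deviation, boys
alpha = 0.05

n1, n2 = len(sample_marks1), len(sample_marks2)

# Standard error of the difference in means with known variances.
standard_error = math.sqrt(population_std1**2 / n1 + population_std2**2 / n2)
observed_diff = mean(sample_marks1) - mean(sample_marks2)
z_score = (observed_diff - population_mean_diff) / standard_error

# One-sided (upper-tail) p-value for "difference in means > 10".
p_value = 1 - NormalDist().cdf(z_score)

print("observed difference: %.4f" % observed_diff)
print("z-score: %.4f" % z_score)
print("one-sided p-value: %.4f" % p_value)
```

Both approaches lead to the same decision here: the p-value is far above 0.05, so the null hypothesis is not rejected.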
