# Hypothesis Testing

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

Z-Score function

In [2]:
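def z_score(sample_size, sample_mean, pop_mean, pop_std):
    """Return the z-score of a sample mean given the population mean and std."""
    numerator = sample_mean - pop_mean
    # Standard error of the mean: pop_std / sqrt(n)
    denominator = pop_std / sample_size**0.5
    return numerator / denominator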

t-score function

In [3]:
from scipy.stats import t
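
A t-score helper in the same spirit as `z_score` could look like the sketch below. The function name `t_score` and its `std` parameter are illustrative assumptions, not part of the original notebook. When `std` is omitted, the sample standard deviation (with `ddof=1`) is used.

```python
import numpy as np

def t_score(sample, pop_mean, std=None):
    """Return (sample mean - pop_mean) / (std / sqrt(n)).

    Hypothetical helper: if std is None, the sample standard
    deviation (ddof=1) is estimated from the data.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    if std is None:
        std = sample.std(ddof=1)  # unbiased sample standard deviation
    return (sample.mean() - pop_mean) / (std / np.sqrt(n))

# Made-up sample, only to show the call
print(t_score([10.2, 9.9, 10.4, 10.1, 9.9], pop_mean=10.0))
```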

We use the t-statistic when the sample size is below 30 and the z-statistic when the sample size is 30 or more.

**Two tailed t-test:** 
$$ |t| > t_{n-1,\frac{\alpha}{2}} \Rightarrow Accept \ H_1 \ or \ Reject \ H_0 $$
**Right tailed t-test:** 
$$ t > t_{n-1,\alpha} \Rightarrow Accept \ H_1 \ or \ Reject \ H_0 $$
**Left tailed t-test:** 
$$ t < -t_{n-1,\alpha} \Rightarrow Accept \ H_1 \ or \ Reject \ H_0 $$

**Critical rejection regions**   
1. $H_1 (\mu < \mu_0) $ : Reject $H_0$ if $t$ < -$t_\alpha$   
2. $H_1 (\mu > \mu_0) $ : Reject $H_0$ if $t$ > $t_\alpha$   
3. $H_1 (\mu \neq \mu_0) $ : Reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}$
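
The three rejection rules can be sketched as one function. This is a minimal illustration: the name `reject_null` and the `tail` labels are assumptions introduced here, and the critical values come from `scipy.stats.t.ppf`.

```python
from scipy.stats import t

def reject_null(t_stat, df, alpha, tail):
    """Return True if H0 is rejected; tail is 'left', 'right' or 'two'."""
    if tail == "left":    # H1: mu < mu_0
        return bool(t_stat < -t.ppf(1 - alpha, df))
    if tail == "right":   # H1: mu > mu_0
        return bool(t_stat > t.ppf(1 - alpha, df))
    if tail == "two":     # H1: mu != mu_0
        return bool(abs(t_stat) > t.ppf(1 - alpha / 2, df))
    raise ValueError("tail must be 'left', 'right' or 'two'")

print(reject_null(-1.91, df=9, alpha=0.01, tail="two"))  # False
```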

### **Ques.** Pista House, which sells Hyderabadi Chicken Dum biryani, claims that each parcel packet has $500$ grams of biryani (also mentioned on the label of the packet). You are sceptical of their claim and believe that on average each packet does not contain $500$ grams of biryani. How do you prove your claim?

**STEP : 1**   
$H_0$ : Packet contains 500 grams of biryani (Null hypothesis) **We test whether this can be rejected**    
$H_1$ : Packet does not contain 500 grams of biryani (Alternate hypothesis)   
$N$ (Sample size) : 10   
$$[490,480,473,492,422,463,478,434,499,467]$$
$ \bar{x} $ (Sample mean) =  469.8   
$ \mu_0 $ (Population mean) = 500 *Given*    
$ s $ (Population Std.) = 50 *Assumption*

**STEP : 2**   
Level of significance = 0.01 (99% confidence level)   
N = 10 

**STEP : 3**  
Apply t-test statistic criterion (3)  


In [6]:
# TWO-TAIL TEST 
alpha = 1 - 0.99
t_critical = t.ppf(1 - alpha/2 , df = 9)
print("Critical value is : ", t_critical)

Critical value is :  3.2498355440153697


We reject if : 
1. t-score > 3.2498
2. t-score < -3.2498

**STEP : 4**  
We are getting t-score as -1.9100, which lies between -3.2498 and 3.2498

**STEP : 5**   
We fail to reject $H_0$.
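
The t-score in Step 4 can be reproduced from the sample listed in Step 1. The sketch below assumes NumPy and SciPy are available and uses the assumed standard deviation of 50; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import t

weights = np.array([490, 480, 473, 492, 422, 463, 478, 434, 499, 467])
pop_mean, pop_std, alpha = 500, 50, 0.01  # values stated in Steps 1 and 2

# Test statistic: (sample mean - claimed mean) / standard error
t_stat = (weights.mean() - pop_mean) / (pop_std / np.sqrt(weights.size))
# Two-tailed critical value with n - 1 degrees of freedom
t_crit = t.ppf(1 - alpha / 2, df=weights.size - 1)

print(round(t_stat, 4))      # -1.91
print(abs(t_stat) > t_crit)  # False, so H0 is not rejected
```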

### **Ques.2**  You have developed a new Natural Language Processing algorithm and conducted a user study. You claim that the average rating given by the users is greater than 4 on a scale of 1 to 5. How do you prove this to your client?

**STEP : 1**   
$H_0$ : Average rating is $\geq$ 4 (Null hypothesis) **We test whether this can be rejected**    
$H_1$ : Average rating is < 4 (Alternate hypothesis)   
$N$ (Sample size) : 10   
$$[4,2,5,1,4,2,3,1,4,2]$$
$ \bar{x} $ (Sample mean) =  2.8   
$ \mu_0 $ (Population mean) = 4 *Given*    
$ s $ (Population Std.) = 1 *Assumption*

**STEP : 2**   
Level of significance = 0.01 (99% confidence level)   
N = 10 

**STEP : 3**  
Apply t-test statistic criterion (1)  

In [8]:
# ONE-TAIL TEST (left tail)
from scipy.stats import t
confidence2 = 1 - 0.01  # 99% confidence level
t_critical = t.ppf(1 - confidence2, df = 9)  # lower-tail critical value
print("Critical value is : ", t_critical)

Critical value is :  -2.8214379233005493


We reject if : 
1. t-score < -2.82144

**STEP : 4**  
We are getting t-score as -3.794742 which is less than t-critical

**STEP : 5**   
We reject $H_0$: the sample does not support the claim that the average rating is greater than 4.
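
As a check on Step 4, the left-tailed comparison can be recomputed from the ratings in Step 1. This is a sketch under the stated assumption of a standard deviation of 1; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import t

ratings = np.array([4, 2, 5, 1, 4, 2, 3, 1, 4, 2])
pop_mean, pop_std, alpha = 4, 1, 0.01  # values stated in Steps 1 and 2

t_stat = (ratings.mean() - pop_mean) / (pop_std / np.sqrt(ratings.size))
# Left-tailed critical value: the alpha quantile of the t-distribution
t_crit = t.ppf(alpha, df=ratings.size - 1)

print(round(t_stat, 4))  # -3.7947
print(t_stat < t_crit)   # True, so H0 is rejected
```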

### **Ques.3** TATA has developed a better fuel management system for the SUV segment. They claim that with this system, on average the SUV's mileage is at least 15 km/litre. How do you test this claim?

**STEP : 1**   
$H_0$ : Average mileage $\geq $ 15 km/litre (Null hypothesis) **We test whether this can be rejected**    
$H_1$ : Average mileage < 15 km/litre (Alternate hypothesis)   
$N$ (Sample size) : 10   
$$[14.25,14.58,13.29,16.24,16.77,18.24,13.24,14.22,15.25,16.11]$$
$ \bar{x} $ (Sample mean) =  15.219   
$ \mu_0 $ (Population mean) = 15 *Given*    
$ s $ (Population Std.) = 1 *Assumption*

**STEP : 2**   
Level of significance = 0.01 (99% confidence level)   
N = 10 

**STEP : 3**  
Apply t-test statistic criterion (1) 

In [9]:
# ONE-TAIL TEST (left tail)
from scipy.stats import t
confidence3 = 1 - 0.01  # 99% confidence level
t_critical = t.ppf(1 - confidence3, df = 9)  # lower-tail critical value
print("Critical value is : ", t_critical)

Critical value is :  -2.8214379233005493


We reject if :

t-score < -2.82144

**STEP : 4**  
We are getting t-score as 0.6925 which is more than t-critical

**STEP : 5**   
We fail to reject $H_0$.
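
The statistic for this question can be recomputed from the mileage sample in Step 1. The sketch below uses the assumed standard deviation of 1 and illustrative variable names; with the stated mean of 15.219 and $N = 10$ the formula gives about 0.69.

```python
import numpy as np
from scipy.stats import t

mileage = np.array([14.25, 14.58, 13.29, 16.24, 16.77,
                    18.24, 13.24, 14.22, 15.25, 16.11])
pop_mean, pop_std, alpha = 15, 1, 0.01  # values stated in Steps 1 and 2

t_stat = (mileage.mean() - pop_mean) / (pop_std / np.sqrt(mileage.size))
t_crit = t.ppf(alpha, df=mileage.size - 1)  # left-tailed critical value

print(round(t_stat, 4))  # 0.6925
print(t_stat < t_crit)   # False, so H0 is not rejected
```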

### **Ques.4** You have developed a new Machine Learning Application and claim that on average it takes less than 100 ms to predict for any future datapoint. How do you convince your client about this claim?

**STEP : 1**   
$H_0$ : Average time $\leq$ 100 ms (Null hypothesis) **We test whether this can be rejected**    
$H_1$ : Average time > 100 ms (Alternate hypothesis)   
$N$ (Sample size) : 10   
$$[99.25,98.77,96.48,102.24,103.24,101.98,95.24,91.02,97.48,104.99]$$
$ \bar{x} $ (Sample mean) =  99.069   
$ \mu_0 $ (Population mean) = 100 *Given*    
$ s $ (Population Std.) = 1 *Assumption*

**STEP : 2**   
Level of significance = 0.01 (99% confidence level)   
N = 10 

**STEP : 3**  
Apply t-test statistic criterion (2) 

In [10]:
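# ONE-TAIL TEST (right tail)
from scipy.stats import t
confidence4 = 1 - 0.01  # 99% confidence level
# Upper-tail critical value: by symmetry, the negated lower-tail quantile
t_critical = -t.ppf(1 - confidence4, df = 9)
print("Critical value is : ", t_critical)

Critical value is :  2.8214379233005493


We reject if :

t-score > 2.82144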

**STEP : 4**  
We are getting t-score as -2.94408 which is less than t-critical

**STEP : 5**   
We fail to reject $H_0$.
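
A right-tailed check of Step 4 can be sketched from the latency sample in Step 1, again assuming a standard deviation of 1. The variable names are illustrative; the upper-tail critical value is taken as the $1 - \alpha$ quantile.

```python
import numpy as np
from scipy.stats import t

latency_ms = np.array([99.25, 98.77, 96.48, 102.24, 103.24,
                       101.98, 95.24, 91.02, 97.48, 104.99])
pop_mean, pop_std, alpha = 100, 1, 0.01  # values stated in Steps 1 and 2

t_stat = (latency_ms.mean() - pop_mean) / (pop_std / np.sqrt(latency_ms.size))
# Right-tailed critical value: the (1 - alpha) quantile
t_crit = t.ppf(1 - alpha, df=latency_ms.size - 1)

print(round(t_stat, 4))  # -2.9441
print(t_stat > t_crit)   # False, so H0 is not rejected
```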

$$ * \ THE \ \ \  END * \ $$

#### **FOOT NOTES**

In [1]:
# ONE-TAIL TEST
from scipy.stats import t
alpha = 1 - 0.95
t_critical = t.ppf(1 - alpha, df = 19)
print(t_critical)

1.729132811521367


In [2]:
# TWO-TAIL
alpha = 1 - 0.95  # level of significance for a 95% confidence level
t_critical2 = t.ppf(1 - alpha/2 , df = 19)
print(t_critical2)

2.093024054408263
