
# Hypothesis Testing

In [None]:
from scipy import stats
import numpy as np

In [None]:
x = [10, 12, 20, 21, 22, 24, 18, 15]
  

## t-test ($σ$ not known)
$
H_0 : \mu = 15 \\
H_a : \mu \neq 15
$

In [None]:
# mu=15 -> assumed mean
stats.ttest_1samp(x, 15)

Ttest_1sampResult(statistic=1.5623450931857947, pvalue=0.1621787560592894)
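
To see what `stats.ttest_1samp` computes, the statistic can be reproduced by hand as $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, where $s$ is the sample standard deviation. The sketch below uses only the Python standard library; the variable names (`mu0`, `std_error`, `t_stat`) are illustrative and not part of the original notebook.

```python
import math
import statistics

x = [10, 12, 20, 21, 22, 24, 18, 15]
mu0 = 15  # hypothesized mean under H0

n = len(x)
xbar = statistics.mean(x)          # sample mean
s = statistics.stdev(x)            # sample standard deviation (n - 1 in the denominator)
std_error = s / math.sqrt(n)       # standard error of the mean
t_stat = (xbar - mu0) / std_error  # one-sample t statistic with n - 1 degrees of freedom

print(xbar, t_stat)
```

The value of `t_stat` should agree with the `statistic` field reported above. The p-value additionally needs the t distribution with $n - 1 = 7$ degrees of freedom, which is what SciPy supplies.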

### Problem: Ice cream sales
In an ice cream parlor, the `icecream.xlsx` data records the number of ice creams sold on each of 20 days.

Test the hypothesis $H_0: \mu \geq 10$ against $H_a: \mu < 10$.
Use $\alpha = 0.05$ to test the hypothesis.

In [None]:
import pandas as pd

icecream = pd.read_excel("https://github.com/ada-nai/nptel-dap/blob/master/4/data/icecream.xlsx?raw=true")

In [None]:
sales = icecream['Number of ice cream sold']

In [None]:
stats.ttest_1samp(sales, 10)

Ttest_1sampResult(statistic=-0.35843385854878496, pvalue=0.7239703579964252)

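The `pvalue` obtained above is for a two-tailed test. The hypothesis being tested is one-sided ($\mu < 10$) and the test statistic is negative, so it lies on the side of the one-sided alternative; the one-tailed p-value is therefore half the two-tailed value.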

In [None]:
pvalue=0.7239703579964252
pvalue/2

0.3619851789982126

$0.36 > 0.05$, so we fail to reject $H_0$: the sample does not provide enough evidence that mean sales are below 10.
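
Halving the two-tailed p-value is only correct when the statistic falls on the side of the alternative; otherwise the one-tailed p-value is one minus that half. A minimal sketch of this rule follows, assuming a symmetric test statistic such as t or z. The helper `one_sided_pvalue` is hypothetical and not part of SciPy.

```python
def one_sided_pvalue(statistic, two_sided_p, alternative):
    """Convert a two-sided p-value from a symmetric test into a one-sided one.

    `alternative` is "less" (H_a: mu < mu0) or "greater" (H_a: mu > mu0).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    # The statistic supports H_a when its sign matches the direction of H_a.
    if alternative == "less":
        supports_alternative = statistic < 0
    else:
        supports_alternative = statistic > 0
    half = two_sided_p / 2
    return half if supports_alternative else 1 - half


# Values copied from the ice cream result above.
p_less = one_sided_pvalue(-0.35843385854878496, 0.7239703579964252, "less")
p_greater = one_sided_pvalue(-0.35843385854878496, 0.7239703579964252, "greater")
print(p_less, p_greater)
```

Recent SciPy releases also accept an `alternative` argument in `ttest_1samp`, which avoids the manual step; whether it is available depends on the installed version.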

## Proportion Test (z-test)

In [None]:
from statsmodels.stats.proportion import proportions_ztest

count = 67
sample_size = 120
P = 0.5


In [None]:
proportions_ztest(count, sample_size, P)

(1.286806739751111, 0.1981616572238455)

The output is the tuple (z-statistic, p-value). By default the p-value is for the two-sided alternative $P \neq 0.5$.
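
The z-statistic can be checked by hand as $z = (\hat{p} - P)/SE$. The sketch below, using only the standard library, computes it with two common choices of standard error: one based on the sample proportion $\hat{p}$, which reproduces the value reported above, and the textbook variant based on the hypothesized proportion $P$. The names `se_sample`, `se_null` and `two_sided_p` are illustrative.

```python
import math
from statistics import NormalDist

count = 67
sample_size = 120
P = 0.5  # hypothesized proportion under H0

p_hat = count / sample_size  # sample proportion

# Standard error based on the sample proportion.
se_sample = math.sqrt(p_hat * (1 - p_hat) / sample_size)
z_sample = (p_hat - P) / se_sample

# Textbook variant: standard error based on the hypothesized proportion.
se_null = math.sqrt(P * (1 - P) / sample_size)
z_null = (p_hat - P) / se_null


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(z_sample, two_sided_p(z_sample))
print(z_null, two_sided_p(z_null))
```

The two variants differ only slightly here because $\hat{p}$ is close to $P$. In statsmodels, the `prop_var` argument of `proportions_ztest` is, to my understanding, the switch between them.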

## Errors in Hypothesis Testing
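- Type I error: rejecting $H_0$ when it is true (probability $\alpha$)
- Type II error: failing to reject $H_0$ when it is false (probability $\beta$)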

### Calculating Type I error

In [None]:
# Define function to calculate alpha value
def get_alpha(x, mu, std_error_mean):
  z = (x-mu)/std_error_mean
  if (z<0):
    alpha = stats.norm.cdf(z)
  else:
    alpha = 1 - stats.norm.cdf(z)
  return alpha

In [None]:
x = 48.5
mu = 50 
sem = 0.79

get_alpha(x, mu, sem)

0.02879971774715278

This is the Type I error probability for a one-tailed test. For a two-tailed test with a symmetric rejection region, the total Type I error is twice this value.

In [None]:
0.02879971774715278*2

0.05759943549430556

- This implies that, when $H_0: \mu=50$ is true, 5.76% of all random samples would still lead to its rejection
- Type I errors can be reduced by widening the acceptance region
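
The effect of widening the acceptance region can be sketched numerically. The example below assumes an acceptance region that is symmetric around $\mu = 50$, reuses the standard error 0.79 from above, and tries two additional lower limits (48.0 and 47.5) that are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

mu = 50     # mean under H0
sem = 0.79  # standard error of the mean


def two_tailed_alpha(lower_limit):
    """Total Type I error for the acceptance region [lower_limit, 2*mu - lower_limit]."""
    z = (lower_limit - mu) / sem
    # Both tails contribute equally because the region is symmetric around mu.
    return 2 * NormalDist().cdf(-abs(z))


# Moving the lower limit away from mu widens the acceptance region.
for lower_limit in (48.5, 48.0, 47.5):
    print(lower_limit, two_tailed_alpha(lower_limit))
```

Each step away from 50 gives a smaller $\alpha$. The trade-off is a larger Type II error, since a wider acceptance region also retains $H_0$ more often when it is false.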

### Calculating Type II error


#### Problem: Service time  

The mean response time for a random sample of 40 food orders is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes. The restaurant owner wants to perform a hypothesis test, with $\alpha=0.05$  level of significance, to determine whether the service goal of 12 minutes or less is being achieved.

$
H_0: \mu \leq 12; H_a: \mu > 12 \\
n=40, \sigma=3.2 \\
\text{Reject } H_0 \text{ if } z \geq z_{\alpha} = 1.645
$

Using the critical z-value, we can derive the threshold on $\bar{x}$ that bounds the acceptance region: $12 + 1.645 \times 3.2/\sqrt{40} \approx 12.8323$.

Accept $H_0$ if $\bar{x} \leq 12.8323$
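
The threshold and the test decision for the observed sample mean can be reproduced as follows. This is a minimal standard-library sketch; the variable names are illustrative, and the critical value is taken from the normal quantile function rather than the rounded 1.645, so the threshold may differ from 12.8323 in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu0 = 12               # mean under H0
sigma = 3.2            # population standard deviation
n = 40                 # sample size
alpha = 0.05           # significance level
xbar_observed = 13.25  # sample mean from the problem statement

std_error = sigma / math.sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value, about 1.645
xbar_crit = mu0 + z_crit * std_error      # upper edge of the acceptance region

z_observed = (xbar_observed - mu0) / std_error
reject_h0 = z_observed >= z_crit

print(z_crit, xbar_crit, z_observed, reject_h0)
```

With the sample mean of 13.25 minutes, `reject_h0` evaluates to `True`: the observed mean lies above the threshold, so the data contradict the 12-minute service goal at the 5% level.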

In [None]:
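def type2(assumed_mean, true_mean, sigma, n, alpha):
  # As called below, `true_mean` is the mean stated in H0 and `assumed_mean`
  # is the alternative population mean at which beta is evaluated.
  z = stats.norm.ppf(alpha)  # lower-tail quantile, about -1.645 for alpha=0.05
  xbar = assumed_mean + ((z*sigma)/np.sqrt(n))
  z2 = (xbar - true_mean) / (sigma/np.sqrt(n))  # standardize against true_mean
  if (assumed_mean > true_mean):
    beta = 1-stats.norm.cdf(z2)
  else:
    beta = stats.norm.cdf(z2)
  return beta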

In [None]:
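# Convenience wrapper: reads true_mean, sigma, n and alpha from the
# notebook-level variables defined in the next cell.
def get_type2(assumed_mean):
  beta = type2(assumed_mean, true_mean, sigma, n, alpha)
  return beta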

In [None]:
true_mean = 12
sigma = 3.2
n = 40
alpha = 0.05

assumed_means = [14, 13.6, 13.2, 12.8323, 12.8, 12.4, 12.0001, 12]

print('assumed_mean : beta')
for mean in assumed_means:
  beta = type2(mean, true_mean, sigma, n, alpha)
  print(f'{mean}: {beta}')

assumed_mean : beta
14: 0.010499750448532241
13.6: 0.06457982995225997
13.2: 0.2336575101104159
12.8323: 0.49995065746353273
12.8: 0.5254013387545549
12.4: 0.8035262335707292
12.0001: 0.9499796127157129
12: 0.050000000000000024
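
The same $\beta$ values can be obtained more directly as the probability that $\bar{x}$ falls at or below the acceptance threshold when the population mean is some alternative value. The sketch below assumes the upper-tailed test from the service-time problem; the function `beta_upper_tail` and its parameter names are hypothetical, and the power ($1 - \beta$) is printed only as a convenience.

```python
import math
from statistics import NormalDist


def beta_upper_tail(mu0, mu_alt, sigma, n, alpha):
    """P(fail to reject H0: mu <= mu0) when the population mean is mu_alt."""
    std_error = sigma / math.sqrt(n)
    xbar_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error
    # H0 is not rejected when the sample mean falls at or below xbar_crit.
    return NormalDist(mu=mu_alt, sigma=std_error).cdf(xbar_crit)


for mu_alt in (14, 13.2, 12.8, 12.4):
    beta = beta_upper_tail(12, mu_alt, 3.2, 40, 0.05)
    print(f"{mu_alt}: beta={beta:.4f}, power={1 - beta:.4f}")
```

For alternatives above 12 this agrees with the table above. At exactly 12 the function returns $1 - \alpha = 0.95$, which is the limit the 12.0001 row approaches; there $H_0$ is true, so this is the probability of a correct decision rather than a Type II error. The 0.05 printed for 12 in the table comes from the `else` branch of `type2`.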
