# Hypothesis Testing (T-Test)

## Example: <br>Process Control at the Call Center
### The performance of a call center is monitored by the average call duration.<br> Data from 18 months shows that on days when the process runs normally, <br> 𝜇 = <i>4 minutes</i>, σ = <i>3 minutes</i>. <br>Limited resources make it impossible to monitor every call, so 50 calls are randomly sampled each day.<br> Hence, n = <i>50 calls per day</i>.<br><br>The sample mean will differ from day to day even when nothing is wrong - <i>inherent variability</i>.<br>The question is when to be alarmed and conclude that the system is not behaving normally - <i>external variability</i>.<br><br>Pragmatic approach: <br>The system behaves normally when 𝜇 = <i>4 minutes</i>.<br>So, we should look for deviations on either side of 𝜇.<br>
| Day | Mean Call Duration (minutes) |
| :- | :-: |
| 1 | 3.7 |
| 2 | 4.1 |
| 3 | 3.5 |
| 4 | 4.2 |
| 5 | 3.9 |
| 6 | 4.1 |
| 7 | 4.2 |
| 8 | 3.8 |
| 9 | 3.7 |
| 10 | 4.6 |
| 11 | 3.7 |
| 12 | 4.6 |
| 13 | 4.0 |
| 14 | 4.2 |
| 15 | 3.8 |
| 16 | 4.4 |
| 17 | 5.3 |
| 18 | 6.1 |
| 19 | 7.2 |
| 20 | 6.5 |

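One way to write down the test behind this approach, using the notebook's own symbols. The labels $H_0$ and $H_1$ are notation introduced here for illustration; the statistic and degrees of freedom are the ones used in the cells that follow.

$$
H_0: \mu = 4 \quad \text{vs.} \quad H_1: \mu \neq 4, \qquad T = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}, \qquad \text{df} = n - 1
$$

A day is flagged when its two-sided p-value falls below $\alpha = 0.05$.
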

In [1]:
from scipy import stats
import pandas as pd
import numpy as np
import seaborn as sns

##### Formula : 2 * stats.t.cdf(-abs((x̄ - 𝜇) / (s / sqrt(n))), n - 1) # n = sample size, DF = n - 1
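
As a minimal sketch, the formula can be wrapped in a small helper. The function name `two_sided_p_value` and its parameter names are assumptions made for this illustration; only `stats.t.cdf` comes from SciPy.

```python
from math import sqrt
from scipy import stats

def two_sided_p_value(sample_mean, mu, s, n):
    """Two-sided p-value for a sample mean, using a t distribution with n - 1 df."""
    t_value = (sample_mean - mu) / (s / sqrt(n))
    # Lower-tail area at -|t|, doubled to cover both tails.
    return 2 * stats.t.cdf(-abs(t_value), n - 1)

# Day 17 of the table: sample mean 5.3 against mu = 4, s = 3, n = 50
p_day_17 = two_sided_p_value(5.3, mu=4, s=3, n=50)
print(p_day_17)
```

Taking `-abs(t_value)` keeps the result between 0 and 1 whether the sample mean lies above or below 𝜇.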

In [2]:
m = [3.7, 4.1, 3.5, 4.2, 3.9, 4.1, 4.2, 3.8, 3.7, 4.6, 3.7, 4.6, 4.0, 4.2, 3.8, 4.4, 5.3, 6.1, 7.2, 6.5]
data = pd.Series(m)
mean = 4
n = 50
std = 3


### T-Value Calculations

In [3]:
# 𝜇 = 4
# n = 50 samples per day
# σ = 3
# T value = (x̄-𝜇) / (σ/√n)

Tval = []
for i in data:
    Tval.append((i - mean)/(std/np.sqrt(n)))
Tval

[-0.7071067811865471,
 0.235702260395515,
 -1.1785113019775793,
 0.4714045207910321,
 -0.23570226039551606,
 0.235702260395515,
 0.4714045207910321,
 -0.4714045207910321,
 -0.7071067811865471,
 1.4142135623730943,
 -0.7071067811865471,
 1.4142135623730943,
 0.0,
 0.4714045207910321,
 -0.4714045207910321,
 0.9428090415820642,
 3.0641293851417055,
 4.949747468305832,
 7.542472332656508,
 5.892556509887896]

### P-Value Calculations

In [4]:
pVal = []
n1 = n - 1  # degrees of freedom
for i1 in Tval:
    # two-tailed p-value: twice the lower-tail area at -|t|
    pVal.append(2 * stats.t.cdf(-abs(i1), n1))

### P-Values for all 20 days

In [5]:
day = []
for c,i in enumerate(pVal,0):
    day.append(c)
    print(c,":", i)

0 : 0.48284957070830226
1 : 0.8146461024627443
2 : 0.24428433153451604
3 : 0.6394441249021059
4 : 0.8146461024627435
5 : 0.8146461024627443
6 : 0.6394441249021059
7 : 0.6394441249021059
8 : 0.48284957070830226
9 : 0.16362201811838478
10 : 0.48284957070830226
11 : 0.16362201811838478
12 : 1.0
13 : 0.6394441249021059
14 : 0.6394441249021059
15 : 0.3504040460736161
16 : 0.003543589847173727
17 : 9.189352704609621e-06
18 : 9.626040426487692e-10
19 : 3.4248372786078755e-07


## From the values above, we can conclude that no action is needed on <u>days 1 to 16</u> (printed indices 0 to 15), as <i>p > alpha (0.05)</i>. On <u>days 17 to 20</u> (printed indices 16 to 19), however, <i>p < 0.05</i>, so these observations are clearly not normal.
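
An equivalent way to reach the same decision is to convert alpha into a band of acceptable daily means and check each day against it. This is a sketch, not part of the original notebook: the names `lower`, `upper` and `out_of_range` are assumptions, and the band uses the same t distribution with n - 1 degrees of freedom as the p-value cells.

```python
import numpy as np
from scipy import stats

daily_means = [3.7, 4.1, 3.5, 4.2, 3.9, 4.1, 4.2, 3.8, 3.7, 4.6,
               3.7, 4.6, 4.0, 4.2, 3.8, 4.4, 5.3, 6.1, 7.2, 6.5]
mu, sigma, n, alpha = 4, 3, 50, 0.05

standard_error = sigma / np.sqrt(n)
# Critical t-value that leaves alpha / 2 in each tail
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
lower = mu - t_crit * standard_error
upper = mu + t_crit * standard_error

# 1-based day numbers whose mean falls outside the band
out_of_range = [day for day, mean in enumerate(daily_means, start=1)
                if not lower <= mean <= upper]
print(f"acceptance band: {lower:.2f} to {upper:.2f} minutes")
print("days outside the band:", out_of_range)
```

With these inputs the band is roughly 3.15 to 4.85 minutes, and the days outside it are the same four days that have p < 0.05.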

In [6]:
Hp = {'Day': day,
      'Mean Call duration': m,
      'T-Value': Tval,
      'P-Value': pVal}
df_HP = pd.DataFrame(Hp, columns=['Day', 'Mean Call duration', 'T-Value', 'P-Value'])
df_HP

| | Day | Mean Call duration | T-Value | P-Value |
| :- | :-: | :-: | :-: | :-: |
| 0 | 0 | 3.7 | -0.707107 | 0.4828496 |
| 1 | 1 | 4.1 | 0.235702 | 0.8146461 |
| 2 | 2 | 3.5 | -1.178511 | 0.2442843 |
| 3 | 3 | 4.2 | 0.471405 | 0.6394441 |
| 4 | 4 | 3.9 | -0.235702 | 0.8146461 |
| 5 | 5 | 4.1 | 0.235702 | 0.8146461 |
| 6 | 6 | 4.2 | 0.471405 | 0.6394441 |
| 7 | 7 | 3.8 | -0.471405 | 0.6394441 |
| 8 | 8 | 3.7 | -0.707107 | 0.4828496 |
| 9 | 9 | 4.6 | 1.414214 | 0.163622 |


In [7]:
#df_HP = df_HP.set_index(df_HP['Day'])
#sns.distplot(df_HP['P-Value'])

#### Using the SciPy method

In [8]:
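# `data` holds the 20 daily means, so this tests whether their average differs from 4,
# using the sample standard deviation of those means rather than σ = 3.
# ttest_1samp is two-tailed by default; index 0 is the t value, index 1 the p value.
# Halving the p value gives a one-tailed result (valid when t lies in the hypothesized direction).
stats.ttest_1samp(data, 4)[1] / 2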
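
`stats.ttest_1samp` expects the raw observations of a sample, not a precomputed mean. The notebook does not include the individual call durations, so the sketch below simulates one day's 50 calls purely for illustration (hypothetical data; a normal draw can even produce negative durations). It then checks that the SciPy result matches the manual formula when the sample standard deviation is used.

```python
import numpy as np
from scipy import stats

# Hypothetical data: one day's 50 call durations, simulated with a fixed seed
rng = np.random.default_rng(0)
calls = rng.normal(loc=4, scale=3, size=50)

# One-sample t-test against mu = 4 (two-sided by default)
result = stats.ttest_1samp(calls, popmean=4)

# The same test by hand, using the sample standard deviation (ddof=1)
t_manual = (calls.mean() - 4) / (calls.std(ddof=1) / np.sqrt(len(calls)))
p_manual = 2 * stats.t.cdf(-abs(t_manual), df=len(calls) - 1)

print(result.statistic, result.pvalue)
print(t_manual, p_manual)
```

The two printed lines should agree, which shows that `ttest_1samp` estimates the standard deviation from the sample itself instead of taking a known σ.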