# Hypothesis Testing (t-Test)

## Example: <br>Process Control at the Call Center
### The performance of a call center is monitored through the average call duration.<br> Data from the past 18 months show that on days when the process runs normally, <br> 𝜇 = <i>4 minutes</i>, σ = <i>3 minutes</i> <br>Limited resources make it impossible to monitor every call, so 50 calls are randomly sampled each day.<br> Hence, n = <i>50 calls per day</i>.<br><br>We already know the sample mean will differ from day to day - <i>inherent variability</i>.<br>But when should you be alarmed and conclude that the system is not behaving normally - <i>external variability</i>?<br><br>Pragmatic approach: the system behaves normally when 𝜇 = <i>4 minutes</i>.<br>So, we should look for deviations on either side of 𝜇, i.e. test H₀: 𝜇 = 4 against H₁: 𝜇 ≠ 4 (a two-sided test).<br>
| Day | Mean Call Duration (minutes) |
| :- | :-: |
| 1 | 3.7 |
| 2 | 4.1 |
| 3 | 3.5 |
| 4 | 4.2 |
| 5 | 3.9 |
| 6 | 4.1 |
| 7 | 4.2 |
| 8 | 3.8 |
| 9 | 3.7 |
| 10 | 4.6 |
| 11 | 3.7 |
| 12 | 4.6 |
| 13 | 4.0 |
| 14 | 4.2 |
| 15 | 3.8 |
| 16 | 4.4 |
| 17 | 5.3 |
| 18 | 6.1 |
| 19 | 7.2 |
| 20 | 6.5 |


```python
from scipy import stats
import pandas as pd
import numpy as np
```

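Formula: the test statistic is $t = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ with $n - 1$ degrees of freedom ($n$ = sample size), and the two-sided p-value is $p = 2\,F_{n-1}(-|t|)$, where $F_{n-1}$ is the CDF of the t distribution. In SciPy: `2 * stats.t.cdf(-abs(t), n - 1)`.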
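As a worked instance of the formula, take day 17 ($\bar{x} = 5.3$), writing $F_{49}$ for the CDF of the t distribution with $50 - 1 = 49$ degrees of freedom. The numbers agree with the day-17 entries in the outputs below.

$$
\begin{aligned}
t_{17} &= \frac{5.3 - 4}{3/\sqrt{50}} = \frac{1.3}{0.4243} \approx 3.064 \\
p_{17} &= 2\,F_{49}(-3.064) \approx 0.0035 < 0.05
\end{aligned}
$$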

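```python
# Mean call duration (minutes) for each of the 20 days in the table
data = pd.Series([3.7, 4.1, 3.5, 4.2, 3.9, 4.1, 4.2, 3.8, 3.7, 4.6, 3.7, 4.6, 4.0, 4.2, 3.8, 4.4, 5.3, 6.1, 7.2, 6.5])
mean = 4  # in-control mean 𝜇 (minutes)
n = 50    # calls sampled per day
std = 3   # in-control standard deviation σ (minutes)
```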


### t-Value Calculations

```python
# 𝜇 = 4
# n = 50 samples per day
# σ = 3
# t value = (x̄ - 𝜇) / (σ / √n)

Tval = [(x - mean) / (std / np.sqrt(n)) for x in data]
Tval
```

```
[-0.7071067811865471,
 0.235702260395515,
 -1.1785113019775793,
 0.4714045207910321,
 -0.23570226039551606,
 0.235702260395515,
 0.4714045207910321,
 -0.4714045207910321,
 -0.7071067811865471,
 1.4142135623730943,
 -0.7071067811865471,
 1.4142135623730943,
 0.0,
 0.4714045207910321,
 -0.4714045207910321,
 0.9428090415820642,
 3.0641293851417055,
 4.949747468305832,
 7.542472332656508,
 5.892556509887896]
```
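The loop above can also be written as a single vectorized expression, since arithmetic on a pandas Series is applied element-wise. A minimal sketch that redefines the inputs so it runs on its own; the name `t_values` and the day-numbered index are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same inputs as above, redefined so this cell is self-contained
data = pd.Series([3.7, 4.1, 3.5, 4.2, 3.9, 4.1, 4.2, 3.8, 3.7, 4.6,
                  3.7, 4.6, 4.0, 4.2, 3.8, 4.4, 5.3, 6.1, 7.2, 6.5])
mean, std, n = 4, 3, 50

# Element-wise: every day's (x̄ - 𝜇) / (σ / √n) in one expression
t_values = (data - mean) / (std / np.sqrt(n))

# Label rows by day number (1-20) instead of the default 0-based index
t_values.index = range(1, len(t_values) + 1)
print(t_values.round(3))
```

Keeping the result as a Series (rather than a list) makes it easy to line up with the p-values and the day numbers later.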

### p-Value Calculations

```python
# Two-sided p-value for each day: p = 2 * F(-|t|), t distribution with n - 1 degrees of freedom
dof = n - 1
pVal = [2 * stats.t.cdf(-abs(t), dof) for t in Tval]
```

#### p-Values for All 20 Days

```python
for day, p in enumerate(pVal, 1):
    print(day, ":", p)
```

```
1 : 0.482853747118985
2 : 0.8146478467018572
3 : 0.24428879234928788
4 : 0.6394473297757453
5 : 0.8146478467018564
6 : 0.8146478467018572
7 : 0.6394473297757453
8 : 0.6394473297757453
9 : 0.482853747118985
10 : 0.16362597826984224
11 : 0.482853747118985
12 : 0.16362597826984224
13 : 1.0
14 : 0.6394473297757453
15 : 0.6394473297757453
16 : 0.35040862513981663
17 : 0.003543881264713143
18 : 9.190845091379827e-06
19 : 9.62851076804498e-10
20 : 3.4255224123818994e-07
```


### From the values above, no action is needed on <u>days 1 to 16</u>: <i>p > α = 0.05</i>, so we fail to reject H₀: 𝜇 = 4. <br> On <u>days 17 to 20</u>, however, <i>p < 0.05</i>: the process is clearly not behaving normally and should be investigated.
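For process control it can be handier to turn this decision rule into limits on the daily mean itself: a day is in control when its mean lies within 𝜇 ± t<sub>0.975, 49</sub> · σ/√n, which is equivalent to p > 0.05 in the two-sided test. A minimal sketch, assuming α = 0.05; `lower`, `upper` and `alarm_days` are illustrative names.

```python
import numpy as np
from scipy import stats

mu = 4.0      # in-control mean call duration (minutes)
sigma = 3.0   # in-control standard deviation (minutes)
n = 50        # calls sampled per day
alpha = 0.05  # assumed two-sided significance level

se = sigma / np.sqrt(n)                      # standard error of a daily mean
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)   # two-sided critical value, n - 1 df

# Daily means inside [lower, upper] are consistent with H0: mu = 4
lower, upper = mu - t_crit * se, mu + t_crit * se

daily_means = [3.7, 4.1, 3.5, 4.2, 3.9, 4.1, 4.2, 3.8, 3.7, 4.6,
               3.7, 4.6, 4.0, 4.2, 3.8, 4.4, 5.3, 6.1, 7.2, 6.5]
alarm_days = [day for day, xbar in enumerate(daily_means, start=1)
              if not lower <= xbar <= upper]

print(f"control limits: [{lower:.3f}, {upper:.3f}] minutes")
print("alarm on days:", alarm_days)
```

With these inputs the limits come out at roughly 3.15 and 4.85 minutes, so day 17 (5.3 minutes) is the first day outside them, matching the p-value conclusion.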

#### Using SciPy's built-in method (one-sample t-test on the 20 daily means)

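```python
# One-sample t-test of the 20 daily means against 4, using their sample standard deviation.
# Returns (t statistic, p-value); index 1 is the p-value, which is two-sided by default.
# Halving it gives the one-sided p-value for H1: mean > 4 (valid here because t > 0).
stats.ttest_1samp(data, 4)[1] / 2
```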

```
0.024036076761831338
```
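Since SciPy 1.6, `ttest_1samp` also accepts an `alternative` argument, so the one-sided test can be requested directly instead of halving the two-sided p-value. A minimal sketch, assuming SciPy ≥ 1.6, that checks the two approaches agree on this data:

```python
import pandas as pd
from scipy import stats

data = pd.Series([3.7, 4.1, 3.5, 4.2, 3.9, 4.1, 4.2, 3.8, 3.7, 4.6,
                  3.7, 4.6, 4.0, 4.2, 3.8, 4.4, 5.3, 6.1, 7.2, 6.5])

# Default: two-sided test of H0: mean = 4 against H1: mean != 4
two_sided = stats.ttest_1samp(data, 4)

# One-sided test of H0: mean = 4 against H1: mean > 4 (requires SciPy >= 1.6)
greater = stats.ttest_1samp(data, 4, alternative="greater")

print("t statistic:", two_sided.statistic)
print("halved two-sided p:", two_sided.pvalue / 2)
print("one-sided p:", greater.pvalue)
```

Stating the alternative explicitly avoids relying on the sign of t: halving only gives the right answer when the statistic points in the direction of H₁.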