# T-Test
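A t-test is a statistical test used to compare means. It can compare the mean of one sample against a hypothesized value, or the means of two groups against each other. It tells us whether an observed difference, for example the effect of a treatment on the population of interest, is statistically significant or could plausibly be due to random sampling variation.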
 - Also known as Student’s T-test. 
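 - A small sample is commonly taken to be one of size $n<30$. The t-test is preferred for small samples because the population standard deviation is unknown and must be estimated from the sample. The resulting test statistic follows a t-distribution, which has heavier tails than the normal distribution. The test itself assumes the data come from an approximately normal population.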
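The next sketch shows why the t-distribution matters for small samples. It assumes SciPy is available and compares two-sided 95% critical values of the t-distribution with the standard normal value (about 1.96). The degrees of freedom in the loop are arbitrary and chosen only for illustration.

```python
from scipy.stats import norm, t

# Two-sided 95% critical value of the standard normal distribution
z_crit = norm.ppf(0.975)

# Matching critical values of the t-distribution for a few illustrative degrees of freedom
crit = {df: t.ppf(0.975, df) for df in (2, 5, 10, 29, 100)}

for df, t_crit in crit.items():
    print(f"df={df:>3}: t_crit={t_crit:.3f} vs z_crit={z_crit:.3f}")
```

With few degrees of freedom, the t critical value is noticeably larger than the normal one (about 2.571 at df=5). A small sample therefore needs a larger statistic before the null hypothesis is rejected. As the degrees of freedom grow, the two values converge.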

## One Sample T-Test
This test checks whether a sample mean differs significantly from a hypothesized (or known) population mean. The test statistic is:

$\LARGE t = \frac{\bar x - \mu}{s_{\bar x}}~~~~~~~~~~~~~~~$          where $\LARGE s_{\bar x} = \frac{s}{\sqrt n}$
 - $\large \mu$ - Hypothesized population mean
 - $\large \bar x $ - Sample mean
 - $\large n$ - Sample size
 - $\large s$ - Sample standard deviation
 - $\large s_{\bar x}$ - Estimated standard error of the mean

Under the null hypothesis, $t$ follows a t-distribution with $n-1$ degrees of freedom.
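The formula above can be checked against SciPy with a minimal sketch. The helper `one_sample_t` is a hypothetical name used only here. The marks are illustrative values, and the hypothesized mean of 60 is an arbitrary choice. The sample standard deviation uses `ddof=1`, which gives the $n-1$ denominator.

```python
import numpy as np
from scipy import stats

def one_sample_t(x, mu):
    """Hypothetical helper: one-sample t statistic and two-sided p-value computed by hand."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)                          # sample standard deviation (n - 1 denominator)
    se = s / np.sqrt(n)                        # estimated standard error of the mean
    t_stat = (x.mean() - mu) / se
    p_value = 2 * stats.t.sf(abs(t_stat), n - 1)  # two-sided p-value, n - 1 degrees of freedom
    return t_stat, p_value

# Illustrative marks and an arbitrary hypothesized mean
x = [40, 81, 21, 23, 43, 77, 40, 52, 68, 72]
t_manual, p_manual = one_sample_t(x, mu=60)
t_scipy, p_scipy = stats.ttest_1samp(x, popmean=60)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```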
 
Note: "P low, null go": if the p-value is less than the significance level $\alpha$, reject the null hypothesis.
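The p-value rule gives the same decision as comparing $|t|$ with the two-sided critical value $t_{1-\alpha/2,\,df}$. The following sketch illustrates this equivalence. The `decide` helper is hypothetical, and the t statistics and degrees of freedom are made-up inputs.

```python
from scipy import stats

def decide(t_stat, df, alpha=0.05):
    """Hypothetical helper: two-sided p-value plus the reject decision made two ways."""
    p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
    t_crit = stats.t.ppf(1 - alpha / 2, df)     # two-sided critical value
    reject_by_p = p_value < alpha               # "p low, null go"
    reject_by_crit = abs(t_stat) > t_crit       # |t| beyond the critical value
    return p_value, reject_by_p, reject_by_crit

# Made-up t statistics with df = 19 (e.g. a one-sample test on n = 20 values)
results = {t_stat: decide(t_stat, df=19) for t_stat in (-0.25, 1.5, 2.5, -3.0)}
for t_stat, (p_value, by_p, by_crit) in results.items():
    print(f"t={t_stat:+.2f}  p={p_value:.4f}  reject(p)={by_p}  reject(crit)={by_crit}")
```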

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp

In [2]:
np.random.seed(42)
population_size = 50
population_marks = np.random.randint(low = 20, high = 100, size = population_size)
population_marks

array([71, 34, 91, 80, 40, 94, 94, 43, 22, 41, 72, 21, 49, 57, 21, 83, 79,
       40, 52, 95, 77, 41, 68, 78, 61, 79, 99, 34, 81, 81, 66, 81, 70, 74,
       83, 22, 70, 26, 40, 92, 58, 37, 23, 79, 33, 28, 72, 21, 79, 90])

In [3]:
population_mean = np.mean(population_marks)
print('Population mean:', population_mean)

Population mean: 60.44


In [4]:
np.random.seed(42)
sample_size = 20
sample_marks = np.random.choice(population_marks, sample_size)
sample_marks

array([40, 81, 21, 23, 43, 77, 40, 52, 68, 72, 72, 78, 22, 92, 78, 91, 41,
       34, 78, 79])

In [5]:
ttest_1sample, p_value = ttest_1samp(sample_marks, popmean = population_mean)
print('T-test statistics:', ttest_1sample)
print('P-value:', p_value)

T-test statistics: -0.2501082477102846
P-value: 0.8051880032951816


In [6]:
alpha = 0.05  # 5% level of significance
if p_value < alpha:
    print('Reject the null hypothesis: the sample mean differs significantly from the population mean.')
else:
    print('Fail to reject the null hypothesis: no significant difference between the sample mean and the population mean.')

Fail to reject the null hypothesis: no significant difference between the sample mean and the population mean.


## Two-Sample (Independent Samples) T-Test
This test compares the means of two independent groups. It determines whether there is statistical evidence that the corresponding population means are significantly different.

Assuming the two populations have equal variances, the test statistic is:
$\LARGE t = \frac{\bar x_1 - \bar x_2}{\sqrt {s^2 \bigl( \frac{1}{n_1} + \frac{1}{n_2} \bigr)}}$


$\large s^2 = \LARGE \frac{\sum_{i=1}^{n_1}(x_i - \bar x_1)^2 + \sum_{j=1}^{n_2}(x_j - \bar x_2)^2}{n_1 + n_2 - 2}$

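 - $\bar x_1$ - Mean of the first sample
 - $\bar x_2$ - Mean of the second sample
 - $s^2$ - Pooled sample variance
 - $n_1$ and $n_2$ - Sizes of the two samples

Under the null hypothesis, $t$ follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.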
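The pooled-variance formula above can be reproduced by hand and compared with `scipy.stats.ttest_ind(..., equal_var=True)`. This is a minimal sketch: the helper `pooled_t` is hypothetical, and both groups are simulated normal data rather than anything from this notebook.

```python
import numpy as np
from scipy import stats

def pooled_t(x1, x2):
    """Hypothetical helper: equal-variance two-sample t statistic and two-sided p-value."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    # Pooled variance: summed squared deviations of both groups over n1 + n2 - 2
    ss1 = np.sum((x1 - x1.mean()) ** 2)
    ss2 = np.sum((x2 - x2.mean()) ** 2)
    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df
    t_stat = (x1.mean() - x2.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value

rng = np.random.default_rng(0)
group1 = rng.normal(loc=50, scale=10, size=25)  # simulated group 1
group2 = rng.normal(loc=55, scale=10, size=30)  # simulated group 2

t_manual, p_manual = pooled_t(group1, group2)
t_scipy, p_scipy = stats.ttest_ind(group1, group2, equal_var=True)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```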
 
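<b>Note</b>: The independent-samples t-test assumes three things. The two samples must be independent, each must come from an approximately normally distributed population, and the two populations must have equal variances (homogeneity of variance). When the variances are clearly unequal, Welch's t-test (`equal_var=False` in `scipy.stats.ttest_ind`) is the usual alternative.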
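When the equal-variance assumption is doubtful, Welch's t-test is a common substitute. It uses each sample's own standard error and approximates the degrees of freedom with the Welch-Satterthwaite formula. The sketch below computes it by hand and compares the result with `scipy.stats.ttest_ind(..., equal_var=False)`. It also reports Levene's test (`scipy.stats.levene`) as one possible check of variance equality. The helper `welch_t` and the simulated groups are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def welch_t(x1, x2):
    """Hypothetical helper: Welch t statistic, approximate df, and two-sided p-value."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    v1 = x1.var(ddof=1) / n1   # squared standard error of group 1
    v2 = x2.var(ddof=1) / n2   # squared standard error of group 2
    t_stat = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (generally not an integer)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value

rng = np.random.default_rng(1)
group1 = rng.normal(loc=50, scale=5, size=20)   # simulated, small spread
group2 = rng.normal(loc=53, scale=15, size=40)  # simulated, large spread

lev_stat, lev_p = stats.levene(group1, group2)  # small p suggests unequal variances
t_manual, df_manual, p_manual = welch_t(group1, group2)
t_scipy, p_scipy = stats.ttest_ind(group1, group2, equal_var=False)

print("Levene p-value:", lev_p)
print("manual:", t_manual, df_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```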

In [7]:
np.random.seed(42)
from scipy.stats import poisson
school_marks = poisson.rvs(loc = 18, mu = 35, size = 1500)
student_age = poisson.rvs(loc = 10, mu = 10, size = 60)

In [8]:
school_marks

array([51, 57, 46, ..., 59, 50, 55])

In [9]:
student_age

array([20, 13, 20, 24, 17, 22, 21, 16, 23, 22, 23, 22, 26, 14, 23, 23, 18,
       26, 21, 18, 22, 22, 18, 19, 18, 26, 19, 14, 16, 19, 17, 23, 23, 22,
       19, 18, 18, 17, 19, 16, 20, 20, 21, 16, 20, 18, 23, 19, 23, 21, 17,
       21, 20, 16, 19, 13, 22, 17, 17, 19])

In [10]:
student_age.mean()

19.65

In [11]:
from scipy.stats import ttest_ind
Ittest_statistics, p_value = ttest_ind(a = school_marks, b = student_age, equal_var = True)
print('Ittest_statistics:', Ittest_statistics)
print('P value:', p_value)

Ittest_statistics: 42.058523205930854
P value: 6.04814143830998e-259


In [12]:
alpha = 0.05  # 5% level of significance
if p_value < alpha:
    print('Reject the null hypothesis: the means of the two samples differ significantly.')
else:
    print('Fail to reject the null hypothesis: no significant difference between the means of the two samples.')

Reject the null hypothesis: the means of the two samples differ significantly.


## Paired T-test 

The paired t-test compares two related measurements taken on the same subjects. Typical cases are scores before and after an intervention (pre/post analysis), or two attributes measured on the same observations. It works on the per-subject differences and tests whether their mean differs significantly from zero.

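Paired t-test formula: $\LARGE t = \frac{\bar x_{pre} - \bar x_{post}}{s_{\bar x}}~~~~~~~~~~~~~~~$          where $\LARGE s_{\bar x} = \frac{s}{\sqrt n}$
 - $\large \bar x_{pre} - \bar x_{post}$ - Mean of the paired differences $x_{pre} - x_{post}$
 - $\large n$ - Number of pairs
 - $\large s$ - Sample standard deviation of the paired differences
 - $\large s_{\bar x}$ - Estimated standard error of the mean difference

Under the null hypothesis, $t$ follows a t-distribution with $n-1$ degrees of freedom.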
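A paired t-test is equivalent to a one-sample t-test on the differences, with a hypothesized mean of 0. The sketch below checks this equivalence three ways: by hand, with `scipy.stats.ttest_rel`, and with `scipy.stats.ttest_1samp`. The pre/post scores are hypothetical values made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post scores for the same 8 students
pre = np.array([62, 70, 55, 81, 47, 66, 73, 58], dtype=float)
post = np.array([68, 74, 54, 85, 55, 70, 75, 63], dtype=float)

d = pre - post                      # per-student differences
n = d.size
se = d.std(ddof=1) / np.sqrt(n)     # standard error of the mean difference
t_manual = d.mean() / se
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)  # two-sided, n - 1 degrees of freedom

t_rel, p_rel = stats.ttest_rel(pre, post)        # paired test on the raw columns
t_1s, p_1s = stats.ttest_1samp(d, popmean=0)     # one-sample test on the differences

print("manual:    ", t_manual, p_manual)
print("ttest_rel: ", t_rel, p_rel)
print("ttest_1samp:", t_1s, p_1s)
```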

In [13]:
test1 = sample_marks

In [14]:
np.random.seed(42)
test2 = np.random.randint(low = 33, high = 100,size = sample_size)
test2

array([84, 47, 93, 53, 56, 35, 54, 85, 34, 62, 70, 34, 96, 92, 53, 65, 90,
       54, 81, 91])

In [15]:
from scipy.stats import ttest_rel

In [16]:
tstatistics, p_value = ttest_rel(a = test1, b = test2)
print("Paired T test statistics:", tstatistics )
print('P value:', p_value)

Paired T test statistics: -0.9202544001495981
P value: 0.3689768842592085


In [17]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")

fail to reject the null hypothesis
