# T-test (Parametric Statistical Test)
### One Sample T-Test in Python

>The one-sample t-test is a statistical hypothesis test used to determine whether the unknown mean of a population differs from a specified (known or hypothesized) value.

#### Hypotheses for the One-Sample t-Test

>Null hypothesis (H0): the population mean area is 5000.\
>Alternative hypothesis (HA): the population mean area is not 5000.
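
For reference, here is the one-sample t statistic in standard textbook notation (these symbols are not defined elsewhere in this notebook): $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size and $\mu_0$ the hypothesized mean (5000 here). Under the null hypothesis, $t$ follows a t distribution with $n - 1$ degrees of freedom:

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{df} = n - 1
$$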

```python
import scipy.stats as stats
import pandas as pd

# Load the CSV file containing the area measurements
data = pd.read_csv('./data/areas.csv')

alpha = 0.05  # significance level
# Passing the DataFrame tests each column separately, so p is an array
t_stat, p = stats.ttest_1samp(a=data, popmean=5000)
print(p)
if p > alpha:
    print("As the p-value is greater than alpha (0.05), we fail to reject the null hypothesis")
else:
    print("As the p-value is not greater than alpha (0.05), we reject the null hypothesis")
```

```
[0.44457628]
As the p-value is greater than alpha (0.05), we fail to reject the null hypothesis
```
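
To see what `ttest_1samp` computes, the sketch below applies the formula by hand and compares it with SciPy. Because `areas.csv` is not reproduced here, the `areas` array is a hypothetical sample made up for illustration; only the method carries over to the real data.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical sample of areas (illustrative values, NOT taken from areas.csv)
areas = np.array([4850.0, 5120.0, 4975.0, 5230.0, 4890.0, 5060.0, 4940.0, 5010.0])
mu0 = 5000.0  # hypothesized population mean

n = areas.size
x_bar = areas.mean()
s = areas.std(ddof=1)  # sample standard deviation (divides by n - 1)

# t statistic: distance of the sample mean from mu0 in standard-error units
t_manual = (x_bar - mu0) / (s / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(areas, popmean=mu0)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should agree, which confirms that the test is simply the standardized distance between the sample mean and the hypothesized value.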


### Two Sample Paired t-Test (Parametric Statistical Test)

Assumptions of the Paired Sample t-Test

For the paired sample t-test results to be trusted, the following assumptions need to be met:

- The dependent variable (DV) must be continuous, measured on an interval or ratio scale
- The pairs of observations are independent of one another (the two measurements within a pair are related by design)
- The DV, or more precisely the difference between paired values, should be approximately normally distributed
  - The paired sample t-test is fairly robust to violations of this assumption: as long as the departure from normality is not severe, the results can still be considered valid
- The DV should not contain any significant outliers
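
The normality and outlier assumptions apply to the paired differences, so one way to screen them is to compute those differences first. This sketch uses the engine-oil data from the case study later in this section, runs a Shapiro-Wilk test (`scipy.stats.shapiro`) on the differences, and flags candidate outliers with the 1.5 × IQR rule. That rule is one common convention, chosen here as an assumption rather than a requirement of the t-test.

```python
import numpy as np
import scipy.stats as stats

# Mileage before and after the oil change (case-study data from this section)
pre = np.array([88, 82, 84, 93, 75, 78, 84, 87, 95, 91, 83, 89, 77, 68, 91])
post = np.array([91, 84, 88, 90, 79, 80, 88, 90, 90, 96, 88, 89, 81, 74, 92])

# The assumptions concern the paired differences, not each sample on its own
diff = post - pre

# Shapiro-Wilk test: a small p-value suggests the differences are not normal
sw_stat, sw_p = stats.shapiro(diff)
print(f"Shapiro-Wilk p-value: {sw_p:.3f}")

# Outlier screen with the 1.5 * IQR convention (an assumed rule of thumb)
q1, q3 = np.percentile(diff, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = diff[(diff < lower) | (diff > upper)]
print(f"Potential outliers: {outliers}")
```

Values flagged this way are candidates for a closer look, not automatic exclusions.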

The hypotheses being tested are:

- Null hypothesis (H0): µd = 0, i.e. the mean difference between sample 1 and sample 2 is equal to 0.
- Alternative hypothesis (HA): µd ≠ 0, i.e. the mean difference between sample 1 and sample 2 is not equal to 0.

If the p-value is less than the chosen significance level (most commonly α = 0.05), we reject the null hypothesis.
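
The paired t-test is equivalent to a one-sample t-test of the differences $d_i$ between paired values against 0. In standard notation (assumed here, not defined earlier), with $\bar{d}$ the mean difference, $s_d$ the standard deviation of the differences and $n$ the number of pairs:

$$
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1
$$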

>Case Statement:
>
>Suppose we want to know whether an engine oil significantly affects the mileage of cars of different brands. To test this, we take 15 cars in a garage, each initially filled with its original engine oil, and record each car's mileage over 100 kilometers. We then fill each car with a different engine oil and record its mileage over 100 kilometers again. To compare the mean mileage of the first and second tests, we use a paired samples t-test, because each car's first measurement can be paired with its second measurement.

We need two arrays to hold each car's mileage before and after the oil change.

```python
import scipy.stats as stats

# pre holds the mileage before applying
# the different engine oil
pre = [88, 82, 84, 93, 75, 78, 84, 87,
       95, 91, 83, 89, 77, 68, 91]

# post holds the mileage after applying
# the different engine oil
post = [91, 84, 88, 90, 79, 80, 88, 90,
        90, 96, 88, 89, 81, 74, 92]

alpha = 0.05  # significance level
# Perform the paired sample t-test (two-sided by default)
t_stat, p = stats.ttest_rel(pre, post)
print(f"p-value: {p:.3f}")
if p > alpha:
    print("As the two-sided p-value is greater than 0.05, we fail to reject the null hypothesis that the mean difference is 0")
else:
    print("As the two-sided p-value is not greater than 0.05, we reject the null hypothesis that the mean difference is 0")
```

```
p-value: 0.010
As the two-sided p-value is not greater than 0.05, we reject the null hypothesis that the mean difference is 0
```


>Analyzing the output
>
>The paired samples t-test uses the following null and alternative hypotheses:
>
>- H0: the mean mileage before and after the oil change is equal
>- HA: the mean mileage before and after the oil change is not equal
>
>As the p-value is 0.010, which is less than 0.05, we reject the null hypothesis. We therefore have sufficient evidence to claim that the true mean mileage of the cars differs before and after applying the different engine oil.
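
As a check on the result above, the sketch below reproduces `ttest_rel` in two other ways: by applying the paired formula by hand, and by running `ttest_1samp` on the differences against 0. It reuses the mileage data from the case study.

```python
import numpy as np
import scipy.stats as stats

pre = np.array([88, 82, 84, 93, 75, 78, 84, 87, 95, 91, 83, 89, 77, 68, 91])
post = np.array([91, 84, 88, 90, 79, 80, 88, 90, 90, 96, 88, 89, 81, 74, 92])

# ttest_rel(pre, post) works with the differences pre - post
d = pre - post
n = d.size

# Manual paired t statistic: mean difference divided by its standard error
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Same test expressed as a one-sample t-test of the differences against 0
t_one, p_one = stats.ttest_1samp(d, popmean=0)
t_rel, p_rel = stats.ttest_rel(pre, post)

print(f"manual:      t = {t_manual:.3f}, p = {p_manual:.3f}")
print(f"ttest_1samp: t = {t_one:.3f}, p = {p_one:.3f}")
print(f"ttest_rel:   t = {t_rel:.3f}, p = {p_rel:.3f}")
```

The t statistic is negative because the differences are taken as pre minus post and the mileage went up on average. The two-sided p-value does not depend on that sign.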

### Two Sample Unpaired t-Test (Parametric Statistical Test)

The two hypotheses for this two sample unpaired t-test are:

- H0: µ1 = µ2 (the two population means are equal)

- HA: µ1 ≠ µ2 (the two population means are not equal)

>NOTE:\
>As a rule of thumb, we can assume the populations have equal variances if the ratio of the larger sample variance to the smaller sample variance is less than 4:1. 

> Case Statement:\
>Researchers want to know whether or not two different species of plants have the same mean height. To test this, they collect a simple random sample of 20 plants from each species.
>
>Use the following steps to conduct a two sample t-test to determine whether the two species of plants have the same mean height.

```python
import numpy as np
import scipy.stats as stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Sample variance of each group (ddof=1 divides by n - 1)
g1_var, g2_var = np.var(group1, ddof=1), np.var(group2, ddof=1)
# The rule of thumb compares the larger variance with the smaller one
var_ratio = max(g1_var, g2_var) / min(g1_var, g2_var)
if var_ratio < 4:
    print("By the rule of thumb, we assume the variances are equal")
else:
    print("By the rule of thumb, we do not assume the variances are equal")
```

```
By the rule of thumb, we assume the variances are equal
```


```python
# Perform the two sample t-test assuming equal variances (Student's t-test)
stat, p = stats.ttest_ind(a=group1, b=group2, equal_var=True)


def get_significance(p):
    """Return the significance of a p-value as a string of stars."""
    if p <= 0.001:
        return '***'
    elif p <= 0.01:
        return '**'
    elif p <= 0.05:
        return '*'
    elif p <= 0.1:
        return '.'
    else:
        return 'No Significance'


def round_p_value(p):
    """Round a p-value so that it is human-readable."""
    if p < 0.001:
        return '<0.001'
    else:
        return f'{p:.3f}'


p_rounded = round_p_value(p)
significance = get_significance(p)
print(f'The p-value is {p_rounded} ({significance})')
```

```
The p-value is 0.530 (No Significance)
```


>Because the p-value of our test (0.53005) is greater than α = 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the mean plant height differs between the two populations.
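
With `equal_var=True`, `ttest_ind` performs Student's t-test, which pools the two sample variances. The sketch below recomputes that statistic by hand on the plant-height data, using the textbook pooled-variance formula, and compares it with SciPy.

```python
import numpy as np
import scipy.stats as stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = group1.size, group2.size
v1, v2 = group1.var(ddof=1), group2.var(ddof=1)

# Pooled variance: average of the sample variances weighted by degrees of freedom
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
# Student's t statistic with n1 + n2 - 2 degrees of freedom
t_manual = (group1.mean() - group2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(group1, group2, equal_var=True)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.5f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.5f}")
```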

When we want to compare the means of two independent groups, we can choose between two tests:

- Student’s t-test: this test assumes that both groups of data are sampled from populations that follow a normal distribution and that both populations have the same variance.

- Welch’s t-test: this test assumes that both groups of data are sampled from populations that follow a normal distribution, but it does not assume that those two populations have the same variance.
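
In SciPy, Welch's t-test is `ttest_ind` with `equal_var=False`. The sketch below runs it on the plant-height data and recomputes it by hand. The statistic uses each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation (the standard textbook formula, written out here as an illustration).

```python
import numpy as np
import scipy.stats as stats

group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = group1.size, group2.size
# Squared standard error of each group mean (no pooling of variances)
se1, se2 = group1.var(ddof=1) / n1, group2.var(ddof=1) / n2

# Welch's t statistic
t_manual = (group1.mean() - group2.mean()) / np.sqrt(se1 + se2)
# Welch-Satterthwaite approximation to the degrees of freedom (generally not an integer)
df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df=df)

# SciPy runs Welch's t-test when equal_var=False
t_scipy, p_scipy = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch df = {df:.2f}")
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.5f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.5f}")
```

With equal group sizes, as here, Welch's and Student's t statistics are identical. Only the degrees of freedom, and therefore the p-value, differ. The two tests diverge more when both the group sizes and the variances differ.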