## t-test

#### What is a t-test?
The t-test takes its name from Student's t-distribution, which William Sealy Gosset published under the pen name "Student."

A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups. It is typically used when the data are approximately normally distributed and the population variance is unknown, so it must be estimated from the sample.

The t-test is used in hypothesis testing to assess whether the observed difference between the means of the two groups is statistically significant or just due to random variation.

![image.png](attachment:b4cb3942-4177-465b-b082-78dd6cf4ec48.png)
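As a rough illustration of comparing two group means, the sketch below runs an independent two-sample t-test with `scipy.stats.ttest_ind` and reproduces the statistic by hand from the pooled variance. The data, group names, and seed are hypothetical and chosen only for demonstration; equal variances are assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two independent groups drawn from normal distributions.
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=50.0, scale=5.0, size=20)
group_b = rng.normal(loc=55.0, scale=5.0, size=20)

# Independent two-sample t-test, assuming equal variances (Student's t-test).
result = stats.ttest_ind(group_a, group_b, equal_var=True)

# The same statistic computed by hand from the pooled sample variance.
n_a, n_b = len(group_a), len(group_b)
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(
    pooled_var * (1 / n_a + 1 / n_b)
)

print(f"scipy t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
print(f"manual t = {t_manual:.4f}")
```

The two t values should agree, which shows that the library call is just the difference in means divided by its estimated standard error.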

#### Assumptions of the t-test
**Independence**: The observations within each group must be independent of each other. This means that the value of one observation should not influence the value of another observation. Violations of independence can occur with repeated measures, paired data, or clustered data.<br>
**Normality**: The data within each group should be approximately normally distributed, i.e., the distribution should resemble a bell-shaped curve. This assumption matters most for small samples (roughly n < 30); for larger samples, the central limit theorem makes the sampling distribution of the mean approximately normal.<br>
**Homogeneity of Variances (for independent samples t-test)**: The variances of the two groups being compared should be approximately equal, so that the groups have a similar spread of values. Unequal variances distort the standard error of the difference between means and, consequently, the t-statistic.<br>
**Absence of Outliers**: There should be no extreme outliers in the data, because outliers can disproportionately influence the mean and standard deviation, especially when sample sizes are small.<br>
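The normality and equal-variance assumptions above can be screened with standard tests. The following is a minimal sketch, assuming the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and Levene's test (`scipy.stats.levene`) for equal variances. The helper name `check_t_test_assumptions`, the sample data, and the 0.05 threshold are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy import stats


def check_t_test_assumptions(group_a, group_b, alpha=0.05):
    """Return rough pass/fail flags for normality and equal variances.

    A flag is True when the corresponding test does NOT reject its null
    hypothesis at level alpha. This is a screening aid, not proof that
    the assumption holds.
    """
    _, p_norm_a = stats.shapiro(group_a)        # H0: group_a is normal
    _, p_norm_b = stats.shapiro(group_b)        # H0: group_b is normal
    _, p_var = stats.levene(group_a, group_b)   # H0: variances are equal
    return {
        "normal_a": bool(p_norm_a > alpha),
        "normal_b": bool(p_norm_b > alpha),
        "equal_variances": bool(p_var > alpha),
    }


# Hypothetical samples used only to exercise the helper.
rng = np.random.default_rng(seed=1)
sample_a = rng.normal(loc=70.0, scale=10.0, size=25)
sample_b = rng.normal(loc=75.0, scale=10.0, size=25)

checks = check_t_test_assumptions(sample_a, sample_b)
print(checks)
```

Independence and the absence of outliers are not covered here; they usually depend on the study design and on inspecting the data, for example with a box plot.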

**Example Problem**

The weights of 25 obese people were recorded before they enrolled in a nutrition camp, and the population mean weight before the camp was 45 kg. After the camp, the same 25 people had a sample mean weight of 75 kg with a standard deviation of 25 kg. Did the camp have an effect?
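The problem does not state its hypotheses. One reading, assumed here, is a one-sample, upper-tailed test of whether the mean weight after the camp exceeds the reference mean of 45 kg at a significance level of 0.05:

$$
H_0: \mu = 45 \qquad H_1: \mu > 45
$$

With sample mean $\bar{x} = 75$, sample standard deviation $s = 25$, and sample size $n = 25$, the test statistic is

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{75 - 45}{25 / \sqrt{25}} = \frac{30}{5} = 6, \qquad \text{df} = n - 1 = 24
$$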

In [1]:
import scipy.stats as stats
import numpy as np

In [2]:
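population_mean = 45   # reference mean weight before the camp (kg)
sample_mean = 75       # mean weight of the 25 people after the camp (kg)
sample_std = 25        # sample standard deviation (kg)
sample_size = 25
alpha = 0.05           # significance level
df = sample_size - 1   # degrees of freedom for a one-sample t-test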

In [3]:
# t-statistic: difference in means divided by the standard error of the mean
t_test = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

In [4]:
t_test

6.0

In [5]:
# Critical value for a one-tailed (upper-tail) test at significance level alpha
t_critical = stats.t.ppf(1 - alpha, df)

In [6]:
t_critical

1.7108820799094275

In [7]:
# Reject the null hypothesis if the t-statistic exceeds the critical value
if t_test > t_critical:
    print("The fitness camp had an effect")
else:
    print("The fitness camp did not have an effect")

The fitness camp had an effect


In [8]:
# One-tailed p-value: probability of a t-statistic at least this large under H0
p_value = 1 - stats.t.cdf(t_test, df)

In [9]:
p_value

1.703654035845048e-06

In [10]:
# Reject the null hypothesis if the p-value is below the significance level
if p_value < alpha:
    print("The fitness camp had an effect")
else:
    print("The fitness camp did not have an effect")

The fitness camp had an effect
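The steps above can be collected into a single helper that works from summary statistics. This is a minimal sketch rather than the notebook's own method: the function name `one_sample_t_test_from_stats` is hypothetical, and it uses `stats.t.sf` (the survival function, equal to 1 minus the CDF) for the upper-tail p-value because it tends to be more accurate for very small probabilities.

```python
import numpy as np
from scipy import stats


def one_sample_t_test_from_stats(sample_mean, sample_std, sample_size,
                                 population_mean, alpha=0.05):
    """Upper-tailed one-sample t-test computed from summary statistics."""
    df = sample_size - 1
    standard_error = sample_std / np.sqrt(sample_size)
    t_statistic = (sample_mean - population_mean) / standard_error
    t_critical = stats.t.ppf(1 - alpha, df)   # one-tailed critical value
    p_value = stats.t.sf(t_statistic, df)     # P(T >= t_statistic) under H0
    return t_statistic, t_critical, p_value


# Values from the example problem.
t_stat, t_crit, p_val = one_sample_t_test_from_stats(
    sample_mean=75, sample_std=25, sample_size=25, population_mean=45
)
print(f"t = {t_stat:.2f}, critical t = {t_crit:.4f}, p = {p_val:.3e}")
```

Both decision rules lead to the same conclusion here: the t-statistic exceeds the critical value, and the p-value is below alpha.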
