# t-test

## Definition
#### A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups, or between one group's mean and a reference value. It assumes the data are approximately normally distributed and is used when the population variance is unknown. In hypothesis testing, the t-test assesses whether the observed difference between the means is statistically significant or could plausibly be due to random variation.
#### It is employed in statistical inference, especially when the sample size is **limited** or the population standard deviation is **unknown**.

### Key terms in t-test
The key terms used in a t-test are as follows:

#### **T-statistic**: The t-statistic measures the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.
If the absolute t-value is large => the difference between the means is unlikely to be due to chance alone, which is evidence that the groups come from populations with different means.
If the absolute t-value is small => the observed difference is consistent with random variation, so the data give no evidence that the population means differ.
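As a minimal sketch of the definition above, the snippet computes a two-sample t-statistic by hand (difference of means divided by the standard error of the difference) and checks it against `scipy.stats.ttest_ind`. The two groups are hypothetical data, and the pooled standard error assumes both groups share the same population variance (`equal_var=True`).

```python
import numpy as np
from scipy import stats

# Hypothetical scores of two independent groups
group_a = np.array([23.0, 25.0, 28.0, 30.0, 26.0, 27.0])
group_b = np.array([31.0, 33.0, 29.0, 35.0, 32.0, 30.0])

n_a, n_b = len(group_a), len(group_b)
mean_diff = group_a.mean() - group_b.mean()

# Pooled variance: assumes both groups have the same population variance
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)

# Standard error of the difference between the two sample means
se_diff = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = mean_diff / se_diff
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=True)
print(t_manual, t_scipy, p_scipy)
```

The sign only tells you which group has the larger mean; it is the size of |t| that is compared with the critical value.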
#### **T-Distribution**: The t-distribution, commonly known as Student's t-distribution, is a probability distribution with thicker tails than the normal distribution. It is used in statistical inference when the sample size is small and the population standard deviation is unknown. As the degrees of freedom (and therefore the sample size) increase, the t-distribution gets closer to the normal distribution. It plays a crucial role in hypothesis testing and in estimating population parameters from limited data.
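A quick way to see both properties (thicker tails, convergence to the normal) is to compare two-tailed critical values at α = 0.05 for increasing degrees of freedom with the normal value. This is a small sketch using `scipy.stats.t` and `scipy.stats.norm`; the list of df values is arbitrary.

```python
from scipy import stats

# Two-tailed critical value at alpha = 0.05 is the 97.5th percentile
z_crit = stats.norm.ppf(0.975)
dfs = (2, 5, 14, 30, 100, 1000)
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}: t critical = {t_crit:.3f}  (normal: {z_crit:.3f})")

# Thicker tails: chance of landing more than 3 units from the center
tail_t = 2 * stats.t.sf(3.0, 5)
tail_normal = 2 * stats.norm.sf(3.0)
print(f"P(|T| > 3), df=5: {tail_t:.4f}   P(|Z| > 3): {tail_normal:.4f}")
```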
#### **Degrees of freedom (df)**: The degrees of freedom represent the number of values in a calculation that are free to vary. In a t-test, df is the number of independent observations available for estimating the variability, after accounting for the mean(s) that had to be estimated first.
For a one-sample t-test (and for a paired t-test, where n is the number of pairs), the degrees of freedom are the sample size minus 1, i.e.
**df = n - 1**, where "n" is the number of observations in the sample.

For an independent two-sample t-test with samples A and B (assuming equal variances), the df is calculated as **df = (nA - 1) + (nB - 1) = nA + nB - 2**.
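The sketch below computes both df formulas for two hypothetical samples and checks that the pooled two-sample p-value from `scipy.stats.ttest_ind` can be rebuilt from its t-statistic using df = nA + nB - 2. The data are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples A and B
sample_a = np.array([12.1, 13.4, 11.8, 12.9, 13.0])
sample_b = np.array([14.2, 13.8, 15.1, 14.6, 13.9, 14.4, 15.0])

n_a, n_b = len(sample_a), len(sample_b)
df_one_sample = n_a - 1                  # one-sample t-test on A alone
df_two_sample = (n_a - 1) + (n_b - 1)    # pooled two-sample t-test: n_a + n_b - 2

# Rebuild the two-tailed p-value from the t-statistic and df_two_sample
t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=True)
p_from_df = 2 * stats.t.sf(abs(t_stat), df_two_sample)
print(df_one_sample, df_two_sample, p_value, p_from_df)
```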
#### **Significance level (α)**: The probability of rejecting the null hypothesis when it is actually true (a Type I error). In simpler terms, it is the risk we accept of concluding that a difference exists between two groups when in reality it does not. A common choice is α = 0.05, i.e. a 5% risk.
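One way to make α concrete is a small simulation: draw many samples from a population whose mean really is 0, run a one-sample t-test on each, and count how often H0 is (wrongly) rejected. This is a sketch with arbitrary choices (seed, 4000 experiments of 15 normal observations each); the rejection rate should come out close to α.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05
n_experiments, n_obs = 4000, 15

# Every sample comes from a population with mean 0, so H0: mu = 0 is true
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_obs))
result = stats.ttest_1samp(samples, popmean=0.0, axis=1)

# Fraction of experiments in which a true H0 is rejected (Type I errors)
false_rejection_rate = np.mean(result.pvalue < alpha)
print(false_rejection_rate)
```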

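### Types of t-tests
There are three main types of t-tests. The one-sample and independent samples t-tests work with independent observations, while the paired sample t-test works with dependent (paired) observations.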

1. **One sample t-test**: tests the mean of a single group against a known or hypothesized mean.
2. **Independent samples t-test**: compares the means of two independent groups.
3. **Paired sample t-test**: compares means from the same group measured at two different times (say, one year apart).
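In SciPy the three types map to `scipy.stats.ttest_1samp`, `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel`. The sketch below runs each on hypothetical data and also checks that the paired test is the same as a one-sample test on the per-subject differences against 0.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements: the same 8 people before and after, plus 8 other people
before = np.array([72.0, 68.0, 75.0, 71.0, 69.0, 74.0, 70.0, 73.0])
after = np.array([70.0, 66.0, 74.0, 68.0, 68.0, 71.0, 69.0, 70.0])
other_group = np.array([65.0, 67.0, 70.0, 64.0, 66.0, 69.0, 68.0, 63.0])

# 1. One sample: is the mean of `before` different from 70?
one_sample = stats.ttest_1samp(before, popmean=70.0)

# 2. Independent samples: do two different groups of people differ?
independent = stats.ttest_ind(before, other_group)

# 3. Paired: the same people measured twice
paired = stats.ttest_rel(before, after)

# A paired t-test equals a one-sample t-test on the differences against 0
on_differences = stats.ttest_1samp(before - after, popmean=0.0)
print(one_sample.pvalue, independent.pvalue, paired.pvalue, on_differences.pvalue)
```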


# 1. One Sample t-test

The one-sample t-test is one of the most widely used t-tests. It compares the sample mean of the data to a specified value, typically a known or hypothesized population (true) mean.

Used when:
1. the sample size is small (typically under 30) and the population standard deviation is unknown.
2. the data are collected randomly.
3. the data are approximately normally distributed.

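#### Formula

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

where t = t-value

x̄ = sample mean

μ = true/population mean (the hypothesized value)

s = sample standard deviation, computed with n - 1 in the denominator (if the population standard deviation σ were known and used here instead, the test would be a z-test)

n = sample size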
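The formula translates directly into NumPy. This sketch uses the 15 scores drawn in the cricket example below and its population mean, and compares the hand-computed t with `scipy.stats.ttest_1samp`; note `ddof=1`, which gives the sample standard deviation s.

```python
import numpy as np
from scipy import stats

# Sample and population mean taken from the cricket example below
sample_score = np.array([21, 15, 24, 48, 25, 40, 38, 24, 10, 29, 30, 11, 29, 24, 30])
mu = 26.357142857142858

x_bar = sample_score.mean()
s = sample_score.std(ddof=1)  # sample standard deviation: n - 1 in the denominator
n = len(sample_score)

t_manual = (x_bar - mu) / (s / np.sqrt(n))
t_scipy = stats.ttest_1samp(sample_score, popmean=mu).statistic
print(t_manual, t_scipy)
```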

### Steps

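***Step 1*** - State the null (H0) and alternative (H1) hypotheses.

***Step 2*** - Calculate the sample mean and the sample standard deviation (if not given). The hypothesized population mean μ and the sample size n are given.

***Step 3*** - Put the values from Step 2 into the one-sample t-test formula above and calculate the t-value (t_cal).

***Step 4*** - Calculate the degrees of freedom: df = n - 1.

***Step 5*** - Take α = 0.05 if not given. Using df and α, find the critical value t_table from a t-table: use α for a one-tailed test, or α/2 in each tail for a two-tailed test.

***Step 6*** - Compare t_cal from Step 3 with t_table from Step 5. For a two-tailed test, reject H0 if |t_cal| > t_table; for a one-tailed test, reject H0 only if t_cal lies beyond t_table in the direction stated by H1. Otherwise, fail to reject H0.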
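Steps 2 to 6 for the two-tailed case can be sketched as a small function. `one_sample_t_test` is a hypothetical helper written for this illustration, not a library function; it uses `scipy.stats.t.ppf` in place of a printed t-table.

```python
import numpy as np
from scipy import stats

def one_sample_t_test(sample, mu, alpha=0.05):
    """Hypothetical helper: Steps 2-6 of a two-tailed one-sample t-test."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    # Step 2: sample mean and sample standard deviation
    x_bar = sample.mean()
    s = sample.std(ddof=1)
    # Step 3: calculated t-value
    t_cal = (x_bar - mu) / (s / np.sqrt(n))
    # Step 4: degrees of freedom
    df = n - 1
    # Step 5: two-tailed critical value (alpha split across both tails)
    t_table = stats.t.ppf(1 - alpha / 2, df)
    # Step 6: reject H0 if |t_cal| exceeds the critical value
    reject = abs(t_cal) > t_table
    return t_cal, t_table, reject

scores = [21, 15, 24, 48, 25, 40, 38, 24, 10, 29, 30, 11, 29, 24, 30]
t_cal, t_table, reject = one_sample_t_test(scores, mu=26.357142857142858)
print(f"t_cal = {t_cal:.4f}, t_table = {t_table:.4f}, reject H0: {reject}")
```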

## Q 1. Ragini was playing cricket, and she claims that she scored around ____ runs per match. One of her fans says otherwise.
#### 1. State the null and alternative hypotheses.
#### 2. Is her fan telling the truth at a significance level of 0.05?

```python
import numpy as np
import scipy
import scipy.stats as stats
from scipy.stats import norm
```

### 1) Null hypothesis, H0: μ = 26.357
### 2) Alternative hypothesis, H1: μ ≠ 26.357 (two-tailed test)
### where H0 = null hypothesis, H1 = alternative hypothesis, and μ = Ragini's mean runs per match

```python
# Population data
my_cricket_score = [22, 38, 19, 15, 48, 11, 10, 49, 47, 38, 10, 25, 46, 10, 21, 24, 29, 36, 25, 25, 30, 15, 7, 40, 33, 24, 11, 30]
```

```python
# Population size
len(my_cricket_score)
```
```
28
```

```python
# Population mean
population_mean = np.mean(my_cricket_score)
population_mean
```
```
26.357142857142858
```

```python
# Population standard deviation
pop_std = np.std(my_cricket_score)
```

```python
# Sample data: draw 15 scores at random (with replacement, numpy's default)
sample_size = 15
sample_score = np.random.choice(my_cricket_score, sample_size)
sample_score
```
```
array([21, 15, 24, 48, 25, 40, 38, 24, 10, 29, 30, 11, 29, 24, 30])
```

```python
# Sample mean
sample_mean = np.mean(sample_score)
sample_mean
```
```
26.533333333333335
```

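### t-statistic

```python
# Calculate the t-statistic using the sample standard deviation (ddof=1),
# as the one-sample t-test formula requires
sample_std = np.std(sample_score, ddof=1)
t_stats = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))
t_stats
```

This is the same t-value that `scipy.stats.ttest_1samp` computes for the sample in the last section.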

### Degrees of Freedom

```python
df = sample_size - 1
df
```
```
14
```

```python
# Significance level
significance_value = 0.05
alpha = significance_value / 2  # split across both tails for a two-tailed test
```

### Critical t-value

```python
# Two-tailed critical value: the (1 - alpha) quantile of the t-distribution with df degrees of freedom
t_critical = stats.t.ppf(1 - alpha, df)
t_critical
```
```
2.1447866879169273
```
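For comparison, the sketch below computes both the two-tailed critical value used here and the one-tailed value that Step 5 would use if H1 were directional, for the same df = 14 and a 0.05 significance level. The variable names are chosen for this illustration only.

```python
from scipy import stats

df = 14
sig = 0.05
t_two_tailed = stats.t.ppf(1 - sig / 2, df)  # about 2.145: sig split over both tails
t_one_tailed = stats.t.ppf(1 - sig, df)      # about 1.761: all of sig in the upper tail
print(t_two_tailed, t_one_tailed)
```

The one-tailed boundary is closer to zero, which is why a one-tailed test rejects more easily in its chosen direction and never in the other.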

### Upper and Lower Limits Forming the Decision Boundary

```python
t_crit_upper = +t_critical
t_crit_lower = -t_critical
print(t_crit_lower)
print(t_crit_upper)
```
```
-2.1447866879169273
2.1447866879169273
```


## Conclusion

### Conclusion based on t value

```python
if t_crit_lower > t_stats or t_stats > t_crit_upper:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```

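Since t_stats falls between the lower and upper critical values, we fail to reject the null hypothesis. The data are consistent with Ragini's claim: at the 0.05 significance level there is not enough evidence to support her fan. (Failing to reject H0 does not prove that H0 is true.)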

### p-Value

```python
# One-tailed p-value: probability of a t-value at least this extreme in one tail,
# computed from the t-distribution (not the normal) with df degrees of freedom
p_value = stats.t.sf(abs(t_stats), df)
p_value
```

```python
# For a two-tailed test, double the one-tailed p-value
p_val_2_tail = p_value * 2
p_val_2_tail
```

### Conclusion based on p-Value

1) p_value > significance value: we fail to reject the null hypothesis.
2) p_value < significance value: we reject the null hypothesis.
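The p-value rule and the critical-value rule always lead to the same decision for a given α. The sketch below checks this for a few hypothetical t-statistics with df = 14 in a two-tailed test.

```python
from scipy import stats

df, sig = 14, 0.05
t_crit = stats.t.ppf(1 - sig / 2, df)  # two-tailed critical value

decisions = []
for t_stat in (0.07, -1.5, 2.5, -3.1):  # hypothetical t-statistics
    p_two_tailed = 2 * stats.t.sf(abs(t_stat), df)
    reject_by_t = abs(t_stat) > t_crit
    reject_by_p = p_two_tailed < sig
    decisions.append((reject_by_t, reject_by_p))
    print(f"t = {t_stat:+.2f}, p = {p_two_tailed:.4f}, "
          f"reject by t: {reject_by_t}, reject by p: {reject_by_p}")
```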

```python
if p_val_2_tail < significance_value:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```
```
fail to reject the null hypothesis
```


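## Another Way to Perform the One-Sample t-test

```python
# scipy computes the t-statistic (with the sample standard deviation) and the two-tailed p-value
t_val, p_val = scipy.stats.ttest_1samp(sample_score, population_mean)
t_val, p_val
```
```
(0.06571158681525265, 0.9485365883215872)
```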
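`scipy.stats.ttest_1samp` also accepts an `alternative` argument (`"two-sided"`, `"greater"` or `"less"`) in recent SciPy releases (1.6 and later), which gives one-tailed p-values directly. This sketch reuses the sample and population mean from above; for a symmetric t-distribution the two-sided p-value equals twice the smaller one-sided p-value.

```python
import numpy as np
from scipy import stats

sample_score = np.array([21, 15, 24, 48, 25, 40, 38, 24, 10, 29, 30, 11, 29, 24, 30])
mu = 26.357142857142858

two_sided = stats.ttest_1samp(sample_score, mu)                       # H1: mean != mu
greater = stats.ttest_1samp(sample_score, mu, alternative="greater")  # H1: mean > mu
less = stats.ttest_1samp(sample_score, mu, alternative="less")        # H1: mean < mu
print(two_sided.pvalue, greater.pvalue, less.pvalue)
```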


# 2. Independent Samples t-test

https://www.geeksforgeeks.org/t-test/