<h2 style='font-size: 45px; color: green; font-family: Dubai; font-weight: 600'>T-Tests for Pairwise Comparison</h2>

A t-test is a statistical method for testing whether two means differ significantly, either the means of two groups or a sample mean and a reference value. It is commonly used in hypothesis testing when the data are approximately normally distributed and the population variance is unknown, which matters most when sample sizes are small. The test assesses whether the observed difference in means is larger than would be expected from random sampling variation alone.

<span style='font-size: 35px; color: crimson; font-family: Dubai; font-weight: 600'>Types of T-Tests</span>

1. ***Independent Samples T-Test***  
   - ***Purpose:*** Compares the means of two independent groups (e.g., comparing test scores of two different classes).  
   - ***When to Use:*** When the groups are unrelated, and the observations in one group do not influence those in the other.  
   - ***Assumptions:***  
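     - The data in each group are approximately normally distributed.  
     - Homogeneity of variances (equal variance across groups). This applies to the standard Student's t-test; Welch's t-test (`equal_var=False` in SciPy) relaxes it.  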
     - Observations are independent.  

2. ***Paired Samples T-Test*** (also known as Dependent or Repeated Measures T-Test)  
   - ***Purpose:*** Compares the means of two related groups (e.g., before-and-after measurements in a single group).  
   - ***When to Use:*** When the data comes from the same subjects measured at two different times or under two different conditions.  
   - ***Assumptions:***  
     - The differences between paired observations are normally distributed.  
     - The two observations within each pair are dependent, while the pairs themselves are independent of one another.  

3. ***One-Sample T-Test***  
   - ***Purpose:*** Compares the mean of a single sample to a known value or population mean (e.g., testing if the average test score in a class differs from a national average).  
   - ***When to Use:*** When you want to test if the sample mean significantly deviates from a specified value.  
   - ***Assumptions:***  
     - The sample is randomly drawn.  
     - The data is approximately normally distributed.  

<span style='font-size: 35px; color: crimson; font-family: Dubai; font-weight: 600'>When to Use a T-Test</span>
 
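- Use a t-test when comparing means and the population standard deviation is unknown. The choice matters most for small samples (roughly n < 30), where the t-distribution differs noticeably from the normal distribution.  
- Ensure assumptions such as normality, independence, and equal variance (for the standard independent t-test) are met.  
- For clearly non-normal data, especially with small samples, consider non-parametric alternatives such as the Mann-Whitney U test (for independent groups) or the Wilcoxon signed-rank test (for paired groups).  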
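
The non-parametric alternatives named above are also available in `scipy.stats`. The sketch below is illustrative only: the skewed samples are hypothetical and drawn from an exponential distribution to mimic non-normal data.

```python
import numpy as np
from scipy.stats import mannwhitneyu, wilcoxon

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Independent groups: Mann-Whitney U test on two hypothetical skewed samples
skewed_a = rng.exponential(scale=2.0, size=25)
skewed_b = rng.exponential(scale=3.0, size=25)
u_stat, u_p = mannwhitneyu(skewed_a, skewed_b, alternative="two-sided")
print(f"Mann-Whitney U: U={u_stat:.1f}, p={u_p:.3f}")

# Paired groups: Wilcoxon signed-rank test on hypothetical repeated measurements
first_visit = rng.exponential(scale=2.0, size=25)
second_visit = first_visit + rng.normal(0.5, 1.0, size=25)
w_stat, w_p = wilcoxon(first_visit, second_visit)
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.3f}")
```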

### ***Import Required Libraries***

In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, ttest_rel, ttest_1samp

print('Libraries loaded successfully')

Libraries loaded successfully


In [7]:
df = pd.read_csv('Datasets/Hypothesis 101.csv')
df.head()

| | Fertilizer | Yield (tones/ha) | Days to Maturity | Biomass | Dry matter | Irrigation |
|---|---|---|---|---|---|---|
| 0 | C | 38.87497 | 6.26185 | 2.525738 | 75.436679 | Furrow Irrigation |
| 1 | A | 29.921425 | 7.194156 | 2.594134 | 51.828069 | Sprinkler Irrigation |
| 2 | C | 41.152436 | 7.329974 | 3.76043 | 49.058974 | Drip Irrigation |
| 3 | C | 42.161544 | 7.137822 | 3.340263 | 46.778604 | Furrow Irrigation |
| 4 | A | 36.715841 | 6.61532 | 3.701663 | 57.817993 | Furrow Irrigation |


<span style='font-size: 20px; color: crimson; font-family: Dubai; font-weight: 600'>Independent samples data</span>


In [11]:
group1 = np.random.normal(50, 10, 30)  # Mean=50, StdDev=10, Size=30
group2 = np.random.normal(55, 10, 30)  # Mean=55, StdDev=10, Size=30

# 1. Independent Samples T-Test (ttest_ind assumes equal variances by default)
ind_t_stat, ind_p_value = ttest_ind(group1, group2)
print("Independent Samples T-Test")
print(f"t-statistic: {ind_t_stat:.3f}, p-value: {ind_p_value:.3f}\n")

Independent Samples T-Test
t-statistic: -2.736, p-value: 0.008
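
To show what `ttest_ind` computes with its default equal-variance setting, the sketch below rebuilds the pooled-variance t-statistic and two-sided p-value by hand and compares them with SciPy's result. The arrays `x` and `y` are hypothetical and seeded, so their values differ from the unseeded `group1` and `group2` above.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
x = rng.normal(50, 10, 30)       # hypothetical group 1
y = rng.normal(55, 10, 30)       # hypothetical group 2

n_x, n_y = len(x), len(y)
dof = n_x + n_y - 2  # degrees of freedom for the pooled test

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_x - 1) * x.var(ddof=1) + (n_y - 1) * y.var(ddof=1)) / dof

# t = difference in means divided by its standard error
t_manual = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / n_x + 1 / n_y))

# Two-sided p-value from the t-distribution's survival function
p_manual = 2 * t_dist.sf(abs(t_manual), dof)

t_scipy, p_scipy = ttest_ind(x, y)  # equal_var=True by default
print(f"manual: t={t_manual:.3f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.3f}, p={p_scipy:.4f}")
```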



<span style='font-size: 20px; color: crimson; font-family: Dubai; font-weight: 600'>Paired samples data (e.g., before and after)</span>


In [12]:
before = np.random.normal(70, 5, 30)  # Mean=70, StdDev=5, Size=30
after = before + np.random.normal(-2, 1, 30)  # Slight decrease from "before"

# 2. Paired Samples T-Test
paired_t_stat, paired_p_value = ttest_rel(before, after)
print("Paired Samples T-Test")
print(f"t-statistic: {paired_t_stat:.3f}, p-value: {paired_p_value:.3f}\n")

Paired Samples T-Test
t-statistic: 15.030, p-value: 0.000
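
A paired t-test is equivalent to a one-sample t-test on the within-pair differences against zero, which is why its normality assumption concerns the differences. The sketch below checks this equivalence on hypothetical seeded data (`pre` and `post` are illustrative names, not the notebook's `before` and `after`).

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

rng = np.random.default_rng(7)       # fixed seed so the sketch is repeatable
pre = rng.normal(70, 5, 30)          # hypothetical "before" measurements
post = pre + rng.normal(-2, 1, 30)   # same subjects, shifted down by about 2

# Paired test on the two related samples
t_paired, p_paired = ttest_rel(pre, post)

# One-sample test on the differences, with a null mean of 0
t_diff, p_diff = ttest_1samp(pre - post, 0.0)

print(f"paired:      t={t_paired:.3f}, p={p_paired:.3g}")
print(f"differences: t={t_diff:.3f}, p={p_diff:.3g}")
```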



<span style='font-size: 20px; color: crimson; font-family: Dubai; font-weight: 600'>One-sample data</span>

In [15]:
# One-sample data
sample = np.random.normal(100, 15, 30)  # Mean=100, StdDev=15, Size=30
population_mean = 105  # Known or expected population mean


# 3. One-Sample T-Test
one_sample_t_stat, one_sample_p_value = ttest_1samp(sample, population_mean)
print("One-Sample T-Test")
print(f"t-statistic: {one_sample_t_stat:.3f}, p-value: {one_sample_p_value:.3f}\n")

One-Sample T-Test
t-statistic: -2.063, p-value: 0.048



In [17]:
results = {
    "Test Type": ["Independent Samples", "Paired Samples", "One Sample"],
    "t-Statistic": [ind_t_stat, paired_t_stat, one_sample_t_stat],
    "p-Value": [ind_p_value, paired_p_value, one_sample_p_value],
    "Interpretation": [
        "Significant difference" if ind_p_value < 0.05 else "No significant difference",
        "Significant difference" if paired_p_value < 0.05 else "No significant difference",
        "Significant difference" if one_sample_p_value < 0.05 else "No significant difference",
    ],
}

df_results = pd.DataFrame(results)
df_results

| | Test Type | t-Statistic | p-Value | Interpretation |
|---|---|---|---|---|
| 0 | Independent Samples | -2.736171 | 0.008235422 | Significant difference |
| 1 | Paired Samples | 15.03007 | 3.189897e-15 | Significant difference |
| 2 | One Sample | -2.062856 | 0.04818223 | Significant difference |


<span style='font-size: 20px; color: crimson; font-family: Dubai; font-weight: 600'>Pairwise t-tests across all groups (iterative_t_tests)</span>

In [9]:
from itertools import combinations

def iterative_t_tests(df, group_column, value_column):
    """Run Welch's t-test on every pair of groups in `group_column`.

    Returns a DataFrame with one row per pair. The p-values are not
    adjusted for multiple comparisons.
    """
    unique_groups = df[group_column].unique()

    results = []

    # combinations() yields each unordered pair of groups exactly once
    for group1, group2 in combinations(unique_groups, 2):
        # Filter data for the two groups
        group1_data = df.loc[df[group_column] == group1, value_column]
        group2_data = df.loc[df[group_column] == group2, value_column]

        # Welch's t-test: equal_var=False does not assume equal variances
        t_stat, p_value = ttest_ind(group1_data, group2_data, equal_var=False)

        # Append results
        results.append({
            'Group 1': group1,
            'Group 2': group2,
            'T-Statistic': t_stat,
            'P-Value': p_value
        })

    # Convert results to a DataFrame
    results_df = pd.DataFrame(results)
    return results_df

# Example usage
t_test_results = iterative_t_tests(df, group_column='Fertilizer', value_column='Yield (tones/ha)')
t_test_results

| | Group 1 | Group 2 | T-Statistic | P-Value |
|---|---|---|---|---|
| 0 | C | A | 16.204128 | 1.726368e-38 |
| 1 | C | B | 6.66334 | 2.906105e-10 |
| 2 | A | B | -7.941152 | 1.983441e-13 |
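
Running several pairwise tests raises the chance of at least one false positive, so the raw p-values are often adjusted. The sketch below applies a Bonferroni correction, which is one common choice and not something the notebook itself does. The helper `add_bonferroni` and its column names are hypothetical, and the example p-values are copied from the output above.

```python
import pandas as pd

def add_bonferroni(results_df, p_column="P-Value", alpha=0.05):
    """Return a copy of results_df with Bonferroni-adjusted p-values (hypothetical helper)."""
    adjusted = results_df.copy()
    n_tests = len(adjusted)
    # Multiply each p-value by the number of tests, capped at 1.0
    adjusted["Adjusted P-Value"] = (adjusted[p_column] * n_tests).clip(upper=1.0)
    adjusted["Significant"] = adjusted["Adjusted P-Value"] < alpha
    return adjusted

# P-values copied from the pairwise results shown above
example = pd.DataFrame({
    "Group 1": ["C", "C", "A"],
    "Group 2": ["A", "B", "B"],
    "P-Value": [1.726368e-38, 2.906105e-10, 1.983441e-13],
})

adjusted = add_bonferroni(example)
print(adjusted)
```

With only three comparisons and p-values this small, all three pairs remain significant at the 0.05 level after adjustment.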
