<h1 style='font-size: 25px; color: crimson; font-family: Colonna MT; font-weight: 600; text-align: center'>Independent (Two-Sample) T-Test</h1>

---

The independent two-sample t-test compares the means of two independent groups to determine whether they differ significantly. It is commonly used to compare two distinct, unrelated groups, such as two treatment groups in an experiment or two separate populations.

**Example**: <span style='color: green'>*A researcher might want to compare the average performance in growth and yield of wheat grown with organic fertilizer versus synthetic fertilizer. The two samples (wheat growth and yields with organic fertilizer and synthetic fertilizer) are independent of each other, and the test would assess whether the two fertilizers produce significantly different yields.*</span>


**Assumptions for the independent t-test:**
    
1. The data in both groups should be independently sampled.

2. Each group should ideally follow a normal distribution.

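3. The variances of the two groups should be equal (homogeneity of variance). If this assumption is violated, Welch’s t-test, which does not assume equal variances, can be used instead. The `ttest_ind` calls in this notebook pass `equal_var=False` and therefore run Welch’s version.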
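
The normality and equal-variance assumptions listed above can be checked before running the test. The following is a minimal sketch, assuming the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and Levene's test (`scipy.stats.levene`) for homogeneity of variance. The arrays `organic` and `synthetic` are synthetic stand-in data generated for illustration, not values from the notebook's dataset, and the 0.05 cut-off is only a conventional choice.

```python
import numpy as np
from scipy.stats import levene, shapiro

# Synthetic stand-in data (hypothetical values, not from the notebook's dataset)
rng = np.random.default_rng(0)
organic = rng.normal(loc=55.0, scale=8.0, size=40)
synthetic = rng.normal(loc=53.0, scale=8.0, size=40)

# Shapiro-Wilk: a small p-value suggests a departure from normality
for name, sample in [("organic", organic), ("synthetic", synthetic)]:
    w_stat, p_normal = shapiro(sample)
    print(f"{name}: Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.4f}")

# Levene: a small p-value suggests the group variances differ
lev_stat, p_equal_var = levene(organic, synthetic)
print(f"Levene statistic = {lev_stat:.3f}, p = {p_equal_var:.4f}")

# If the variances appear unequal, Welch's version (equal_var=False) is the safer choice
use_welch = p_equal_var < 0.05
print("Prefer Welch's t-test:", use_welch)
```

With real data, the same calls would be applied to each group's column, for example the subsets selected by fertilizer type.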


**Disclaimer**: <span style='color: red; font-weight: 600;'>*This test was performed under the assumption that the data meet all requirements for a parametric test.*</span>


<h4 style='font-size: 18px; color: blue; font-family: Colonna MT; font-weight: 600'>1.0: Import required libraries</h4>

In [53]:
from scipy.stats import ttest_ind
from itertools import combinations
import pandas as pd

In [54]:
# Importing Datasets
pd.set_option('display.max_columns', 8)
filepath = "Datasets/Fertilizer and Light Exposure Experiment Dataset.csv"
df = pd.read_csv(filepath)
display(df)

Unnamed: 0,Fertilizer,Light Exposure,Plant Height (cm),Leaf Area (cm²),...,Biomass (g),Flower Count (number),Seed Yield (g),Stomatal Conductance (mmol/m²/s)
0,Control,Full Sun,58.56,185.74,...,11.99,19.54,6.69,242.41
1,Organic,Full Shade,46.70,138.80,...,8.67,15.37,6.17,233.66
2,Control,Partial Shade,58.33,203.84,...,9.50,16.39,5.41,230.07
3,Control,Full Shade,42.73,140.47,...,10.35,12.45,4.26,154.25
4,Organic,Full Shade,41.82,129.78,...,10.55,15.14,4.64,200.54
...,...,...,...,...,...,...,...,...,...
115,Synthetic,Partial Shade,65.24,228.35,...,10.94,21.14,7.48,254.78
116,Organic,Partial Shade,63.56,179.53,...,10.47,16.11,6.17,234.22
117,Control,Partial Shade,62.75,180.21,...,12.41,17.99,6.18,278.97
118,Control,Full Shade,39.60,144.32,...,9.15,13.72,4.46,186.87


<h4 style='font-size: 18px; color: blue; font-family: Colonna MT; font-weight: 600'>2.0: Perform Basic T-Test</h4>

In [73]:
group_column = 'Fertilizer'
group1 = 'Organic'
group2 = 'Synthetic'
Variable = 'Plant Height (cm)'

group1_data = df[df[group_column] == group1][Variable]
group2_data = df[df[group_column] == group2][Variable]
t_stat, p_value = ttest_ind(group1_data, group2_data, equal_var=False)
print(f"{Variable}: Test Statistic: {'-'*30} {t_stat:.2f}\n{Variable}: P-values : {'-'*35} {p_value:.4f}")

Plant Height (cm): Test Statistic: ------------------------------ 1.04
Plant Height (cm): P-values : ----------------------------------- 0.3005
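
Because the call above passes `equal_var=False`, SciPy runs Welch's t-test rather than the pooled-variance Student's t-test. As a rough sketch of what that computes, the statistic and two-sided p-value can be reproduced by hand and compared against `ttest_ind`. The arrays `a` and `b` are synthetic data made up for illustration; they are not the notebook's measurements.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

# Synthetic stand-in samples (hypothetical, not the notebook's data)
rng = np.random.default_rng(1)
a = rng.normal(55.0, 8.0, size=40)
b = rng.normal(52.0, 10.0, size=38)

# Squared standard error of each group mean (sample variance uses ddof=1)
se2_a = a.var(ddof=1) / a.size
se2_b = b.var(ddof=1) / b.size

# Welch's t-statistic: difference in means over the combined standard error
t_manual = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation of the degrees of freedom
dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))

# Two-sided p-value from the t distribution
p_manual = 2 * t_dist.sf(abs(t_manual), dof)

# Reference values from SciPy for comparison
t_scipy, p_scipy = ttest_ind(a, b, equal_var=False)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

The sign of the statistic follows the order of the arguments: a positive value means the first group's mean is larger.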


In [77]:
def ttest(df, group_column, Variable, i, group1, group2):
    # Select the measurements for each group, then run Welch's t-test (equal_var=False)
    group1_data = df[df[group_column] == group1][Variable]
    group2_data = df[df[group_column] == group2][Variable]
    t_stat, p_value = ttest_ind(group1_data, group2_data, equal_var=False)
    print(f"{i} {Variable}: Test Statistic: {'-'*30} {t_stat:.2f}\n{i} {Variable}: P-values : {'-'*35} {p_value:.4f}")


# Pass the grouping column explicitly instead of relying on a global variable
group_column = 'Fertilizer'
group1, group2 = 'Organic', 'Synthetic'
Variables = ['Plant Height (cm)', 'Leaf Area (cm²)', 'Chlorophyll Content (SPAD units)']
for i, Variable in enumerate(Variables):
    ttest(df, group_column, Variable, i, group1, group2)

0 Plant Height (cm): Test Statistic: ------------------------------ 1.04
0 Plant Height (cm): P-values : ----------------------------------- 0.3005
1 Leaf Area (cm²): Test Statistic: ------------------------------ 0.85
1 Leaf Area (cm²): P-values : ----------------------------------- 0.3981
2 Chlorophyll Content (SPAD units): Test Statistic: ------------------------------ 1.34
2 Chlorophyll Content (SPAD units): P-values : ----------------------------------- 0.1850


<h4 style='font-size: 18px; color: blue; font-family: Colonna MT; font-weight: 600'>3.0: Automate Test over Multiple Variables</h4>

In [64]:
def Independent_ttest(df, group_column, Variables):
    unique_groups = df[group_column].unique()
    group_combinations = list(combinations(unique_groups, 2))
    results = []
    for column in Variables:
        for group1, group2 in group_combinations:
            group1_data = df[df[group_column] == group1][column]
            group2_data = df[df[group_column] == group2][column]
            t_stat, p_value = ttest_ind(group1_data, group2_data, equal_var=False)
            
            results.append({
                'Group': group_column,
                'Parameter': column,
                'Group 1': group1,
                'Group 2': group2,
                'T-Statistic': t_stat,
                'P-Value': p_value,
                'Interpretation': 'Significant' if p_value < 0.05 else 'Not Significant'
            })
        
    results_df = pd.DataFrame(results)
    return results_df

group_col = 'Fertilizer'
Variables = ['Plant Height (cm)', 'Leaf Area (cm²)', 'Chlorophyll Content (SPAD units)']
Results = Independent_ttest(df, group_column=group_col, Variables=Variables)
display(Results)

Unnamed: 0,Group,Parameter,Group 1,Group 2,T-Statistic,P-Value,Interpretation
0,Fertilizer,Plant Height (cm),Control,Organic,-3.54,0.0,Significant
1,Fertilizer,Plant Height (cm),Control,Synthetic,-2.52,0.01,Significant
2,Fertilizer,Plant Height (cm),Organic,Synthetic,1.04,0.3,Not Significant
3,Fertilizer,Leaf Area (cm²),Control,Organic,-2.81,0.01,Significant
4,Fertilizer,Leaf Area (cm²),Control,Synthetic,-1.81,0.08,Not Significant
5,Fertilizer,Leaf Area (cm²),Organic,Synthetic,0.85,0.4,Not Significant
6,Fertilizer,Chlorophyll Content (SPAD units),Control,Organic,-2.44,0.02,Significant
7,Fertilizer,Chlorophyll Content (SPAD units),Control,Synthetic,-0.91,0.36,Not Significant
8,Fertilizer,Chlorophyll Content (SPAD units),Organic,Synthetic,1.34,0.18,Not Significant
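
The table above applies a 0.05 threshold to each of several pairwise tests separately. One optional way to account for running many tests at once is a Bonferroni adjustment, which multiplies each p-value by the number of tests and caps the result at 1. This is a minimal sketch and not part of the original analysis. The p-values and the column names `P-Value (Bonferroni)` and `Interpretation (adjusted)` are hypothetical, chosen only to mirror the layout of the results table.

```python
import pandas as pd

# Hypothetical p-values standing in for the 'P-Value' column of a results table
results = pd.DataFrame({
    'Group 1': ['Control', 'Control', 'Organic'],
    'Group 2': ['Organic', 'Synthetic', 'Synthetic'],
    'P-Value': [0.004, 0.03, 0.40],
})

# Bonferroni: multiply each p-value by the number of tests, capped at 1.0
n_tests = len(results)
results['P-Value (Bonferroni)'] = (results['P-Value'] * n_tests).clip(upper=1.0)

# Re-evaluate significance against the same 0.05 threshold
results['Interpretation (adjusted)'] = results['P-Value (Bonferroni)'].apply(
    lambda p: 'Significant' if p < 0.05 else 'Not Significant'
)
print(results)
```

In this made-up example the second comparison (raw p = 0.03) is no longer significant after adjustment, which shows how the correction can change the interpretation column.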


<h4 style='font-size: 18px; color: blue; font-family: Colonna MT; font-weight: 600'>4.0: Automate Test over Multiple Categories and Variables</h4>

In [65]:
def Independent_ttest(df, group_cols, Variables):
    results = []
    for category in group_cols:
        unique_groups = df[category].unique()
        group_combinations = list(combinations(unique_groups, 2))
        
        for column in Variables:
            for group1, group2 in group_combinations:
                group1_data = df[df[category] == group1][column]
                group2_data = df[df[category] == group2][column]
                t_stat, p_value = ttest_ind(group1_data, group2_data, equal_var=False)
                
                results.append({
                    'Group': category,
                    'Parameter': column,
                    'Group 1': group1,
                    'Group 2': group2,
                    'T-Statistic': t_stat,
                    'P-Value': p_value,
                    'Interpretation': 'Significant' if p_value < 0.05 else 'Not Significant'
                })
        
    results_df = pd.DataFrame(results)
    return results_df

group_col = ['Fertilizer', 'Light Exposure']
Variables = ['Plant Height (cm)', 'Leaf Area (cm²)', 'Chlorophyll Content (SPAD units)']
Results = Independent_ttest(df, group_cols=group_col, Variables=Variables)
display(Results)

Unnamed: 0,Group,Parameter,Group 1,Group 2,T-Statistic,P-Value,Interpretation
0,Fertilizer,Plant Height (cm),Control,Organic,-3.54,0.0,Significant
1,Fertilizer,Plant Height (cm),Control,Synthetic,-2.52,0.01,Significant
2,Fertilizer,Plant Height (cm),Organic,Synthetic,1.04,0.3,Not Significant
3,Fertilizer,Leaf Area (cm²),Control,Organic,-2.81,0.01,Significant
4,Fertilizer,Leaf Area (cm²),Control,Synthetic,-1.81,0.08,Not Significant
5,Fertilizer,Leaf Area (cm²),Organic,Synthetic,0.85,0.4,Not Significant
6,Fertilizer,Chlorophyll Content (SPAD units),Control,Organic,-2.44,0.02,Significant
7,Fertilizer,Chlorophyll Content (SPAD units),Control,Synthetic,-0.91,0.36,Not Significant
8,Fertilizer,Chlorophyll Content (SPAD units),Organic,Synthetic,1.34,0.18,Not Significant
9,Light Exposure,Plant Height (cm),Full Sun,Full Shade,14.74,0.0,Significant


---

This analysis was performed by **Jabulente**, a passionate and dedicated data scientist with a strong commitment to using data to drive meaningful insights and solutions. For inquiries, collaborations, or further discussions, please feel free to reach out via the links below.

---

<div align="center">  
    
[![GitHub](https://img.shields.io/badge/GitHub-Jabulente-black?logo=github)](https://github.com/Jabulente)  [![LinkedIn](https://img.shields.io/badge/LinkedIn-Jabulente-blue?logo=linkedin)](https://linkedin.com/in/jabulente-208019349)  [![Email](https://img.shields.io/badge/Email-jabulente@hotmail.com-red?logo=gmail)](mailto:Jabulente@hotmail.com)  

</div>

<h1 style='font-size: 35px; color: Tomato; font-family: Colonna MT; font-weight: 700; text-align: center'>THE END</h1>