<h1 style='font-size: 25px; color: crimson; font-family: Colonna MT; font-weight: 600; text-align: center'>Cohen’s (d)</h1>
<hr>


**Cohen’s d** is a widely used measure of **effect size** that quantifies the **difference between two group means** in terms of **standard deviation units**. It helps us understand how large or meaningful the difference is, beyond just knowing whether it’s statistically significant. Cohen's d is especially useful when comparing the means of two independent groups (such as treatment vs. control) and provides an intuitive sense of the magnitude of that difference.

The formula for Cohen’s d is:


<span style='font-size: 20px; color: crimson; font-weight: 600; padding-left: 100px'>$d = \frac{M_1 - M_2}{SD_{pooled}}$</span>




where $M_1$ and $M_2$ are the means of the two groups, and $SD_{pooled}$ is the pooled standard deviation, which combines variability from both groups.
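For groups of sizes $n_1$ and $n_2$ with sample standard deviations $s_1$ and $s_2$, the pooled standard deviation is conventionally the square root of the degrees-of-freedom-weighted average of the two sample variances:

$$SD_{pooled} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

When $n_1 = n_2$, this reduces to $SD_{pooled} = \sqrt{(s_1^2 + s_2^2)/2}$, the root mean of the two variances.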

Interpretation guidelines often follow Cohen’s original benchmarks:

* **Small effect**: $d = 0.2$
* **Medium effect**: $d = 0.5$
* **Large effect**: $d = 0.8$ or greater

**Example**: Suppose a researcher is studying the effect of a new fertilizer on crop yield. The average yield for the treated group is 60 kg, and for the control group, it is 50 kg. If the pooled standard deviation is 10 kg, then Cohen’s d = (60 - 50) / 10 = **1.0**, which indicates a **large effect** of the fertilizer.
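The arithmetic above can be reproduced from summary statistics alone. The following is a minimal sketch; the helper names `pooled_sd` and `cohens_d_from_summary` are hypothetical. The text gives only the pooled SD, so the sketch assumes each group has an SD of 10 kg and 20 plants. With equal SDs, the pooled SD is 10 kg whatever the group sizes.

```python
import math


def pooled_sd(sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Pooled standard deviation, weighting each sample variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d_from_summary(mean1: float, mean2: float,
                          sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Cohen's d from group means, standard deviations and sizes."""
    return (mean1 - mean2) / pooled_sd(sd1, sd2, n1, n2)


# Fertilizer example: treated mean 60 kg, control mean 50 kg.
# Assumption: both group SDs are 10 kg and n = 20 per group (not stated in the text).
d = cohens_d_from_summary(60, 50, 10, 10, 20, 20)
print(f"Cohen's d = {d:.2f}")  # Cohen's d = 1.00
```

Because the SDs and sizes are passed separately, the same helper also covers unequal groups, where the weighting in `pooled_sd` matters.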

Cohen’s d is especially valuable when communicating results to a non-technical audience because it gives a sense of *how much* change occurred in practical terms, not just whether the change is statistically reliable.



**To implement Cohen's d in Python, I wrote a reusable function that computes the effect size for every pair of groups on every numeric variable and attaches a plain-language interpretation to each value. New variables or groups can therefore be analysed without writing extra code.**

<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>1.0. Import Required Libraries</h1>

```python
import warnings
import pandas as pd
import numpy as np

# Silence library warnings and set display options for readable output
warnings.simplefilter("ignore")
pd.set_option('display.max_columns', 10)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
print("....Libraries Loaded Successfully....")
```

....Libraries Loaded Successfully....


<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>2.0. Import and Preview the Dataset</h1>

```python
# Load the experiment data and preview 10 random rows
filepath = 'Datasets/Fertilizer and Light Exposure Experiment Dataset.csv'
df = pd.read_csv(filepath)
df.sample(10)
```

| | Fertilizer | Plant Height (cm) | Leaf Area (cm²) | Chlorophyll Content (SPAD units) | Root Length (cm) | Biomass (g) | Seed Yield (g) |
|---|---|---|---|---|---|---|---|
| 35 | Orgarnic | 46.6 | 138.18 | 28.43 | 18.22 | 10.08 | 4.09 |
| 1 | Orgarnic | 46.7 | 138.8 | 34.69 | 17.66 | 8.67 | 6.17 |
| 96 | Synthetic + Organic | 64.65 | 185.95 | 39.39 | 25.73 | 11.25 | 6.44 |
| 18 | Orgarnic | 44.32 | 126.33 | 27.94 | 16.16 | 10.02 | 4.06 |
| 23 | Synthetic + Organic | 67.96 | 200.59 | 43.53 | 24.21 | 14.64 | 7.93 |
| 55 | Synthetic | 59.43 | 184.08 | 53.49 | 23.06 | 12.29 | 6.24 |
| 86 | Synthetic + Organic | 63.26 | 174.56 | 36.44 | 23.86 | 10.3 | 4.94 |
| 63 | Synthetic | 66.18 | 229.27 | 45.0 | 29.32 | 17.3 | 9.49 |
| 46 | Synthetic + Organic | 65.49 | 186.33 | 39.11 | 26.23 | 12.85 | 7.3 |
| 73 | Synthetic | 69.08 | 230.95 | 54.3 | 29.89 | 17.58 | 7.95 |
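The preview shows that the raw file spells one fertilizer level `Orgarnic`. The notebook keeps the raw label, so it appears in the outputs that follow. If a clean label is preferred, one option is the minimal sketch below using pandas `Series.replace`. The small frame is illustrative only, and in the notebook the same `replace` call would be applied to `df["Fertilizer"]`.

```python
import pandas as pd

# Illustrative frame reusing the labels seen in the preview above
raw = pd.DataFrame({"Fertilizer": ["Orgarnic", "Synthetic", "Synthetic + Organic", "Orgarnic"]})

# Map the misspelled level to its intended spelling; other labels are left untouched
clean = raw.assign(Fertilizer=raw["Fertilizer"].replace({"Orgarnic": "Organic"}))

print(clean["Fertilizer"].value_counts())
```

Fixing the label before computing effect sizes keeps the group names in the results table readable. It does not affect any of the numbers.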


<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>3.0. Dataset Column Profiling</h1>

```python
from IPython.display import display


def column_summary(df):
    """Profile each column: dtype, null counts, distinct count and top-10 value counts."""
    summary_data = []

    for col_name in df.columns:
        col = df[col_name]
        summary_data.append({
            'col_name': col_name,
            'col_dtype': col.dtype,
            'num_of_nulls': col.isnull().sum(),
            'num_of_non_nulls': col.notnull().sum(),
            'num_of_distinct_values': col.nunique(),
            # value_counts() is already sorted by frequency (descending), so head(10)
            # returns every value when there are <= 10 and the top 10 otherwise
            'distinct_values_counts': col.value_counts().head(10).to_dict(),
        })

    return pd.DataFrame(summary_data)


summary_df = column_summary(df)
display(summary_df)
```

| | col_name | col_dtype | num_of_nulls | num_of_non_nulls | num_of_distinct_values | distinct_values_counts |
|---|---|---|---|---|---|---|
| 0 | Fertilizer | object | 0 | 120 | 3 | {'Orgarnic': 44, 'Synthetic': 40, 'Synthetic +... |
| 1 | Plant Height (cm) | float64 | 0 | 120 | 120 | {58.56151388665052: 1, 46.696826238466286: 1, ... |
| 2 | Leaf Area (cm²) | float64 | 0 | 120 | 120 | {185.73856643236127: 1, 138.7980608962804: 1, ... |
| 3 | Chlorophyll Content (SPAD units) | float64 | 0 | 120 | 120 | {46.5196207922374: 1, 34.69363266870892: 1, 51... |
| 4 | Root Length (cm) | float64 | 0 | 120 | 120 | {24.31891050096943: 1, 17.6585349528435: 1, 33... |
| 5 | Biomass (g) | float64 | 0 | 120 | 120 | {11.994074041165357: 1, 8.667791843721698: 1, ... |
| 6 | Seed Yield (g) | float64 | 0 | 120 | 120 | {6.687959618540082: 1, 6.165373569255893: 1, 8... |
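The profile shows no missing values. It also shows that the fertilizer groups are unbalanced: 44 `Orgarnic`, 40 `Synthetic` and, by subtraction from 120, 36 `Synthetic + Organic`. Cohen's d is built from each group's size, mean and standard deviation, so a per-group summary is a useful check before computing effect sizes. Below is a minimal sketch with `groupby(...).agg(...)`. It uses a hypothetical toy frame whose column names mirror the dataset but whose values are made up.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the experiment dataset, values are invented
toy = pd.DataFrame({
    "Fertilizer": ["Organic", "Organic", "Organic", "Synthetic", "Synthetic", "Synthetic"],
    "Plant Height (cm)": [44.0, 46.0, 48.0, 58.0, 60.0, 62.0],
})

# Size, mean and sample SD (ddof=1, the pandas default) per group:
# exactly the inputs that Cohen's d needs
group_stats = toy.groupby("Fertilizer")["Plant Height (cm)"].agg(["count", "mean", "std"])
print(group_stats)
```

Here both toy groups have an SD of 2 cm, so d for Organic vs Synthetic would be (46 - 60) / 2 = -7.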


<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>4.0. Cohen’s (d)</h1>

```python
import pandas as pd
import numpy as np
from itertools import combinations

def compute_cohens_d(df, numerical_columns, group_column):
    """Cohen's d for every pair of groups in `group_column`, for each numeric column."""

    def cohens_d(group1, group2):
        mean1, mean2 = np.mean(group1), np.mean(group2)
        # Sample standard deviations (ddof=1)
        std1, std2 = np.std(group1, ddof=1), np.std(group2, ddof=1)
        # Root mean of the two sample variances; this equals the
        # df-weighted pooled SD only when both groups have the same size
        pooled_std = np.sqrt((std1**2 + std2**2) / 2)
        return (mean1 - mean2) / pooled_std if pooled_std != 0 else np.nan

    def interpret_d(d):
        # Cohen's benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large
        if pd.isna(d):
            return "Undefined"
        abs_d = abs(d)
        if abs_d < 0.2:
            return "Negligible effect size"
        elif abs_d < 0.5:
            return "Small effect size"
        elif abs_d < 0.8:
            return "Medium effect size"
        else:
            return "Large effect size"

    results = []

    unique_groups = df[group_column].dropna().unique()
    for var in numerical_columns:
        # Compare each unordered pair of groups once
        for group_a, group_b in combinations(unique_groups, 2):
            group1 = df.loc[df[group_column] == group_a, var].dropna()
            group2 = df.loc[df[group_column] == group_b, var].dropna()

            # Sample SDs need at least two observations per group
            if len(group1) > 1 and len(group2) > 1:
                d = cohens_d(group1, group2)
                results.append({
                    "Variable": var,
                    "Group Comparison": f"{group_a} vs {group_b}",
                    "Cohen's d": d,
                    "Interpretation": interpret_d(d)
                })

    return pd.DataFrame(results)

# Usage example: every numeric column, grouped by fertilizer type
numeric_variables = df.select_dtypes(include=np.number).columns.tolist()
results = compute_cohens_d(df, numerical_columns=numeric_variables, group_column='Fertilizer')
results
```

| | Variable | Group Comparison | Cohen's d | Interpretation |
|---|---|---|---|---|
| 0 | Plant Height (cm) | Synthetic vs Orgarnic | 3.27 | Large effect size |
| 1 | Plant Height (cm) | Synthetic vs Synthetic + Organic | 1.08 | Large effect size |
| 2 | Plant Height (cm) | Orgarnic vs Synthetic + Organic | -2.77 | Large effect size |
| 3 | Leaf Area (cm²) | Synthetic vs Orgarnic | 3.15 | Large effect size |
| 4 | Leaf Area (cm²) | Synthetic vs Synthetic + Organic | 1.36 | Large effect size |
| 5 | Leaf Area (cm²) | Orgarnic vs Synthetic + Organic | -2.86 | Large effect size |
| 6 | Chlorophyll Content (SPAD units) | Synthetic vs Orgarnic | 3.17 | Large effect size |
| 7 | Chlorophyll Content (SPAD units) | Synthetic vs Synthetic + Organic | 1.6 | Large effect size |
| 8 | Chlorophyll Content (SPAD units) | Orgarnic vs Synthetic + Organic | -1.84 | Large effect size |
| 9 | Root Length (cm) | Synthetic vs Orgarnic | 2.81 | Large effect size |

A positive d means the first group in the comparison has the higher mean. For example, Synthetic plants are taller on average than Orgarnic plants (d = 3.27). The negative values mean the Orgarnic group scores lower than the Synthetic + Organic group.
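The `cohens_d` helper above divides by the root mean of the two variances. That matches the degrees-of-freedom-weighted pooled standard deviation only when the groups are the same size, and here they are not (44, 40 and 36 observations). Below is a minimal sketch of a drop-in alternative that applies the weighting. The name `cohens_d_weighted` is hypothetical. Its values can differ slightly from the table above, which was produced with the unweighted version.

```python
import numpy as np


def cohens_d_weighted(group1, group2):
    """Cohen's d using the degrees-of-freedom-weighted pooled standard deviation."""
    x1 = np.asarray(group1, dtype=float)
    x2 = np.asarray(group2, dtype=float)
    n1, n2 = len(x1), len(x2)
    if n1 < 2 or n2 < 2:
        return np.nan  # sample variances need at least two observations each
    var1, var2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    pooled_std = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (x1.mean() - x2.mean()) / pooled_std if pooled_std != 0 else np.nan


# Equal-size groups: the weighted and unweighted formulas agree
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
print(cohens_d_weighted(a, b))  # -1.0
```

To use this variant, `compute_cohens_d` could call `cohens_d_weighted` in place of its inner `cohens_d`. The rest of the function would stay unchanged.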


---

This analysis was performed by **Jabulente**, a passionate and dedicated data analyst with a strong commitment to using data to drive meaningful insights and solutions. For inquiries, collaborations, or further discussions, please feel free to reach out via the links below.

    
<div align="center">  
    
[![GitHub](https://img.shields.io/badge/GitHub-Jabulente-black?logo=github)](https://github.com/Jabulente)  [![LinkedIn](https://img.shields.io/badge/LinkedIn-Jabulente-blue?logo=linkedin)](https://linkedin.com/in/jabulente-208019349)  [![Email](https://img.shields.io/badge/Email-jabulente@hotmail.com-red?logo=gmail)](mailto:Jabulente@hotmail.com)  

</div>

<h1 style='font-size: 55px; color: Tomato; font-family: Colonna MT; font-weight: 700; text-align: center'>THE END</h1>