<h1 style='font-size: 25px; color: crimson; font-family: Colonna MT; font-weight: 600; text-align: center'>Partial Eta-squared (ηp²)</h1>
<hr>

- **Partial Eta-squared (ηp²)** is a commonly used effect size measure in **ANOVA (Analysis of Variance)**. It is the proportion of variance in a dependent variable attributable to a specific factor once the variance explained by the other terms in the model has been set aside: ηp² = SS_effect / (SS_effect + SS_error). Classical Eta-squared instead divides by the total sum of squares (η² = SS_effect / SS_total), so the variance explained by the other factors stays in the denominator. Partial Eta-squared therefore isolates the effect of each factor in the presence of the others, which makes it especially useful in **repeated measures designs or multi-factorial experiments**. In a one-way design, where there is only one factor, the two measures are identical.

- The value of ηp² ranges from 0 to 1, where higher values indicate a larger effect. A common rule of thumb (Cohen's guidelines) treats 0.01 as a small effect, 0.06 as medium, and 0.14 or above as large, although interpretation varies by field. For example, in an agricultural experiment testing the effect of fertilizer type on crop yield across different soil types, partial Eta-squared tells us how much of the yield variation is uniquely explained by fertilizer type once the influence of soil variation is accounted for. This helps researchers not only test for statistical significance but also understand the **practical importance** of each factor in explaining outcomes.

- Below is an implementation that computes Partial Eta-squared (ηp²) for each numeric dependent variable against each categorical factor.

- The implementation is written to be reusable: it accepts any DataFrame, any list of numeric variables, and any list of categorical factors without needing to rewrite or duplicate code. Note that it fits a separate one-way model for each variable-factor pair, so each factor is evaluated on its own and the reported ηp² equals classical η². Controlling for other factors would require a single model that includes them together.
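To make the difference between the two denominators concrete, here is a minimal sketch that computes both measures directly from sums of squares. The numbers and the names `ss_fertilizer`, `ss_soil`, and `ss_error` are hypothetical and chosen only for illustration; they do not come from the dataset used below.

```python
def eta_squared(ss_effect, ss_total):
    """Classical eta-squared: effect sum of squares over the total sum of squares."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: effect sum of squares over effect plus error."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical sums of squares for a two-factor design (illustrative values only)
ss_fertilizer = 40.0
ss_soil = 30.0
ss_error = 30.0

# Assumes a balanced, additive design, where the components add up to the total
ss_total = ss_fertilizer + ss_soil + ss_error

eta2 = eta_squared(ss_fertilizer, ss_total)            # 40 / 100
eta2_p = partial_eta_squared(ss_fertilizer, ss_error)  # 40 / 70

print(f"eta-squared:         {eta2:.3f}")    # 0.400
print(f"partial eta-squared: {eta2_p:.3f}")  # 0.571
```

The partial value is larger because the variance attributed to the second factor is removed from the denominator. With only one factor, `ss_total` equals `ss_effect + ss_error` and both functions return the same number.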

<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>1.0. Import Required Libraries</h1>

In [2]:
import warnings
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

warnings.simplefilter("ignore")
pd.set_option('display.max_columns', 10)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
print("....Libraries Loaded Successfully....")

....Libraries Loaded Successfully....


<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>2.0. Import and Preprocessing Dataset</h1>

In [3]:
filepath = 'Datasets/Fertilizer and Light Exposure Experiment Dataset.csv'
df = pd.read_csv(filepath)
df.sample(10)

| | Fertilizer | Plant Height (cm) | Leaf Area (cm²) | Chlorophyll Content (SPAD units) | Root Length (cm) | Biomass (g) | Seed Yield (g) |
|---|---|---|---|---|---|---|---|
| 105 | Synthetic + Organic | 59.0 | 198.81 | 31.12 | 22.0 | 11.44 | 5.98 |
| 76 | Orgarnic | 47.88 | 119.0 | 23.99 | 18.79 | 8.19 | 4.38 |
| 57 | Synthetic + Organic | 63.42 | 192.58 | 46.46 | 27.69 | 11.44 | 6.82 |
| 62 | Synthetic + Organic | 60.41 | 208.83 | 51.09 | 27.38 | 9.7 | 7.03 |
| 30 | Synthetic + Organic | 78.31 | 201.94 | 38.29 | 27.72 | 14.29 | 6.26 |
| 36 | Orgarnic | 50.21 | 156.41 | 31.52 | 16.81 | 8.77 | 4.28 |
| 37 | Synthetic | 59.67 | 170.86 | 45.71 | 27.48 | 11.05 | 7.18 |
| 78 | Synthetic + Organic | 49.79 | 155.82 | 35.15 | 25.87 | 10.14 | 6.47 |
| 91 | Synthetic | 83.77 | 249.13 | 53.84 | 36.97 | 15.01 | 9.19 |
| 5 | Synthetic | 86.39 | 281.03 | 63.05 | 39.4 | 17.06 | 8.1 |


<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>3.0. Dataset Column Profiling</h1>

In [4]:
def column_summary(df):
    summary_data = []
    
    for col_name in df.columns:
        col_dtype = df[col_name].dtype
        num_of_nulls = df[col_name].isnull().sum()
        num_of_non_nulls = df[col_name].notnull().sum()
        num_of_distinct_values = df[col_name].nunique()
        
        if num_of_distinct_values <= 10:
            distinct_values_counts = df[col_name].value_counts().to_dict()
        else:
            top_10_values_counts = df[col_name].value_counts().head(10).to_dict()
            distinct_values_counts = {k: v for k, v in sorted(top_10_values_counts.items(), key=lambda item: item[1], reverse=True)}

        summary_data.append({
            'col_name': col_name,
            'col_dtype': col_dtype,
            'num_of_nulls': num_of_nulls,
            'num_of_non_nulls': num_of_non_nulls,
            'num_of_distinct_values': num_of_distinct_values,
            'distinct_values_counts': distinct_values_counts
        })
    
    summary_df = pd.DataFrame(summary_data)
    return summary_df


summary_df = column_summary(df)
display(summary_df)

| | col_name | col_dtype | num_of_nulls | num_of_non_nulls | num_of_distinct_values | distinct_values_counts |
|---|---|---|---|---|---|---|
| 0 | Fertilizer | object | 0 | 120 | 3 | {'Orgarnic': 44, 'Synthetic': 40, 'Synthetic +... |
| 1 | Plant Height (cm) | float64 | 0 | 120 | 120 | {58.56151388665052: 1, 46.696826238466286: 1, ... |
| 2 | Leaf Area (cm²) | float64 | 0 | 120 | 120 | {185.73856643236127: 1, 138.7980608962804: 1, ... |
| 3 | Chlorophyll Content (SPAD units) | float64 | 0 | 120 | 120 | {46.5196207922374: 1, 34.69363266870892: 1, 51... |
| 4 | Root Length (cm) | float64 | 0 | 120 | 120 | {24.31891050096943: 1, 17.6585349528435: 1, 33... |
| 5 | Biomass (g) | float64 | 0 | 120 | 120 | {11.994074041165357: 1, 8.667791843721698: 1, ... |
| 6 | Seed Yield (g) | float64 | 0 | 120 | 120 | {6.687959618540082: 1, 6.165373569255893: 1, 8... |


<h1 style='font-family: Colonna MT; font-weight: 600; font-size: 20px; text-align: left'>4.0. Partial Eta-squared (ηp²)</h1>

In [7]:
import re

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm


def interpret_eta_squared(eta_squared):
    """Interpretation based on Cohen's guidelines (0.01 small, 0.06 medium, 0.14 large)."""
    if eta_squared >= 0.14:
        return "Large effect size (≥ 14%)"
    elif eta_squared >= 0.06:
        return "Medium effect size (6% - 14%)"
    elif eta_squared >= 0.01:
        return "Small effect size (1% - 6%)"
    else:
        return "Negligible effect size (< 1%)"


def safe_rename(text):
    """Drop every character that is not a letter, digit, or underscore."""
    return re.sub(r'[^a-zA-Z0-9_]', "", text)


def compute_partial_eta_squared(df, variables, categories):
    """
    Computes Partial Eta-squared (ηp²) for each numeric variable against each categorical factor.

    A separate one-way model is fitted for every (variable, factor) pair, so the
    reported value equals classical eta-squared for that pair.

    Parameters:
    - df (pd.DataFrame): The input DataFrame.
    - variables (list): List of numeric variable names.
    - categories (list): List of categorical variable names.

    Returns:
    - pd.DataFrame: A DataFrame with variable, factor, partial eta-squared, and interpretation.
    """
    results = []

    for var in variables:
        for cat in categories:
            # Formula-safe aliases; the prefix keeps the name a valid identifier
            # even when the original column name starts with a digit
            safe_var = "var_" + safe_rename(var)
            safe_cat = "cat_" + safe_rename(cat)

            # Work on a two-column copy so the original DataFrame is untouched
            # and the aliases cannot collide with other column names
            temp_df = df[[var, cat]].rename(columns={var: safe_var, cat: safe_cat})

            # Fit the one-way ANOVA model
            formula = f'{safe_var} ~ C({safe_cat})'
            model = ols(formula, data=temp_df).fit()
            anova_table = anova_lm(model, typ=2)

            # Compute partial eta-squared: SS_effect / (SS_effect + SS_error)
            ss_factor = anova_table.loc[f'C({safe_cat})', 'sum_sq']
            ss_error = anova_table.loc['Residual', 'sum_sq']
            eta_squared = ss_factor / (ss_factor + ss_error)

            results.append({
                "Variable": var,
                "Factor": cat,
                "Partial Eta-squared (ηp²)": eta_squared,
                "Interpretation": interpret_eta_squared(eta_squared)
            })

    return pd.DataFrame(results)

# Example usage
numeric_vars = df.select_dtypes(include=["float64", "int64"]).columns.tolist()
eta_squared_df = compute_partial_eta_squared(df, numeric_vars, categories=['Fertilizer'])
display(eta_squared_df)

| | Variable | Factor | Partial Eta-squared (ηp²) | Interpretation |
|---|---|---|---|---|
| 0 | Plant Height (cm) | Fertilizer | 0.68 | Large effect size (≥ 14%) |
| 1 | Leaf Area (cm²) | Fertilizer | 0.69 | Large effect size (≥ 14%) |
| 2 | Chlorophyll Content (SPAD units) | Fertilizer | 0.66 | Large effect size (≥ 14%) |
| 3 | Root Length (cm) | Fertilizer | 0.65 | Large effect size (≥ 14%) |
| 4 | Biomass (g) | Fertilizer | 0.64 | Large effect size (≥ 14%) |
| 5 | Seed Yield (g) | Fertilizer | 0.71 | Large effect size (≥ 14%) |
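
Because the dataset above contains a single factor, the results cannot show ηp² departing from classical η². The following sketch illustrates the multi-factor case under stated assumptions: the data are synthetic and balanced, and the column names `fertilizer`, `soil`, and `height` are hypothetical and not part of the original dataset. It fits one additive model containing both factors and derives both effect sizes from the same ANOVA table.

```python
import itertools

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic, balanced 3 x 2 design; every value here is made up for illustration
fert_effect = {"A": 0.0, "B": 4.0, "C": 8.0}
soil_effect = {"clay": 0.0, "sand": 3.0}
noise = [-1.0, 0.5, -0.5, 1.0]  # fixed offsets so the example is reproducible

rows = []
for (fert, f_eff), (soil, s_eff) in itertools.product(
    fert_effect.items(), soil_effect.items()
):
    for e in noise:
        rows.append(
            {"fertilizer": fert, "soil": soil, "height": 50.0 + f_eff + s_eff + e}
        )
demo = pd.DataFrame(rows)

# One model that contains both factors, so each is adjusted for the other
model = ols("height ~ C(fertilizer) + C(soil)", data=demo).fit()
table = anova_lm(model, typ=2)

ss_error = table.loc["Residual", "sum_sq"]
# Summing the rows gives the total sum of squares only because the design is balanced
ss_total = table["sum_sq"].sum()

effects = table.drop(index="Residual")
effect_sizes = pd.DataFrame({
    "eta_sq": effects["sum_sq"] / ss_total,
    "partial_eta_sq": effects["sum_sq"] / (effects["sum_sq"] + ss_error),
})
print(effect_sizes.round(3))
```

In this design the partial value for each factor is at least as large as its classical counterpart, since the other factor's sum of squares is left out of the denominator. For unbalanced data the rows of a Type II table need not add up to the total, so `ss_total` would have to be computed from the response directly.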


---

This analysis was performed by **Jabulente**, a passionate and dedicated data analyst with a strong commitment to using data to drive meaningful insights and solutions. For inquiries, collaborations, or further discussions, please feel free to reach out via the links below.

    
<div align="center">  
    
[![GitHub](https://img.shields.io/badge/GitHub-Jabulente-black?logo=github)](https://github.com/Jabulente)  [![LinkedIn](https://img.shields.io/badge/LinkedIn-Jabulente-blue?logo=linkedin)](https://linkedin.com/in/jabulente-208019349)  [![Email](https://img.shields.io/badge/Email-jabulente@hotmail.com-red?logo=gmail)](mailto:Jabulente@hotmail.com)  

</div>

<h1 style='font-size: 55px; color: Tomato; font-family: Colonna MT; font-weight: 700; text-align: center'>THE END</h1>