<h1 style='font-size: 25px; color: crimson; font-family: Colonna MT; font-weight: 600; text-align: center'>Descriptive Statistics Analysis – Summarizing Data for Deeper Insights</h1>

---

<h4 style='font-size: 18px; font-weight: 600'>1.0: Import required libraries</h4>

In [1]:
import pandas as pd  
import numpy as np 
pd.set_option('display.float_format', lambda x: '%.2f' % x)
print("....Libraries Loaded Successfully....")

....Libraries Loaded Successfully....


<h4 style='font-size: 18px; font-weight: 600'>2.0: Import and Preprocess the Dataset</h4>

In [2]:
filepath = '../Datasets/Fertilizer and Light Exposure Experiment Dataset.csv'
df = pd.read_csv(filepath)
df.sample(10)

Unnamed: 0,Fertilizer,Light Exposure,Plant Height (cm),Leaf Area (cm²),Chlorophyll Content (SPAD units),Root Length (cm),Biomass (g),Flower Count (number),Seed Yield (g),Stomatal Conductance (mmol/m²/s)
2,Control,Partial Shade,58.33,203.84,40.82,27.0,9.5,16.39,5.41,230.07
23,Synthetic,Partial Shade,67.96,200.59,43.53,24.21,14.64,17.01,7.93,264.71
84,Organic,Partial Shade,63.95,181.28,42.08,24.24,11.66,17.81,5.71,226.86
88,Synthetic,Full Sun,75.54,256.82,51.7,33.79,16.53,25.81,8.14,294.83
106,Organic,Partial Shade,60.78,197.07,35.93,25.36,11.44,16.33,5.29,243.86
99,Organic,Full Sun,95.14,263.37,58.73,35.17,18.2,29.48,7.66,304.77
33,Synthetic,Full Shade,46.04,124.61,33.98,18.58,7.52,13.87,4.83,182.69
8,Control,Full Shade,40.86,114.49,38.28,20.74,11.2,14.89,5.14,171.97
13,Control,Partial Shade,50.64,186.67,37.16,19.91,11.81,19.08,6.14,248.57
79,Synthetic,Full Sun,70.68,205.59,58.26,29.93,16.71,22.68,6.61,339.57


<h4 style='font-size: 18px; font-weight: 600'>3.0: Group-wise Comparative Analysis of Continuous Variables</h4>

Next, we compare the mean of each continuous variable across the levels of a grouping factor, here Fertilizer and Light Exposure. For instance, we can ask how the average plant height changes from one fertilizer type to another. For each group, the summary reports the mean ± standard error of the mean (SEM); it also reports the grand mean, the overall SEM, and the percent coefficient of variation (%CV) computed across all observations. These comparisons highlight patterns and apparent differences between groups that are worth testing formally. On their own, they describe the data and do not establish statistical significance.
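As a minimal sketch of the idea, assuming a small made-up data frame (the values and the column names `treatment` and `height` are hypothetical and are not taken from the experiment dataset), group means can be set against the overall mean like this:

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not from the experiment dataset).
toy = pd.DataFrame({
    "treatment": ["A", "A", "B", "B"],
    "height": [10.0, 12.0, 20.0, 24.0],
})

# Mean of the numeric variable within each group.
group_means = toy.groupby("treatment")["height"].mean()

# Overall mean across all rows, ignoring the grouping.
overall_mean = toy["height"].mean()

print(group_means)
print(overall_mean)
```

Each group mean can then be read against the overall mean to see which groups sit above or below it.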

In [3]:
def summary_stats(df, group):
    """Return per-group 'mean ± SEM' strings plus overall Grand Mean, SEM, and %CV rows."""
    # Numeric columns are the metrics to summarize; the grouping column is categorical.
    metrics = df.select_dtypes(include=np.number).columns.tolist()

    # Overall statistics computed across all observations, ignoring the groups.
    grand_mean = df[metrics].mean()
    sem = df[metrics].sem()
    cv = df[metrics].std() / grand_mean * 100  # coefficient of variation, in percent

    # Per-group mean and standard error of the mean for every metric.
    grouped = df.groupby(group)[metrics].agg(['mean', 'sem']).reset_index()

    # Format each group's statistics as a 'mean ± SEM' string.
    summary_df = pd.DataFrame()
    for col in metrics:
        summary_df[col] = grouped.apply(
            lambda x: f"{x[(col, 'mean')]:.2f} ± {x[(col, 'sem')]:.2f}", axis=1
        )

    summary_df.insert(0, group, grouped[group])

    # Append the overall rows below the per-group rows.
    summary_df.loc[len(summary_df)] = ['Grand Mean'] + grand_mean.tolist()
    summary_df.loc[len(summary_df)] = ['SEM'] + sem.tolist()
    summary_df.loc[len(summary_df)] = ['%CV'] + cv.tolist()

    return summary_df


results = summary_stats(df, group='Fertilizer')
results.T

Unnamed: 0,0,1,2,3,4,5
Fertilizer,Control,Organic,Synthetic,Grand Mean,SEM,%CV
Plant Height (cm),54.55 ± 1.58,65.61 ± 2.69,61.84 ± 2.42,60.58,1.36,24.65
Leaf Area (cm²),167.76 ± 4.32,194.26 ± 8.37,184.36 ± 8.09,181.90,4.19,25.25
Chlorophyll Content (SPAD units),39.76 ± 1.12,44.85 ± 1.77,41.60 ± 1.67,42.03,0.90,23.47
Root Length (cm),21.97 ± 0.58,25.15 ± 1.03,24.53 ± 0.86,23.86,0.50,22.78
Biomass (g),10.91 ± 0.26,12.56 ± 0.47,12.48 ± 0.56,11.97,0.26,24.12
Flower Count (number),17.13 ± 0.48,19.36 ± 0.82,18.12 ± 0.75,18.18,0.40,24.35
Seed Yield (g),5.68 ± 0.17,6.41 ± 0.26,6.43 ± 0.25,6.16,0.14,24.13
Stomatal Conductance (mmol/m²/s),221.38 ± 6.67,256.74 ± 9.06,247.81 ± 9.74,241.68,5.09,23.06
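
The SEM and %CV columns in the table above come from the sample standard deviation. The sketch below, which uses made-up measurements (the `values` series and the names `sem_manual` and `cv_manual` are hypothetical), shows one way to reproduce both by hand and check the SEM against pandas' built-in `sem()`:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, for illustration only (not from the experiment dataset).
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

n = values.count()
sd = values.std()  # sample standard deviation (ddof=1), the pandas default

sem_manual = sd / np.sqrt(n)          # standard error of the mean
cv_manual = sd / values.mean() * 100  # coefficient of variation, in percent

# pandas' sem() uses the same ddof=1 convention, so the two values should agree.
print(np.isclose(sem_manual, values.sem()))
print(round(cv_manual, 2))
```

Because %CV divides the spread by the mean, it is unit-free, which is why variables measured in different units can be compared on that column.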


In [4]:
results = summary_stats(df, group='Light Exposure')
results.T

Unnamed: 0,0,1,2,3,4,5
Light Exposure,Full Shade,Full Sun,Partial Shade,Grand Mean,SEM,%CV
Plant Height (cm),45.39 ± 0.78,74.39 ± 1.81,63.78 ± 1.30,60.58,1.36,24.65
Leaf Area (cm²),136.79 ± 2.32,227.23 ± 5.95,186.68 ± 3.22,181.90,4.19,25.25
Chlorophyll Content (SPAD units),33.02 ± 0.62,52.17 ± 1.18,41.76 ± 0.88,42.03,0.90,23.47
Root Length (cm),18.52 ± 0.26,28.84 ± 0.77,24.84 ± 0.37,23.86,0.50,22.78
Biomass (g),9.29 ± 0.16,14.77 ± 0.38,12.12 ± 0.27,11.97,0.26,24.12
Flower Count (number),13.86 ± 0.21,22.76 ± 0.55,18.39 ± 0.30,18.18,0.40,24.35
Seed Yield (g),4.68 ± 0.07,7.66 ± 0.18,6.32 ± 0.11,6.16,0.14,24.13
Stomatal Conductance (mmol/m²/s),187.67 ± 3.29,294.33 ± 7.03,249.19 ± 4.95,241.68,5.09,23.06
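
One way to read the table above is to express each group mean as a percent deviation from the grand mean. The sketch below does this for Plant Height (cm), using the group means and grand mean copied from the table; the variable name `pct_dev` is an illustrative choice and is not part of the original analysis:

```python
import pandas as pd

# Plant Height (cm) group means and grand mean, copied from the table above.
group_means = pd.Series(
    {"Full Shade": 45.39, "Full Sun": 74.39, "Partial Shade": 63.78},
    name="Plant Height (cm)",
)
grand_mean = 60.58

# Percent deviation of each group mean from the grand mean.
pct_dev = (group_means - grand_mean) / grand_mean * 100
print(pct_dev.round(1))
```

A positive value means the group averages above the grand mean and a negative value means it averages below. This is a descriptive comparison only.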
