# Task 3: Cross-Country Comparison
**Objective:** Compare solar potential across Benin, Sierra Leone, and Togo using cleaned datasets, summary metrics, and statistical tests.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway, kruskal
from pathlib import Path

sns.set_style("whitegrid")
palette = ['#FF9999','#99CCFF','#99FF99']

# Data folder path
data_dir = Path(r"D:\Python\Week_01\Assignment\solar-challenge-week0\data")


In [2]:
# Cleaned dataset filenames
files = {
    "Benin": data_dir / "benin_clean.csv",
    "Sierra Leone": data_dir / "sierraleone_clean.csv",
    "Togo": data_dir / "togo_clean.csv"
}

# Load datasets and add Country column
dfs = {}
for country, path in files.items():
    df = pd.read_csv(path)
    df['Country'] = country
    dfs[country] = df
    print(f"{country} dataset shape: {df.shape}")
    display(df.head())


Benin dataset shape: (525600, 20)


Unnamed: 0,timestamp,ghi,dni,dhi,moda,modb,tamb,rh,ws,wsgust,wsstdev,wd,wdstdev,bp,cleaning,precipitation,tmoda,tmodb,comments,Country
0,2021-08-09 00:01,-1.2,-0.2,-1.1,0.0,0.0,26.2,93.4,0.0,0.4,0.1,122.1,0.0,998,0,0.0,26.3,26.2,,Benin
1,2021-08-09 00:02,-1.1,-0.2,-1.1,0.0,0.0,26.2,93.6,0.0,0.0,0.0,0.0,0.0,998,0,0.0,26.3,26.2,,Benin
2,2021-08-09 00:03,-1.1,-0.2,-1.1,0.0,0.0,26.2,93.7,0.3,1.1,0.5,124.6,1.5,997,0,0.0,26.4,26.2,,Benin
3,2021-08-09 00:04,-1.1,-0.1,-1.0,0.0,0.0,26.2,93.3,0.2,0.7,0.4,120.3,1.3,997,0,0.0,26.4,26.3,,Benin
4,2021-08-09 00:05,-1.0,-0.1,-1.0,0.0,0.0,26.2,93.3,0.1,0.7,0.3,113.2,1.0,997,0,0.0,26.4,26.3,,Benin


Sierra Leone dataset shape: (525600, 20)


Unnamed: 0,timestamp,ghi,dni,dhi,moda,modb,tamb,rh,ws,wsgust,wsstdev,wd,wdstdev,bp,cleaning,precipitation,tmoda,tmodb,comments,Country
0,2021-10-30 00:01,-0.7,-0.1,-0.8,0.0,0.0,21.9,99.1,0.0,0.0,0.0,0.0,0.0,1002,0,0.0,22.3,22.6,,Sierra Leone
1,2021-10-30 00:02,-0.7,-0.1,-0.8,0.0,0.0,21.9,99.2,0.0,0.0,0.0,0.0,0.0,1002,0,0.0,22.3,22.6,,Sierra Leone
2,2021-10-30 00:03,-0.7,-0.1,-0.8,0.0,0.0,21.9,99.2,0.0,0.0,0.0,0.0,0.0,1002,0,0.0,22.3,22.6,,Sierra Leone
3,2021-10-30 00:04,-0.7,0.0,-0.8,0.0,0.0,21.9,99.3,0.0,0.0,0.0,0.0,0.0,1002,0,0.1,22.3,22.6,,Sierra Leone
4,2021-10-30 00:05,-0.7,-0.1,-0.8,0.0,0.0,21.9,99.3,0.0,0.0,0.0,0.0,0.0,1002,0,0.0,22.3,22.6,,Sierra Leone


Togo dataset shape: (525600, 20)


Unnamed: 0,timestamp,ghi,dni,dhi,moda,modb,tamb,rh,ws,wsgust,wsstdev,wd,wdstdev,bp,cleaning,precipitation,tmoda,tmodb,comments,Country
0,2021-10-25 00:01,-1.3,0.0,0.0,0.0,0.0,24.8,94.5,0.9,1.1,0.4,227.6,1.1,977,0,0.0,24.7,24.4,,Togo
1,2021-10-25 00:02,-1.3,0.0,0.0,0.0,0.0,24.8,94.4,1.1,1.6,0.4,229.3,0.7,977,0,0.0,24.7,24.4,,Togo
2,2021-10-25 00:03,-1.3,0.0,0.0,0.0,0.0,24.8,94.4,1.2,1.4,0.3,228.5,2.9,977,0,0.0,24.7,24.4,,Togo
3,2021-10-25 00:04,-1.2,0.0,0.0,0.0,0.0,24.8,94.3,1.2,1.6,0.3,229.1,4.6,977,0,0.0,24.7,24.4,,Togo
4,2021-10-25 00:05,-1.2,0.0,0.0,0.0,0.0,24.8,94.0,1.3,1.6,0.4,227.5,1.6,977,0,0.0,24.7,24.4,,Togo


In [None]:
df_all = pd.concat(dfs.values(), ignore_index=True)
print("Combined dataset shape:", df_all.shape)
display(df_all.head())


**Interpretation:**  
- Compare mean, median, and standard deviation across countries for each metric to quickly identify trends and variability.


In [None]:
metrics = ["ghi", "dni", "dhi"]

summary_table = df_all.groupby("Country")[metrics].agg(["mean","median","std"]).round(2)
print("Summary statistics by country:")
display(summary_table)


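**Interpretation:**  
- Small p-values (< 0.05) from the ANOVA and Kruskal-Wallis tests on GHI indicate statistically significant differences between countries.  
- Kruskal-Wallis is preferred if the GHI data are not normally distributed, because it compares ranks rather than assuming normality.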
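One way to decide between the two tests is to check normality first. The sketch below is only an illustration: it runs SciPy's D'Agostino-Pearson `normaltest` on a synthetic, right-skewed sample that stands in for a GHI column. The names `ghi_sample` and `looks_normal`, and the 0.05 threshold, are assumptions made for this example and are not part of the notebook.

```python
import numpy as np
from scipy.stats import normaltest

# Synthetic stand-in for a GHI column (NOT the real data): many zero
# night-time readings plus a right-skewed daytime tail.
rng = np.random.default_rng(0)
night = np.zeros(2_000)
day = rng.gamma(shape=2.0, scale=150.0, size=2_000)
ghi_sample = np.concatenate([night, day])

# D'Agostino-Pearson test: the null hypothesis is that the sample is normal.
stat, p_value = normaltest(ghi_sample)
looks_normal = p_value >= 0.05  # assumed significance threshold

print(f"normaltest statistic = {stat:.1f}, p-value = {p_value:.3e}")
print("Prefer ANOVA" if looks_normal else "Prefer Kruskal-Wallis")
```

On the real data the same call could be applied to each country's `ghi` column after dropping missing values. With very large samples this test rejects normality for even mild deviations, so a histogram is a useful companion check.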


### Key Observations
- **Sierra Leone** shows the highest median GHI, indicating strong solar potential.
- **Benin** has the greatest variability in DNI, suggesting fluctuating direct sunlight.
- **Togo** has consistently lower DHI values, implying less diffuse solar radiation.
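Observations like these can be read off programmatically rather than by eye. The sketch below uses a small hypothetical table (countries `A`, `B`, `C` with made-up readings, not values from the cleaned datasets) to show how `idxmax` and `idxmin` on grouped statistics name the country behind each claim.

```python
import pandas as pd

# Hypothetical toy readings, NOT values from the cleaned datasets.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "ghi": [100, 200, 300, 150, 250, 350, 50, 100, 150],
    "dni": [10, 500, 900, 300, 320, 340, 100, 150, 200],
    "dhi": [80, 90, 100, 70, 75, 80, 20, 30, 40],
})

grouped = toy.groupby("Country")

# Country label with the highest median GHI
highest_median_ghi = grouped["ghi"].median().idxmax()
# Country label with the largest DNI standard deviation (most variable)
most_variable_dni = grouped["dni"].std().idxmax()
# Country label with the lowest mean DHI
lowest_mean_dhi = grouped["dhi"].mean().idxmin()

print("Highest median GHI:", highest_median_ghi)
print("Most variable DNI:", most_variable_dni)
print("Lowest mean DHI:", lowest_mean_dhi)
```

Replacing `toy` with the combined country DataFrame would tie each bullet to a number that can be re-checked whenever the data change.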


In [None]:
for metric in metrics:
    plt.figure(figsize=(8,5))
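    # Assign hue explicitly: recent seaborn versions deprecate palette without hue
    ax = sns.boxplot(x="Country", y=metric, hue="Country", data=df_all, palette=palette, dodge=False)
    if ax.get_legend() is not None:
        ax.get_legend().remove()  # the legend would only repeat the x-axis labels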
    plt.title(f"{metric.upper()} Comparison Across Countries", fontsize=14, fontweight='bold')
    plt.ylabel(metric.upper(), fontsize=12)
    plt.xlabel('')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()


In [None]:
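# Drop missing values first: NaNs would otherwise propagate into the test results
ghi_benin = dfs["Benin"]["ghi"].dropna()
ghi_sierra = dfs["Sierra Leone"]["ghi"].dropna()
ghi_togo = dfs["Togo"]["ghi"].dropna()

# One-way ANOVA compares means; Kruskal-Wallis compares rank distributions
anova_result = f_oneway(ghi_benin, ghi_sierra, ghi_togo)
kruskal_result = kruskal(ghi_benin, ghi_sierra, ghi_togo)

# Scientific notation keeps very small p-values from printing as 0.0000
print(f"ANOVA p-value for GHI: {anova_result.pvalue:.3e}")
print(f"Kruskal-Wallis p-value for GHI: {kruskal_result.pvalue:.3e}")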


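**Interpretation:** A p-value below 0.05 from either test indicates a statistically significant difference in GHI between at least two of the countries. It does not identify which countries differ.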
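A significant result from ANOVA or Kruskal-Wallis says only that some difference exists. A common follow-up is a set of pairwise tests with a multiple-comparison correction. The sketch below is one possible approach, not something the notebook performs: pairwise Mann-Whitney U tests with a Bonferroni-adjusted threshold, run on synthetic samples labelled `A`, `B`, and `C`. The sample values, the seed, and the names `samples`, `results`, and `alpha` are all assumptions made for this example.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic samples (NOT the real GHI data): A and B share a distribution,
# while C is shifted upward.
rng = np.random.default_rng(42)
samples = {
    "A": rng.normal(loc=200.0, scale=50.0, size=500),
    "B": rng.normal(loc=200.0, scale=50.0, size=500),
    "C": rng.normal(loc=260.0, scale=50.0, size=500),
}

pairs = list(combinations(samples, 2))
# Bonferroni correction: divide the significance level by the number of tests
alpha = 0.05 / len(pairs)

results = {}
for first, second in pairs:
    _, p_value = mannwhitneyu(samples[first], samples[second], alternative="two-sided")
    results[(first, second)] = p_value
    verdict = "differs" if p_value < alpha else "no evidence of a difference"
    print(f"{first} vs {second}: p = {p_value:.3e} ({verdict})")
```

Bonferroni is conservative but simple. With only three pairwise comparisons the loss of power is small.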


In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18,10))

# Heatmaps: mean, median, std
for i, stat in enumerate(["mean","median","std"]):
    # The statistic names are on column level 1 (level 0 holds the metric names)
    sns.heatmap(summary_table.xs(stat, axis=1, level=1), annot=True, fmt=".2f", cmap="YlGnBu", cbar=False, ax=axes[0,i])
    axes[0,i].set_title(f"{stat.capitalize()} per Metric", fontsize=12)
    axes[0,i].set_xlabel('')
    axes[0,i].set_ylabel('')

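# Bar charts: average metric ranking
# Fix one color per country so colors stay consistent after sorting
country_colors = dict(zip(files, palette))
for i, metric in enumerate(metrics):
    avg_metric = df_all.groupby("Country")[metric].mean().sort_values(ascending=False)
    bars = axes[1,i].bar(avg_metric.index, avg_metric.values,
                         color=[country_colors[c] for c in avg_metric.index], edgecolor='black')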
    axes[1,i].set_title(f"Average {metric.upper()} by Country", fontsize=12)
    axes[1,i].set_ylabel(f"Average {metric.upper()}")
    axes[1,i].grid(axis='y', linestyle='--', alpha=0.7)
    for bar in bars:
        yval = bar.get_height()
        axes[1,i].text(bar.get_x() + bar.get_width()/2, yval + 0.5, round(yval,1),
                       ha='center', va='bottom', fontsize=10)

plt.tight_layout()
plt.show()


### Key Observations
- **Sierra Leone**: Highest median and average GHI → strong solar potential.
- **Benin**: Largest DNI variability → may require adaptable solar systems.
- **Togo**: Lower DHI values → fewer diffuse solar opportunities.
- Statistical tests confirm significant GHI differences (p < 0.05).
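With 525,600 rows per country, even very small differences can produce p-values below 0.05, so an effect size helps judge whether a significant difference is also practically meaningful. The sketch below computes eta-squared, the share of total variance explained by group membership. The helper `eta_squared` and the toy table are hypothetical illustrations, not part of the notebook or of any library.

```python
import pandas as pd


def eta_squared(df, group_col, value_col):
    """Share of the variance in value_col explained by group_col (0 to 1)."""
    data = df[[group_col, value_col]].dropna()
    grand_mean = data[value_col].mean()
    # Total sum of squares around the grand mean
    ss_total = ((data[value_col] - grand_mean) ** 2).sum()
    # Between-group sum of squares, weighted by group size
    group_stats = data.groupby(group_col)[value_col].agg(["mean", "count"])
    ss_between = (group_stats["count"] * (group_stats["mean"] - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Hypothetical toy data, NOT values from the cleaned datasets.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "ghi": [1.0, 3.0, 5.0, 7.0],
})

effect = eta_squared(toy, "Country", "ghi")
print(f"eta-squared = {effect:.2f}")  # 0.80 for this toy table
```

A value near 0 means country explains almost none of the spread in GHI, however small the p-value. A value near 1 means country explains most of it.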


In [None]:
output_file = data_dir / "combined_countries_clean.csv"
df_all.to_csv(output_file, index=False)
print(f"✅ Combined cleaned dataset saved to: {output_file}")


# Cross-Country Solar Potential Report

## Executive Summary
A comparative analysis of solar radiation data from Benin, Sierra Leone, and Togo highlights notable differences in solar potential. Sierra Leone exhibits the highest median and average GHI, indicating strong solar exposure. Benin shows the greatest variability in DNI, suggesting fluctuations in direct sunlight that could affect system performance. Togo has comparatively lower DHI values, implying less diffuse solar radiation and potentially fewer opportunities for indirect solar energy capture. Statistical testing confirms that the differences in GHI across countries are statistically significant (p < 0.05). Overall, Sierra Leone presents the most favorable conditions for solar energy deployment, while the specific characteristics of Benin and Togo could inform system design and planning.

## Key Observations
- **Sierra Leone:** Highest median and average GHI, ideal for solar energy projects.  
- **Benin:** Greatest DNI variability, may require adaptable solar systems.  
- **Togo:** Lower DHI, suggesting fewer opportunities for diffuse solar capture.
