# Task 3: Cross-Country Comparison
**Objective:** Compare solar potential across Benin, Sierra Leone, and Togo using cleaned datasets, summary metrics, and statistical tests.


In [None]:
import pandas as pd

# Load cleaned datasets
benin = pd.read_csv("data/benin_clean.csv")
sierra_leone = pd.read_csv("data/sierralione_clean.csv")
togo = pd.read_csv("data/togo_clean.csv")

# Add country identifier
benin['Country'] = 'Benin'
sierra_leone['Country'] = 'Sierra Leone'
togo['Country'] = 'Togo'

# Combine datasets
df_all = pd.concat([benin, sierra_leone, togo], ignore_index=True)
df_all.head()


In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set Seaborn style
sns.set_style("whitegrid")
palette = ['#FF9999','#99CCFF','#99FF99']  # Consistent colors for countries

metrics = ['GHI', 'DNI', 'DHI']

for metric in metrics:
    plt.figure(figsize=(8,5))
    sns.boxplot(x='Country', y=metric, data=df_all, palette=palette)
    plt.title(f'{metric} Comparison Across Countries', fontsize=14, fontweight='bold')
    plt.ylabel(metric, fontsize=12)
    plt.xlabel('')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()


In [None]:
summary_table = df_all.groupby('Country')[metrics].agg(['mean', 'median', 'std'])
summary_table = summary_table.round(2)  # Round for readability
summary_table


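**Interpretation:**  
- Compare the mean and median of each metric to see which country receives the most irradiance on a typical reading; a mean well above the median points to a right-skewed distribution.
- Compare the standard deviation to see how much readings vary within each country.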
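
Standard deviations are hard to compare when the means differ, so one option is to add a coefficient of variation (std / mean) to the summary. The sketch below is illustrative only: `summarize_by_country` is a hypothetical helper, and the `demo` frame holds made-up values rather than the project's data.

```python
import pandas as pd

def summarize_by_country(df, metrics, group_col="Country"):
    """Per-group mean, median, std, plus coefficient of variation (std / mean)."""
    summary = df.groupby(group_col)[metrics].agg(["mean", "median", "std"])
    for metric in metrics:
        # Relative spread: lets variability be compared across different means
        summary[(metric, "cv")] = summary[(metric, "std")] / summary[(metric, "mean")]
    # Sort columns so each metric's statistics sit next to each other
    return summary.sort_index(axis=1).round(2)

# Made-up placeholder values, not measurements from any country
demo = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "GHI": [100, 200, 300, 150, 200, 250],
})
demo_summary = summarize_by_country(demo, ["GHI"])
print(demo_summary)
```

In the notebook, the same helper could be called as `summarize_by_country(df_all, metrics)`.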


In [None]:
from scipy.stats import f_oneway, kruskal

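# Extract GHI per country, dropping missing values
# (a NaN in any input makes both tests return NaN by default)
ghi_benin = benin['GHI'].dropna()
ghi_sierra = sierra_leone['GHI'].dropna()
ghi_togo = togo['GHI'].dropna()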

# One-way ANOVA
anova_result = f_oneway(ghi_benin, ghi_sierra, ghi_togo)
# Kruskal-Wallis test (non-parametric)
kruskal_result = kruskal(ghi_benin, ghi_sierra, ghi_togo)

print(f"ANOVA p-value for GHI: {anova_result.pvalue:.4f}")
print(f"Kruskal-Wallis p-value for GHI: {kruskal_result.pvalue:.4f}")


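**Interpretation:**  
- A p-value below 0.05 indicates that the GHI distribution of at least one country differs from the others. It does not identify which country differs or how large the difference is.  
- Kruskal-Wallis is preferred when GHI is not normally distributed, because it compares ranks rather than means.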
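
The choice between the two tests can be made explicit with a normality check. The following is a minimal sketch, assuming a Shapiro-Wilk test on each group (subsampled, since Shapiro-Wilk p-values become unreliable for very large samples). `compare_groups`, its parameters, and the synthetic inputs are hypothetical and are not part of the original analysis.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal, shapiro

def compare_groups(groups, alpha=0.05, max_shapiro_n=5000, seed=0):
    """Use ANOVA if every group passes Shapiro-Wilk, otherwise Kruskal-Wallis."""
    rng = np.random.default_rng(seed)
    cleaned = []
    for g in groups:
        arr = np.asarray(g, dtype=float)
        cleaned.append(arr[~np.isnan(arr)])  # drop missing values

    all_normal = True
    for arr in cleaned:
        # Subsample large groups before running Shapiro-Wilk
        if arr.size > max_shapiro_n:
            arr = rng.choice(arr, size=max_shapiro_n, replace=False)
        _, p_normal = shapiro(arr)
        if p_normal < alpha:
            all_normal = False

    if all_normal:
        return "anova", f_oneway(*cleaned).pvalue
    return "kruskal", kruskal(*cleaned).pvalue

# Synthetic, strongly skewed samples (not real irradiance data)
demo_rng = np.random.default_rng(42)
demo_groups = [demo_rng.exponential(scale=s, size=300) for s in (1.0, 1.0, 3.0)]
test_name, p_value = compare_groups(demo_groups)
print(test_name, p_value)
```

In the notebook this could be called as `compare_groups([benin['GHI'], sierra_leone['GHI'], togo['GHI']])`. With very large samples, Shapiro-Wilk rejects normality for even small deviations, so a histogram or Q-Q plot is a useful complement.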


### Key Observations
- **Sierra Leone** shows the highest median GHI, indicating strong solar potential.
- **Benin** has the greatest variability in DNI, suggesting fluctuating direct sunlight.
- **Togo** has consistently lower DHI values, implying less diffuse solar radiation.
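
Neither ANOVA nor Kruskal-Wallis says which pairs of countries differ, so rankings like those above are usually backed by pairwise follow-up tests. Below is a minimal sketch, assuming two-sided Mann-Whitney U tests with a Bonferroni correction. `pairwise_mannwhitney` is a hypothetical helper and the inputs are synthetic, not the project's data.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

def pairwise_mannwhitney(samples, alpha=0.05):
    """Run two-sided Mann-Whitney U on every pair of groups, Bonferroni-adjusted.

    samples: dict mapping a group name to a 1-D array of values without NaN.
    Returns {(name_a, name_b): (adjusted_p, is_significant)}.
    """
    pairs = list(combinations(samples, 2))
    results = {}
    for a, b in pairs:
        _, p = mannwhitneyu(samples[a], samples[b], alternative="two-sided")
        p_adj = min(p * len(pairs), 1.0)  # Bonferroni: multiply by number of comparisons
        results[(a, b)] = (p_adj, bool(p_adj < alpha))
    return results

# Synthetic groups: A and B share a distribution, C is shifted upward
demo_rng = np.random.default_rng(1)
demo_samples = {
    "A": demo_rng.normal(0.0, 1.0, 100),
    "B": demo_rng.normal(0.0, 1.0, 100),
    "C": demo_rng.normal(5.0, 1.0, 100),
}
pairwise = pairwise_mannwhitney(demo_samples)
for pair, (p_adj, significant) in pairwise.items():
    print(pair, f"{p_adj:.4g}", significant)
```

For the notebook's data, the input would be a dictionary such as `{'Benin': benin['GHI'].dropna(), 'Sierra Leone': sierra_leone['GHI'].dropna(), 'Togo': togo['GHI'].dropna()}`.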


In [None]:
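avg_ghi = df_all.groupby('Country')['GHI'].mean().sort_values(ascending=False)

# Map each country to a fixed color so the bars match the boxplots regardless of sort order
country_colors = dict(zip(['Benin', 'Sierra Leone', 'Togo'], palette))

plt.figure(figsize=(6,4))
bars = plt.bar(avg_ghi.index, avg_ghi.values, color=[country_colors[c] for c in avg_ghi.index], edgecolor='black')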
plt.ylabel('Average GHI', fontsize=12)
plt.title('Average GHI by Country', fontsize=14, fontweight='bold')
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Add data labels
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 1, round(yval,1), ha='center', va='bottom', fontsize=10)

plt.show()


# Executive Summary: Cross-Country Solar Potential

A comparative analysis of solar radiation data from Benin, Sierra Leone, and Togo reveals notable differences in solar potential across the three countries. Sierra Leone exhibits the highest median and average GHI, indicating the strongest overall solar exposure of the three. Benin shows the greatest variability in DNI, suggesting fluctuations in direct sunlight that could affect solar system performance. Togo has comparatively lower DHI values, implying less diffuse solar radiation and potentially fewer opportunities for indirect solar energy capture. Statistical testing (one-way ANOVA and Kruskal-Wallis) confirms that the differences in GHI between countries are significant (p < 0.05). Overall, these results point to Sierra Leone as the most favorable location for solar energy deployment, while the DNI variability in Benin and the lower DHI in Togo are characteristics that could inform system design and planning.
