# 🌍 Cross-Country Solar Irradiance Comparison – Benin, Togo, Sierra Leone

This notebook performs a comparative analysis of solar potential across **Benin**, **Togo**, and **Sierra Leone**, using cleaned irradiance data.

**Business Goal:**  
MoonLight Energy Solutions seeks data-driven guidance to prioritize solar farm investments. This analysis focuses on identifying high-potential regions based on statistical trends in Global Horizontal Irradiance (GHI), Direct Normal Irradiance (DNI), and Diffuse Horizontal Irradiance (DHI).

**Key Deliverables:**
- Metric comparison via boxplots and summary statistics
- Statistical testing for significance of cross-country differences
- Clear strategic recommendations

In [3]:
import pandas as pd # import pandas for data manipulation
import numpy as np # import numpy for numerical operations
import seaborn as sns # import seaborn for data visualization
import matplotlib.pyplot as plt # import matplotlib for plotting
from scipy.stats import f_oneway, kruskal, shapiro # import statistical tests

# Improve plot style for clarity
sns.set_style("whitegrid") # Set the style of seaborn plots
plt.rcParams.update({'axes.titlesize': 14, 'axes.labelsize': 12}) # Update default font sizes for plots

In [4]:
# Load cleaned CSVs with error handling
try: # Attempt to load cleaned data
    benin = pd.read_csv("data/benin_clean.csv") # Load Benin data
    togo = pd.read_csv("data/togo_clean.csv") # Load Togo data
    sl = pd.read_csv("data/sierra_leone_clean.csv") # Load Sierra Leone data
except FileNotFoundError as e: # Handle file not found error
    raise RuntimeError("Cleaned country CSV files not found. Ensure you have run EDA and cleaning.") from e # Re-raise with a clearer, actionable message

# Assign country labels
benin["country"] = "Benin" # Assign country name to Benin data
togo["country"] = "Togo" # Assign country name to Togo data
sl["country"] = "Sierra Leone" # Assign country name to Sierra Leone data

# Merge all data
df_all = pd.concat([benin, togo, sl], ignore_index=True) # Concatenate all dataframes into one


RuntimeError: Cleaned country CSV files not found. Ensure you have run EDA and cleaning.

## ☀️ Irradiance Metrics Overview

- **GHI (Global Horizontal Irradiance)**: Total solar radiation on a flat surface; core metric for fixed-tilt solar viability.
- **DNI (Direct Normal Irradiance)**: Sun-facing radiation; key for tracking PV and CSP systems.
- **DHI (Diffuse Horizontal Irradiance)**: Scattered radiation; relevant under cloudy or hazy conditions.

These three metrics will form the basis of our comparison across countries.
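The three metrics are linked by the standard closure relation GHI ≈ DNI · cos(θz) + DHI, where θz is the solar zenith angle. The sketch below illustrates that relation only; the helper name `ghi_from_components` and the input values are hypothetical and are not taken from the country datasets.

```python
import math

def ghi_from_components(dni: float, dhi: float, zenith_deg: float) -> float:
    """Estimate GHI (W/m²) from DNI, DHI and the solar zenith angle in degrees."""
    # Only the vertical component of the direct beam reaches a horizontal surface.
    # Clamp at zero so a sun below the horizon contributes no direct irradiance.
    cos_zenith = max(math.cos(math.radians(zenith_deg)), 0.0)
    return dni * cos_zenith + dhi

# Illustrative values only (hypothetical, not measured data)
print(round(ghi_from_components(dni=800.0, dhi=100.0, zenith_deg=60.0), 1))  # → 500.0
```

A relation like this can also serve as a rough sanity check on cleaned sensor data, assuming a zenith angle is available for each timestamp.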

In [None]:
from scipy.stats import shapiro # Shapiro-Wilk normality test

# Shapiro-Wilk normality test for GHI
for frame, label in zip([benin, togo, sl], ["Benin", "Togo", "Sierra Leone"]): # Iterate over each country
    ghi = frame["GHI"].dropna() # Drop NaN values before sizing the sample
    sample = ghi.sample(min(5000, len(ghi)), random_state=42) # Cap at 5000 rows; size from the non-null count so sample() cannot fail
    stat, p = shapiro(sample) # Run the test on the capped sample
    print(f"{label}: p-value = {p:.4f} → {'Normal' if p > 0.05 else 'Non-normal'}") # Check normality of GHI data


In [None]:
# If Shapiro-Wilk rejects normality above, use the non-parametric Kruskal–Wallis test instead of one-way ANOVA
ghi_benin = benin["GHI"].dropna() # Drop NaN values from GHI column
ghi_togo = togo["GHI"].dropna() # Drop NaN values from GHI column
ghi_sl = sl["GHI"].dropna() # Drop NaN values from GHI column

h_stat, p_value = kruskal(ghi_benin, ghi_togo, ghi_sl) # Perform Kruskal-Wallis test
print(f"Kruskal–Wallis test on GHI: H = {h_stat:.3f}, p = {p_value:.5f}") # Print the results of the Kruskal-Wallis test

if p_value < 0.05: # Check if p-value is less than 0.05
    print("✅ Statistically significant differences in GHI across countries.")
else: # If p-value is not less than 0.05
    print("⚠️ No statistically significant differences detected.")


In [None]:
metrics = ["GHI", "DNI", "DHI"] # List of metrics to plot
for metric in metrics: # Iterate over each metric
    plt.figure(figsize=(9, 6)) # Create a new figure for the plot
    sns.boxplot(data=df_all, x="country", y=metric, palette="Set2") # Create a boxplot
    plt.title(f"{metric} Distribution by Country") # Set the title of the plot
    plt.ylabel(f"{metric} (W/m²)") # Set the y-axis label
    plt.xlabel("") # Hide the x-axis label; the country names already label the ticks
    plt.grid(True) # Add grid to the plot
    plt.tight_layout() # Adjust layout for better spacing
    plt.show() # Show the plot


In [None]:
# Group by country and aggregate descriptive statistics
summary = df_all.groupby("country")[["GHI", "DNI", "DHI"]].agg(['mean', 'median', 'std', 'count']).round(2) # Group by country and calculate mean, median, std, and count
summary # Display the summary statistics


## 📊 Statistical Test Interpretation

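The **Kruskal–Wallis H-test** was chosen because the Shapiro–Wilk check points to non-normal GHI distributions, which undermines the normality assumption of one-way ANOVA.  
A **p-value < 0.05** indicates that at least one country's GHI distribution differs from the others; it does not show which countries differ or by how much.

Read together with the summary statistics, this result supports using GHI rankings to prioritize solar investment regions.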
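To see which pairs of countries differ, one common follow-up is a set of pairwise Mann–Whitney U tests with a Bonferroni correction. This is a minimal sketch of that idea, not part of the original analysis: the helper `pairwise_mannwhitney` is hypothetical, and the samples are synthetic placeholders standing in for `ghi_benin`, `ghi_togo`, and `ghi_sl`.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

def pairwise_mannwhitney(groups: dict, alpha: float = 0.05) -> list:
    """Two-sided Mann-Whitney U test on every pair of groups, Bonferroni-adjusted."""
    pairs = list(combinations(groups, 2))
    results = []
    for a, b in pairs:
        _, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        p_adj = min(float(p) * len(pairs), 1.0)  # Bonferroni: scale by the number of comparisons
        results.append((a, b, p_adj, p_adj < alpha))
    return results

# Synthetic placeholder samples (assumption); swap in the real GHI series in the notebook
rng = np.random.default_rng(0)
demo = {
    "A": rng.normal(240, 80, 500),
    "B": rng.normal(230, 80, 500),
    "C": rng.normal(200, 80, 500),
}
for a, b, p_adj, significant in pairwise_mannwhitney(demo):
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}, significant = {significant}")
```

With hundreds of thousands of rows per country, even tiny differences tend to come out significant, so the adjusted p-values are best read alongside the size of the gap in medians.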

In [None]:
# Bar chart with annotated values
mean_ghi = df_all.groupby("country")["GHI"].mean().sort_values(ascending=False) # Calculate mean GHI by country

fig, ax = plt.subplots(figsize=(7, 4)) # Create a new figure for the bar chart
bars = ax.bar(mean_ghi.index, mean_ghi.values, color=sns.color_palette("Set2", len(mean_ghi))) # Create bars, one palette color per country
ax.set_title("Average GHI by Country") # Set the title of the bar chart
ax.set_ylabel("GHI (W/m²)") # Set the y-axis label

# Annotate bars with GHI values
for bar in bars: # Iterate over each bar
    height = bar.get_height() # Get the height of the bar
    ax.annotate(f"{height:.1f}", xy=(bar.get_x() + bar.get_width() / 2, height), 
                xytext=(0, 5), textcoords="offset points", ha="center", fontsize=10) # Annotate the bar with the GHI value

plt.grid(axis='y') # Add grid to the y-axis
plt.tight_layout() # Adjust layout for better spacing
plt.show() # Show the bar chart


## 📌 Strategic Insights for MoonLight Energy Solutions

- **🇧🇯 Benin** exhibits the **highest mean and median GHI**, indicating the strongest solar resource of the three countries. Recommended as the **top-priority region** for fixed-tilt solar farm deployment.
- **🇸🇱 Sierra Leone** shows **high variability in GHI**, which may challenge solar generation consistency. Consider hybrid systems or storage integration to mitigate volatility.
- **🇹🇬 Togo** presents **moderate but reliable GHI**, with low variance. This stability makes it a promising candidate for smaller-scale distributed PV projects or pilot programs.

**Conclusion**: Begin with feasibility studies in Benin. Use insights from Togo for quick deployment pilots, while investing in hybrid infrastructure for Sierra Leone.
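The variability claims above can be made comparable across countries with the coefficient of variation (std / mean), which scales the spread by the level of the resource. The sketch below assumes a frame shaped like `df_all` (a `country` column and a `GHI` column); the helper `variability_table` and the demo numbers are hypothetical.

```python
import pandas as pd

def variability_table(df: pd.DataFrame, metric: str = "GHI") -> pd.DataFrame:
    """Per-country mean, median, std and coefficient of variation, sorted by mean."""
    stats = df.groupby("country")[metric].agg(["mean", "median", "std"])
    stats["cv"] = stats["std"] / stats["mean"]  # lower CV = more consistent relative to the mean
    return stats.sort_values("mean", ascending=False)

# Made-up numbers purely to show the output shape; pass df_all in the notebook
demo = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y", "Y"],
    "GHI": [100.0, 200.0, 300.0, 240.0, 250.0, 260.0],
})
print(variability_table(demo).round(3))
```

Night-time rows with zero GHI inflate the CV, so the comparison is only fair if those rows are handled the same way for every country.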

## 🧭 Reflections & Next Steps

This analysis provides a data-driven foundation for solar site prioritization. However, next stages should include:

- **Geospatial granularity**: Drill into sub-regional site data
- **Temporal stability**: Evaluate seasonal variation in GHI
- **Technical feasibility**: Consider other environmental variables (RH, wind speed, soiling)
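As a starting point for the temporal-stability item, monthly mean GHI per country gives a first view of seasonal swing. This sketch assumes the cleaned files carry a timestamp column, here called `Timestamp` (an assumption, since no such column is used elsewhere in this notebook); the helper `monthly_mean_ghi` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd

def monthly_mean_ghi(df: pd.DataFrame, time_col: str = "Timestamp") -> pd.DataFrame:
    """Mean GHI per calendar month (rows) and country (columns)."""
    month = pd.to_datetime(df[time_col]).dt.month.rename("month")
    return df.groupby([month, "country"])["GHI"].mean().unstack("country")

# Synthetic daily data: "X" has a seasonal cycle, "Y" is flat (placeholders only)
days = pd.date_range("2022-01-01", "2022-12-31", freq="D")
seasonal = 220 + 40 * np.cos(2 * np.pi * (days.dayofyear.to_numpy() - 1) / 365)
demo = pd.concat([
    pd.DataFrame({"Timestamp": days, "GHI": seasonal, "country": "X"}),
    pd.DataFrame({"Timestamp": days, "GHI": np.full(len(days), 210.0), "country": "Y"}),
], ignore_index=True)

table = monthly_mean_ghi(demo)
print((table.max() - table.min()).round(1))  # seasonal swing per country
```

The max-minus-min spread of the monthly means is a simple summary; a country with a small spread delivers a steadier resource across the year.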

📎 This notebook supports reproducibility and can be extended for real-time monitoring or dashboard integration (see Streamlit Task).