# Cross-Country Comparison of Solar Data

## Overview
This notebook compares the cleaned solar datasets from Benin, Sierra Leone, and Togo.  
We aim to:
- Compare key solar metrics across countries: global horizontal irradiance (GHI), direct normal irradiance (DNI), and diffuse horizontal irradiance (DHI)
- Visualize metric distributions using boxplots
- Generate a summary statistics table
- Conduct statistical testing for differences
- Provide key observations and actionable recommendations
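
The statistical-testing goal listed above could be approached with the two tests imported in the setup cell, `f_oneway` (one-way ANOVA) and `kruskal` (Kruskal-Wallis). The following is a minimal sketch, not the notebook's actual analysis: the `demo` frame holds synthetic values generated for illustration only, and the means and spread used to generate it are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway, kruskal

# Synthetic stand-in data (NOT real measurements): 200 GHI readings per country.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "country": np.repeat(["Benin", "Sierra Leone", "Togo"], 200),
    "GHI": np.concatenate([
        rng.normal(240, 30, 200),
        rng.normal(200, 30, 200),
        rng.normal(230, 30, 200),
    ]),
})

# One array of GHI values per country, with missing values dropped.
groups = [g["GHI"].dropna().to_numpy() for _, g in demo.groupby("country")]

# ANOVA assumes roughly normal groups with similar variance;
# Kruskal-Wallis is the rank-based alternative without that assumption.
anova_stat, anova_p = f_oneway(*groups)
kw_stat, kw_p = kruskal(*groups)

print(f"ANOVA:          F = {anova_stat:.2f}, p = {anova_p:.3g}")
print(f"Kruskal-Wallis: H = {kw_stat:.2f}, p = {kw_p:.3g}")
```

On the real data, the same pattern would apply to the combined frame once all three CSVs load; irradiance readings are often far from normal, so the Kruskal-Wallis result may be the more trustworthy of the two.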


## 0. Setup — Import Libraries
- Import necessary libraries
- Define helper functions


In [1]:
# Cell 1 — Setup
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway, kruskal

sns.set_theme(style="whitegrid")  # set_theme is the current name for sns.set
plt.rcParams["figure.figsize"] = (12, 5)

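# Helper function for summary statistics
def country_summary(df, country_name):
    """Return mean, median, and std of GHI, DNI, and DHI for one country.

    The result has one row per metric plus a "country" label column.
    """
    metrics = ["GHI", "DNI", "DHI"]
    summary = df[metrics].agg(["mean", "median", "std"]).T
    summary["country"] = country_name
    return summary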


## 1. Load Cleaned Datasets
- Load Benin, Sierra Leone, and Togo cleaned CSVs
- Inspect the first few rows of each dataset


In [None]:
benin = pd.read_csv("../data/benin_clean.csv", parse_dates=["Timestamp"])
sierra_leone = pd.read_csv("../data/sierraleone_clean.csv", parse_dates=["Timestamp"])
togo = pd.read_csv("../data/togo_clean.csv", parse_dates=["Timestamp"])

print("Benin:")
display(benin.head())
print("Sierra Leone:")
display(sierra_leone.head())
print("Togo:")
display(togo.head())


FileNotFoundError: [Errno 2] No such file or directory: '../data/sierraleone_clean.csv'
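
The `FileNotFoundError` above means the CSV was not at the expected relative path when the cell ran. One way to make the loading step fail more gently is to check for the file first. The sketch below is an illustration only: `load_clean_csv` is a hypothetical helper, not part of the original notebook, and it assumes each file has a `Timestamp` column as in the load cell.

```python
from pathlib import Path

import pandas as pd


def load_clean_csv(path):
    """Load one cleaned CSV, or return None with a message if the file is missing."""
    path = Path(path)
    if not path.is_file():
        # Print the absolute path so a wrong working directory is easy to spot.
        print(f"Missing file: {path.resolve()}")
        return None
    return pd.read_csv(path, parse_dates=["Timestamp"])


# Hypothetical usage with the path from the error message above.
sierra_leone_demo = load_clean_csv("../data/sierraleone_clean.csv")
```

Returning `None` keeps the remaining cells from running on a half-loaded dataset only if the caller checks for it; raising a clearer exception would be an equally reasonable design.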

## 2. Metric Comparison — Boxplots
- Compare the GHI, DNI, and DHI distributions side by side, one boxplot per metric with one box per country


In [None]:
# Combine datasets for plotting
benin["country"] = "Benin"
sierra_leone["country"] = "Sierra Leone"
togo["country"] = "Togo"

combined = pd.concat([benin, sierra_leone, togo], ignore_index=True)

metrics = ["GHI","DNI","DHI"]

for metric in metrics:
    plt.figure(figsize=(10,5))
    sns.boxplot(x="country", y=metric, data=combined)
    plt.title(f"{metric} Distribution by Country")
    plt.show()


## 3. Summary Statistics Table
- Mean, median, and standard deviation of GHI, DNI, and DHI for each country


In [None]:
benin_summary = country_summary(benin, "Benin")
sierra_summary = country_summary(sierra_leone, "Sierra Leone")
togo_summary = country_summary(togo, "Togo")

summary_df = pd.concat([benin_summary, sierra_summary, togo_summary])
display(summary_df)
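
An alternative to concatenating per-country summaries is a single `groupby` over a frame that carries a `country` column, which yields one row per country. This is a sketch under assumptions: the `demo` frame below is tiny made-up data standing in for the combined dataset, not real measurements.

```python
import pandas as pd

# Made-up values standing in for the combined dataset.
demo = pd.DataFrame({
    "country": ["Benin", "Benin", "Togo", "Togo"],
    "GHI": [100.0, 300.0, 150.0, 250.0],
    "DNI": [50.0, 150.0, 80.0, 120.0],
    "DHI": [40.0, 60.0, 45.0, 55.0],
})

# One row per country; columns are (metric, statistic) pairs.
summary = (
    demo.groupby("country")[["GHI", "DNI", "DHI"]]
    .agg(["mean", "median", "std"])
    .round(2)
)
print(summary)
```

This layout avoids the repeated `GHI`/`DNI`/`DHI` index labels that appear when the three per-country tables are stacked.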
