# Cross-Country Comparison

This notebook compares solar data from Benin, Sierra Leone, and Togo to identify relative solar potential and key differences across countries. It follows Task 3 of the Solar Data Discovery Week 0 challenge.

## Objectives
- Load cleaned datasets for each country.
- Compare GHI, DNI, and DHI using boxplots.
- Create a summary table with mean, median, and standard deviation.
- Perform statistical testing (ANOVA) on GHI.
- Summarize key observations.
- (Bonus) Visualize average GHI ranking by country.

In [None]:
# Import necessary libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import f_oneway

# Set plot style for better visualization
sns.set_theme(style='whitegrid')

## 1. Load Cleaned Datasets

Load the cleaned CSV files for Benin, Sierra Leone, and Togo from the `data/` directory.

In [None]:
# Load cleaned datasets
benin = pd.read_csv('data/benin_clean.csv')
sierra_leone = pd.read_csv('data/sierra_leone_clean.csv')
togo = pd.read_csv('data/togo_clean.csv')

# Add a 'Country' column to each dataframe for easier plotting
benin['Country'] = 'Benin'
sierra_leone['Country'] = 'Sierra Leone'
togo['Country'] = 'Togo'

# Combine the datasets into a single dataframe
combined = pd.concat([benin, sierra_leone, togo], ignore_index=True)

# Verify the combined dataframe
combined.head()
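
The loading step above assumes every cleaned file contains the `GHI`, `DNI`, and `DHI` columns used later. As a minimal sketch, a small helper can check this before the data are combined. The function name `load_country` and the constant `REQUIRED_COLUMNS` are hypothetical, and the in-memory CSV holds made-up values purely for illustration.

```python
import io

import pandas as pd

# Hypothetical constant: the metric columns this notebook relies on later
REQUIRED_COLUMNS = ['GHI', 'DNI', 'DHI']


def load_country(source, country, required=REQUIRED_COLUMNS):
    """Read one cleaned CSV, check the metric columns exist, and tag the rows."""
    df = pd.read_csv(source)
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{country}: missing columns {missing}")
    df['Country'] = country
    return df


# Tiny in-memory stand-in for a cleaned file (made-up values, illustration only)
sample_csv = io.StringIO("GHI,DNI,DHI\n100.0,50.0,60.0\n200.0,150.0,80.0\n")
sample = load_country(sample_csv, 'Benin')
print(sample)
```

With the real files, the same helper could be called as `load_country('data/benin_clean.csv', 'Benin')`, so a missing column fails at load time rather than inside a later plot.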

## 2. Metric Comparison with Boxplots

Create boxplots to compare GHI, DNI, and DHI across the three countries.

In [None]:
# Draw one boxplot per irradiance metric
for metric in ['GHI', 'DNI', 'DHI']:
    plt.figure(figsize=(10, 6))
    # hue + legend=False keeps the per-country colours without the
    # palette-without-hue deprecation warning (requires seaborn >= 0.13)
    sns.boxplot(x='Country', y=metric, hue='Country', data=combined,
                palette='Set2', legend=False)
    plt.title(f'{metric} Comparison Across Countries')
    plt.xlabel('Country')
    plt.ylabel(f'{metric} (W/m²)')
    plt.show()
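
The quantities a boxplot draws can also be computed directly, which makes the comparison easier to quote. The sketch below assumes the default whisker rule of 1.5 times the interquartile range. The helper `box_stats` is hypothetical and the `demo` frame holds made-up values; in the notebook, `combined` would take its place.

```python
import pandas as pd


def box_stats(values):
    """Return quartiles, 1.5*IQR fences, and the outlier count for a Series."""
    values = values.dropna()
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points outside the fences are the ones a boxplot draws individually
    n_outliers = int(((values < lower) | (values > upper)).sum())
    return pd.Series({'q1': q1, 'median': median, 'q3': q3,
                      'lower_fence': lower, 'upper_fence': upper,
                      'n_outliers': n_outliers})


# Made-up values for two placeholder countries (illustration only)
demo = pd.DataFrame({
    'Country': ['A'] * 5 + ['B'] * 5,
    'GHI': [1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 11.0, 12.0, 13.0, 14.0],
})

# One row of box statistics per country
stats = pd.DataFrame(
    {country: box_stats(group) for country, group in demo.groupby('Country')['GHI']}
).T
print(stats)
```

A high outlier count for one country is a hint that its median is a more trustworthy summary than its mean.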

## 3. Summary Table

Compute the mean, median, and standard deviation of GHI, DNI, and DHI for each country.

In [None]:
# Compute summary statistics
summary = combined.groupby('Country')[['GHI', 'DNI', 'DHI']].agg(['mean', 'median', 'std']).round(2)

# Flatten the multi-index columns (e.g. ('GHI', 'mean') -> 'GHI_mean') for readability
summary.columns = ['_'.join(col) for col in summary.columns]

# Display the summary table
print("Summary Table of Solar Metrics Across Countries:")
summary

## 4. Statistical Testing (ANOVA)

Perform a one-way ANOVA on GHI to test whether mean GHI differs between the three countries by more than chance alone would explain.

In [None]:
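# Extract GHI values for each country, dropping missing values
# (f_oneway returns NaN if any input contains NaN)
ghi_benin = benin['GHI'].dropna()
ghi_sierra = sierra_leone['GHI'].dropna()
ghi_togo = togo['GHI'].dropna()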

# Perform one-way ANOVA
stat, p_value = f_oneway(ghi_benin, ghi_sierra, ghi_togo)

# Display the results
print("ANOVA Results for GHI:")
print(f"F-statistic: {stat:.2f}")
# General format so very small p-values are not rounded to 0.0000
print(f"p-value: {p_value:.4g}")

# Interpret the p-value
if p_value < 0.05:
    print("The differences in GHI between countries are statistically significant (p < 0.05).")
else:
    print("The differences in GHI between countries are not statistically significant (p >= 0.05).")
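
One-way ANOVA assumes roughly normal data with similar variances in each group, which may not hold for irradiance readings. With very large samples, even a negligible difference also tends to come out as significant. As a hedged sketch, the rank-based Kruskal-Wallis test (`scipy.stats.kruskal`) can serve as a cross-check, and eta squared gives a sense of effect size. The helper `eta_squared` is hypothetical and the three arrays are synthetic stand-ins, not the notebook's data.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal


def eta_squared(*groups):
    """Share of total variance explained by group membership (SS_between / SS_total)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Synthetic stand-ins for the three GHI series (illustration only)
rng = np.random.default_rng(0)
a = rng.normal(200, 50, 500)
b = rng.normal(205, 50, 500)
c = rng.normal(210, 50, 500)

f_stat, p_anova = f_oneway(a, b, c)
h_stat, p_kruskal = kruskal(a, b, c)
effect = eta_squared(a, b, c)

print(f"ANOVA:          F = {f_stat:.2f}, p = {p_anova:.4g}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kruskal:.4g}")
print(f"Eta squared:    {effect:.4f}")
```

In the notebook, `ghi_benin`, `ghi_sierra`, and `ghi_togo` would replace `a`, `b`, and `c`. A small eta squared alongside a tiny p-value suggests the difference is real but modest in practical terms.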

## 5. Key Observations

Based on the boxplots, summary table, and ANOVA test, here are the key findings:

- **Benin** shows the highest median GHI, suggesting it may have the greatest solar potential among the three countries.
- **Sierra Leone** exhibits the largest variability in DNI, indicating inconsistent direct sunlight, which could affect solar farm reliability.
- **Togo** has the lowest average DHI, which would be consistent with less cloud cover or atmospheric scattering, since diffuse irradiance rises when more sunlight is scattered.
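
Statements like these can be checked against the summary table rather than read off the plots. The sketch below assumes the flattened column names from Section 3 (`GHI_median`, `DNI_std`, `DHI_mean`). The `demo_summary` frame holds made-up values for placeholder countries, so it illustrates the lookup only and says nothing about the real results.

```python
import pandas as pd

# Made-up summary values for placeholder countries (illustration only)
demo_summary = pd.DataFrame(
    {
        'GHI_median': [2.0, 1.0, 3.0],
        'DNI_std': [10.0, 30.0, 20.0],
        'DHI_mean': [5.0, 6.0, 4.0],
    },
    index=pd.Index(['X', 'Y', 'Z'], name='Country'),
)

# Look up which country leads on each criterion used in the observations
findings = {
    'highest median GHI': demo_summary['GHI_median'].idxmax(),
    'largest DNI spread (std)': demo_summary['DNI_std'].idxmax(),
    'lowest mean DHI': demo_summary['DHI_mean'].idxmin(),
}

for label, country in findings.items():
    print(f"{label}: {country}")
```

Running the same lookups on the notebook's `summary` table shows whether each bullet above still holds after the data are re-cleaned or updated.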

## 6. Bonus: Visual Summary

Create a bar chart to rank countries by average GHI.

In [None]:
# Compute average GHI for each country
avg_ghi = combined.groupby('Country')['GHI'].mean().sort_values(ascending=False)

# Create a bar chart
plt.figure(figsize=(8, 5))
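# hue + legend=False avoids the palette-without-hue deprecation warning
# (requires seaborn >= 0.13); bars keep the descending order of avg_ghi
sns.barplot(x='Country', y='GHI', hue='Country', data=avg_ghi.reset_index(),
            palette='Set3', legend=False)
plt.title('Average GHI Ranking by Country')
plt.xlabel('Country')
plt.ylabel('Average GHI (W/m²)')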
plt.show()