## Project Overview

This project explores global renewable energy consumption patterns using World Bank data. The goal is to apply machine learning techniques to analyze, cluster, and eventually optimize renewable energy distribution across countries and regions.

## Data Import

In [None]:
# Import the libraries used for loading and plotting the data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load and inspect the data
# The World Bank data is available at: https://data.worldbank.org/indicator/EG.FEC.RNEW.ZS

file_name = 'data/API_EG.FEC.RNEW.ZS_DS2_en_csv_v2_13732.csv'

# The World Bank CSV has four extra rows above the header row, so skip them
df = pd.read_csv(file_name, skiprows=4)
df.head()
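
To illustrate why `skiprows=4` is needed, here is a minimal sketch on a small in-memory CSV. The layout (four preamble lines, then the header, with a trailing comma on each row) is an assumption about how World Bank exports are typically structured, and the country names and values are synthetic placeholders, not real data.

```python
import io

import pandas as pd

# Synthetic stand-in for the assumed World Bank layout: four preamble
# lines (two of them blank), then the header row, then one row per country.
sample_csv = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2024-01-01",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020",\n'
    '"Country A","AAA","Renewable energy consumption","EG.FEC.RNEW.ZS",10.5,11.0,\n'
    '"Country B","BBB","Renewable energy consumption","EG.FEC.RNEW.ZS",,42.0,\n'
)

# Skip the four preamble lines so the fifth line becomes the header.
sample = pd.read_csv(io.StringIO(sample_csv), skiprows=4)

# The trailing comma on each row produces an empty "Unnamed: N" column;
# dropping all-NaN columns removes it.
sample = sample.dropna(axis="columns", how="all")
print(sample.columns.tolist())
```

If the real file follows the same layout, the same `dropna(axis="columns", how="all")` call can be applied to `df` after loading.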

## Exploring the Data

Before cleaning, check the following:

- Missing values
- Column names
- Data types

In [None]:
# Overview of the DataFrame: shape, column dtypes, and non-null counts
print("Shape:", df.shape)
df.info()

# Missing values per column (head() shows only the first five columns)
df.isnull().sum().head()
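
Because `head()` only covers the first five columns, it can help to look at the share of missing values in each year column instead. The sketch below assumes year columns are the ones whose names consist only of digits; `missing_share_by_year` is a hypothetical helper, and the toy frame is synthetic.

```python
import pandas as pd


def missing_share_by_year(frame: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values for each year column."""
    # Assumption: year columns are named with digits only, e.g. "2020".
    year_cols = [col for col in frame.columns if str(col).isdigit()]
    return frame[year_cols].isna().mean()


# Synthetic example data (not real World Bank values).
toy = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B", "Country C", "Country D"],
        "2019": [10.0, None, 30.0, 40.0],
        "2020": [None, None, 35.0, 45.0],
    }
)

missing_share = missing_share_by_year(toy)
print(missing_share)
```

On the real data, `missing_share_by_year(df)` would show which years are sparsely populated before any of them is used for plotting.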

In [None]:
# Plot the top 10 countries by renewable share in 2020 (the year is hard-coded)
df_latest = (
    df[["Country Name", "2020"]]
    .dropna()
    .sort_values(by="2020", ascending=False)
    .head(10)
)
sns.barplot(x="2020", y="Country Name", data=df_latest)
plt.title("Top 10 Countries by Renewable Energy % (2020)")
plt.xlabel("% of Total Energy Consumption")
plt.ylabel("Country")
plt.show()
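
The plot above fixes the year at 2020. As a possible alternative, the most recent year with enough data could be picked programmatically. This is only a sketch: `latest_populated_year` and its `min_non_null` threshold are hypothetical names, and the toy frame holds synthetic values.

```python
import pandas as pd


def latest_populated_year(frame: pd.DataFrame, min_non_null: int = 1) -> str:
    """Return the most recent year column with at least `min_non_null` values."""
    # Assumption: year columns are named with digits only, e.g. "2020".
    year_cols = sorted(
        (col for col in frame.columns if str(col).isdigit()), key=int
    )
    for year in reversed(year_cols):
        if frame[year].notna().sum() >= min_non_null:
            return year
    raise ValueError("No year column has enough non-null values")


# Synthetic example data: the 2021 column is entirely empty.
toy = pd.DataFrame(
    {
        "Country Name": ["Country A", "Country B", "Country C"],
        "2019": [10.0, 50.0, 30.0],
        "2020": [12.0, float("nan"), 35.0],
        "2021": [float("nan"), float("nan"), float("nan")],
    }
)

year = latest_populated_year(toy, min_non_null=2)

# Rank countries by the selected year, ignoring rows without a value.
top = toy[["Country Name", year]].dropna().nlargest(2, year)
print(year)
print(top)
```

The resulting `year` could then replace the literal `"2020"` in the plotting cell, including in the plot title.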

## Data Cleaning

## Initial Visualization of the Data