# Cleaned On-Grid Renewable Energy Statistics

The raw file is in the **`raw_data`** folder: [Raw On-Grid: IRENA_Stats_extract_2025_H1_raw.xlsx](https://github.com/MIT-Emerging-Talent/ET6-CDSP-group-08-repo/blob/main/1_datasets/raw_data/IRENA_Stats_extract_2025_H1_raw.xlsx)

This script prepares a cleaned version of the on-grid portion of the IRENA dataset, focusing on solar and renewable energy deployment in conflict-affected countries.  
Conflict-affected countries (17): **Syria, Iraq, Sudan, South Sudan, Palestine, Mali, Ethiopia, Ukraine, Yemen, Libya, Afghanistan, Nigeria, Central African Republic, Somalia, Pakistan, Mozambique, and Myanmar**


The cleaned output is saved in the **`cleaned_data`** folder as `IRENA_ONGRIDStats.cleaned.xlsx`.

In [16]:
import pandas as pd

### Working on "Country" sheet only (relevant to research question)

In [17]:
raw_df = pd.read_excel(
    "../1_datasets/raw_data/IRENA_Stats_extract_2025_H1_raw.xlsx", sheet_name="Country"
)
df = raw_df.copy()

# Only the "Country" sheet is loaded; the Region and Global sheets are not needed

In [18]:
print(df.head())

   Region       Sub-region  Country ISO3 code  M49 code         RE or Non-RE  \
0  Africa  Northern Africa  Algeria       DZA        12  Total Non-Renewable   
1  Africa  Northern Africa  Algeria       DZA        12  Total Non-Renewable   
2  Africa  Northern Africa  Algeria       DZA        12  Total Non-Renewable   
3  Africa  Northern Africa  Algeria       DZA        12  Total Non-Renewable   
4  Africa  Northern Africa  Algeria       DZA        12  Total Non-Renewable   

  Group Technology   Technology Sub-Technology        Producer Type  Year  \
0     Fossil fuels  Natural gas    Natural gas  On-grid electricity  2000   
1     Fossil fuels  Natural gas    Natural gas  On-grid electricity  2001   
2     Fossil fuels  Natural gas    Natural gas  On-grid electricity  2002   
3     Fossil fuels  Natural gas    Natural gas  On-grid electricity  2003   
4     Fossil fuels  Natural gas    Natural gas  On-grid electricity  2004   

   Electricity Installed Capacity (MW)  
0              

### Add a "Conflict_Status" column for conflict countries to be filtered

In [19]:
conflict_countries = [
    "Syria",
    "Iraq",
    "Sudan (the)",
    "South Sudan",
    "Palestine",
    "Mali",
    "Ethiopia",
    "Ukraine",
    "Yemen",
    "Libya",
    "Afghanistan",
    "Nigeria",
    "Central African Republic",
    "Somalia",
    "Pakistan",
    "Mozambique",
    "Myanmar",
]
# Countries to label as conflict-affected; names must match the "Country" column exactly

In [20]:
df["Conflict_Status"] = df["Country"].apply(
    lambda x: "Conflict" if x in conflict_countries else "Non-Conflict"
)


df[["Country", "Conflict_Status"]].head(10)

   Country Conflict_Status
0  Algeria    Non-Conflict
1  Algeria    Non-Conflict
2  Algeria    Non-Conflict
3  Algeria    Non-Conflict
4  Algeria    Non-Conflict
5  Algeria    Non-Conflict
6  Algeria    Non-Conflict
7  Algeria    Non-Conflict
8  Algeria    Non-Conflict
9  Algeria    Non-Conflict


In [21]:
print(" Conflict_Status column added.")
print(df[["Country", "Conflict_Status"]].drop_duplicates().head(5))

 Conflict_Status column added.
         Country Conflict_Status
0        Algeria    Non-Conflict
148        Egypt    Non-Conflict
380        Libya        Conflict
454      Morocco    Non-Conflict
688  Sudan (the)        Conflict
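
The label depends on exact string matches between `conflict_countries` and the `Country` column, and the output above shows that the dataset writes some names with a suffix, as in `Sudan (the)`. A minimal sketch of a check for list entries that never match is below. `find_unmatched` is a hypothetical helper, and the sample country names are illustrative stand-ins rather than values confirmed from the IRENA file. In the notebook, the second argument would be `df["Country"].unique()`.

```python
def find_unmatched(wanted, available):
    """Return the names in `wanted` that do not appear in `available`."""
    available_set = set(available)
    return [name for name in wanted if name not in available_set]


# Illustrative stand-in for df["Country"].unique(); not taken from the real file.
sample_dataset_countries = ["Algeria", "Libya", "Sudan (the)", "Iraq"]
sample_conflict_list = ["Libya", "Sudan", "Iraq"]

unmatched = find_unmatched(sample_conflict_list, sample_dataset_countries)
print(unmatched)  # ['Sudan']
```

Any name this returns would be labelled `Non-Conflict` for every row, so an empty result is worth confirming before relying on the `Conflict_Status` column.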


### Filtered out the years that aren't needed
_**Project scope: 2010-2024**_

In [22]:
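# .copy() makes the filtered frame independent, so later column assignments
# do not trigger pandas' SettingWithCopyWarning
df = df[(df["Year"] >= 2010) & (df["Year"] <= 2024)].copy()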
print(df["Year"].unique())

[2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
 2024]


### Filled in missing (null) values if any

In [23]:
# Fill missing numbers (assumption: a missing electricity capacity is treated as 0)
df["Electricity Installed Capacity (MW)"] = df[
    "Electricity Installed Capacity (MW)"
].fillna(0)

In [24]:
# Fill missing text (e.g., unknown technology or producer type)
df["Producer Type"] = df["Producer Type"].fillna("Unknown")
df["Technology"] = df["Technology"].fillna("Unknown")
df["Sub-Technology"] = df["Sub-Technology"].fillna("Unknown")

In [25]:
preview = df[
    (df["Electricity Installed Capacity (MW)"] == 0)
    | (df["Producer Type"] == "Unknown")
    | (df["Technology"] == "Unknown")
    | (df["Sub-Technology"] == "Unknown")
]

print(
    preview[
        [
            "Country",
            "Year",
            "Technology",
            "Sub-Technology",
            "Producer Type",
            "Electricity Installed Capacity (MW)",
        ]
    ].head(10)
)

Empty DataFrame
Columns: [Country, Year, Technology, Sub-Technology, Producer Type, Electricity Installed Capacity (MW)]
Index: []
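
The empty preview means that no row in the 2010-2024 slice has a capacity of 0 or an `Unknown` label, so the fills had nothing to replace in these columns. A more direct way to confirm this is to count nulls per column before filling. The sketch below uses a small made-up frame (not IRENA data) with some of the same column names; in the notebook, the same `isna().sum()` call would run on `df` before the `fillna` cells.

```python
import pandas as pd

# Made-up rows for illustration only; values are not from the IRENA file.
toy = pd.DataFrame(
    {
        "Technology": ["Solar", None, "Wind"],
        "Producer Type": ["On-grid electricity", "On-grid electricity", None],
        "Electricity Installed Capacity (MW)": [10.0, None, 5.0],
    }
)

# Count nulls per column before any fill is applied.
null_counts = toy.isna().sum()
print(null_counts)

# Apply the same kind of fills as in the notebook, one value per column.
filled = toy.fillna(
    {
        "Technology": "Unknown",
        "Producer Type": "Unknown",
        "Electricity Installed Capacity (MW)": 0,
    }
)

# Total nulls left after filling.
remaining = int(filled.isna().sum().sum())
print(remaining)
```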


### Dropped the "M49 code" column
*This column isn't useful for our solar conflict-focused analysis*

In [26]:
df.drop(columns=["M49 code"], inplace=True)

In [27]:
print(df.columns)

Index(['Region', 'Sub-region', 'Country', 'ISO3 code', 'RE or Non-RE',
       'Group Technology', 'Technology', 'Sub-Technology', 'Producer Type',
       'Year', 'Electricity Installed Capacity (MW)', 'Conflict_Status'],
      dtype='object')
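
Before the file is written, a few quick checks can catch a cleaning step that did not take effect. The sketch below is one possible set of checks, not part of the original notebook. `check_cleaned` is a hypothetical helper, and the toy frame is made up to show a failing case; in the notebook it would be called as `check_cleaned(df)`.

```python
import pandas as pd


def check_cleaned(frame, first_year=2010, last_year=2024):
    """Return a list of problems found in a cleaned frame (empty if none)."""
    problems = []
    if frame.isna().any().any():
        problems.append("null values remain")
    if not frame["Year"].between(first_year, last_year).all():
        problems.append("years outside project scope")
    if "M49 code" in frame.columns:
        problems.append("M49 code column still present")
    allowed = {"Conflict", "Non-Conflict"}
    if not set(frame["Conflict_Status"].unique()) <= allowed:
        problems.append("unexpected Conflict_Status value")
    return problems


# Made-up rows for illustration only; 2025 is deliberately out of scope.
toy = pd.DataFrame(
    {
        "Country": ["Libya", "Algeria"],
        "Year": [2010, 2025],
        "Conflict_Status": ["Conflict", "Non-Conflict"],
    }
)
problems = check_cleaned(toy)
print(problems)  # ['years outside project scope']
```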


### Save the cleaned version
Saved to the **`cleaned_data`** folder inside **`1_datasets`** as **IRENA_ONGRIDStats.cleaned.xlsx**

In [28]:
df.to_excel(
    "../1_datasets/cleaned_data/IRENA_ONGRIDStats.cleaned.xlsx",
    sheet_name="IRENA_ONGRIDStats.cleaned",
    index=False,
)