# DFx ETL Pipeline

## energydata.info

An ETL pipeline for the [installed electricity capacity dataset](https://energydata.info/dataset/installed-electricity-capacity-by-country-area-mw-by-country) from the International Renewable Energy Agency (IRENA).

### Libraries

In [1]:
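import pandas as pd
from dotenv import load_dotenv

# Load variables from a local .env file before importing dfpp,
# in case its modules read configuration from the environment at import time.
load_dotenv()

from dfpp.storage import AzureStorage as Storage
from dfpp.sources import energydata_info as source

storage = Storage()
SOURCE_NAME = "energydata_info"  # folder the dataset is published under
SERIES_ID = "irena_eleccap"  # identifier assigned to the output series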

### Extract

In [2]:
URL = "https://energydata.info/dataset/b33e5af4-bd51-4ee0-a062-29438471db27/resource/6938ec3a-f7bb-4493-86ba-f28faa62f139/download/eleccap_20220404-201215.xlsx"
# header=1 takes the second row of the sheet as the header; ".." is read as a missing value.
df_raw = pd.read_excel(URL, header=1, na_values=[".."])
print("Shape:", df_raw.shape)
display(df_raw.head())

Shape: (93264, 5)


| | Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | Afghanistan | On-grid Solar photovoltaic | Off-grid | 2011.0 | |
| 1 | | | | 2012.0 | |
| 2 | | | | 2013.0 | |
| 3 | | | | 2014.0 | |
| 4 | | | | 2015.0 | |
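
In the raw preview, the country, technology and grid-connection labels appear only on the first row of each group, and the years are read as floats. The following is a minimal sketch of one way to clean up such a layout with pandas, assuming a forward fill is appropriate. The toy data and column names are hypothetical, and this is not necessarily what `source.transform` does internally.

```python
import pandas as pd

# Toy frame mimicking the raw preview; the column names are hypothetical.
raw = pd.DataFrame(
    {
        "country": ["Afghanistan", None, None],
        "technology": ["On-grid Solar photovoltaic", None, None],
        "grid_connection": ["Off-grid", None, None],
        "year": [2011.0, 2012.0, 2013.0],
        "value": [float("nan")] * 3,
    }
)

label_columns = ["country", "technology", "grid_connection"]
filled = raw.copy()
# Propagate each label downward until the next non-missing label.
filled[label_columns] = filled[label_columns].ffill()
# Years are read as floats (2011.0); cast them to integers.
filled["year"] = filled["year"].astype(int)
print(filled)
```

Only the label columns are forward-filled here, so missing measurements in `value` stay missing rather than being overwritten with earlier observations.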


### Transform

In [3]:
df_transformed = source.transform(df_raw)
# Tag the frame with the series identifier; the published file name in the Load step matches it.
df_transformed.name = SERIES_ID
print("Shape:", df_transformed.shape)
display(df_transformed.head())



Shape: (93241, 11)


| | source | series_id | series_name | disagr_energy_technology | disagr_grid_connection | alpha_3_code | prop_unit | prop_observation_type | year | value | prop_value_label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://energydata.info/ | irena_eleccap | Installed electricity capacity by country/area... | On-grid Solar photovoltaic | Off-grid | AFG | Megawatt | | 2011 | | |
| 1 | https://energydata.info/ | irena_eleccap | Installed electricity capacity by country/area... | On-grid Solar photovoltaic | Off-grid | AFG | Megawatt | | 2012 | | |
| 2 | https://energydata.info/ | irena_eleccap | Installed electricity capacity by country/area... | On-grid Solar photovoltaic | Off-grid | AFG | Megawatt | | 2013 | | |
| 3 | https://energydata.info/ | irena_eleccap | Installed electricity capacity by country/area... | On-grid Solar photovoltaic | Off-grid | AFG | Megawatt | | 2014 | | |
| 4 | https://energydata.info/ | irena_eleccap | Installed electricity capacity by country/area... | On-grid Solar photovoltaic | Off-grid | AFG | Megawatt | | 2015 | | |
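
Before loading, it can be useful to confirm that the transformed frame still has the columns shown in the preview. The sketch below is one possible check. The `check_schema` helper is hypothetical and not part of `dfpp`, and the column list is copied from the transformed output.

```python
import pandas as pd

# Column names copied from the transformed preview.
EXPECTED_COLUMNS = [
    "source",
    "series_id",
    "series_name",
    "disagr_energy_technology",
    "disagr_grid_connection",
    "alpha_3_code",
    "prop_unit",
    "prop_observation_type",
    "year",
    "value",
    "prop_value_label",
]


def check_schema(df, expected=EXPECTED_COLUMNS):
    """Return the expected columns missing from df (hypothetical helper)."""
    return [column for column in expected if column not in df.columns]


# Demonstration on an empty frame with one column deliberately dropped.
sample = pd.DataFrame(columns=EXPECTED_COLUMNS)
missing = check_schema(sample.drop(columns=["value"]))
print(missing)  # → ['value']
```

In the notebook, `check_schema(df_transformed)` returning an empty list would indicate that every expected column is present.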


### Load

In [4]:
storage.publish_dataset(df_transformed, folder_path=SOURCE_NAME)

'az://dfx-etl-pipeline-dev/v25-07-15/energydata_info/irena_eleccap.parquet'