# Raw Data Ingestion — ENTSO-E Demand & Open-Meteo Weather

This notebook walks step by step through the raw data ingestion performed by two scripts in src/ingestion/:

- get_entsoe_demand.py
- get_openmeteo_weather.py

# 1. Context & Objectives

Raw data sources:

- [ENTSO-E](https://documenter.getpostman.com/view/7009892/2s93JtP3F6): hourly electricity demand
- [Open-Meteo](https://open-meteo.com/en/docs/historical-weather-api?latitude=48.8534&longitude=2.3488&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m,shortwave_radiation_instant): historical hourly weather

Challenges:

- Data must be partitioned by country and year
- Files may already exist (for now, existing files are simply overwritten)
- API requests may fail or time out, so retries and caching are needed
- Responses may contain duplicate or missing timestamps
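One possible way to make the overwrite step safe against half-written files is to write to a temporary file and swap it into place afterwards. The sketch below is an assumption for illustration, not something the ingestion scripts are known to do; atomic_write and write_fn are hypothetical names.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def atomic_write(target: Path, write_fn: Callable[[Path], None]) -> None:
    """Write via a temporary file in the target directory, then swap it in."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    os.close(fd)
    tmp_path = Path(tmp_name)
    try:
        write_fn(tmp_path)
        # os.replace overwrites an existing target in a single step.
        os.replace(tmp_path, target)
    finally:
        # Only true if the swap did not happen (for example write_fn raised).
        if tmp_path.exists():
            tmp_path.unlink()


# Hypothetical usage with a DataFrame `df`:
# atomic_write(demand_path, lambda p: df.to_parquet(p, index=False))
```

If the write fails midway, the previous file is left untouched and the temporary file is removed.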

Goal: Produce clean parquet files ready for preprocessing.

# 2. Imports & Project Paths

In [None]:
import os
import sys
from pathlib import Path

import pandas as pd

In [49]:
PROJECT_ROOT = os.path.abspath("..")
if PROJECT_ROOT not in sys.path:
    sys.path.append(PROJECT_ROOT)

In [51]:
DATA_RAW_PATH_DEMAND = Path(PROJECT_ROOT) / "data" / "raw" / "electricity_demand"
DATA_RAW_PATH_WEATHER = Path(PROJECT_ROOT) / "data" / "raw" / "weather"

In [37]:
from src.ingestion.get_entsoe_demand import fetch_entsoe_demand_one_year, fetch_entsoe_demand_and_store
from src.ingestion.get_openmeteo_weather import fetch_openmeteo_weather_one_year, fetch_openmeteo_weather_and_store

In [52]:
# Parameters
country = "FR"
year = 2023

demand_path = DATA_RAW_PATH_DEMAND / f"country={country}" / f"year={year}" / "demand.parquet"
weather_path = DATA_RAW_PATH_WEATHER / f"country={country}" / f"year={year}" / "weather.parquet"

# 3. ENTSO-E Demand Ingestion

fetch_entsoe_demand_one_year:
- Fetches hourly actual electricity demand for a full year
- Converts the XML API response into a DataFrame with datetime and load_MW columns
- Deduplicates timestamps, sorts chronologically
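The deduplication and sorting steps listed above can be sketched in pandas as follows. This is a minimal illustration, not the code of fetch_entsoe_demand_one_year; clean_hourly is a hypothetical helper, and keeping the first occurrence of a duplicated timestamp is an assumption.

```python
import pandas as pd


def clean_hourly(df: pd.DataFrame, time_col: str = "datetime") -> pd.DataFrame:
    """Parse timestamps as UTC, drop duplicate timestamps, sort chronologically."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col], utc=True)
    return (
        out.drop_duplicates(subset=time_col, keep="first")  # assumption: keep first
        .sort_values(time_col)
        .reset_index(drop=True)
    )


# Small hand-made input: out of order, with one repeated timestamp.
raw = pd.DataFrame(
    {
        "datetime": [
            "2023-01-01 01:00:00+00:00",
            "2023-01-01 00:00:00+00:00",
            "2023-01-01 01:00:00+00:00",
        ],
        "load_MW": [44640.0, 45709.0, 44640.0],
    }
)
clean = clean_hourly(raw)  # two rows remain, ordered by datetime
```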

In [29]:
MY_TOKEN = os.getenv("ENTSOE_API_TOKEN")

In [None]:
df = fetch_entsoe_demand_one_year(year=year, country_code="10YFR-RTE------C", api_token=MY_TOKEN)
df.head()

|  | datetime | load_MW |
|---|---|---|
| 0 | 2023-01-01 00:00:00+00:00 | 45709.0 |
| 1 | 2023-01-01 01:00:00+00:00 | 44640.0 |
| 2 | 2023-01-01 02:00:00+00:00 | 41533.0 |
| 3 | 2023-01-01 03:00:00+00:00 | 39248.0 |
| 4 | 2023-01-01 04:00:00+00:00 | 38389.0 |


fetch_entsoe_demand_and_store:
- Loops over years
- Calls fetch_entsoe_demand_one_year
- Adds country column
- Saves parquet in data/raw/electricity_demand/country=XX/year=YYYY/demand.parquet
- Overwrites existing files and sleeps between requests to avoid throttling
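The loop described above could look roughly like the sketch below. It is an illustration under assumptions, not the body of fetch_entsoe_demand_and_store: partition_path, fetch_and_store, the fetch_one_year callable, and the pause_s default are all hypothetical, and writing Parquet with pandas requires an engine such as pyarrow.

```python
import time
from pathlib import Path
from typing import Callable, List

import pandas as pd


def partition_path(root: Path, country: str, year: int, filename: str) -> Path:
    """Build <root>/country=XX/year=YYYY/<filename>."""
    return Path(root) / f"country={country}" / f"year={year}" / filename


def fetch_and_store(
    fetch_one_year: Callable[[int], pd.DataFrame],  # hypothetical per-year fetcher
    root: Path,
    country: str,
    start_year: int,
    end_year: int,
    filename: str = "demand.parquet",
    pause_s: float = 1.0,  # assumed pause; the real value is not stated
) -> List[Path]:
    """Fetch each year, tag it with the country, and write one file per year."""
    written = []
    for year in range(start_year, end_year + 1):
        df = fetch_one_year(year).assign(country=country)
        path = partition_path(root, country, year, filename)
        path.parent.mkdir(parents=True, exist_ok=True)
        df.to_parquet(path, index=False)  # overwrites any existing file
        written.append(path)
        if year < end_year:
            time.sleep(pause_s)  # be gentle with the API between requests
    return written
```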

In [None]:
# Fetch and store demand data for France from ENTSO-E API for the year 2023
fetch_entsoe_demand_and_store(country=country, country_code="10YFR-RTE------C", start_year=year, end_year=year)

[FETCH] ENTSO-E demand | FR | 2023
[SAVED] /Users/bachirijihane/energy-intelligence-platform/data/raw/electricity_demand/country=FR/year=2023/demand.parquet | rows=8734
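The log above reports rows=8734, while a non-leap year such as 2023 has 365 × 24 = 8760 hours, so 26 hourly timestamps appear to be absent (assuming the series is strictly hourly). A sketch for listing the gaps follows; missing_hours is a hypothetical helper and the demo data is synthetic.

```python
import pandas as pd


def missing_hours(timestamps, year: int) -> pd.DatetimeIndex:
    """Return the UTC hours of `year` that do not appear in `timestamps`."""
    expected = pd.date_range(
        start=f"{year}-01-01 00:00",
        end=f"{year}-12-31 23:00",
        freq=pd.Timedelta(hours=1),
        tz="UTC",
    )
    observed = pd.DatetimeIndex(pd.to_datetime(timestamps, utc=True))
    return expected.difference(observed)


# Synthetic demo: a full 2023 hourly index with 26 hours removed,
# mimicking the row count from the log.
full = pd.date_range(
    "2023-01-01", periods=8760, freq=pd.Timedelta(hours=1), tz="UTC"
)
observed = full.delete(list(range(100, 126)))
gaps = missing_hours(observed, 2023)  # the 26 removed timestamps
```

On the stored file, the same check would be missing_hours(df_demand["datetime"], 2023).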


# 4. Open-Meteo Weather Ingestion

fetch_openmeteo_weather_one_year:
- Uses a cached session with automatic retries (taken from the Open-Meteo API code example)
- Fetches hourly variables: temperature, humidity, wind, radiation
- Converts API response into DataFrame
- Deduplicates timestamps, sorts by datetime
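To illustrate the retry idea mentioned above, here is a generic retry-with-exponential-backoff helper using only the standard library. It is a swapped-in sketch, not the mechanism used in get_openmeteo_weather.py, and it does not cover caching; call_with_retries and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on failure wait base_delay_s * 2**(attempt - 1) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:  # in real code, narrow this to network/timeout errors
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

A call such as call_with_retries(lambda: fetch_openmeteo_weather_one_year(year=year, latitude=48.8534, longitude=2.3488)) would retry the whole yearly fetch when it raises.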

In [None]:
df_weather = fetch_openmeteo_weather_one_year(year=year, latitude=48.8534, longitude=2.3488)
df_weather.head()

|  | datetime | temperature_2m | relative_humidity_2m | wind_speed_10m | shortwave_radiation_instant |
|---|---|---|---|---|---|
| 0 | 2023-01-01 00:00:00+00:00 | 14.85 | 53.719143 | 27.859905 | 0.0 |
| 1 | 2023-01-01 01:00:00+00:00 | 14.95 | 52.638969 | 26.302181 | 0.0 |
| 2 | 2023-01-01 02:00:00+00:00 | 14.75 | 53.321426 | 23.0653 | 0.0 |
| 3 | 2023-01-01 03:00:00+00:00 | 14.2 | 55.827904 | 21.385939 | 0.0 |
| 4 | 2023-01-01 04:00:00+00:00 | 14.15 | 57.980919 | 20.683559 | 0.0 |


fetch_openmeteo_weather_and_store:
- Loops over years
- Calls fetch_openmeteo_weather_one_year 
- Adds country column
- Saves parquet in data/raw/weather/country=XX/year=YYYY/weather.parquet
- Overwrites existing files and sleeps briefly between requests

In [None]:
# Fetch and store weather data for Paris, France from Open-Meteo API for the year 2023
fetch_openmeteo_weather_and_store(country=country, latitude=48.8534, longitude=2.3488, start_year=year, end_year=year)

# 5. Inspect Raw Files

In [53]:
# ENTSO-E demand
df_demand = pd.read_parquet(demand_path)
df_demand.head()

|  | datetime | load_MW | country |
|---|---|---|---|
| 0 | 2023-01-01 00:00:00+00:00 | 45709.0 | FR |
| 1 | 2023-01-01 01:00:00+00:00 | 44640.0 | FR |
| 2 | 2023-01-01 02:00:00+00:00 | 41533.0 | FR |
| 3 | 2023-01-01 03:00:00+00:00 | 39248.0 | FR |
| 4 | 2023-01-01 04:00:00+00:00 | 38389.0 | FR |

In [54]:
# Open-Meteo weather
df_weather = pd.read_parquet(weather_path)
df_weather.head()

|  | datetime | temperature_2m | relative_humidity_2m | wind_speed_10m | shortwave_radiation_instant | country |
|---|---|---|---|---|---|---|
| 0 | 2023-01-01 00:00:00+00:00 | 14.85 | 53.719143 | 27.859905 | 0.0 | FR |
| 1 | 2023-01-01 01:00:00+00:00 | 14.95 | 52.638969 | 26.302181 | 0.0 | FR |
| 2 | 2023-01-01 02:00:00+00:00 | 14.75 | 53.321426 | 23.0653 | 0.0 | FR |
| 3 | 2023-01-01 03:00:00+00:00 | 14.2 | 55.827904 | 21.385939 | 0.0 | FR |
| 4 | 2023-01-01 04:00:00+00:00 | 14.15 | 57.980919 | 20.683559 | 0.0 | FR |

# 6. Conclusion

This notebook covered the raw data ingestion pipeline:
- How ENTSO-E and Open-Meteo data are fetched
- How yearly Parquet files are laid out (country=XX/year=YYYY)
- How retries, caching, pauses between requests, and timestamp deduplication keep ingestion reliable

Next step: preprocessing & merging (see src/preprocessing/build_preprocessed_dataset.py)