# 01 - Data Pipeline for French Energy Data

This notebook aims to:

1. Explore historical hourly electricity consumption data (ENTSO-E) and weather data (Open-Meteo) for France.
2. Assess data quality (missing values, duplicates, temporal consistency).
3. Visualize the main trends in electricity demand and weather features.
4. Validate and understand the features that will be used for the demand forecasting project.

All datasets are stored in **Parquet format** for compact storage and fast reads.

## 1. Imports and setup

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import sys
from pathlib import Path

In [2]:
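# Add the project root to sys.path for module imports.
# Assumes the notebook runs from a direct subdirectory of the project root.
PROJECT_ROOT = Path().resolve().parent
if str(PROJECT_ROOT) not in sys.path:  # avoid duplicate entries on re-run
    sys.path.append(str(PROJECT_ROOT))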

In [3]:
from src.ingestion.import_entsoe import get_entsoe_load
from src.ingestion.import_meteo import get_openmeteo_data

In [4]:
# Parameters
start_date = "2025-01-01"
end_date = "2025-01-31"
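As a reference point for the later quality checks, the hourly grid implied by these parameters can be built directly. This is a minimal sketch that assumes `end_date` is inclusive and that timestamps are in UTC (as the outputs below suggest); `expected_index` is a name introduced here for illustration.

```python
import pandas as pd

start_date = "2025-01-01"
end_date = "2025-01-31"

# Assumption: end_date is inclusive, so the last expected stamp is 23:00 on that day.
expected_index = pd.date_range(
    start=pd.Timestamp(start_date, tz="UTC"),
    end=pd.Timestamp(end_date, tz="UTC") + pd.Timedelta(hours=23),
    freq=pd.Timedelta(hours=1),
)

# 31 days x 24 hours
print(len(expected_index))
```

Comparing a fetched frame's row count against `len(expected_index)` gives a quick first signal of missing or extra timestamps.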

## 2. Data collection (APIs)

### 2.1. Load data

In [6]:
# Fetch ENTSO-E load data for France ("10YFR-RTE------C" is the EIC area code for France)
df_load_raw = get_entsoe_load(start_date, end_date, country_code="10YFR-RTE------C")

Fetching: 2025-01-01 to 2025-01-31 ...


In [7]:
df_load_raw.head()

,datetime,load_MW
0,2025-01-01 00:00:00+00:00,61780.02
1,2025-01-01 01:00:00+00:00,61542.76
2,2025-01-01 02:00:00+00:00,61594.48
3,2025-01-01 03:00:00+00:00,61084.92
4,2025-01-01 04:00:00+00:00,61409.19


In [8]:
df_load_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2976 entries, 0 to 2975
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype              
---  ------    --------------  -----              
 0   datetime  2976 non-null   datetime64[ns, UTC]
 1   load_MW   2976 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 46.6 KB
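The `info()` output lists 2976 rows, which is four times the 744 hours in January 2025 (31 × 24), while the first rows shown by `head()` step hourly. That could point to repeated timestamps, mixed resolutions, or unsorted sub-hourly data; the outputs here do not settle it. The sketch below shows one way to diagnose the time axis. `summarize_time_axis` is a hypothetical helper, and the toy frame is made up for illustration, not real ENTSO-E data.

```python
import pandas as pd


def summarize_time_axis(df: pd.DataFrame, time_col: str = "datetime") -> dict:
    """Return basic diagnostics for the timestamp column of a frame."""
    ts = df[time_col]
    # Step sizes between consecutive unique timestamps, after sorting
    steps = ts.drop_duplicates().sort_values().diff().dropna()
    return {
        "rows": len(df),
        "unique_timestamps": int(ts.nunique()),
        "duplicated_timestamps": int(ts.duplicated().sum()),
        "is_sorted": bool(ts.is_monotonic_increasing),
        "step_counts": steps.value_counts().to_dict(),
    }


# Toy frame (illustrative only): three hourly stamps, one of them repeated
toy = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2025-01-01 00:00",
                "2025-01-01 01:00",
                "2025-01-01 01:00",
                "2025-01-01 02:00",
            ],
            utc=True,
        ),
        "load_MW": [100.0, 110.0, 110.0, 120.0],
    }
)

report = summarize_time_axis(toy)
print(report)
```

Calling `summarize_time_axis(df_load_raw)` would show whether the extra rows come from duplicates or from a finer time step.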


In [9]:
df_load_raw.describe()

,load_MW
count,2976.0
mean,65647.649684
std,7542.060686
min,46949.39
25%,60342.35
50%,65633.07
75%,70470.955
max,86918.07


In [10]:
# Save data
df_load_raw.to_parquet(path=PROJECT_ROOT / "data/raw/entsoe/france_load_2025_01.parquet", index=True)
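Writing to a nested path fails if the target directory does not exist yet. Whether `data/raw/entsoe/` is already present in this project is not shown, so the following is only a sketch of a guard; `ensure_parent_dir` is a hypothetical helper, and the demonstration uses a temporary directory rather than the real project tree.

```python
from pathlib import Path
import tempfile


def ensure_parent_dir(path: Path) -> Path:
    """Create the parent directories of `path` if needed, then return `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(
        Path(tmp) / "data/raw/entsoe/france_load_2025_01.parquet"
    )
    parent_exists = target.parent.is_dir()

print(parent_exists)
```

In the notebook, the helper could wrap the path passed to `to_parquet`, for example `df_load_raw.to_parquet(ensure_parent_dir(PROJECT_ROOT / "data/raw/entsoe/france_load_2025_01.parquet"), index=True)`.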

### 2.2. Weather data

In [11]:
# Import Open-Meteo weather data for Paris (France)
df_weather_raw = get_openmeteo_data(start_date, end_date)

Fetching: 2025-01-01 to 2025-01-31 ...


In [12]:
df_weather_raw.head()

,datetime,temperature_2m,relative_humidity_2m,wind_speed_10m,shortwave_radiation_instant
0,2025-01-01 00:00:00+00:00,4.35,95.541069,14.734735,0.0
1,2025-01-01 01:00:00+00:00,4.5,97.238068,16.580532,0.0
2,2025-01-01 02:00:00+00:00,4.15,95.197891,16.516901,0.0
3,2025-01-01 03:00:00+00:00,3.95,94.520172,15.215058,0.0
4,2025-01-01 04:00:00+00:00,4.0,92.865906,15.676542,0.0


In [13]:
df_weather_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 744 entries, 0 to 743
Data columns (total 5 columns):
 #   Column                       Non-Null Count  Dtype              
---  ------                       --------------  -----              
 0   datetime                     744 non-null    datetime64[ns, UTC]
 1   temperature_2m               744 non-null    float32            
 2   relative_humidity_2m         744 non-null    float32            
 3   wind_speed_10m               744 non-null    float32            
 4   shortwave_radiation_instant  744 non-null    float32            
dtypes: datetime64[ns, UTC](1), float32(4)
memory usage: 17.6 KB


In [14]:
df_weather_raw.describe()

,temperature_2m,relative_humidity_2m,wind_speed_10m,shortwave_radiation_instant
count,744.0,744.0,744.0,744.0
mean,4.053158,88.190857,11.791069,34.960403
std,3.970848,8.093206,7.853233,68.630112
min,-4.0,63.722954,0.254558,0.0
25%,0.9375,82.655792,5.472977,0.0
50%,3.55,89.685211,8.874924,0.0
75%,7.3625,94.878494,18.778969,35.046463
max,12.9,100.0,34.762844,336.201874
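The summary statistics can be complemented by an explicit range check per variable. The sketch below counts values outside loose physical bounds. The bounds are illustrative assumptions (they presume Open-Meteo's default units, which this notebook does not confirm), and `out_of_bounds_counts` is a hypothetical helper tested on a made-up frame.

```python
import pandas as pd

# Hypothetical bounds for illustration; adjust to the actual units and region.
BOUNDS = {
    "temperature_2m": (-50.0, 60.0),
    "relative_humidity_2m": (0.0, 100.0),
    "wind_speed_10m": (0.0, 250.0),
    "shortwave_radiation_instant": (0.0, 1500.0),
}


def out_of_bounds_counts(df: pd.DataFrame, bounds: dict = BOUNDS) -> dict:
    """Count values outside [low, high] for each bounded column present in df."""
    counts = {}
    for col, (low, high) in bounds.items():
        if col in df.columns:
            # between() is False for NaN, so missing values are counted as well
            counts[col] = int((~df[col].between(low, high)).sum())
    return counts


# Toy frame (illustrative only): one implausible value per column
toy_weather = pd.DataFrame(
    {
        "temperature_2m": [4.35, 4.5, 999.0],
        "relative_humidity_2m": [95.5, 101.0, 90.0],
    }
)

violations = out_of_bounds_counts(toy_weather)
print(violations)
```

Running `out_of_bounds_counts(df_weather_raw)` and expecting all zeros would be one simple quality gate before feature engineering.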


In [15]:
# Save data
df_weather_raw.to_parquet(path=PROJECT_ROOT / "data/raw/openmeteo/france_weather_2025_01.parquet", index=True)
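Since the two raw frames have different row counts (2976 versus 744), they cannot be joined row by row. One possible way to align them is to aggregate load to hourly means and merge on `datetime`. This is a sketch under the unverified assumption that the load series is sub-hourly; `to_hourly_mean` is a hypothetical helper and the toy frames are made up.

```python
import pandas as pd


def to_hourly_mean(df: pd.DataFrame, time_col: str = "datetime") -> pd.DataFrame:
    """Average all numeric columns within each hour, keyed on time_col."""
    return (
        df.set_index(time_col)
        .sort_index()
        .resample(pd.Timedelta(hours=1))
        .mean()
        .reset_index()
    )


# Toy load at 15-minute steps (illustrative only): two full hours
toy_load = pd.DataFrame(
    {
        "datetime": pd.date_range(
            "2025-01-01", periods=8, freq=pd.Timedelta(minutes=15), tz="UTC"
        ),
        "load_MW": [100.0, 102.0, 104.0, 106.0, 110.0, 112.0, 114.0, 116.0],
    }
)

# Toy hourly weather covering the same two hours
toy_weather = pd.DataFrame(
    {
        "datetime": pd.date_range(
            "2025-01-01", periods=2, freq=pd.Timedelta(hours=1), tz="UTC"
        ),
        "temperature_2m": [4.35, 4.5],
    }
)

# validate="one_to_one" raises if either side has duplicate timestamps
merged = to_hourly_mean(toy_load).merge(
    toy_weather, on="datetime", how="inner", validate="one_to_one"
)
print(merged)
```

The `validate="one_to_one"` argument makes the merge fail loudly on duplicated timestamps instead of silently multiplying rows.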