## Electricity Load Forecasting

In [5]:
import os
os.listdir()

['RO_2024.csv',
 'RO_2025.csv',
 '.ipynb_checkpoints',
 'RO_2023.csv',
 'electricity_forecast.ipynb']

### Data Loading
**Source:** https://transparency.entsoe.eu/

The dataset contains the quarter-hourly electricity load for Romania for the years 2023-2025. It was obtained from the ENTSO-E Transparency Platform, the official data portal of the European Network of Transmission System Operators for Electricity.

In [23]:
import pandas as pd

df_2023 = pd.read_csv("RO_2023.csv")
df_2024 = pd.read_csv("RO_2024.csv")
df_2025 = pd.read_csv("RO_2025.csv")
df = pd.concat([df_2023,df_2024,df_2025], ignore_index = True)
print("Rows and columns:", df.shape)
df.head()

Rows and columns: (105216, 4)


Unnamed: 0,MTU (CET/CEST),Area,Actual Total Load (MW),Day-ahead Total Load Forecast (MW)
0,01/01/2023 00:00 - 01/01/2023 00:15,Romania (RO),5074.0,5310.0
1,01/01/2023 00:15 - 01/01/2023 00:30,Romania (RO),5016.0,5220.0
2,01/01/2023 00:30 - 01/01/2023 00:45,Romania (RO),4963.0,5150.0
3,01/01/2023 00:45 - 01/01/2023 01:00,Romania (RO),4910.0,5090.0
4,01/01/2023 01:00 - 01/01/2023 01:15,Romania (RO),4881.0,5030.0
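
As a quick sanity check on the reported shape, the expected number of quarter-hourly rows can be worked out from the calendar alone. This is a minimal sketch, assuming the three files together cover every day from 1 January 2023 to 31 December 2025 with no missing intervals; the variable names are illustrative.

```python
from datetime import date

# Number of calendar days in 2023-2025 (2024 is a leap year).
days = (date(2026, 1, 1) - date(2023, 1, 1)).days

# 24 hours per day, 4 quarter-hour intervals per hour.
expected_rows = days * 24 * 4

print(days, expected_rows)  # 1096 105216
```

The result matches the 105216 rows printed above, which suggests the concatenated frame has no missing or surplus intervals overall.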


### Data Preprocessing

In [25]:
df["start_time_str"] = df["MTU (CET/CEST)"].str.split(" - ").str[0]
df[["MTU (CET/CEST)","start_time_str"]].head()

Unnamed: 0,MTU (CET/CEST),start_time_str
0,01/01/2023 00:00 - 01/01/2023 00:15,01/01/2023 00:00
1,01/01/2023 00:15 - 01/01/2023 00:30,01/01/2023 00:15
2,01/01/2023 00:30 - 01/01/2023 00:45,01/01/2023 00:30
3,01/01/2023 00:45 - 01/01/2023 01:00,01/01/2023 00:45
4,01/01/2023 01:00 - 01/01/2023 01:15,01/01/2023 01:00


In [32]:
df.loc[
    df["MTU (CET/CEST)"].str.contains("CET|CEST"),
    ["MTU (CET/CEST)", "start_time_str"]
].sample(3, random_state = 1)

Unnamed: 0,MTU (CET/CEST),start_time_str
98790,26/10/2025 02:30 (CEST) - 26/10/2025 02:45 (CEST),26/10/2025 02:30
98794,26/10/2025 02:30 (CET) - 26/10/2025 02:45 (CET),26/10/2025 02:30
98788,26/10/2025 02:00 (CEST) - 26/10/2025 02:15 (CEST),26/10/2025 02:00
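
In the sample above, the CEST and CET rows for 02:30 share the same `start_time_str`, so the label is the only thing that tells the two apart. One possible way to keep them distinct is to use the label as a daylight-saving flag when localizing the timestamps. The sketch below is not part of the original pipeline: the sample rows are made up for illustration, and `Europe/Berlin` is an assumed stand-in for the CET/CEST zone named in the column header.

```python
import pandas as pd

# Illustrative rows around the autumn clock change (not read from the CSV files).
sample = pd.Series([
    "26/10/2025 01:45 - 26/10/2025 02:00",
    "26/10/2025 02:00 (CEST) - 26/10/2025 02:15 (CEST)",
    "26/10/2025 02:30 (CEST) - 26/10/2025 02:45 (CEST)",
    "26/10/2025 02:00 (CET) - 26/10/2025 02:15 (CET)",
    "26/10/2025 02:30 (CET) - 26/10/2025 02:45 (CET)",
    "26/10/2025 03:00 - 26/10/2025 03:15",
])

start = sample.str.split(" - ").str[0]

# True where the row is explicitly labelled as summer time (CEST).
is_dst = start.str.contains("(CEST)", regex=False)

# Parse the label-free strings into naive timestamps.
naive = pd.to_datetime(
    start.str.replace(r" \((?:CET|CEST)\)", "", regex=True),
    format="%d/%m/%Y %H:%M",
)

# Localize, using the flag only to resolve the ambiguous (repeated) hour.
# "Europe/Berlin" is an assumption: any zone that follows CET/CEST would do.
local = pd.DatetimeIndex(naive).tz_localize("Europe/Berlin", ambiguous=is_dst.to_numpy())
utc = local.tz_convert("UTC")

print(naive.duplicated().sum())  # naive timestamps that collide
print(utc.is_unique)             # UTC timestamps stay distinct
```

Working in UTC keeps every quarter-hour interval on its own timestamp, at the cost of an index that no longer reads as local clock time.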


In [35]:
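# Strip the "(CET)"/"(CEST)" labels seen on the clock-change rows above.
# Note: without the labels, the repeated hour maps to duplicate naive timestamps.
df["start_time_str"] = df["start_time_str"].str.replace(r" \(CET\)| \(CEST\)", "", regex=True)
# The dates are in day/month/year order, hence dayfirst=True.
df["start_time"] = pd.to_datetime(df["start_time_str"], dayfirst=True)
df[["start_time_str", "start_time"]].head()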

start_time,start_time_str,start_time
2023-01-01 00:00:00,01/01/2023 00:00,2023-01-01 00:00:00
2023-01-01 00:15:00,01/01/2023 00:15,2023-01-01 00:15:00
2023-01-01 00:30:00,01/01/2023 00:30,2023-01-01 00:30:00
2023-01-01 00:45:00,01/01/2023 00:45,2023-01-01 00:45:00
2023-01-01 01:00:00,01/01/2023 01:00,2023-01-01 01:00:00


In [36]:
df = df.set_index("start_time")
df.index[:5]

DatetimeIndex(['2023-01-01 00:00:00', '2023-01-01 00:15:00',
               '2023-01-01 00:30:00', '2023-01-01 00:45:00',
               '2023-01-01 01:00:00'],
              dtype='datetime64[ns]', name='start_time', freq=None)

In [38]:
df["Actual Total Load (MW)"] = pd.to_numeric(df["Actual Total Load (MW)"], errors = "coerce")
df["Day-ahead Total Load Forecast (MW)"] = pd.to_numeric(df["Day-ahead Total Load Forecast (MW)"], errors="coerce")
df[["Actual Total Load (MW)","Day-ahead Total Load Forecast (MW)"]].dtypes

Actual Total Load (MW)                float64
Day-ahead Total Load Forecast (MW)    float64
dtype: object

The quarter-hourly values are aggregated into hourly loads for use in the subsequent predictive model.

The ENTSO-E data reports power in megawatts (MW), which is the rate at which electricity is being consumed, not an amount of energy. The appropriate way to aggregate quarter-hourly values to hourly values is therefore to take the mean over each hour, not the sum.

In [42]:
hourly = df[["Actual Total Load (MW)","Day-ahead Total Load Forecast (MW)"]].resample("h").mean()
hourly.head()

start_time,Actual Total Load (MW),Day-ahead Total Load Forecast (MW)
2023-01-01 00:00:00,4990.75,5192.5
2023-01-01 01:00:00,4814.5,4967.5
2023-01-01 02:00:00,4663.75,4787.5
2023-01-01 03:00:00,4566.75,4630.0
2023-01-01 04:00:00,4520.75,4580.0
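
The choice of the mean can be checked by hand on the first hour of the data. The sketch below uses the four quarter-hourly actual loads shown in the first table and assumes each value holds for its full 15-minute interval; the variable names are illustrative.

```python
# Quarter-hourly actual load (MW) for 2023-01-01 00:00-01:00, from the table above.
quarter_hour_mw = [5074.0, 5016.0, 4963.0, 4910.0]
interval_h = 0.25  # each reading covers a quarter of an hour

# Energy consumed in the hour: power times duration, summed over the intervals.
energy_mwh = sum(p * interval_h for p in quarter_hour_mw)

# Average power over the hour.
mean_mw = sum(quarter_hour_mw) / len(quarter_hour_mw)

# Summing the MW readings directly would overstate the hour by a factor of 4.
naive_sum = sum(quarter_hour_mw)

print(energy_mwh, mean_mw, naive_sum)  # 4990.75 4990.75 19963.0
```

The mean of 4990.75 MW matches the first row of `hourly`, and it is numerically equal to the energy consumed in that hour in MWh.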


In [43]:
hourly.index.min(), hourly.index.max(), hourly.shape

(Timestamp('2023-01-01 00:00:00'),
 Timestamp('2025-12-31 23:00:00'),
 (26304, 2))
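
The shape and the index range alone do not show whether any hourly bin is empty, because `resample` still emits a row for an hour that has no readings. The sketch below illustrates this on a made-up quarter-hourly series with one hour missing, as would happen on naive local timestamps at the spring clock change; the values and the name `toy` are hypothetical.

```python
import pandas as pd

# Toy quarter-hourly series: the 02:00-03:00 hour is absent from the index.
idx = pd.to_datetime([
    "2025-03-30 01:00", "2025-03-30 01:15", "2025-03-30 01:30", "2025-03-30 01:45",
    "2025-03-30 03:00", "2025-03-30 03:15", "2025-03-30 03:30", "2025-03-30 03:45",
])
toy = pd.Series([100.0, 104.0, 108.0, 112.0, 120.0, 124.0, 128.0, 132.0], index=idx)

# Hourly mean: the empty 02:00 bin is kept as a row with NaN.
toy_hourly = toy.resample("h").mean()

# Hours with no data at all.
empty_hours = toy_hourly[toy_hourly.isna()].index

print(toy_hourly)
print(list(empty_hours))
```

Applying the same `isna()` check to `hourly` would show whether the full dataset contains such empty hours before it is passed to a model.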