## Data aggregation of energy market prices in bidding zone NO1 (Norway Østlandet) and neighbouring zones for 2022

### dependencies

In [1]:
# importing dependencies to aggregate dataset
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from functools import reduce # used to merge the dataset

### importing price data from each zone

In [2]:
prices_no1 = pd.read_csv("../datasets/prices/NO1 Day-ahead Prices_202201010000-202301010000.csv")
prices_no2 = pd.read_csv("../datasets/prices/NO2 Day-ahead Prices_202201010000-202301010000.csv")
prices_no3 = pd.read_csv("../datasets/prices/NO3 Day-ahead Prices_202201010000-202301010000.csv")
prices_no5 = pd.read_csv("../datasets/prices/NO5 Day-ahead Prices_202201010000-202301010000.csv")
prices_se3 = pd.read_csv("../datasets/prices/SE3 Day-ahead Prices_202201010000-202301010000.csv")
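
The five `read_csv` calls differ only in the zone code, so they could also be written as a loop. The following is a minimal sketch rather than the notebook's own method: `PRICE_DIR`, `ZONES`, `price_path` and `load_prices` are hypothetical names, and it assumes the files keep the naming pattern used in the cell above.

```python
from pathlib import Path

import pandas as pd

# Assumption: same directory and file-name pattern as the cell above.
PRICE_DIR = Path("../datasets/prices")
ZONES = ("NO1", "NO2", "NO3", "NO5", "SE3")


def price_path(zone: str, directory: Path = PRICE_DIR) -> Path:
    """Build the path of the 2022 day-ahead price file for one zone."""
    return directory / f"{zone} Day-ahead Prices_202201010000-202301010000.csv"


def load_prices(zones=ZONES, directory: Path = PRICE_DIR) -> dict:
    """Read one DataFrame per zone, keyed by zone code."""
    return {zone: pd.read_csv(price_path(zone, directory)) for zone in zones}
```

With this helper, `load_prices()["NO1"]` would play the role of `prices_no1`.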

Printing the first rows of each dataset to inspect its structure.

In [3]:
prices_no1.head()

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh],Currency,BZN|NO1
0,01.01.2022 00:00 - 01.01.2022 01:00,132.89,EUR,
1,01.01.2022 01:00 - 01.01.2022 02:00,129.3,EUR,
2,01.01.2022 02:00 - 01.01.2022 03:00,132.08,EUR,
3,01.01.2022 03:00 - 01.01.2022 04:00,111.44,EUR,
4,01.01.2022 04:00 - 01.01.2022 05:00,112.35,EUR,


In [4]:
prices_no2.head()

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh],Currency,BZN|NO2
0,01.01.2022 00:00 - 01.01.2022 01:00,132.89,EUR,
1,01.01.2022 01:00 - 01.01.2022 02:00,129.3,EUR,
2,01.01.2022 02:00 - 01.01.2022 03:00,132.08,EUR,
3,01.01.2022 03:00 - 01.01.2022 04:00,111.44,EUR,
4,01.01.2022 04:00 - 01.01.2022 05:00,112.35,EUR,


In [5]:
prices_no3.head()

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh],Currency,BZN|NO3
0,01.01.2022 00:00 - 01.01.2022 01:00,46.6,EUR,
1,01.01.2022 01:00 - 01.01.2022 02:00,41.33,EUR,
2,01.01.2022 02:00 - 01.01.2022 03:00,42.18,EUR,
3,01.01.2022 03:00 - 01.01.2022 04:00,44.37,EUR,
4,01.01.2022 04:00 - 01.01.2022 05:00,37.67,EUR,


In [6]:
prices_no5.head()

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh],Currency,BZN|NO5
0,01.01.2022 00:00 - 01.01.2022 01:00,132.89,EUR,
1,01.01.2022 01:00 - 01.01.2022 02:00,129.3,EUR,
2,01.01.2022 02:00 - 01.01.2022 03:00,132.08,EUR,
3,01.01.2022 03:00 - 01.01.2022 04:00,111.44,EUR,
4,01.01.2022 04:00 - 01.01.2022 05:00,112.35,EUR,


In [7]:
prices_se3.head()

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh],Currency,BZN|SE3
0,01.01.2022 00:00 - 01.01.2022 01:00,46.6,EUR,
1,01.01.2022 01:00 - 01.01.2022 02:00,41.33,EUR,
2,01.01.2022 02:00 - 01.01.2022 03:00,42.18,EUR,
3,01.01.2022 03:00 - 01.01.2022 04:00,44.37,EUR,
4,01.01.2022 04:00 - 01.01.2022 05:00,37.67,EUR,


### preparation for concatenation of the price data from the different zones:
The datasets will be merged on their shared date-time column, "MTU (CET/CEST)". The "Currency" column is redundant because the name of the price column already states the currency (EUR), so it can be dropped. Each dataset also has a column named after its zone (for example "BZN|NO1"), but that column holds no values for any entry. The zone is therefore added to the name of the price column, and the empty zone column is dropped.

In [8]:
# renaming price columns
prices_no1 = prices_no1.rename(columns={'Day-ahead Price [EUR/MWh]': 'Day-ahead Price [EUR/MWh] BZN|NO1'})
prices_no2 = prices_no2.rename(columns={'Day-ahead Price [EUR/MWh]': 'Day-ahead Price [EUR/MWh] BZN|NO2'})
prices_no3 = prices_no3.rename(columns={'Day-ahead Price [EUR/MWh]': 'Day-ahead Price [EUR/MWh] BZN|NO3'})
prices_no5 = prices_no5.rename(columns={'Day-ahead Price [EUR/MWh]': 'Day-ahead Price [EUR/MWh] BZN|NO5'})
prices_se3 = prices_se3.rename(columns={'Day-ahead Price [EUR/MWh]': 'Day-ahead Price [EUR/MWh] BZN|SE3'})

In [9]:
# dropping redundant columns
prices_no1 = prices_no1.drop(['Currency', 'BZN|NO1'], axis=1)
prices_no2 = prices_no2.drop(['Currency', 'BZN|NO2'], axis=1)
prices_no3 = prices_no3.drop(['Currency', 'BZN|NO3'], axis=1)
prices_no5 = prices_no5.drop(['Currency', 'BZN|NO5'], axis=1)
prices_se3 = prices_se3.drop(['Currency', 'BZN|SE3'], axis=1)
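
The rename and drop steps above are identical for every zone, so they can be written once. The following is a minimal sketch on a toy frame that mimics the columns shown in the `head()` outputs; `prepare_zone` and `PRICE_COL` are hypothetical names, not part of the notebook.

```python
import pandas as pd

PRICE_COL = "Day-ahead Price [EUR/MWh]"


def prepare_zone(df: pd.DataFrame, zone: str) -> pd.DataFrame:
    """Move the zone code into the price column name and drop the redundant columns."""
    return (
        df.rename(columns={PRICE_COL: f"{PRICE_COL} BZN|{zone}"})
          .drop(columns=["Currency", f"BZN|{zone}"])
    )


# Toy data with the same columns as the NO1 file (values taken from the output above).
toy_no1 = pd.DataFrame({
    "MTU (CET/CEST)": ["01.01.2022 00:00 - 01.01.2022 01:00",
                       "01.01.2022 01:00 - 01.01.2022 02:00"],
    PRICE_COL: [132.89, 129.3],
    "Currency": ["EUR", "EUR"],
    "BZN|NO1": [None, None],
})

prepared = prepare_zone(toy_no1, "NO1")
print(list(prepared.columns))
```

Both `rename` and `drop` return new frames, so the input frame is left unchanged.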

In [10]:
# merging the price data into one dataframe for prices in all zones
price_data_frames = [prices_no1, prices_no2, prices_no3, prices_no5, prices_se3]
prices = reduce(lambda left, right: pd.merge(left, right, on=['MTU (CET/CEST)']), price_data_frames)

# prices = pd.concat([prices_no1, prices_no2, prices_no3, prices_no5, prices_se3]) # concatenating the dataframes

prices.head()

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh] BZN|NO1,Day-ahead Price [EUR/MWh] BZN|NO2,Day-ahead Price [EUR/MWh] BZN|NO3,Day-ahead Price [EUR/MWh] BZN|NO5,Day-ahead Price [EUR/MWh] BZN|SE3
0,01.01.2022 00:00 - 01.01.2022 01:00,132.89,132.89,46.6,132.89,46.6
1,01.01.2022 01:00 - 01.01.2022 02:00,129.3,129.3,41.33,129.3,41.33
2,01.01.2022 02:00 - 01.01.2022 03:00,132.08,132.08,42.18,132.08,42.18
3,01.01.2022 03:00 - 01.01.2022 04:00,111.44,111.44,44.37,111.44,44.37
4,01.01.2022 04:00 - 01.01.2022 05:00,112.35,112.35,37.67,112.35,37.67
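
`pd.merge` performs an inner join by default, so the merged frame only keeps rows whose "MTU (CET/CEST)" label appears in every zone, and a label that occurs more than once is paired with every matching row on the other side. The sketch below uses made-up toy data to show how `validate="one_to_one"` makes pandas raise an error instead of silently multiplying rows. Whether the real files contain a repeated label (for example at the switch back from summer time in October) is an assumption to check, not something the output above shows.

```python
from functools import reduce

import pandas as pd

KEY = "MTU (CET/CEST)"

# Toy frames; the repeated label "t1" is made up for illustration.
a = pd.DataFrame({KEY: ["t0", "t1", "t1"], "price A": [1.0, 2.0, 2.5]})
b = pd.DataFrame({KEY: ["t0", "t1", "t1"], "price B": [10.0, 20.0, 20.5]})

# Plain merge: the duplicated key "t1" yields 2 x 2 = 4 rows, plus 1 row for "t0".
plain = reduce(lambda left, right: pd.merge(left, right, on=KEY), [a, b])
print(len(plain))

# Strict merge: pandas checks that the key is unique on both sides.
try:
    pd.merge(a, b, on=KEY, validate="one_to_one")
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True
print(duplicates_detected)
```

Passing `validate="one_to_one"` inside the `reduce` lambda would apply the same check at every merge step.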


In [18]:
# TODO: handle missing values. NB! Switch to the final dataset, not the one for prices!

In [20]:
# checking data for missing values, if any
prices.isna().sum()

MTU (CET/CEST)                       0
Day-ahead Price [EUR/MWh] BZN|NO1    1
Day-ahead Price [EUR/MWh] BZN|NO2    1
Day-ahead Price [EUR/MWh] BZN|NO3    1
Day-ahead Price [EUR/MWh] BZN|NO5    1
Day-ahead Price [EUR/MWh] BZN|SE3    1
dtype: int64

In [17]:
# checking which row is missing values
null_data = prices[prices.isnull().any(axis=1)]

In [16]:
null_data

Unnamed: 0,MTU (CET/CEST),Day-ahead Price [EUR/MWh] BZN|NO1,Day-ahead Price [EUR/MWh] BZN|NO2,Day-ahead Price [EUR/MWh] BZN|NO3,Day-ahead Price [EUR/MWh] BZN|NO5,Day-ahead Price [EUR/MWh] BZN|SE3
2042,27.03.2022 02:00 - 27.03.2022 03:00,,,,,


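This row is a result of the switch from winter time (CET) to summer time (CEST): on 27 March 2022 the clocks moved forward from 02:00 to 03:00, so the interval 02:00 - 03:00 does not exist in local time and has no price in any zone.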
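
One way to confirm this is to parse the start of each interval and localise it to the local time zone, since a wall-clock time skipped by the clock change cannot be localised. The sketch below uses toy rows with made-up prices rather than the real data, and it assumes that `Europe/Oslo` is the appropriate time zone for the "MTU (CET/CEST)" column. Dropping the row is one possible treatment, not necessarily the one the final dataset should use.

```python
import numpy as np
import pandas as pd

KEY = "MTU (CET/CEST)"

# Toy rows around the spring clock change (prices are made up).
toy = pd.DataFrame({
    KEY: ["27.03.2022 01:00 - 27.03.2022 02:00",
          "27.03.2022 02:00 - 27.03.2022 03:00",
          "27.03.2022 03:00 - 27.03.2022 04:00"],
    "price": [100.0, np.nan, 110.0],
})

# Parse the interval start as a naive timestamp.
start = pd.to_datetime(toy[KEY].str.split(" - ").str[0], format="%d.%m.%Y %H:%M")

# Localise; wall-clock times skipped by the clock change become NaT.
local_start = start.dt.tz_localize("Europe/Oslo", nonexistent="NaT")
print(local_start.isna().tolist())

# Keep only rows whose start time exists on the local clock.
cleaned = toy[local_start.notna()].reset_index(drop=True)
print(len(cleaned))
```

With the default `ambiguous="raise"`, the same call would raise on a repeated hour at the autumn clock change, so real data covering October would need that case handled separately.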