# Data searching 

Task: Find day-ahead electricity prices for Denmark. As discussed with Roman, these prices already reflect the different factors that affect them, so they are a straightforward basis for predicting el_price for tomorrow using the Darts library (https://unit8co.github.io/darts/). Darts is a Python library for user-friendly forecasting and anomaly detection on time series. The working assumption is that tomorrow's electricity price is largely determined by the previous day's prices.
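To make the working assumption concrete, here is a minimal sketch of a persistence baseline written with plain pandas rather than Darts. It forecasts each hour with the price observed 24 hours earlier. The price values are made up for illustration, and the 24-hour lag is an assumption about how "yesterday's price" would be used, not something fixed by the task.

```python
import pandas as pd

# Hypothetical hourly prices for two days (made-up values, illustration only).
index = pd.date_range("2020-01-01 00:00", periods=48, freq=pd.Timedelta(hours=1))
prices = pd.Series(range(48), index=index, dtype="float64", name="el_price")

# Persistence baseline: the forecast for each hour is the price observed
# at the same hour on the previous day (24 hourly steps earlier).
forecast = prices.shift(24)

# Compare forecast and actual values for the hours that have a forecast.
errors = (prices - forecast).dropna()
mae = errors.abs().mean()
print(f"Mean absolute error of the baseline: {mae:.2f}")
```

A baseline like this gives a reference error that any forecasting model trained later should improve on.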

Approach: Electricity Maps (https://www.electricitymaps.com) is an online platform that provides real-time information about the carbon intensity of electricity production and consumption in countries and regions around the world. It aims to raise awareness of the carbon intensity of electricity and to help individuals and organizations make more informed decisions about their energy consumption and carbon footprint. The platform fetches raw production data from public, free, and official sources. A brief, must-read explanation of how they trace the origin of electricity is available here: https://www.electricitymaps.com/blog/flow-tracing. Based on it, we assume that the electricity price in a given zone is strongly affected by that zone's imports and exports of electricity, which are in turn related to its supply and demand.

In their GitHub repo (https://github.com/electricitymaps/electricitymaps-contrib#readme), we found an FAQ entry that asks: Where does the data come from? It links to a readme file (https://github.com/electricitymaps/electricitymaps-contrib#readme) that explains all the different data sources, which allowed us to locate the data that suits our project.

As mentioned before, we need the day-ahead electricity price for Denmark (Zone 1 – DK1), which is provided by ENTSO-E (https://transparency.entsoe.eu/).

We downloaded the price data for 2020, 2021 and 2022 as CSV files.

# Load the data

In [10]:
# import
import pandas as pd
import glob

In [22]:
# collect all CSV files in the data folder (sorted, because glob returns them in arbitrary order)
csv_files = sorted(glob.glob('/Users/endikamichelenabanuelos/Desktop/AAU/M6/Semester project/M6_sem_project/Data/*.csv'))

In [23]:
# start from an empty DataFrame that will collect all files
df = pd.DataFrame()

In [24]:
# use a for loop to iterate through the CSV files and load each one into a DataFrame
for file in csv_files:
    temp_df = pd.read_csv(file)
    df = pd.concat([df, temp_df], ignore_index=True)

In [25]:
# reset the index of the DataFrame
df = df.reset_index(drop=True)
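
The loading cells above can also be written more compactly by reading all files into a list and concatenating once, which avoids copying the growing DataFrame on every iteration. The following is a sketch, not the notebook's actual code: `load_csv_folder` is a hypothetical helper, and the demo writes two throwaway files (the second price is made up) so the example runs without the real data folder.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV file in `folder` and stack them into one DataFrame."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not files:
        raise FileNotFoundError(f"No CSV files found in {folder}")
    frames = [pd.read_csv(path) for path in files]
    # ignore_index=True already yields a clean 0..n-1 index,
    # so no separate reset_index call is needed afterwards.
    return pd.concat(frames, ignore_index=True)


# Demo with two small temporary files instead of the real downloads.
demo_dir = tempfile.mkdtemp()
for year, price in [(2020, 33.42), (2021, 50.0)]:
    demo_frame = pd.DataFrame({
        "MTU (CET/CEST)": [f"01.01.{year} 00:00 - 01.01.{year} 01:00"],
        "Day-ahead Price [EUR/MWh]": [price],
    })
    demo_frame.to_csv(os.path.join(demo_dir, f"prices_{year}.csv"), index=False)

demo_df = load_csv_folder(demo_dir)
print(demo_df)
```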

In [27]:
# df
df

|  | MTU (CET/CEST) | Day-ahead Price [EUR/MWh] | Currency | BZN\|DK1 |
|---|---|---|---|---|
| 0 | 01.01.2020 00:00 - 01.01.2020 01:00 | 33.42 | EUR | NaN |
| 1 | 01.01.2020 01:00 - 01.01.2020 02:00 | 31.77 | EUR | NaN |
| 2 | 01.01.2020 02:00 - 01.01.2020 03:00 | 31.57 | EUR | NaN |
| 3 | 01.01.2020 03:00 - 01.01.2020 04:00 | 31.28 | EUR | NaN |
| 4 | 01.01.2020 04:00 - 01.01.2020 05:00 | 30.85 | EUR | NaN |
| ... | ... | ... | ... | ... |
| 26302 | 31.12.2022 19:00 - 31.12.2022 20:00 | 67.01 | EUR | NaN |
| 26303 | 31.12.2022 20:00 - 31.12.2022 21:00 | 40.50 | EUR | NaN |
| 26304 | 31.12.2022 21:00 - 31.12.2022 22:00 | 14.89 | EUR | NaN |
| 26305 | 31.12.2022 22:00 - 31.12.2022 23:00 | 9.94 | EUR | NaN |


In [22]:
# info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26307 entries, 0 to 26306
Data columns (total 4 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   MTU (CET/CEST)             26307 non-null  object 
 1   Day-ahead Price [EUR/MWh]  26304 non-null  float64
 2   Currency                   26304 non-null  object 
 3   BZN|DK1                    0 non-null      float64
dtypes: float64(2), object(2)
memory usage: 822.2+ KB
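
The summary above shows 26307 rows but only 26304 non-null prices, and a `BZN|DK1` column with no values at all. One possible cleanup, sketched here on a small stand-in frame, is to drop the empty column and the rows without a price. The sample rows are illustrative (the missing price in the second row is made up to mimic an empty row), and whether dropping is the right choice depends on why those rows are empty, which the output does not show.

```python
import numpy as np
import pandas as pd

PRICE_COL = "Day-ahead Price [EUR/MWh]"

# Small stand-in for the loaded frame; the second row mimics a row with no price.
sample = pd.DataFrame({
    "MTU (CET/CEST)": [
        "01.01.2020 00:00 - 01.01.2020 01:00",
        "01.01.2020 01:00 - 01.01.2020 02:00",
        "01.01.2020 02:00 - 01.01.2020 03:00",
    ],
    PRICE_COL: [33.42, np.nan, 31.57],
    "Currency": ["EUR", None, "EUR"],
    "BZN|DK1": [np.nan, np.nan, np.nan],
})

# Drop columns that contain no values at all (here: BZN|DK1).
cleaned = sample.dropna(axis="columns", how="all")

# Drop rows that have no price, then renumber the remaining rows.
cleaned = cleaned.dropna(subset=[PRICE_COL]).reset_index(drop=True)

print(cleaned)
```

Inspecting the rows with a missing price before dropping them (for example with `df[df[PRICE_COL].isna()]`) shows which timestamps are affected.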


In [37]:
# columns
df.columns

Index(['MTU (CET/CEST)', 'Day-ahead Price [EUR/MWh]', 'Currency', 'BZN|DK1'], dtype='object')

In [39]:
# describe
df.describe()

|  | Day-ahead Price [EUR/MWh] | BZN\|DK1 |
|---|---|---|
| count | 26304.0 | 0.0 |
| mean | 110.641208 | NaN |
| std | 122.782189 | NaN |
| min | -58.8 | NaN |
| 25% | 28.7675 | NaN |
| 50% | 63.34 | NaN |
| 75% | 152.835 | NaN |
| max | 871.0 | NaN |


# Data preprocessing

In [30]:
df

|  | MTU (CET/CEST) | Day-ahead Price [EUR/MWh] | Currency | BZN\|DK1 |
|---|---|---|---|---|
| 0 | 01.01.2020 00:00 - 01.01.2020 01:00 | 33.42 | EUR | NaN |
| 1 | 01.01.2020 01:00 - 01.01.2020 02:00 | 31.77 | EUR | NaN |
| 2 | 01.01.2020 02:00 - 01.01.2020 03:00 | 31.57 | EUR | NaN |
| 3 | 01.01.2020 03:00 - 01.01.2020 04:00 | 31.28 | EUR | NaN |
| 4 | 01.01.2020 04:00 - 01.01.2020 05:00 | 30.85 | EUR | NaN |
| ... | ... | ... | ... | ... |
| 26302 | 31.12.2022 19:00 - 31.12.2022 20:00 | 67.01 | EUR | NaN |
| 26303 | 31.12.2022 20:00 - 31.12.2022 21:00 | 40.50 | EUR | NaN |
| 26304 | 31.12.2022 21:00 - 31.12.2022 22:00 | 14.89 | EUR | NaN |
| 26305 | 31.12.2022 22:00 - 31.12.2022 23:00 | 9.94 | EUR | NaN |


In [35]:
# Split the datetime column into separate d

ValueError: Columns must be same length as key
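
pandas raises `Columns must be same length as key` when the number of columns named on the left of an assignment differs from the number of columns produced on the right. The failing cell is truncated above, so its exact cause is not visible. As a sketch under the assumption that the goal is to split `MTU (CET/CEST)` into an interval start and end, the following splits on the `" - "` separator and caps the split at two parts so the shapes always match. The column names `start` and `end` are hypothetical, and the timestamps are parsed as naive local times without handling daylight saving changes.

```python
import pandas as pd

MTU_COL = "MTU (CET/CEST)"
TIME_FORMAT = "%d.%m.%Y %H:%M"

# Two rows taken from the table above, used as a stand-in for the full frame.
sample = pd.DataFrame({
    MTU_COL: [
        "01.01.2020 00:00 - 01.01.2020 01:00",
        "31.12.2022 22:00 - 31.12.2022 23:00",
    ],
    "Day-ahead Price [EUR/MWh]": [33.42, 9.94],
})

# n=1 limits the split to one cut, so each row yields at most two parts
# and the result has two columns (0 and 1).
parts = sample[MTU_COL].str.split(" - ", n=1, expand=True)

# Assign each part to its own column instead of a list of keys,
# which keeps the left and right sides the same shape.
sample["start"] = pd.to_datetime(parts[0], format=TIME_FORMAT)
sample["end"] = pd.to_datetime(parts[1], format=TIME_FORMAT)

print(sample[["start", "end"]])
```

With a parsed `start` column in place, it can be set as the index (`sample.set_index("start")`) to obtain a time-indexed price series.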