# Data Preparation for the Nord_H2ub Spine Model

This Jupyter notebook contains all routines for preparing the input data sources into a single input data file for the model in Spine.

**Authors:** Johannes Giehl (jfg.eco@cbs.dk)

## Import of packages

In [2]:
import numpy as np
import pandas as pd

## General settings

In [44]:
year = 2019   #change to desired year
start_date = pd.Timestamp(f'{year}-01-01 00:00:00')
end_date = pd.Timestamp(f'{year}-12-31 23:00:00')

area = 'DK1'   #change to desired area

## File paths

In [4]:
#set path to correct folders

excel_file_path = '../Input_data/Input_raw/'

In [5]:
#set name of the relevant files

PV_data_availabilityfactors = f'PV_availability_factors_Kasso_{year}.xlsx'
PV_data_powerprices = f'Day_ahead_prices_{year}.xlsx'
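
The folder and file names above can also be combined with `pathlib`, which makes it easy to check that both inputs exist before any import is attempted. This is only a sketch: the names `excel_dir`, `pv_file`, `price_file` and `missing` are illustrative and do not appear elsewhere in the notebook.

```python
from pathlib import Path

year = 2019
excel_dir = Path('../Input_data/Input_raw')

# Same file names as above, joined to the folder with the / operator
pv_file = excel_dir / f'PV_availability_factors_Kasso_{year}.xlsx'
price_file = excel_dir / f'Day_ahead_prices_{year}.xlsx'

# Collect missing inputs up front instead of failing later inside pd.read_excel
missing = [str(p) for p in (pv_file, price_file) if not p.exists()]
```

An empty `missing` list means both raw files were found relative to the current working directory.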

## Workflow of the data preparation

- general parameters
- data import
- data adjustments
- final data settings
- Excel/CSV export


### General parameters

In [6]:
#date index
date_index = pd.date_range(start=start_date, end=end_date, freq='H')
formatted_dates = date_index.strftime('%Y-%m-%dT%H:%M:%S')
df_formatted_dates = pd.DataFrame(formatted_dates, columns=['DateTime'])

df_time = pd.DataFrame(df_formatted_dates)
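
As a quick sanity check on the date index, the sketch below rebuilds the hourly range and counts the time steps per year. The helper `build_time_frame` is hypothetical and not part of the notebook. It passes a `pd.Timedelta` as the frequency, which is an assumption made here to sidestep the `'H'` versus `'h'` alias difference between pandas versions.

```python
import pandas as pd

def build_time_frame(year: int) -> pd.DataFrame:
    """Return one ISO-formatted timestamp per hour of the given year."""
    start = pd.Timestamp(f'{year}-01-01 00:00:00')
    end = pd.Timestamp(f'{year}-12-31 23:00:00')
    # A Timedelta frequency works regardless of which hourly alias pandas expects
    index = pd.date_range(start=start, end=end, freq=pd.Timedelta(hours=1))
    return pd.DataFrame({'DateTime': index.strftime('%Y-%m-%dT%H:%M:%S')})

df_time_check = build_time_frame(2019)
n_steps_2019 = len(df_time_check)            # 365 days * 24 hours
n_steps_2020 = len(build_time_frame(2020))   # leap year: 366 days * 24 hours
```

Comparing the number of time steps with the number of rows in each imported file is a cheap way to spot missing or duplicated hours before the tables are assembled.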

### Data import

In [12]:
df_PV_availabilityfactors_values = pd.read_excel(excel_file_path+PV_data_availabilityfactors, skiprows=2, usecols=[0,1,2,3,4,5])

In [46]:
df_PV_powerprices_total_values = pd.read_excel(excel_file_path+PV_data_powerprices)
#only extract the prices for the area defined above; .copy() keeps the later in-place rename from acting on a slice of the full table
df_PV_powerprices_values = df_PV_powerprices_total_values[df_PV_powerprices_total_values['PriceArea'] == area].copy()
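
The area filter keeps the row labels of the full price table, so the subset no longer has a continuous 0..n-1 index. The sketch below shows one way to get an independent, re-indexed subset. The miniature table and its price values are made up for illustration, and `df_total` and `df_area` are hypothetical names.

```python
import pandas as pd

# Hypothetical miniature of the day-ahead price file (values are illustrative only)
df_total = pd.DataFrame({
    'HourUTC': pd.to_datetime(['2018-12-31 23:00', '2018-12-31 23:00',
                               '2019-01-01 00:00', '2019-01-01 00:00']),
    'PriceArea': ['SE3', 'DK1', 'SE3', 'DK1'],
    'SpotPriceEUR': [28.32, 28.32, 10.07, 10.07],
})

area = 'DK1'
# .copy() decouples the subset from df_total; reset_index gives it a clean 0..n-1 index
df_area = df_total[df_total['PriceArea'] == area].copy().reset_index(drop=True)
df_area.rename(columns={'HourUTC': 'time [UTC]'}, inplace=True)
```

With a fresh index, later column assignments that rely on row position line up with the model's time steps, provided both tables cover the same hours.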

### Adjustments

In [14]:
df_PV_availabilityfactors_values.rename(columns={'time': 'time [UTC]', 'electricity': 'unit_availability_factor'}, inplace=True)
df_PV_availabilityfactors_values.head()

Unnamed: 0,time [UTC],local_time,unit_availability_factor,irradiance_direct,irradiance_diffuse,temperature
0,2019-01-01 00:00:00,2019-01-01 01:00:00,0.0,0.0,0.0,7.426
1,2019-01-01 01:00:00,2019-01-01 02:00:00,0.0,0.0,0.0,7.517
2,2019-01-01 02:00:00,2019-01-01 03:00:00,0.0,0.0,0.0,7.611
3,2019-01-01 03:00:00,2019-01-01 04:00:00,0.0,0.0,0.0,7.716
4,2019-01-01 04:00:00,2019-01-01 05:00:00,0.0,0.0,0.0,7.582


In [37]:
df_PV_powerprices_values.rename(columns={'HourUTC': 'time [UTC]', 
                                         'HourDK': 'time [DK]'}, inplace=True)
df_PV_powerprices_values.head()

Unnamed: 0,time [UTC],time [DK],PriceArea,SpotPriceDKK,SpotPriceEUR
0,2018-12-31 23:00:00,2019-01-01,SE3,211.479996,28.32
1,2018-12-31 23:00:00,2019-01-01,DK1,211.479996,28.32
2,2018-12-31 23:00:00,2019-01-01,DK2,211.479996,28.32
3,2018-12-31 23:00:00,2019-01-01,SYSTEM,332.679993,44.549999
4,2018-12-31 23:00:00,2019-01-01,DE,211.479996,28.32


### Fitting data into format

Creating a joint table for demand and availability factors:

In [27]:
column_names_1 = {'DateTime': [None, None],
                'Hydrogen_Kasso': ['node','demand'], 
                'E-Methanol_Kasso': ['node','demand'], 
                'Solar_Plant_Kasso': ['node','unit_availability_factor']}
df_blank_table_1 = pd.DataFrame(column_names_1, index=None)

In [35]:
df_temp_1 = pd.DataFrame(columns=['DateTime', 'Hydrogen_Kasso', 'E-Methanol_Kasso', 'Solar_Plant_Kasso'])

df_temp_1['DateTime'] = df_time
df_temp_1['Hydrogen_Kasso'] = 0
df_temp_1['E-Methanol_Kasso'] = 25
df_temp_1['Solar_Plant_Kasso'] = df_PV_availabilityfactors_values['unit_availability_factor']

df_table_1 = pd.concat([df_blank_table_1, df_temp_1], ignore_index=True)

df_table_1.head()

Unnamed: 0,DateTime,Hydrogen_Kasso,E-Methanol_Kasso,Solar_Plant_Kasso
0,,node,node,node
1,,demand,demand,unit_availability_factor
2,2019-01-01T00:00:00,0,25,0.0
3,2019-01-01T01:00:00,0,25,0.0
4,2019-01-01T02:00:00,0,25,0.0
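
The pattern used here, metadata rows stacked on top of the time series, can be wrapped in a small helper. The function `stack_header_and_data` below is a hypothetical sketch, not part of the notebook. It assumes header and data share the same columns and uses `ignore_index=True` so the result carries one continuous row index.

```python
import pandas as pd

def stack_header_and_data(header_rows: dict, df_data: pd.DataFrame) -> pd.DataFrame:
    """Put metadata rows on top of the time series, with one continuous index."""
    df_header = pd.DataFrame(header_rows)
    # Fail early if header and data disagree on the column set
    if list(df_header.columns) != list(df_data.columns):
        raise ValueError('header and data columns differ')
    return pd.concat([df_header, df_data], ignore_index=True)

header = {'DateTime': [None, None],
          'Hydrogen_Kasso': ['node', 'demand'],
          'Solar_Plant_Kasso': ['node', 'unit_availability_factor']}
data = pd.DataFrame({'DateTime': ['2019-01-01T00:00:00', '2019-01-01T01:00:00'],
                     'Hydrogen_Kasso': [0, 0],
                     'Solar_Plant_Kasso': [0.0, 0.0]})
df_stacked = stack_header_and_data(header, data)
```

The same helper would apply to the four-row header of the price table, since it only depends on matching column names.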


Creating a joint table for power prices and prices for district heating:

In [48]:
column_names_2 = {'DateTime': ['relationship class','connection','node','parameter name'],
                'Power_Wholesale_In': ['connection__from_node','power_line_Wholesale_Kasso','Power_Wholesale','connection_flow_cost'], 
                'Power_Wholesale_Out': ['connection__to_node','power_line_Wholesale_Kasso','Power_Wholesale','connection_flow_cost'], 
                'District_Heating': ['connection__to_node','pipeline_District_Heating','District_Heating','connection_flow_cost']}
df_blank_table_2 = pd.DataFrame(column_names_2, index=None)

In [49]:
df_temp_2 = pd.DataFrame(columns=['DateTime', 'Power_Wholesale_In', 'Power_Wholesale_Out', 'District_Heating'])

df_temp_2['DateTime'] = df_time
df_temp_2['Power_Wholesale_In'] = df_PV_powerprices_values['SpotPriceEUR']
df_temp_2['Power_Wholesale_Out'] = -1*df_PV_powerprices_values['SpotPriceEUR']
df_temp_2['District_Heating'] = -1

df_table_2 = pd.concat([df_blank_table_2, df_temp_2], ignore_index=True)

df_table_2

Unnamed: 0,DateTime,Power_Wholesale_In,Power_Wholesale_Out,District_Heating
0,relationship class,connection__from_node,connection__to_node,connection__to_node
1,connection,power_line_Wholesale_Kasso,power_line_Wholesale_Kasso,pipeline_District_Heating
2,node,Power_Wholesale,Power_Wholesale,District_Heating
3,parameter name,connection_flow_cost,connection_flow_cost,connection_flow_cost
4,2019-01-01T00:00:00,,,-1
...,...,...,...,...
8759,2019-12-31T19:00:00,,,-1
8760,2019-12-31T20:00:00,,,-1
8761,2019-12-31T21:00:00,,,-1
8762,2019-12-31T22:00:00,40.560001,-40.560001,-1
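
The empty price cells in the output above are consistent with pandas aligning the assigned column on index labels rather than on time: the filtered price table keeps the row labels of the unfiltered file, while `df_temp_2` is indexed 0..n-1. This is one possible explanation, not a confirmed diagnosis. The sketch below reproduces the effect on made-up data and shows a merge on the timestamp as an alternative. All values and the names `df_steps`, `df_prices`, `by_label` and `by_time` are illustrative, and the timestamps are kept as datetimes rather than formatted strings.

```python
import pandas as pd

# Target frame with a clean 0..n-1 index, one row per model time step
df_steps = pd.DataFrame({'DateTime': pd.to_datetime(
    ['2019-01-01 00:00', '2019-01-01 01:00', '2019-01-01 02:00'])})

# Hypothetical filtered prices: the index labels still come from the unfiltered table
df_prices = pd.DataFrame(
    {'time [UTC]': pd.to_datetime(['2018-12-31 23:00', '2019-01-01 00:00',
                                   '2019-01-01 01:00', '2019-01-01 02:00']),
     'SpotPriceEUR': [28.32, 10.07, -4.08, -9.91]},
    index=[1, 6, 11, 16])

# Plain column assignment aligns on index labels, so only label 1 finds a match
by_label = df_steps.copy()
by_label['Power_Wholesale_In'] = df_prices['SpotPriceEUR']

# Merging on the timestamp matches each price to its own hour instead
by_time = df_steps.merge(df_prices, how='left', left_on='DateTime', right_on='time [UTC]')
by_time['Power_Wholesale_In'] = by_time['SpotPriceEUR']
by_time['Power_Wholesale_Out'] = -1 * by_time['SpotPriceEUR']
by_time = by_time[['DateTime', 'Power_Wholesale_In', 'Power_Wholesale_Out']]
```

A left merge also drops the 2018-12-31 23:00 row automatically, because that hour lies outside the model year, and any hour without a price stays visible as an empty cell.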


### Creating one combined Excel file and export