# Data Preparation for the Nord_H2ub Spine Model

This Jupyter notebook contains all routines that prepare the input data sources and combine them into an input data file for the model in Spine.

**Authors:** Johannes Giehl (jfg.eco@cbs.dk)

## Import of packages

In [3]:
import numpy as np
import pandas as pd

## File paths

In [5]:
# Set the path to the folder that holds the raw input data

excel_file_path = '../Input_data/Input_raw/'

In [6]:
# Set the names of the relevant input files

PV_data_filename = 'PV_availability_factors_Kasso-v001-djh_2023_12_15.xlsx'
Model_structure_file = 'Model_Data_Base.xlsx'
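The folder and file names above are joined by string concatenation later in the notebook. As a minimal sketch of an alternative, the same paths can be built with `pathlib`, which also makes it easy to check for missing files before reading. The names `input_dir`, `pv_data_path`, `model_structure_path`, and `missing_files` are hypothetical and not part of the original notebook.

```python
from pathlib import Path

# Folder and file names as set in the notebook
input_dir = Path('../Input_data/Input_raw')
pv_data_path = input_dir / 'PV_availability_factors_Kasso-v001-djh_2023_12_15.xlsx'
model_structure_path = input_dir / 'Model_Data_Base.xlsx'


def missing_files(paths):
    """Return the paths that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not p.exists()]


# Report missing inputs early instead of failing inside pd.read_excel
for path in missing_files([pv_data_path, model_structure_path]):
    print(f'Input file not found: {path}')
```

`pd.read_excel` accepts `Path` objects directly, so the resulting paths could be passed to it without conversion.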

## Workflow of the data preparation

- general parameters
- data import
- data adjustments
- final data settings
- excel/csv export


### General parameters

In [10]:
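# Hourly time steps covering the full model year 2018
date_index = pd.date_range(start='2018-01-01T00:00:00', end='2018-12-31T23:00:00', freq='H')
# Format the timestamps as ISO 8601 strings and store them in a data frame
formatted_dates = date_index.strftime('%Y-%m-%dT%H:%M:%S')
df_formatted_dates = pd.DataFrame(formatted_dates)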
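A quick sanity check on the generated time index can catch an off-by-one in the start or end timestamp. The sketch below rebuilds the index and compares its length with the expected number of hours in 2018. It passes a `pd.Timedelta` as the frequency, which is an assumption of this sketch: recent pandas releases deprecate the `'H'` alias in favour of `'h'`, and a `Timedelta` sidesteps that difference. The variable names `expected_steps`, `n_steps`, `first_stamp`, and `last_stamp` are illustrative.

```python
import pandas as pd

# Hourly steps for the model year 2018; the Timedelta frequency is an
# assumption of this sketch and replaces the 'H' alias used in the notebook.
date_index = pd.date_range(start='2018-01-01T00:00:00',
                           end='2018-12-31T23:00:00',
                           freq=pd.Timedelta(hours=1))
formatted_dates = date_index.strftime('%Y-%m-%dT%H:%M:%S')

# 2018 is not a leap year, so 365 * 24 hourly steps are expected
expected_steps = 365 * 24
n_steps = len(formatted_dates)
first_stamp, last_stamp = formatted_dates[0], formatted_dates[-1]

print(n_steps == expected_steps, first_stamp, last_stamp)
```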

### Data import

In [11]:
df_PV_values = pd.read_excel(excel_file_path+PV_data_filename, skiprows=2, usecols=[0,1,2,3,4,5])

In [12]:
df_model_structure = pd.read_excel(excel_file_path + Model_structure_file, sheet_name='Units', index_col=None)

In [13]:
df_model_structure

Unnamed: 0,Unit,Input1,Input2,Output1,Output2,Capacity_existing,Capacity_max,Efficency,Relation_Input,Relation_Output,Cost_invest,Cost_OM,Cost_var
0,Electrolyzer,Power,Water,Hydrogen,Heat,,,1,1,1,1,1,1


In [17]:
Input1_nodes = df_model_structure['Input1'].tolist()
Input2_nodes = df_model_structure['Input2'].tolist()
Output1_nodes = df_model_structure['Output1'].tolist()
Output2_nodes = df_model_structure['Output2'].tolist()

# Combine the values from all four node columns into a single list
All_nodes_list = Input1_nodes + Input2_nodes + Output1_nodes + Output2_nodes

# Create a list with unique entries (a set does not preserve the original order)
unique_nodes_list = list(set(All_nodes_list))

In [19]:
unique_nodes_list

['Water', 'Hydrogen', 'Heat', 'Power']
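The order of the list above can differ between runs because it passes through a `set`, and empty cells in the Excel sheet would show up as `NaN` entries. A minimal sketch of an order-preserving variant is given below. It stacks the four node columns with `melt`, drops missing values, and keeps the first occurrence of each node. The data frame `df_units` is a hypothetical stand-in for `df_model_structure`; its second row is invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_model_structure; the second row is invented
df_units = pd.DataFrame({
    'Unit': ['Electrolyzer', 'Methanol_Reactor'],
    'Input1': ['Power', 'Hydrogen'],
    'Input2': ['Water', 'CO2'],
    'Output1': ['Hydrogen', 'E-Methanol'],
    'Output2': ['Heat', None],
})

node_columns = ['Input1', 'Input2', 'Output1', 'Output2']

# Stack the node columns one after another, drop empty cells, and keep
# each node once in order of first appearance
unique_nodes = (
    df_units[node_columns]
    .melt()['value']
    .dropna()
    .unique()
    .tolist()
)

print(unique_nodes)
```

Keeping a stable order makes the exported file reproducible between runs, which helps when comparing versions of the input data.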

### Adjustments

In [12]:
# Rename the PV value columns
df_PV_values.rename(columns={'time': 'time [UTC]'}, inplace=True)
df_PV_values.rename(columns={'electricity': 'availability_factor'}, inplace=True)

In [13]:
column_names = {'DateTime': [None,None], 
                'Hydrogen_Kasso': ['node','demand'], 
                'E-Methanol_Kasso': ['node','demand'], 
                'Solar_Plant_Kasso': ['node','unit_availability_factor']}
df_time_series = pd.DataFrame(column_names, index=None)
#df_time_series.index.name = 'DateTime'


In [14]:
df_time_series.head()

Unnamed: 0,DateTime,Hydrogen_Kasso,E-Methanol_Kasso,Solar_Plant_Kasso
0,,node,node,node
1,,demand,demand,unit_availability_factor


In [26]:
df_time = pd.DataFrame(df_formatted_dates)
df_time_head = pd.DataFrame(df_time_series['DateTime'])
print(df_time_head)
print(df_time)

  DateTime
0     None
1     None
                        0
0     2018-01-01T00:00:00
1     2018-01-01T01:00:00
2     2018-01-01T02:00:00
3     2018-01-01T03:00:00
4     2018-01-01T04:00:00
...                   ...
8755  2018-12-31T19:00:00
8756  2018-12-31T20:00:00
8757  2018-12-31T21:00:00
8758  2018-12-31T22:00:00
8759  2018-12-31T23:00:00

[8760 rows x 1 columns]


In [28]:
Time_Total_df = pd.concat([df_time_head, df_time], ignore_index=True)
Time_Total_df

Unnamed: 0,DateTime,0
0,,
1,,
2,,2018-01-01T00:00:00
3,,2018-01-01T01:00:00
4,,2018-01-01T02:00:00
...,...,...
8757,,2018-12-31T19:00:00
8758,,2018-12-31T20:00:00
8759,,2018-12-31T21:00:00
8760,,2018-12-31T22:00:00
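The result above has two columns because the two frames carry different column labels (`DateTime` and `0`), so `pd.concat` keeps them side by side and fills the gaps with missing values instead of stacking the timestamps under `DateTime`. A minimal sketch of one way to get a single column is to give the timestamp frame the same column label before concatenating. It uses three time steps instead of the full year, and the name `time_total` is illustrative.

```python
import pandas as pd

# Shortened example: three hourly steps instead of the full year
timestamps = pd.date_range(start='2018-01-01T00:00:00', periods=3,
                           freq=pd.Timedelta(hours=1)).strftime('%Y-%m-%dT%H:%M:%S')

# Both frames use the same column label, so concat stacks them vertically
df_time = pd.DataFrame({'DateTime': timestamps})
df_time_head = pd.DataFrame({'DateTime': [None, None]})

time_total = pd.concat([df_time_head, df_time], ignore_index=True)
print(time_total)
```

With matching labels the two empty header rows are followed directly by the timestamps in one `DateTime` column.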
