# Data Preparation for the Nord_H2ub Spine Model

This Jupyter notebook contains all routines for converting the input data sources into an input data file for the model in Spine.

**Authors:** Johannes Giehl (jfg.eco@cbs.dk)

## Import of packages

In [5]:
import numpy as np
import pandas as pd

## Year

In [6]:
year = 2019   #change to desired year
start_date = pd.Timestamp(f'{year}-01-01 00:00:00')
end_date = pd.Timestamp(f'{year}-12-31 23:00:00')

## File paths

In [11]:
#set path to the folder with the raw input data

excel_file_path = '../Input_data/Input_raw/'

In [52]:
#set name of the relevant files

PV_data_availabilityfactors = f'PV_availability_factors_Kasso_{year}.xlsx'

## Workflow of the data preparation

- general parameters
- data import
- data adjustments
- final data settings
- Excel/CSV export


### General parameters

In [101]:
#hourly date index for the selected year
#note: pandas 2.2 and later deprecate the alias 'H' in favour of 'h'
date_index = pd.date_range(start=start_date, end=end_date, freq='H')
formatted_dates = date_index.strftime('%Y-%m-%dT%H:%M:%S')
df_formatted_dates = pd.DataFrame(formatted_dates, columns=['DateTime'])

df_time = pd.DataFrame(df_formatted_dates)
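
A quick sanity check on the date index can catch an incomplete year before the data is merged. The following is a minimal sketch, not part of the original workflow: it rebuilds the index as above and compares its length with the number of hours expected for the year (8760, or 8784 in a leap year). The name `expected_hours` is introduced here for illustration only.

```python
import calendar
import pandas as pd

year = 2019
start_date = pd.Timestamp(f'{year}-01-01 00:00:00')
end_date = pd.Timestamp(f'{year}-12-31 23:00:00')

# An explicit one-hour Timedelta avoids depending on a frequency alias string
date_index = pd.date_range(start=start_date, end=end_date, freq=pd.Timedelta(hours=1))

# Expected number of hourly steps (illustrative helper variable)
expected_hours = (366 if calendar.isleap(year) else 365) * 24

# Fail early if the index is incomplete or contains duplicates
assert len(date_index) == expected_hours
assert date_index.is_unique
```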

### Data import

In [69]:
df_PV_availabilityfactors_values = pd.read_excel(excel_file_path+PV_data_availabilityfactors, skiprows=2, usecols=[0,1,2,3,4,5])

### Adjustments

In [70]:
#rename the columns: mark the time stamps as UTC and use the parameter name of the model
df_PV_availabilityfactors_values.rename(columns={'time': 'time [UTC]', 'electricity': 'unit_availability_factor'}, inplace=True)
df_PV_availabilityfactors_values

Unnamed: 0,time [UTC],local_time,unit_availability_factor,irradiance_direct,irradiance_diffuse,temperature
0,2019-01-01 00:00:00,2019-01-01 01:00:00,0.0,0.0,0.0,7.426
1,2019-01-01 01:00:00,2019-01-01 02:00:00,0.0,0.0,0.0,7.517
2,2019-01-01 02:00:00,2019-01-01 03:00:00,0.0,0.0,0.0,7.611
3,2019-01-01 03:00:00,2019-01-01 04:00:00,0.0,0.0,0.0,7.716
4,2019-01-01 04:00:00,2019-01-01 05:00:00,0.0,0.0,0.0,7.582
...,...,...,...,...,...,...
8755,2019-12-31 19:00:00,2019-12-31 20:00:00,0.0,0.0,0.0,1.257
8756,2019-12-31 20:00:00,2019-12-31 21:00:00,0.0,0.0,0.0,1.102
8757,2019-12-31 21:00:00,2019-12-31 22:00:00,0.0,0.0,0.0,1.114
8758,2019-12-31 22:00:00,2019-12-31 23:00:00,0.0,0.0,0.0,1.217
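
Because the availability factors are later placed next to the generated `DateTime` column by row position, it may help to check that the imported time stamps follow the same hourly sequence. The sketch below uses a small hypothetical stand-in for the imported sheet (three rows with made-up values) instead of the real Excel file. The check that the factors lie between 0 and 1 is an assumption about the data, not something stated in the source.

```python
import pandas as pd

# Hypothetical stand-in for the imported Excel sheet (three hourly rows)
df_pv = pd.DataFrame({
    'time [UTC]': pd.to_datetime(['2019-01-01 00:00', '2019-01-01 01:00', '2019-01-01 02:00']),
    'unit_availability_factor': [0.0, 0.0, 0.1],
})

# Hourly sequence the time stamps are expected to follow
expected = pd.date_range(start='2019-01-01 00:00', periods=len(df_pv), freq=pd.Timedelta(hours=1))

# True when every time stamp matches the expected hourly sequence
timestamps_ok = bool((df_pv['time [UTC]'].to_numpy() == expected.to_numpy()).all())

# Assumption: availability factors are fractions between 0 and 1
factors_ok = bool(df_pv['unit_availability_factor'].between(0, 1).all())
```

If `timestamps_ok` is false, the rows of the source file and the generated date index do not line up, and a positional assignment would silently shift the values.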


In [117]:
#creating a joint table for demand and availability factors
column_names = {'DateTime': [None, None],
                'Hydrogen_Kasso': ['node','demand'], 
                'E-Methanol_Kasso': ['node','demand'], 
                'Solar_Plant_Kasso': ['node','unit_availability_factor']}
df_blank_table = pd.DataFrame(column_names, index=None)

df_blank_table.head()

Unnamed: 0,DateTime,Hydrogen_Kasso,E-Methanol_Kasso,Solar_Plant_Kasso
0,,node,node,node
1,,demand,demand,unit_availability_factor


Now I add the time stamps underneath DateTime, below the two header rows. The same approach also works for adding the values from the existing Excel files under their respective columns, e.g. the availability factor.

In [118]:
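#empty table with the same columns as the header table
df_temp = pd.DataFrame(columns=['DateTime', 'Hydrogen_Kasso', 'E-Methanol_Kasso', 'Solar_Plant_Kasso'])

#fill in the time stamps and the PV availability factors; the demand columns stay empty for now
df_temp['DateTime'] = df_time['DateTime']
df_temp['Solar_Plant_Kasso'] = df_PV_availabilityfactors_values['unit_availability_factor']

#stack the header rows on top of the values
df_table = pd.concat([df_blank_table, df_temp])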

df_table

Unnamed: 0,DateTime,Hydrogen_Kasso,E-Methanol_Kasso,Solar_Plant_Kasso
0,,node,node,node
1,,demand,demand,unit_availability_factor
0,2019-01-01T00:00:00,,,0.0
1,2019-01-01T01:00:00,,,0.0
2,2019-01-01T02:00:00,,,0.0
...,...,...,...,...
8755,2019-12-31T19:00:00,,,0.0
8756,2019-12-31T20:00:00,,,0.0
8757,2019-12-31T21:00:00,,,0.0
8758,2019-12-31T22:00:00,,,0.0
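
The workflow lists an Excel/CSV export as the last step. The following is a minimal sketch of how such an export could look, assuming the demand columns have been filled the same way as the availability factor. All values are hypothetical placeholders, and the table is written to an in-memory buffer rather than to a real file, since the output file name is not given here.

```python
import io
import pandas as pd

# Two header rows as in the notebook; DateTime has no header entries
header = pd.DataFrame({
    'DateTime': [None, None],
    'Hydrogen_Kasso': ['node', 'demand'],
    'E-Methanol_Kasso': ['node', 'demand'],
    'Solar_Plant_Kasso': ['node', 'unit_availability_factor'],
})

# Hypothetical values for three hours; the demand figures are placeholders
values = pd.DataFrame({
    'DateTime': ['2019-01-01T00:00:00', '2019-01-01T01:00:00', '2019-01-01T02:00:00'],
    'Hydrogen_Kasso': [1.0, 1.0, 1.0],
    'E-Methanol_Kasso': [2.0, 2.0, 2.0],
    'Solar_Plant_Kasso': [0.0, 0.0, 0.1],
})

# ignore_index=True gives one continuous row index instead of restarting at 0
table = pd.concat([header, values], ignore_index=True)

# Write to an in-memory buffer; passing a file path to to_csv works the same way
buffer = io.StringIO()
table.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

With `index=False` the row index is left out of the file, so the first column of the export is `DateTime`, and the empty header cells are written as empty fields.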
