# Dazls model

Energy splitting method in OpenSTEF.

It trains a single splitting model that can be used for every prediction job. As input, it uses data from multiple substations with known components, and it outputs predictions of solar and wind power for unknown target substations.

The model consists of two steps that are applied in sequence:
1. Domain model (any data-driven model can be used)
2. Adaptation model (any data-driven model can be used)
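To make the two-step idea concrete, here is a minimal sketch with scikit-learn on synthetic data. It is not the openstef Dazls implementation: the class name `TwoStepSplitter`, the split of the inputs into "features" (for the domain model) and "metadata" (for the adaptation model), and the choice of `LinearRegression` and `KNeighborsRegressor` are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor


class TwoStepSplitter:
    """Hypothetical two-step model: a domain model followed by an adaptation model.

    Assumption: the domain model sees the time-series features only, and the
    adaptation model sees the domain prediction plus per-station metadata.
    """

    def __init__(self, domain_model, adaptation_model):
        self.domain_model = domain_model
        self.adaptation_model = adaptation_model

    def fit(self, features, metadata, target):
        # Step 1: learn a first estimate of the solar and wind parts
        self.domain_model.fit(features, target)
        domain_pred = self.domain_model.predict(features)
        # Step 2: learn the final split from (domain prediction + metadata)
        self.adaptation_model.fit(np.hstack([domain_pred, metadata]), target)
        return self

    def predict(self, features, metadata):
        # Run both steps in the same order as during training
        domain_pred = self.domain_model.predict(features)
        return self.adaptation_model.predict(np.hstack([domain_pred, metadata]))


# Synthetic data: 200 time steps, 3 features, 2 constant metadata columns
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
metadata = np.repeat(rng.normal(size=(1, 2)), 200, axis=0)
target = np.column_stack([features[:, 0] * 2.0, features[:, 1] - 1.0])

model = TwoStepSplitter(LinearRegression(), KNeighborsRegressor(n_neighbors=5))
model.fit(features, metadata, target)
pred = model.predict(features, metadata)
print(pred.shape)  # (200, 2)
```

Because the adaptation model in this sketch is trained on in-sample domain predictions, it can over-trust them; holding out part of the data for the second step is a common alternative.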

#### For reference, see: [dazls.rst](https://github.com/OpenSTEF/openstef/tree/main/docs/dazls.rst)

### This notebook contains the code to:

1. Preprocess the data used by the model.

We use the "combined_data" folder, which contains the raw data, to preprocess the data and save the result in the "prep_data" folder. After this step, the preprocessed data, including the metadata, can be found in the "prep_data" folder under the configured path. The prepared data is then used to run the DAZLS model.

2. Train DAZLS and generate a prediction for one out-of-sample CSV file, i.e. a substation that is not used for training.


3. Load and store the model.

This produces the dazls_stored.sav file, which OpenSTEF uses for the component forecast ([create_component_forecast](https://github.com/OpenSTEF/openstef/blob/main/openstef/pipeline/create_component_forecast.py)).
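The "combined_data" to "prep_data" folder workflow from step 1 can also be expressed with `pathlib`, which avoids hard-coded Windows separators. This is a sketch only: `prepare_folders` is a hypothetical helper, and the demo uses made-up station files in a temporary directory. It also sorts the file list, since the station that ends up "first" (and is therefore held out for testing) otherwise depends on the file system's listing order.

```python
import tempfile
from pathlib import Path

import pandas as pd


def prepare_folders(base_dir):
    """Hypothetical helper: return the raw and prepared-data folders under base_dir.

    Folder names follow the notebook; prep_data is created if it is missing,
    because DataFrame.to_csv does not create parent folders.
    """
    base = Path(base_dir)
    raw_dir = base / "combined_data"
    prep_dir = base / "prep_data"
    prep_dir.mkdir(parents=True, exist_ok=True)
    return raw_dir, prep_dir


# Demo in a temporary folder with two tiny, made-up raw station files
with tempfile.TemporaryDirectory() as tmp:
    raw_dir, prep_dir = prepare_folders(tmp)
    raw_dir.mkdir()
    for name in ["station_b", "station_a"]:
        pd.DataFrame({"value": [1.0, 2.0]}).to_csv(raw_dir / f"{name}.csv")
    # sorted() fixes the file order, so the held-out "first" station is reproducible
    station_names = [f.stem for f in sorted(raw_dir.glob("*.csv"))]
    prep_exists = prep_dir.is_dir()

print(station_names)  # ['station_a', 'station_b']
```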

##### Preprocess the data and create the "prep_data" folder

In [4]:
```python
import glob
import os
import random

import joblib
import numpy as np
import pandas as pd

# Seed for reproducibility
random.seed(999)
np.random.seed(999)

# Paths (adjust to your local directory)
path = "C:\\Users\\AL25802\\new energy splitting"
raw_folder = os.path.join(path, "combined_data")
prep_folder = os.path.join(path, "prep_data")
os.makedirs(prep_folder, exist_ok=True)  # to_csv does not create missing folders

combined_data = []
station_name = []

# Read each raw station file, add metadata and save it in the prep_data folder
for file_name in glob.glob(os.path.join(raw_folder, "*.csv")):

    # Read, use the datetime column as index and fill missing values
    x = pd.read_csv(file_name, low_memory=False, parse_dates=["datetime"], index_col=0)
    x["datetime"] = pd.to_datetime(x["datetime"])
    x = x.set_index("datetime")
    x = x.replace([np.inf, -np.inf], np.nan)
    x = x.ffill()  # forward fill; replaces the deprecated interpolate(method='ffill')

    # Variance metadata of the first three columns
    var = x.iloc[:, :3].var()
    for i in range(3):
        x.loc[:, f"var{i}"] = var.iloc[i]

    # Standard error of the mean (sem) metadata of the first three columns
    sem = x.iloc[:, :3].sem()
    for i in range(3):
        x.loc[:, f"sem{i}"] = sem.iloc[i]

    # Min-max capacity (physical) metadata of columns 3 and 4
    mini = x.iloc[:, 3:5].min()
    maxi = x.iloc[:, 3:5].max()
    for i in range(2):
        x.loc[:, f"min{i}"] = mini.iloc[i]
    for i in range(2):
        x.loc[:, f"max{i}"] = maxi.iloc[i]

    combined_data.append(x)
    sn = os.path.basename(file_name)
    station_name.append(os.path.splitext(sn)[0])
    x.to_csv(os.path.join(prep_folder, sn), index=True)
```
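The preprocessing cell adds per-station constant columns: the variance and standard error of the mean of the first three columns, and the minimum and maximum of columns 3 and 4. The sketch below repeats that logic on a small made-up frame so the resulting columns can be inspected. The helper name `add_station_metadata` and the toy column names are assumptions; the real column names of the raw files are not shown in this notebook.

```python
import pandas as pd


def add_station_metadata(df):
    """Hypothetical helper mirroring the preprocessing cell.

    Like the notebook, it assumes the first three columns feed var/sem and
    columns 3 and 4 feed min/max. Each metadata value is constant per station.
    """
    out = df.copy()
    var = df.iloc[:, :3].var()   # sample variance (ddof=1)
    sem = df.iloc[:, :3].sem()   # standard error of the mean (ddof=1)
    mini = df.iloc[:, 3:5].min()
    maxi = df.iloc[:, 3:5].max()
    for i in range(3):
        out[f"var{i}"] = var.iloc[i]
    for i in range(3):
        out[f"sem{i}"] = sem.iloc[i]
    for i in range(2):
        out[f"min{i}"] = mini.iloc[i]
    for i in range(2):
        out[f"max{i}"] = maxi.iloc[i]
    return out


toy = pd.DataFrame({
    "c0": [1.0, 2.0, 3.0, 4.0],
    "c1": [0.0, 0.0, 0.0, 0.0],
    "c2": [2.0, 4.0, 6.0, 8.0],
    "c3": [5.0, 1.0, 3.0, 2.0],
    "c4": [-1.0, 0.0, 1.0, 2.0],
})
meta = add_station_metadata(toy)
print(list(meta.columns[5:]))
```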

In [5]:
```python
import glob
import os

import pandas as pd
from openstef.model.regressors.dazls import Dazls

path = "C:\\Users\\AL25802\\new energy splitting"
prep_folder = os.path.join(path, "prep_data")
combined_data = []
station_name = []

# Read the prepared data (the glob order depends on the file system)
for file_name in glob.glob(os.path.join(prep_folder, "*.csv")):
    x = pd.read_csv(file_name, low_memory=False, parse_dates=["datetime"])
    x["datetime"] = pd.to_datetime(x["datetime"])
    x = x.set_index("datetime")
    x.columns = [c.lower() for c in x.columns]
    combined_data.append(x)
    sn = os.path.basename(file_name)
    station_name.append(os.path.splitext(sn)[0])

# Split the data into train and test (the first substation is used for testing)
training_data = pd.concat(combined_data[1:])
test_data = combined_data[0]
target_columns = ["total_solar_part", "total_wind_part"]
feature_columns = [c for c in test_data.columns if c not in target_columns]
print("Testing station:", station_name[0])
```

Testing station: HAL
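The notebook holds out only the first substation. A leave-one-station-out loop, sketched below, repeats the same train/test split with every substation held out once, which gives a broader picture of out-of-sample performance. `leave_one_station_out` is a hypothetical helper, and the demo uses synthetic stations with a `KNeighborsRegressor` so that it runs without openstef. Passing `make_model=Dazls` instead should follow the same `fit`/`predict` calls used in this notebook, assuming Dazls accepts the prepared feature columns as it does here.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neighbors import KNeighborsRegressor


def leave_one_station_out(frames, names, feature_columns, target_columns, make_model):
    """Hypothetical helper: hold out each station once and train on the others.

    frames: list of per-station DataFrames (like combined_data)
    make_model: callable returning a fresh, unfitted regressor
    Returns {station name: (RMSE, R2)} on the held-out station.
    """
    scores = {}
    for i, name in enumerate(names):
        train = pd.concat(frames[:i] + frames[i + 1:])
        test = frames[i]
        model = make_model()
        model.fit(train.loc[:, feature_columns], train.loc[:, target_columns])
        pred = model.predict(test.loc[:, feature_columns])
        truth = test.loc[:, target_columns]
        rmse = mean_squared_error(truth, pred) ** 0.5
        scores[name] = (rmse, r2_score(truth, pred))
    return scores


# Synthetic stations with two features and the notebook's target names
rng = np.random.default_rng(1)
names = ["s1", "s2", "s3"]
frames = []
for _ in names:
    df = pd.DataFrame(rng.normal(size=(50, 2)), columns=["f0", "f1"])
    df["total_solar_part"] = df["f0"] * 0.5
    df["total_wind_part"] = -df["f1"]
    frames.append(df)

scores = leave_one_station_out(
    frames, names, ["f0", "f1"], ["total_solar_part", "total_wind_part"],
    lambda: KNeighborsRegressor(n_neighbors=5),
)
for name, (rmse, r2) in scores.items():
    print(f"{name}: RMSE={rmse:.3f}, R2={r2:.3f}")
```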


##### Initialize and train the DAZLS model

In [6]:
```python
model = Dazls()
# Fit the model on all training substations
model.fit(training_data.loc[:, feature_columns], training_data.loc[:, target_columns])
```

##### Generate a prediction

In [7]:
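```python
# Predict the solar and wind parts for the test substation
y = model.predict(test_data.loc[:, feature_columns])
# Prediction performance on the test substation; assuming the openstef
# Dazls.score implementation, this returns (RMSE, R2)
model.score(test_data.loc[:, target_columns], y)
```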

(1.5145981403632442, -0.3705595044428144)
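Assuming the openstef implementation of `Dazls.score`, the printed tuple is the RMSE and the R² over both target columns, computed with scikit-learn. The sketch below, on made-up arrays, shows how those two numbers are obtained and adds a per-target breakdown via `multioutput="raw_values"`, which separates the solar and wind scores. Either score is only meaningful if the truth and prediction columns are in the same order.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up truth and prediction with two target columns
truth = np.array([[0.0, -2.0], [1.0, -1.0], [2.0, 0.0], [3.0, 1.0]])
pred = np.array([[0.0, -1.5], [1.0, -1.5], [2.0, 0.5], [3.0, 0.5]])

# Overall scores (averaged uniformly over both target columns)
rmse = mean_squared_error(truth, pred) ** 0.5
r2 = r2_score(truth, pred)

# Per-target scores: one value per column, in the column order of `truth`
rmse_per_target = mean_squared_error(truth, pred, multioutput="raw_values") ** 0.5
r2_per_target = r2_score(truth, pred, multioutput="raw_values")

print(rmse_per_target, r2_per_target)  # [0.  0.5] [1.  0.8]
```

Here the first column is predicted perfectly (RMSE 0, R² 1) while the second has an error of 0.5 at every step, so its R² is 1 - 1/5 = 0.8 and the overall R² is the average, 0.9.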

In [8]:
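```python
result = test_data.loc[:, target_columns].copy()
# Column order of y as used in the original notebook: wind first, then solar.
# target_columns lists solar first, so verify this order against the Dazls
# implementation before comparing columns.
result["wind_split"] = y[:, 0]
result["solar_split"] = y[:, 1]
result.iloc[60:]
```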

| datetime | total_solar_part | total_wind_part | wind_split | solar_split |
|---|---|---|---|---|
| 2021-07-31 15:15:00+00:00 | 0 | -2.09 | -2.7220 | 1.829648e-14 |
| 2021-07-31 15:30:00+00:00 | 0 | -1.63 | -2.9870 | 1.829648e-14 |
| 2021-07-31 15:45:00+00:00 | 0 | -1.54 | -2.9660 | 1.829648e-14 |
| 2021-07-31 16:00:00+00:00 | 0 | -1.47 | -2.7985 | 1.829648e-14 |
| 2021-07-31 16:15:00+00:00 | 0 | -1.43 | -2.0330 | 1.829648e-14 |
| ... | ... | ... | ... | ... |
| 2021-10-30 23:00:00+00:00 | 0 | -0.43 | -1.1175 | 1.829648e-14 |
| 2021-10-30 23:15:00+00:00 | 0 | -0.40 | -1.1945 | 1.829648e-14 |
| 2021-10-30 23:30:00+00:00 | 0 | -0.44 | -1.3020 | 1.829648e-14 |
| 2021-10-30 23:45:00+00:00 | 0 | -0.42 | -1.2155 | 1.829648e-14 |


##### Load and store the model

In [9]:
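```python
import joblib

# Store the trained model; this file is used by OpenSTEF's component forecast
filename = "dazls_stored.sav"
joblib.dump(model, filename)
```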

['dazls_stored.sav']
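The stored file can be read back with `joblib.load`. The sketch below shows the dump/load round trip with a fitted `KNeighborsRegressor` standing in for the Dazls model (an assumption, so that it runs without openstef) and checks that the reloaded model gives identical predictions. How openstef itself loads the file inside create_component_forecast is not shown in this notebook. Note that `joblib.load` unpickles the file, so only load files from a trusted source, ideally with the same library versions that were used to store them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Stand-in model: a fitted KNeighborsRegressor replaces the trained Dazls model
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = X[:, :2] * 2.0
model = KNeighborsRegressor(n_neighbors=3).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "dazls_stored.sav")
    joblib.dump(model, filename)      # same call as in the cell above
    loaded = joblib.load(filename)    # read the stored model back
    same = np.allclose(loaded.predict(X), model.predict(X))

print(same)  # True
```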