## DAZLS model

Energy splitting method in openSTEF

DAZLS trains a single splitting model that can be used for every prediction job. As input it takes data from multiple substations with known components, and it outputs predictions of solar and wind power for unknown target substations.

The model consists of two steps that are applied in sequence:
1. Domain model (any data-driven model can be used)
2. Adaptation model (any data-driven model can be used)

For reference, see: [dazls.rst](https://github.com/OpenSTEF/openstef/tree/main/docs/dazls.rst)
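
The two-step structure described above can be illustrated with a minimal sketch. It is not the openSTEF implementation: both steps are plain least-squares models, the data is synthetic, and feeding the adaptation model the original features together with the domain output is an assumption made only for this example. The helper names `fit_linear` and `predict_linear` are hypothetical.

```python
import numpy as np


def fit_linear(X, Y):
    """Least-squares fit with an intercept; returns the coefficient matrix."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Apply a coefficient matrix produced by fit_linear."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ coef


rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 3 input features, 2 targets
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] + 0.5 * X[:, 1], X[:, 2] - X[:, 0]])

# Step 1: the domain model maps the input features to a first estimate
domain_coef = fit_linear(X, Y)
domain_out = predict_linear(domain_coef, X)

# Step 2: the adaptation model refines that estimate; here it sees the
# original features plus the domain output (an assumption of this sketch)
adapt_in = np.hstack([X, domain_out])
adapt_coef = fit_linear(adapt_in, Y)
final = predict_linear(adapt_coef, adapt_in)

print(final.shape)
```

Because each step only needs `fit` and `predict` behaviour, either one could be swapped for another data-driven model without changing the overall flow.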

This notebook contains the code to train and save the model. It produces the file dazls_stored.sav, which openSTEF uses for the components forecast (file: create_component_forecast, function: create_components_forecast_pipeline). The notebook also uses the model to generate a prediction for one out-of-sample CSV file.


In [1]:
import glob
import os

import joblib
import pandas as pd
from openstef.model.regressors.dazls import Dazls

# Folder that holds one prepared CSV file per substation
path = "C:\\Users\\AL25802\\new energy splitting"
folder = "prep_data"
combined_data = []
station_name = []

# Read prepared data
for file_name in glob.glob(os.path.join(path, folder, "*.csv")):
    df = pd.read_csv(file_name, low_memory=False, parse_dates=["datetime"])
    df["datetime"] = pd.to_datetime(df["datetime"])
    df = df.set_index("datetime")
    df.columns = [column.lower() for column in df.columns]
    combined_data.append(df)
    # Use the file name without its extension as the station name
    station_name.append(os.path.splitext(os.path.basename(file_name))[0])

# Split data into train and test sets (the first substation is used for testing)
training_data = pd.concat(combined_data[1:])
test_data = combined_data[0]
target_columns = ["total_solar_part", "total_wind_part"]
feature_columns = [x for x in test_data.columns if x not in target_columns]
print("Testing station:", station_name[0])

Testing station: HAL
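
The split above holds out whichever file happens to come first, and the order of `glob.glob` results depends on the file system. As a hedged alternative, the sketch below holds out a station by name. The station names, the column values and the helper `leave_one_station_out` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical per-station frames standing in for the prepared CSV files
frames = {
    "station_a": pd.DataFrame({"load": [1.0, 2.0], "total_solar_part": [0.1, 0.2]}),
    "station_b": pd.DataFrame({"load": [3.0, 4.0], "total_solar_part": [0.3, 0.4]}),
    "station_c": pd.DataFrame({"load": [5.0, 6.0], "total_solar_part": [0.5, 0.6]}),
}


def leave_one_station_out(frames, test_station):
    """Return (training_data, test_data) with one named station held out."""
    training_data = pd.concat(
        [df for name, df in frames.items() if name != test_station]
    )
    test_data = frames[test_station]
    return training_data, test_data


training_data, test_data = leave_one_station_out(frames, "station_b")
print(len(training_data), len(test_data))  # 4 2
```

Selecting the test station by name keeps the evaluation reproducible across machines, at the cost of keeping the frames in a dictionary instead of a list.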


In [2]:
# Initialize DAZLS model
model = Dazls()
# Fit model
model.fit(training_data.loc[:,feature_columns], training_data.loc[:,target_columns])

In [3]:
# get predicted y
y = model.predict(test_data.loc[:,feature_columns])
# print prediction performance
model.score(test_data.loc[:,target_columns], y)

(1.5145981403632442, -0.3705595044428144)
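
The notebook does not say what the two numbers in this tuple are. Assuming they are a root-mean-square error followed by an R² score (an interpretation, not something the notebook confirms), the sketch below shows how both metrics can be computed by hand for a single output and why R² can be negative. The functions `rmse` and `r2` and the toy arrays are hypothetical.

```python
import numpy as np


def rmse(truth, pred):
    """Root-mean-square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((truth - pred) ** 2)))


def r2(truth, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((truth - pred) ** 2)
    ss_tot = np.sum((truth - truth.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


truth = np.array([1.0, 2.0, 3.0, 4.0])
mean_pred = np.full_like(truth, truth.mean())  # always predicts the mean
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])      # anti-correlated prediction

print(r2(truth, mean_pred))  # 0.0
print(r2(truth, bad_pred))   # -3.0
print(rmse(truth, bad_pred))
```

Under this reading, a negative second value means the prediction has a larger squared error than a constant prediction of the mean of the true values.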

In [4]:
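# Put the known components and the predicted split side by side
result = test_data.loc[:,target_columns].copy()
# Note: y[:, 0] is labelled as wind and y[:, 1] as solar,
# the reverse of the order in target_columns
result['wind_split'] = y[:,0]
result['solar_split'] = y[:,1]
result.iloc[60:]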

| datetime | total_solar_part | total_wind_part | wind_split | solar_split |
|---|---|---|---|---|
| 2021-07-31 15:15:00+00:00 | 0 | -2.09 | -2.7220 | 1.829648e-14 |
| 2021-07-31 15:30:00+00:00 | 0 | -1.63 | -2.9870 | 1.829648e-14 |
| 2021-07-31 15:45:00+00:00 | 0 | -1.54 | -2.9660 | 1.829648e-14 |
| 2021-07-31 16:00:00+00:00 | 0 | -1.47 | -2.7985 | 1.829648e-14 |
| 2021-07-31 16:15:00+00:00 | 0 | -1.43 | -2.0330 | 1.829648e-14 |
| ... | ... | ... | ... | ... |
| 2021-10-30 23:00:00+00:00 | 0 | -0.43 | -1.1175 | 1.829648e-14 |
| 2021-10-30 23:15:00+00:00 | 0 | -0.40 | -1.1945 | 1.829648e-14 |
| 2021-10-30 23:30:00+00:00 | 0 | -0.44 | -1.3020 | 1.829648e-14 |
| 2021-10-30 23:45:00+00:00 | 0 | -0.42 | -1.2155 | 1.829648e-14 |


In [5]:
# Store the fitted model on disk
filename = 'dazls_stored.sav'
joblib.dump(model, filename)

['dazls_stored.sav']
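
The cell above only writes the file. A minimal sketch of the round trip with `joblib.dump` and `joblib.load` follows. A plain dictionary stands in for the fitted model and the file is written to a temporary directory, both of which are assumptions made so the example runs without openSTEF installed.

```python
import os
import tempfile

import joblib

# A plain dict stands in for the fitted model in this sketch
stand_in_model = {"name": "dazls", "coef": [0.1, 0.2]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "dazls_stored.sav")
    # joblib.dump returns the list of file names it wrote
    written = joblib.dump(stand_in_model, model_path)
    # joblib.load reconstructs the stored object
    restored = joblib.load(model_path)

print(restored == stand_in_model)  # True
```

Loading a real model this way relies on pickling, so the environment that loads the file generally needs the same class definitions (and compatible library versions) as the one that saved it.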