In [1]:
! pip install openstef==3.4.7



In Google Colab, the pandas version has to be pinned to 1.5.3 for compatibility reasons.

In [2]:
! pip install pandas==1.5.3

Collecting pandas==1.5.3

  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
openstef 3.4.7 requires pandas==2.1.3, but you have pandas 1.5.3 which is incompatible.



  Downloading pandas-1.5.3-cp310-cp310-win_amd64.whl.metadata (12 kB)
Using cached pandas-1.5.3-cp310-cp310-win_amd64.whl (10.4 MB)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 2.1.3
    Uninstalling pandas-2.1.3:
      Successfully uninstalled pandas-2.1.3
Successfully installed pandas-1.5.3


# Workshop part 2 | Learn how to make a forecast
In this second part of the workshop, we will use the model trained in the first part and make a forecast with it. 

Note: if you were not able to train the model in the first part, we have trained one for you. It is in this folder: ``mlflow_trained_models``. It should automatically work in this tutorial.  

The learning points are:
- Hands-on experience with using a trained model;
- What data is required to make a forecast;
- Hands-on experience using the forecast pipeline;
- How the model gets automatically loaded;
- How the predictions compare to the measurements.

In [3]:
# Import required packages.
import pandas as pd 
import numpy as np 

from openstef.data_classes.prediction_job import PredictionJobDataClass
from openstef.pipeline.train_model import train_model_pipeline
from openstef.pipeline.create_forecast import create_forecast_pipeline
import openstef
# Set plotly as the default pandas plotting backend
pd.options.plotting.backend = 'plotly'

  from .autonotebook import tqdm as notebook_tqdm


2024-02-19 09:18:44 [info     ] Proloaf not available, setting constructor to None


## Define the prediction job
As in workshop part 1, a prediction job has to be defined. Because we are making a forecast with the model we trained in part 1, we can use the exact same prediction job.

In [4]:
# Define properties of training/prediction. We call this a 'prediction_job'. The same is used as in the first exercise.
pj = dict(id=287,
        model='xgb', 
        quantiles=[0.10,0.30,0.50,0.70,0.90],
        forecast_type="demand", 
        lat=52.0,
        lon=5.0,
        horizon_minutes=2880,
        resolution_minutes=15,
        name="workshop_exercise_2",
        save_train_forecasts=True,
       )

pj=PredictionJobDataClass(**pj)
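
The `horizon_minutes` and `resolution_minutes` values above determine how many forecast steps fit in the horizon. A quick arithmetic check, using only the two numbers from the prediction job:

```python
# Values copied from the prediction job above.
horizon_minutes = 2880
resolution_minutes = 15

# Number of 15-minute intervals that fit in the horizon.
n_steps = horizon_minutes // resolution_minutes
# The same horizon expressed in hours.
horizon_hours = horizon_minutes / 60

print(n_steps, horizon_hours)
```

The horizon spans 48 hours, or 192 quarter-hour steps. This is the same number of rows that is held out as test data in the next section.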

## Prepare the input data
Some further preparation of the input data is required before making a forecast: the data is split into a train and a test set.

Exercise: 
- Why do we split the dataset into train and test? 
- Why do we set the 'load' (the realised values) to nan (unknown) for the 'to_forecast_data'? 

If you are working with Google Colab, upload the data in the 'Files' section. You can find it in the left toolbar, the fifth item from the top.

If you are working in another environment, please alter the path below so that it points to the data file.

In [6]:
input_data=pd.read_csv("/content/input_data_sun_heavy.csv", index_col=0, parse_dates=True)
# Uncomment this line if you are not working with Google Colab but on your own device
# input_data=pd.read_csv("../data/input_data_sun_heavy.csv", index_col=0, parse_dates=True)


train_data=input_data.iloc[:-192,:] # Everything except the final 192 rows for training.
test_data=input_data.iloc[-192:,:] # Final 192 rows for testing.

In [7]:
# Prepare data to make the forecast. 
realised=input_data.loc[test_data.index, 'load'].copy(deep=True)
to_forecast_data=input_data.copy(deep=True)
to_forecast_data.loc[test_data.index, 'load']=np.nan #clear the load data for the part you want to forecast
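
The split and masking steps above can be tried on a small synthetic frame to see what they do to the data. This is a minimal sketch: the `toy_*` names and all values are made up for illustration and are not part of the workshop data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the workshop CSV: 4 days of 15-minute data (made-up values).
index = pd.date_range("2023-12-28", periods=4 * 96, freq="15min", tz="UTC")
toy = pd.DataFrame(
    {"load": np.linspace(1.0, 3.0, len(index)), "radiation": 0.0},
    index=index,
)

n_test = 192  # 192 rows x 15 minutes = 48 hours
toy_train = toy.iloc[:-n_test]
toy_test = toy.iloc[-n_test:]

# Keep the realised values aside, then hide them from the forecast input.
toy_realised = toy.loc[toy_test.index, "load"].copy()
toy_to_forecast = toy.copy()
toy_to_forecast.loc[toy_test.index, "load"] = np.nan

print(len(toy_train), len(toy_test))
print(toy_to_forecast["load"].isna().sum())       # only the test period is masked
print(toy_to_forecast["radiation"].isna().sum())  # the weather feature is left untouched
```

Only the `load` column is cleared for the test period. The weather features stay available, and the realised load is kept separately for the comparison later on.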

In [8]:
# If you are working with Google Colab, storing and retrieving the model from the previous workshop is more difficult,
# so the model is trained again here. On your own device this is not needed, because OpenSTEF stores and
# automatically retrieves your trained model.
train_data_model, validation_data_model, test_data_model = openstef.pipeline.train_model.train_model_pipeline(
    pj,
    train_data,
    check_old_model_age=False, 
    mlflow_tracking_uri="./mlflow_trained_models",
    artifact_folder="./mlflow_artifacts",
)

2024-02-19 09:19:01 [debug    ] MLflow tracking uri at init= ./mlflow_trained_models
2024-02-19 09:19:01 [info     ] Model successfully loaded with MLflow
2024-02-19 09:19:01 [info     ] Found 22 values of constant load (repeated values), converted to NaN value. cleansing_step=repeated_values frac_values=0.0006312950156388993 num_values=22 pj_id=287
2024-02-19 09:19:01 [info     ] Removed 22 NaN values          num_removed_values=22
2024-02-19 09:19:13 [info     ] New model is better than old model, continuing with training procces
2024-02-19 09:19:17 [info     ] Model saved with MLflow        experiment_name=287
2024-02-19 09:19:20 [info     ] Logged figures to MLflow.     
2024-02-19 09:19:20 [info     ] Writing reports to ./mlflow_artifacts\287


## Make the prediction
Now that the prediction job has been defined, a model has been trained and the input data is prepared, a forecast can be made. 

Exercise: 
- What input do you need to make a forecast?
- How long did it take to make a forecast?

Bonus: look up the correct pipeline on the OpenSTEF [website](https://openstef.github.io/openstef/user_guides.html).




In [9]:
# Location where the model was stored in the last exercise.
mlflow_tracking_uri="./mlflow_trained_models" 

forecast=openstef.pipeline.create_forecast.create_forecast_pipeline(
    pj,
    to_forecast_data, 
    mlflow_tracking_uri,
)

2024-02-19 09:19:25 [debug    ] MLflow tracking uri at init= ./mlflow_trained_models
2024-02-19 09:19:25 [info     ] Model successfully loaded with MLflow
2024-02-19 09:19:25 [info     ] Found 214 values of constant load (repeated values), converted to NaN value. cleansing_step=repeated_values frac_values=0.0061071316457863645 num_values=214 pj_id=287
2024-02-19 09:19:28 [info     ] Postproces in preparation of storing
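
One way to answer the timing question is to measure the call with `time.perf_counter()`. The sketch below times a placeholder function, `slow_task`, which is hypothetical and only stands in for the call to `create_forecast_pipeline` so that the example runs on its own.

```python
import time


def slow_task():
    """Hypothetical stand-in for the forecast call; it just sleeps briefly."""
    time.sleep(0.05)
    return "done"


start = time.perf_counter()
result = slow_task()
elapsed_seconds = time.perf_counter() - start

print(f"Took {elapsed_seconds:.2f} s")
```

In the notebook, the timestamps in the log output give a similar indication: the forecast above was logged between 09:19:25 and 09:19:28.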


## Inspect the results
Now that the forecast has been made, the results can be analysed. 

Exercise: answer the following questions 
- Look at the results: when is the model accurate and when is it less accurate? Why?
- Look at the two weather features plotted: do you see a correlation?

In [10]:
display(forecast.head())

Unnamed: 0,forecast,tAhead,stdev,quantile_P10,quantile_P30,quantile_P50,quantile_P70,quantile_P90,pid,customer,description,type,algtype
2023-12-30 00:15:00+00:00,2.54344,-1232.0,0.080301,2.440529,2.50133,2.54344,2.58555,2.64635,287,workshop_exercise_2,,demand,/c:/repos/openstef-workshop/workshop-beginner/...
2023-12-30 00:30:00+00:00,2.188077,-1231.75,0.080301,2.085166,2.145967,2.188077,2.230187,2.290987,287,workshop_exercise_2,,demand,/c:/repos/openstef-workshop/workshop-beginner/...
2023-12-30 00:45:00+00:00,2.188077,-1231.5,0.080301,2.085166,2.145967,2.188077,2.230187,2.290987,287,workshop_exercise_2,,demand,/c:/repos/openstef-workshop/workshop-beginner/...
2023-12-30 01:00:00+00:00,2.198698,-1231.25,0.083105,2.092194,2.155117,2.198698,2.242278,2.305201,287,workshop_exercise_2,,demand,/c:/repos/openstef-workshop/workshop-beginner/...
2023-12-30 01:15:00+00:00,2.156182,-1231.0,0.083105,2.049678,2.112602,2.156182,2.199763,2.262686,287,workshop_exercise_2,,demand,/c:/repos/openstef-workshop/workshop-beginner/...
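
The quantile columns in the first row above appear to be consistent with a normal distribution centred on `forecast` with standard deviation `stdev`. The sketch below checks this using only the displayed numbers and Python's standard library. It is an observation about this output, not a description of how OpenSTEF computes quantiles internally.

```python
from statistics import NormalDist

# First row of the table above.
forecast_value = 2.54344
stdev = 0.080301
displayed = {0.10: 2.440529, 0.30: 2.50133, 0.50: 2.54344, 0.70: 2.58555, 0.90: 2.64635}

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Shift the point forecast by the z-score of each quantile times the standard deviation.
reconstructed = {
    q: forecast_value + standard_normal.inv_cdf(q) * stdev for q in displayed
}

for q, shown in displayed.items():
    print(f"P{round(q * 100)}: reconstructed={reconstructed[q]:.4f} displayed={shown:.4f}")
```

Under this reading, the P50 column equals the point forecast and the width of the band between P10 and P90 is set by `stdev`.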


In [11]:
fig_forecast_realised=pd.concat([forecast["forecast"], realised], axis=1).plot()
fig_forecast_realised.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Load [MW]"
)
fig_forecast_realised.show()
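
Besides the visual comparison, the accuracy can be summarised with a simple error measure. The sketch below computes the mean absolute error (MAE) and the timestamp of the largest error on made-up values. In the notebook the two inputs would be `forecast["forecast"]` and `realised`. The `toy_*` names are hypothetical.

```python
import pandas as pd

# Made-up values for illustration only.
idx = pd.date_range("2023-12-30 00:15", periods=4, freq="15min", tz="UTC")
toy_forecast = pd.Series([2.5, 2.2, 2.2, 2.1], index=idx)
toy_realised = pd.Series([2.4, 2.3, 2.0, 2.1], index=idx)

# Absolute error per timestamp; both series are aligned on the shared index.
abs_error = (toy_forecast - toy_realised).abs()

mae = abs_error.mean()           # average size of the error, in the unit of the load
worst_time = abs_error.idxmax()  # timestamp at which the forecast is furthest off

print(f"MAE: {mae:.3f}")
print(f"Largest error at: {worst_time}")
```

Looking at when the largest errors occur (time of day, weather conditions) is a good starting point for the exercise above.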

In [12]:
# Look at the normalized plots of both the radiation and the forecast. Do you recognize any patterns?

fig_forecast_radiation=pd.concat(
    [
        test_data["radiation"]/max(test_data["radiation"]),
        forecast["forecast"]/max(forecast["forecast"])
    ], axis=1).plot()
fig_forecast_radiation.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Normalized values"
)
fig_forecast_radiation.show()

In [13]:
fig_forecast_windspeed=pd.concat(
    [
        test_data["windspeed"]/max(test_data["windspeed"]),
        forecast["forecast"]/max(forecast["forecast"])
    ], axis=1).plot()
fig_forecast_windspeed.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Normalized values"
)
fig_forecast_windspeed.show()
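
To put a number on the visual comparison in the plots above, the Pearson correlation between a weather feature and the forecast can be computed with `Series.corr`. The sketch below uses made-up values rather than the workshop data. In the notebook the inputs would be, for example, `test_data["radiation"]` and `forecast["forecast"]`.

```python
import pandas as pd

# Made-up values for illustration only: the load drops when the radiation rises.
toy_radiation = pd.Series([0.0, 100.0, 400.0, 100.0, 0.0])
toy_forecast = pd.Series([3.0, 2.5, 1.0, 2.5, 3.0])

corr_raw = toy_radiation.corr(toy_forecast)

# Dividing each series by its maximum, as in the plots, does not change the correlation.
corr_normalized = (toy_radiation / toy_radiation.max()).corr(
    toy_forecast / toy_forecast.max()
)

print(round(corr_raw, 3), round(corr_normalized, 3))
```

A value close to -1 or 1 indicates a strong linear relationship. A value close to 0 indicates little linear relationship.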

## Alter the input data 
In the code below, the radiation input data is divided by ten, after which a forecast is made with this new input data. The prediction job and trained model stay the same. Thus, the same model is used with one tenth of the sunshine as input.

Exercise: answer the following question: 
- What happens to the forecast when the radiation is divided by ten? Why? 


In [14]:
# Divide the radiation data by ten.
to_forecast_data_rad=to_forecast_data.copy()
to_forecast_data_rad['radiation']=0.1*(to_forecast_data['radiation'])

# Make a forecast with this new input data.
mlflow_tracking_uri=r"./mlflow_trained_models" 

forecast_rad=create_forecast_pipeline(
    pj,
    to_forecast_data_rad, 
    mlflow_tracking_uri,
)

2024-02-19 09:19:41 [debug    ] MLflow tracking uri at init= ./mlflow_trained_models
2024-02-19 09:19:41 [info     ] Model successfully loaded with MLflow
2024-02-19 09:19:41 [info     ] Found 214 values of constant load (repeated values), converted to NaN value. cleansing_step=repeated_values frac_values=0.0061071316457863645 num_values=214 pj_id=287
2024-02-19 09:19:44 [info     ] Postproces in preparation of storing


In [15]:
# Inspect the results.
radiation_forecast_comparison = pd.DataFrame(
    test_data["radiation"]/max(test_data["radiation"])
)

radiation_forecast_comparison["forecast_with_full_radiation"] = forecast["forecast"]/max(forecast["forecast"])
radiation_forecast_comparison["forecast_with_tenth_radiation"] = forecast_rad["forecast"]/max(forecast_rad["forecast"])

fig_radiation_forecast_comparison=radiation_forecast_comparison.plot()

fig_radiation_forecast_comparison.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Normalized values"
)
fig_radiation_forecast_comparison.show()
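
Note that each curve in the plot above is divided by its own maximum, so the plot compares the shape of the two forecasts and not their absolute level. The sketch below, with made-up numbers and hypothetical `toy_*` names, shows how two forecasts that differ by a constant factor become indistinguishable after this normalization, while their absolute difference is clearly non-zero.

```python
import pandas as pd

# Made-up values for illustration only.
idx = pd.date_range("2023-12-30 00:15", periods=4, freq="15min", tz="UTC")
toy_base = pd.Series([2.0, 1.0, 0.5, 1.0], index=idx)
toy_altered = toy_base * 2  # same shape, twice the level

# Normalizing each series by its own maximum removes the difference in level.
normalized_gap = (toy_base / toy_base.max() - toy_altered / toy_altered.max()).abs().max()

# Comparing the raw values keeps the difference in level visible.
mean_absolute_gap = (toy_altered - toy_base).abs().mean()

print(normalized_gap, mean_absolute_gap)
```

To see the absolute effect of the reduced radiation in the notebook, one option is to plot `forecast["forecast"]` and `forecast_rad["forecast"]` together without dividing by the maximum.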

## Bonus: Dashboard
Did you know that OpenSTEF has an elaborate dashboard that shows you everything you want to know about your forecast? Check out the dashboard documentation [here](https://raw.githack.com/OpenSTEF/.github/main/profile/html/openstef_dashboard_doc.html).

Which different in- and output components do you see in this dashboard? 
