# Workshop part 2 | Learn how to make a forecast
In this second part of the workshop, we will use the model trained in the first part and make a forecast with it. 

Note: if you were not able to train the model in the first part, we have trained one for you. It is stored in the ``mlflow_trained_models`` folder, and this tutorial shows how to load it.

The learning points are:
- Hands-on experience with using a trained model;
- What data is required to make a forecast;
- Hands-on experience using the forecast pipeline;
- How the model gets automatically loaded;
- How the predictions compare to the measurements.

In [1]:
# Import required packages.
import pandas as pd 
import numpy as np
import openstef
from openstef.pipeline.create_forecast import create_forecast_pipeline
from openstef.data_classes.prediction_job import PredictionJobDataClass

# Set plotly as the default pandas plotting backend.
pd.options.plotting.backend = 'plotly'

## Define the prediction job
As in workshop part 1, a prediction job has to be defined. Because we are making a forecast with the model we trained in part 1, we can reuse the exact same prediction job.

You can find the documentation [here](https://github.com/OpenSTEF/openstef/blob/main/openstef/data_classes/prediction_job.py).

In [2]:
# Define properties of training/prediction. We call this a 'prediction_job'. The same is used as in the first exercise.
pj = dict(id=287,
        model='xgb', 
        quantiles=[0.10,0.30,0.50,0.70,0.90],
        forecast_type="demand", 
        lat=53.0,
        lon=5.7,
        horizon_minutes=2880,
        resolution_minutes=15,
        name="workshop_exercise_2",
        save_train_forecasts=True,
       )

pj=PredictionJobDataClass(**pj)

## Prepare the input data
Before a forecast can be made, the input data needs one more preparation step: splitting it into a train and a test data set.

Exercise: 
- Split the data into a train and a test data set. The train data set contains everything except the final 192 rows, and the test data set contains those final 192 rows.


Hint: you can look at this [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html). 
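
The figure of 192 rows appears to match the number of 15-minute steps in the 2880-minute horizon set in the prediction job. A quick check of that arithmetic, using only the values passed to `pj` earlier:

```python
# Values copied from the prediction job defined earlier in this notebook.
horizon_minutes = 2880
resolution_minutes = 15

# Number of forecast steps that fit in one horizon.
n_steps = horizon_minutes // resolution_minutes
print(n_steps)

# The same horizon expressed in hours.
horizon_hours = horizon_minutes / 60
print(horizon_hours)
```

So the test data set covers one full forecast horizon of two days.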


In [3]:
input_data=pd.read_csv("../data/input_data_sun_heavy.csv", index_col=0, parse_dates=True)

train_data=input_data.iloc[:-192,:] # Everything except the final 192 rows for training.
test_data=input_data.iloc[-192:,:] # Final 192 rows for testing.

In [4]:
assert len(test_data)==192, "test data is of invalid length"

In [5]:
# Prepare data to make the forecast. 
realised=input_data.loc[test_data.index, 'load'].copy(deep=True)
to_forecast_data=input_data.copy(deep=True)
to_forecast_data.loc[test_data.index, 'load']=np.nan #clear the load data for the part you want to forecast
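
The cell above keeps a copy of the measured load and then blanks out the load in the period to forecast, while the other columns stay available as inputs. The sketch below repeats that step on a small synthetic data frame so the effect is easy to inspect. The `toy_` names and the values are made up for illustration and are not part of the workshop data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the workshop data: 400 rows at a 15-minute resolution.
index = pd.date_range("2023-12-01", periods=400, freq="15min", tz="UTC")
toy_input = pd.DataFrame(
    {"load": np.linspace(1.0, 5.0, 400), "radiation": np.zeros(400)},
    index=index,
)

toy_test_index = toy_input.index[-192:]

# Keep the measured load so it can be compared with the forecast later.
toy_realised = toy_input.loc[toy_test_index, "load"].copy()

# Blank out the load in the period to forecast; the other columns stay intact.
toy_to_forecast = toy_input.copy()
toy_to_forecast.loc[toy_test_index, "load"] = np.nan

print(toy_to_forecast["load"].isna().sum())       # missing load values
print(toy_to_forecast["radiation"].isna().sum())  # missing radiation values
print(toy_realised.isna().sum())                  # missing values in the kept copy
```

Only the `load` column loses its final 192 values; the copy in `toy_realised` still holds the measurements.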

## Make the prediction
Now that the prediction job has been defined, a model has been trained and the input data is prepared, a forecast can be made. 

Exercise: 
- Using the prediction job, trained model and to_forecast_data, make a forecast with the OpenSTEF pipeline;
- How long did it take to make a forecast?

Hint: look-up the correct pipeline on the OpenSTEF [website](https://openstef.github.io/openstef/user_guides.html).




In [6]:
# Location where the model was stored in the last exercise.
mlflow_tracking_uri=r"./mlflow_trained_models" 

forecast=create_forecast_pipeline(
    pj,
    to_forecast_data, 
    mlflow_tracking_uri,
)

2024-02-19 09:03:22 [debug    ] MLflow tracking uri at init= ./mlflow_trained_models
2024-02-19 09:03:22 [info     ] Model successfully loaded with MLflow
2024-02-19 09:03:22 [info     ] Found 214 values of constant load (repeated values), converted to NaN value. cleansing_step=repeated_values frac_values=0.0061071316457863645 num_values=214 pj_id=287
2024-02-19 09:03:24 [info     ] Postproces in preparation of storing
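
The log timestamps above run from 09:03:22 to 09:03:24, so this forecast took roughly two seconds. For a more precise answer to the timing question, the call can be wrapped in a timer. The sketch below shows the pattern with `slow_stand_in`, a hypothetical placeholder that only sleeps; in the notebook you would time the `create_forecast_pipeline(...)` call instead.

```python
import time


def slow_stand_in():
    """Hypothetical placeholder for the forecast call; it only sleeps briefly."""
    time.sleep(0.05)
    return "forecast"


start = time.perf_counter()
result = slow_stand_in()
elapsed_seconds = time.perf_counter() - start

print(f"Took {elapsed_seconds:.2f} s")
```

In a notebook, putting the `%%time` cell magic on the first line of the cell is a shorter alternative.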


## Inspect the results
Now that the forecast has been made, the results can be analysed. 

Exercise: answer the following questions:
- Look at the results. When is the model accurate and when is it less accurate? Why?
- Look at the two weather features plotted (radiation and windspeed). Do you see a correlation with the forecast?

In [7]:
display(forecast.head())

| | forecast | tAhead | stdev | quantile_P10 | quantile_P30 | quantile_P50 | quantile_P70 | quantile_P90 | pid | customer | description | type | algtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-12-30 00:15:00+00:00 | 2.51984 | -1231.75 | 0.087013 | 2.408328 | 2.47421 | 2.51984 | 2.565469 | 2.631351 | 287 | workshop_exercise_2 | | demand | /c:/repos/openstef-workshop/workshop-advanced/... |
| 2023-12-30 00:30:00+00:00 | 2.302258 | -1231.5 | 0.087013 | 2.190746 | 2.256628 | 2.302258 | 2.347888 | 2.41377 | 287 | workshop_exercise_2 | | demand | /c:/repos/openstef-workshop/workshop-advanced/... |
| 2023-12-30 00:45:00+00:00 | 2.302258 | -1231.25 | 0.087013 | 2.190746 | 2.256628 | 2.302258 | 2.347888 | 2.41377 | 287 | workshop_exercise_2 | | demand | /c:/repos/openstef-workshop/workshop-advanced/... |
| 2023-12-30 01:00:00+00:00 | 2.407427 | -1231.0 | 0.086473 | 2.296608 | 2.362081 | 2.407427 | 2.452773 | 2.518246 | 287 | workshop_exercise_2 | | demand | /c:/repos/openstef-workshop/workshop-advanced/... |
| 2023-12-30 01:15:00+00:00 | 2.058113 | -1230.75 | 0.086473 | 1.947294 | 2.012766 | 2.058113 | 2.103459 | 2.168932 | 287 | workshop_exercise_2 | | demand | /c:/repos/openstef-workshop/workshop-advanced/... |
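
The quantile columns relate to the `forecast` and `stdev` columns in a simple way. In the first row of the output above, the quantile values are consistent with quantiles of a normal distribution centred on the forecast with the listed standard deviation. The check below reproduces them with the standard library. It is an observation about the displayed numbers, not a statement of how OpenSTEF computes them internally.

```python
from statistics import NormalDist

# First row of the forecast output shown above.
forecast_value = 2.51984
stdev = 0.087013
shown = {0.10: 2.408328, 0.30: 2.47421, 0.50: 2.51984, 0.70: 2.565469, 0.90: 2.631351}

max_abs_diff = 0.0
for q, shown_value in shown.items():
    # Quantile of a normal distribution centred on the forecast.
    reconstructed = forecast_value + NormalDist().inv_cdf(q) * stdev
    max_abs_diff = max(max_abs_diff, abs(reconstructed - shown_value))
    print(f"P{round(q * 100)}: shown={shown_value:.6f} reconstructed={reconstructed:.6f}")

print(f"Largest absolute difference: {max_abs_diff:.1e}")
```

The reconstructed values agree with the displayed ones up to the rounding of the printed numbers.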


In [8]:
fig_forecast_realised=pd.concat([forecast["forecast"], realised], axis=1).plot()
fig_forecast_realised.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Load [MW]"
)
fig_forecast_realised.show()
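
The plot gives a visual impression of how the forecast compares to the measurements; an error metric can complement it with a single number. Below is a minimal sketch of the mean absolute error (MAE) and root mean squared error (RMSE). The `toy_` series hold made-up numbers for illustration only; in the notebook the same lines could be applied to `forecast["forecast"]` and `realised`.

```python
import numpy as np
import pandas as pd

# Made-up numbers, only to illustrate the calculation.
index = pd.date_range("2023-12-30 00:15", periods=4, freq="15min", tz="UTC")
toy_forecast = pd.Series([2.5, 2.3, 2.3, 2.4], index=index, name="forecast")
toy_realised = pd.Series([2.4, 2.5, 2.2, 2.4], index=index, name="load")

# Subtraction aligns both series on their timestamp index.
error = toy_forecast - toy_realised

mae = error.abs().mean()
rmse = np.sqrt((error ** 2).mean())

print(f"MAE:  {mae:.3f} MW")
print(f"RMSE: {rmse:.3f} MW")
```

RMSE weighs large misses more heavily than MAE, so a gap between the two hints at a few large errors rather than many small ones.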

In [9]:
# Look at the normalized plots of both the radiation and the forecast. Do you recognize any patterns?

fig_forecast_radiation=pd.concat(
    [
        test_data["radiation"]/max(test_data["radiation"]),
        forecast["forecast"]/max(forecast["forecast"])
    ], axis=1).plot()
fig_forecast_radiation.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Normalized values"
)
fig_forecast_radiation.show()

In [10]:
fig_forecast_windspeed=pd.concat(
    [
        test_data["windspeed"]/max(test_data["windspeed"]),
        forecast["forecast"]/max(forecast["forecast"])
    ], axis=1).plot()
fig_forecast_windspeed.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Normalized values"
)
fig_forecast_windspeed.show()
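
Besides judging the plots by eye, the correlation question can be answered with a number. The sketch below computes the Pearson correlation with `Series.corr` on a made-up daily pattern in which the load drops when radiation rises. That pattern is an assumption for illustration, not a result from the workshop data. In the notebook, `test_data["radiation"].corr(forecast["forecast"])` would be the analogous call.

```python
import numpy as np
import pandas as pd

# Made-up daily pattern: radiation peaks at midday, and the toy load dips when it does.
hours = np.arange(24)
toy_radiation = pd.Series(np.clip(np.sin((hours - 6) / 12 * np.pi), 0, None))
toy_load = pd.Series(3.0 - 2.0 * toy_radiation)

# Pearson correlation coefficient, between -1 and 1.
correlation = toy_radiation.corr(toy_load)
print(round(correlation, 3))
```

A value close to -1 or 1 indicates a strong linear relation and a value near 0 a weak one. Real data will rarely be as clean as this toy example.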

## Alter the input data 
In the code below, the radiation input data is divided by ten and thereafter a forecast is made with this new input data. Thus, with the same prediction job and trained model, a forecast is made using ten percent of the 'sunshine' as input. 

Exercise: answer the following question: 
- What happens to the forecast when the radiation is divided by ten? Why? 


In [11]:
# Divide the radiation data by ten.
to_forecast_data_rad=to_forecast_data.copy()
to_forecast_data_rad['radiation']=0.1*(to_forecast_data['radiation'])

# Make a forecast with this new input data.
mlflow_tracking_uri=r"./mlflow_trained_models" 

forecast_rad=create_forecast_pipeline(
    pj,
    to_forecast_data_rad, 
    mlflow_tracking_uri,
)

2024-02-19 09:03:25 [debug    ] MLflow tracking uri at init= ./mlflow_trained_models
2024-02-19 09:03:25 [info     ] Model successfully loaded with MLflow
2024-02-19 09:03:25 [info     ] Found 214 values of constant load (repeated values), converted to NaN value. cleansing_step=repeated_values frac_values=0.0061071316457863645 num_values=214 pj_id=287
2024-02-19 09:03:27 [info     ] Postproces in preparation of storing


In [12]:
# Inspect the results.
radiation_forecast_comparison = pd.DataFrame(
    test_data["radiation"]/max(test_data["radiation"])
)

radiation_forecast_comparison["forecast_full_radiation"] = forecast["forecast"]/max(forecast["forecast"])
radiation_forecast_comparison["forecast_tenth_radiation"] = forecast_rad["forecast"]/max(forecast_rad["forecast"])

fig_radiation_forecast_comparison=radiation_forecast_comparison.plot()

fig_radiation_forecast_comparison.update_layout(
    xaxis_title='Timestamp',
    yaxis_title="Normalized values"
)
fig_radiation_forecast_comparison.show()
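
Note that each forecast above is divided by its own maximum, so the plot compares shapes rather than levels. To see how much the forecast changes in MW, the two forecasts can also be subtracted directly. A minimal sketch with made-up values follows; the `toy_` names are hypothetical, and in the notebook the same subtraction could be applied to `forecast_rad["forecast"]` and `forecast["forecast"]`.

```python
import pandas as pd

# Made-up forecasts in MW, only to illustrate the point.
toy_full = pd.Series([2.0, 1.0, -1.0, 1.5], name="forecast_full_radiation")
toy_reduced = pd.Series([2.0, 1.6, 0.8, 1.8], name="forecast_reduced_radiation")

# Normalizing each series by its own maximum compares shapes only.
normalized = pd.concat(
    [toy_full / toy_full.max(), toy_reduced / toy_reduced.max()], axis=1
)

# The raw difference keeps the units, so the size of the change stays visible.
difference = (toy_reduced - toy_full).rename("difference_mw")

print(normalized)
print(difference)
```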

## Bonus: Dashboard
Did you know that OpenSTEF has an elaborate dashboard that shows you everything you want to know about your forecast? Check out the dashboard documentation [here](https://raw.githack.com/OpenSTEF/.github/main/profile/html/openstef_dashboard_doc.html).

Which different in- and output components do you see in this dashboard? 
