# Train a model
In this example notebook, a model is trained for the location with id 287. The data for this location can be found in the 'data' folder. 
First, the prediction job is defined. It holds the properties of training and prediction, such as the time horizon, the machine learning model and the location of the forecast. 
The model is then trained on the input data and the prediction job using ```train_model_pipeline()```. 

In [None]:
import pandas as pd
from IPython.display import IFrame
from openstef.pipeline.train_model import train_model_pipeline
from openstef.pipeline.create_forecast import create_forecast_pipeline
from openstef.data_classes.prediction_job import PredictionJobDataClass

## Prepare for training
Before a model can be trained, the specifications and data need to be defined. The specifications of the model are defined in the prediction job (pj), where for example the machine learning model, latitude, longitude and forecast horizon are specified. Furthermore, the data has to be retrieved from the csv file, which contains load, weather and energy market data. 
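
Before training, it can help to check that the input data has the expected shape. The sketch below is not part of OpenSTEF: `check_model_input` is a hypothetical helper, and it assumes that the frame is indexed by timestamp, sampled at the 15-minute resolution of the prediction job, and contains a column named `load`. It runs on a small synthetic frame rather than on the real csv file.

```python
import pandas as pd


def check_model_input(data: pd.DataFrame, resolution_minutes: int = 15) -> list:
    """Return a list of human-readable problems found in the input data.

    Hypothetical helper for illustration; the 'load' column name and the
    evenly spaced timestamps are assumptions about the input file.
    """
    problems = []
    if not isinstance(data.index, pd.DatetimeIndex):
        # Without timestamps the remaining checks make no sense.
        problems.append("index is not a DatetimeIndex")
        return problems
    if not data.index.is_monotonic_increasing:
        problems.append("index is not sorted in time")
    if "load" not in data.columns:
        problems.append("no 'load' column found")
    # Compare every step between consecutive timestamps with the resolution.
    steps = data.index.to_series().diff().dropna()
    if not (steps == pd.Timedelta(minutes=resolution_minutes)).all():
        problems.append("timestamps are not evenly spaced")
    return problems


# Synthetic stand-in for the real input data: 8 quarter-hour rows.
index = pd.date_range("2021-01-01", periods=8, freq="15min", name="index")
example = pd.DataFrame({"load": range(8), "temp": 5.0}, index=index)

print(check_model_input(example))
```

An empty list means none of the checks failed. The same function could be called on `input_data` once it has been loaded.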

In [None]:
# Define properties of training/prediction. We call this a 'prediction_job'
pj = dict(id=287,
        model='xgb', 
        quantiles=[10,30,50,70,90],
        forecast_type="demand",
        lat=52.0,
        lon=5.0,
        horizon_minutes=47*60,
        resolution_minutes=15,
        name="Example",          
        hyper_params={}, 
        feature_names=None, 
        default_modelspecs=None,
        save_train_forecasts=True,
       )
pj=PredictionJobDataClass(**pj)

# Load input data
input_data = pd.read_csv('data/get_model_input_pid_287.csv', index_col='index', parse_dates=True)

# Split in training and forecasting data. Everything except the last 200 rows will be used for training
train_data = input_data.iloc[:-200,:] # everything except last 200 rows (50 hours at 15-minute resolution)
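
The arithmetic behind the split can be checked with a short sketch. It uses a synthetic frame in place of the real csv file, and the constant names are chosen here for illustration; the values are taken from the prediction job and the `iloc[:-200]` slice above.

```python
import pandas as pd

RESOLUTION_MINUTES = 15    # resolution_minutes in the prediction job
HORIZON_MINUTES = 47 * 60  # horizon_minutes in the prediction job
HOLDOUT_ROWS = 200         # rows excluded from training by iloc[:-200]

# Time span covered by the held-out rows and number of steps in the horizon.
holdout_duration = pd.Timedelta(minutes=HOLDOUT_ROWS * RESOLUTION_MINUTES)
horizon_steps = HORIZON_MINUTES // RESOLUTION_MINUTES

# Synthetic stand-in for the input data: 1000 quarter-hour rows.
index = pd.date_range("2021-01-01", periods=1000, freq="15min")
data = pd.DataFrame({"load": 0.0}, index=index)

train = data.iloc[:-HOLDOUT_ROWS, :]    # same slice as in the notebook
holdout = data.iloc[-HOLDOUT_ROWS:, :]  # the rows that were left out

print(holdout_duration)  # 2 days 02:00:00
print(horizon_steps)     # 188
```

The 200 held-out rows span 50 hours, which is slightly more than the 188 steps of the 47-hour forecast horizon.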

In [None]:
# Print the train data. 
# For every timestamp, both the load and the feature data are available. 
display(train_data.head())

## Train a model
Train the model using the high-level pipeline ```train_model_pipeline```. Store the trained model in ./mlflow_trained_models and the reports on the training process in ./mlflow_artifacts by setting mlflow_tracking_uri and artifact_folder to these respective paths. 

In [None]:
train, val, test=train_model_pipeline(
    pj,
    train_data,
    check_old_model_age=False,
    mlflow_tracking_uri="./mlflow_trained_models",
    artifact_folder="./mlflow_artifacts",
    )

Now, you can find the trained model in ./mlflow_trained_models and the reports on the training process in ./mlflow_artifacts. Below, the Predictor0.25 and Predictor47.0 plots are shown, as well as the weight plot. The predictor plots show the predictions for the train, validation and test data. The weight plot shows the importance and weight of every feature.

In [None]:
# Inspect local files
display(IFrame('./mlflow_artifacts/{}/Predictor0.25.html'.format(pj['id']), width=900, height=400))
display(IFrame('./mlflow_artifacts/{}/Predictor47.0.html'.format(pj['id']), width=800, height=400))
display(IFrame('./mlflow_artifacts/{}/weight_plot.html'.format(pj['id']), width=800, height=400))


## Visual Studio Code can have difficulty displaying HTML files inline. If you are working in VS Code and cannot inspect the plots, uncomment the code below
## to open the plots in your browser.
# import webbrowser
# webbrowser.open(r'.\mlflow_artifacts\{}\Predictor0.25.html'.format(pj['id']))
# webbrowser.open(r'.\mlflow_artifacts\{}\Predictor47.0.html'.format(pj['id']))
# webbrowser.open(r'.\mlflow_artifacts\{}\weight_plot.html'.format(pj['id']))
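
The commented-out paths above use Windows-style separators. A platform-independent alternative is to build `file://` URIs with `pathlib`. This is a minimal sketch, assuming the reports are written to `<artifact_folder>/<id>/` under the three file names used above; `report_uris` is a hypothetical helper, not an OpenSTEF function.

```python
from pathlib import Path

# Report file names as used in the cell above.
REPORT_NAMES = ("Predictor0.25.html", "Predictor47.0.html", "weight_plot.html")


def report_uris(artifact_folder, pj_id):
    """Map each report name to a file:// URI, skipping files that do not exist."""
    folder = Path(artifact_folder) / str(pj_id)
    return {
        name: (folder / name).resolve().as_uri()
        for name in REPORT_NAMES
        if (folder / name).is_file()
    }


# List the reports that are present for location 287.
for name, uri in report_uris("./mlflow_artifacts", 287).items():
    print(name, uri)
    # import webbrowser; webbrowser.open(uri)  # uncomment to open in the browser
```

If the loop prints nothing, no report files were found, which would suggest that training has not run yet or that the artifact folder differs from the one assumed here.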