# **Amazon Lookout for Equipment**
*Part 3 - Model training*

### Notebook configuration update
Let's make sure that we have access to the latest version of the AWS Python packages. If you see a `pip` dependency error, check the `boto3` version: if it is at least 1.17.48 (the first version that includes the `lookoutequipment` API), you can ignore this error and move on to the next cell:

```python
import boto3
print(f'boto3 version: {boto3.__version__} (should be >= 1.17.48 to include Lookout for Equipment API)')

# Restart the current kernel so that any package updates are taken into account
# (this JavaScript call relies on the classic Jupyter Notebook interface):
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")
```
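The cell above prints the `boto3` version so you can compare it with 1.17.48 by eye. If you want to automate that check, note that a plain string comparison is unreliable (`'1.9.0' > '1.17.48'` is `True` for strings). Here is a minimal sketch that compares versions numerically; `parse_version` and `version_at_least` are hypothetical helpers, not part of `boto3`:

```python
from itertools import takewhile

# First boto3 release that includes the lookoutequipment client (per this notebook)
MIN_BOTO3_VERSION = '1.17.48'


def parse_version(version: str) -> tuple:
    """Convert '1.17.48' into (1, 17, 48) so versions compare numerically."""
    parts = []
    for piece in version.split('.')[:3]:
        # Keep only the leading digits of each component (e.g. '0rc1' -> 0)
        digits = ''.join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '1.18' to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def version_at_least(version: str, minimum: str = MIN_BOTO3_VERSION) -> bool:
    """Return True if `version` is greater than or equal to `minimum`."""
    return parse_version(version) >= parse_version(minimum)


print(version_at_least('1.17.48'))  # True
print(version_at_least('1.9.0'))    # False, although '1.9.0' > '1.17.48' as strings
print(version_at_least('1.18.2'))   # True
```

In the notebook you would call `version_at_least(boto3.__version__)`. A complementary check is `'lookoutequipment' in boto3.session.Session().get_available_services()`, which tells you whether the installed `boto3` ships the Lookout for Equipment client at all.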

### Imports

```python
import boto3   # re-imported because the kernel was restarted above
import config
import os
import pandas as pd
import sagemaker
import sys

# Helper functions for managing Lookout for Equipment API calls:
sys.path.append('../utils')
import lookout_equipment_utils as lookout
```

```python
ROLE_ARN     = sagemaker.get_execution_role()
REGION_NAME  = boto3.session.Session().region_name
BUCKET       = config.BUCKET
PREFIX       = config.PREFIX_LABEL
DATASET_NAME = config.DATASET_NAME
MODEL_NAME   = config.MODEL_NAME
```

Based on the labelled time ranges, we will split the data as follows:

* **Train set:** 1st January 2019 to 31st July 2019. Lookout for Equipment needs at least 180 days of training data, and this 211-day period contains a few labelled ranges with anomalies.
* **Evaluation set:** 1st August 2019 to 27th October 2019 *(this set includes both normal and abnormal data to evaluate the model against)*

```python
# Configuring time ranges:
training_start   = pd.to_datetime('2019-01-01 00:00:00')
training_end     = pd.to_datetime('2019-07-31 00:00:00')
evaluation_start = pd.to_datetime('2019-08-01 00:00:00')
evaluation_end   = pd.to_datetime('2019-10-27 00:00:00')

print(f'  Training period | from {training_start} to {training_end}')
print(f'Evaluation period | from {evaluation_start} to {evaluation_end}')
```
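Before training, it can be worth checking these ranges programmatically. The sketch below assumes the 180-day minimum stated above and that the evaluation window must start after the training window ends; `check_time_ranges` is a hypothetical helper, not part of the `lookout_equipment_utils` module:

```python
import pandas as pd

# Minimum training length stated in this notebook for Lookout for Equipment
MIN_TRAINING_DAYS = 180


def check_time_ranges(training_start, training_end, evaluation_start, evaluation_end,
                      min_training_days=MIN_TRAINING_DAYS):
    """Return (training_days, evaluation_days), raising ValueError if the split looks invalid."""
    training_days = (training_end - training_start).days
    evaluation_days = (evaluation_end - evaluation_start).days
    if training_days < min_training_days:
        raise ValueError(f'Training window is {training_days} days, '
                         f'need at least {min_training_days}')
    if evaluation_start < training_end:
        raise ValueError('Evaluation window overlaps the training window')
    if evaluation_days <= 0:
        raise ValueError('Evaluation window is empty')
    return training_days, evaluation_days


training_days, evaluation_days = check_time_ranges(
    pd.to_datetime('2019-01-01'), pd.to_datetime('2019-07-31'),
    pd.to_datetime('2019-08-01'), pd.to_datetime('2019-10-27'))
print(f'Training: {training_days} days | Evaluation: {evaluation_days} days')
# Training: 211 days | Evaluation: 87 days
```

In the notebook itself you would pass the `training_start`, `training_end`, `evaluation_start` and `evaluation_end` variables defined in the previous cell.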

## Model training
---

```python
# Prepare the model parameters:
lookout_model = lookout.LookoutEquipmentModel(model_name=MODEL_NAME,
                                              dataset_name=DATASET_NAME,
                                              region_name=REGION_NAME)

# Set the training and evaluation time ranges:
lookout_model.set_time_periods(evaluation_start,
                               evaluation_end,
                               training_start,
                               training_end)

# Set the label data location:
lookout_model.set_label_data(bucket=BUCKET,
                             prefix=PREFIX,
                             access_role_arn=ROLE_ARN)

# Optionally set the interval at which the service resamples the data before
# training. We keep the original sampling rate in this example (5 minutes),
# but a longer interval (e.g. 15 minutes) reduces the number of data points
# and shortens the training time:

# lookout_model.set_target_sampling_rate(sampling_rate='PT15M')
```
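To see why the resampling interval matters, here is a rough count of the values the model trains on. It assumes one value per sensor per interval with no gaps (real data may have missing timestamps), and uses the 5-minute original rate mentioned above, the 30 sensors of this dataset, and the training window configured earlier; `count_datapoints` is a hypothetical helper:

```python
import pandas as pd


def count_datapoints(start, end, interval, n_sensors):
    """Approximate number of training values: one per sensor per resampling interval."""
    # Dividing two Timedeltas gives a float number of intervals
    steps = int((end - start) / pd.Timedelta(interval))
    return steps * n_sensors


start, end = pd.to_datetime('2019-01-01'), pd.to_datetime('2019-07-31')
for interval in ['5min', '15min']:
    print(interval, count_datapoints(start, end, interval, n_sensors=30))
# 5min 1823040
# 15min 607680
```

Resampling from 5 to 15 minutes divides the number of data points by three, which is why a longer interval is the first lever to pull when training takes too long.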

The following method encapsulates a call to the [**CreateModel**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_CreateModel.html) API:

```python
# Actually create the model and train it:
lookout_model.train()
```
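The helper's internals are not shown in this notebook. As a sketch of what such a wrapper plausibly sends, the function below assembles keyword arguments for the `create_model` method of the boto3 `lookoutequipment` client from the parameters configured above. It is an assumption that `lookout_model.train()` builds an equivalent request; `build_create_model_request`, the model and dataset names, the bucket and the role ARN are all hypothetical placeholders:

```python
import uuid

import pandas as pd


def build_create_model_request(model_name, dataset_name, role_arn, bucket, prefix,
                               training_start, training_end,
                               evaluation_start, evaluation_end,
                               target_sampling_rate=None):
    """Assemble keyword arguments for the lookoutequipment create_model call."""
    request = {
        'ModelName': model_name,
        'DatasetName': dataset_name,
        'RoleArn': role_arn,
        # S3 location of the label files
        'LabelsInputConfiguration': {
            'S3InputConfiguration': {'Bucket': bucket, 'Prefix': prefix}
        },
        'TrainingDataStartTime': training_start.to_pydatetime(),
        'TrainingDataEndTime': training_end.to_pydatetime(),
        'EvaluationDataStartTime': evaluation_start.to_pydatetime(),
        'EvaluationDataEndTime': evaluation_end.to_pydatetime(),
        # Unique token identifying this request (used by the service for idempotency)
        'ClientToken': str(uuid.uuid4()),
    }
    if target_sampling_rate is not None:
        # ISO 8601 duration such as 'PT15M'; omit it to keep the original rate
        request['DataPreProcessingConfiguration'] = {
            'TargetSamplingRate': target_sampling_rate
        }
    return request


# Hypothetical placeholder values: in the notebook, use MODEL_NAME, DATASET_NAME, ROLE_ARN...
request = build_create_model_request(
    model_name='my-model',
    dataset_name='my-dataset',
    role_arn='arn:aws:iam::123456789012:role/my-lookout-role',
    bucket='my-bucket',
    prefix='labels/',
    training_start=pd.to_datetime('2019-01-01'),
    training_end=pd.to_datetime('2019-07-31'),
    evaluation_start=pd.to_datetime('2019-08-01'),
    evaluation_end=pd.to_datetime('2019-10-27'),
    target_sampling_rate='PT15M',
)

# With AWS credentials configured, the request would be sent like this:
# import boto3
# client = boto3.client('lookoutequipment')
# response = client.create_model(**request)
```

Building the request as a plain dictionary keeps it easy to inspect before anything is sent to AWS.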

Training is now in progress, as shown in the console:

![Training in progress](assets/create-model-training-in-progress.png)

Use the following cell to track the training progress. **This model should take around 30-45 minutes to train.** The key drivers of training time are usually:
* The **number of labels** in the label dataset (if provided)
* The number of data points, which depends on the **sampling rate**, the **number of time series**, and the **time range**.

The following method encapsulates a call to the [**DescribeModel**](https://docs.aws.amazon.com/lookout-for-equipment/latest/ug/API_DescribeModel.html) API and tracks the model progress by reading the `Status` field returned by this call:

```python
lookout_model.poll_model_training(sleep_time=60)
```
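As with training, the polling helper's internals are not shown. Below is a minimal sketch of such a loop, assuming it simply calls `describe_model` until `Status` reaches `SUCCESS` or `FAILED`. The describe function and the sleep function are passed in so the loop can be exercised offline; `wait_for_training` and `fake_describe_model` are hypothetical:

```python
import time

# Statuses after which polling can stop
TERMINAL_STATUSES = {'SUCCESS', 'FAILED'}


def wait_for_training(describe_model, model_name, sleep_time=60, sleep=time.sleep):
    """Poll describe_model until the model status is terminal; return the last response."""
    while True:
        response = describe_model(ModelName=model_name)
        status = response['Status']
        print(f'{model_name}: {status}')
        if status in TERMINAL_STATUSES:
            return response
        sleep(sleep_time)


# Offline demo: a stand-in for client.describe_model that reports progress twice
fake_statuses = iter(['IN_PROGRESS', 'IN_PROGRESS', 'SUCCESS'])
calls = []


def fake_describe_model(ModelName):
    calls.append(ModelName)
    return {'ModelName': ModelName, 'Status': next(fake_statuses)}


final = wait_for_training(fake_describe_model, 'demo-model', sleep=lambda seconds: None)
```

Against the real service, you would pass `boto3.client('lookoutequipment').describe_model` and `MODEL_NAME` instead of the fake function.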

The model is now trained, and we can visualize the backtesting results on the evaluation window selected at the beginning of this notebook:

![Training complete](assets/model-performance.png)

In the console, **you can click on each detected event**: Amazon Lookout for Equipment unpacks the ranking and displays the top sensors contributing to that event.

When you open this window, the first event is already selected, and the console shows the following detailed view:

![Event details](assets/model-diagnostics.png)

This dataset contains 30 sensors:
* If every sensor contributed equally to this event, each one would have a feature importance of `100% / 30 = 3.33%`.
* The top sensors (e.g. **Sensor19** with a **5.67% importance**) contribute well above this 3.33% baseline (about 1.7 times as much), which suggests their contribution is meaningful rather than noise.
* If the model keeps detecting anomalies with a similar ranking, this could prompt a maintenance operator to inspect the associated components.
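The baseline comparison above generalizes to any event: compare each sensor's importance with the uniform share `100 / n`. Here is a minimal sketch; `rank_above_uniform` is a hypothetical helper, and only Sensor19's value comes from this example (other sensors' importances would be read from the console):

```python
def rank_above_uniform(importances, n_sensors):
    """Return the uniform baseline (in %) and, for sensors above it, how many
    times larger than the baseline their importance is, largest first."""
    baseline = 100.0 / n_sensors
    above = {name: value / baseline
             for name, value in importances.items() if value > baseline}
    return baseline, dict(sorted(above.items(), key=lambda kv: kv[1], reverse=True))


baseline, ratios = rank_above_uniform({'Sensor19': 5.67}, n_sensors=30)
print(f'Uniform baseline: {baseline:.2f}%')          # Uniform baseline: 3.33%
print({k: round(v, 2) for k, v in ratios.items()})   # {'Sensor19': 1.7}
```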

## Conclusion
---

```python
# Needed to display Markdown programmatically
from IPython.display import display, Markdown

display(Markdown(
'''
<span style="color:green"><span style="font-size:50px">**Success!**</span></span>
<br/>
In this notebook, we used the dataset created in part 2 of this notebook series and trained an Amazon Lookout for Equipment model.

From here, you can head either:
* To the next notebook, where we will **extract the evaluation data** for this model and use it to analyze the model results further: this step is optional and gives you some pointers on how to post-process and visualize the data provided by Amazon Lookout for Equipment.
* Or to the **inference scheduling notebook**, where we will start the model, feed it new data, and collect the results.
'''))
```