# **Amazon Lookout for Equipment** - Demonstration on an anonymized expander dataset
*Part 3: Model training*

In [1]:
BUCKET = '<YOUR_BUCKET_NAME_HERE>'
PREFIX = 'data'

### Notebook configuration update
Amazon Lookout for Equipment is a recent service, so we need to make sure that we have access to the latest version of the AWS Python packages. If you see a `pip` dependency error, check the `boto3` version: if it is 1.17.48 or later (the first version that includes the `lookoutequipment` API), you can ignore this error and move on to the next cell:

In [None]:
!pip3 install --quiet --upgrade boto3 tqdm sagemaker

import boto3
print(f'boto3 version: {boto3.__version__} (should be >= 1.17.48 to include Lookout for Equipment API)')

# Restart the notebook kernel so that the upgraded packages are taken into account:
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")
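The version check described above can also be done programmatically rather than by eye. The following is a minimal sketch: `version_tuple` and `version_at_least` are hypothetical helpers (not part of `boto3` or of this repository) that compare version strings numerically, since a plain string comparison would wrongly rank `1.9.200` above `1.17.48`.

```python
import re

# First boto3 release that includes the lookoutequipment API, per the text above
MIN_BOTO3_VERSION = '1.17.48'

def version_tuple(version):
    """Turn '1.17.48' into (1, 17, 48), ignoring any non-numeric suffix."""
    parts = []
    for chunk in version.split('.')[:3]:
        match = re.match(r'\d+', chunk)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def version_at_least(installed, minimum=MIN_BOTO3_VERSION):
    """Return True when `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)

for candidate in ('1.9.200', '1.17.48', '1.18.0'):
    print(candidate, version_at_least(candidate))
```

In the notebook, the installed version would be passed in as `version_at_least(boto3.__version__)`.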

### Imports

In [3]:
import boto3
import os
import pandas as pd
import sagemaker
import sys
import warnings

# Helper functions for managing Lookout for Equipment API calls:
sys.path.append('../utils')
import lookout_equipment_utils as lookout

### Parameters

In [4]:
warnings.filterwarnings('ignore')

DATA       = os.path.join('..', 'data')
LABEL_DATA = os.path.join(DATA, 'labelled-data')
TRAIN_DATA = os.path.join(DATA, 'training-data', 'expander')

ROLE_ARN = sagemaker.get_execution_role()
REGION_NAME = boto3.session.Session().region_name

Based on our previous analysis, we will use the following time ranges:

* **Train set:** 1st January 2015 - 31st August 2015: Lookout for Equipment needs at least 180 days of training data. Because March is one of the anomaly periods tagged in the labels, including it in the training range should not change the modeling behaviour.
* **Test set:** 1st September 2015 - 30th November 2015 *(this test set should include both normal and abnormal data to evaluate our model on)*
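As a quick sanity check on the ranges above, the short sketch below counts the days in each period and compares the training period against the 180-day minimum quoted in the text. `days_inclusive` is a hypothetical helper introduced only for this illustration.

```python
from datetime import date

# Minimum training duration quoted in the text above
MIN_TRAINING_DAYS = 180

def days_inclusive(start, end):
    """Number of calendar days in [start, end], both ends included."""
    return (end - start).days + 1

training_days = days_inclusive(date(2015, 1, 1), date(2015, 8, 31))
evaluation_days = days_inclusive(date(2015, 9, 1), date(2015, 11, 30))

print(f'Training set: {training_days} days')
print(f'Evaluation set: {evaluation_days} days')
print('Enough training data:', training_days >= MIN_TRAINING_DAYS)
```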

In [5]:
# Loading time ranges:
timeranges_fname = os.path.join(DATA, 'timeranges.txt')
with open(timeranges_fname, 'r') as f:
    # Strip line endings so that the last line parses correctly even
    # when the file has no trailing newline:
    timeranges = [line.strip() for line in f]

training_start   = pd.to_datetime(timeranges[0])
training_end     = pd.to_datetime(timeranges[1])
evaluation_start = pd.to_datetime(timeranges[2])
evaluation_end   = pd.to_datetime(timeranges[3])

print(f'Training period: from {training_start} to {training_end}')
print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')

dataset_fname = os.path.join(DATA, 'dataset_name.txt')
with open(dataset_fname, 'r') as f:
    # Strip the line ending so it does not end up in the dataset name:
    DATASET_NAME = f.readline().strip()

print('Dataset used:', DATASET_NAME)

Training period: from 2015-01-01 00:00:00 to 2015-08-31 23:59:00
Evaluation period: from 2015-09-01 00:00:00 to 2015-11-30 23:59:00
Dataset used: lookout-demo-training-dataset-v4
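Before starting a training job, it can help to confirm that the four timestamps are in chronological order and that the training and evaluation ranges do not overlap. The sketch below assumes the layout implied by the loading cell (one timestamp per line: training start, training end, evaluation start, evaluation end). `parse_timeranges` is a hypothetical helper, and the sample lines reproduce the values printed above.

```python
from datetime import datetime

def parse_timeranges(lines):
    """Parse four timestamps and check that they are strictly increasing."""
    stamps = [datetime.fromisoformat(line.strip()) for line in lines if line.strip()]
    if len(stamps) != 4:
        raise ValueError(f'Expected 4 timestamps, found {len(stamps)}')
    training_start, training_end, evaluation_start, evaluation_end = stamps
    if not (training_start < training_end < evaluation_start < evaluation_end):
        raise ValueError('Time ranges overlap or are out of order')
    return stamps

sample = [
    '2015-01-01 00:00:00\n',
    '2015-08-31 23:59:00\n',
    '2015-09-01 00:00:00\n',
    '2015-11-30 23:59:00',  # last line without a trailing newline
]
ranges = parse_timeranges(sample)
print(ranges)
```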


## Model training
---

In [6]:
# Prepare the model parameters:
lookout_model = lookout.LookoutEquipmentModel(model_name='lookout-demo-model-v1',
                                              dataset_name=DATASET_NAME,
                                              region_name=REGION_NAME)

# Set the evaluation and training time ranges (evaluation range first):
lookout_model.set_time_periods(evaluation_start,
                               evaluation_end,
                               training_start,
                               training_end)

# Set the label data location:
lookout_model.set_label_data(bucket=BUCKET, 
                             prefix=f'{PREFIX}/labelled-data/',
                             access_role_arn=ROLE_ARN)

# Set the rate at which the service resamples the data before
# training (PT5M is the ISO 8601 duration for 5 minutes):
lookout_model.set_target_sampling_rate(sampling_rate='PT5M')
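The `'PT5M'` value is an ISO 8601 duration: `PT` introduces a time component and `5M` means five minutes. To make the notation concrete, here is a minimal sketch of a parser covering only the hours/minutes/seconds subset. `parse_sampling_rate` is a hypothetical helper written for this illustration. It is not how the service interprets the value, and it does not check which rates the service actually accepts.

```python
import re
from datetime import timedelta

# Time-only ISO 8601 durations such as PT30S, PT5M or PT1H
_DURATION = re.compile(r'^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$')

def parse_sampling_rate(rate):
    """Convert a time-only ISO 8601 duration string into a timedelta."""
    match = _DURATION.match(rate)
    if match is None or not any(match.groups()):
        raise ValueError(f'Unsupported duration: {rate!r}')
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_sampling_rate('PT5M'))
```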

In [7]:
# Actually create the model and train it:
lookout_model.train()

{'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:123031033346:model/lookout-demo-model-v4/c51bd00d-0da4-4ffa-b7cd-25a9c449373b',
 'Status': 'IN_PROGRESS',
 'ResponseMetadata': {'RequestId': '9f186be8-5371-40fa-b41b-21d7de5dbcbe',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '9f186be8-5371-40fa-b41b-21d7de5dbcbe',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '150',
   'date': 'Fri, 16 Apr 2021 20:15:13 GMT'},
  'RetryAttempts': 0}}
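The response above has the shape returned by the `create_model` operation of the `lookoutequipment` API. As an illustration of how the settings from the previous cell might map onto that call, the sketch below only builds a request dictionary and does not contact AWS. Whether the `lookout_equipment_utils` helper builds exactly this request is an assumption, and the parameter names follow our reading of the boto3 documentation, so they should be checked against it. `build_create_model_request` and the placeholder values are hypothetical.

```python
from datetime import datetime

def build_create_model_request(model_name, dataset_name, role_arn,
                               training_start, training_end,
                               evaluation_start, evaluation_end,
                               label_bucket, label_prefix, sampling_rate):
    """Assemble keyword arguments for a create_model call (assumed mapping)."""
    return {
        'ModelName': model_name,
        'DatasetName': dataset_name,
        'RoleArn': role_arn,
        'TrainingDataStartTime': training_start,
        'TrainingDataEndTime': training_end,
        'EvaluationDataStartTime': evaluation_start,
        'EvaluationDataEndTime': evaluation_end,
        # Location of the label files in S3
        'LabelsInputConfiguration': {
            'S3InputConfiguration': {'Bucket': label_bucket, 'Prefix': label_prefix}
        },
        # Resampling applied by the service before training
        'DataPreProcessingConfiguration': {'TargetSamplingRate': sampling_rate},
    }

request = build_create_model_request(
    model_name='lookout-demo-model-v1',
    dataset_name='lookout-demo-training-dataset-v4',
    role_arn='<YOUR_ROLE_ARN>',            # placeholder
    training_start=datetime(2015, 1, 1, 0, 0),
    training_end=datetime(2015, 8, 31, 23, 59),
    evaluation_start=datetime(2015, 9, 1, 0, 0),
    evaluation_end=datetime(2015, 11, 30, 23, 59),
    label_bucket='<YOUR_BUCKET_NAME_HERE>',  # placeholder
    label_prefix='data/labelled-data/',
    sampling_rate='PT5M',
)
print(sorted(request))
```

With valid credentials and real values, such a dictionary could then be passed as `boto3.client('lookoutequipment').create_model(**request)`.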

Training is now in progress, as shown in the console:

![Training in progress](../assets/model-training-in-progress.png)

Use the following cell to monitor the model training progress. **This model should take around an hour to train.** The key drivers of training time are:
* Number of labels in the label dataset (if provided)
* Number of datapoints: this number depends on the sampling rate, the number of time series and the time range.
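To get a feel for how the sampling rate, the number of time series and the time range combine, the sketch below estimates the number of datapoints after resampling. `estimate_datapoints` is a hypothetical helper, and the number of time series (`N_SERIES = 100`) is an arbitrary placeholder, not the actual signal count of the expander dataset.

```python
from datetime import datetime, timedelta

def estimate_datapoints(start, end, sampling_interval, n_series):
    """Rough count of resampled points: one per interval, per time series."""
    steps_per_series = (end - start) // sampling_interval + 1
    return steps_per_series * n_series

start = datetime(2015, 1, 1, 0, 0)
end = datetime(2015, 8, 31, 23, 59)
N_SERIES = 100  # hypothetical number of time series, for illustration only

for minutes in (1, 5, 60):
    total = estimate_datapoints(start, end, timedelta(minutes=minutes), N_SERIES)
    print(f'{minutes:>2}-minute sampling: {total:,} datapoints')
```

Moving from 1-minute to 5-minute sampling divides the volume by five, which is one reason why the target sampling rate has a direct effect on training time.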

In [None]:
lookout_model.poll_model_training()
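The polling helper above comes from this repository's utilities and its internals are not shown here. As a rough idea of what such a loop can look like, the sketch below polls a status callable until it stops returning `'IN_PROGRESS'` (the status seen in the training response). `wait_for_training` is a hypothetical function, and the demo uses a fake status source so that it runs without AWS access.

```python
import time

def wait_for_training(get_status, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Call get_status() repeatedly until it leaves the IN_PROGRESS state."""
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status != 'IN_PROGRESS':
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f'Still in progress after {polls} polls')
        sleep(poll_seconds)

# Demo with a fake status source instead of a real API call
fake_statuses = iter(['IN_PROGRESS', 'IN_PROGRESS', 'SUCCESS'])
final_status = wait_for_training(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

Against the real service, `get_status` could wrap the boto3 `describe_model` call, for example `lambda: client.describe_model(ModelName='lookout-demo-model-v1')['Status']`, assuming `client = boto3.client('lookoutequipment')`. The terminal status values should be checked in the API documentation.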

The model is now trained and we can visualize the back-testing results over the evaluation window selected at the beginning of this notebook:

![Training complete](../assets/model-training-complete.png)

## Conclusion
---
In this notebook, we used the dataset created in part 2 of this notebook series to train a Lookout for Equipment model.

From here, you can go to either:
* The next notebook, where we will **extract the evaluation data** for this model and use it to perform further analysis on the model results.
* The **inference scheduling notebook**, where we will start the model, feed it some new data and collect the results.