# **Amazon Lookout for Equipment** - Demonstration on an anonymized expander dataset
*Part 3: Model training*

In [6]:
BUCKET = 'l4e-lookout-equipment-demo2'
PREFIX = 'data4'

### Notebook configuration update
Amazon Lookout for Equipment is a very recent service, so we need to make sure that we have access to the latest version of the AWS Python packages. If you see a `pip` dependency error, check the `boto3` version: if it is 1.17.48 or later (1.17.48 being the first version that includes the `lookoutequipment` API), you can ignore this error and move on to the next cell:

In [1]:

import boto3
print(f'boto3 version: {boto3.__version__} (should be >= 1.17.48 to include Lookout for Equipment API)')



boto3 version: 1.17.96 (should be >= 1.17.48 to include Lookout for Equipment API)
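
The check above is left to the reader's eye. A minimal sketch of an automated comparison follows, assuming plain dotted numeric versions; `parse_version` and `version_at_least` are hypothetical helper names, not part of `boto3` or of the notebook utilities. Comparing integer tuples avoids the trap of comparing version strings alphabetically.

```python
import re

def parse_version(version: str) -> tuple:
    """Turn a version string such as '1.17.96' into (1, 17, 96)."""
    parts = []
    for part in version.split('.'):
        # Keep only the leading digits of each dot-separated part
        match = re.match(r'\d+', part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def version_at_least(installed: str, minimum: str = '1.17.48') -> bool:
    """Return True if `installed` is greater than or equal to `minimum`."""
    return parse_version(installed) >= parse_version(minimum)

# Version printed by the cell above:
print(version_at_least('1.17.96'))
# A plain string comparison would wrongly accept this older version:
print(version_at_least('1.9.200'))
```

In the notebook itself, the first argument would be `boto3.__version__`.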


### Imports

In [2]:
import boto3
import os
import pandas as pd
import sagemaker
import sys
import warnings

# Helper functions for managing Lookout for Equipment API calls:
sys.path.append('../utils')
import lookout_equipment_utils as lookout

### Parameters

In [3]:
warnings.filterwarnings('ignore')

DATA       = os.path.join('..', 'data')
LABEL_DATA = os.path.join(DATA, 'labelled-data')
TRAIN_DATA = os.path.join(DATA, 'training-data', 'expander')

ROLE_ARN = "arn:aws:iam::831520308310:role/l4e-role"
REGION_NAME = boto3.session.Session().region_name

Based on our previous analysis, we will use the following time ranges:

* **Train set:** 1st January 2015 - 31st August 2015: Lookout for Equipment needs at least 180 days of training data. March is one of the anomaly periods tagged in the labels, so including it in the training range should not change the modeling behaviour.
* **Test set:** 1st September 2015 - 30th November 2015 *(this test set should include both normal and abnormal data to evaluate our model on)*
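
As a quick sanity check of the two ranges above, the sketch below computes their lengths and compares the training range against the 180-day minimum mentioned in the first bullet. It assumes each range runs from midnight on its first day to 23:59 on its last day; the constant name `MIN_TRAINING_DAYS` is introduced here for illustration only.

```python
from datetime import datetime

# Minimum training duration stated in the text above
MIN_TRAINING_DAYS = 180

# Range boundaries (assumed: start at 00:00, end at 23:59)
training_start   = datetime(2015, 1, 1, 0, 0)
training_end     = datetime(2015, 8, 31, 23, 59)
evaluation_start = datetime(2015, 9, 1, 0, 0)
evaluation_end   = datetime(2015, 11, 30, 23, 59)

# Number of whole days covered by each range
training_days   = (training_end - training_start).days
evaluation_days = (evaluation_end - evaluation_start).days

print(f'Training range: {training_days} days')
print(f'Evaluation range: {evaluation_days} days')

# The training range must be long enough and must not overlap the test range
assert training_days >= MIN_TRAINING_DAYS
assert training_end < evaluation_start
```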

In [4]:
# Loading time ranges:
timeranges_fname = os.path.join(DATA, 'timeranges.txt')
with open(timeranges_fname, 'r') as f:
    timeranges = f.readlines()
    
# strip() removes the trailing newline without truncating a last line
# that has none:
training_start   = pd.to_datetime(timeranges[0].strip())
training_end     = pd.to_datetime(timeranges[1].strip())
evaluation_start = pd.to_datetime(timeranges[2].strip())
evaluation_end   = pd.to_datetime(timeranges[3].strip())

print(f'Training period: from {training_start} to {training_end}')
print(f'Evaluation period: from {evaluation_start} to {evaluation_end}')

dataset_fname = os.path.join(DATA, 'dataset_name.txt')
with open(dataset_fname, 'r') as f:
    DATASET_NAME = f.readline().strip()
    
print('Dataset used:', DATASET_NAME)

Training period: from 2015-01-01 00:00:00 to 2015-08-31 23:59:00
Evaluation period: from 2015-09-01 00:00:00 to 2015-11-30 23:59:00
Dataset used: lookout-demo-training-dataset


## Model training
---

In [7]:
# Prepare the model parameters:
lookout_model = lookout.LookoutEquipmentModel(model_name='lookout-demo-model-v1',
                                              dataset_name=DATASET_NAME,
                                              region_name=REGION_NAME)

# Set the training / evaluation split date:
lookout_model.set_time_periods(evaluation_start,
                               evaluation_end,
                               training_start,
                               training_end)

# Set the label data location:
lookout_model.set_label_data(bucket=BUCKET, 
                             prefix=f'{PREFIX}/labelled-data/',
                             access_role_arn=ROLE_ARN)

# Set the rate at which the service will resample the data before
# training (ISO 8601 duration: 'PT5M' means 5 minutes):
lookout_model.set_target_sampling_rate(sampling_rate='PT5M')
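
The sampling rate is written as an ISO 8601 duration, so `'PT5M'` means one data point every 5 minutes. The sketch below decodes the simple `PT<n>H<n>M<n>S` subset of that notation with the standard library, to show what such a string represents. `parse_sampling_rate` is a hypothetical helper written for this illustration; it is not part of the service API or of the notebook utilities, and it does not check which rates the service actually accepts.

```python
import re
from datetime import timedelta

# Hours, minutes and seconds only (a subset of ISO 8601 durations)
_DURATION = re.compile(r'^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$')

def parse_sampling_rate(rate: str) -> timedelta:
    """Convert a duration such as 'PT5M' into a timedelta."""
    match = _DURATION.match(rate)
    # Reject strings that do not match, and a bare 'PT' with no component
    if match is None or not any(match.groups()):
        raise ValueError(f'Unsupported duration: {rate!r}')
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

rate = parse_sampling_rate('PT5M')
points_per_day = timedelta(days=1) / rate

print(f'Sampling period: {rate}')
print(f'Data points per signal and per day: {points_per_day:.0f}')
```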

In [9]:
# Actually create the model and train it:
lookout_model.train()

{'ModelArn': 'arn:aws:lookoutequipment:eu-west-1:831520308310:model/lookout-demo-model-v1/2fbe82e9-f7fe-45d1-9b66-622382bd1eda',
 'Status': 'IN_PROGRESS',
 'ResponseMetadata': {'RequestId': 'b3b70afd-8f7b-45dd-9f85-aa1084369097',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'b3b70afd-8f7b-45dd-9f85-aa1084369097',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '150',
   'date': 'Thu, 17 Jun 2021 15:18:39 GMT'},
  'RetryAttempts': 0}}
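
The internals of the `train()` helper are not shown in this notebook. The sketch below is an assumption about the kind of request it might assemble for the `create_model` operation of the `boto3` `lookoutequipment` client, using the values configured in the cells above. `build_create_model_request` is a hypothetical function; it only builds the request dictionary and does not call AWS.

```python
from datetime import datetime

def build_create_model_request(model_name, dataset_name, role_arn,
                               bucket, label_prefix,
                               training_start, training_end,
                               evaluation_start, evaluation_end,
                               sampling_rate):
    """Assemble keyword arguments for the create_model API call."""
    return {
        'ModelName': model_name,
        'DatasetName': dataset_name,
        'RoleArn': role_arn,
        'TrainingDataStartTime': training_start,
        'TrainingDataEndTime': training_end,
        'EvaluationDataStartTime': evaluation_start,
        'EvaluationDataEndTime': evaluation_end,
        # Location of the label files in Amazon S3
        'LabelsInputConfiguration': {
            'S3InputConfiguration': {'Bucket': bucket, 'Prefix': label_prefix}
        },
        # Resampling applied by the service before training
        'DataPreProcessingConfiguration': {'TargetSamplingRate': sampling_rate},
    }

request = build_create_model_request(
    model_name='lookout-demo-model-v1',
    dataset_name='lookout-demo-training-dataset',
    role_arn='arn:aws:iam::831520308310:role/l4e-role',
    bucket='l4e-lookout-equipment-demo2',
    label_prefix='data4/labelled-data/',
    training_start=datetime(2015, 1, 1, 0, 0),
    training_end=datetime(2015, 8, 31, 23, 59),
    evaluation_start=datetime(2015, 9, 1, 0, 0),
    evaluation_end=datetime(2015, 11, 30, 23, 59),
    sampling_rate='PT5M',
)

# With AWS credentials configured, the request could then be sent with:
# import boto3
# response = boto3.client('lookoutequipment').create_model(**request)
print(sorted(request))
```

The response shown above (`ModelArn`, `Status`) has the shape returned by this API call.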

In [10]:
lookout_model.poll_model_training()

2021-06-17 17:19:42 | Model training: IN_PROGRESS
2021-06-17 17:20:42 | Model training: IN_PROGRESS
2021-06-17 17:21:42 | Model training: IN_PROGRESS
2021-06-17 17:22:43 | Model training: IN_PROGRESS
2021-06-17 17:23:43 | Model training: IN_PROGRESS
2021-06-17 17:24:43 | Model training: IN_PROGRESS
2021-06-17 17:25:43 | Model training: IN_PROGRESS
2021-06-17 17:26:43 | Model training: IN_PROGRESS
2021-06-17 17:27:43 | Model training: IN_PROGRESS
2021-06-17 17:28:44 | Model training: IN_PROGRESS
2021-06-17 17:29:44 | Model training: IN_PROGRESS
2021-06-17 17:30:44 | Model training: IN_PROGRESS
2021-06-17 17:31:45 | Model training: IN_PROGRESS
2021-06-17 17:32:45 | Model training: IN_PROGRESS
2021-06-17 17:33:45 | Model training: IN_PROGRESS
2021-06-17 17:34:45 | Model training: IN_PROGRESS
2021-06-17 17:35:45 | Model training: IN_PROGRESS
2021-06-17 17:36:45 | Model training: IN_PROGRESS
2021-06-17 17:37:45 | Model training: IN_PROGRESS
2021-06-17 17:38:46 | Model training: IN_PROGRESS
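
The log above suggests that the helper queries the training status about once a minute until it leaves the `IN_PROGRESS` state. A minimal sketch of such a loop is given below; it is an assumption about the behaviour, not the helper's actual code. `poll_status` is a hypothetical function that receives the status lookup as a callable, so it can be tried here with a fake status sequence instead of a real AWS call.

```python
import time
from datetime import datetime

def poll_status(get_status, wait_seconds=60, sleep=time.sleep, log=print):
    """Call get_status() until it stops returning 'IN_PROGRESS'."""
    status = get_status()
    while status == 'IN_PROGRESS':
        log(f'{datetime.now():%Y-%m-%d %H:%M:%S} | Model training: {status}')
        sleep(wait_seconds)
        status = get_status()
    log(f'{datetime.now():%Y-%m-%d %H:%M:%S} | Model training: {status}')
    return status

# Simulated status sequence, with no waiting between two checks:
fake_statuses = iter(['IN_PROGRESS', 'IN_PROGRESS', 'SUCCESS'])
final_status = poll_status(lambda: next(fake_statuses),
                           wait_seconds=0,
                           sleep=lambda seconds: None)

# Against the real service, the lookup could be written as:
# import boto3
# client = boto3.client('lookoutequipment')
# get_status = lambda: client.describe_model(
#     ModelName='lookout-demo-model-v1')['Status']
```

Passing `sleep` and `log` as parameters keeps the loop easy to test; the defaults reproduce the one-minute cadence seen in the output above.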


## Conclusion
---
In this notebook, we used the dataset created in part 2 of this notebook series to train a Lookout for Equipment model.

From here, you can head either:
* To the next notebook, where we will **extract the evaluation data** for this model and use it to perform further analysis on the model results.
* Or to the **inference scheduling notebook**, where we will start the model, feed it some new data and collect the results.