# Detecting Motor Anomalies

Detecting motor anomalies is a bit more complicated than detecting battery issues. Motors usually operate within a certain power range, but they sometimes behave anomalously: their power consumption can go too high, due to environmental issues, or too low, due to aging.

As usual, let's start by restoring the stored data and taking a look at it:

In [1]:
%store -r data
data.head()

timestamp,device_id,motor_peak_mA,battery
2020-02-22 23:59:59,7517a917b42450470661cec1bd4654f8,1335,73
2020-02-22 23:59:59,8e4a851ed2317a249a0903f29d894361,1577,73
2020-02-22 23:59:59,572ddf9d82d5675ed2db832081b70103,1585,73
2020-02-22 23:59:59,b17bbc29ce61265a6212c689a597d4d8,0,73
2020-02-22 23:59:59,19d3c55b134ab7780d2b711211b7cf7c,1286,73


In [2]:
%store -r bucket
bucket

'mt-ml-workshop-wzejasmw'

# Exploratory Data Analysis

In [3]:
train_data = data[["motor_peak_mA"]]
train_data = train_data[train_data["motor_peak_mA"] > 0]
train_data.head()

timestamp,motor_peak_mA
2020-02-22 23:59:59,1335
2020-02-22 23:59:59,1577
2020-02-22 23:59:59,1585
2020-02-22 23:59:59,1286
2020-02-22 23:59:59,1796


In [4]:
train_data.describe()

,motor_peak_mA
count,3079934.0
mean,525.0616
std,685.8781
min,9.0
25%,10.0
50%,21.0
75%,797.0
max,7730.0


train_data.info()

In [5]:
import matplotlib.pyplot as plt
train_data.plot(rot=30)

<matplotlib.axes._subplots.AxesSubplot at 0x7f32a4668c18>

## Synthetic Ground Truth

In [6]:
anomalies = data[["motor_peak_mA"]]
anomalies = anomalies[anomalies["motor_peak_mA"] > 0]
anomalies.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3079934 entries, 2020-02-22 23:59:59 to 2020-02-25 22:03:11
Data columns (total 1 columns):
 #   Column         Dtype
---  ------         -----
 0   motor_peak_mA  int64
dtypes: int64(1)
memory usage: 47.0 MB


In [7]:
from sklearn.model_selection import train_test_split

train_data, test_dataframe = train_test_split(anomalies, test_size=0.2)

In [8]:
test_data = test_dataframe.copy()
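# Synthetic rule: a reading is anomalous if it is above 4000 mA...
test_data["anomaly"] = test_data["motor_peak_mA"] > 4000
# ...or strictly between 50 mA and 200 mA (explicit parentheses make the precedence obvious).
test_data["anomaly"] = test_data["anomaly"] | ((test_data["motor_peak_mA"] > 50) & (test_data["motor_peak_mA"] < 200))
test_data["anomaly"] = test_data["anomaly"].astype(int)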
test_data.groupby("anomaly").count().head()

anomaly,motor_peak_mA
0,615635
1,352
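
The labeling rule above can also be written as a small reusable function. This is only a sketch: the thresholds are copied from the cell above, while the function name `label_anomalies` and the sample readings are made up for illustration.

```python
import pandas as pd

# Thresholds copied from the labeling cell; the names are hypothetical.
HIGH_MA = 4000            # readings above this are flagged
LOW_BAND_MA = (50, 200)   # readings strictly inside this band are flagged

def label_anomalies(peak_ma: pd.Series) -> pd.Series:
    """Return 1 where a reading is above HIGH_MA or strictly inside LOW_BAND_MA, else 0."""
    too_high = peak_ma > HIGH_MA
    in_low_band = (peak_ma > LOW_BAND_MA[0]) & (peak_ma < LOW_BAND_MA[1])
    return (too_high | in_low_band).astype(int)

# Made-up readings, not taken from the dataset.
sample = pd.Series([9, 21, 120, 797, 4500], name="motor_peak_mA")
labels = label_anomalies(sample)
print(labels.tolist())
```

Keeping the thresholds in named constants makes it easier to try other definitions of "anomalous" without touching the rest of the notebook.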


In [9]:
test_data.describe()

,motor_peak_mA,anomaly
count,615987.0,615987.0
mean,525.730116,0.000571
std,686.395569,0.023898
min,9.0,0.0
25%,10.0,0.0
50%,21.0,0.0
75%,800.0,0.0
max,7730.0,1.0


In [10]:
train_data.describe()

,motor_peak_mA
count,2463947.0
mean,524.8945
std,685.7488
min,9.0
25%,10.0
50%,21.0
75%,797.0
max,7052.0


# Random Cut Forest Training

In [11]:
train_array = train_data.values
train_array

array([[ 20],
       [  9],
       [488],
       ...,
       [  9],
       [ 21],
       [462]])

In [12]:
test_array = test_data[["motor_peak_mA"]].values
test_array

array([[1664],
       [1490],
       [   9],
       ...,
       [1896],
       [2439],
       [1194]])

In [13]:
labels_array = test_data["anomaly"].values
labels_array

array([0, 0, 0, ..., 0, 0, 0])

In [16]:
import io
import numpy as np
import sagemaker
import sagemaker.amazon.common as smac
import boto3

s3bucket = boto3.resource('s3').Bucket(bucket)

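def upload_records(array, key, labels=None):
    """Serialize array (and optional labels) as RecordIO-protobuf and upload it to the bucket under key."""
    buf = io.BytesIO()
    if labels is not None:
        smac.write_numpy_to_dense_tensor(buf, array, labels)
    else:
        smac.write_numpy_to_dense_tensor(buf, array)
    # Rewind the buffer so the upload starts from the first byte.
    buf.seek(0)
    s3bucket.Object(key).upload_fileobj(buf)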


In [17]:
prefix = "mt-motor-anomaly"

train_key = "{}/input/{}".format(prefix, "train.rio")
test_key = "{}/input/{}".format(prefix, "test.rio")

upload_records(train_array,train_key)
upload_records(test_array,test_key,labels_array)

train_input = sagemaker.s3_input(
       s3_data="s3://{}/{}".format(bucket,train_key),
       content_type='application/x-recordio-protobuf',
       distribution='ShardedByS3Key')

test_input = sagemaker.s3_input(
       s3_data="s3://{}/{}".format(bucket,test_key),
       content_type='application/x-recordio-protobuf',
       distribution='FullyReplicated')

rcf_input = {
    'train': train_input,
    'test': test_input     
}

rcf_input

{'train': <sagemaker.inputs.s3_input at 0x7f325e8d5d30>,
 'test': <sagemaker.inputs.s3_input at 0x7f325e8d5ba8>}

# RCF Training

In [18]:
region = boto3.Session().region_name
from sagemaker.amazon.amazon_estimator import get_image_uri

rcf_container = get_image_uri(region, 'randomcutforest')
rcf_container

'438346466558.dkr.ecr.eu-west-1.amazonaws.com/randomcutforest:1'

In [19]:
rcf_hparams = {
    "num_samples_per_tree":512,
    "num_trees":50,
    "feature_dim":1,
    "eval_metrics": "accuracy"
}

In [21]:
rcf_estimator = sagemaker.estimator.Estimator(
                      rcf_container,
                      role=sagemaker.get_execution_role(),
                      train_instance_count=1,
                      train_instance_type='ml.m5.large',
                      base_job_name="mt-motor-anomaly",
                      output_path='s3://{}/{}/output'.format(bucket, prefix),
                      hyperparameters = rcf_hparams )

In [22]:
rcf_estimator.fit(rcf_input)

2020-05-13 16:17:37 Starting - Starting the training job...
2020-05-13 16:17:38 Starting - Launching requested ML instances......
2020-05-13 16:19:06 Starting - Preparing the instances for training......
2020-05-13 16:19:52 Downloading - Downloading input data...
2020-05-13 16:20:36 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
  from numpy.testing.nosetester import import_nose
  from numpy.testing.decorators import setastest
[05/13/2020 16:20:53 INFO 140574702872384] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-conf.json: {u'_ftp_port': 8999, u'num_samples_per_tree': 256, u'_tuning_objective_metric': u'', u'_num_gpus': u'auto', u'_log_level': u'info', u'_kvstore': u'dist_async', u'force_dense': u'true', u'epochs': 1, u'num_trees': 100, u'eval_metrics': [u'accuracy', u'precision_recall_fscore'], u'_num_kv_servers': u'auto', u'mini_batch_size': 1000}
[05

2020-05-13 16:20:49 Training - Training image download completed. Training in progress.
[2020-05-13 16:20:58.237] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 1, "duration": 3518, "num_examples": 2464, "num_bytes": 68990516}
[05/13/2020 16:20:58 INFO 140574702872384] Sampling training data completed.
#metrics {"Metrics": {"epochs": {"count": 1, "max": 1, "sum": 1.0, "min": 1}, "update.time": {"count": 1, "max": 3540.1980876922607, "sum": 3540.1980876922607, "min": 3540.1980876922607}}, "EndTime": 1589386858.258652, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "RandomCutForest"}, "StartTime": 1589386854.717813}
[05/13/2020 16:20:58 INFO 140574702872384] Early stop condition met. Stopping training.
[05/13/2020 16:20:58 INFO 140574702872384] #progress_metric: host=algo-1, completed 100 % epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 246

In [23]:
print('Training job name: {}'.format(rcf_estimator.latest_training_job.job_name))

Training job name: mt-motor-anomaly-2020-05-13-16-17-37-246


In [24]:
rcf_inference = rcf_estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
)

---------------!

In [25]:
rcf_inference_endpoint = rcf_inference.endpoint
%store rcf_inference_endpoint
rcf_inference_endpoint

Stored 'rcf_inference_endpoint' (str)


'mt-motor-anomaly-2020-05-13-16-17-37-246'

In [26]:
from sagemaker.predictor import csv_serializer, json_deserializer

rcf_inference.content_type = 'text/csv'
rcf_inference.serializer = csv_serializer
rcf_inference.accept = 'application/json'
rcf_inference.deserializer = json_deserializer

In [27]:
sample_data = train_data[:5].values
sample_data

array([[ 20],
       [  9],
       [488],
       [392],
       [ 10]])

In [28]:
results = rcf_inference.predict(sample_data)
results

{'scores': [{'score': 0.8458514883},
  {'score': 0.7505436662},
  {'score': 0.7561386466},
  {'score': 0.840287521},
  {'score': 0.7417537876}]}
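
The call above scores five rows in one request. Scoring a much larger array, such as the test set, would probably need to be split into several requests. The sketch below shows one way to do that. It assumes a single request cannot carry the whole array, and the helper `score_in_batches`, the batch size of 500, and the stand-in `fake_predict` are all hypothetical; in the notebook you would pass `rcf_inference.predict` instead.

```python
import numpy as np

def score_in_batches(predict, array, batch_size=500):
    """Call `predict` on consecutive slices of `array` and collect the scores in order."""
    scores = []
    for start in range(0, len(array), batch_size):
        response = predict(array[start:start + batch_size])
        # Each response mirrors the shape shown above: {"scores": [{"score": ...}, ...]}
        scores.extend(item["score"] for item in response["scores"])
    return np.array(scores)

# Stand-in for rcf_inference.predict so the sketch runs without an endpoint.
def fake_predict(batch):
    return {"scores": [{"score": float(row[0]) / 1000.0} for row in batch]}

demo = np.arange(1200).reshape(-1, 1)
demo_scores = score_in_batches(fake_predict, demo, batch_size=500)
print(demo_scores.shape)
```

Passing the predict function as an argument keeps the batching logic testable offline.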

In [29]:
import pandas as pd

# Number of standard deviations above the mean score at which a point is flagged.
sigmas = 1

scores = [score["score"] for score in results["scores"]]
series = pd.Series(scores)
score_mean = series.mean()
score_max = series.max()
score_std = series.std()
# Anything scoring above this cutoff is treated as an anomaly.
score_cutoff = score_mean + sigmas * score_std
(score_mean, score_max, score_std, score_cutoff)

(0.78691502194, 0.8458514883, 0.051555137007527445, 0.8384701589475274)

In [30]:
anomalies = series[series > score_cutoff ]  
anomalies

0    0.845851
3    0.840288
dtype: float64

In [31]:
"{} anomalies detected".format(len(anomalies))

'2 anomalies detected'
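
The synthetic labels built earlier make it possible to check how well a cutoff separates anomalies from normal readings. The following is a minimal sketch under stated assumptions, not something the notebook itself runs: the function name `evaluate_cutoff` is hypothetical, and the scores and labels in the demo are made up. It uses the same mean plus `sigmas` standard deviations rule as above (`ddof=1` matches the default of `pandas.Series.std`).

```python
import numpy as np

def evaluate_cutoff(scores, labels, sigmas=1.0):
    """Flag scores above mean + sigmas * std and compare the flags with 0/1 labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    cutoff = scores.mean() + sigmas * scores.std(ddof=1)
    predicted = scores > cutoff
    tp = int(np.sum(predicted & (labels == 1)))   # flagged and labeled anomalous
    fp = int(np.sum(predicted & (labels == 0)))   # flagged but labeled normal
    fn = int(np.sum(~predicted & (labels == 1)))  # missed anomalies
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return cutoff, precision, recall

# Made-up scores and labels, for illustration only.
demo_scores = [0.70, 0.72, 0.71, 0.73, 1.50, 0.70]
demo_labels = [0, 0, 0, 0, 1, 0]
cutoff, precision, recall = evaluate_cutoff(demo_scores, demo_labels)
print(cutoff, precision, recall)
```

Because the cutoff is derived from the scores themselves, a small sample will usually flag something even when every reading is normal, so it is worth computing the cutoff on a large batch of scores.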

## Motor Maintenance

Now that we can detect anomalies in past data, let's combine that with forecasting for predictive [motor maintenance](mt-motor-maintenance.ipynb).