# Training and Inference of our ML Model

**SageMaker Studio Kernel**: Data Science

In this exercise you will:
 - Run a preprocessing job using Amazon SageMaker Processing
 - Run a PyTorch training job using Amazon SageMaker Training
 - Run a batch inference job using Amazon SageMaker Batch Transform
 - Compute the thresholds used by the application to classify the predictions as anomalies or normal behavior

## Part 1/3 - Setup
Here we'll import some libraries and define some variables. You can also take a look at the scripts that were previously created for preparing the data and training our model.

In [None]:
import sagemaker
import numpy as np
import glob
import os
import boto3

In [None]:
s3_client = boto3.client('s3')
sm_client = boto3.client('sagemaker')

bucket_name = ""  # set this to the name of your S3 bucket

prefix = "data"

sagemaker_session=sagemaker.Session(default_bucket=bucket_name)
role = sagemaker.get_execution_role()

### Get the dataset and upload it to an S3 bucket

In [None]:
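# Upload the dataset to S3.
# First remove any object stored under the exact key "<prefix>/input"
# (delete_object targets a single key, not everything under a prefix)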
s3_client.delete_object(Bucket=bucket_name, Key="{}/input".format(prefix))

input_data = sagemaker_session.upload_data('./../data/dataset_wind_turbine.csv.gz', key_prefix="{}/input".format(prefix) )
print(input_data)

### Visualize the training script & the preprocessing script

In [None]:
## This script was created to express what we saw in the previous exercise.
## It will get the raw data from the turbine sensors, select some features, 
## denoise, normalize, encode and reshape it as a 6x10x10 tensor
## This script is the entry point of the first step of the ML Pipeline: Data preparation
!pygmentize ./../algorithms/preprocessing/preprocessing.py
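The reshaping step mentioned in the comments above can be illustrated with NumPy. This is only a sketch and not the contents of `preprocessing.py`: it assumes a window of 100 consecutive readings for 6 features, uses min-max scaling as a stand-in for the real normalization, and the names `window`, `normalize` and `to_tensor` are hypothetical.

```python
import numpy as np

def normalize(window):
    """Scale each feature (column) to [0, 1]; constant columns map to 0."""
    lo = window.min(axis=0)
    span = window.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant features
    return (window - lo) / span

def to_tensor(window, n_rows=10, n_cols=10):
    """Turn (n_rows*n_cols, n_features) readings into (n_features, n_rows, n_cols)."""
    n_steps, n_features = window.shape
    if n_steps != n_rows * n_cols:
        raise ValueError("window length must equal n_rows * n_cols")
    return window.T.reshape(n_features, n_rows, n_cols)

# Synthetic readings: 100 time steps x 6 features (assumed layout)
rng = np.random.default_rng(0)
window = rng.normal(size=(100, 6))

tensor = to_tensor(normalize(window))
print(tensor.shape)  # (6, 10, 10)
```

Each of the 6 channels holds the 100 readings of one feature, laid out row by row in a 10x10 grid.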

In [None]:
## This is the training/prediction script, used by the training step of 
## our ML Pipeline. In this step, a SageMaker Training Job will run this 
## script to build the model. Then, in the batch transform step,
## the same script will be used again to load the trained model
## and rebuild (predict) all the training samples. These predictions
## will then be used to compute MAE and the thresholds, for detecting anomalies
!pygmentize ./../algorithms/training/wind_turbine.py

***

## Part 2/3: Run the end to end ML workflow

### Import the required modules

In [None]:
import boto3
import logging
import sagemaker
from sagemaker.inputs import CreateModelInput, TrainingInput, TransformInput
from sagemaker.model import Model
from sagemaker.pytorch.estimator import PyTorch
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.transformer import Transformer
from sagemaker.workflow.parameters import ParameterInteger, ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.steps import CacheConfig, CreateModelStep, ProcessingStep, TrainingStep, TransformStep
import time
import traceback

In [None]:
s3_client = boto3.client('s3')
sm_client = boto3.client('sagemaker')

***

### Step 1/4: Create the Processing Job

#### Define input parameters

In [None]:
input_data = "s3://{}/{}/input".format(bucket_name, prefix)

preprocessing_framework_version = "0.23-1"
preprocessing_instance_type = "ml.m5.xlarge"
preprocessing_instance_count = 1

In [None]:
role = sagemaker.get_execution_role()
sagemaker_session=sagemaker.Session(default_bucket=bucket_name)

#### Run Processing Job

In [None]:
script_processor = SKLearnProcessor(
    framework_version=preprocessing_framework_version,
    role=role,
    instance_type=preprocessing_instance_type,
    instance_count=preprocessing_instance_count,
    max_runtime_in_seconds=7200,
    sagemaker_session=sagemaker_session
)

In [None]:
response = script_processor.run(
    code="./../algorithms/preprocessing/preprocessing.py",
    inputs=[
        ProcessingInput(source=input_data, destination='/opt/ml/processing/input')
    ],
    outputs=[
        ProcessingOutput(
            output_name='train_data', 
            source='/opt/ml/processing/train',
            destination='s3://{}/{}/output/train_data'.format(bucket_name, prefix)),
        ProcessingOutput(
            output_name='statistics', 
            source='/opt/ml/processing/statistics',
            destination='s3://{}/{}/output/statistics'.format(bucket_name, prefix))
    ],
    arguments=['--num-dataset-splits', '20']
)

### Step 2/4: Create the Training Job

#### Define input parameters

In [None]:
training_framework_version = "1.6.0"
training_python_version = "py3"
training_instance_type = "ml.c5.4xlarge"
training_instance_count = 1
training_hyperparameters = {
    'k_fold_splits': 6,
    'k_index_only': 3, # after running some experiments with this dataset, it makes sense to fix it
    'num_epochs': 20,
    'batch_size': 256,
    'learning_rate': 0.0001,
    'dropout_rate': 0.001
}
training_metrics = [
    # raw strings keep '\S' from being treated as an (invalid) string escape
    {'Name': 'train_loss:mse', 'Regex': r' train_loss=(\S+);'},
    {'Name': 'test_loss:mse', 'Regex': r' test_loss=(\S+);'}
]
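SageMaker applies each metric regex to the training logs and records the captured group as the metric value. The sketch below shows what the two patterns above would capture. The log line is hypothetical (the real format depends on what `wind_turbine.py` prints), and `extract_metrics` is an illustrative helper, not part of the SageMaker SDK.

```python
import re

metric_definitions = [
    {"Name": "train_loss:mse", "Regex": r" train_loss=(\S+);"},
    {"Name": "test_loss:mse", "Regex": r" test_loss=(\S+);"},
]

# Hypothetical log line; the actual format is defined by the training script
log_line = "epoch=3; train_loss=0.0421; test_loss=0.0517;"

def extract_metrics(line, definitions):
    """Return {metric name: value} for every pattern that matches the line."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found

metrics = extract_metrics(log_line, metric_definitions)
print(metrics)  # {'train_loss:mse': 0.0421, 'test_loss:mse': 0.0517}
```

Note that both patterns start with a space and end with `;`, so a line that prints `train_loss=0.0421` without those delimiters would not be picked up.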

In [None]:
role = sagemaker.get_execution_role()
sagemaker_session=sagemaker.Session(default_bucket=bucket_name)

#### Run Training Job

In [None]:
estimator = PyTorch(
        './../algorithms/training/wind_turbine.py',
        framework_version=training_framework_version,
        role=role,
        sagemaker_session=sagemaker_session,
        instance_type=training_instance_type,
        instance_count=training_instance_count,
        py_version=training_python_version,
        hyperparameters=training_hyperparameters,
        metric_definitions=training_metrics,
        output_path="s3://{}/models".format(bucket_name)
    )

In [None]:
estimator.fit(
    inputs={"train": TrainingInput(
        s3_data="s3://{}/{}/output/train_data".format(bucket_name, prefix),
        content_type="application/x-npy"
    )}
)

### Step 3/4: Register Model in the Model Registry

#### Input Parameters

In [None]:
model_package_group_name = "mlops-iot-package-group"
model_approval_status = "PendingManualApproval"

transform_instance_type = "ml.c5.xlarge"
transform_instance_count = 2

#### Create the Model Package Group

In [None]:
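from botocore.exceptions import ClientError

try:
    # Reuse the group if it already exists
    describe_response = sm_client.describe_model_package_group(
        ModelPackageGroupName=model_package_group_name
    )
    print(describe_response)
except ClientError:
    # describe_model_package_group raises an error when the group is missing,
    # so create it here instead of checking for an empty response
    response = sm_client.create_model_package_group(
        ModelPackageGroupName=model_package_group_name
    )
    print(response)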

#### Register Trained Model in the Model Package Group

In [None]:
estimator.register(
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    content_types=["application/x-npy"],
    response_types=["application/x-npy"],
    inference_instances=[transform_instance_type],
    transform_instances=[transform_instance_type]
)

### Step 4/4: Run Batch Transform Job

#### Input Parameters

In [None]:
transform_instance_type = "ml.c5.xlarge"
transform_instance_count = 2

output_batch_data = "s3://{}/{}/output/eval".format(bucket_name, prefix)

#### Run Batch Transform Job

In [None]:
transformer = estimator.transformer(
    instance_count=transform_instance_count,
    instance_type=transform_instance_type,
    strategy="MultiRecord",
    assemble_with="Line",
    output_path=output_batch_data,
    accept='application/x-npy',
    max_payload=20
)

In [None]:
transformer.transform(
    "s3://{}/{}/output/train_data".format(bucket_name, prefix), 
    content_type="application/x-npy", 
    split_type=None
)
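With `split_type=None`, each input object is sent to the model as a single request, so every `.npy` file has to fit within `max_payload` (expressed in MB). A quick way to estimate the serialized size of a batch is sketched below. The batch size of 1000, the `float32` dtype, and the helper `npy_size_mb` are assumptions made for illustration, and 1 MB is treated as 1024*1024 bytes.

```python
import io
import numpy as np

MAX_PAYLOAD_MB = 20  # same value as max_payload in the transformer above

def npy_size_mb(array):
    """Size in MB of the array once serialized in .npy format."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getbuffer().nbytes / (1024 * 1024)

# Assumed batch: 1000 samples shaped 6x10x10, stored as float32
batch = np.zeros((1000, 6, 10, 10), dtype=np.float32)
size_mb = npy_size_mb(batch)

print("{:.2f} MB, fits in one request: {}".format(size_mb, size_mb <= MAX_PAYLOAD_MB))
```

If a file turned out to be larger than the limit, the options would be to raise `max_payload` or to write smaller files in the preprocessing step.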

## Part 3/3 - Compute the threshold based on MAE

### Download the predictions & Compute MAE/thresholds

In [None]:
import boto3
import sagemaker

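# Look up the S3 location of the training data in the training job description
job_description = sm_client.describe_training_job(TrainingJobName=estimator.latest_training_job.name)
input_data = job_description['InputDataConfig'][0]['DataSource']['S3DataSource']['S3Uri']

# "s3://bucket/key" -> ['s3:', '', 'bucket', 'key']
tokens = input_data.split('/', 3)
# Download the batch transform predictions and the samples they were computed from
sagemaker_session.download_data(bucket=bucket_name, key_prefix='{}/output/eval/'.format(prefix), path='./../data/preds/')
sagemaker_session.download_data(bucket=bucket_name, key_prefix=tokens[3], path='./../data/input/')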
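The `split('/', 3)` call above relies on the fixed `s3://bucket/key` layout of an S3 URI. As a sketch, the same decomposition can be done with `urllib.parse`, which also makes it easy to reject non-S3 URIs. The bucket name and the helper `split_s3_uri` are hypothetical.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: {}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")

uri = "s3://my-bucket/data/output/train_data"  # hypothetical URI

# What the notebook does: at most 3 splits, so the key stays in one piece
tokens = uri.split("/", 3)
print(tokens)  # ['s3:', '', 'my-bucket', 'data/output/train_data']

bucket, key = split_s3_uri(uri)
print(bucket, key)  # my-bucket data/output/train_data
```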

In [None]:
import numpy as np
import glob

x_inputs = np.vstack([np.load(i) for i in glob.glob('./../data/input/*.npy')])
y_preds = np.vstack([np.load(i) for i in glob.glob('./../data/preds/*.out')])

n_samples,n_features,n_rows,n_cols = x_inputs.shape

x_inputs = x_inputs.reshape(n_samples, n_features, n_rows*n_cols).transpose((0,2,1))
y_preds = y_preds.reshape(n_samples, n_features, n_rows*n_cols).transpose((0,2,1))

mae_loss = np.mean(np.abs(y_preds - x_inputs), axis=1).transpose((1,0))
mae_loss[np.isnan(mae_loss)] = 0

thresholds = np.mean(mae_loss, axis=1)

# Create the output folder if it does not exist yet (os was imported in Part 1)
os.makedirs("./../data/statistics", exist_ok=True)

np.save('./../data/statistics/thresholds.npy', thresholds)
print(",".join(thresholds.astype(str)))
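To make the role of the thresholds concrete, here is a self-contained sketch on synthetic data. It computes the per-feature MAE the same way as the cell above and then flags a feature as anomalous when its reconstruction error exceeds the threshold. The comparison rule (`MAE > threshold`), the synthetic data, and the helper `feature_mae` are assumptions for illustration; the application may combine the thresholds differently.

```python
import numpy as np

def feature_mae(x, y):
    """MAE per feature and sample.

    x, y: arrays shaped (n_samples, n_features, n_rows, n_cols).
    Returns an array shaped (n_features, n_samples).
    """
    n_samples, n_features = x.shape[:2]
    x = x.reshape(n_samples, n_features, -1)
    y = y.reshape(n_samples, n_features, -1)
    return np.mean(np.abs(y - x), axis=2).T

rng = np.random.default_rng(42)

# "Training" inputs and slightly noisy reconstructions (synthetic)
x_train = rng.normal(size=(50, 6, 10, 10))
y_train = x_train + rng.normal(scale=0.05, size=x_train.shape)
thresholds = feature_mae(x_train, y_train).mean(axis=1)  # one value per feature

# New sample where feature 2 is reconstructed badly
x_new = rng.normal(size=(1, 6, 10, 10))
y_new = x_new.copy()
y_new[0, 2] += 1.0

is_anomaly = feature_mae(x_new, y_new)[:, 0] > thresholds
print(is_anomaly)  # [False False  True False False False]
```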

Alright! Now you can start the next exercise, which creates an Amazon SageMaker Pipeline for automating all the steps defined in this notebook.

1. [__Training Pipeline__](./02-SageMaker-Pipeline-Training.ipynb)
