## Model Evaluation using SageMaker Processing Job

1. [Introduction](#Introduction)
2. [Prerequisites](#Prerequisites)
3. [Setup](#Setup)
4. [Dataset](#Dataset)
5. [Build a SageMaker Processing Job](#Build-a-SageMaker-Processing-Job)
    1. [Prepare the Script and Docker File](#Prepare-the-Script-and-Docker-File)
    2. [Configure a ScriptProcessor](#Configure-a-ScriptProcessor)
6. [Review Outputs](#Review-Outputs)

# Introduction

Postprocessing and model evaluation are important steps for vetting models before deployment. In this lab, you will use ScriptProcessor from SageMaker Processing to build a postprocessing step that evaluates the model's performance after training.

To set up the ScriptProcessor, we will build a custom container for a model evaluation script. The script loads the TensorFlow model, loads the test dataset and annotations (either from the previous module or generated by the `optional-prepare-data-and-model.ipynb` notebook), runs predictions, and generates the confusion matrix.

**Note: This notebook was tested on the Data Science kernel in SageMaker Studio.**


# Prerequisites

Download the notebook into your environment and run it by executing each cell in order. To understand what's happening, you'll need:

- Access to the SageMaker default S3 bucket.
- Familiarity with Python and NumPy.
- Basic familiarity with Amazon S3.
- Basic understanding of Amazon SageMaker.
- Basic familiarity with the AWS Command Line Interface (CLI). Ideally, you should have it set up with credentials to access the AWS account you're running this notebook from.
- SageMaker Studio (preferred, for the full UI integration).

## Setup

Set up the environment, load the libraries, and define the parameters used throughout the notebook.

In [None]:
import sagemaker
from sagemaker import get_execution_role
import boto3
import json

role = get_execution_role()
sess = sagemaker.Session()

account = sess.account_id()
region = sess.boto_region_name
bucket = sess.default_bucket() # or use your own custom bucket name
prefix = 'postprocessing-modal-evaluation'

### Dataset
The dataset we are using is from [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html). **If you kept your artifacts from previous labs, simply update the S3 locations below for your test images and test data annotation file. If you do not have them, run the `optional-prepare-data-and-model.ipynb` notebook to generate the files, and then update the paths below.**

- S3 path for test image data
- S3 path for test data annotation file
- S3 path for the bird classification model

In [None]:
s3_images = f's3://{bucket}/{prefix}/outputs/test/'
s3_manifest = f's3://{bucket}/{prefix}/outputs/manifest'
s3_model = f's3://{bucket}/{prefix}/postprocessing-modal-evaluation-2022-03-25-23-23-23-103/output'
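Before launching the job, it can help to confirm that each location is a well-formed S3 URI. The helper below is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical name (not part of the SageMaker SDK), and the bucket in the usage lines is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    # urlparse keeps the leading slash of the path; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder bucket name, for illustration only.
example_bucket, example_key = split_s3_uri(
    "s3://example-bucket/postprocessing-modal-evaluation/outputs/test/"
)
print(example_bucket, example_key)
```

In the notebook, you could pass `s3_images`, `s3_manifest`, and `s3_model` through the same function to catch a missing scheme or an empty bucket name early.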

## Build a SageMaker Processing Job

### Prepare the Script and Docker File
With SageMaker, you can run data processing jobs using the SKLearnProcessor, processors for popular ML frameworks, Apache Spark, or your own container (BYOC). To learn more, see [SageMaker Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html).

For this example, we will practice using ScriptProcessor with Bring Your Own Container (BYOC). ScriptProcessor requires a container image URI from ECR and a custom script to run.

Here is what the script below does:
1. Loads the TensorFlow model
2. Loops through the annotation file and runs inference on each image
3. Tallies the results with scikit-learn and generates the confusion matrix
4. Saves the metrics to an `evaluation.json` report as output
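Step 3 uses micro-averaged scores. For single-label multiclass data, micro-averaged precision, recall, and F1 all reduce to plain accuracy, so the four headline numbers in the report will be identical. The pure-Python sketch below illustrates this on made-up labels. It is not the scikit-learn implementation, and the helper names are hypothetical.

```python
def build_confusion_matrix(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def micro_precision_recall(matrix):
    """Pool true positives, false positives, and false negatives over all classes."""
    n = len(matrix)
    tp = sum(matrix[i][i] for i in range(n))
    off_diagonal = sum(matrix[r][c] for r in range(n) for c in range(n) if r != c)
    # Each off-diagonal cell is a false positive for its column's class and a
    # false negative for its row's class, so the pooled FP and FN counts match.
    fp = fn = off_diagonal
    return tp / (tp + fp), tp / (tp + fn)


# Made-up labels for three classes, for illustration only.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

cm = build_confusion_matrix(actual, predicted, num_classes=3)
precision, recall = micro_precision_recall(cm)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(cm)
print(precision, recall, accuracy)
```

If per-class behavior matters more than the pooled score, `average='macro'` or `average=None` in scikit-learn would be the options to look at instead.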

In [None]:
%%writefile evaluation.py
import logging

import pandas as pd
import argparse
import pathlib
import json
import os
import numpy as np
import tarfile
import uuid

from PIL import Image

from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    confusion_matrix,
    f1_score
)

from tensorflow import keras
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing import image
# from smexperiments import tracker

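# Use a named logger at DEBUG level so the logger.debug() calls below are emitted
logger = logging.getLogger("evaluation")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
logger.propagate = False  # avoid duplicate lines if the root logger has handlers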

# Container paths populated by the processing job (local test paths in trailing comments)
input_path = "/opt/ml/processing/input/test"  # local: "output/test"
manifest_path = "/opt/ml/processing/input/manifest/test.csv"  # local: "output/manifest/test.csv"
model_path = "/opt/ml/processing/model"  # local: "model"
output_path = "/opt/ml/processing/output"  # local: "output"

HEIGHT=224; WIDTH=224

def predict_bird_from_file_new(fn, model):
    
    img = Image.open(fn).convert('RGB')
    
    img = img.resize((WIDTH, HEIGHT))
    img_array = image.img_to_array(img) #, data_format = "channels_first")

    x = img_array.reshape((1,) + img_array.shape)
    instance = preprocess_input(x)

    del x, img
    
    result = model.predict(instance)

    predicted_class_idx = np.argmax(result)
    confidence = result[0][predicted_class_idx]

    return predicted_class_idx, confidence

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-file", type=str, default="model.tar.gz")
    args, _ = parser.parse_known_args()

    logger.debug("Extracting the model")

    model_file = os.path.join(model_path, args.model_file)
    # The context manager closes the archive even if extraction fails
    with tarfile.open(model_file) as tar:
        tar.extractall(model_path)

    logger.debug("Load model")

    model = keras.models.load_model("{}/1".format(model_path))

    logger.debug("Starting evaluation.")
    
    # load test data.  this should be an argument
    df = pd.read_csv(manifest_path)
    
    num_images = df.shape[0]
    
    class_name_list = sorted(df['class_id'].unique().tolist())
    
    class_name = pd.Series(df['class_name'].values,index=df['class_id']).to_dict()
    
    logger.debug('Testing {} images'.format(df.shape[0]))
    num_errors = 0
    preds = []
    acts  = []
    for i in range(df.shape[0]):
        fname = df.iloc[i]['image_file_name']
        act   = int(df.iloc[i]['class_id']) - 1
        acts.append(act)
        
        pred, conf = predict_bird_from_file_new(input_path + '/' + fname, model)
        preds.append(pred)
        if (pred != act):
            num_errors += 1
            logger.debug('ERROR on image index {} -- Pred: {} {:.2f}, Actual: {}'.format(i, 
                                                                   class_name_list[pred], conf, 
                                                                   class_name_list[act]))
    precision = precision_score(acts, preds, average='micro')
    recall = recall_score(acts, preds, average='micro')
    accuracy = accuracy_score(acts, preds)
    cnf_matrix = confusion_matrix(acts, preds, labels=range(len(class_name_list)))
    f1 = f1_score(acts, preds, average='micro')
    
    logger.debug("Accuracy: {}".format(accuracy))
    logger.debug("Precision: {}".format(precision))
    logger.debug("Recall: {}".format(recall))
    logger.debug("Confusion matrix: {}".format(cnf_matrix))
    logger.debug("F1 score: {}".format(f1))
    
    logger.debug(cnf_matrix)
    
    matrix_output = dict()
    
    for i in range(len(cnf_matrix)):
        matrix_row = dict()
        for j in range(len(cnf_matrix[0])):
            matrix_row[class_name[class_name_list[j]]] = int(cnf_matrix[i][j])
        matrix_output[class_name[class_name_list[i]]] = matrix_row

    
    report_dict = {
        "multiclass_classification_metrics": {
            "accuracy": {"value": accuracy, "standard_deviation": "NaN"},
            "precision": {"value": precision, "standard_deviation": "NaN"},
            "recall": {"value": recall, "standard_deviation": "NaN"},
            "f1": {"value": f1, "standard_deviation": "NaN"},
            "confusion_matrix":matrix_output
        },
    }

    output_dir = "/opt/ml/processing/evaluation"
    pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)

    evaluation_path = f"{output_dir}/evaluation.json"
    with open(evaluation_path, "w") as f:
        f.write(json.dumps(report_dict))
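The script converts labels with `class_id - 1`, which assumes class ids start at 1 and have no gaps. If your manifest covers only a subset of classes, one possible alternative is an explicit id-to-index mapping. This is a sketch under the assumption that the model's output index `i` corresponds to the `i`-th smallest class id seen during training; verify that against your training code before relying on it. `build_index_map` is a hypothetical helper.

```python
def build_index_map(class_ids):
    """Map each distinct class id to a 0-based index in sorted order."""
    return {class_id: idx for idx, class_id in enumerate(sorted(set(class_ids)))}


# Made-up class ids with gaps, for illustration only.
manifest_class_ids = [13, 17, 13, 35, 17]
index_map = build_index_map(manifest_class_ids)
actual_indices = [index_map[c] for c in manifest_class_ids]
print(index_map)
print(actual_indices)
```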


Build a custom Docker container and push it to ECR.

You can use the standard TensorFlow container, but at the time this notebook was written, ScriptProcessor did not support `source_dir` for a custom `requirements.txt` and multiple Python files. That feature is on the roadmap; please follow this [thread](https://github.com/aws/sagemaker-python-sdk/issues/1248) for updates.

In [None]:
!mkdir docker

In [None]:
%%writefile docker/requirements.txt
# Python packages that are pip-installed into the container image
# when the Docker image for the processing job is built.
Pillow
scikit-learn
pandas
numpy
tensorflow==2.1
boto3==1.18.4
sagemaker-experiments
matplotlib==3.4.2

In [None]:
%%writefile docker/Dockerfile

FROM public.ecr.aws/docker/library/python:3.7
    
ADD requirements.txt /

RUN pip3 install -r requirements.txt

ENV PYTHONUNBUFFERED=TRUE 
ENV TF_CPP_MIN_LOG_LEVEL="2"

ENTRYPOINT ["python3"]

The easiest way to build a container image and push it to ECR is to use the SageMaker Studio Image Build CLI. This requires certain permissions on your SageMaker execution role; please follow this [blog](https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/) to update your role policy.

In [None]:
!pip install sagemaker-studio-image-build

In [None]:
container_name = "sagemaker-tf-container"
!cd docker && sm-docker build . --file Dockerfile --repository $container_name:2.0
    
ecr_image = "{}.dkr.ecr.{}.amazonaws.com/{}:2.0".format(account, region, container_name)
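The image URI follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. A small helper, sketched below with a hypothetical name and placeholder account values, keeps the repository and tag in one place so the build command and the URI are less likely to drift apart. It assumes a region that uses the `amazonaws.com` domain.

```python
import re


def build_ecr_image_uri(account_id, region, repository, tag):
    """Compose an ECR image URI; assumes the region uses the amazonaws.com domain."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account ids are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account id and region, for illustration only.
example_uri = build_ecr_image_uri(
    "123456789012", "us-east-1", "sagemaker-tf-container", "2.0"
)
print(example_uri)
```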

### Configure a ScriptProcessor
1) Copy the ECR URI from the step above.
2) Initialize the processor (instance count, instance type, etc.).
3) Run the processing job (define the script path, input arguments, and input and output file locations).

Note: We are not using a GPU, so you can ignore the CUDA warning messages. You can add the corresponding libraries to your Dockerfile if you want to use GPU acceleration.

In [None]:
import boto3
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput, Processor
from sagemaker import get_execution_role

import uuid

region = boto3.session.Session().region_name

role = get_execution_role()

image_uri = ecr_image

s3_evaluation_output = f's3://{bucket}/{prefix}/outputs/evaluation'


script_processor = ScriptProcessor(base_job_name = prefix,
                command=['python3'],
                image_uri=image_uri,
                role=role,
                instance_count=1,
                instance_type='ml.m5.xlarge')

In [None]:
script_processor.run(
                        code='evaluation.py',
                        arguments=["--model-file", "model.tar.gz"],
                        inputs=[ProcessingInput(source=s3_images, 
                                                destination="/opt/ml/processing/input/test"),
                                ProcessingInput(source=s3_manifest, 
                                                destination="/opt/ml/processing/input/manifest"),
                                ProcessingInput(source=s3_model, 
                                                destination="/opt/ml/processing/model"),
                               ],
                        outputs=[
                            ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation", 
                                             destination=s3_evaluation_output),
                        ]
                    )
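Each `ProcessingInput` is downloaded to its `destination` directory before the script starts, so every path hard-coded in `evaluation.py` has to sit under one of those directories. The check below is a sketch that mirrors the paths used in this notebook; `is_under` is a hypothetical helper, and nothing here calls the SageMaker API.

```python
from pathlib import PurePosixPath

# Destinations declared in script_processor.run(...)
input_destinations = [
    "/opt/ml/processing/input/test",
    "/opt/ml/processing/input/manifest",
    "/opt/ml/processing/model",
]

# Paths read by evaluation.py
script_paths = [
    "/opt/ml/processing/input/test",
    "/opt/ml/processing/input/manifest/test.csv",
    "/opt/ml/processing/model/model.tar.gz",
]


def is_under(path, directory):
    """True if path equals directory or is nested inside it."""
    path, directory = PurePosixPath(path), PurePosixPath(directory)
    return path == directory or directory in path.parents


unmatched = [
    p for p in script_paths
    if not any(is_under(p, d) for d in input_destinations)
]
print(unmatched)
```

An empty list means every path the script reads is covered by a declared input. The same idea applies to outputs: the script writes to `/opt/ml/processing/evaluation`, which matches the `source` of the `ProcessingOutput`.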

# Review Outputs

At the end of the lab, you will have generated a JSON file containing the performance metrics (accuracy, precision, recall, F1, and confusion matrix) for your test dataset. Run the cell below to review the output.

In [None]:
import pprint as pp
s3 = boto3.resource('s3')
eval_matrix_key = f'{prefix}/outputs/evaluation/evaluation.json'
content_object = s3.Object(bucket, eval_matrix_key)
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)

pp.pprint(json_content['multiclass_classification_metrics'])
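The raw confusion matrix can be hard to scan. One way to summarize it is per-class recall (the diagonal count divided by the row total), sketched below on a made-up two-class matrix in the same nested-dict layout that `evaluation.py` writes (outer key is the actual class, inner key is the predicted class). `per_class_recall` is a hypothetical helper.

```python
def per_class_recall(confusion):
    """confusion[actual][predicted] -> count; returns {actual: recall or None}."""
    recalls = {}
    for actual, row in confusion.items():
        total = sum(row.values())
        # Classes with no test samples have undefined recall.
        recalls[actual] = row.get(actual, 0) / total if total else None
    return recalls


# Made-up counts, for illustration only.
sample_matrix = {
    "cardinal": {"cardinal": 8, "blue_jay": 2},
    "blue_jay": {"cardinal": 1, "blue_jay": 9},
}
sample_recalls = per_class_recall(sample_matrix)
print(sample_recalls)
```

On the real output, you could pass `json_content['multiclass_classification_metrics']['confusion_matrix']` to the same function and sort the result to find the weakest classes.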