# AWS SageMaker Image Classification of Dog Breeds

This notebook shows how to do image classification with a pretrained model in AWS SageMaker.
We use ResNet-50, a pretrained convolutional neural network (CNN) with 50 layers, from torchvision. ResNet improves on plain CNNs through residual learning: shortcut connections skip one or more layers, so each block only has to learn a residual that is added to its input.

The model is fine-tuned on our image dataset, which consists of dog images labelled by breed.
We first run a hyperparameter tuning job in SageMaker to find good training settings.
We then enable profiling and debugging with hooks to track the progress of model training. The result is a supervised image classifier that predicts the breed of a dog.

Debugging helps troubleshoot the training of our neural network (e.g., vanishing/exploding gradients, bad weight initialization, saturated activations, overfitting, class imbalance, a loss that does not decrease).
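
As a rough illustration of what a check for vanishing or exploding gradients looks at, the sketch below classifies a run from a list of per-step gradient norms. The function name and both thresholds are hypothetical and chosen only for illustration; they are not the values used by any SageMaker Debugger rule.

```python
import math

def diagnose_gradients(grad_norms, vanish_below=1e-7, explode_above=1e3):
    """Classify a run from its per-step gradient norms.

    The thresholds are illustrative assumptions, not the values used by
    any SageMaker Debugger rule.
    """
    if not grad_norms:
        return "no data"
    # NaN or infinite norms mean the updates have already blown up.
    if any(math.isnan(n) or math.isinf(n) for n in grad_norms):
        return "exploding"
    if max(grad_norms) > explode_above:
        return "exploding"
    # Every recorded norm is tiny: the weights barely move.
    if max(grad_norms) < vanish_below:
        return "vanishing"
    return "ok"

print(diagnose_gradients([0.8, 0.5, 0.3]))
print(diagnose_gradients([1e-9, 1e-10]))
print(diagnose_gradients([1.0, 5e4]))
```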

Profiling produces a report during training that tracks how the training job is progressing.
It also measures GPU, CPU, and memory utilization.

In [11]:
# Install packages 
# We will need the smdebug package
!pip install smdebug



In [12]:
# Import packages that we need
# In particular, we will use Boto3 and the SageMaker SDK
import sagemaker
import boto3
from sagemaker import get_execution_role

In [13]:
# Set up role, sagemaker, and s3 bucket 
role = get_execution_role()
session = sagemaker.Session()
region = session.boto_region_name
bucket = session.default_bucket()
print(f"Region {region} and S3 Bucket: {bucket}")

Region us-west-1 and S3 Bucket: sagemaker-us-west-1-625155689245


## Dataset
We use the dog breed image dataset provided by Udacity, which the cells below download as `dogImages.zip` and upload to S3. It consists of dog images labelled by breed and covers 133 breeds.

In [None]:
# Command to download and unzip data (NB: only run once) 
!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
!unzip dogImages.zip
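
To get an overview of the class distribution once the archive is unpacked, one option is to count the image files in each breed folder. This is a minimal sketch that assumes a layout of `dogImages/train/<breed>/<image>`; the `train` subfolder, the helper name `count_images_per_class`, and the list of image suffixes are assumptions, not something the notebook confirms.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image file extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images_per_class(split_dir):
    """Count image files in each immediate subfolder (one subfolder per class)."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts

train_dir = Path("dogImages") / "train"  # assumed layout of the unzipped archive
if train_dir.is_dir():
    counts = count_images_per_class(train_dir)
    print(f"{len(counts)} classes, {sum(counts.values())} images")
    print("Largest classes:", counts.most_common(3))
```

A strongly skewed count would point to class imbalance, one of the issues the debugging step is meant to catch.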

In [None]:
# Fetch and upload the data to AWS S3 (NB: only run once)
image_data_folder_path = "dogImages"
s3_file_path = session.upload_data(path=image_data_folder_path, bucket=bucket, key_prefix=image_data_folder_path)
print(f"S3 File Path: {s3_file_path}")

In [14]:
s3_file_path = f"s3://{bucket}/dogImages"
s3_file_path

's3://sagemaker-us-west-1-625155689245/dogImages'

## Hyperparameter Tuning

We fine-tune a pretrained model and use hyperparameter tuning to choose its training settings.
We tune at least two hyperparameters and explain below why we picked them and their ranges.

To do fine-tuning (i.e., transfer learning), we add a fully connected layer on top of the pretrained model as the output layer, which classifies images into 133 breeds.


We use the Adam optimizer (with adaptive learning rates) from PyTorch and tune the following hyperparameters:
- Learning rate: 0.000001 (10^-6) to 1.0 (10^0)
- Epsilon: 1 to 3
- Weight decay: 0 to 0.1 (1e-1)
- Batch size: 64 or 128

The learning rate controls the step size with which we move towards the minimum of the loss function in each training iteration. Epsilon is described in [epsilon-ResNet](https://arxiv.org/abs/1804.01661) as the threshold below which layers that produce smaller responses are automatically discarded as redundant; it controls the tradeoff between accuracy and network size and is typically in the 1-3 range. Note, however, that the `eps` argument of PyTorch's Adam optimizer is a different quantity: a small constant added to the denominator of the update for numerical stability (default 1e-8). The epsilon-ResNet interpretation therefore applies only if `hpo.py` uses the value as such a threshold. Weight decay is a regularization term that penalizes large weights and shrinks them during training to reduce overfitting. It is usually set between 0 and 0.1: [ML Mastery](https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-regularization/).

The ranges were chosen through trial-and-error experimentation and rough guidelines.

Typical learning rates for a neural network with standardized inputs (or inputs mapped to the (0, 1) interval) are less than 1 and greater than 10^-6: [ML Mastery](https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/).
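
To make the role of each value concrete, here is a pure-Python sketch of a single Adam update for one scalar parameter. It assumes that `hpo.py` (not shown here) passes `lr`, `eps`, and `weight_decay` straight to `torch.optim.Adam`; under that assumption, weight decay is added to the gradient as an L2 penalty and `eps` is added to the denominator. The function `adam_step` is a hypothetical helper written for illustration, not part of PyTorch.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    # L2-style weight decay: add weight_decay * param to the gradient.
    grad = grad + weight_decay * param
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps sits in the denominator, so a large eps shrinks the step.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

small_eps, _, _ = adam_step(1.0, 1.0, 0.0, 0.0, t=1, lr=0.1, eps=1e-8)
large_eps, _, _ = adam_step(1.0, 1.0, 0.0, 0.0, t=1, lr=0.1, eps=1.0)
print(small_eps, large_eps)
```

Under this reading, for a gradient of 1 an `eps` of 1 halves the first step compared with a negligible `eps`, so values in the 1 to 3 range dampen the updates considerably.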

**Note:** We will need to use the `hpo.py` script to perform hyperparameter tuning.

In [20]:
# Declare HP ranges, metrics etc.
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner
)

hyperparameter_ranges = {
    "lr": ContinuousParameter(0.000001, 1.0),
    "eps": ContinuousParameter(1, 3),
    "weight_decay": ContinuousParameter(0, 0.1),
    "batch_size": CategoricalParameter([64, 128]), # Picked range similar to exercises for speedup
}

objective_metric_name = "avg_test_loss"
objective_type = "Minimize" 
metric_definitions = [{"Name": objective_metric_name, "Regex": "Test set: Average loss: ([0-9\\.]+)"}]
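
The metric regex can be checked locally before any training job is launched. The sketch below applies the same pattern to a hypothetical log line; the real line format depends on what `hpo.py` prints, which is not shown here, and `extract_avg_test_loss` is a helper name introduced for illustration.

```python
import re

# Same pattern as the "Regex" entry in metric_definitions.
LOSS_PATTERN = re.compile(r"Test set: Average loss: ([0-9\.]+)")

def extract_avg_test_loss(log_line):
    """Return the captured loss as a float, or None if the line does not match."""
    match = LOSS_PATTERN.search(log_line)
    return float(match.group(1)) if match else None

# Hypothetical log line, for illustration only.
sample = "Test set: Average loss: 0.4321, Accuracy: 50/100 (50%)"
print(extract_avg_test_loss(sample))
```

Because the character class only allows digits and dots, a loss printed in scientific notation such as `1e-05` would be captured as `1`, so the training script should print the loss in fixed-point format.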

In [19]:
# Create estimator for our HPs
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point = "hpo.py", # Hyper parameter entry point for training 
    base_job_name = "sagemaker-dog-breed-image-classification-hyperparameter-tuning",
    role = role,
    instance_count = 1, 
    instance_type = "ml.g4dn.xlarge", # 4 CPU, 16 GB RAM, GPU 
    py_version = "py36", 
    framework_version = "1.8"
) 

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions, # Extracts average loss from logs with regex
    max_jobs=4, # Total number of training jobs the tuner will run
    max_parallel_jobs=4, # Training jobs run concurrently, each on its own instance
    objective_type=objective_type,
    early_stopping_type="Auto" # Early stopping may happen 
)

In [None]:
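# Fit the HP tuner on the data uploaded to S3
# NB: the channel name "training" is an assumption; it must match the channel that hpo.py reads
tuner.fit({"training": s3_file_path}, wait=True)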

In [None]:
# Get the best estimator and the best HPs

best_estimator = tuner.best_estimator()

# Get the hyperparameters of the best trained model
best_estimator.hyperparameters()

## Model Profiling and Debugging
TODO: Using the best hyperparameters, create and finetune a new model

**Note:** You will need to use the `train_model.py` script to perform model profiling and debugging.

In [None]:
# TODO: Set up debugging and profiling rules and hooks

In [None]:
# TODO: Create and fit an estimator

estimator = # TODO: Your estimator here

In [None]:
# TODO: Plot a debugging output.

**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?

In [None]:
# TODO: Display the profiler output

## Model Deploying

In [None]:
# Deploy the model to an endpoint
# NB: the instance count and type below are assumptions; adjust them to your needs
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")

In [None]:
# TODO: Run a prediction on the endpoint

image = # TODO: Your code to load and preprocess image to send to endpoint for prediction
response = predictor.predict(image)

In [None]:
# Remember to shut down/delete the endpoint once your work is done
predictor.delete_endpoint()