# AWS SageMaker Dog Breed Classification

This notebook lists all the steps that you need to complete this project. You will need to complete all the TODOs in this notebook as well as in the README and the two Python scripts included with the starter code.


**TODO**: Give a helpful introduction to what this notebook is for. Remember that comments, explanations and good documentation make your project informative and professional.

**Note:** This notebook contains code and markdown cells with TODOs that you have to complete. They are guidelines to help you finish the project while meeting the requirements in the project rubric. Feel free to change the order of the TODOs and to use more than one code cell per TODO.

In [2]:
# TODO: Install any packages that you might need
# For instance, you will need the smdebug package
!pip install smdebug

Keyring is skipped due to an exception: 'keyring.backends'
Collecting smdebug
  Using cached smdebug-1.0.12-py2.py3-none-any.whl (270 kB)
Collecting pyinstrument==3.4.2
  Using cached pyinstrument-3.4.2-py2.py3-none-any.whl (83 kB)
Collecting pyinstrument-cext>=0.2.2
  Using cached pyinstrument_cext-0.2.4-cp37-cp37m-manylinux2010_x86_64.whl (20 kB)
Installing collected packages: pyinstrument-cext, pyinstrument, smdebug
Successfully installed pyinstrument-3.4.2 pyinstrument-cext-0.2.4 smdebug-1.0.12


In [2]:
# TODO: Import any packages that you might need
# For instance, you will need Boto3 and SageMaker
import sagemaker
import boto3

import IPython
import os

from sagemaker.session import Session
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

# import modules for debugging, profiling
from sagemaker.debugger import Rule, ProfilerRule, DebuggerHookConfig, ProfilerConfig, FrameworkProfile, rule_configs

# import modules for hyperparameter tuning
from sagemaker.tuner import (IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner)

## Dataset
TODO: Explain what dataset you are using for this project. Consider giving a short overview of the classes, class distributions, etc. so that anyone unfamiliar with the dataset can get a better understanding of it.

In [3]:
#TODO: Fetch and upload the data to AWS S3
# Command to download and unzip data
# !wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
# !unzip dogImages.zip

# sync data to S3 (the dogImages/ prefix matches the input_data path used below)
# !aws s3 sync dogImages/ s3://dog-breed-image-classifier/dogImages/
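
For the class overview that the Dataset TODO asks for, a small helper along these lines could count the images per class before uploading. This is only a sketch: it assumes each split directory (for example `dogImages/train`) contains one sub-folder per breed, and the function name `count_images_per_class` is made up for this illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust if the archive contains other types.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Return a Counter mapping class-folder name -> number of image files."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Example usage (assumes the archive was unzipped into ./dogImages):
# for split in ("train", "valid", "test"):
#     counts = count_images_per_class(f"dogImages/{split}")
#     print(split, len(counts), "classes,", sum(counts.values()), "images")
```

Calling `counts.most_common(5)` on the result gives a quick look at the largest classes, which is one way to spot class imbalance.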

In [4]:
# bucket name to keep project data
bucket = 'dog-breed-image-classifier'
input_data = f's3://{bucket}/dogImages'

print(f'Training Input data is located at: {input_data}')

Training Input data is located at: s3://dog-breed-image-classifier/dogImages


## Hyperparameter Tuning
**TODO:** This is the part where you will fine-tune a pretrained model with hyperparameter tuning. Remember that you have to tune a minimum of two hyperparameters; you are encouraged to tune more. You are also encouraged to explain why you chose to tune those particular hyperparameters and their ranges.

**Note:** You will need to use the `hpo.py` script to perform hyperparameter tuning.

In [5]:
#TODO: Declare your HP ranges, metrics etc.
hyperparamater_ranges = {
    'arch': CategoricalParameter(['densenet121', 'resnet18']),
    'epochs': IntegerParameter(5, 20),
    'lr': ContinuousParameter(1e-5, 0.1),
    'dropout_rate': CategoricalParameter([0.2, 0.35, 0.5, 0.65]),
    'hidden_units': CategoricalParameter([256, 384, 512, 640, 768]),    
    'batch_size': CategoricalParameter([16, 32, 64, 128]),
    'test_batch_size': CategoricalParameter([16, 32, 64, 128]),
}
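
One point worth explaining about the `lr` range is that it spans four orders of magnitude. The sketch below is plain arithmetic and makes no SageMaker calls. It compares how much of the range lies below `1e-3` on a linear scale and on a logarithmic scale. The two helper functions are hypothetical names introduced only for this illustration.

```python
import math

LR_MIN, LR_MAX = 1e-5, 0.1  # same bounds as the 'lr' range above


def linear_fraction_below(threshold, lo=LR_MIN, hi=LR_MAX):
    """Share of a uniform (linear-scale) search that falls below threshold."""
    return (threshold - lo) / (hi - lo)


def log_fraction_below(threshold, lo=LR_MIN, hi=LR_MAX):
    """Share of a log-uniform search that falls below threshold."""
    return (math.log10(threshold) - math.log10(lo)) / (
        math.log10(hi) - math.log10(lo)
    )


print(f"linear scale: {linear_fraction_below(1e-3):.4f}")
print(f"log scale:    {log_fraction_below(1e-3):.4f}")
```

On a linear scale only about 1% of the range lies below `1e-3`, while on a log scale half of it does. `ContinuousParameter` accepts a `scaling_type` argument (for example `'Logarithmic'`), which may be worth setting explicitly for a range this wide.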

In [6]:
#TODO: Create estimators for your HPs
estimator = PyTorch(
    entry_point='hpo.py',
    role = get_execution_role(),
    instance_type = 'ml.g4dn.xlarge',  # GPU instance; 'ml.m5.xlarge' is a CPU alternative
    instance_count = 1,
    framework_version = '1.12',
    py_version = 'py38',
)


objective_metric_name = 'validation accuracy'
objective_type = 'Maximize'
metric_definitions =  [{
    "Name": "validation accuracy",
    "Regex": "valid accuracy: ([0-9\\.]+)"}]   # fetch name from valid log/print in train func


tuner = HyperparameterTuner(
    estimator = estimator,
    objective_metric_name = objective_metric_name,
    hyperparameter_ranges = hyperparamater_ranges,
    metric_definitions = metric_definitions,
    max_jobs = 4,
    max_parallel_jobs = 2,
    objective_type = objective_type,
    early_stopping_type= 'Auto'
)
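
The tuner can only read the objective metric if the training script prints a line that the regex in `metric_definitions` matches. The check below is a local sanity test using Python's `re` module. The sample log lines are invented for illustration, and the real output of `hpo.py` may look different.

```python
import re

# Same pattern as the "Regex" entry in metric_definitions above.
METRIC_REGEX = "valid accuracy: ([0-9\\.]+)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured metric as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


# Hypothetical log lines: the first matches, the second does not
# (different wording and capitalisation).
print(extract_metric("valid loss: 0.4213, valid accuracy: 0.8731"))
print(extract_metric("Validation Accuracy: 87.31%"))
```

If the second form is what the script prints, either the print statement or the regex needs to change, otherwise the tuning job has no objective value to optimise.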

In [7]:
# TODO: Fit your HP Tuner
tuner.fit({'training': input_data}, wait=True)

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


................................................................................................................................................................................................................................................................................................................................................................................!


In [None]:
# TODO: Get the best estimators and the best HPs
best_estimator = tuner.best_estimator()

#Get the hyperparameters of the best trained model
best_estimator.hyperparameters()
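
The dictionary returned by `hyperparameters()` typically holds every value as a string, may wrap categorical values in an extra pair of double quotes, and includes bookkeeping keys alongside the tuned ones. The helper below is a sketch under those assumptions. Both `clean_hyperparameters` and the sample dictionary are hypothetical and are not output from this notebook's tuning job.

```python
import json


def clean_hyperparameters(raw):
    """Drop SageMaker bookkeeping keys and convert string values to Python types."""
    cleaned = {}
    for key, value in raw.items():
        if key.startswith("sagemaker_") or key.startswith("_tuning"):
            continue  # internal keys, not arguments of the training script
        text = str(value).strip('"')  # remove the extra quotes, if any
        try:
            cleaned[key] = json.loads(text)  # "12" -> 12, "0.0023" -> 0.0023
        except json.JSONDecodeError:
            cleaned[key] = text  # plain strings such as an architecture name
    return cleaned


# Made-up example of the raw format, for illustration only.
raw_example = {
    "_tuning_objective_metric": '"validation accuracy"',
    "arch": '"resnet18"',
    "batch_size": '"64"',
    "epochs": "12",
    "lr": "0.0023",
    "sagemaker_program": '"hpo.py"',
}
print(clean_hyperparameters(raw_example))
```

The cleaned dictionary can then be passed as `hyperparameters` to a new estimator, provided the training script accepts the same argument names.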

## Model Profiling and Debugging
TODO: Using the best hyperparameters, create and finetune a new model

**Note:** You will need to use the `train_model.py` script to perform model profiling and debugging.

In [None]:
# TODO: Set up debugging and profiling rules and hooks
rules = [
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport())
]


hook_config = DebuggerHookConfig(
    hook_parameters={
        'train.save_interval':'100',
        'eval.save_interval': '10'
    }
)

profiler_config = ProfilerConfig(
    system_monitor_interval_millis = 500,
    framework_profile_params = FrameworkProfile(num_steps=10)
)

In [None]:
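# Best hyperparameters from the tuning job; replace each None with the
# corresponding value reported by best_estimator.hyperparameters()
best_hyperparameters = {
    'arch': None,
    'epochs': None,
    'lr': None,
    'batch_size': None,
    'test_batch_size': None,
}

best_estimator = PyTorch(
    entry_point='train_model.py',
    role = get_execution_role(),
    instance_type = 'ml.g4dn.xlarge',
    instance_count = 1,
    framework_version = '1.12',
    py_version = 'py38',
    hyperparameters = best_hyperparameters,
    # attach the debugger rules, hook config, and profiler config
    rules = rules,
    debugger_hook_config = hook_config,
    profiler_config = profiler_config,
)

# Train with debugging and profiling enabled
best_estimator.fit({'training': input_data}, wait=True)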



In [None]:
# TODO: Plot a debugging output.

**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?

In [None]:
# Display the profiler output
rule_output_path = best_estimator.output_path + best_estimator.latest_training_job.job_name + "/rule-output"
print(f"Profiler report in {rule_output_path}")

In [None]:
! aws s3 ls {rule_output_path} --recursive

In [None]:
! aws s3 cp {rule_output_path} ./ --recursive

In [None]:
# Get the autogenerated folder name of profiler report
profiler_report_name = [
    rule["RuleConfigurationName"]
    for rule in best_estimator.latest_training_job.rule_job_summary()
    if "Profiler" in rule["RuleConfigurationName"]
][0]

In [None]:
# Display profiler report
IPython.display.HTML(filename=profiler_report_name + "/profiler-output/profiler-report.html")

## Model Deploying

In [None]:
# Deploy your model to an endpoint
predictor = best_estimator.deploy(
    instance_type='ml.m5.xlarge',
    initial_instance_count=1
)

In [None]:
# Run a prediction on the endpoint

image = None  # TODO: Your code to load and preprocess image to send to endpoint for prediction
response = predictor.predict(image)
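
One possible way to fill in the image-loading TODO is to send the raw file bytes and let the endpoint's inference code decode them. This is a sketch, not the required approach. It assumes the inference script accepts `image/jpeg` payloads, which the starter code may or may not do. `load_image_bytes` is a hypothetical helper, and the file path is a placeholder.

```python
from pathlib import Path


def load_image_bytes(path):
    """Read an image file and return its raw bytes, unmodified."""
    data = Path(path).read_bytes()
    if not data:
        raise ValueError(f"{path} is empty")
    return data


# Hypothetical usage against a deployed endpoint. This assumes the inference
# script decodes JPEG bytes itself; the path below is a placeholder.
# from sagemaker.serializers import IdentitySerializer
# predictor.serializer = IdentitySerializer(content_type="image/jpeg")
# image = load_image_bytes("path/to/test_image.jpg")
# response = predictor.predict(image)
```

If the endpoint instead expects a preprocessed tensor, the resizing and normalisation used during training would have to be applied on the client before calling `predict`.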

In [None]:
# Remember to shutdown/delete your endpoint once your work is done
predictor.delete_endpoint()