# Dog breed classification

This notebook walks through the following steps:

1. The dog breed dataset is downloaded and uploaded to an S3 bucket.
1. A pre-trained model is fine-tuned on the dataset with AWS SageMaker.
1. The model first undergoes hyperparameter tuning.
1. The model is then trained with the best hyperparameters found, while being
   profiled and debugged.
1. Once training is complete, the model is deployed to an endpoint and queried.
1. Finally, all AWS resources are released.


In [None]:
# Install the packages this notebook needs, including smdebug for reading
# the debugger output
!pip install -U pip > /dev/null 2> /dev/null
!pip install -U boto3 sagemaker smdebug > /dev/null


In [None]:
# Import the packages used below, including Boto3 and the SageMaker SDK
import os
import random
from pathlib import Path

import boto3
import IPython
import matplotlib.pyplot as plt
import sagemaker
from PIL import Image
from sagemaker.debugger import (
    DebuggerHookConfig,
    FrameworkProfile,
    ProfilerConfig,
    ProfilerRule,
    Rule,
    rule_configs,
)
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
    IntegerParameter,
)
from smdebug.core.modes import ModeKeys
from smdebug.trials import create_trial
from torchvision import transforms


In [None]:
random.seed(42)
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (16, 9)
instance_type_train = "ml.m5.large"
instance_type_deploy = "ml.m3.medium"
role = sagemaker.get_execution_role()


## Dataset

The dataset consists of images of several dog breeds, split into training,
test, and validation sets (the `train`, `test`, and `valid` directories of
`dogImages`).
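
To get a feel for the class distribution, the images per breed can be counted locally before uploading. The sketch below assumes that each split directory contains one sub-directory per breed; `count_images_per_class` is a hypothetical helper, not part of the starter code.

```python
from pathlib import Path


def count_images_per_class(split_dir):
    """Return {class_directory_name: number_of_files} for one split.

    Assumes one sub-directory per class inside `split_dir`.
    """
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    return counts


# Example usage once the dataset has been unzipped:
# counts = count_images_per_class("./dogImages/train")
# print(len(counts), "classes;", sum(counts.values()), "images")
```

Plotting the resulting counts as a bar chart is a quick way to spot under-represented breeds.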


In [None]:
# Download and unzip the dataset; the upload to S3 happens in the next cell.
# -q silences both tools (wget writes its progress to stderr, so redirecting
# stdout alone does not).
!wget -q -O dogImages.zip https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
!unzip -q dogImages.zip


In [None]:
session = sagemaker.Session()
bucket = session.default_bucket()
prefix = "dogImages"
inputs_train = session.upload_data(
    path="./dogImages/train",
    bucket=bucket,
    key_prefix=os.path.join(prefix, "train"),
)
inputs_test = session.upload_data(
    path="./dogImages/test",
    bucket=bucket,
    key_prefix=os.path.join(prefix, "test"),
)
inputs_valid = session.upload_data(
    path="./dogImages/valid",
    bucket=bucket,
    key_prefix=os.path.join(prefix, "valid"),
)
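
Each split is expected to land under its own S3 prefix, since `upload_data` returns a URI of the form `s3://<bucket>/<key_prefix>`. A small sketch of that layout, where `expected_s3_uri` is a hypothetical helper and `posixpath` is used so the key always contains forward slashes:

```python
import posixpath


def expected_s3_uri(bucket, prefix, split):
    """Build the S3 URI a split should be uploaded to."""
    return f"s3://{bucket}/{posixpath.join(prefix, split)}"


# The three channels should point at three distinct locations.
uris = {
    split: expected_s3_uri("my-bucket", "dogImages", split)
    for split in ("train", "test", "valid")
}
```

Comparing these values with `inputs_train`, `inputs_test`, and `inputs_valid` is a cheap check that no split overwrote another.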


## Hyperparameter Tuning


In [None]:
# Search ranges for the hyperparameters to tune
hp_ranges = {
    "epochs": IntegerParameter(2, 16),
    "batch-size": CategoricalParameter([32, 64, 128]),
    "lr": ContinuousParameter(0.001, 0.1),
    "beta1": ContinuousParameter(0.03, 0.9),
    "beta2": ContinuousParameter(0.03, 0.999),
}
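
The learning rate range above spans two orders of magnitude. The sketch below is purely illustrative and not how SageMaker searches internally; it compares uniform sampling with log-uniform sampling over the same range. `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
n = 10_000
linear = [rng.uniform(0.001, 0.1) for _ in range(n)]
log = [sample_log_uniform(0.001, 0.1, rng) for _ in range(n)]

# Fraction of samples in the lowest decade of the range (below 0.01)
frac_linear = sum(v < 0.01 for v in linear) / n
frac_log = sum(v < 0.01 for v in log) / n
```

Uniform sampling puts roughly a tenth of the draws below 0.01, while log-uniform sampling puts about half there. If small learning rates matter, passing `scaling_type="Logarithmic"` to `ContinuousParameter` may be worth trying.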


In [None]:
objective_metric_name = "mean loss"
objective_type = "Minimize"
metric_definitions = [
    {
        "Name": "mean loss",
        # SageMaker reads the metric value from the capture group
        "Regex": r"[Aa]verage\s+loss:\s+([0-9.]+)",
    }
]
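
A metric regex can be checked locally before launching any jobs. The sketch below assumes the training script prints a line such as `Test set: Average loss: 0.4321`; that log format, the capture-group pattern, and the `extract_metric` helper are assumptions made for illustration.

```python
import re

# Capture-group variant: the number after "Average loss:" is group 1
METRIC_REGEX = r"[Aa]verage\s+loss:\s+([0-9.]+)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured metric as a float, or None if the line has none."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


value = extract_metric("Test set: Average loss: 0.4321, Accuracy: 10/20")
missing = extract_metric("Epoch 1 finished")
```

Running the pattern against a few real lines from the script's output confirms that the tuner will find its objective metric.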


In [None]:
# Estimator that runs hpo.py for every tuning job
estimator = PyTorch(
    entry_point="hpo.py",
    role=role,
    py_version="py38",
    framework_version="1.13.1",
    instance_count=1,
    instance_type=instance_type_train,
)

# Tuner that searches hp_ranges to minimize the objective metric
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hp_ranges,
    metric_definitions,
    max_jobs=2,
    max_parallel_jobs=2,
    objective_type=objective_type,
)


In [None]:
# Fit the tuner on the training and validation data channels
_ = tuner.fit(
    {"training": inputs_train, "validation": inputs_valid},
)


In [None]:
# Retrieve the estimator of the best training job found by the tuner
best_estimator = tuner.best_estimator()

# Hyperparameters of the best training job
hyperparameters = best_estimator.hyperparameters()
hyperparameters
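
The dictionary returned by `hyperparameters()` may hold more than the tuned values: it typically also carries internal entries (keys starting with `_` or `sagemaker_`), and the values are usually JSON-encoded strings such as `'"64"'`. The sketch below assumes those conventions; `clean_hyperparameters` is a hypothetical helper, not part of the SageMaker SDK.

```python
import json


def clean_hyperparameters(raw):
    """Drop internal entries and decode JSON-encoded string values."""
    cleaned = {}
    for name, value in raw.items():
        # Assumed convention: internal keys start with "_" or "sagemaker_"
        if name.startswith("_") or name.startswith("sagemaker_"):
            continue
        if isinstance(value, str):
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass  # keep strings that are not valid JSON unchanged
        cleaned[name] = value
    return cleaned


example = clean_hyperparameters(
    {
        "_tuning_objective_metric": '"mean loss"',
        "sagemaker_program": '"hpo.py"',
        "batch-size": '"64"',
        "lr": "0.01",
        "epochs": "4",
    }
)
```

Inspecting the cleaned dictionary before passing it to a new estimator helps avoid argument-parsing surprises in the training script.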


## Model Profiling and Debugging


In [None]:
# Debugger rules and profiler rules to evaluate during training
rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),
    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
]


In [None]:
hook_config = DebuggerHookConfig(
    hook_parameters={"train.save_interval": "100", "eval.save_interval": "10"}
)
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,
    framework_profile_params=FrameworkProfile(num_steps=10),
)


In [None]:
# Estimator that runs train_model.py with the best hyperparameters,
# with the debugger and the profiler enabled
estimator = PyTorch(
    role=role,
    instance_count=1,
    instance_type=instance_type_train,
    entry_point="train_model.py",
    framework_version="1.13.1",
    py_version="py38",
    hyperparameters=hyperparameters,
    profiler_config=profiler_config,
    rules=rules,
    debugger_hook_config=hook_config,
)


In [None]:
_ = estimator.fit(
    {
        "training": inputs_train,
        "validation": inputs_valid,
        "testing": inputs_test,
    },
)


In [None]:
# Load the debugger output and pick one of the saved tensors to plot
trial = create_trial(estimator.latest_job_debugger_artifacts_path())
tensor_name = random.choice(trial.tensor_names())


In [None]:
def get_data(trial, tensor_name, mode):
    tensor = trial.tensor(tensor_name)
    steps = tensor.steps(mode=mode)
    vals = [tensor.value(step, mode=mode) for step in steps]
    return steps, vals


In [None]:
steps_train, vals_train = get_data(trial, tensor_name, ModeKeys.TRAIN)
steps_eval, vals_eval = get_data(trial, tensor_name, ModeKeys.EVAL)

plt.plot(steps_train, vals_train, label=tensor_name)
plt.plot(steps_eval, vals_eval, label=f"val_{tensor_name}")
plt.legend()
plt.show()
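
One anomaly worth looking for in a plot like the one above is a loss curve that flattens out. The sketch below is a deliberately simplified stand-in for such a check, not the Debugger's own `loss_not_decreasing` rule. `has_plateaued`, the window size, and the threshold are all assumptions.

```python
def has_plateaued(losses, window=3, min_rel_improvement=0.01):
    """Report whether the mean loss of the last `window` values improved by
    less than `min_rel_improvement` relative to the `window` values before.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return (previous - recent) < min_rel_improvement * abs(previous)


still_improving = has_plateaued([1.0, 0.9, 0.8, 0.7, 0.6, 0.5])
flat = has_plateaued([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])
```

If a check like this fires early in training, a lower learning rate or a different optimizer configuration is one thing to try.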




In [None]:
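# Download the rule output from S3, then display the profiler report
rule_output_path = (
    estimator.output_path
    + estimator.latest_training_job.job_name
    + "/rule-output"
)
!aws s3 cp {rule_output_path} ./ --recursive > /dev/null

profiler_report_name = [
    rule["RuleConfigurationName"]
    for rule in estimator.latest_training_job.rule_job_summary()
    if "Profiler" in rule["RuleConfigurationName"]
][0]
IPython.display.HTML(
    filename=profiler_report_name + "/profiler-output/profiler-report.html"
)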


## Model Deployment


In [None]:
# Deploy the trained model to an endpoint with a single instance
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type=instance_type_deploy,
)


In [None]:
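# Run a prediction on the endpoint with a random validation image
ROOT = Path(".").resolve()
valid_dir = ROOT / "dogImages" / "valid"
# Search recursively so that images inside sub-directories are found
img_path = random.choice(
    sorted(p for p in valid_dir.rglob("*") if p.is_file())
)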
transform = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)

# Load the image as RGB, preprocess it, and add a batch dimension
image = transform(Image.open(img_path).convert("RGB")).unsqueeze(0)
response = predictor.predict(image.numpy())
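
The endpoint's response still has to be turned into a class prediction. The sketch below assumes the model returns one score per class (for example logits) as a nested list; `predicted_classes` is a hypothetical helper, and a NumPy array would need `.tolist()` first.

```python
def predicted_classes(response):
    """Return the index of the highest score for each row of `response`.

    Accepts a single row of scores or a batch (list of rows).
    """
    rows = response if isinstance(response[0], (list, tuple)) else [response]
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


batch = predicted_classes([[0.1, 2.0, -1.0], [3.0, 0.0, 1.0]])
single = predicted_classes([0.3, 0.2, 0.9])
```

Mapping the resulting index back to a breed name requires the same class ordering that the training script used.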


In [None]:
# Delete the endpoint once the work is done so it stops incurring charges
predictor.delete_endpoint()
