# Classifying dog breeds using SageMaker

This notebook lists all the steps that you need to complete this project. You will need to complete all the TODOs in this notebook as well as in the README and the two Python scripts included with the starter code.


The dataset contains images from 133 dog breeds divided into training, testing and validation datasets.

We will apply transfer learning to this dataset, starting from the pretrained ResNet50 model available in PyTorch.

Once the model is trained (using hyperparameter tuning), you will need to deploy it to a SageMaker endpoint. To test your deployment, you also need to query the deployed model with a sample image and get a prediction.

**Pipeline**

To finish this project, you will perform tasks and use tools that are part of a typical ML engineer's job. Broadly, your project has 3 main steps:

1. Data preparation
2. Training
3. Deployment

As an ML engineer, you will need to track and coordinate the flow of data (which could be images, models, metrics, etc.) through these different steps. The goal of this project is not to train an accurate model, but to set up an infrastructure that enables other developers to train such models.


**Note:** This notebook contains code and markdown cells with TODOs that you have to complete. These are meant as guidelines to help you finish your project while meeting the requirements in the project rubric. Feel free to change the order of the TODOs and to use more than one code cell per TODO.

In [2]:
# TODO: Install any packages that you might need
# For instance, you will need the smdebug package
!pip install smdebug

Collecting smdebug
  Downloading smdebug-1.0.12-py2.py3-none-any.whl (270 kB)
Collecting pyinstrument==3.4.2
  Downloading pyinstrument-3.4.2-py2.py3-none-any.whl (83 kB)
Collecting pyinstrument-cext>=0.2.2
  Downloading pyinstrument_cext-0.2.4-cp37-cp37m-manylinux2010_x86_64.whl (20 kB)
Installing collected packages: pyinstrument-cext, pyinstrument, smdebug
Successfully installed pyinstrument-3.4.2 pyinstrument-cext-0.2.4 smdebug-1.0.12


In [3]:
# TODO: Import any packages that you might need
# For instance, you will need Boto3 and SageMaker
import sagemaker
import boto3

## Dataset
TODO: Explain what dataset you are using for this project. Consider giving a short overview of the classes, class distributions, etc., to help anyone unfamiliar with the dataset understand it better.
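
One way to produce such an overview is to count the images in each breed folder. The sketch below assumes the archive unpacks to `dogImages/<split>/<breed>/`, with the `train`, `valid`, and `test` split names used by the sync commands in this notebook; the per-breed subfolder layout is an assumption, and `count_images_per_class` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (an assumption about the dataset contents).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Return a Counter mapping each class folder name to its image count."""
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    for split in ("train", "valid", "test"):
        split_dir = Path("dogImages") / split
        if split_dir.is_dir():  # only report splits that exist locally
            counts = count_images_per_class(split_dir)
            print(f"{split}: {len(counts)} classes, {sum(counts.values())} images")
```

Calling `counts.most_common(5)` on the result lists the largest classes, which is a quick way to spot class imbalance.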

In [5]:
#TODO: Fetch and upload the data to AWS S3

# Command to download and unzip data
!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip
!unzip dogImages.zip

**Visualize**

See [Visualize DogBreeds](Visualize_DogBreeds.ipynb)

**Load the data to S3**

Using the SageMaker SDK, grab the current region, execution role, and default bucket.

In [16]:
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role

session = sagemaker.Session()

bucket= session.default_bucket()
print("Default Bucket: {}".format(bucket))

region = session.boto_region_name
print("AWS Region: {}".format(region))

role = get_execution_role() #sagemaker iam role
print("RoleArn: {}".format(role))

Default Bucket: sagemaker-us-east-1-011919444977
AWS Region: us-east-1
RoleArn: arn:aws:iam::011919444977:role/service-role/AmazonSageMaker-ExecutionRole-20221103T102549


With the bucket name in hand, we can sync the local data to S3.

In [19]:

import os

os.environ["DEFAULT_S3_BUCKET"] = bucket
!aws s3 sync ./dogImages/train s3://${DEFAULT_S3_BUCKET}/dogImages/train/
!aws s3 sync ./dogImages/test s3://${DEFAULT_S3_BUCKET}/dogImages/test/
!aws s3 sync ./dogImages/valid s3://${DEFAULT_S3_BUCKET}/dogImages/valid/


## Hyperparameter Tuning
**TODO:** This is the part where you will fine-tune a pretrained model with hyperparameter tuning. Remember that you have to tune a minimum of two hyperparameters; however, you are encouraged to tune more. You are also encouraged to explain why you chose those particular hyperparameters and ranges.

**Note:** You will need to use the `hpo.py` script to perform hyperparameter tuning.
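
SageMaker passes each hyperparameter to the entry-point script as a command-line argument (for example `--lr 0.005 --batch-size 64`) and exposes data and model locations through environment variables such as `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR`. The following is a minimal sketch of the argument parsing that `hpo.py` might use. The starter script's contents are not shown here, so the defaults and the `parse_args` helper name are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Fine-tune a pretrained model")
    # Names mirror the keys used in hyperparameter_ranges in this notebook.
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=2)
    # The 'train' channel is exposed to the container as SM_CHANNEL_TRAIN.
    parser.add_argument(
        "--data-dir",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    # parse_known_args tolerates extra arguments the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"lr={args.lr} batch_size={args.batch_size} epochs={args.epochs}")
```

Keeping the argument names identical to the tuner's range keys matters: a mismatch means the tuned value never reaches the training loop.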

In [23]:
'''
import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role

session = sagemaker.Session()

bucket= session.default_bucket()
print("Default Bucket: {}".format(bucket))

#region = session.boto_region_name
#print("AWS Region: {}".format(region))

role = get_execution_role() #sagemaker iam role
print("RoleArn: {}".format(role))
'''


In [24]:
from sagemaker.pytorch import PyTorch
from sagemaker import get_execution_role
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)


In [26]:
#TODO: Declare your HP ranges, metrics etc.

hyperparameter_ranges = {
    "lr": ContinuousParameter(0.001, 0.01), 
    "batch-size": CategoricalParameter([32, 64]),
    "epochs": IntegerParameter(2, 4)
}

objective_metric_name = "Test Loss"
objective_type = "Minimize"
metric_definitions = [{"Name": "Test Loss", "Regex": "Testing Loss: ([0-9\\.]+)"}]


#hyperparameters = {"data_dir":}
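
SageMaker extracts the objective metric by applying the `Regex` from `metric_definitions` to the training job's log output, so the training script must print a line the pattern can match. The check below is a sketch using Python's `re` module with the same pattern; the sample log lines are made up for illustration, and `extract_metric` is a hypothetical helper.

```python
import re

# Same pattern as the "Regex" entry in metric_definitions.
METRIC_REGEX = "Testing Loss: ([0-9\\.]+)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured metric as a float, or None if the line does not match."""
    match = re.search(pattern, log_line)
    return float(match.group(1)) if match else None


print(extract_metric("Testing Loss: 0.4321"))   # line matches the pattern
print(extract_metric("Training Loss: 0.1234"))  # line does not match
```

The character class contains only digits and a dot, so a loss printed in scientific notation such as `1e-05` would be captured only up to the `e`. Printing the loss in fixed-point format avoids that.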

In [35]:
#TODO: Create estimators for your HPs

estimator = PyTorch(
    entry_point="hpo.py",
    role=get_execution_role(),
    py_version="py36",
    framework_version="1.8",
    instance_count=1,
    instance_type="ml.m4.xlarge",
)  # TODO: Your estimator here

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=2,
    max_parallel_jobs=1,
    objective_type=objective_type,
    early_stopping_type="Auto"
)# TODO: Your HP tuner here

Read more on [HyperparameterTuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html)

In [38]:
# TODO: Fit your HP Tuner
tuner.fit({'train': f's3://{bucket}/dogImages'
           },wait=True)
# TODO: Remember to include your data channels
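
With a single `train` channel pointing at the `dogImages` prefix, SageMaker (in its default file input mode) downloads everything under that prefix into the directory named by `SM_CHANNEL_TRAIN`, so the training script has to locate the `train`, `valid`, and `test` subfolders itself. The following is a sketch of that lookup, assuming the S3 layout created by the sync commands in this notebook; `split_dirs` is a hypothetical helper.

```python
import os


def split_dirs(data_dir, splits=("train", "valid", "test")):
    """Map each split name to its subfolder under the channel directory."""
    return {split: os.path.join(data_dir, split) for split in splits}


# Inside the training container, the 'train' channel path is in SM_CHANNEL_TRAIN.
data_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
for split, path in split_dirs(data_dir).items():
    print(split, "->", path)
```

An alternative design is to pass three separate channels to `fit`, one per split, in which case each split gets its own `SM_CHANNEL_<NAME>` variable.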

In [36]:
# TODO: Get the best estimators and the best HPs

best_estimator = tuner.best_estimator()

#Get the hyperparameters of the best trained model
best_estimator.hyperparameters()
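
The dictionary returned by `hyperparameters()` typically holds string values and includes SageMaker bookkeeping keys alongside the tuned ones; categorical values may also carry an extra pair of double quotes. The following is a sketch of converting it into the form a follow-up estimator expects. The `raw` dictionary is an illustrative stand-in, not output from this notebook's tuning job, and `clean_hyperparameters` is a hypothetical helper.

```python
def clean_hyperparameters(raw, names=("lr", "batch-size", "epochs")):
    """Keep only the tuned hyperparameters and strip stray double quotes."""
    return {name: str(raw[name]).strip('"') for name in names if name in raw}


# Illustrative stand-in for best_estimator.hyperparameters(); values are made up.
raw = {
    "_tuning_objective_metric": '"Test Loss"',
    "batch-size": '"64"',
    "epochs": "3",
    "lr": "0.005",
}
best_hyperparameters = clean_hyperparameters(raw)
print(best_hyperparameters)
```

The cleaned dictionary can then be passed as `hyperparameters=` when creating the estimator for the profiling and debugging run.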

## Model Profiling and Debugging
TODO: Using the best hyperparameters, create and fine-tune a new model.

**Note:** You will need to use the `train_model.py` script to perform model profiling and debugging.

In [18]:
#from sagemaker.pytorch import PyTorch
#from sagemaker import get_execution_role
from sagemaker.debugger import (
    Rule,
    DebuggerHookConfig,
    rule_configs,
)
from sagemaker.debugger import ProfilerConfig, FrameworkProfile

In [20]:
# TODO: Set up debugging and profiling rules and hooks
rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
]

hook_config = DebuggerHookConfig(
    hook_parameters={"train.save_interval": "100", "eval.save_interval": "10"}
)


profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500, framework_profile_params=FrameworkProfile(num_steps=10)
)

#"num_gpu":True

hyperparameters = {"epochs": "2", "batch-size": "64", "test-batch-size": "100", "lr": "0.001"}

In [24]:
# TODO: Create and fit an estimator

estimator = PyTorch(
    entry_point="train_model.py",
    base_job_name="sagemaker-dog-breed-pytorch",
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.large",
    hyperparameters=hyperparameters,
    framework_version="1.8",
    py_version="py36",
    rules=rules,  # attach the Debugger rules defined above
    debugger_hook_config=hook_config,
    profiler_config=profiler_config,
)

estimator.fit({"train": f"s3://{bucket}/dogImages"}, wait=True)

2022-10-18 17:27:04 Starting - Starting the training job...
2022-10-18 17:27:28 Starting - Preparing the instances for training
ProfilerReport-1666114024: InProgress
.........
2022-10-18 17:28:50 Downloading - Downloading input data.................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-10-18 17:31:47,334 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-10-18 17:31:47,338 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-10-18 17:31:47,354 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-10-18 17:31:47,362 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-10-18 17:31:47,943 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-10-18 17:31:47,969 sagem

In [None]:
# TODO: Plot a debugging output.


**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?

[Source on Debugger](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-debugger/pytorch_model_debugging/pytorch_script_change_smdebug.ipynb)

In [39]:
# TODO: Display the profiler output
estimator

<sagemaker.pytorch.estimator.PyTorch at 0x7f9c09804690>

## Model Deployment

In [None]:
# TODO: Deploy your model to an endpoint

# Instance type and count are example values; adjust them to your needs.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")

In [None]:
# TODO: Run a prediction on the endpoint

image = None  # TODO: Your code to load and preprocess an image to send to the endpoint
response = predictor.predict(image)
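
One common approach is to send the raw image bytes and let the endpoint's inference code decode and preprocess them. The sketch below covers only the client side and assumes the endpoint accepts `image/jpeg` or `image/png` payloads, which depends on an inference script that is not shown here. `load_image_payload` and the file path in the comments are hypothetical.

```python
import mimetypes
from pathlib import Path


def load_image_payload(path):
    """Read an image file and return (raw_bytes, content_type)."""
    path = Path(path)
    content_type, _encoding = mimetypes.guess_type(path.name)
    if content_type is None or not content_type.startswith("image/"):
        raise ValueError(f"{path} does not look like an image file")
    return path.read_bytes(), content_type


# Usage against a live endpoint (placeholder path; requires a deployed `predictor`):
# from sagemaker.serializers import IdentitySerializer
# payload, content_type = load_image_payload("dogImages/test/<breed>/<image>.jpg")
# predictor.serializer = IdentitySerializer(content_type)  # send bytes unchanged
# response = predictor.predict(payload)
```

Setting an `IdentitySerializer` keeps the SDK from re-encoding the bytes, so the endpoint receives exactly the file contents with the matching content type.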

In [None]:
# TODO: Remember to shutdown/delete your endpoint once your work is done
predictor.delete_endpoint()