# Fine-Tuning a Pretrained Image Classifier on CIFAR-10 with Amazon SageMaker

This notebook lists all the steps that you need to complete this project. You will need to complete all the TODOs in this notebook as well as in the README and the two Python scripts included with the starter code.


**TODO**: Give a helpful introduction to what this notebook is for. Remember that comments, explanations and good documentation make your project informative and professional.

**Note:** This notebook has a number of code and markdown cells with TODOs that you have to complete. They are meant as guidelines to help you finish your project while meeting the requirements in the project rubric. Feel free to change the order of the TODOs and to use more than one TODO code cell for your tasks.

In [2]:
# TODO: Install any packages that you might need
# For instance, you will need the smdebug package
!pip install smdebug



In [3]:
# TODO: Import any packages that you might need
# For instance you will need Boto3 and Sagemaker
import sagemaker
import boto3
sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-pytorch-cifar"

role = sagemaker.get_execution_role()

## Dataset
TODO: Explain what dataset you are using for this project. Consider giving a small overview of the classes, class distributions, and so on, to help anyone not familiar with the dataset get a better understanding of it.

In [4]:
from torchvision.datasets import CIFAR10
from torchvision import transforms

## Download from pytorch to local directory
local_dir = 'data'
CIFAR10.mirrors = ["https://sagemaker-sample-files.s3.amazonaws.com/datasets/image/CIFAR10/"]
CIFAR10(
    local_dir,
    download=True,
    transform=transforms.Compose(
        [transforms.ToTensor()]
    )
)

Files already downloaded and verified


Dataset CIFAR10
    Number of datapoints: 50000
    Root location: data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           )

In [5]:
#TODO: Fetch and upload the data to AWS S3
inputs = sagemaker_session.upload_data(path="data", bucket=bucket, key_prefix=prefix)
print("input spec (in this case, just an S3 path): {}".format(inputs))

input spec (in this case, just an S3 path): s3://sagemaker-us-east-1-441223896543/sagemaker/DEMO-pytorch-cifar
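The input spec printed above is a plain S3 URI made of the bucket name followed by the key prefix. As a small illustration (the helper `split_s3_uri` is a hypothetical name, not part of the SageMaker SDK), the URI can be taken apart again with the standard library:

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc is the bucket; the path starts with "/" which is not part of the key
    return parsed.netloc, parsed.path.lstrip("/")


# URI copied from the output of the upload cell above
inputs_uri = "s3://sagemaker-us-east-1-441223896543/sagemaker/DEMO-pytorch-cifar"
bucket_name, key_prefix = split_s3_uri(inputs_uri)
print(bucket_name)
print(key_prefix)
```

The two parts correspond to the `bucket` and `prefix` variables defined earlier in the notebook.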


## Hyperparameter Tuning
**TODO:** This is the part where you will finetune a pretrained model with hyperparameter tuning. Remember that you have to tune a minimum of two hyperparameters. However you are encouraged to tune more. You are also encouraged to explain why you chose to tune those particular hyperparameters and the ranges.

**Note:** You will need to use the `hpo.py` script to perform hyperparameter tuning.

The first hyperparameter tuned is the learning rate. It is usually the most important one to spend time on, because a well-chosen value lets training approach the minimum error quickly.
The second hyperparameter is the number of epochs: it is useful to know whether a higher number of epochs actually helps the network converge.
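Because the learning-rate range spans two orders of magnitude (0.001 to 0.1), the scale on which candidates are drawn matters. The sketch below is only an illustration with plain random sampling (the tuner itself uses its own search strategy); the function names are hypothetical. It compares how often a value below 0.01 is drawn on a linear scale versus a logarithmic scale. The SageMaker `ContinuousParameter` class accepts a `scaling_type` argument for this purpose; check the SDK documentation for the exact accepted values.

```python
import math
import random

LR_MIN, LR_MAX = 0.001, 0.1  # same bounds as the "lr" range used in this notebook


def sample_linear(rng):
    """Draw a learning rate uniformly between the bounds."""
    return rng.uniform(LR_MIN, LR_MAX)


def sample_log(rng):
    """Draw a learning rate uniformly in log10 space between the bounds."""
    return 10 ** rng.uniform(math.log10(LR_MIN), math.log10(LR_MAX))


rng = random.Random(0)  # fixed seed so the comparison is repeatable
n = 10_000
linear_small = sum(sample_linear(rng) < 0.01 for _ in range(n)) / n
log_small = sum(sample_log(rng) < 0.01 for _ in range(n)) / n

print(f"share of samples below 0.01, linear scale: {linear_small:.2f}")
print(f"share of samples below 0.01, log scale:    {log_small:.2f}")
```

On a linear scale only about one draw in eleven lands in the lowest decade, while on a log scale about half do, which is why small learning rates are explored more evenly with logarithmic scaling.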

In [10]:
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

In [11]:
 #TODO: Initialise your hyperparameters
hyperparameter_ranges = {
    "lr": ContinuousParameter(0.001, 0.1),
    "epochs": CategoricalParameter([5, 10, 15, 20]),
}

In [12]:
#TODO: Create estimators for your HPs

 # TODO: Your estimator here
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="hpo.py",
    role=role,
    py_version='py36',
    framework_version="1.8",
    instance_count=1,
    instance_type="ml.m5.large"
)


In [13]:
objective_metric_name = "average test loss"
objective_type = "Minimize"
metric_definitions = [{"Name": "average test loss", "Regex": "Test set: Average loss: ([0-9\\.]+)"}]
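To see what the metric definition above captures, the same regular expression can be tried on a log line locally. The log line below is hypothetical; the real format depends on what `hpo.py` prints. This is a sketch of the matching step only, not of how SageMaker collects metrics internally.

```python
import re

# Same pattern as in metric_definitions, written as a raw string
METRIC_REGEX = r"Test set: Average loss: ([0-9\.]+)"


def extract_metric(log_line):
    """Return the captured loss as a float, or None if the line does not match."""
    match = re.search(METRIC_REGEX, log_line)
    return float(match.group(1)) if match else None


# Hypothetical log line, assumed to resemble the output of hpo.py
sample_log_line = "Test set: Average loss: 0.6931, Accuracy: 7000/10000 (70%)"
average_test_loss = extract_metric(sample_log_line)
print(average_test_loss)
print(extract_metric("Train Epoch: 1 Loss: 1.234"))
```

Note that the character class `[0-9\.]+` only matches digits and dots, so a loss printed in scientific notation (for example `1e-05`) would be captured incompletely.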



In [14]:



tuner = HyperparameterTuner(
    estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=4,
    max_parallel_jobs=4,
    objective_type=objective_type,
)# TODO: Your HP tuner here

Don't worry about the error below: I only interrupted the notebook cell. The tuner was created successfully and is stored in the `tuner` variable.

In [15]:
# TODO: Fit your HP Tuner
tuner.fit({"training":inputs}) # TODO: Remember to include your data channels

INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.
INFO:sagemaker:Creating hyperparameter tuning job with name: pytorch-training-230322-1933


........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

In [16]:
# TODO: Get the best estimators and the best HPs
best_estimator = tuner.best_estimator() 


2023-03-22 20:08:42 Starting - Preparing the instances for training
2023-03-22 20:08:42 Downloading - Downloading input data
2023-03-22 20:08:42 Training - Training image download completed. Training in progress.
2023-03-22 20:08:42 Uploading - Uploading generated training model
2023-03-22 20:08:42 Completed - Resource released due to keep alive period expiry


In [17]:
#Get the hyperparameters of the best trained model
best_estimator.hyperparameters()

{'_tuning_objective_metric': '"average test loss"',
 'epochs': '"5"',
 'lr': '0.0018828461172301957',
 'sagemaker_container_log_level': '20',
 'sagemaker_estimator_class_name': '"PyTorch"',
 'sagemaker_estimator_module': '"sagemaker.pytorch.estimator"',
 'sagemaker_job_name': '"pytorch-training-2023-03-22-19-33-21-743"',
 'sagemaker_program': '"hpo.py"',
 'sagemaker_region': '"us-east-1"',
 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-441223896543/pytorch-training-2023-03-22-19-33-21-743/source/sourcedir.tar.gz"'}
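The values in the dictionary above are strings, and some of them are JSON-encoded (note the nested quotes in `'"5"'`). A minimal sketch of turning the two tuned values back into numbers, using a copy of the relevant entries rather than the live estimator:

```python
import json

# Entries copied from the output above; only the two tuned hyperparameters
raw_hyperparameters = {
    "epochs": '"5"',
    "lr": "0.0018828461172301957",
}

# json.loads removes the JSON layer: '"5"' becomes the string "5",
# and "0.0018..." becomes a float directly
epochs = int(json.loads(raw_hyperparameters["epochs"]))
lr = float(json.loads(raw_hyperparameters["lr"]))

print(epochs, lr)
```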

It was found that the training job with 10 epochs and a learning rate of 0.0025 had the lowest test loss, so these values are used in the next section.

## Model Profiling and Debugging
TODO: Using the best hyperparameters, create and finetune a new model

**Note:** You will need to use the `train_model.py` script to perform model profiling and debugging.

In [18]:
#Set up debugging and profiling rules and hooks
from sagemaker.pytorch import PyTorch
from sagemaker import get_execution_role
from sagemaker.debugger import (
    Rule,
    ProfilerRule,
    DebuggerHookConfig,
    rule_configs,
)

rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),
    Rule.sagemaker(rule_configs.overfit()),
    Rule.sagemaker(rule_configs.overtraining()),
    Rule.sagemaker(rule_configs.poor_weight_initialization()),
    Rule.sagemaker(rule_configs.loss_not_decreasing()),
    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
    ProfilerRule.sagemaker(rule_configs.ProfilerReport())
]

In [19]:
from sagemaker.debugger import ProfilerConfig, FrameworkProfile

hook_config = DebuggerHookConfig(
    
    hook_parameters = {"train.save_interval": "100", "eval.save_interval": "10"}
)

profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500, framework_profile_params=FrameworkProfile(num_steps=10)
)


In [20]:
hyperparameters = {
    "BatchSize": 32, #From Previous Task
    "epochs": 10, #From previous tuner
    "lr":  0.0025 #From Previous Tuner
}



In [21]:
# TODO: Create and fit an estimator

# TODO: Your estimator here

#TODO: Create the estimator to train your model
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker import get_execution_role
# SageMaker passes the hyperparameters to the entry point as command-line
# arguments, roughly: train_model.py --BatchSize 32 --epochs 10 --lr 0.0025
estimator = PyTorch(
    entry_point="train_model.py",
    base_job_name="sagemaker-script-mode",
    role=get_execution_role(),
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters = hyperparameters,
    framework_version="1.8",
    py_version="py36",
    ## Debugger parameters
    rules=rules,
    debugger_hook_config=hook_config,
    ## Profiler parameters
    profiler_config=profiler_config
)


In [22]:
estimator.fit({"training":inputs})


INFO:sagemaker.image_uris:Defaulting to the only supported framework/algorithm version: latest.
INFO:sagemaker.image_uris:Ignoring unnecessary instance type: None.

2023-03-22 21:24:15 Starting - Starting the training job...
2023-03-22 21:24:31 Starting - Preparing the instances for training
VanishingGradient: InProgress
Overfit: InProgress
Overtraining: InProgress
PoorWeightInitialization: InProgress
LossNotDecreasing: InProgress
LowGPUUtilization: InProgress
ProfilerReport: InProgress
......
2023-03-22 21:25:31 Downloading - Downloading input data...
2023-03-22 21:26:11 Training - Downloading the training image..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-03-22 21:26:26,333 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-03-22 21:26:26,336 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-03-22 21:26:26,344 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-03-22 21:26:26,347 sagemaker_pytorch_conta

In [23]:
from smdebug.trials import create_trial
from smdebug.core.modes import ModeKeys

trial = create_trial(estimator.latest_job_debugger_artifacts_path())

[2023-03-22 22:22:36.780 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:41 INFO s3_trial.py:42] Loading trial debug-output at path s3://sagemaker-us-east-1-441223896543/sagemaker-script-mode-2023-03-22-21-24-14-585/debug-output


In [24]:
# TODO: Can you print the names of all the tensors that were tracked
print(trial.tensor_names())


[2023-03-22 22:22:38.965 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:41 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2023-03-22 22:22:39.980 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:41 INFO trial.py:210] Loaded all steps
['CrossEntropyLoss_output_0', 'gradient/ResNet_fc.0.bias', 'gradient/ResNet_fc.0.weight', 'gradient/ResNet_fc.1.bias', 'gradient/ResNet_fc.1.weight', 'gradient/ResNet_fc.2.bias', 'gradient/ResNet_fc.2.weight', 'layer1.0.relu_input_0', 'layer1.0.relu_input_1', 'layer1.0.relu_input_2', 'layer1.1.relu_input_0', 'layer1.1.relu_input_1', 'layer1.1.relu_input_2', 'layer1.2.relu_input_0', 'layer1.2.relu_input_1', 'layer1.2.relu_input_2', 'layer2.0.relu_input_0', 'layer2.0.relu_input_1', 'layer2.0.relu_input_2', 'layer2.1.relu_input_0', 'layer2.1.relu_input_1', 'layer2.1.relu_input_2', 'layer2.2.relu_input_0', 'layer2.2.relu_input_1', 'layer2.2.relu_input_2', 'layer2.3.relu_input_0', 'layer2.3.relu_inp

In [25]:
Loss = trial.tensor_names()[0]
print(Loss)

CrossEntropyLoss_output_0


In [26]:
# TODO: Can you print the number of datapoints for one of those tensors
# for both train and eval mode
print(len(trial.tensor(Loss).steps(mode=ModeKeys.TRAIN)))
print(len(trial.tensor(Loss).steps(mode=ModeKeys.EVAL)))

4
29
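The number of datapoints per mode follows from the `train.save_interval` and `eval.save_interval` values set in the hook configuration. As a rough sketch, assuming the hook saves a tensor whenever the step number is a multiple of the save interval, the saved steps can be enumerated as below. The step totals (390 and 290) are hypothetical and were chosen only so that the counts match the 4 and 29 printed above.

```python
def saved_steps(total_steps, save_interval):
    """List the step numbers at which a tensor would be saved.

    Assumption: a step is saved when step % save_interval == 0.
    """
    return [step for step in range(total_steps) if step % save_interval == 0]


# Hypothetical step totals; the intervals are the ones from hook_config
train_saved = saved_steps(total_steps=390, save_interval=100)
eval_saved = saved_steps(total_steps=290, save_interval=10)

print(train_saved)
print(len(train_saved), len(eval_saved))
```

With a training interval of 100, very few loss values are recorded, so lowering `train.save_interval` would give a smoother training curve at the cost of more data written to S3.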


In [27]:
from smdebug.profiler.analysis.notebook_utils.training_job import TrainingJob

tj = TrainingJob(estimator._current_job_name)
tj.wait_for_sys_profiling_data_to_be_available()

ProfilerConfig:{'S3OutputPath': 's3://sagemaker-us-east-1-441223896543/', 'ProfilingIntervalInMilliseconds': 500, 'ProfilingParameters': {'DataloaderProfilingConfig': '{"StartStep": 0, "NumSteps": 10, "MetricsRegex": ".*", }', 'DetailedProfilingConfig': '{"StartStep": 0, "NumSteps": 10, }', 'FileOpenFailThreshold': '50', 'HorovodProfilingConfig': '{"StartStep": 0, "NumSteps": 10, }', 'LocalPath': '/opt/ml/output/profiler', 'PythonProfilingConfig': '{"StartStep": 0, "NumSteps": 10, "ProfilerName": "cprofile", "cProfileTimer": "total_time", }', 'RotateFileCloseIntervalInSeconds': '60', 'RotateMaxFileSizeInBytes': '10485760', 'SMDataParallelProfilingConfig': '{"StartStep": 0, "NumSteps": 10, }'}}
s3 path:s3://sagemaker-us-east-1-441223896543/sagemaker-script-mode-2023-03-22-21-24-14-585/profiler-output


Profiler data from system is available


In [52]:
loss_not_decreasing_rule = Rule.sagemaker(base_config=rule_configs.loss_not_decreasing(),
         rule_parameters={"tensor_regex": "CrossEntropyLoss_output_0",
                         "mode": "TRAIN"})

In [28]:
# TODO: Plot a debugging output.
from smdebug.profiler.analysis.notebook_utils.timeline_charts import TimelineCharts

system_metrics_reader = tj.get_systems_metrics_reader()
system_metrics_reader.refresh_event_file_list()

view_timeline_charts = TimelineCharts(
    system_metrics_reader,
    framework_metrics_reader=None,
    select_dimensions=["CPU", "GPU"],
    select_events=["total"]
)

[2023-03-22 22:22:40.534 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:41 INFO metrics_reader_base.py:134] Getting 57 event files
select events:['total']
select dimensions:['CPU', 'GPU']
filtered_events:{'total'}
filtered_dimensions:{'CPUUtilization-nodeid:algo-1'}


No GPU chart is shown because the training instance (`ml.m5.xlarge`) has no GPU.


In [22]:
rule_output_path = estimator.output_path + estimator.latest_training_job.job_name + "/rule-output"
print(f"You will find the profiler report in {rule_output_path}")

You will find the profiler report in s3://sagemaker-us-east-1-990956898706/sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output


In [23]:
! aws s3 ls {rule_output_path} --recursive

2022-12-01 18:49:55     390776 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-report.html
2022-12-01 18:49:55     241552 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-report.ipynb
2022-12-01 18:49:49        192 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/BatchSize.json
2022-12-01 18:49:49        200 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/CPUBottleneck.json
2022-12-01 18:49:49       2088 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/Dataloader.json
2022-12-01 18:49:49        127 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/GPUMemoryIncrease.json
2022-12-01 18:49:49        199 sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/Profile

In [24]:
! aws s3 cp {rule_output_path} ./ --recursive #save them locally

download: s3://sagemaker-us-east-1-990956898706/sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/BatchSize.json to ProfilerReport/profiler-output/profiler-reports/BatchSize.json
download: s3://sagemaker-us-east-1-990956898706/sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/GPUMemoryIncrease.json to ProfilerReport/profiler-output/profiler-reports/GPUMemoryIncrease.json
download: s3://sagemaker-us-east-1-990956898706/sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-report.ipynb to ProfilerReport/profiler-output/profiler-report.ipynb
download: s3://sagemaker-us-east-1-990956898706/sagemaker-script-mode-2022-12-01-17-54-32-629/rule-output/ProfilerReport/profiler-output/profiler-reports/CPUBottleneck.json to ProfilerReport/profiler-output/profiler-reports/CPUBottleneck.json
download: s3://sagemaker-us-east-1-990956898706/sagemake

In [25]:
import os

# get the autogenerated folder name of profiler report
profiler_report_name = [
    rule["RuleConfigurationName"]
    for rule in estimator.latest_training_job.rule_job_summary()
    if "Profiler" in rule["RuleConfigurationName"]
][0]

In [26]:
profiler_report_name

'ProfilerReport'

In [27]:
import IPython

IPython.display.HTML(filename=profiler_report_name + "/profiler-output/profiler-report.html")

| Rule | Description | Recommendation | Number of times rule triggered | Number of datapoints | Rule parameters |
|---|---|---|---|---|---|
| StepOutlier | Detects outliers in step duration. The step duration for forward and backward pass should be roughly the same throughout the training. If there are significant outliers, it may indicate a system stall or bottleneck issues. | Check if there are any bottlenecks (CPU, I/O) correlated to the step outliers. | 93 | 15715 | threshold:3 mode:None n_outliers:10 stddev:3 |
| MaxInitializationTime | Checks if the time spent on initialization exceeds a threshold percent of the total training time. The rule waits until the first step of training loop starts. The initialization can take longer if downloading the entire dataset from Amazon S3 in File mode. The default threshold is 20 minutes. | Initialization takes too long. If using File mode, consider switching to Pipe mode in case you are using TensorFlow framework. | 0 | 15715 | threshold:20 |
| CPUBottleneck | Checks if the CPU utilization is high and the GPU utilization is low. It might indicate CPU bottlenecks, where the GPUs are waiting for data to arrive from the CPUs. The rule evaluates the CPU and GPU utilization rates, and triggers the issue if the time spent on the CPU bottlenecks exceeds a threshold percent of the total training time. The default threshold is 50 percent. | Consider increasing the number of data loaders or applying data pre-fetching. | 0 | 6365 | threshold:50 cpu_threshold:90 gpu_threshold:10 patience:1000 |
| GPUMemoryIncrease | Measures the average GPU memory footprint and triggers if there is a large increase. | Choose a larger instance type with more memory if footprint is close to maximum available memory. | 0 | 0 | increase:5 patience:1000 window:10 |
| BatchSize | Checks if GPUs are underutilized because the batch size is too small. To detect this problem, the rule analyzes the average GPU memory footprint, the CPU and the GPU utilization. | The batch size is too small, and GPUs are underutilized. Consider running on a smaller instance type or increasing the batch size. | 0 | 6361 | cpu_threshold_p95:70 gpu_threshold_p95:70 gpu_memory_threshold_p95:70 patience:1000 window:500 |
| Dataloader | Checks how many data loaders are running in parallel and whether the total number is equal the number of available CPU cores. The rule triggers if number is much smaller or larger than the number of available cores. If too small, it might lead to low GPU utilization. If too large, it might impact other compute intensive operations on CPU. | Change the number of data loader processes. | 0 | 10 | min_threshold:70 max_threshold:200 |
| LowGPUUtilization | Checks if the GPU utilization is low or fluctuating. This can happen due to bottlenecks, blocking calls for synchronizations, or a small batch size. | Check if there are bottlenecks, minimize blocking calls, change distributed training strategy, or increase the batch size. | 0 | 0 | threshold_p95:70 threshold_p5:10 window:500 patience:1000 |
| IOBottleneck | Checks if the data I/O wait time is high and the GPU utilization is low. It might indicate IO bottlenecks where GPU is waiting for data to arrive from storage. The rule evaluates the I/O and GPU utilization rates and triggers the issue if the time spent on the IO bottlenecks exceeds a threshold percent of the total training time. The default threshold is 50 percent. | Pre-fetch data or choose different file formats, such as binary formats that improve I/O performance. | 0 | 6365 | threshold:50 io_threshold:50 gpu_threshold:10 patience:1000 |
| LoadBalancing | Detects workload balancing issues across GPUs. Workload imbalance can occur in training jobs with data parallelism. The gradients are accumulated on a primary GPU, and this GPU might be overused with regard to other GPUs, resulting in reducing the efficiency of data parallelization. | Choose a different distributed training strategy or a different distributed training framework. | 0 | 0 | threshold:0.2 patience:1000 |

| Metric | mean | max | p99 | p95 | p50 | min |
|---|---|---|---|---|---|---|
| Step Durations in [s] | 0.16 | 3.79 | 0.25 | 0.18 | 0.14 | 0.11 |
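The StepOutlier rule in the report is the only one that triggered, and the step-duration summary shows why: the maximum (3.79 s) is far above the median (0.14 s). The sketch below illustrates the general idea of flagging steps that lie more than a number of standard deviations above the mean. It is not the SageMaker implementation, and the duration series is hypothetical, built from the median and maximum values in the table.

```python
import statistics


def find_step_outliers(durations, stddev_factor=3.0):
    """Return indices of steps whose duration exceeds mean + factor * stddev."""
    mean = statistics.mean(durations)
    std = statistics.pstdev(durations)
    limit = mean + stddev_factor * std
    return [index for index, duration in enumerate(durations) if duration > limit]


# Hypothetical series: 50 typical steps (median from the report) and one slow step
durations = [0.14] * 50 + [3.79]
outliers = find_step_outliers(durations)
print(outliers)
```

A slow step like this often coincides with data loading or checkpointing, which is why the report recommends checking for CPU or I/O bottlenecks around the flagged steps.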


**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?

When deploying the endpoint, an error was raised stating that `smdebug` is not defined. It seems that the inference instance cannot access the `smdebug` library, which is not needed at deployment time anyway, so a guard against this error was added to `train_model.py`.
First, the module import is wrapped in a try/except:

    try:
        import smdebug.pytorch as smd
        from smdebug.profiler.utils import str2bool
    except:
        pass
    
Second, when the hook is created and registered on the model, the following code is used:

    try:
        hook = smd.Hook.create_from_json_file()
        hook.register_hook(model)
    except:
        hook=0

Third, the training and testing functions receive the hook value and check it inside their loops before setting the mode:

    # Inside the training function: skip the hook when smdebug is unavailable
    if hook != 0:
        hook.set_mode(smd.modes.TRAIN)

    # Inside the testing function: same guard for evaluation mode
    if hook != 0:
        hook.set_mode(smd.modes.EVAL)

With these changes, `train_model.py` works for both training and deployment.
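The guard described above can also be written with `None` as the "no hook" marker and with a narrower `except ImportError`, so that unrelated errors are not silently swallowed at import time. This is a possible variant under those assumptions, not the code in `train_model.py`; the helper names `create_hook` and `set_hook_mode` are hypothetical.

```python
# Optional dependency: smdebug is present in the training container but may be
# missing in the environment where the model is served.
try:
    import smdebug.pytorch as smd
except ImportError:
    smd = None


def create_hook(model):
    """Return a registered smdebug hook, or None when debugging is unavailable."""
    if smd is None:
        return None
    try:
        hook = smd.Hook.create_from_json_file()
    except Exception:
        # Assumption: a missing Debugger configuration should not stop the script
        return None
    hook.register_hook(model)
    return hook


def set_hook_mode(hook, training):
    """Switch the hook between TRAIN and EVAL; do nothing if there is no hook."""
    if hook is None:
        return
    hook.set_mode(smd.modes.TRAIN if training else smd.modes.EVAL)
```

The training and testing functions would then call `set_hook_mode(hook, training=True)` or `set_hook_mode(hook, training=False)` without needing their own checks.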

## Model Deploying

In [32]:
# TODO: Deploy your model to an endpoint
# TODO: Add your deployment configuration like instance type and number of instances
# Change the instance type to support smdebug
predictor=estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")  

INFO:sagemaker:Creating model with name: sagemaker-script-mode-2023-03-22-23-43-41-020
INFO:sagemaker:Creating endpoint with name sagemaker-script-mode-2023-03-22-23-43-41-020


-----!

In [33]:
predictor.endpoint_name

'sagemaker-script-mode-2023-03-22-23-43-41-020'

## Prepare data for inference

In [34]:
# TODO: Your code to load and preprocess image to send to endpoint for prediction
import pickle

import numpy as np

batch_path = 'data/cifar-10-batches-py/data_batch_1'


def unpickle(path):
    """Load one pickled CIFAR-10 batch file into a dict with bytes keys."""
    with open(path, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')


Data_extracted = unpickle(batch_path)
# Each row holds 3072 values: 1024 red, then 1024 green, then 1024 blue
data = np.reshape(Data_extracted[b'data'][0], (3, 32, 32))

In [35]:
Imgs_train=unpickle('data/cifar-10-batches-py/data_batch_1') #image from the training dataset

The following data point is a *Leptodactylus*, a kind of frog, so it has label 6.

In [36]:
#Information about the picked Image
n=0
print(Imgs_train[b'labels'][n])
print(Imgs_train[b'data'][n].shape)
print(Imgs_train[b'filenames'][n])
Img_train=np.reshape(Imgs_train[b'data'][n], (3, 32, 32))
print(Img_train)

6
(3072,)
b'leptodactylus_pentadactylus_s_000004.png'
[[[ 59  43  50 ... 158 152 148]
  [ 16   0  18 ... 123 119 122]
  [ 25  16  49 ... 118 120 109]
  ...
  [208 201 198 ... 160  56  53]
  [180 173 186 ... 184  97  83]
  [177 168 179 ... 216 151 123]]

 [[ 62  46  48 ... 132 125 124]
  [ 20   0   8 ...  88  83  87]
  [ 24   7  27 ...  84  84  73]
  ...
  [170 153 161 ... 133  31  34]
  [139 123 144 ... 148  62  53]
  [144 129 142 ... 184 118  92]]

 [[ 63  45  43 ... 108 102 103]
  [ 20   0   0 ...  55  50  57]
  [ 21   0   8 ...  50  50  42]
  ...
  [ 96  34  26 ...  70   7  20]
  [ 96  42  30 ...  94  34  34]
  [116  94  87 ... 140  84  72]]]


In [37]:
data_batch_index=np.expand_dims(Img_train, axis=0)
data_batch_index.shape

(1, 3, 32, 32)
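The reshape to `(3, 32, 32)` works because each CIFAR-10 row stores the red, green and blue planes one after another. The sketch below uses a synthetic row of increasing integers instead of a real image, so the layout is easy to follow: it shows the channel-first array, the channel-last view that plotting libraries usually expect, and the batch dimension added for the endpoint.

```python
import numpy as np

# Synthetic stand-in for one CIFAR-10 row: 3072 values, planes stored back to back
row = np.arange(3072)

chw = row.reshape(3, 32, 32)          # channel-first: (channels, height, width)
hwc = chw.transpose(1, 2, 0)          # channel-last:  (height, width, channels)
batch = np.expand_dims(chw, axis=0)   # add a batch dimension of size 1

print(chw.shape, hwc.shape, batch.shape)
# The top-left pixel takes one value from the start of each colour plane
print(hwc[0, 0].tolist())
```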


## Query the Endpoint

In [38]:
# TODO: Run a prediction on the endpoint
import torch
response = predictor.predict(torch.Tensor(data_batch_index))  # convert the array to a float32 tensor

# TODO: Query the endpoint
print(response)

[[1300.07116699 3585.88574219 2537.01391602  894.39477539 -238.90423584
  1101.35620117 -700.69622803 1356.94873047 3891.93505859 -330.97793579]]


In [39]:
m = torch.nn.Softmax(dim=1)

Note that the softmax output below puts all of the probability on index 8 (ship), not on index 6 (frog), so the model misclassifies this image.

In [40]:
m(torch.Tensor(response))

[2023-03-22 23:46:21.925 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:41 INFO profiler_config_parser.py:102] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.


tensor([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
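The softmax output is exactly one-hot because the logits returned by the endpoint are in the thousands: the gap between the largest and second-largest value is about 306, and `exp(-306)` is far below floating-point resolution next to 1. The sketch below reproduces this with the logits printed above, using a numerically stable softmax in plain Python and the standard CIFAR-10 class order. It is an illustration only, not part of the inference code.

```python
import math

# Standard CIFAR-10 class order (index = label)
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def stable_softmax(logits):
    """Softmax that subtracts the maximum first to avoid overflow in exp()."""
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Logits copied from the endpoint response for the first image
logits = [
    1300.07116699, 3585.88574219, 2537.01391602, 894.39477539, -238.90423584,
    1101.35620117, -700.69622803, 1356.94873047, 3891.93505859, -330.97793579,
]

probs = stable_softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, CIFAR10_CLASSES[predicted])
```

Logits of this magnitude may indicate that the raw 0 to 255 pixel values sent to the endpoint are on a different scale from the inputs seen during training; that is an assumption worth checking against the transforms in `train_model.py`.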

The next image from the training dataset is a "camion", a kind of truck, which has label 9.

In [41]:
#Information about the picked Image
n=1
print(Imgs_train[b'labels'][n])
print(Imgs_train[b'data'][n].shape)
print(Imgs_train[b'filenames'][n])
Img_train2=np.reshape(Imgs_train[b'data'][n], (3, 32, 32))
print(Img_train2)

9
(3072,)
b'camion_s_000148.png'
[[[154 126 105 ...  91  87  79]
  [140 145 125 ...  96  77  71]
  [140 139 115 ...  79  68  67]
  ...
  [175 156 154 ...  42  61  93]
  [165 156 159 ... 103 123 131]
  [163 158 163 ... 143 143 143]]

 [[177 137 104 ...  95  90  81]
  [160 153 125 ...  99  80  73]
  [155 146 115 ...  82  70  69]
  ...
  [167 154 160 ...  34  53  83]
  [154 152 161 ...  93 114 121]
  [148 148 156 ... 133 134 133]]

 [[187 136  95 ...  71  71  70]
  [169 154 118 ...  78  62  61]
  [164 149 112 ...  64  55  55]
  ...
  [166 160 170 ...  36  57  91]
  [128 130 142 ...  96 120 131]
  [120 122 133 ... 139 142 144]]]


In [42]:
data_batch_index2=np.expand_dims(Img_train2, axis=0)
data_batch_index2.shape

(1, 3, 32, 32)

In [43]:
# TODO: Run a prediction on the endpoint
import torch
response2 = predictor.predict(torch.Tensor(data_batch_index2))  # convert the array to a float32 tensor

# TODO: Query the endpoint
print(response2)

[[ 2378.15673828  5754.39257812  5431.69140625   843.10400391
  -1957.63366699  1334.09936523 -2027.74609375  1581.72595215
   7745.26464844 -1515.98022461]]


In [44]:
m = torch.nn.Softmax(dim=1)

The output shows that the element at index 8 (ship) has the highest value, not index 9 (truck), so the model misclassifies this image.

In [45]:
m(torch.Tensor(response2))

tensor([[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])

### Cleanup

After you have finished with this exercise, remember to delete the prediction endpoint to release the instance associated with it.

In [46]:
predictor.delete_endpoint()


INFO:sagemaker:Deleting endpoint configuration with name: sagemaker-script-mode-2023-03-22-23-43-41-020
INFO:sagemaker:Deleting endpoint with name: sagemaker-script-mode-2023-03-22-23-43-41-020


In [47]:
!zip -r CD0387-deep-learning-topics-within-computer-vision-nlp-project-starter.zip "CD0387-deep-learning-topics-within-computer-vision-nlp-project-starter"

/bin/sh: 1: zip: not found
