# 3) Model training
In this notebook, we will train an entire pretrained VGG model (not just the classification layers).


In [2]:
%%capture
!pip install smdebug
!pip install torchvision --no-cache-dir  

In [3]:
import json
import sagemaker
from sagemaker import get_execution_role
from sagemaker.session import Session
from sagemaker.pytorch import PyTorch, PyTorchModel
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from sagemaker.debugger import DebuggerHookConfig, ProfilerConfig, FrameworkProfile
from sagemaker.debugger import Rule, ProfilerRule, rule_configs
from smdebug.trials import create_trial
from smdebug.core.modes import ModeKeys

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import host_subplot

import boto3
import os
import numpy as np

from PIL import Image
import IPython

session = sagemaker.Session()

bucket = session.default_bucket()
print("Default Bucket: {}".format(bucket))

region = session.boto_region_name
print("AWS Region: {}".format(region))

role = get_execution_role()
print("RoleArn: {}".format(role))

prefix = "capstone-inventory-project"

Default Bucket: sagemaker-us-east-1-837030799965
AWS Region: us-east-1
RoleArn: arn:aws:iam::837030799965:role/service-role/AmazonSageMaker-ExecutionRole-20211207T163039


The training script is `train_vgg.py`. The whole model is trained (for 10 epochs), starting from ImageNet-pretrained weights rather than a random initialization:  
`models.vgg11_bn(pretrained=True)`

In [None]:
# Profiler and debugger configurations (example values, adjust as needed)
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,
    framework_profile_params=FrameworkProfile(num_steps=10),
)
debugger_config = DebuggerHookConfig(
    hook_parameters={"train.save_interval": "100", "eval.save_interval": "10"}
)

# Create and fit an estimator
estimator = PyTorch(
    entry_point="scripts/train.py",
    role=role,
    py_version="py36",
    framework_version="1.8",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    profiler_config=profiler_config,       # profiler hook
    debugger_hook_config=debugger_config,  # debugger hook
    output_path="s3://{}/{}/main_training".format(bucket, prefix),  # training job output (mainly model artifacts)
    hyperparameters={  # best values found by the previous hyperparameter optimization (HPO)
        "batch-size": 16,
        "lr": 0.002,
    },
)

estimator.fit({"train": "s3://{}/{}/data".format(bucket, prefix)})

2022-01-15 16:51:14 Starting - Starting the training job.

The results are disappointing: the model did not improve at all during training.

![Training results](images/results-training.png "Training results")
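Rather than reading "no improvement" off the plot, one could pull the loss tensor saved by SageMaker Debugger and compare early and late values. A minimal sketch, assuming the training script registered its loss with the smdebug hook (so a `losses` collection exists) and that `estimator.latest_job_debugger_artifacts_path()` points at this job's Debugger output. `load_loss_curve` and `relative_improvement` are hypothetical helper names, and the 10-step averaging window is an arbitrary choice.

```python
from typing import List, Sequence, Tuple


def load_loss_curve(s3_path: str, mode_name: str = "TRAIN") -> Tuple[List[int], List[float]]:
    """Read the first tensor of the 'losses' collection from a Debugger output path."""
    # Imported lazily so relative_improvement works without smdebug installed
    from smdebug.trials import create_trial
    from smdebug.core.modes import ModeKeys

    mode = getattr(ModeKeys, mode_name)  # ModeKeys.TRAIN or ModeKeys.EVAL
    trial = create_trial(s3_path)
    # Assumes the training script registered its loss with the smdebug hook
    loss_name = trial.tensor_names(collection="losses")[0]
    tensor = trial.tensor(loss_name)
    steps = tensor.steps(mode=mode)
    values = [float(tensor.value(step, mode=mode)) for step in steps]
    return steps, values


def relative_improvement(values: Sequence[float], window: int = 10) -> float:
    """Fractional drop from the mean of the first `window` values to the mean of the last `window`.

    A result near 0 (or negative) means the loss did not go down.
    """
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window values")
    start = sum(values[:window]) / window
    end = sum(values[-window:]) / window
    if start == 0:
        return 0.0  # avoid division by zero; a zero starting loss cannot improve
    return (start - end) / start
```

For example, `steps, losses = load_loss_curve(estimator.latest_job_debugger_artifacts_path())` followed by `relative_improvement(losses)` would return a value close to zero for a flat curve like the one above.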

I see two possible reasons:
- The model architecture may not be suited to capturing the wide range of features that determine the number of objects in a bin.
- The model may need to be trained from scratch. Starting from pretrained weights might have trapped it in a poor local minimum right from the start.
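SageMaker Debugger's built-in rules can flag a stalled run while the job is still running. A minimal sketch, assuming this rule selection (which was not part of the original job) is passed to the same `PyTorch` estimator through its `rules` argument. `loss_not_decreasing` only has data to evaluate if the training script saves a loss tensor through the debugger hook.

```python
from sagemaker.debugger import ProfilerRule, Rule, rule_configs

# Built-in rules evaluated alongside the training job (this selection is an assumption)
rules = [
    Rule.sagemaker(rule_configs.loss_not_decreasing()),     # loss plateaus
    Rule.sagemaker(rule_configs.vanishing_gradient()),      # gradients shrink towards zero
    ProfilerRule.sagemaker(rule_configs.ProfilerReport()),  # resource-usage report
]

# Then pass rules=rules to PyTorch(...) alongside the other arguments and call fit() as before
```

Each rule runs as a separate processing job, so its status can be checked while training is in progress instead of waiting for the final plot.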

In the following notebook, we will try to mitigate this issue.