## Safety Helmet Detection

Welcome to this SageMaker Notebook! This is an entirely managed notebook service that you can use to create and edit machine learning models. We will be using it today to create a binary image classification model using the Apache MXNet deep learning framework. We will then learn how to deploy this model onto our DeepLens device.

In this notebook we will be to using MXNet's Gluon interface, to download and edit a pre-trained ImageNet model and transform it into binary classifier, which we can use to differentiate between hot dogs and not hot dogs.

### Setup

Before we start, make sure the kernel in the the notebook is set to the correct one, `condamxnet3.6` which has all the dependencies we will need for this tutorial already installed.

First we'll start by importing a bunch of packages into the notebook that you'll need later and installing any required packages that are missing into our notebook kernel.

In [31]:
%%bash
conda install scikit-image

Solving environment: ...working... done

## Package Plan ##

  environment location: /home/ec2-user/anaconda3/envs/mxnet_p36

  added / updated specs: 
    - scikit-image


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2018.8.13          |           py36_0         138 KB
    scikit-image-0.14.0        |   py36hf484d3e_1        24.1 MB
    openssl-1.0.2p             |       h14c3975_0         3.5 MB
    ------------------------------------------------------------
                                           Total:        27.7 MB

The following packages will be UPDATED:

    certifi:         2018.4.16-py36_0      conda-forge --> 2018.8.13-py36_0     
    openssl:         1.0.2o-0              conda-forge --> 1.0.2p-h14c3975_0    
    scikit-image:    0.13.1-py36h14c3975_1             --> 0.14.0-py36hf484d3e_1

The following packages will be DOWNGRADED:

    ca-certificates: 2018.4.



  current version: 4.4.10
  latest version: 4.5.10

Please update conda by running

    $ conda update -n base conda


certifi 2018.8.13:            |   0% certifi 2018.8.13: ########## | 100% 
scikit-image 0.14.0:            |   0% scikit-image 0.14.0: #5         |  16% scikit-image 0.14.0: ###3       |  33% scikit-image 0.14.0: #####1     |  51% scikit-image 0.14.0: ######8    |  69% scikit-image 0.14.0: ########1  |  82% scikit-image 0.14.0: #########2 |  92% scikit-image 0.14.0: ########## | 100% 
openssl 1.0.2p:            |   0% openssl 1.0.2p: #######5   |  75% openssl 1.0.2p: ########2  |  83% openssl 1.0.2p: #########5 |  96% openssl 1.0.2p: ########## | 100% 


In [32]:
from __future__ import print_function
import logging
logging.basicConfig(level=logging.INFO)
import os
import time
from collections import OrderedDict
import skimage.io as io
import numpy as np

from mxnet import gluon, image, init, nd
from mxnet import autograd as ag
from mxnet.gluon import nn
from mxnet.gluon.model_zoo import vision as models
from mxnet.gluon.data.vision import transforms


import mxnet as mx

## Model

The model we will be downloading and editing is [SqueezeNet](https://arxiv.org/abs/1602.07360), an extremely efficient image classification model that achived 2012 State of the Art accuracy on the popular [ImageNet](http://www.image-net.org/challenges/LSVRC/), image classification challenge. SqueezeNet is just a convolutional neural network, with an architecture chosen to have a small number of parameters and to require a minimal amount of computation. It's especially popular for folks that need to run CNNs on low-powered devices like cell phones and other internet-of-things devices, such as DeepLens. The MXNet Deep Learning framework offers squeezenet v1.0 and v1.1 that are pretrained on ImageNet through it's model Zoo.

## Pulling the pre-trained model
The MXNet model zoo  gives us convenient access to a number of popular models,
both their architectures and their pretrained parameters.
Let's download SqueezeNet right now with just a few lines of code.

In [38]:
# Demo mode uses the validation dataset for training, which is smaller and faster to train.
demo = True
log_interval = 100
gpus = 0

# Options are imperative or hybrid. Use hybrid for better performance.
mode = 'hybrid'

# training hyperparameters
batch_size = 256
if demo:
    epochs = 5
    learning_rate = 0.02
    wd = 0.002
else:
    epochs = 40
    learning_rate = 0.05
    wd = 0.002

# the class weight for hotdog class to help the imbalance problem.
positive_class_weight = 5

In [39]:
from __future__ import print_function
import logging
logging.basicConfig(level=logging.INFO)
import os
import time
from collections import OrderedDict
import skimage.io as io

import mxnet as mx
from mxnet.test_utils import download
mx.random.seed(127)

In [40]:
classes = 23

epochs = 5
lr = 0.001
per_device_batch_size = 1
momentum = 0.9
wd = 0.0001

lr_factor = 0.75
lr_steps = [10, 20, 30, np.inf]

num_gpus = 0
num_workers = 1
ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
batch_size = per_device_batch_size * max(num_gpus, 1)

In [41]:
jitter_param = 0.4
lighting_param = 0.1

transform_train = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomFlipLeftRight(),
    transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param,
                                 saturation=jitter_param),
    transforms.RandomLighting(lighting_param),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

transform_test = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

In [69]:
path = './'
train_path = os.path.join(path, 'train')
#val_path = os.path.join(path, 'val')
test_path = os.path.join(path, 'test')

train_dataset = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset(train_path).transform_first(transform_train),
    batch_size=batch_size, shuffle=True, num_workers=num_workers)

test_dataset = gluon.data.DataLoader(
    gluon.data.vision.ImageFolderDataset(train_path).transform_first(transform_train),
    batch_size=batch_size, shuffle=True, num_workers=num_workers)

In [70]:
from mxnet.gluon import nn
from mxnet.gluon.model_zoo import vision as models

# get pretrained squeezenet
net = models.squeezenet1_1(pretrained=True, prefix='deep_crash_helmet_')
# crash helemt happens to be a class in imagenet.
# we can reuse the weight for that class for better performance
# here's the index for that class for later use
# See https://gist.github.com/aaronpolhamus/964a4411c0906315deb9f4a3723aac57
imagenet_helmet_index = 778

### Safety Helmet Net

In vision networks it is common that the first set of layers learns the task of recognizing edges, curves and other important visual features of the input image. We call this feature extraction, and once the abstract features are extracted we can leverage a much simpler model to classify images using these features.

We will use the feature extractor from the pretrained squeezenet (every layer except the last one) to build our own classifier for safety helmets. Conveniently, the MXNet model zoo handles the decapitation for us. All we have to do is specify the number of output classes in our new task, which we do via the keyword argument `classes=2`.

In [71]:
deep_safety_helmet_net = models.squeezenet1_1(prefix='deep_safety_helmet_', classes=2)
deep_safety_helmet_net.collect_params().initialize()
deep_safety_helmet_net.features = net.features

# Lets take a look at what this network looks like
print(deep_safety_helmet_net)

SqueezeNet(
  (features): HybridSequential(
    (0): Conv2D(3 -> 64, kernel_size=(3, 3), stride=(2, 2))
    (1): Activation(relu)
    (2): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(0, 0), ceil_mode=True)
    (3): HybridSequential(
      (0): HybridSequential(
        (0): Conv2D(64 -> 16, kernel_size=(1, 1), stride=(1, 1))
        (1): Activation(relu)
      )
      (1): HybridConcurrent(
        (0): HybridSequential(
          (0): Conv2D(16 -> 64, kernel_size=(1, 1), stride=(1, 1))
          (1): Activation(relu)
        )
        (1): HybridSequential(
          (0): Conv2D(16 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): Activation(relu)
        )
      )
    )
    (4): HybridSequential(
      (0): HybridSequential(
        (0): Conv2D(128 -> 16, kernel_size=(1, 1), stride=(1, 1))
        (1): Activation(relu)
      )
      (1): HybridConcurrent(
        (0): HybridSequential(
          (0): Conv2D(16 -> 64, kernel_size=(1, 1), stride=(1, 1))
      

The network can already be used for prediction. However, since it hasn't been finetuned yet so the network performance could not be optimal.

Let's test it out by defining a prediction function to feed a local image into the network and get the predicted output

In [72]:
from skimage.color import rgba2rgb

def classify_safety_helmet(net, url):
    I = io.imread(url)
    if I.shape[2] == 4:
        I = rgba2rgb(I)
    image = mx.nd.array(I).astype(np.uint8)
    image = mx.image.resize_short(image, 256)
    image, _ = mx.image.center_crop(image, (224, 224))
    image = mx.image.color_normalize(image.astype(np.float32)/255,
                                     mean=mx.nd.array([0.485, 0.456, 0.406]),
                                     std=mx.nd.array([0.229, 0.224, 0.225]))
    image = mx.nd.transpose(image.astype('float32'), (2,1,0))
    image = mx.nd.expand_dims(image, axis=0)
    out = mx.nd.SoftmaxActivation(net(image))
    print('Probabilities are: '+str(out[0].asnumpy()))
    result = np.argmax(out.asnumpy())
    outstring = ['Not wearing safety helmet', 'Wearing safety helmet']
    print(outstring[result])

Now lets download a safety helmet image and an image not wearing a helmet to our local directory to test this model on



![Wearing helmet](https://www.worksafe.qld.gov.au/__data/assets/image/0017/161333/ris-workplace-exposure-standards-framework-banner.jpg)

![Not wearing helmet](./test/neg/996.jpg)





In [73]:
# To make the defined network run quickly we usually hybridize it first. 
# This also allows us to serialize and export our model
deep_safety_helmet_net.hybridize()

# Let's run the classification on downloaded images to see what our model comes up with
classify_safety_helmet(deep_safety_helmet_net, 'https://www.worksafe.qld.gov.au/__data/assets/image/0017/161333/ris-workplace-exposure-standards-framework-banner.jpg') # check for wearing safety helmet
classify_safety_helmet(deep_safety_helmet_net, './test/neg/996.jpg') # check for not wearing safety helmet

Probabilities are: [ 0.4962869   0.50371313]
Wearing safety helmet
Probabilities are: [ 0.59094107  0.40905896]
Not wearing safety helmet


In [49]:
deep_safety_helmet_net.export('safety_helmet_or_not_model')

The predictions are not great, so we can use a "fine tuning" process (see https://gluon.mxnet.io/chapter08_computer-vision/fine-tuning.html), where we retrained the model with images of wearing hot dogs and not hot dogs. We can then apply these new parameters to our model to make it even more accurate.

In [55]:
# let's examine the output layer and find the last conv layer
print(net.output)

HybridSequential(
  (0): Conv2D(512 -> 1000, kernel_size=(1, 1), stride=(1, 1))
  (1): Activation(relu)
  (2): AvgPool2D(size=(13, 13), stride=(13, 13), padding=(0, 0), ceil_mode=False)
  (3): Flatten
)


In [74]:
# the last conv layer is the second layer
pretrained_conv_params = net.output[0].params

# weights can then be found from the above parameter dict
pretrained_weight_param = pretrained_conv_params.get('weight')
pretrained_bias_param = pretrained_conv_params.get('bias')

# next, we locate the right slice that we're interested in.
helmet_w = mx.nd.split(pretrained_weight_param.data().as_in_context(mx.cpu()),
                       1000, axis=0)[imagenet_helmet_index]
helmet_b = mx.nd.split(pretrained_bias_param.data().as_in_context(mx.cpu()),
                       1000, axis=0)[imagenet_helmet_index]

# our classifier is for two classes. here, we reuse the helmet class weight,
# and randomly initialize the 'not helmet' class.
new_classifier_w = mx.nd.concat(mx.nd.random_normal(shape=helmet_w.shape, scale=0.02),
                                helmet_w,
                                dim=0)
new_classifier_b = mx.nd.concat(mx.nd.random_normal(shape=helmet_b.shape, scale=0.02),
                                helmet_b,
                                dim=0)

# finally, we initialize the parameter buffers and set the values.
# since classifier is a HybridSequential/Sequential, the following
# takes the zero-indexed 1-st layer of the classifier
final_conv_layer_params = deep_safety_helmet_net.output[0].params
final_conv_layer_params.get('weight').set_data(new_classifier_w)
final_conv_layer_params.get('bias').set_data(new_classifier_b)

In [75]:
# return metrics string representation
def metric_str(names, accs):
    return ', '.join(['%s=%f'%(name, acc) for name, acc in zip(names, accs)])
metric = mx.metric.create(['acc', 'f1'])

In [76]:
import mxnet.gluon as gluon
from mxnet.image import color_normalize

def evaluate(net, data_iter, ctx):
    #data_iter.reset()
    for batch in data_iter:
        data = color_normalize(batch.data[0]/255,
                               mean=mx.nd.array([0.485, 0.456, 0.406]).reshape((1,3,1,1)),
                               std=mx.nd.array([0.229, 0.224, 0.225]).reshape((1,3,1,1)))
        data = gluon.utils.split_and_load(data, ctx_list=ctx, batch_axis=0)
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        for x in data:
            outputs.append(net(x))
        metric.update(label, outputs)
    out = metric.get()
    metric.reset()
    return out

In [77]:
import mxnet.autograd as autograd

def train(net, train_iter, val_iter, epochs, ctx):
    if isinstance(ctx, mx.Context):
        ctx = [ctx]
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': learning_rate, 'wd': wd})
    loss = gluon.loss.SoftmaxCrossEntropyLoss()

    best_f1 = 0
    val_names, val_accs = evaluate(net, val_iter, ctx)
    logging.info('[Initial] validation: %s'%(metric_str(val_names, val_accs)))
    for epoch in range(epochs):
        tic = time.time()
        train_iter.reset()
        btic = time.time()
        for i, batch in enumerate(train_iter):
            # the model zoo models expect normalized images
            data = color_normalize(batch.data[0]/255,
                                   mean=mx.nd.array([0.485, 0.456, 0.406]).reshape((1,3,1,1)),
                                   std=mx.nd.array([0.229, 0.224, 0.225]).reshape((1,3,1,1)))
            data = gluon.utils.split_and_load(data, ctx_list=ctx, batch_axis=0)
            label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
            outputs = []
            Ls = []
            with autograd.record():
                for x, y in zip(data, label):
                    z = net(x)
                    # rescale the loss based on class to counter the imbalance problem
                    L = loss(z, y) * (1+y*positive_class_weight)/positive_class_weight
                    # store the loss and do backward after we have done forward
                    # on all GPUs for better speed on multiple GPUs.
                    Ls.append(L)
                    outputs.append(z)
                for L in Ls:
                    L.backward()
            trainer.step(batch.data[0].shape[0])
            metric.update(label, outputs)
            if log_interval and not (i+1)%log_interval:
                names, accs = metric.get()
                logging.info('[Epoch %d Batch %d] speed: %f samples/s, training: %s'%(
                               epoch, i, batch_size/(time.time()-btic), metric_str(names, accs)))
            btic = time.time()

        names, accs = metric.get()
        metric.reset()
        logging.info('[Epoch %d] training: %s'%(epoch, metric_str(names, accs)))
        logging.info('[Epoch %d] time cost: %f'%(epoch, time.time()-tic))
        val_names, val_accs = evaluate(net, val_iter, ctx)
        logging.info('[Epoch %d] validation: %s'%(epoch, metric_str(val_names, val_accs)))

        if val_accs[1] > best_f1:
            best_f1 = val_accs[1]
            logging.info('Best validation f1 found. Checkpointing...')
            net.save_params('safety-helmet-%d.params'%(epoch))

if mode == 'hybrid':
    deep_safety_helmet_net.hybridize()
if epochs > 0:
    contexts = [mx.gpu(i) for i in range(gpus)] if gpus > 0 else [mx.cpu()]
    deep_safety_helmet_net.collect_params().reset_ctx(contexts)
    train(deep_safety_helmet_net, train_dataset, test_dataset, epochs, contexts)

AttributeError: 'list' object has no attribute 'data'

In [None]:
Use the new fine-tuned params 

The predictions seem reasonable, so we can export this as a serialized model to our local directory. This is a simple one line command, which produces a set of two files: a json file holding the network architecture, and a params file holding the parameters the network learned.

Now let's push this serialized model to S3, where we can then optimize it for our DeepLense device and then push it down onto our device for inference.

In [14]:
import boto3
import re

assumed_role = boto3.client('sts').get_caller_identity()['Arn']
s3_access_role = re.sub(r'^(.+)sts::(\d+):assumed-role/(.+?)/.*$', r'\1iam::\2:role/\3', assumed_role)
print(s3_access_role)
s3 = boto3.resource('s3')
bucket= 'your s3 bucket name here' 

json = open('hotdog_or_not_model-symbol.json', 'rb')
params = open('hotdog_or_not_model-0000.params', 'rb')
s3.Bucket(bucket).put_object(Key='test/hotdog_or_not_model-symbol.json', Body=json)
s3.Bucket(bucket).put_object(Key='test/hotdog_or_not_model-0000.params', Body=params)

INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.amazonaws.com


arn:aws:iam::622803848910:role/SageMaker_role_IM


s3.Object(bucket_name='sagemaker-test1', key='hotdog_or_not_model-0000.params')