# 8 Deep Learning

8.1 Use a pre-trained ResNet50 and VGG16 in Keras to run inference on the test image in this directory!

* Example: <https://keras.io/applications/>


8.2 Use a pre-trained ResNet50 and VGG16 in PyTorch to run inference on the test image in this directory!

* Example: <http://pytorch.org/docs/master/torchvision/models.html>

8.3 Compare the inference times of both networks and frameworks!

8.4 Find two positive and negative sample images that are correctly/incorrectly classified!


In [28]:
import os, sys
import tensorflow as tf
os.chdir(os.path.join(os.environ["HOME"], "exercise-students-2020/08_DeepLearning"))
os.environ["KERAS_BACKEND"]="tensorflow"
import keras
from keras.applications.resnet50 import ResNet50
from keras.applications.resnet50 import preprocess_input as resnet_preprocess_input, decode_predictions as resnet_decode_predictions
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input as vgg_preprocess_input, decode_predictions as vgg_decode_predictions
import numpy as np
from keras.preprocessing.image import load_img, img_to_array

In [20]:
resnet = ResNet50()
vgg = VGG16()

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5


In [21]:
image = img_to_array(load_img('test_image.jpg' , target_size=(224, 224)))
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))

In [22]:
resnet_decode_predictions(resnet.predict(resnet_preprocess_input(image)))

[[('n04604644', 'worm_fence', 0.07046478),
  ('n02793495', 'barn', 0.0686209),
  ('n03000134', 'chainlink_fence', 0.061941594),
  ('n04326547', 'stone_wall', 0.05747776),
  ('n03930313', 'picket_fence', 0.05441839)]]

In [29]:
vgg_decode_predictions(vgg.predict(vgg_preprocess_input(image)))

[[('n02793495', 'barn', 0.3092531),
  ('n03028079', 'church', 0.23857628),
  ('n03743016', 'megalith', 0.09048893),
  ('n04604644', 'worm_fence', 0.06188944),
  ('n03891251', 'park_bench', 0.05850323)]]
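For task 8.4, a small helper can make the correct/incorrect decision explicit. The sketch below is an assumption about how one might check a result, not part of the original notebook: `is_top_k` is a hypothetical name, and the labels passed to it are only illustrative, not the confirmed ground truth of the test image. It works on the list format returned by `decode_predictions`, using the VGG16 output above as sample data.

```python
def is_top_k(decoded, expected_label, k=1):
    """Return True if expected_label is among the k best predictions.

    `decoded` has the shape returned by Keras decode_predictions:
    one list per image, each holding (wordnet_id, label, score) tuples
    sorted by descending score.
    """
    top_labels = [label for _, label, _ in decoded[0][:k]]
    return expected_label in top_labels

# VGG16 output copied from the cell above.
vgg_decoded = [[('n02793495', 'barn', 0.3092531),
                ('n03028079', 'church', 0.23857628),
                ('n03743016', 'megalith', 0.09048893),
                ('n04604644', 'worm_fence', 0.06188944),
                ('n03891251', 'park_bench', 0.05850323)]]

print(is_top_k(vgg_decoded, 'barn'))           # top-1 check
print(is_top_k(vgg_decoded, 'worm_fence'))     # not the top-1 label
print(is_top_k(vgg_decoded, 'worm_fence', 5))  # but within the top 5
```

An image would count as a positive sample when the check succeeds for its known label and as a negative sample otherwise.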

In [38]:
import torch
import torchvision.models as models

In [39]:
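# Note: the task asks for ResNet50; this cell loads ResNeXt-50 (32x4d), a related but different architecture.
vgg16 = models.vgg16(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)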

In [40]:
vgg16.eval()
resnext50_32x4d.eval()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1

In [42]:
from PIL import Image
img = Image.open('test_image.jpg')

In [43]:
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(), 
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225]
    )
])
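Note that `transforms.Resize(224)` with a single integer scales the shorter image side to 224 and keeps the aspect ratio, so a non-square input stays non-square. The pure-Python sketch below mirrors that rule as documented for torchvision; `resized_size` is a hypothetical helper, and the 640x480 input is an assumed example size, not the actual size of `test_image.jpg`.

```python
def resized_size(width, height, size):
    """Approximate output (width, height) of Resize(size) for an int size.

    The shorter side becomes `size`; the longer side is scaled by the same
    factor and truncated to an integer.
    """
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

print(resized_size(640, 480, 224))  # landscape input: width stays larger than 224
print(resized_size(224, 224, 224))  # square input is unchanged
```

If a fixed 224x224 input is wanted, a commonly used pipeline is `transforms.Resize(256)` followed by `transforms.CenterCrop(224)`. The torchvision models used here still accept other sizes because they pool adaptively before the classifier.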


In [49]:
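def classify(model, decoder, img):
    # The PyTorch models return raw logits (no softmax), so the decoded scores are not probabilities.
    batch = torch.unsqueeze(transform(img), 0)  # add a batch dimension
    return decoder(model(batch).detach().numpy())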

In [50]:
classify(vgg16, vgg_decode_predictions, img)

[[('n02793495', 'barn', 7.1604614),
  ('n04604644', 'worm_fence', 7.028587),
  ('n03891251', 'park_bench', 6.6059346),
  ('n07802026', 'hay', 6.1398897),
  ('n02965783', 'car_mirror', 6.0873003)]]

In [52]:
classify(resnext50_32x4d, resnet_decode_predictions, img)

[[('n04604644', 'worm_fence', 9.05752),
  ('n02793495', 'barn', 7.4780717),
  ('n04326547', 'stone_wall', 6.7159877),
  ('n03891251', 'park_bench', 6.4993634),
  ('n03743016', 'megalith', 6.4531875)]]
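The PyTorch scores above are larger than 1 because the torchvision models output raw logits, whereas the Keras models end in a softmax layer. A minimal sketch of the conversion, using NumPy and a made-up four-class logit vector (the real models produce 1000 logits per image, so the values here are purely illustrative):

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical logits for four classes, not taken from the models above.
logits = np.array([9.0, 7.5, 6.7, 6.5])
probs = softmax(logits)
print(probs, probs.sum())
```

In PyTorch the same step would be `torch.softmax(output, dim=1)` applied to the model output before decoding.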

In [53]:
img = torch.unsqueeze(transform(img), 0)

In [55]:
%%time
_ = vgg16(img)

CPU times: user 887 ms, sys: 410 ms, total: 1.3 s
Wall time: 132 ms


In [71]:
%%time
_ = resnext50_32x4d(img)

CPU times: user 670 ms, sys: 377 ms, total: 1.05 s
Wall time: 108 ms


In [58]:
resnet_image = resnet_preprocess_input(image)
vgg_image = vgg_preprocess_input(image)

In [68]:
%%time
_ = resnet.predict(resnet_image)

CPU times: user 663 ms, sys: 442 ms, total: 1.11 s
Wall time: 121 ms


In [75]:
%%time
_ = vgg.predict(vgg_image)

CPU times: user 855 ms, sys: 370 ms, total: 1.23 s
Wall time: 132 ms


#### Explanation
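The time needed to run inference on a single image can vary considerably from run to run, even with the same framework and architecture. All four single-run wall times above fall between 108 ms and 132 ms, so these measurements are not enough to tell which framework is faster.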
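
One way to make the comparison more robust is to repeat each measurement and report summary statistics instead of a single `%%time` result. The sketch below is an assumption about how that could look; `benchmark` is a hypothetical helper, and the warm-up and repeat counts are arbitrary defaults.

```python
import statistics
import time

def benchmark(fn, warmup=3, repeats=20):
    """Call fn repeatedly and return (median, stdev) wall time in milliseconds."""
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), statistics.stdev(timings)

# Stand-in workload so the sketch runs on its own.
median_ms, stdev_ms = benchmark(lambda: sum(range(10_000)))
print(f"median {median_ms:.3f} ms, stdev {stdev_ms:.3f} ms")
```

In the notebook, the workload could be `lambda: vgg.predict(vgg_image)` for Keras or `lambda: vgg16(img)` for PyTorch.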