How to use a prebuilt CNN in Keras

We will use VGG16, a model trained on the ImageNet dataset. It can recognize 1000 common object categories.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [1]:
import os
import tensorflow
import keras

### Pre-trained models:
1. In Keras you can access them through the applications module:
https://keras.io/api/applications/
2. The first time you use one of these models, Keras downloads it to your machine. Since the models can be fairly large, that first use can take some time.
3. Each model expects its images to be pre-processed in a particular way. Keras makes this easy by providing a preprocess_input function for each model.

### Step 1: Load the model into memory (the first time you do this, the model is downloaded to your system and then loaded into memory)

In [2]:
from keras.applications.vgg16 import VGG16,preprocess_input
from keras.preprocessing.image import load_img,img_to_array
import numpy as np

Most of the models require some pre-processing of the images, so you need to import the preprocess_input function that belongs to the model you are using.

To convert the image returned by load_img into a NumPy array, we need img_to_array.

In [3]:
model=VGG16()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels.h5


Running this line loads the model into memory. The weights file is large (over 500 MB), so the first time you run this code it is downloaded to your system before being loaded into memory.

Depending on your connection speed, this could take a while.
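To see whether the weights have already been downloaded, a small check like the following may help. It is only a sketch: it assumes the default Keras cache directory ~/.keras/models (your setup may use a different location), the helper name weights_cached is hypothetical, and the file name is taken from the download URL printed above.

```python
from pathlib import Path

# File name taken from the download URL printed by VGG16().
WEIGHTS_FILE = "vgg16_weights_tf_dim_ordering_tf_kernels.h5"


def weights_cached(cache_dir=None, filename=WEIGHTS_FILE):
    """Return True if the weights file exists in the given cache directory."""
    if cache_dir is None:
        # Assumption: Keras keeps downloaded models under ~/.keras/models.
        cache_dir = Path.home() / ".keras" / "models"
    return (Path(cache_dir) / filename).is_file()


print("Weights already downloaded:", weights_cached())
```

If this prints False, expect the next call to VGG16() to trigger the download.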

In [4]:
model.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     

We see a fairly large model: VGG16 stacks 13 convolutional layers and 5 max-pooling layers, followed by 3 fully connected layers.
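The Param # column can be reproduced by hand. As a sketch: a Conv2D layer with a k x k kernel has (k * k * in_channels + 1) * filters trainable parameters, where the +1 is the bias of each filter, and VGG16 uses 3 x 3 kernels. The helper name conv2d_params is hypothetical.

```python
def conv2d_params(in_channels, filters, kernel_size=3):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


# (layer name, input channels, filters) for the conv layers in the summary.
layers = [
    ("block1_conv1", 3, 64),
    ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128),
    ("block2_conv2", 128, 128),
]

for name, in_channels, filters in layers:
    print(f"{name}: {conv2d_params(in_channels, filters)}")
```

The printed values match the summary (1792, 36928, 73856, 147584). Pooling layers report 0 because they have no trainable weights; they only halve the height and width.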

### Step 2: Load the image, making sure it has a size the model accepts as input
https://keras.io/applications/#vgg16

The image we are using is of a pizza. We will see whether the VGG16 model identifies it as a pizza.

We use load_img to load the image.

In [5]:
img_path="pizza.jpg"
img=load_img(img_path,target_size=(224,224))

Note: Most models restrict the input size they accept. With its default classification layers, VGG16 expects images of size 224x224.

load_img can resize the image to the required size as it reads it, via the target_size argument.
(See the docs for the input size requirements of other models.)
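Resizing to a fixed target size ignores the original aspect ratio, so a non-square photo gets squashed. The NumPy sketch below illustrates the idea with simple nearest-neighbour sampling. It is not how Keras or Pillow implement resizing, and resize_nearest is a hypothetical helper.

```python
import numpy as np


def resize_nearest(image, target_size):
    """Resize a (height, width, channels) array by nearest-neighbour sampling."""
    target_h, target_w = target_size
    src_h, src_w = image.shape[:2]
    # For every output row and column, pick the source index it maps back to.
    rows = np.arange(target_h) * src_h // target_h
    cols = np.arange(target_w) * src_w // target_w
    return image[rows[:, None], cols]


photo = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a 640x480 photo
resized = resize_nearest(photo, (224, 224))
print(resized.shape)  # (224, 224, 3)
```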

In [6]:
img.size

(224, 224)

### Step 3: Pre-process the data. The image must be converted to an array and then the necessary pre-processing applied

In [7]:
img_array=img_to_array(img) ##converts image into a np array

In [8]:
img_array.shape

(224, 224, 3)

224 x 224 pixels across 3 RGB channels
(size1,size2,channels)

The model's predict method expects a batch of images, i.e. a 4-dim input, so we add an extra dimension with expand_dims before pre-processing.

A batch of images has one additional leading dimension: (samples, size1, size2, channels)

In [9]:
img_array=np.expand_dims(img_array,axis=0)
## axis: Position in the expanded axes where the new axis (or axes) is placed.

In [10]:
img_array.shape

(1, 224, 224, 3)

(samples, size1,size2,channels)
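np.expand_dims is one of several equivalent ways to add the batch axis, and np.stack extends the idea to more than one image. A minimal sketch, using zero-filled arrays as stand-ins for real images:

```python
import numpy as np

image = np.zeros((224, 224, 3), dtype=np.float32)  # stand-in for one image

batch_a = np.expand_dims(image, axis=0)  # the call used in the notebook
batch_b = image[np.newaxis, ...]         # equivalent indexing form
print(batch_a.shape, batch_b.shape)      # (1, 224, 224, 3) (1, 224, 224, 3)

# Several images of the same size can be stacked into a single batch.
images = [image, image + 1.0, image + 2.0]
batch = np.stack(images, axis=0)
print(batch.shape)                       # (3, 224, 224, 3)
```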

In [11]:
img_array

array([[[[242., 242., 244.],
         [240., 241., 243.],
         [233., 234., 236.],
         ...,
         [173., 163., 161.],
         [172., 162., 161.],
         [172., 162., 161.]],

        [[225., 226., 230.],
         [234., 235., 237.],
         [248., 248., 250.],
         ...,
         [174., 164., 163.],
         [170., 160., 159.],
         [170., 160., 159.]],

        [[237., 238., 240.],
         [253., 253., 253.],
         [255., 255., 255.],
         ...,
         [171., 161., 160.],
         [172., 162., 161.],
         [171., 161., 160.]],

        ...,

        [[136., 126., 117.],
         [138., 129., 120.],
         [138., 129., 122.],
         ...,
         [124., 111.,  92.],
         [122., 110.,  88.],
         [124., 111.,  92.]],

        [[137., 124., 116.],
         [135., 124., 118.],
         [137., 128., 121.],
         ...,
         [124., 111.,  92.],
         [123., 110.,  91.],
         [122., 109.,  90.]],

        [[138., 125., 116.],
       

In [12]:
img_pre_processed=preprocess_input(img_array)

In [13]:
img_pre_processed

array([[[[140.061    , 125.221    , 118.32     ],
         [139.061    , 124.221    , 116.32     ],
         [132.061    , 117.221    , 109.32     ],
         ...,
         [ 57.060997 ,  46.221    ,  49.32     ],
         [ 57.060997 ,  45.221    ,  48.32     ],
         [ 57.060997 ,  45.221    ,  48.32     ]],

        [[126.061    , 109.221    , 101.32     ],
         [133.061    , 118.221    , 110.32     ],
         [146.061    , 131.22101  , 124.32     ],
         ...,
         [ 59.060997 ,  47.221    ,  50.32     ],
         [ 55.060997 ,  43.221    ,  46.32     ],
         [ 55.060997 ,  43.221    ,  46.32     ]],

        [[136.061    , 121.221    , 113.32     ],
         [149.061    , 136.22101  , 129.32     ],
         [151.061    , 138.22101  , 131.32     ],
         ...,
         [ 56.060997 ,  44.221    ,  47.32     ],
         [ 57.060997 ,  45.221    ,  48.32     ],
         [ 56.060997 ,  44.221    ,  47.32     ]],

        ...,

        [[ 13.060997 ,   9.221001 ,  1

The values in the array have changed.

The preprocess_input function adapts your image to the format the model requires.

Some models use images with values ranging from 0 to 1, others from -1 to +1. Others use the "caffe" style, which is not normalized but is centered; this is the style VGG16 uses.
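Two of these styles can be sketched in NumPy. The code below mimics the "caffe" style (channels reordered from RGB to BGR, then the per-channel ImageNet means subtracted, with no scaling) and, for comparison, a -1 to +1 scaling. It is an illustration rather than the library's own code: the function names are hypothetical, and the mean values are the commonly quoted ImageNet BGR means.

```python
import numpy as np

# Commonly quoted ImageNet channel means, in BGR order (assumed here).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style(rgb_batch):
    """RGB -> BGR, then subtract the per-channel mean; no scaling."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)  # astype copies the data
    return bgr - IMAGENET_BGR_MEANS


def scaled_style(rgb_batch):
    """Scale 0..255 pixel values to the range -1..+1."""
    return rgb_batch.astype(np.float32) / 127.5 - 1.0


pixel = np.array([[[[242.0, 242.0, 244.0]]]])  # first pixel of img_array
print(caffe_style(pixel))   # approximately [[[[140.061 125.221 118.32]]]]
print(scaled_style(pixel))
```

The "caffe" result agrees with the first row of img_pre_processed shown above, which suggests the sketch captures what preprocess_input does for VGG16.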

### Step 4: Use the pre-processed array to obtain predictions from the model

In [14]:
from keras.applications.vgg16 import decode_predictions

In [15]:
preds=model.predict(img_pre_processed)

In [16]:
preds

array([[8.09403759e-12, 1.19740040e-09, 3.63045184e-11, 6.16766924e-12,
        5.38119939e-12, 4.67018257e-10, 1.60714220e-11, 6.41266693e-11,
        6.68801792e-10, 2.15077739e-12, 3.10413251e-09, 3.94310584e-10,
        2.09437645e-09, 1.35691194e-10, 8.78269419e-12, 1.84269350e-10,
        1.54407667e-11, 2.10021861e-11, 1.06973232e-11, 6.47694398e-11,
        5.73020190e-12, 5.49854433e-12, 1.32985615e-12, 1.92453128e-12,
        1.05192504e-11, 4.44279752e-11, 1.43763085e-10, 8.15150947e-10,
        1.81952318e-11, 6.80917545e-11, 7.43055756e-11, 1.61425526e-10,
        1.58936564e-09, 4.55192412e-10, 3.86958209e-11, 1.22816063e-11,
        1.36552394e-10, 3.92725380e-10, 4.79942530e-10, 6.89640359e-12,
        1.59173119e-10, 7.92500440e-12, 1.62174829e-10, 1.45306519e-11,
        1.93776679e-11, 6.91387225e-11, 1.28756616e-10, 1.16889554e-09,
        9.66811050e-13, 2.53120771e-11, 1.08658492e-10, 1.90061682e-11,
        1.60220018e-10, 3.37314815e-10, 5.36554712e-09, 8.574716

This returns an array of shape (1, 1000): one row per input image, holding the probabilities of the 1000 classes that VGG16 was trained on.
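Because the last layer of VGG16 is a softmax, each row of the output sums to 1, and the position of the largest value is the index of the predicted class. A small sketch with a made-up 5-class score vector (the numbers are illustrative, not VGG16 outputs):

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


scores = np.array([[2.0, 0.5, 9.0, 1.0, 0.1]])  # shape (1, 5): one image
probs = softmax(scores)

print(probs.shape)               # (1, 5)
print(probs.sum(axis=-1))        # approximately [1.]
print(int(np.argmax(probs[0])))  # 2
```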

To make better sense of these predictions we use decode_predictions

In [17]:
decode_predictions(preds,top=5)
## see top5 predictions by probability

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json


[[('n07873807', 'pizza', 0.9987103),
  ('n07693725', 'bagel', 0.00047675797),
  ('n07717410', 'acorn_squash', 0.00014626134),
  ('n07716358', 'zucchini', 0.00013352957),
  ('n07753113', 'fig', 0.000107980035)]]

Our model identified the pizza correctly: the first entry, "pizza", has the highest probability (about 0.9987).
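Conceptually, picking the top entries amounts to sorting the probabilities and looking up the matching labels. A rough sketch of that idea follows, with a hypothetical five-entry label list standing in for the real 1000-class index that Keras downloads. top_k is likewise a made-up helper, not the Keras implementation.

```python
import numpy as np

# Hypothetical labels; the real index holds 1000 (id, name) pairs.
labels = ["fig", "pizza", "bagel", "zucchini", "acorn_squash"]


def top_k(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]  # indices, highest first
    return [(labels[i], float(probabilities[i])) for i in order]


probs = np.array([0.01, 0.90, 0.05, 0.03, 0.01])  # made-up probabilities
for name, p in top_k(probs, labels):
    print(f"{name}: {p:.2f}")
```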

You may wonder which 1000 categories VGG16 was trained on.

### The VGG16 model is trained on the ImageNet database to identify 1000 common objects, animals, places, etc.

In [18]:
decode_predictions(preds,top=1000)

[[('n07873807', 'pizza', 0.9987103),
  ('n07693725', 'bagel', 0.00047675797),
  ('n07717410', 'acorn_squash', 0.00014626134),
  ('n07716358', 'zucchini', 0.00013352957),
  ('n07753113', 'fig', 0.000107980035),
  ('n07684084', 'French_loaf', 9.5818046e-05),
  ('n07768694', 'pomegranate', 9.028103e-05),
  ('n07875152', 'potpie', 5.4539167e-05),
  ('n07579787', 'plate', 3.5352503e-05),
  ('n02776631', 'bakery', 3.1126852e-05),
  ('n07742313', 'Granny_Smith', 1.4971491e-05),
  ('n07745940', 'strawberry', 1.3797816e-05),
  ('n04263257', 'soup_bowl', 9.127857e-06),
  ('n07747607', 'orange', 7.6794295e-06),
  ('n03400231', 'frying_pan', 6.849952e-06),
  ('n07871810', 'meat_loaf', 5.9920726e-06),
  ('n07860988', 'dough', 5.3712947e-06),
  ('n04270147', 'spatula', 4.8429442e-06),
  ('n07720875', 'bell_pepper', 3.930653e-06),
  ('n04597913', 'wooden_spoon', 3.5990251e-06),
  ('n07697537', 'hotdog', 3.5827074e-06),
  ('n07734744', 'mushroom', 3.5543521e-06),
  ('n07714990', 'broccoli', 3.5347493e

You can use this model to figure out whether any of these objects appear in the input image.