# Convolutional neural network with TensorFlow

Let's start off by getting the data. As convolutional neural networks work well with images, we can start off with a dataset of images.

We're going to use two categories from the Food-101 dataset and build a binary classifier.

In [1]:
import zipfile

!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip

zip_ref = zipfile.ZipFile("pizza_steak.zip","r")
zip_ref.extractall()
zip_ref.close()

### Inspect the data and be one with it

The file structure has been formatted into directories and subdirectories, a layout commonly used for training CNNs on image classification.

More specifically:

- A train directory which contains all of the images in the training dataset with subdirectories each named after a certain class containing images of that class.
- A test directory with the same structure as the train directory.
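
As a quick check of that layout, here's a minimal sketch that counts the files in each class folder. It assumes the `train/<class_name>/<image>` structure described above; `count_images_per_class` is a hypothetical helper (not part of the notebook) and the demo runs on a throwaway directory rather than the real dataset.

```python
import pathlib
import tempfile


def count_images_per_class(split_dir):
    """Return {class_name: number_of_files} for a directory of class subfolders."""
    split_dir = pathlib.Path(split_dir)
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()  # skip stray files sitting next to the class folders
    }


# Demo on a temporary directory so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_images in [("pizza", 3), ("steak", 2)]:
        class_dir = pathlib.Path(tmp) / "train" / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_images):
            (class_dir / f"{i}.jpg").touch()
    demo_counts = count_images_per_class(pathlib.Path(tmp) / "train")

print(demo_counts)
```

Pointing the same helper at the real train directory should give one entry per class.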

In [2]:
import os 
num_steak = len(os.listdir("../working/pizza_steak/train/steak/"))
num_steak

In [3]:
# getting the class names programmatically 
import pathlib
import numpy as np
data_dir = pathlib.Path("../working/pizza_steak/train")
# create a list of class names from the sub directories
class_names = np.array(sorted([item.name for item in data_dir.glob('*')]))
print(class_names)

### Visualize, visualize, visualize

In [4]:
import matplotlib.pyplot as plt
import matplotlib.image as npimg
import random

def view_random_image(target_dir,target_class):
    # setup target directory
    target_folder = target_dir+target_class
    
    rand_img = random.sample(os.listdir(target_folder),1)
    
    img = npimg.imread(target_folder +'/'+rand_img[0])
    plt.imshow(img)
    plt.title(target_class)
    plt.axis("off")
    
    print(f"Image shape: {img.shape}")
    return img

In [5]:
img = view_random_image(target_dir ="../working/pizza_steak/train/",target_class = "steak")

Looking at the image shape more closely, you'll see it's in the form (Height, Width, Colour Channels).

In our case, the width and height may vary, but all are colour images. The values in the R, G and B channels range from 0 to 255.

So in convolutional neural networks, the network looks for patterns in those values across the 3 channels.

>As we've discussed before, many machine learning models, including neural networks prefer the values they work with to be between 0 and 1. Knowing this, one of the most common preprocessing steps for working with images is to scale (also referred to as normalize) their pixel values by dividing the image arrays by 255.

In [6]:
img/255.
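
To see what the division does without needing a real photo, here's a minimal NumPy sketch. The array `fake_img` is a made-up stand-in for a loaded image (an assumption for illustration), not one of the dataset images.

```python
import numpy as np

# Stand-in for a loaded image: height 4, width 6, 3 colour channels, uint8 values.
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

scaled = fake_img / 255.0  # true division turns the uint8 values into floats

print(fake_img.dtype, fake_img.min(), fake_img.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

The shape stays the same; only the value range and dtype change.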

Components of a convolutional neural network:

| **Hyperparameter/Layer type** | **What does it do?** | **Typical values** |
| ----- | ----- | ----- |
| Input image(s) | Target images you'd like to discover patterns in| Whatever you can take a photo (or video) of |
| Input layer | Takes in target images and preprocesses them for further layers | `input_shape = [batch_size, image_height, image_width, color_channels]` |
| Convolution layer | Extracts/learns the most important features from target images | Multiple, can create with [`tf.keras.layers.ConvXD`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) (X can be multiple values) |
| Hidden activation | Adds non-linearity to learned features (non-straight lines) | Usually ReLU ([`tf.keras.activations.relu`](https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu)) |
| Pooling layer | Reduces the dimensionality of learned image features | Average ([`tf.keras.layers.AvgPool2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D)) or Max ([`tf.keras.layers.MaxPool2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D)) |
| Fully connected layer | Further refines learned features from convolution layers | [`tf.keras.layers.Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) |
| Output layer | Takes learned features and outputs them in shape of target labels | `output_shape = [number_of_classes]` (e.g. 3 for pizza, steak or sushi)|
| Output activation | Adds non-linearities to output layer | [`tf.keras.activations.sigmoid`](https://www.tensorflow.org/api_docs/python/tf/keras/activations/sigmoid) (binary classification) or [`tf.keras.activations.softmax`](https://www.tensorflow.org/api_docs/python/tf/keras/activations/softmax) |
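
The two output activations in the last row of the table can be sketched in plain NumPy. This is only an illustration of the maths, not the Keras implementation; the input scores are made-up values.

```python
import numpy as np


def sigmoid(z):
    """Squash a raw score into (0, 1); used for a single binary output."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


binary_prob = sigmoid(0.0)
multi_probs = softmax(np.array([2.0, 1.0, 0.1]))

print(binary_prob)
print(multi_probs, multi_probs.sum())
```

Sigmoid gives one probability for the positive class, while softmax spreads probability across all classes.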

How they stack together:

![](https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-simple-convnet.png)
*A simple example of how you might stack together the above layers into a convolutional neural network. Note the convolutional and pooling layers can often be arranged and rearranged into many different formations.*

For reference, the model we're using replicates TinyVGG, the computer vision architecture which fuels the CNN explainer webpage.

> The architecture we're using below is a scaled-down version of VGG-16, a convolutional neural network which came 2nd in the 2014 ImageNet classification competition.

In [7]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

#  set the seed
tf.random.set_seed(42)

# normalizing the data
train_datagen = ImageDataGenerator(rescale = 1./255)
valid_datagen = ImageDataGenerator(rescale =1./255)

train_dir = "../working/pizza_steak/train/"
test_dir = "../working/pizza_steak/test/"

train_data = train_datagen.flow_from_directory(train_dir,
                                              batch_size = 32, # number of images to be processed at a time
                                              target_size = (224,224), # convert all images to the size of 224X224
                                              class_mode ="binary", # type of problem we're working on
                                              seed = 42)
valid_data = valid_datagen.flow_from_directory(test_dir,
                                              batch_size = 32,
                                              target_size = (224,224),
                                              class_mode ="binary",
                                              seed = 42)
# Create a CNN model
model_1  = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters =10,
                          kernel_size = (3,3),
                          activation ="relu",
                          input_shape = (224,224,3)), # first layer specifies input shape
    tf.keras.layers.Conv2D(10,(3,3),activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=(2,2), padding ="valid"),
    tf.keras.layers.Conv2D(10,(3,3),activation ="relu"),
    tf.keras.layers.Conv2D(10,(3,3),activation ="relu"),
    tf.keras.layers.MaxPool2D((2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1,activation ="sigmoid") # binary activation output
])

# compile the model

model_1.compile(loss = "binary_crossentropy",
               optimizer = tf.keras.optimizers.Adam(),
               metrics =["accuracy"])

history_1 = model_1.fit(train_data,
                       epochs =5,
                       steps_per_epoch = len(train_data),
                       validation_data = valid_data,
                       validation_steps = len(valid_data))

We can see that the model reaches very good accuracy within 5 epochs. Keep in mind, though, that we're training on only 2 of the classes rather than all 101.


In [8]:
model_1.summary()
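
To sanity-check the summary, the output shapes and parameter counts can be worked out by hand. Here's a minimal sketch, assuming the `model_1` architecture defined earlier (3x3 kernels, 10 filters, stride 1, 'valid' padding, 2x2 max pooling); the helper names are made up for illustration.

```python
def conv_valid(size, kernel=3):
    """Output size of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size, channels, total_params = 224, 3, 0
for layer in ["conv", "conv", "pool", "conv", "conv", "pool"]:
    if layer == "conv":
        total_params += conv_params(3, channels, 10)
        size, channels = conv_valid(size), 10
    else:
        size //= 2  # 2x2 max pooling halves height and width
    print(f"{layer}: ({size}, {size}, {channels})")

flattened = size * size * channels
total_params += flattened + 1  # Dense(1): one weight per input plus a bias
print(f"flatten: {flattened}, total parameters: {total_params}")
```

The printed shapes and total can be compared line by line against the output of `model_1.summary()`.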


**Binary classification: Let's break it down**

- Become one with the data (visualize, visualize, visualize...)
- Preprocess the data (prepare it for a model)
    - A batch is a small subset of the dataset a model looks at during training. 
        For example, rather than looking at 10,000 images at one time and trying to figure out the patterns, a model might only look at 32 images at a time.
        It does this for a couple of reasons:
        - 10,000 images (or more) might not fit into the memory of your processor (GPU).
        - Trying to learn the patterns in 10,000 images in one hit could result in the model not being able to learn very well.
        - Why 32? A batch size of 32 is good for your health. There are many different batch sizes you could use, but 32 has proven to be very effective in many different use cases and is often the default for many data preprocessing functions.
        - The ImageDataGenerator class helps us prepare our images into batches as well as perform transformations on them as they get loaded into the model. You might've noticed the rescale parameter. This is one example of the transformations we're doing.
- Create a model (start with a baseline)
    - And it follows the typical CNN structure of:
        
      Input -> Conv + ReLU layers (non-linearities) -> Pooling layer -> Fully connected (dense layer) as Output
    - Let's discuss some of the components of the Conv2D layer:
      - The "2D" means each filter slides across two spatial dimensions (height and width). The images still have 3 colour channels: each filter spans all of the input channels and sums across them to produce a single output feature map.
      - filters - the number of "feature extractors" that will be moving over our images.
      - kernel_size - the size of our filters, for example, a kernel_size of (3, 3) (or just 3) will mean each filter will have the size 3x3, meaning it will look at a space of 3x3 pixels each time. The smaller the kernel, the more fine-grained features it will extract.
      - strides - the number of pixels a filter will move across as it covers the image. A stride of 1 means the filter moves across each pixel 1 by 1. A stride of 2 means it moves 2 pixels at a time.
      - padding - this can be either 'same' or 'valid'. 'same' adds zeros to the outside of the image so that (with a stride of 1) the output of the convolutional layer has the same height and width as the input, whereas 'valid' (the default) adds no padding, so the filter only visits positions where it fits entirely inside the image and the output shrinks (e.g. a 224-pixel-wide input with a kernel size of 3 and a stride of 1 gives a 222-pixel-wide output).
   - Since we're working on a binary classification problem (pizza vs. steak), the loss function we're using is 'binary_crossentropy'. If it was multi-class, we might use something like 'categorical_crossentropy'.
   - Adam with all the default settings is our optimizer and our evaluation metric is accuracy.
- Fit the model
- Evaluate the model
   - When a model's validation loss starts to increase, it's likely that it's overfitting the training dataset. This means, it's learning the patterns in the training dataset too well and thus its ability to generalize to unseen data will be diminished.
- Adjust different parameters and improve model (try to beat your baseline)
   - Fitting a machine learning model comes in 3 steps:
       - Create a baseline.
       - Beat the baseline by overfitting a larger model.
       - Reduce overfitting.
   - And there are even a few more things we could try to further overfit our model:
       - Increase the number of convolutional layers.
       - Increase the number of convolutional filters.
       - Add another dense layer to the output of our flattened layer.
- Repeat until satisfied
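
For the padding and stride options of Conv2D listed above, the output size along one dimension follows standard convolution arithmetic. Here's a minimal sketch, with `conv_output_size` as a hypothetical helper written for illustration rather than anything provided by TensorFlow:

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # Zeros are added around the input, so only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the filter must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


for padding in ("valid", "same"):
    for stride in (1, 2):
        out = conv_output_size(224, 3, stride=stride, padding=padding)
        print(f"padding={padding!r}, stride={stride}: 224 -> {out}")
```

With a 3x3 kernel and a stride of 1, 'valid' padding trims one pixel from each side, which is why stacked Conv2D layers slowly shrink the feature maps.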


Let's build 2 models:
 - A convnet with max pooling
 - A convnet with max pooling and data augmentation
 
For the first model, we'll follow the modified basic CNN structure:
 - Input -> Conv layers + ReLU layers (non-linearities) + Max Pooling layers -> Fully connected (dense layer) as Output

In [9]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dense,Flatten

In [10]:
model_2 = Sequential()
model_2.add(Conv2D(10,(3,3),activation='relu',input_shape=(224,224,3)))
model_2.add(MaxPool2D((2,2))) # halve the height and width of the feature maps
model_2.add(Conv2D(10,(3,3),activation="relu"))
model_2.add(MaxPool2D((2,2)))
model_2.add(Conv2D(10,(3,3),activation="relu"))
model_2.add(MaxPool2D((2,2)))
model_2.add(Flatten())
model_2.add(Dense(1,activation ="sigmoid"))
model_2.compile(loss ='binary_crossentropy',
               optimizer = tf.keras.optimizers.Adam(),
               metrics = ['accuracy'])
hist_2 =model_2.fit(train_data,
                   epochs = 5,
                   steps_per_epoch = len(train_data),
                   validation_data = valid_data,
                   validation_steps = len(valid_data))

In [11]:
model_2.summary()

In [12]:
def plot_loss_curves(history):
    """
    Plots separate loss and accuracy curves for a Keras History object.
    """
    loss = history.history['loss']
    val_loss =history.history['val_loss']
    
    acc  = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    
    epochs= range(len(history.history['loss']))
    
    plt.plot(epochs,loss,label='training_loss')
    plt.plot(epochs,val_loss,label='val_loss')
    plt.title('loss')
    plt.xlabel('Epochs')
    plt.legend()
    
    plt.figure()
    plt.plot(epochs,acc,label ='training accuracy')
    plt.plot(epochs,val_acc,label='val_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();

In [13]:
plot_loss_curves(hist_2)

The validation loss looks to start increasing towards the end, which suggests the model may be starting to overfit. Time to dig into our bag of tricks and try another method of overfitting prevention: data augmentation. To implement data augmentation, we'll have to reinstantiate our ImageDataGenerator instances.

In [14]:
train_datagen_augmented = ImageDataGenerator(rescale =1.0/255,
                                            rotation_range =20,
                                            shear_range = 0.2,
                                            zoom_range =0.2,
                                            width_shift_range =0.2,
                                            height_shift_range =0.2,
                                            horizontal_flip =True)
test_datagen = ImageDataGenerator(rescale=1.0/255)

In [15]:
train_data_aug = train_datagen_augmented.flow_from_directory(train_dir,
                                                            target_size =(224,224),
                                                             batch_size = 32,
                                                             class_mode = 'binary',
                                                             shuffle = False)
test_data = test_datagen.flow_from_directory(test_dir,
                                            target_size=(224,224),
                                            batch_size =32,
                                            class_mode ='binary')

In [16]:
model_3 = Sequential([
    Conv2D(10,3,activation ='relu',input_shape=(224,224,3)),
    MaxPool2D(pool_size=2),
    Conv2D(10,3,activation='relu'),
    MaxPool2D(),
    Conv2D(10,3,activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(1,activation='sigmoid')
])
model_3.compile(loss ='binary_crossentropy',
               optimizer =tf.keras.optimizers.Adam(),
               metrics =['accuracy'])
hist_3 = model_3.fit(train_data_aug,
                    epochs =5,
                    steps_per_epoch =len(train_data_aug),
                    validation_data = valid_data,
                    validation_steps = len(valid_data))

When we created train_data_aug, we turned off data shuffling using shuffle=False, which means almost every batch the model sees contains images of only a single class.

For example, the pizza class gets loaded in first because it's the first class. Thus the model's training performance on each batch is measured on only a single class rather than both classes. The validation data performance improves steadily because it contains shuffled data.

Since we only set shuffle=False for demonstration purposes (so we could plot the same augmented and non-augmented image), we can fix this by setting shuffle=True on future data generators.
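
The effect of shuffling on batch contents can be sketched with labels alone. The 750 + 750 split below is a hypothetical dataset size chosen for illustration, with labels ordered one class after the other to mimic reading files folder by folder.

```python
import numpy as np

# Hypothetical dataset: 750 images of class 0 followed by 750 of class 1.
labels = np.array([0] * 750 + [1] * 750)
batch_size = 32


def count_mixed_batches(batch_labels, batch_size):
    """Return (number of batches containing more than one class, total batches)."""
    batches = [batch_labels[i:i + batch_size]
               for i in range(0, len(batch_labels), batch_size)]
    mixed = sum(len(np.unique(batch)) > 1 for batch in batches)
    return int(mixed), len(batches)


unshuffled_mixed, n_batches = count_mixed_batches(labels, batch_size)

rng = np.random.default_rng(42)
shuffled_mixed, _ = count_mixed_batches(rng.permutation(labels), batch_size)

print(f"{n_batches} batches in total")
print(f"mixed batches without shuffling: {unshuffled_mixed}")
print(f"mixed batches with shuffling:    {shuffled_mixed}")
```

Without shuffling, only the batch that straddles the class boundary contains both classes, so most gradient updates come from a single class.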

ImageDataGenerator instance augments the data as it's loaded into the model. The benefit of this is that it leaves the original images unchanged. The downside is that it takes longer to load them in.
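
The "augment while loading" idea can be illustrated with a small NumPy generator. This is a simplified sketch that uses a random left-right flip only; it is not how ImageDataGenerator is implemented, and `random_flip_batches` is a made-up name.

```python
import numpy as np


def random_flip_batches(images, batch_size, rng):
    """Yield batches in which each image is flipped left-right with probability 0.5.

    A new array is produced for every batch; the input array is never modified.
    """
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size].copy()
        flip = rng.random(len(batch)) < 0.5
        batch[flip] = batch[flip][:, :, ::-1, :]  # reverse the width axis
        yield batch


rng = np.random.default_rng(0)
originals = rng.random((8, 4, 4, 3))  # 8 stand-in images, 4x4 pixels, RGB
snapshot = originals.copy()

augmented = np.concatenate(list(random_flip_batches(originals, 4, rng)))

print("originals untouched:", np.array_equal(originals, snapshot))
print("augmented shape:", augmented.shape)
```

Because the transformation is recomputed on every pass, each epoch can see a slightly different version of the same image, at the cost of extra work at load time.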

>One possible method to speed up dataset manipulation would be to look into TensorFlow's parallel reads and buffered prefetching options.

In [17]:
plot_loss_curves(hist_3)

In [18]:
train_data_aug = train_datagen_augmented.flow_from_directory(train_dir,
                                                            target_size =(224,224),
                                                             batch_size = 32,
                                                             class_mode = 'binary',
                                                             shuffle = True)
test_data = test_datagen.flow_from_directory(test_dir,
                                            target_size=(224,224),
                                            batch_size =32,
                                            class_mode ='binary')

In [19]:
model_4 = Sequential([
    Conv2D(10,3,activation ='relu',input_shape=(224,224,3)),
    MaxPool2D(pool_size=2),
    Conv2D(10,3,activation='relu'),
    MaxPool2D(),
    Conv2D(10,3,activation='relu'),
    MaxPool2D(),
    Flatten(),
    Dense(1,activation='sigmoid')
])
model_4.compile(loss ='binary_crossentropy',
               optimizer =tf.keras.optimizers.Adam(),
               metrics =['accuracy'])
hist_4 = model_4.fit(train_data_aug,
                    epochs =5,
                    steps_per_epoch =len(train_data_aug),
                    validation_data = valid_data,
                    validation_steps = len(valid_data))

In [20]:
plot_loss_curves(hist_4)

#### Repeat until satisfied
We've trained a few models on our dataset already and so far they're performing pretty well. We could try to continue to improve our model:
- Increase the number of model layers (e.g. add more convolutional layers).
- Increase the number of filters in each convolutional layer (e.g. from 10 to 32, 64, or 128, these numbers aren't set in stone either, they are usually found through trial and error).
- Train for longer (more epochs).
- Find an ideal learning rate.
- Get more data (give the model more opportunities to learn).
- Use transfer learning to leverage what another image model has learned and adjust it for our own use case.

Adjusting each of these settings (except for the last two) during model development is usually referred to as hyperparameter tuning.

You can think of hyperparameter tuning as similar to adjusting the settings on your oven to cook your favourite dish. Although your oven does most of the cooking for you, you can help it by tweaking the dials.

In [21]:
model_5 = Sequential([
    Conv2D(32,(3,3),activation='relu',input_shape=(224,224,3)),
#     Conv2D(64,(3,3),activation='relu'),
    MaxPool2D((2,2)),
#     Conv2D(32,(3,3),activation='relu'),
#     Conv2D(32,(3,3),activation='relu'),
#     MaxPool2D((2,2)),
    Conv2D(16,(3,3),activation='relu'),
    MaxPool2D((2,2)),
    Flatten(),
#     Dense(32,activation= 'relu'),
    Dense(1,activation ='sigmoid')
])

model_5.compile(loss ='binary_crossentropy',
               optimizer = tf.keras.optimizers.Adam(learning_rate =0.001),
               metrics =['accuracy'])
hist_5 = model_5.fit(train_data_aug,
                    epochs =14,
                    steps_per_epoch =len(train_data_aug),
                    validation_data=test_data,
                    validation_steps = len(test_data))

In [22]:
# View our example image
import matplotlib.image as mpimg 
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg 
steak = mpimg.imread("03-steak.jpeg")
plt.imshow(steak)
plt.axis(False);

In [23]:
steak.shape

In [24]:
# Since our model takes in images of shapes (224, 224, 3), we've got to reshape our custom image to use it with our model.
def pre_proc(filename,img_shape=224):
    """
    Reads an image from a file, decodes it into a tensor, resizes it to (img_shape, img_shape) and scales the pixel values to [0, 1]."""
    img = tf.io.read_file(filename)
    img = tf.image.decode_image(img,channels=3)
    img = tf.image.resize(img,size =[img_shape,img_shape])
    img =img/255.
    return img

In [25]:
steak = pre_proc("../working/03-steak.jpeg")
steak.shape

In [26]:
model_5.predict(steak)


There's one more problem...

Although our image is in the same shape as the images our model has been trained on, we're still missing a dimension.

Remember how our model was trained in batches?

Well, the batch size becomes the first dimension.

So in reality, our model was trained on data in the shape of (batch_size, 224, 224, 3).

We can fix this by adding an extra dimension to our custom image tensor using tf.expand_dims.

In [28]:
#  Add an extra axis
print(f"Shape before new dimension: {steak.shape}")
steak = tf.expand_dims(steak, axis=0) # add an extra dimension at axis 0
#steak = steak[tf.newaxis, ...] # alternative to the above, '...' is short for 'every other dimension'
print(f"Shape after new dimension: {steak.shape}")
steak


In [29]:
pred = model_5.predict(steak)
pred

Ahh, the predictions come out in prediction probability form. In other words, this means how likely the image is to be one class or another.

Since we're working with a binary classification problem, if the prediction probability is over 0.5, according to the model, the prediction is most likely to be the positive class (class 1).

And if the prediction probability is under 0.5, according to the model, the predicted class is most likely to be the negative class (class 0).
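
That thresholding rule can be written out explicitly. Here's a minimal sketch, assuming the model returns an array of shape (1, 1) and that class index 0 is pizza and index 1 is steak (flow_from_directory assigns indices in alphabetical folder order); `binary_pred_to_class` and the probabilities are made up for illustration.

```python
import numpy as np


def binary_pred_to_class(pred, class_names, threshold=0.5):
    """Map a sigmoid output of shape (1, 1) to a class name."""
    prob = float(np.asarray(pred).reshape(-1)[0])
    return class_names[int(prob > threshold)]


demo_classes = ["pizza", "steak"]  # index 0 = negative class, index 1 = positive class
print(binary_pred_to_class(np.array([[0.87]]), demo_classes))
print(binary_pred_to_class(np.array([[0.12]]), demo_classes))
```

An explicit threshold also makes it easy to experiment with cut-offs other than 0.5.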

In [30]:
class_names =['pizza','steak']

In [31]:
pred_class = class_names[int(tf.round(pred)[0][0])]
pred_class

In [32]:
def pred_and_plot(model,filename,class_names):
    """imports an image located at filename, makes a prediction on it  with a trained model
    and plots the image with predicted class as the title"""
    img = pre_proc(filename)
    pred = model.predict(tf.expand_dims(img,axis =0))
    pred_class = class_names[int(tf.round(pred)[0][0])]
    plt.imshow(img)
    plt.title(f"prediction: {pred_class}")
    plt.axis(False)

In [33]:
pred_and_plot(model_5,"./03-steak.jpeg",class_names)

In [34]:

# Download another test image and make a prediction on it
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg 
pred_and_plot(model_5, "./03-pizza-dad.jpeg", class_names)


## Multi-class Classification
### Import and become one with the data


In [35]:

import zipfile

# Download zip file of 10_food_classes images
# See how this data was created - https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/extras/image_data_modification.ipynb
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip 

# Unzip the downloaded file
zip_ref = zipfile.ZipFile("10_food_classes_all_data.zip", "r")
zip_ref.extractall()
zip_ref.close() 

In [36]:
import os
# Walk through 10_food_classes directory and list number of files
for dirpath, dirnames, filenames in os.walk("10_food_classes_all_data"):
  print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

In [37]:
train_dir = "10_food_classes_all_data/train/"
test_dir = "10_food_classes_all_data/test/"
import pathlib
import numpy as np
data_dir = pathlib.Path(train_dir)
class_names = np.array(sorted([item.name  for item in data_dir.glob('*')]))
class_names

In [38]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen_aug = ImageDataGenerator(rescale =1/255.,
                                   shear_range =0.2,
                                   zoom_range = 0.2,
                                   rotation_range =20,
                                   width_shift_range=0.2,
                                   height_shift_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale=1/255.)

train_data = train_datagen_aug.flow_from_directory(train_dir,
                                                  target_size=(224,224),
                                                  batch_size =32,
                                                  class_mode ='categorical')
test_data = test_datagen.flow_from_directory(test_dir,
                                            target_size=(224,224),
                                             batch_size =32,
                                             class_mode ='categorical')
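
With class_mode set to 'categorical', the generators yield one-hot encoded labels, which is the label format the categorical cross-entropy loss works with. Here's a minimal NumPy sketch of both ideas, using a made-up 4-class example rather than the 10 food classes:

```python
import numpy as np


def one_hot(index, num_classes):
    """Encode a class index as a one-hot vector."""
    vec = np.zeros(num_classes)
    vec[index] = 1.0
    return vec


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)))


label = one_hot(2, num_classes=4)  # hypothetical 4-class example
confident = np.array([0.05, 0.05, 0.85, 0.05])
unsure = np.array([0.25, 0.25, 0.25, 0.25])

loss_confident = categorical_crossentropy(label, confident)
loss_unsure = categorical_crossentropy(label, unsure)

print(label)
print(loss_confident, loss_unsure)
```

The loss only looks at the probability assigned to the true class, so a confident correct prediction scores lower than a uniform guess.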

In [43]:
model_6 = Sequential([
    Conv2D(16,(5,5),activation='relu',input_shape=(224,224,3)),
#     Conv2D(64,(5,5),activation='relu'),
#     Conv2D(64,(3,3),activation='relu'),
    MaxPool2D((2,2)),
#     Conv2D(64,(3,3),activation='relu'),
#     Conv2D(64,(3,3),activation='relu'),
    Conv2D(8,(3,3),activation='relu'),
    MaxPool2D(),
    Flatten(),
#     Dense(10,activation='relu'),
    Dense(10,activation='softmax')
])

model_6.compile(loss='categorical_crossentropy',
               optimizer = tf.keras.optimizers.Adam(),
               metrics=['accuracy'])
hist_6 = model_6.fit(train_data,
                    epochs =10,
                    steps_per_epoch= len(train_data),
                    validation_data = test_data,
                    validation_steps = len(test_data))

Now we've got augmented data, let's see how it works with the same model as before (model_10).

Rather than rewrite the model from scratch, we can clone it using a handy function in TensorFlow called clone_model which can take an existing model and rebuild it in the same format.

The cloned version will not include any of the weights (patterns) the original model has learned. So when we train it, it'll be like training a model from scratch.
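
Here's a minimal sketch of cloning with `tf.keras.models.clone_model`. The tiny dense model is a stand-in chosen for illustration (an assumption, not one of the notebook's CNNs); the same call should work on any Keras model.

```python
import numpy as np
import tensorflow as tf

# Small stand-in model; any Keras model could be cloned the same way.
original = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Same architecture, freshly initialized weights.
clone = tf.keras.models.clone_model(original)

# The clone does not carry over the compile settings, so compile it before fit().
clone.compile(loss="binary_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

same_shapes = all(a.shape == b.shape
                  for a, b in zip(original.get_weights(), clone.get_weights()))
same_values = all(np.array_equal(a, b)
                  for a, b in zip(original.get_weights(), clone.get_weights()))
print("same weight shapes:", same_shapes)
print("same weight values:", same_values)
```

The weight shapes match because the architecture is identical, while the values are re-initialized rather than copied.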

🔑 Note: One of the key practices in deep learning and machine learning in general is to be a serial experimenter. That's what we're doing here. Trying something, seeing if it works, then trying something else. A good experiment setup also keeps track of the things you change, for example, that's why we're using the same model as before but with different data. The model stays the same but the data changes, this will let us know if augmented training data has any influence over performance.

In [44]:
# -q is for "quiet"
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-pizza-dad.jpeg
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-hamburger.jpeg
!wget -q https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-sushi.jpeg

In [50]:
# Adjust function to work with multi-class
def pred_and_plot(model, filename, class_names):
    """
    Imports an image located at filename, makes a prediction on it with
    a trained model and plots the image with the predicted class as the title.
    """
    img = pre_proc(filename)
    pred = model.predict(tf.expand_dims(img, axis=0))
    if len(pred[0]) > 1: # check for multi-class
        pred_class = class_names[pred.argmax()] # if more than one output, take the max
    else:
        pred_class = class_names[int(tf.round(pred)[0][0])] # if only one output, round
    plt.imshow(img)
    plt.title(f"Prediction: {pred_class}")
    plt.axis(False);

In [51]:
pred_and_plot(model_6, "03-pizza-dad.jpeg", class_names)

In [53]:
pred_and_plot(model_6, "03-sushi.jpeg", class_names)

In [55]:
pred_and_plot(model_6, "03-hamburger.jpeg", class_names)

In [56]:
pred_and_plot(model_6, "03-steak.jpeg", class_names)