# Apple or Cherry?

In this notebook we will solve an image classification problem: our goal is to tell which class an input image belongs to. We will train a Convolutional Neural Network (CNN) on roughly a thousand images of apples, cherries and kiwis, so that the network learns to predict which of these fruits appears in an image it has not seen before.

## Part 1 - Building the CNN

First, let us import the Keras packages we need to build our CNN.

In [1]:
# importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout

Using TensorFlow backend.


- In line 1, we import Sequential from keras.models to initialise our neural network model as a sequential network.
- In line 2, we import Conv2D from keras.layers to perform the convolution operation, i.e. the first step of a CNN.
- In line 3, we import MaxPooling2D from keras.layers, which is used for the pooling operation.
- In line 4, we import Flatten from keras.layers, which is used for flattening: converting all the resulting 2-dimensional arrays into a single long continuous vector.
- In line 5, we import Dense from keras.layers, which is used to build the fully connected layers of the network.
- Line 6 (Dropout) will be explained later.

We start by creating an object of the Sequential class.

In [2]:
# initialising the CNN
model = Sequential()

### Step 1 - Convolution

Let us now code the convolution step. You will be surprised how easy it is to implement such a complex operation in a single line of Python, thanks to Keras.

In [3]:
model.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))

We took the model object and added a convolution layer using *Conv2D*. Conv2D takes four arguments here. The first is the number of filters, 32. The second is the shape of each filter, 3x3. The third is the input shape: each input image has a resolution of 64x64 pixels, and the *3* is the number of colour channels (RGB; a black-and-white image would have 1). The fourth is the activation function; *relu* stands for the rectifier function.
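
To make the arguments above more concrete, here is a minimal NumPy sketch of what a single 3x3 filter does to one 64x64 RGB image. It is an illustration only, not how Keras implements the layer: the function name `conv2d_single_filter`, the random image and the random filter weights are assumptions, and the bias term that Keras adds is left out.

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Unpadded, stride-1 convolution of one filter over an image, followed by ReLU."""
    k_h, k_w, _ = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the 3x3x3 patch element-wise with the filter and sum
            patch = image[i:i + k_h, j:j + k_w, :]
            out[i, j] = np.sum(patch * kernel)
    return np.maximum(out, 0.0)  # relu: negative values become 0

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))          # stand-in for one 64x64 RGB image
kernel = rng.standard_normal((3, 3, 3))  # stand-in for one learned filter
feature_map = conv2d_single_filter(image, kernel)
print(feature_map.shape)  # (62, 62)
```

The layer in the notebook applies 32 such filters, which is why the model summary further down lists an output shape of (None, 62, 62, 32) for the first convolution.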

### Step 2 - Pooling

Next, we perform a pooling operation on the feature maps produced by the convolution. The primary aim of pooling is to reduce the size of the feature maps while keeping their most important information, which cuts the amount of computation in the following layers. We take our model object and add the pooling layer. We use max pooling on 2x2 windows, which halves the width and height.
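
The pooling operation described above can be sketched in NumPy as follows. This is a simplified illustration for a single 2-dimensional feature map; the helper name `max_pool_2x2` and the small example array are assumptions made for this sketch.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array; an odd last row/column is dropped."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2
    # split the map into 2x2 blocks and keep the largest value of each block
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each 2x2 block is replaced by its maximum, so a 62x62 feature map becomes 31x31 and a 29x29 map becomes 14x14.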

In [4]:
model.add(MaxPooling2D(pool_size = (2, 2)))




Now we add a dropout layer to reduce overfitting. During training it randomly sets a fraction of the values coming out of the previous layer to zero (0.2 means 20% of them are dropped at each update).

In [5]:
model.add(Dropout(0.2))
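
As a rough illustration of what a dropout layer does during training, the sketch below zeroes a random 20% of the values and scales the remaining ones up so that their expected sum stays the same. The function name `dropout` and the example input are assumptions; at prediction time the layer passes values through unchanged.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random fraction `rate` of the values and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
x = np.ones(1000)                      # stand-in for the output of the previous layer
y = dropout(x, rate=0.2, rng=rng)
dropped_fraction = float(np.mean(y == 0))
print(dropped_fraction)                # close to 0.2
```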

### Repeat step 1 and 2

We will add a second convolution, max pooling and dropout layer with the same parameters. Only the first layer of the model needs an input_shape, so we leave it out here.

In [6]:
model.add(Conv2D(32, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Dropout(0.2))

The exact number of pooling layers you should use will vary depending on the task you are doing, and it's something you'll get a feel for over time. 

### Step 3 - Flattening & Full connection

It is now time to convert all the pooled feature maps into one continuous vector through flattening. After that we create a fully connected layer; the nodes produced by the flattening step act as the input layer of this fully connected layer.

In [7]:
model.add(Flatten())
model.add(Dense(activation="relu", units=128))

As you can see, Dense adds a fully connected layer. *units* defines the number of nodes in this hidden layer. A common rule of thumb is to pick a value between the number of input nodes and the number of output nodes, but the best number can only be found by experimenting. It is also common practice to use a power of 2. The activation function is again the rectifier function.

### Output Layer

Now it is time to add our output layer. Since we are classifying three kinds of fruit, it contains three nodes, one per class. We use a softmax activation function for this final layer, so the three outputs are probabilities that sum to 1.

In [8]:
model.add(Dense(3, activation="softmax"))
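
The softmax activation used in this output layer turns three raw scores into three probabilities. A minimal NumPy sketch, with made-up scores chosen only for illustration:

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the maximum avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for apple, cherry, kiwi
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score gets the largest probability, so the predicted class is simply the position of the maximum.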

### Compiling the model

Now that we have completed building our CNN model, it’s time to compile it.

In [9]:
# compiling the CNN
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
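
The loss named in the compile step compares the predicted probabilities with integer class labels (0, 1 or 2). The sketch below shows the idea in NumPy; it is a simplified stand-in for the Keras loss, and the probabilities and labels are invented for the example.

```python
import numpy as np

def sparse_categorical_crossentropy(probabilities, labels):
    """Mean negative log-probability that the model assigns to the correct class."""
    probabilities = np.asarray(probabilities, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # pick, for every sample, the probability predicted for its true class
    picked = probabilities[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(picked, 1e-7, 1.0))))

probabilities = [[0.7, 0.2, 0.1],   # sample 1: fairly sure of class 0
                 [0.1, 0.1, 0.8]]   # sample 2: fairly sure of class 2
labels = [0, 2]                     # integer labels, not one-hot vectors
loss = sparse_categorical_crossentropy(probabilities, labels)
print(round(loss, 4))
```

The more probability the model puts on the correct class, the closer the loss gets to 0.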

We can print out the model summary to see what the whole model looks like.

In [10]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 62, 62, 32)        896       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 31, 31, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 31, 31, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 29, 29, 32)        9248      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 14, 14, 32)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)             

In total there are about 813,000 parameters that need to be trained, almost all of them in the first fully connected layer!
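
The parameter counts can be checked by hand. The sketch below recomputes them from the layer definitions used earlier; the helper functions and the names `dense_1` and `dense_2` for the two fully connected layers are assumptions, since the summary shown above stops at the flatten layer.

```python
def conv_params(kernel, in_channels, filters):
    """Weights of a kernel x kernel filter over all input channels, plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units

flattened = 14 * 14 * 32  # size of the vector produced by Flatten: 6272

counts = {
    "conv2d_1": conv_params(3, 3, 32),        # 896
    "conv2d_2": conv_params(3, 32, 32),       # 9248
    "dense_1": dense_params(flattened, 128),  # hidden layer with 128 units
    "dense_2": dense_params(128, 3),          # output layer with 3 units
}
total = sum(counts.values())
print(counts, total)
```

Pooling, dropout and flatten layers have no trainable parameters, which is why they show 0 in the summary.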

## Part 2 - Fitting the CNN to the images

### Step 1 - Data Augmentation

To train a network, you need a lot of data. Suppose we only have a limited number of images. What can we do?

You don't need to hunt for new images to add to your dataset. Why? Because neural networks aren't smart to begin with. For instance, a poorly trained neural network would think that the three tennis balls shown below are distinct, unique images.

<img src="./resources/tennis.jpeg"  style="height: 150px"/>

So, to get more data, we just need to make minor alterations to our existing dataset.

In [11]:
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

# class_mode = 'sparse' yields integer labels (0, 1, 2), matching the three classes
training_set = train_datagen.flow_from_directory('datasets/colruyt/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'sparse')

test_set = test_datagen.flow_from_directory('datasets/colruyt/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'sparse')

Found 1111 images belonging to 3 classes.
Found 272 images belonging to 3 classes.


If you want, you can find a complete explanation of each of the above parameters in the Keras documentation. What you need to understand is that we are creating synthetic data out of the same images by applying random operations to them, here shearing, zooming and horizontal flipping. The rescale parameter only scales the pixel values from 0-255 to 0-1; it is applied to the test images as well.

One important parameter is target_size, which is 64x64, the same width and height as the input_shape of the first layer.
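
To give a feel for what such a generator does to each image, here is a small NumPy sketch of two of the operations, rescaling and horizontal flipping. It is an illustration under simplifying assumptions, not the Keras implementation: the function names and the random stand-in image are made up, and shearing and zooming are left out.

```python
import numpy as np

def rescale(image, factor=1.0 / 255):
    """Scale integer pixel values (0-255) to floats between 0 and 1."""
    return image.astype(np.float32) * factor

def horizontal_flip(image):
    """Mirror a (height, width, channels) image left to right."""
    return image[:, ::-1, :]

def augment(image, rng):
    """Rescale, then flip with probability 0.5, giving a new variant of the same image."""
    out = rescale(image)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in for a loaded photo
variant = augment(image, rng)
print(variant.shape, variant.dtype)
```

Because the flip is decided at random every time, the network rarely sees exactly the same picture twice, even though the underlying dataset is unchanged.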

### Step 2 - Training our network

Now let's fit the data to our model! Training the network might take a while, so meanwhile you can read on.

In [12]:
# was 100, 100 (by Rik)
# Epoch 100/100
# 100/100 [==============================] - 37s 368ms/step - loss: 0.3335 - accuracy: 0.8519 - val_loss: 0.6576 - val_accuracy: 0.7655

model.fit_generator(training_set,
                    steps_per_epoch = 100,
                    epochs = 20,
                    validation_data = test_set)




Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.callbacks.History at 0x2790a318eb8>

In the above code, an __epoch__ is one round of training, and __steps_per_epoch__ holds the number of batches the generator supplies during each epoch. With a batch size of 32, 100 steps means the network sees 3,200 augmented images per epoch, and we train the network for 20 epochs. Since the training set has only 1111 images, the same images are reused in freshly augmented form within an epoch. This is not much training for a good model, but otherwise it would take too long. You can try to modify these parameters yourself later (a larger value such as 1000 steps per epoch might be better).
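
The numbers in the training call can be worked out with a few lines of arithmetic. The figures below come from the cells above (1111 training images, batch size 32, 100 steps per epoch, 20 epochs); the variable names are only for illustration.

```python
import math

num_images = 1111      # "Found 1111 images belonging to 3 classes."
batch_size = 32
steps_per_epoch = 100
epochs = 20

# batches needed to see every training image once
batches_per_pass = math.ceil(num_images / batch_size)

# images (augmented variants) fed to the network in one epoch and in total
images_per_epoch = steps_per_epoch * batch_size
total_images_seen = images_per_epoch * epochs

print(batches_per_pass, images_per_epoch, total_images_seen)  # 35 3200 64000
```

So one epoch of 100 steps corresponds to nearly three passes over the training folder, each time with different random augmentations.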

### How to interpret loss and accuracy?

- The loss value indicates how well or poorly a model performs after each iteration of optimization. Ideally, the loss decreases after each epoch, or after every few epochs. So the lower the loss, the better the model (unless the model has overfitted to the training data).

- The loss is calculated on the training data (loss) and the test data (val_loss), and it tells you how well the model is doing on these two sets. Unlike accuracy, loss is not a percentage. It is computed from the errors made on each example in the training or validation set (Keras reports the average).

- __The loss value should generally get lower after each epoch.__


- The accuracy is also calculated for the training data (accuracy) and the test data (val_accuracy). The accuracy on the test data is determined with the model parameters fixed, so no learning takes place while it is measured.

- After each epoch, the test samples are fed to the model and the number of mistakes (zero-one loss) is recorded by comparing the predictions to the true targets. From this the percentage of correctly classified samples, the accuracy, is calculated.

- __If the model has a good accuracy but a bad val_accuracy, it performs much better on the training data than on the test data: it is overfitted.__
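
As a small worked example of the points above, the sketch below computes an accuracy from predicted probabilities and then looks at the gap between training and validation accuracy. The four predictions are invented; the two accuracy values are the ones quoted in the comment of the training cell (0.8519 and 0.7655 after 100 epochs).

```python
import numpy as np

def accuracy(probabilities, labels):
    """Fraction of samples whose most probable class equals the true label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == np.asarray(labels)))

probabilities = np.array([[0.8, 0.1, 0.1],
                          [0.2, 0.7, 0.1],
                          [0.3, 0.3, 0.4],
                          [0.6, 0.3, 0.1]])
labels = [0, 1, 2, 1]          # the last sample is misclassified
acc = accuracy(probabilities, labels)
print(acc)                     # 0.75

# gap between training and validation accuracy from the earlier 100-epoch run
train_accuracy, val_accuracy = 0.8519, 0.7655
gap = train_accuracy - val_accuracy
print(round(gap, 4))           # 0.0864
```

A gap of this size, together with a val_loss well above the training loss, is a hint that the model has started to overfit.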

### Saving and loading the weights

Since it might take a while to train the model, you can save the calculated weights after completing the fit and load them the next time you want to use the model.

```python
# save weights
model.save_weights('saved_models/modelallfruits.h5')
# load weights
model.load_weights('saved_models/modelallfruits.h5')
```

You can load the weights of a model I've trained with an accuracy of 85%.

In [13]:
model.save_weights('saved_models/modelallfruits.h5')

## Part 3 - Making new predictions from our trained model

Now let's test some random images. In the `single_images` folder you will find some images of fruit to test your model.

In [36]:
import numpy as np
from keras.preprocessing import image

test_image = image.load_img("datasets/colruyt/single_images/img05.jpg", target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)
result = model.predict(test_image)

print(result[0])
print(training_set.class_indices)

# map the most probable class index back to its label
labels = {index: name for name, index in training_set.class_indices.items()}
prediction = labels[int(np.argmax(result[0]))]
print(prediction)

[0. 0. 1.]
{'apple': 0, 'cherry': 1, 'kiwi': 2}
kiwi


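The test_image variable holds the image that is tested on the CNN. We prepare the image by loading it at a resolution of 64x64, as the model only accepts that resolution, and by adding a batch dimension with expand_dims. Then we call the predict() method on our model to get the prediction. The prediction is a vector of three probabilities, one per class, in the order given by training_set.class_indices; the class with the highest probability is the predicted fruit. Note that the training generator rescaled pixel values by 1/255, whereas this cell feeds raw 0-255 values; dividing the array by 255 before calling predict() keeps the input consistent with training.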
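
The preparation and decoding steps can be sketched without the model itself. In the example below, `prepare_image` and `decode_prediction` are hypothetical helper names, the all-white array stands in for a loaded 64x64 picture, and the probability vector is the one printed above.

```python
import numpy as np

def prepare_image(pixels):
    """Scale raw 0-255 pixel values to 0-1 and add a batch dimension."""
    scaled = np.asarray(pixels, dtype=np.float32) / 255.0
    return np.expand_dims(scaled, axis=0)

def decode_prediction(probabilities, class_indices):
    """Return the label whose index has the highest predicted probability."""
    index_to_label = {index: label for label, index in class_indices.items()}
    return index_to_label[int(np.argmax(probabilities))]

class_indices = {'apple': 0, 'cherry': 1, 'kiwi': 2}   # as printed by training_set.class_indices

batch = prepare_image(np.full((64, 64, 3), 255))        # stand-in for img_to_array(test_image)
print(batch.shape)                                      # (1, 64, 64, 3)

label = decode_prediction([0.0, 0.0, 1.0], class_indices)
print(label)                                            # kiwi
```

In the notebook, `batch` would be passed to model.predict() and the first row of the result to `decode_prediction`.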

It is not 100% accurate, but it will give correct predictions most of the time. Try adding more convolutional and pooling layers, play with the number of nodes and epochs, and you might get a more accurate result.

Maybe you have an apple, a cherry or a kiwi at home? Take a picture of it and see if the model predicts the right fruit. You can even try it with a picture of yourself and see which fruit the model thinks you resemble most.