<a href="https://colab.research.google.com/github/anuva04/ML_Beginners/blob/main/convolutional_neural_network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convolutional Neural Network
- an image is fed in as input, and the output is a label (the predicted class)

## Step-1: Convolution operation
- the image is represented as a 2D array (grayscale) or a 3D array (colour, one 2D array per channel) of pixel values from 0 to 255
- another matrix called the feature detector (also called kernel or filter) is used; it is usually 3x3, but other sizes can be used depending on the use case
- the feature detector is slid over the image, and at each position the overlapping values are multiplied element-wise and summed (the two arrays are convolved)
- the result is called a feature map (or activation map)
- without padding, this process reduces the size of the image
- this process of course loses some information, but it makes some features more distinguishable and hence easier to detect
- high values in the feature map mark the positions where the image matches the pattern encoded in the feature detector
- many convolutions are performed with different feature detectors, giving one feature map per detector
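
To make the sliding-window idea concrete, here is a minimal NumPy sketch, assuming no padding and a stride of 1. The function name `convolve2d_valid`, the toy image and the vertical-edge kernel are illustrative choices, not part of the notebook. Like most deep-learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise product of the patch and the kernel, then sum
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 5x5 image: dark on the left (0), bright on the right (1)
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Hypothetical feature detector that responds to dark-to-bright vertical edges
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3): smaller than the 5x5 input
print(feature_map)        # each row is [3. 3. 0.]: strong response where the edge is
```

The 5x5 input shrinks to 3x3, and the large values sit where the window covers the edge, which is the sense in which the feature map shows where the detector's pattern occurs.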

#### Step-1(b): ReLU (rectified linear unit)
- a rectifier function is applied to the feature maps to increase non-linearity: negative values are set to 0 and positive values are kept
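
As a small illustration, the rectifier can be written in one line of NumPy. The `feature_map` values below are made up for the example.

```python
import numpy as np

def relu(x):
    """Rectifier: keep positive values, replace negative values with 0."""
    return np.maximum(x, 0)

# Made-up feature map containing positive and negative responses
feature_map = np.array([[3.0, -1.5],
                        [0.0, -4.0]])
activated = relu(feature_map)
print(activated)
```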

## Step-2: Max-pooling (also called down-sampling)
- when a CNN model learns a particular feature and finds it at a particular position in one image, it would otherwise look for this feature at the exact same position in all images
- since the feature will not be at the same position in every image, such a model would perform poorly
- to avoid this, the model should be spatially invariant
- for max-pooling, take 2x2 groups of cells in the feature map obtained in the previous step and record only the maximum value in each group
- do this for all adjacent (non-overlapping) groups
- through this process, we discard 75% of the values and keep the strongest response in each group
- we also account for some amount of distortion, i.e. whichever of the 4 cells in a group holds the maximum value, it ends up in the same position in the pooled feature map
- hence if the image is shifted, rotated or squashed a bit, we can still detect the feature
- it also helps in preventing over-fitting
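
The 2x2 grouping described above can be sketched in NumPy as follows. The helper name `max_pool_2x2` and the sample values are assumptions for illustration, and any odd trailing row or column is simply dropped.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 group of cells."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop a trailing odd row/column, then view the array as a grid of 2x2 blocks
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

# Made-up 4x4 feature map
feature_map = np.array([
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```

Sixteen values go in and four come out, which is the 75% reduction mentioned in the list.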

## Step-3: Flattening
- the pooled feature maps are put row by row into a single 1D array; this is called flattening
- this is done to feed this data into an ANN later
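
A minimal NumPy sketch of flattening, using made-up pooled values: a single 2x2 map becomes a vector of 4 values, and a stack of two such maps becomes a vector of 8.

```python
import numpy as np

# One made-up pooled feature map
pooled = np.array([[6, 8],
                   [3, 4]])
flat = pooled.flatten()          # read row by row
print(flat)                      # [6 8 3 4]

# Two pooled feature maps stacked together, shape (2, 2, 2)
pooled_maps = np.stack([pooled, pooled * 10])
flat_all = pooled_maps.reshape(-1)
print(flat_all.shape)            # (8,)
```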

## Step-4: Full connection
- this 1D array is fed into a fully connected ANN
- from the outputs obtained from the ANN, the error is calculated using a loss function
- the error is backpropagated
- this error is used to adjust the weights in the ANN and also the feature detectors in the CNN
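
The loop of "compute output, measure error, adjust weights" can be sketched for a single sigmoid neuron. This is a simplified stand-in for the full network, assuming made-up input features, zero initial weights and a learning rate of 0.1; in a real CNN the same gradient is also propagated further back to update the feature detectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, y):
    """Loss for one example with predicted probability p and true label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

x = np.array([0.5, -1.0, 2.0])  # flattened features (made-up values)
y = 1.0                         # true label
w = np.zeros(3)                 # weights of the single output neuron
b = 0.0                         # bias
learning_rate = 0.1

# Forward pass and loss
p = sigmoid(w @ x + b)
loss_before = binary_cross_entropy(p, y)

# Backward pass: for sigmoid + cross-entropy, dLoss/dz simplifies to (p - y)
grad_z = p - y
w = w - learning_rate * grad_z * x
b = b - learning_rate * grad_z

# Loss after one weight update
loss_after = binary_cross_entropy(sigmoid(w @ x + b), y)
print(loss_before, loss_after)
```

After one update the loss on this example is lower, which is all a single gradient step is meant to achieve.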

## Softmax and Cross-Entropy
- the output layer can have any number of neurons (one per class)
- each neuron's raw output can be any real number, and these outputs do not necessarily add up to 1
- to make the values add up to 1, the softmax function is used
- hence all the output values are between 0 and 1 and can be read as class probabilities
- the cross-entropy function is the loss function that is minimised during training
- cross-entropy is a better measure of error than mean squared error (MSE) for classification: in the initial stages of training, the output for the correct class can be very small, and in that region the gradient of MSE is also very small, so gradient descent barely moves. The logarithm in cross-entropy penalises such outputs heavily, so its gradient stays large and training can make progress
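
The following NumPy sketch illustrates both points: softmax turns arbitrary real outputs into values that sum to 1, and for a badly wrong prediction the cross-entropy gradient is much larger than the MSE gradient. The logits are made up, and "MSE" here is taken as the sum of squared errors over the classes, which is an assumption for the example.

```python
import numpy as np

def softmax(logits):
    """Convert raw outputs into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def softmax_jacobian(probs):
    """Derivative of each softmax output with respect to each logit."""
    return np.diag(probs) - np.outer(probs, probs)

# Made-up raw outputs; the true class is index 0, which the model rates very low
logits = np.array([-8.0, 0.0, 0.0])
target = np.array([1.0, 0.0, 0.0])

probs = softmax(logits)
cross_entropy = -np.log(probs[0])

# Gradients of each loss with respect to the logits
ce_grad = probs - target
mse_grad = softmax_jacobian(probs) @ (2.0 * (probs - target))

print(probs, probs.sum())
print(cross_entropy)
print(ce_grad[0], mse_grad[0])
```

The gradient for the correct class is close to -1 under cross-entropy but orders of magnitude smaller under squared error, so a cross-entropy update moves the weights much more in this situation.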


### Importing the libraries

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
tf.__version__

## Part 1 - Data Preprocessing

### Preprocessing the Training set
- some transformations are applied to the training set to avoid overfitting
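- **rescale** applies feature scaling (each pixel value is multiplied by 1/255 so that it lies between 0 and 1), **shear_range** applies a random shear (transvection), **zoom_range** applies a random zoom, and **horizontal_flip** randomly flips images horizontally
- **target_size** is the size to which all images are resized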
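
To show what two of these transformations do to the pixel values, here is a NumPy sketch on a tiny made-up grayscale image. It mirrors the idea of rescaling and horizontal flipping only; it is not how `ImageDataGenerator` is implemented, and the generator applies the flip at random rather than always.

```python
import numpy as np

# Tiny made-up grayscale image with pixel values in 0..255
image = np.array([[0, 128, 255],
                  [64, 32, 16]], dtype=np.uint8)

# Rescaling: multiply by 1/255 so every value lies in [0, 1]
scaled = image * (1.0 / 255)

# Horizontal flip: reverse the order of the columns
flipped = scaled[:, ::-1]

print(scaled)
print(flipped)
```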

In [None]:
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

### Preprocessing the Test set
- the augmentation transformations are of course not applied to the test set
- only feature scaling is applied

In [None]:
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

## Part 2 - Building the CNN

### Initialising the CNN

In [None]:
cnn = tf.keras.models.Sequential()

### Step 1 - Convolution
- since the images have been resized to 64x64, the input shape is [64, 64, 3], where 3 stands for the three channels of an RGB image. If the images were grayscale, this value would be 1
- kernel_size is the size of the feature detector matrix
- filters is the number of feature detectors
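
As a rough check of what these parameters imply, the number of trainable parameters in this first layer can be worked out by hand, assuming the Keras default of one bias per filter. The variable names are for illustration only.

```python
kernel_size = 3   # each feature detector is 3x3
in_channels = 3   # RGB input
filters = 32      # number of feature detectors

# Each filter spans all input channels and has one bias term
weights_per_filter = kernel_size * kernel_size * in_channels
total_params = (weights_per_filter + 1) * filters
print(total_params)  # 896
```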

In [None]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))

### Step 2 - Pooling

In [None]:
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

### Adding a second convolutional layer
- here input_shape is not required as it is connected to the previous layer, not the input layer
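
Because each layer's input shape follows from the previous layer's output, the sizes can be traced by hand. The sketch below assumes the Keras defaults used in this notebook (no padding, convolution stride of 1) and two hypothetical helper functions, `conv_out` and `pool_out`.

```python
def conv_out(size, kernel=3):
    """Output side length of a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2, stride=2):
    """Output side length of max-pooling with no padding."""
    return (size - pool) // stride + 1

side = 64
side = conv_out(side)   # first convolution  -> 62
side = pool_out(side)   # first pooling      -> 31
side = conv_out(side)   # second convolution -> 29
side = pool_out(side)   # second pooling     -> 14

# Flattening 32 feature maps of 14x14 gives the input size of the dense layer
flat_units = side * side * 32
print(side, flat_units)  # 14 6272
```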

In [None]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

### Step 3 - Flattening

In [None]:
cnn.add(tf.keras.layers.Flatten())

### Step 4 - Full Connection

In [None]:
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

### Step 5 - Output Layer

In [None]:
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

## Part 3 - Training the CNN

### Compiling the CNN

In [None]:
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

### Training the CNN on the Training set and evaluating it on the Test set

In [None]:
cnn.fit(x = training_set, validation_data = test_set, epochs = 25)

## Part 4 - Making a single prediction
- during data preprocessing, the images were grouped into batches, so the network expects an extra dimension for the batch
- this dimension needs to be added for any prediction we make, even for a single image
- this is done using np.expand_dims with axis=0, which makes the batch the first dimension
- this is intuitive because we first load a batch of images and then process each image in that batch, hence the batch should be the first dimension
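
The effect of `np.expand_dims` on the array shape can be seen with a placeholder array of zeros standing in for a loaded 64x64 RGB image:

```python
import numpy as np

# Placeholder for a single 64x64 RGB image
single_image = np.zeros((64, 64, 3))

# Add a batch dimension in front: a batch containing one image
batch = np.expand_dims(single_image, axis=0)

print(single_image.shape)  # (64, 64, 3)
print(batch.shape)         # (1, 64, 64, 3)
```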

In [None]:
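import numpy as np
from tensorflow.keras.preprocessing import image
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
# Apply the same 1/255 rescaling that was used for the training and test sets
test_image = image.img_to_array(test_image) / 255.0
# Add the batch dimension: (64, 64, 3) -> (1, 64, 64, 3)
test_image = np.expand_dims(test_image, axis = 0)
result = cnn.predict(test_image)
# Show which class maps to 0 and which to 1; the check below assumes dog = 1
print(training_set.class_indices)
# The sigmoid output is a probability, so threshold it instead of testing for exactly 1
if result[0][0] > 0.5:
  prediction = 'dog'
else:
  prediction = 'cat'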

In [None]:
print(prediction)