- since this is not an industrial application, we follow different steps for pre-processing, implementation, etc.

# THEORY

# 1. Convolution

- uses an <b> input image </b>, a <b> feature detector </b> to map the input image to a <b> feature map </b>
- say the input image is 7x7: the feature detector <i><b> convolves </b></i> each 3x3 patch of the input image with its own 3x3 grid (chosen separately) and writes the resulting number to the feature map (here 5x5, for a stride of 1)
- convolution between the two grids can be thought of simply as multiplying corresponding elements and summing the products
- another name for the feature map is convolved feature/activation map <br><br>
- the first main objective is to reduce the size so the image can be processed faster; some information is lost, but since we are detecting features and patterns rather than comparing pixel by pixel, the loss is acceptable<br>
- many different convolutions are obtained, that is, many different feature maps are obtained by changing the feature detector; together these constitute a <b> convolution </b> type of layer <br><br>
- a specific example of a feature detector is sharpen; shown as a 5x5 matrix, it has the value 5 in the middle, -1 in the four directly adjacent cells, and 0s everywhere else, ultimately giving more weightage to the middle pixel in every convolution <br><br>
- next we apply ReLU; it introduces non-linearity (the convolution itself is a linear operation) and helps speed up training; it is a standard step
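
The convolution and ReLU steps above can be sketched in NumPy. This is a minimal illustration, not how Keras implements it: the helper name `convolve2d_valid`, the random 7x7 image, and the 3x3 sharpen-style detector are all assumptions chosen for the example. As in CNN layers, the detector is not flipped before it is applied.

```python
import numpy as np

def convolve2d_valid(image, detector):
    """Slide a square `detector` over `image` (stride 1, no padding)."""
    k = detector.shape[0]
    out_rows = image.shape[0] - k + 1
    out_cols = image.shape[1] - k + 1
    feature_map = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            patch = image[i:i + k, j:j + k]
            # multiply corresponding elements, then sum to a single number
            feature_map[i, j] = np.sum(patch * detector)
    return feature_map

# Illustrative 7x7 binary image (random, not real data)
rng = np.random.default_rng(0)
image = rng.integers(0, 2, size=(7, 7))

# 3x3 sharpen-style detector: 5 in the centre, -1 on the four neighbours
detector = np.array([[0, -1, 0],
                     [-1, 5, -1],
                     [0, -1, 0]])

feature_map = convolve2d_valid(image, detector)
activated = np.maximum(feature_map, 0)   # ReLU: negative responses become 0

print(feature_map.shape)  # (5, 5)
```

A 7x7 input with a 3x3 detector and stride 1 gives 7 - 3 + 1 = 5 positions per axis, which is where the 5x5 feature map comes from.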

# 2. Pooling

- next step after Convolution and ReLU activation function
- helps in image recognition where we need to detect an object despite spatial variation, i.e., even if the images are tilted, compressed or blurred, the NN should still be able to detect the object it saw in the training set
- we create a pooled feature map from the orig feature map:
    - in <b> max pooling </b>, the pooled feature map keeps only the max value from each fixed-size grid; e.g. for a 5x5 feature map, sliding a 2x2 grid with a stride of 2 (and keeping the partial grids at the edges) gives a 3x3 pooled feature map (the stride, i.e. the size of the skips between grids, can be varied)
    - the feature is still preserved, because the max value in each grid marks where the feature was detected most strongly
    - in a way, it also helps to prevent overfitting, since discarding the non-max values reduces the information (and the number of parameters) passed to later layers
    - all the parameters regarding grid size, selecting max value, stride etc can be tuned and tested
- other methods of pooling are sub-sampling/ <b> mean pooling</b>, <b> sum pooling </b>
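
The 5x5 to 3x3 max pooling example above can be sketched as follows. The function name `max_pool2d` and the feature-map values are made up for illustration; the sketch assumes partial windows at the bottom/right edges are kept, which is what produces a 3x3 result from a 5x5 input.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2, stride=2):
    """Max pooling that keeps partial windows at the bottom/right edges."""
    rows, cols = feature_map.shape
    out_rows = -(-rows // stride)   # ceiling division
    out_cols = -(-cols // stride)
    pooled = np.zeros((out_rows, out_cols), dtype=feature_map.dtype)
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride
            # slicing past the edge simply yields a smaller window
            window = feature_map[r:r + pool_size, c:c + pool_size]
            pooled[i, j] = window.max()
    return pooled

# Hypothetical 5x5 feature map
feature_map = np.array([[0, 1, 0, 0, 0],
                        [0, 1, 1, 1, 0],
                        [1, 0, 1, 2, 1],
                        [1, 4, 2, 1, 0],
                        [0, 0, 1, 2, 1]])

pooled = max_pool2d(feature_map)
print(pooled)
# [[1 1 0]
#  [4 2 1]
#  [0 2 1]]
```

Swapping `window.max()` for `window.mean()` or `window.sum()` would give mean pooling or sum pooling respectively.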

# 3. Flattening

- simple step after convolution and max pooling
- row by row transferring such that final pooled feature map becomes a 1D vector
- input layer for Artificial NN
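
As a small sketch of the row-by-row transfer, NumPy's `flatten` uses row-major order by default. The 3x3 pooled map below is a made-up example.

```python
import numpy as np

# Hypothetical 3x3 pooled feature map
pooled = np.array([[1, 1, 0],
                   [4, 2, 1],
                   [0, 2, 1]])

# Row-major order: first row, then second row, then third row
flat = pooled.flatten()
print(flat)  # [1 1 0 4 2 1 0 2 1]
```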

# 4. Full Connection using ANN

- flattened layer is the input layer
- all hidden layers are fully connected
- from here on it is similar to an ANN: the hidden layers combine the incoming features into newer features, back prop is used to propagate the error backwards, and we <b>also update feature detectors</b>
- so propagation goes from the very first convolution layer to the output layer and vice-versa

### Softmax

- generalisation of logistic regression: when there are more than 2 output classes, the output probabilities are obtained using the softmax formula
- we use this along with minimising <b>Categorical CrossEntropy</b> loss function for the classification task in the ANN
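
A minimal NumPy sketch of softmax, p_i = exp(z_i) / sum_j exp(z_j), paired with categorical cross-entropy for a one-hot label. The logits and the label are hypothetical values, and the helper names are chosen for this example only.

```python
import numpy as np

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)          # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return float(-np.sum(y_true * np.log(y_pred + eps)))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw outputs for 3 classes
probs = softmax(logits)

y_true = np.array([1.0, 0.0, 0.0])   # one-hot label: the true class is index 0
loss = categorical_crossentropy(y_true, probs)

print(probs, probs.sum(), loss)
```

With a one-hot label, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks as that probability approaches 1.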

# IMPLEMENTATION

In [1]:
import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator

- ImageDataGenerator helps with data augmentation

In [2]:
tf.__version__

'2.13.0'

## 1. Preprocessing
- to prevent overfitting

### 1.1 Preprocessing on Train Data
- image augmentation process: shifting of pixels, changing intensities, zooming in/out, rotation of pixels
- specifically "shear range", "zoom range", "horizontal flip" 

In [3]:
# this object will apply all the transformations on the image data sets
train_datagen= ImageDataGenerator(
            rescale=1./255,       # feature scaling, i.e, it divides all pixel values by 255
            shear_range=0.2,      # literally, applying a shear strain to the image
            zoom_range=0.2,
            horizontal_flip=True)
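
To make two of these transformations concrete, here is a rough NumPy sketch of what rescaling and a horizontal flip do to a single image array. It uses a random array as a stand-in for a real image and only illustrates the arithmetic; it is not the generator's actual code path.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one 64x64 RGB image with pixel values in 0..255
img = rng.integers(0, 256, size=(64, 64, 3)).astype("float32")

scaled = img * (1.0 / 255)      # same arithmetic as rescale=1./255
flipped = scaled[:, ::-1, :]    # horizontal flip: reverse the column axis

print(scaled.min(), scaled.max(), flipped.shape)
```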

- we need to split the image datasets into batches of images before passing them into the neural network

In [4]:
train_data= train_datagen.flow_from_directory(
            'dataset/training_set', # path to the training images, one sub-folder per class
            target_size=(64,64),    # size each image is resized to before entering the NN
            batch_size=32,          # number of images in a batch = 32
            class_mode='binary')    # two classes, so labels are 0/1 to match the output layer

Found 8000 images belonging to 2 classes.


In [5]:
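# only rescaling on the test set: no augmentation, so the test images stay unaltered
test_datagen=ImageDataGenerator(rescale=1./255)
test_data= test_datagen.flow_from_directory(
            'dataset/test_set',     # path to the test images, one sub-folder per class
            target_size=(64,64),    # size each image is resized to before entering the NN
            batch_size=32,          # number of images in a batch = 32
            class_mode='binary')    # two classes, so labels are 0/1 to match the output layer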

Found 1998 images belonging to 2 classes.
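
As a quick arithmetic check on the batching, the number of batches per epoch can be worked out from the image counts reported above. This sketch assumes the generator yields one final, smaller batch when the count is not a multiple of 32.

```python
import math

batch_size = 32

train_batches = math.ceil(8000 / batch_size)   # 250 full batches
test_batches = math.ceil(1998 / batch_size)    # 62 full batches + 1 partial
last_test_batch = 1998 - (test_batches - 1) * batch_size  # images in the partial batch

print(train_batches, test_batches, last_test_batch)  # 250 63 14
```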


## 2. Building the CNN

In [6]:
cnn=tf.keras.models.Sequential()

### 2.1 Convolution
- parameters:-
    - <mark>filters</mark>: number of feature detectors (aka kernels)
    - <mark>kernel_size</mark>: dimensions of feature detector (n x n)
    - <mark>activation</mark> : ReLU (for all non output layers)
    - <mark>input_shape</mark>: not listed explicitly in the signature; it is accepted through <b> **kwargs </b> and must be passed by keyword; this is the shape of the input image, viz r x c x n
        - r is the number of pixel rows (height), c is the number of pixel columns (width), n is the number of base colors, here = 3, as we use the RGB scheme
    - others include <mark>strides</mark> (r x c), <mark>padding</mark> etc

In [7]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size= 3, activation= 'relu', input_shape=[64,64,3])) 
# Conv2D helps to build the convolution layer (diff from dense),
# we are using the classical architecture

### 2.2 Pooling
- parameters:-
    - <mark>pool_size</mark> : the dimensions of the small square frame that slides over the feature map to produce the new values; in the case of max pooling, we take the maximum amongst the values in the frame
    - <mark>strides</mark>: number of pixels the frame shifts from one position to the next
    - <mark>padding</mark>: 'valid' means no padding, so leftover rows/columns at the edge that cannot fill a complete frame are ignored ('same' pads the feature map instead)

In [8]:
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides= 2, padding= 'valid'))

- 2nd convolution and pooling

In [9]:
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size= 3, activation= 'relu')) # input_shape is only needed on the first layer; number of filters kept the same
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides= 2, padding= 'valid'))

### 2.3 Flattening

In [10]:
cnn.add(tf.keras.layers.Flatten())
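
The spatial size at each stage can be traced by hand. The sketch below assumes the layer settings used above: no padding ('valid', which is also the Conv2D default), a convolution stride of 1, and 2x2 pooling with stride 2. The helper `conv_out` is a name made up for this example.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for a 'valid' convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 64
size = conv_out(size, kernel=3)             # Conv2D     -> 62
size = conv_out(size, kernel=2, stride=2)   # MaxPool2D  -> 31
size = conv_out(size, kernel=3)             # Conv2D     -> 29
size = conv_out(size, kernel=2, stride=2)   # MaxPool2D  -> 14

# 32 filters in the last convolution, so 32 pooled maps of 14x14 each
flattened_units = size * size * 32

print(size, flattened_units)  # 14 6272
```

So the Flatten layer hands a vector of 6272 values to the fully connected part.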

### 2.4 Full Connection

In [11]:
cnn.add(tf.keras.layers.Dense(units=128, activation= 'relu'))
cnn.add(tf.keras.layers.Dense(units=1, activation= 'sigmoid'))
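
For a sense of scale, the trainable parameters of this architecture can be counted by hand. The sketch assumes a flattened size of 14 x 14 x 32 (which follows from 'valid' padding on 64x64 inputs) and one bias per filter or unit; calling `cnn.summary()` would be the way to confirm the numbers on the actual model.

```python
# Conv2D: (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv1 = (3 * 3 * 3 + 1) * 32        # 896
conv2 = (3 * 3 * 32 + 1) * 32       # 9248

# Dense: (inputs + 1 bias) * units
flattened = 14 * 14 * 32            # 6272 inputs after Flatten (assumed, see above)
dense1 = (flattened + 1) * 128      # 802944
output = (128 + 1) * 1              # 129

total = conv1 + conv2 + dense1 + output
print(total)  # 813217
```

Almost all of the parameters sit in the first Dense layer, which is one reason pooling before flattening matters.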

## 3. Training the CNN
- compilation followed by training 

### 3.1 Compilation

In [12]:
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) # same as ann
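
As an illustration of what the `binary_crossentropy` loss and the `accuracy` metric measure, here is a small NumPy sketch on made-up labels and predictions. It mirrors the textbook formula, not necessarily the exact Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

y_true = np.array([0, 1, 1, 0])              # hypothetical labels (0 = cat, 1 = dog)
y_pred = np.array([0.1, 0.8, 0.6, 0.3])      # hypothetical sigmoid outputs

loss = binary_crossentropy(y_true, y_pred)
accuracy = float(np.mean((y_pred > 0.5) == y_true))

print(loss, accuracy)
```

Here every prediction falls on the correct side of 0.5, so the accuracy is 1.0 even though the loss is still positive; the loss also rewards confidence, not just correctness.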

### 3.2 Training

In [13]:
cnn.fit(x=train_data, validation_data=test_data, epochs= 25)

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.src.callbacks.History at 0x1a087cac810>

### 3.3 Evaluation on Test Set

In [14]:
import numpy as np
from keras.preprocessing import image

In [15]:
# load the image and resize it to the input size expected by the first convolution layer
test_image = image.load_img("dataset/single_prediction/cat_or_dog_4.jpg", target_size=(64,64))
# load_img returns a PIL image; convert it to a NumPy array
test_image = image.img_to_array(test_image)
# the CNN was trained on batches (of size 32), so the predict method also expects a batch dimension,
# even for a single image; the batch dimension is the 1st one, therefore axis=0
test_image = np.expand_dims(test_image, axis=0)

In [16]:
test_image.shape

(1, 64, 64, 3)

In [17]:
train_data.class_indices

{'cats': 0, 'dogs': 1}

In [18]:
result = cnn.predict(test_image/255.)
final_prediction = 'dog' if result[0][0] > 0.5 else 'cat'



In [19]:
result.shape

(1, 1)

In [20]:
print(final_prediction)

cat
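
Rather than hard-coding the labels, the mapping reported by `train_data.class_indices` can be inverted to decode the sigmoid output. This is a small sketch; the helper `decode` and the probabilities passed to it are hypothetical.

```python
# Mapping as reported by train_data.class_indices above
class_indices = {'cats': 0, 'dogs': 1}

# Invert it so a class index maps back to its folder name
index_to_label = {index: label for label, index in class_indices.items()}

def decode(probability, threshold=0.5):
    """Map a sigmoid output to a class label (hypothetical helper)."""
    return index_to_label[int(probability > threshold)]

print(decode(0.2))  # cats
print(decode(0.9))  # dogs
```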
