# Convolutional Neural Network

Computer vision is the area of machine learning that finds patterns in images, and the convolutional neural network (CNN) is one of its most widely used techniques. Not all problems are the same: just as in the <a href="https://github.com/4igeek/TensorFlow/tree/main/Classification">classification section</a> of this project, we will see two kinds of problems:

1) Binary class problems
2) Multi-class problems

## Typical architecture for a CNN

<table style="width:100%">
    <thead>
        <tr>
            <th style="width:15%">Layer type</th>
            <th style="width:35%">What it does</th>
            <th style="width:50%">Typical values</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Input layer</td>
            <td>Processes input images (in the form of tensors)</td>
            <td>Input shape = (batch_size, image_height, image_width, color_channels)</td>
        </tr>
        <tr>
            <td>Convolution layer</td>
            <td>Extracts the key features from images</td>
            <td><span style="font-family: Consolas;">tf.keras.layers.ConvXD</span>, where X is the number of spatial dimensions, e.g. 1 for text or 2 for images.</td>
        </tr>
        <tr>
            <td>Hidden activation</td>
            <td>Adds non-linearity to the extracted features</td>
            <td>Usually ReLU <span style="font-family: Consolas;">tf.keras.activations.relu</span></td>
        </tr>
        <tr>
            <td>Pooling layer</td>
            <td>Reduces dimensionality of the extracted features</td>
            <td>
            Avg: <span style="font-family: Consolas;">tf.keras.layers.AvgPool2D</span> or 
            Max: <span style="font-family: Consolas;">tf.keras.layers.MaxPool2D</span>
            </td>
        </tr>
        <tr>
            <td>Fully connected layer</td>
            <td>Refines features passed by convolution layers   </td>
            <td><span style="font-family: Consolas;">tf.keras.layers.Dense</span></td>
        </tr>
        <tr>
            <td>Output layer</td>
            <td>Takes learned features and outputs labels</td>
            <td>Output shape = [number_of_classes]</td>
        </tr>
        <tr>
            <td>Output activation</td>
            <td>Converts the output layer's values into class probabilities</td>
            <td>
            Binary: <span style="font-family: Consolas;">tf.keras.activations.sigmoid</span>
            or   
            Multi-class: <span style="font-family: Consolas;">tf.keras.activations.softmax</span>
            </td>
        </tr>
    </tbody>
</table>
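
To make the table concrete, here is a minimal sketch of how its rows could map onto Keras layers for a binary problem. The image size, the number of filters, the kernel size and the choice of optimizer are assumptions picked for illustration, not values taken from this project.

```python
import tensorflow as tf

# Hypothetical hyperparameters, chosen only for illustration.
IMAGE_SIZE = (224, 224)  # assumed image height and width
NUM_FILTERS = 10         # assumed number of filters per convolution layer

model = tf.keras.Sequential([
    # Input layer: (image_height, image_width, color_channels); batch size is implicit.
    tf.keras.Input(shape=(*IMAGE_SIZE, 3)),
    # Convolution layer with a ReLU hidden activation.
    tf.keras.layers.Conv2D(filters=NUM_FILTERS, kernel_size=3, activation="relu"),
    # Pooling layer: halves the height and width of the feature maps.
    tf.keras.layers.MaxPool2D(pool_size=2),
    tf.keras.layers.Conv2D(filters=NUM_FILTERS, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPool2D(pool_size=2),
    # Flatten the feature maps so a Dense layer can consume them.
    tf.keras.layers.Flatten(),
    # Output layer: one unit with a sigmoid activation for a binary problem.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```

For a multi-class problem the last layer would instead have one unit per class and a softmax activation, paired with a categorical loss.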

There is an almost unlimited number of ways to build a CNN model. Usually a pooling layer follows a Conv2D layer, and a single model may stack several of these Conv2D and pooling pairs on top of one another.
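
To get a feel for what stacking does to the size of the feature maps, the sketch below traces the height (or width) of an image through repeated convolution and pooling steps. It assumes convolutions without padding and with a stride of 1, and pooling whose stride equals the pool size; the 224-pixel input, the kernel size of 3 and the pool size of 2 are illustrative assumptions.

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1) -> int:
    """Spatial size after a convolution with no padding."""
    return (size - kernel_size) // stride + 1


def pool_output_size(size: int, pool_size: int) -> int:
    """Spatial size after pooling with stride equal to pool_size and no padding."""
    return (size - pool_size) // pool_size + 1


def trace_sizes(size: int, blocks: int, kernel_size: int = 3, pool_size: int = 2) -> list:
    """Return the spatial size after each layer in `blocks` conv + pool pairs."""
    sizes = [size]
    for _ in range(blocks):
        size = conv_output_size(size, kernel_size)
        sizes.append(size)
        size = pool_output_size(size, pool_size)
        sizes.append(size)
    return sizes


print(trace_sizes(224, blocks=2))  # [224, 222, 111, 109, 54]
```

Each convolution trims a small border, while each pooling step roughly halves the size, which is why pooling is described as reducing dimensionality.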

## Getting some data

We're going to use the Food 101 dataset from kaggle.com for the next notebook. The dataset has been modified to include only two classes, i.e. pizza and steak. The reason for this (rather than going straight in with all 101 classes) is so that we can get a working, proven model before giving it more data, since the more data there is, the longer training takes.

As soon as we are happy with our solution we can scale up and add more classes. We're going to start simple and then add complexity, which is a good habit when solving ML problems.

In [1]:
# Download the required zip file with wget, then extract the data from it.
import zipfile

!wget https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip

# Unzip the downloaded file (the context manager closes it automatically)
with zipfile.ZipFile("pizza_steak.zip") as zip_ref:
    zip_ref.extractall()

--2024-05-04 10:30:06--  https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.169.27, 216.58.212.219, 216.58.212.251, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.169.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 109540975 (104M) [application/zip]
Saving to: 'pizza_steak.zip'

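
The `!wget` line only works inside a notebook on a machine that has `wget` installed. As a hedged alternative, the sketch below does the same download and extraction using only the Python standard library. The function name `download_and_extract` and its parameters are hypothetical helpers introduced here, not part of the original notebook.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url: str, zip_path: str = "pizza_steak.zip", extract_to: str = ".") -> Path:
    """Download a zip file (unless it already exists locally) and extract it."""
    zip_file = Path(zip_path)
    if not zip_file.exists():
        # Skip the download when the archive is already on disk.
        urllib.request.urlretrieve(url, zip_file)
    with zipfile.ZipFile(zip_file) as zip_ref:
        zip_ref.extractall(extract_to)
    return Path(extract_to)


# Usage (the archive is roughly 104 MB, so this is left commented out):
# download_and_extract("https://storage.googleapis.com/ztm_tf_course/food_vision/pizza_steak.zip")
```

Checking for an existing archive first avoids downloading the file again every time the cell is re-run.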

## Taking a look at the data

It's really important to look at the data before trying to solve the problem. For computer vision this usually means inspecting individual images to get a feel for how the dataset is made up.

A good first step is to find out how many images we have in each set, i.e. training (pizza and steak) and testing (pizza and steak).

In [10]:
import os

for dirpath, dirnames, filenames in os.walk("pizza_steak"):
    print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'")

There are 2 directories and 0 images in 'pizza_steak'
There are 2 directories and 0 images in 'pizza_steak\test'
There are 0 directories and 250 images in 'pizza_steak\test\pizza'
There are 0 directories and 250 images in 'pizza_steak\test\steak'
There are 2 directories and 0 images in 'pizza_steak\train'
There are 0 directories and 750 images in 'pizza_steak\train\pizza'
There are 0 directories and 750 images in 'pizza_steak\train\steak'


We can see from the output above that we have 250 test images and 750 training images for each class. That gives 1,000 images of pizza and 1,000 images of steak, or 2,000 images in total.
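
The same totals can be computed rather than read off the printout. The sketch below assumes the `pizza_steak/<split>/<class>/` layout shown in the output above; `count_images` is a hypothetical helper introduced for illustration.

```python
from pathlib import Path


def count_images(root: str) -> dict:
    """Return {split: {class_name: file_count}} for a root/split/class layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


# Only run the summary when the dataset has actually been extracted.
if Path("pizza_steak").is_dir():
    counts = count_images("pizza_steak")
    for split, per_class in counts.items():
        print(split, per_class, "total:", sum(per_class.values()))
```

Note that this counts every file in a class folder, so any stray non-image files would be included in the totals.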