# Image classification - Convolutional Neural Networks (CNN)

For *structured* data we can use the "classic" Neural Network algorithm:

<br>
<center><img src="./images/simple-neural-network.png"/></center>
<br>


- A node for each input feature
- One or more intermediate layers called *hidden layers*, with one or more nodes
- One layer to predict the output
- Each node of a layer is connected to all the nodes of the previous and next layers
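
To make the "everything connected to everything" structure concrete, here is a minimal pure-Python sketch of a forward pass through such a network. The layer sizes, the weight values and the ReLU activation are all assumptions chosen for illustration, and bias terms are left out to keep it short.

```python
def dense_layer(inputs, weights):
    """Fully connected layer: every output node sees every input node."""
    outputs = []
    for node_weights in weights:
        # Weighted sum over ALL the inputs of the previous layer
        total = sum(w * x for w, x in zip(node_weights, inputs))
        outputs.append(max(0, total))  # ReLU activation (an assumption)
    return outputs

# Hypothetical network: 3 input features -> 2 hidden nodes -> 1 output node
features = [1, 2, 3]
hidden_weights = [[1, 0, -1], [2, 1, 0]]  # one row of 3 weights per hidden node
output_weights = [[1, 1]]                 # one row of 2 weights for the output node

hidden = dense_layer(features, hidden_weights)
prediction = dense_layer(hidden, output_weights)

# Number of connections: 3*2 (input to hidden) + 2*1 (hidden to output)
n_connections = sum(len(row) for row in hidden_weights + output_weights)
print(hidden, prediction, n_connections)
```

Even this toy network already has 8 connections; the count grows with the product of the sizes of adjacent layers.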


This complex structure, with lots of connections, does not fit very well with *unstructured* data, like images.

### Image as data?

We can decompose a 260x194 color image into a set of 3 matrices:
- each matrix has the same dimensions, 260 rows and 194 columns
- the 3 matrices represent the **RGB decomposition** of the image: red, green and blue
- the cell (x, y) of the matrix z contains a number between 0 and 255. This number represents the intensity of the color z in the pixel (x, y)
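
The decomposition can be sketched in a few lines of plain Python. The tiny 2x2 image below is made up for illustration, and `split_channels` is a hypothetical helper, not a library function; in practice an image library would do this for us.

```python
# Hypothetical 2x2 image: each pixel is a (red, green, blue) tuple
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def split_channels(image):
    """Return one matrix per color channel, in the order red, green, blue."""
    return [
        [[pixel[z] for pixel in row] for row in image]
        for z in range(3)
    ]

red, green, blue = split_channels(image)
print(red)    # intensity of red in each pixel
print(green)  # intensity of green in each pixel
print(blue)   # intensity of blue in each pixel
```

Each of the three matrices has the same number of rows and columns as the image itself.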

<br>
<center><img src="./images/rgb_matrix.png"/></center>
<br>

If we wanted to build a Neural Network model to tell whether an image contains a cat or a dog, we would need 260 x 194 x 3 = 151,320 nodes in the input layer. Even with a simple network, say 1 hidden layer of 100 nodes and an output layer with a single node (dog/cat), we would have

$$\underbrace{(151320 \times 100)}_{\text{input to hidden}} + \overbrace{(100 \times 1)}^{\text{hidden to output}} = 15{,}132{,}100$$

More than 15M connections (aka **parameters**) for our ML process to learn! That's a lot!
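
The same arithmetic can be checked with a short sketch. `count_connections` is a hypothetical helper written for this example; like the formula above, it counts only the weights between layers and ignores bias terms.

```python
def count_connections(layer_sizes):
    """Count the weights of a fully connected network (bias terms ignored)."""
    # Each pair of adjacent layers contributes (nodes in) * (nodes out) weights
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

input_nodes = 260 * 194 * 3  # rows * columns * color channels
total = count_connections([input_nodes, 100, 1])
print(input_nodes, total)
```

Almost all of the parameters sit between the input and the hidden layer, so the size of the image dominates the size of the model.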

Besides that, we would lose all spatial information. The input pixels are flattened into one single layer, and the network does not take into account the position of the input features. This means that the top-left pixel is treated exactly like the pixels near the center, even though we know that the latter usually carry more information than the former.
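
One way to see this is to shuffle the pixels. In the minimal sketch below (random integer pixels and weights, all made up for illustration), a single fully connected node produces exactly the same value when the pixels are scrambled, as long as each weight follows its pixel. Nothing in the computation depends on which pixels are neighbours.

```python
import random

def neuron(inputs, weights):
    """One fully connected node: a weighted sum over all the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pixels = [rng.randint(0, 255) for _ in range(12)]  # a flattened toy image
weights = [rng.randint(-5, 5) for _ in range(12)]  # one weight per pixel

# Scramble the pixel order, moving each weight along with its pixel
order = list(range(12))
rng.shuffle(order)
shuffled_pixels = [pixels[i] for i in order]
shuffled_weights = [weights[i] for i in order]

original = neuron(pixels, weights)
shuffled = neuron(shuffled_pixels, shuffled_weights)
print(original == shuffled)  # True: the node cannot tell the two layouts apart
```

Integer values are used here so that the comparison is exact; with floats the two sums could differ by rounding error.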

## The Convolutional Neural Network

```python
from tensorflow import keras
```