# Introduction to convolutional neural networks

Here's what we are going to do in this notebook:

1. Get to know **deep neural networks** (DNNs)
2. Get to know **convolutional neural networks** (CNNs)
    - Motivation for CNN
    - Key components that define a CNN
3. Build a simple CNN **image classifier** using `tensorflow.keras`

## Deep neural network

Let's first see the big picture.

Wikipedia: **Machine learning** (ML) is the study of computer algorithms that improve automatically through experience.

Machine learning is often sliced into

* Supervised learning (predicting a label, i.e. classification, or a continuous variable, i.e. regression),
* Unsupervised learning (pattern recognition for unlabelled data, e.g., clustering),
* Reinforcement learning (algorithms learn the best way to "behave", e.g. AlphaGo Zero, self-driving cars). 

Deep learning is a powerful form of machine learning that has garnered much attention for its successes in computer vision (e.g. image recognition), natural language processing, and beyond. 

The DNN is probably the best-known type of model in deep learning.
- Originally inspired by information processing and communication nodes in biological systems.
- Input data is passed through layers of the network, which contain a number of nodes, analogous to "neurons". 
- Given enough training data, a DNN can learn useful features directly from the data.

![Deep neural network](../img/deep-nn.jpg)
Image credit: Waldrop, M. M. (2019). News Feature: What are the limits of deep learning?. Proceedings of the National Academy of Sciences, 116(4), 1074-1077.

Roughly speaking, there are two important operations that make a neural network work.
1. **Forward propagation**
2. **Backpropagation**

### Forward propagation
+ The network reads the input data, computes its values across the network and gives a final output value.
+ This is the **prediction** step.

How does the network compute an output value?

Let's see what happens in a single layer network when it does one prediction.
1. Inputs: a vector of numbers.
2. Weights: each input is multiplied by its own weight, and the node also has a bias term.
3. Weighted sum: as the name suggests, a weighted sum of the inputs, plus the bias.
4. Activation: the weighted sum is "activated" through a (usually nonlinear) activation function, e.g. a step function.

![title](../img/perceptron.jpg)

Image [credit](https://deepai.org/machine-learning-glossary-and-terms/perceptron).

If you know a bit of linear algebra, this is what the operation is doing:
- $y = f(\mathbf{w}\cdot \mathbf{x} + b)$

where $\mathbf{w}\cdot \mathbf{x} + b$ is the weighted sum, $f(\cdot)$ is the activation function, and $y$ is the output.
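The formula above can be sketched in a few lines of plain Python. This is only an illustration: the function names `step` and `perceptron`, and the input, weight and bias values, are made up for the example and do not come from any library.

```python
def step(z):
    """Step activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0


def perceptron(x, w, b, activation=step):
    """Compute y = f(w . x + b) for a single node."""
    weighted_sum = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return activation(weighted_sum)


# Hypothetical values, chosen only for illustration
x = [1.0, 0.0, 1.0]    # inputs
w = [0.5, -0.6, 0.2]   # one weight per input
b = -0.5               # bias

y = perceptron(x, w, b)
print(y)
```

Swapping `step` for another function (for example a sigmoid) changes the activation without touching the weighted-sum part.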

Now, in a deeper neural network, the procedure is essentially the same. The input --> weighted sum --> activation process is done for each layer. 

![title](../img/MLP.png)

Image [credit](https://www.cs.purdue.edu/homes/ribeirob/courses/Spring2020/lectures/03/MLP_and_backprop.html).
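To make the layer-by-layer procedure concrete, here is a minimal sketch of a forward pass through a small network with one hidden layer. It assumes a sigmoid activation in every layer, and the layer sizes and weight values are arbitrary placeholders; a real network would learn its weights during training.

```python
import math


def sigmoid(z):
    """Sigmoid activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(x, weights, biases, activation):
    """One layer: each row of `weights` holds the weights of one node."""
    return [
        activation(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Repeat input -> weighted sum -> activation for every layer in turn."""
    for weights, biases in layers:
        x = dense_forward(x, weights, biases, sigmoid)
    return x


# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output
layers = [
    ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.1, 0.2]], [0.05]),
]

prediction = forward([1.0, 2.0], layers)
print(prediction)
```

The output of each layer simply becomes the input of the next one, which is all that "deeper" means here.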

### Backpropagation

+ By comparing its predictions with the ground-truth values (the mismatch is measured by a **loss**), the network adjusts its parameters to improve its performance.
+ This is the **training** step.

How does the network adjust the weights through training?

This is done through an operation called **backpropagation**, or backprop. The network takes the loss and recursively calculates the gradient (slope) of the loss function with respect to each network parameter, working backward from the output layer. Calculating these gradients relies on the chain rule from calculus; you can read more about it [here](https://sebastianraschka.com/faq/docs/backprop-arbitrary.html).
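As a rough illustration of the chain rule at work, the sketch below computes the gradients for a single node by hand and checks one of them against a numerical estimate. It assumes a sigmoid activation and a squared-error loss, $L = \frac{1}{2}(y - t)^2$; both are choices made for this example, and all values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, target):
    """Squared-error loss of a single sigmoid node."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    y = sigmoid(z)
    return 0.5 * (y - target) ** 2


def gradients(w, b, x, target):
    """Chain rule: dL/dw_i = dL/dy * dy/dz * dz/dw_i."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    y = sigmoid(z)
    dL_dy = y - target          # derivative of 0.5 * (y - t)^2
    dy_dz = y * (1.0 - y)       # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    grad_w = [dL_dz * x_i for x_i in x]  # dz/dw_i = x_i
    grad_b = dL_dz                       # dz/db = 1
    return grad_w, grad_b


# Hypothetical input, target and parameters
x, target = [1.0, 2.0], 1.0
w, b = [0.3, -0.1], 0.05

grad_w, grad_b = gradients(w, b, x, target)

# Sanity check: central finite difference for the first weight
eps = 1e-6
numeric = (
    loss([w[0] + eps, w[1]], b, x, target)
    - loss([w[0] - eps, w[1]], b, x, target)
) / (2 * eps)
print(grad_w[0], numeric)
```

The two printed numbers should agree closely. In a deeper network the same idea is applied layer by layer, reusing `dL_dz` from later layers to compute gradients for earlier ones.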

An optimization algorithm then uses the gradient information to update the network parameters, repeating until the performance stops improving. One commonly used optimizer is stochastic gradient descent.
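Here is a minimal sketch of stochastic gradient descent on a toy problem: fitting a line to five points generated from $y = 2x + 1$. The data, learning rate, number of steps and the function name `sgd` are all assumptions made for illustration; real training loops usually work on mini-batches rather than single examples.

```python
import random

# Toy data from the hypothetical line y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]


def sgd(data, learning_rate=0.05, steps=2000, seed=0):
    """Fit y = w * x + b by updating on one random example per step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, target = rng.choice(data)   # "stochastic": pick a random example
        error = (w * x + b) - target   # forward pass, then prediction error
        grad_w = error * x             # gradient of 0.5 * error**2 w.r.t. w
        grad_b = error                 # gradient of 0.5 * error**2 w.r.t. b
        w -= learning_rate * grad_w    # step against the gradient
        b -= learning_rate * grad_b
    return w, b


w, b = sgd(data)
print(w, b)
```

After training, `w` and `b` should end up close to 2 and 1. A learning rate that is too large makes the updates overshoot, while one that is too small makes training slow.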

One analogy often used to explain gradient-based optimization is hiking:
+ Training the network to minimize its loss is like trying to reach the lowest point on a mountain.
+ Backprop, which finds the gradients of the loss function, is like working out which way leads downhill.
+ The optimization algorithm is the part where you actually take steps along that path and eventually reach the lowest point.

![title](../img/gradient-descent.png)
Image [credit](https://www.datasciencecentral.com/profiles/blogs/alternatives-to-the-gradient-descent-algorithm).

So now you know that a DNN
- is a powerful **machine learning** technique
- can be used to tackle **supervised**, **unsupervised** and **reinforcement learning** problems
- consists of forward propagation (**input to output**) and backpropagation (**error to parameter update**)

We are ready to talk about CNN!

## Convolutional neural network

## A simple CNN image classifier 

# What's next?

In the next notebook, 

# Further resources

You can learn more about CNNs 👉:
- [CS231n Convolutional Neural Networks for Visual Recognition](https://cs231n.github.io/)
- [DeepMind x UCL | Convolutional Neural Networks for Image Recognition](https://www.youtube.com/watch?v=shVKhOmT0HE&ab_channel=DeepMind)

and how to implement them 👉:
- [Introduction to Keras for Engineers](https://keras.io/getting_started/intro_to_keras_for_engineers/)
- [Tensorflow Keras CNN Guide](https://www.tensorflow.org/tutorials/images/cnn)

Enjoy! 👏👏👏