In [None]:
%%HTML
<link rel="stylesheet" type="text/css" href="../css/custom.css">

# Deep Learning Overview

![footer_logo](../images/logo.png)

## Goal

In this notebook we shall introduce Deep Learning and its applications, as well as how it relates to the field of Artificial Intelligence and traditional Machine Learning techniques. We shall also discuss the history of Deep Learning and what led to its rise to prominence over the last decade.

## Program

- [AI vs ML vs DL: what is what?]()
- [Applications of Deep Learning]()
- [History of Deep Learning]()
- [The rise of Deep Learning]()

# AI vs ML vs DL: what is what?

![half center](../images/deep_learning_overview/ai-ml-deeplearning.jpeg)

[source](https://www.viatech.com/en/2018/05/history-of-artificial-intelligence/)

- **Artificial intelligence** is the field of study that focusses on building intelligent programs and machines that can solve problems, or make decisions, which have typically been considered a human prerogative.


- **Machine learning** is a subset of artificial intelligence (AI). Machine learning algorithms are not explicitly programmed; instead, they learn from exposure to datasets and improve automatically through experience. There is a broad range of ML algorithms, and different algorithms suit different tasks, e.g.
    - Decision trees for explainability.
    - XGBoost for high performance.

- **Deep learning** is a subset of machine learning. It uses *deep* (many-layered) neural networks, whose structure is loosely inspired by the human nervous system, to analyze the different factors in the data.

## Traditional ML vs DL

![half center](../images/deep_learning_overview/ml_vs_dl.png)

<sub>Source: George Seif, Medium</sub>

Typically, datasets require processing to extract the important features before they are suitable for traditional ML models. 

This is not the case for DL, which learns the important features while learning the task itself.

Not only is this efficient, but it also reduces the human bias about which features matter for the performance of the task.



# Applications of Deep Learning
* **Translation**
* Object detection 
* Image captioning
* Text and image generation
* Games 


![half center](../images/deep_learning_overview/google-translate.png)


# Applications of Deep Learning
* Translation
* **Object detection**
* Image captioning
* Text and image generation
* Games 

![half center](../images/deep_learning_overview/image-captioning.png)

# Applications of Deep Learning
* Translation
* Object detection 
* **Image captioning**
* Text and image generation
* Games 

![half center](../images/deep_learning_overview/gemeente.png)

# Applications of Deep Learning
* Translation
* Object detection 
* Image captioning
* **Text and image generation**
* Games 

<img src="../images/deep_learning_overview/notexist.png" width="500" align="center">


Source: [thispersondoesnotexist.com](https://www.thispersondoesnotexist.com)


# Applications of Deep Learning
* Translation
* Object detection 
* Image captioning
* Text and image generation
* **Games** 

<img src="../images/deep_learning_overview/alphago.jpg" width="600">


# History of Deep Learning

![half center](../images/deep_learning_overview/ai-ml-deeplearning.jpeg)

[source](https://www.viatech.com/en/2018/05/history-of-artificial-intelligence/)


## History

- 1956 - Dartmouth conference: birth of AI as a field of study and the "golden years" of AI.

However, the development of AI solutions has not been a smooth ride; there have been many peaks and troughs in productivity.

## Rosenblatt's perceptron (1958)

A promising development that was able to perform binary classification.

![half center](../images/deep_learning_overview/perceptron.png)

- A feature vector is passed into the perceptron,
- Each feature in the vector is multiplied by a weight and the products are summed,
- A non-linear activation function is applied,
- This produces an output that can be compared with the true y value,
- The error is used to update the weights so that the model improves.
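
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not Rosenblatt's original formulation: the step activation, the learning rate of 0.1, the zero initialisation, the toy dataset and the names `predict` and `train_perceptron` are all assumptions made for this example.

```python
def predict(weights, bias, features):
    """Weighted sum of the features followed by a step activation."""
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, learning_rate=0.1, epochs=20):
    """Train with the perceptron update rule: w <- w + lr * error * x."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            # Compare the output with the true y value.
            error = target - predict(weights, bias, features)
            # Use the error to nudge the weights and the bias.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Toy data (assumed for illustration): the label equals the first feature.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
predictions = [predict(weights, bias, x) for x, _ in samples]
print(predictions)
```

Because this toy dataset is linearly separable, the updates stop once every sample is classified correctly.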

## Perceptron for logical operations

Can you think of a perceptron to solve the following logical operations?

* Logical operator AND
* Logical operator OR
* Logical operator XOR 

| p | q | p AND q | p OR q | p XOR q | 
|---|---|---------|--------|---------|
| 0 | 0 |    0    |    0   |    0    |
| 0 | 1 |    0    |    1   |     1   |
| 1 | 0 |    0    |    1   |    1    |
| 1 | 1 |    1    |    1   |    0    |

[Thought Experiment Exercise](../exercises/01_00_perceptron_thought_experiment.ipynb)


## Minsky (1969)

Work in this field came to a halt when Minsky and Papert demonstrated the severe limitations of these single-layer networks.

The XOR problem was impossible to solve for a single-layer perceptron!

| p | q | p XOR q |
|---|---|---------|
| 0 | 0 |    0    |
| 0 | 1 |    1    |
| 1 | 0 |    1    |
| 1 | 1 |    0    | 
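
A short argument shows why. Assume, as a sketch with notation not used elsewhere in this notebook, a single perceptron with weights $w_1, w_2$, bias $b$, and a step activation that outputs 1 when $w_1 p + w_2 q + b > 0$. The four rows of the table then require:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last line. No choice of weights and bias satisfies all four constraints.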

## Solution? More layers

Minsky never said XOR cannot be solved by neural networks, only that XOR cannot be solved with single-layer perceptrons!

![neuron_comparison center half](../images/deep_learning_overview/neuron.png)

Multi-layer perceptrons (MLP) can solve XOR:
- One layer’s output is input to the next layer,

- Non-linearities are included between layers, e.g. sigmoids,

 - In fact, 9 years earlier Minsky built such a multi-layer perceptron!
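
One way to see this is a two-layer network with hand-picked weights. The sketch below is an illustration only: it uses a step activation instead of a sigmoid to keep the arithmetic exact, and the weights, thresholds and the names `step` and `xor_mlp` are assumptions chosen for this example, not values from a trained model.

```python
def step(value):
    """Step activation: the non-linearity applied after each layer."""
    return 1 if value > 0 else 0


def xor_mlp(p, q):
    """Two-layer perceptron for XOR with hand-picked weights."""
    # Hidden layer: one neuron behaves like OR, the other like AND.
    h_or = step(1.0 * p + 1.0 * q - 0.5)
    h_and = step(1.0 * p + 1.0 * q - 1.5)
    # Output layer: fires when the OR neuron is on but the AND neuron is off.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


xor_outputs = [xor_mlp(p, q) for p, q in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)
```

The hidden layer remaps the four inputs so that the output neuron only has to draw a single straight line, which a single perceptron can do.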

## Problem

Rosenblatt’s algorithm is not applicable to training multi-layer perceptrons.

![neuron_comparison center half](../images/deep_learning_overview/neuron.png)

Why?
- Learning depends on comparing the output to the "ground truth"

- For the intermediate neurons there is no “ground truth”

- The Rosenblatt algorithm cannot train intermediate layers

## AI winter

AI had been overhyped and had underdelivered; this led to funding cuts and the first "AI winter".

However, in the intervening years significant discoveries were made:

- Backpropagation $\rightarrow$ Learning algorithm for MLPs
    - 1986 - Backpropagation - Geoffrey Hinton
    - 1989 - Backpropagation applied to multi-layer networks - Yann LeCun
- Recurrent networks $\rightarrow$ Neural Networks for arbitrarily long sequences
    - 1986 - RNNs - David Rumelhart
    - 1997 - LSTMs - Hochreiter and Schmidhuber
- Convolutional networks $\rightarrow$ Neural Networks that capture spatial properties
    - 1998 - CNNs - Yann LeCun
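
To make the idea behind backpropagation concrete: the error at the output is passed backwards through the weights using the chain rule, which gives each hidden neuron its own error signal even though it has no "ground truth". The sketch below is a minimal illustration under stated assumptions, not the formulation from the original papers: the 2-2-1 network size, sigmoid activations, squared-error loss, starting weights, learning rate and all function names are chosen only for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass of a 2-2-1 network; returns hidden activations and output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def loss(params, x, target):
    """Squared error for a single example."""
    _, output = forward(params, x)
    return 0.5 * (output - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss via the chain rule, from the output backwards."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(params, x)
    # Error signal at the output neuron: this is where the ground truth enters.
    delta_out = (output - target) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Error signal for each hidden neuron, propagated back through w_out.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out


def gradient_step(params, grads, learning_rate=0.5):
    """Move every parameter a small step against its gradient."""
    w_hidden, b_hidden, w_out, b_out = params
    g_w_hidden, g_b_hidden, g_w_out, g_b_out = grads
    new_w_hidden = [[w - learning_rate * g for w, g in zip(row, g_row)]
                    for row, g_row in zip(w_hidden, g_w_hidden)]
    new_b_hidden = [b - learning_rate * g for b, g in zip(b_hidden, g_b_hidden)]
    new_w_out = [w - learning_rate * g for w, g in zip(w_out, g_w_out)]
    return new_w_hidden, new_b_hidden, new_w_out, b_out - learning_rate * g_b_out


# Arbitrary starting weights and a single training example (illustrative only).
params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)
x, target = (1.0, 0.0), 1.0

grads = backprop(params, x, target)
loss_before = loss(params, x, target)
loss_after = loss(gradient_step(params, grads), x, target)
print(loss_before, loss_after)
```

Repeating the step over many examples is, in essence, how an MLP is trained; modern frameworks compute these gradients automatically.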

## The last decade has seen great progress in deep learning

- 2009 - GPUs for deep learning - Andrew Ng
- 2011 - Demonstration of ReLU for deep neural networks - Yoshua Bengio
- 2012 - AlexNet wins ImageNet, cutting the error from 25% to 16%
- 2012 - Dropout technique - Geoffrey Hinton
- 2014 - Generative adversarial networks - Ian Goodfellow & Yoshua Bengio
- 2015 - CNN (ResNet) beats human error on ImageNet, 3% vs. 5%
- 2016 - AlphaGo - Google DeepMind
- 2018 - "NLP's ImageNet moment has arrived" - Sebastian Ruder
- 2019 - Human-level performance on GLUE - BERT, ELMo, OpenAI's GPT
- 2020 - GPT-3 achieves state-of-the-art (SOTA) language generation

# The rise of deep learning



1. GPUs for fast computation

    _(GPU for deep learning by Andrew Ng, 2009)_

2. Effective neural network components

    _e.g. dropout (Hinton, 2012) and ReLU (Bengio, 2011)_




3. Large annotated datasets 

    _(e.g. ImageNet, 2009)_
    


4. Frameworks 

    _(Keras, PyTorch, Theano etc.)_


## ImageNet dataset
![](../images/deep_learning_overview/dog_breeds.png)

## Frameworks
![three_quarters center](../images/deep_learning_overview/frameworks-time.jpg)


## Frameworks

* Theano (2007 - 2017-ish): open source project by the University of Montréal. 
* Keras (March 2015): 
    - originally a standalone API with various backends 
    - now part of TensorFlow
* TensorFlow (November 2015): open source library by Google
* PyTorch (September 2016): open source library by Facebook 


<img src="../images/deep_learning_overview/pytorch.png" width="200">
<img src="../images/deep_learning_overview/tensorflow.png" width="200">
<img src="../images/deep_learning_overview/keras.png" width="200">


# Summary

In this notebook we covered:
- [AI vs ML vs DL: what is what?]()
- [Applications of Deep Learning]()
- [History of Deep Learning]()
- [The rise of Deep Learning]()


![footer_logo](../images/logo.png)