<center><h1 style="color:maroon">An Introduction to Neural Networks</h1>
    <img src="figures/09-neural_networks.jpeg" style="width:1300px">
    <h3><span style="color: #045F5F">Data Science & Machine Learning for Planet Earth Lecture Series</span></h3><h6><i> by Cédric M. John <span style="size:6pts">(2022)</span></i></h6></center>

## Plan for today's Lecture 🗓 

* Introduction to <code>TensorFlow.keras</code>
* Overview of the building blocks of neural networks
* Compiling and training neural networks
* Deep-Learning: adapting neural network architectures to specific tasks

## Intended learning outcomes 👩‍🎓

* Be comfortable using <code>TensorFlow.keras</code>
* Correctly select the type of neural layer for your task
* Compile, train and assess neural networks
* Gain confidence in the method in preparation for your Deep-Learning module

http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html

https://erogol.com/

# Introduction to Neural Networks
<br>

<center><img src="figures/DALLE_perceptron.png" style="width:900px;">
 © Cédric John, 2022; Image generated with <a href="https://openai.com/blog/dall-e/">DALL-E</a>
<br>Prompt: <i>A photo of the Mark I Perceptron machine in dramatic lighting with scientists in white lab coat fretting about</i>.</center>

# Neural Network: A Historical Perspective

#### Neural Networks have a long history starting with the "Perceptron"

<img src="figures/Mark_I_perceptron.jpeg" style="height:600px;padding:5px" align="left">
<img src="figures/Rosenblatt.jpg" style="height:600px;padding:3px" align="left">
<a href="https://en.wikipedia.org/wiki/Perceptron">Wikipedia</a>

## Neural Networks date back to the 70's

<img src="figures/ML-popularity.jpg" style="width:1600px">
<a href="https://erogol.com/">Eren Gölge, 2015</a>

## Technical Advancements for neural networks

<img src="figures/nn_timeline.jpg" style="width:1600px">
<a href="http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html">Beam, A., 2017</a>

### Compute Power using Graphic Processor Units (GPUs) and the rise of Deep Learning

<img src="figures/imagenet_progress.png" style="width:1300px">
<a href="http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html">Beam, A., 2017</a>

### Availability of (Labelled) Data

<img src="figures/ai-winter.png" style="width:1500px">
<a href="https://link.springer.com/article/10.1007/s10506-022-09309-8">Francesconi, 2022</a>



### Neural Networks (and the field of Deep-Learning) are a subfield of Machine Learning

<img src="figures/Circles.png" style="width:600px">
<a href="https://link.springer.com/article/10.1007/s10506-022-09309-8">Francesconi, 2022</a>



📃 **Deep Learning not (yet) better** than other ML approaches at predicting tabular data

🖼️ **Deep Learning** very **powerful with unstructured data** (images, NLP, sound, video, ...) and some highly structured data (time series)

🏎️ **Most recent progress in ML** is in the field of Deep-Learning

### In the `Deep-Learning` module you will learn to handle complex model architectures


<img src="figures/model_zoo1.png" style="width:250px" align="left">
<img src="figures/model_zoo2.jpeg" style="height:700px" align="left">

3️⃣ These will include e.g. <span style="color:blue">**Convolutional Neural Networks**</span>, <span style="color:brown">**Transformers**</span> and <span style="color:purple">**Recurrent Neural Networks**</span>

1️⃣ Today, we are limiting ourselves to <span style="color:teal">**Feedforward Neural Networks**</span>

# Ok, but what ARE Neural Networks?

## Biological Neurons

<img src="figures/Neuron3.png" style="width:1000px">
<a href="https://www.youtube.com/watch?v=_HMLZHQpQDI">Krigolson, 2019</a> (YouTube video)

* 🔣 Biological neurons take inputs through their dendrites, transform the input, and yield an output

* 🔥 The neuron <span style="color:brown">**only fires if a threshold in signal is reached**</span> (non-linear)

* 🤖 Artificial neurons are <span style="color:blue">*loosely*</span> inspired by the biological neuron

## Artificial Neurons

The combination of the linear regression followed by an activation function is effectively what is known as an **ARTIFICIAL NEURON**!

<img src="figures/neuron.png" style="width:800px">
<a href="https://www.amazon.com/dp/0131471392">Haykin, 2008</a>




* Notice the <span style="color:blue">**Activation Function**</span>

### Let's write one from scratch in Python!

Neural networks are surprisingly easy to code. Let's imagine that we have a feature vector (`X`, with 4 features) that pertains to the weather, and a label (`y`) that indicates whether it will rain (`1`) or not (`0`) in the next hour. Here is how a sample would look:


In [None]:
import numpy as np

# Example of a 'rainy day':

y = 1 
X = [1., -3.1, -7.2, 2.1]



☔ We want to **predict whether or not it will rain**!


We can write a function that returns a linear regression with some weights:


In [None]:
def linreg_1(X):
    return -3 + 2.1*X[0] - 1.2*X[1] + 0.3*X[2] + 1.3*X[3]

out_1 = linreg_1(X)

## Activation function

* 📏 As written above, our <span style="color:red">algorithm is simply a linear regression</span>.

 
* ❄️ The trick is to take the output of the linear function, and <span style="color:blue">transform it via an activation function</span>.

* 🚧 The activation function will <span style="color:teal">only output the value if certain conditions are met</span>

**Some well-known activation functions:**
<img src="figures/activation_functions.png" style="width:1000px">
                                                   
<a href="https://medium.com/@shrutijadon/survey-on-activation-functions-for-deep-learning-9689331ba092"> Jadon, 2018</a>

<span style="color:brown">(**ReLU** most commonly used these days)</span>
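As a rough sketch of the curves in the figure above, three common activation functions can be written in a few lines of NumPy. The function names are our own choice for illustration and are not taken from any library:

```python
import numpy as np

def relu(x):
    # Pass positive values through, clamp negative values to zero
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squash any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squash any real number into the interval (-1, 1)
    return np.tanh(x)

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))
print(sigmoid(z))
print(tanh(z))
```

Because they rely on NumPy, these versions accept a whole array of values at once rather than a single number.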

### Implementing the 'ReLU' function

In [None]:
def activation(x):
    if x > 0:
        return x
    else:
        return 0

out_1 = activation(out_1)

### Adding more neurons with the same inputs but different weights

We can
* apply **other** linear regressions (neurons) to the same input X
* followed by the **same** activation function
* but with different **(trainable) weights and biases**

In [None]:
def linreg_2(X):
    return -5 - 0.1*X[0] + 1.2*X[1] + 4.9*X[2] - 3.1*X[3]

out_2 = activation(linreg_2(X))



and:


In [None]:
def linreg_3(X):
    return -8 + 0.4*X[0] + 2.6*X[1] - 2.5*X[2] + 3.8*X[3]

out_3 = activation(linreg_3(X))



### We just wrote a layer of neurons!
Each neuron receives the same input (`X`), has different weights, and uses the same activation function.

In neural networks, the next step is to give the output of these neurons as input to the next layer of neurons
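The three separate functions above can also be seen as a single matrix operation. Here is a minimal sketch, assuming we stack the weights of `linreg_1`, `linreg_2` and `linreg_3` as the rows of a matrix `W` and their intercepts in a vector `b` (both names, and the helper `dense_relu`, are ours):

```python
import numpy as np

X = np.array([1., -3.1, -7.2, 2.1])

# One row of weights per neuron (same numbers as linreg_1, linreg_2, linreg_3)
W = np.array([[ 2.1, -1.2,  0.3,  1.3],
              [-0.1,  1.2,  4.9, -3.1],
              [ 0.4,  2.6, -2.5,  3.8]])
b = np.array([-3., -5., -8.])   # one bias per neuron

def dense_relu(X, W, b):
    # Linear step for all neurons at once, followed by ReLU
    return np.maximum(0.0, W @ X + b)

hidden = dense_relu(X, W, b)
print(hidden)
```

The vector `hidden` holds the same three values as `out_1`, `out_2` and `out_3`, computed in one go.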


## Building a Neural Network

A neural network is a complex function $f_{\theta}$:
$$f_{\theta}(X) = \hat{y}$$

Where <span style="color:blue">$X$</span> is the feature vector, <span style="color:purple">$\theta$</span> are the weights and biases of the linear regressions that take place within each neuron, and <span style="color:brown">$\hat{y}$</span> is the prediction output of the function.

![net](figures/neuralnet_4.png)
<a href="http://www.astroml.org/_images/fig_neural_network_1.png">Ž. Ivezić et al., 2014</a>

The way the neurons, weights and biases are connected is known as the **architecture** of your neural network.
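For a small network with one hidden ReLU layer followed by a single sigmoid output neuron, $f_{\theta}$ can be written out explicitly. The symbols $W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$ are notation introduced here for illustration (weights and biases of the first and second layer), and $\sigma(z) = 1/(1+e^{-z})$ is the sigmoid function:

$$\hat{y} = f_{\theta}(X) = \sigma\left(W^{(2)}\,\mathrm{ReLU}\left(W^{(1)}X + b^{(1)}\right) + b^{(2)}\right), \qquad \theta = \{W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}\}$$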


### Implementing the next layer (output)

In [None]:

def linreg_next_layer(X):
    return 5.1 + 1.1*X[0] - 4.1*X[1] - 0.7*X[2]

def activation_next_layer(x):
    # this is known as the sigmoid activation, used for classification tasks!
    return 1. / (1 + np.exp(-x))

def neural_net_predictor(X):
    
    out_1 = activation(linreg_1(X))
    out_2 = activation(linreg_2(X))
    out_3 = activation(linreg_3(X))
    
    outs = [out_1, out_2, out_3]
    
    y_pred = activation_next_layer(linreg_next_layer(outs))
    
    return y_pred



In [None]:

# Final prediction
y_pred = neural_net_predictor(X)

print(f' Probability of rain: {y_pred:.02f}')



### 🎉 Congrats! You just built your first (artificial) neural network

* and because it is in pure `Python`, it is **super inefficient**!
* Also, the weights and biases in our function are fixed (not trainable)
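To give an idea of how the inefficiency can be addressed, here is a sketch of the same network written with NumPy so that a whole batch of samples is processed at once. The array names (`W1`, `b1`, `W2`, `b2`), the function `predict_batch` and the second sample are our own additions for illustration; the weights are still fixed by hand:

```python
import numpy as np

# Hidden layer: 3 neurons x 4 features (weights of linreg_1, linreg_2, linreg_3)
W1 = np.array([[ 2.1, -1.2,  0.3,  1.3],
               [-0.1,  1.2,  4.9, -3.1],
               [ 0.4,  2.6, -2.5,  3.8]])
b1 = np.array([-3., -5., -8.])
# Output layer: 1 neuron x 3 hidden outputs (weights of linreg_next_layer)
W2 = np.array([[1.1, -4.1, -0.7]])
b2 = np.array([5.1])

def predict_batch(X_batch):
    # X_batch has shape (n_samples, 4); every row is handled at once
    hidden = np.maximum(0.0, X_batch @ W1.T + b1)
    logits = hidden @ W2.T + b2
    return 1.0 / (1.0 + np.exp(-logits))   # shape (n_samples, 1)

X_batch = np.array([[1., -3.1, -7.2, 2.1],    # the 'rainy day' sample
                    [0.,  0.0,  0.0, 0.0]])   # a made-up second sample
print(predict_batch(X_batch))
```

The first row of the output should match the probability printed by `neural_net_predictor(X)`.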

## Deep-Learning with <code>scikit-learn</code>

Although there is a module called <code>neural_network</code> in <code>scikit-learn</code>, it contains only three algorithms (<code>BernoulliRBM()</code>, <code>MLPClassifier()</code>, and <code>MLPRegressor()</code>) and, in practice, it is very limited:

In [None]:
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(30,45, 100,23), activation='relu', max_iter=1000)

x = [[1.2,0.1], [2.3,0.4], [1.3,0.2], [1.5,0.2], [4.6,0.12], [2.3,0.23]]
y = [0,1,1,1,0,1]

mlp.fit(x,y)

🚨 This works, but we are not able to devise our own architecture beyond deciding the number of layers and the number of neurons per layer.

💡 For this, we need a framework specific for Deep-Learning: the two most popular today are <code>TensorFlow.keras</code> written and supported by **Google**, and <code>PyTorch</code> written and supported by **Meta**. Let's explore their popularity!

# Introduction to <code>TensorFlow.keras</code>
<br>

<center><img src="figures/DALLE_tfkeras.png" style="width:900px;">
 © Cédric John, 2022; Image generated with <a href="https://openai.com/blog/dall-e/">DALL-E</a>
<br>Prompt: <i>The tensorflow keras popular deep-learning library having a drink with a few friends, digital art</i>.</center>

### The PyTorch vs TensorFlow War: Google Searches
<img src="figures/tf_vs_pt.png" style="width:300px">
<img src="figures/tensorflow-vs-pytorch.png" style="width:1800px">
<a href="https://buggyprogrammer.com/pytorch-vs-tensorflow-which-one-is-better/">Kumar, A., 2022</a>

### PyTorch is now the preferred framework for research...
<img src="figures/research_papers.png" style="width:800px">
<a href="https://www.assemblyai.com/blog/pytorch-vs-tensorflow-in-2022/">O'Connor, 2021</a>

### ... but TensorFlow remains the industry standard
<img src="figures/tf_jobs.png" style="width:800px">
<a href="https://www.reddit.com/r/MachineLearning/comments/rga91a/d_are_you_using_pytorch_or_tensorflow_going_into/">O'Connor, 2021</a>

## We believe that:

🚀 Fundamental principles matter more than what framework you use!

🍰 We will use <code>TensorFlow.keras</code> today, because it has a nice abstraction and makes it easier to understand the fundamentals of Deep-Learning the first time you encounter them.

🏭 This will also give you valuable experience with the industry standard, and you will use <code>TensorFlow.keras</code> later in the course if you are on **EDSML**.

🤖 For your **Deep-Learning Module**, you will work with <code>PyTorch</code>: this will give you experience with it and with all the newest research models out there!

✨ For your **Independent Research Project** and your future career, you should feel free to use either framework, or indeed any newer and better library out there!

## What is TensorFlow.Keras?

TensorFlow is the **backend**, i.e. the compute module, for deep-learning written by Google. Keras, written by François Chollet (Google), is the high-level API sitting on top of TensorFlow and making writing neural networks easy! 

<img src="figures/keras_and_tf.png" style="width:1300px">
<a href="https://drek4537l1klr.cloudfront.net/chollet2/v-3/Figures/keras_and_tf.png">Chollet, F.</a>


⚠️ <code>TensorFlow</code> and <code>Keras</code> were separate libraries until 2019: when googling for help, make sure to search for solutions for **tf.keras**, NOT simply keras! The two frameworks are very similar, but not fully compatible.

## Importing <code>TensorFlow.keras</code>

In [None]:
# Let's import it!

import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np

print(tf.__version__)

When using `tf.keras` (and more generally whenever building deep-learning models) there are three steps to follow:

* 1️⃣ **Define the architecture** of your model; it will be initialized with random weights (Keras sets the biases to zero by default)

* 2️⃣ **Define the methods** used to evaluate and train your model: this includes the cost function you want to use, the learning rate (remember the class on gradient descent?), etc.

* 3️⃣ **Fit the model** to your features `X` and labels `y` to obtain the best estimates of the weights $\theta$ for predicting $\hat{y}$

# Building A Feedforward Neural Network
<br>

<center><img src="figures/DALLE_panda.png" style="width:900px;">
 © Cédric John, 2022; Image generated with <a href="https://openai.com/blog/dall-e/">DALL-E</a>
<br>Prompt: <i>A 35 mm photography of a panda construction worker with a yellow hard hat putting together a complex array of electrical components,  digital art</i>.</center>

## Defining model architecture with the <code>tf.keras</code> <code>sequential API</code>

The `Sequential` API consists of adding layers to the network one by one. There is also a `Functional` API, where layers are called like functions: the output of one layer is passed explicitly as the input of the next, which allows more flexible architectures.
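For comparison, here is a minimal sketch of a small network (4 inputs, 3 ReLU neurons, 1 sigmoid output) written with the `Functional` API. The variable names are ours; treat this as an illustration rather than the recommended way to build this particular model:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer is called like a function on the output of the previous one
inputs = keras.Input(shape=(4,))
hidden = layers.Dense(3, activation='relu')(inputs)
outputs = layers.Dense(1, activation='sigmoid')(hidden)

functional_model = keras.Model(inputs=inputs, outputs=outputs)
functional_model.summary()
```

For a simple stack of layers like this one, both APIs describe the same architecture; the `Sequential` version is just shorter.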

Let's re-create our very simple neural network model from before:

In [None]:
from tensorflow.keras import Sequential, layers

model = Sequential()

# First layer : 3 neurons and ReLU as activation function
model.add(layers.Dense(3, activation='relu',input_dim=4)) 


➡️ The `input_dim` captures the shape of the input

➡️ The `Dense` class refers to a **fully connected** layer of neurons (in this case, 3 neurons)

We also need to add the output layer: because this is a binary classifier, we only need one output with the `sigmoid` activation:

In [None]:
model.add(layers.Dense(1, activation='sigmoid'))

That is it, our model is built! Now we can obtain useful information about the architecture of our model by calling the `.summary()` function on it:

In [None]:
model.summary()

As you can see, this gives you the type of each layer (both are dense layers), their output shape (3 and 1 neurons), and the number of trainable parameters in each layer and in total.
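The parameter counts reported by `.summary()` can be checked by hand: a dense layer has one weight per (input, neuron) pair plus one bias per neuron. A small sketch, where the helper name `dense_param_counts` is ours:

```python
def dense_param_counts(layer_sizes):
    # layer_sizes = [n_inputs, units_layer_1, units_layer_2, ...]
    counts = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        counts.append(n_in * n_out + n_out)   # weights + biases
    return counts

counts = dense_param_counts([4, 3, 1])
print(counts, sum(counts))
```

For our network with 4 inputs, 3 hidden neurons and 1 output neuron this gives 15 and 4 parameters, 19 in total.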

## Some Ground Rules for building Neural Networks

The problem of course is that you can build very complex architectures. For instance, we can do this:

![architecture](figures/neuralnet_0.png)


### Many architectures are possible, but there are a few ground rules:

1. <span style="color:darkgreen">**The FIRST LAYER**</span> needs the size of your input (we have seen this above)

2. <span style="color:brown">**The LAST LAYER**</span> is dictated by your task: 
    * A regression task requires a layer with **1 neuron** and a **linear** activation function  
    * Binary classification requires a layer with **1 neuron** and a **sigmoid** activation function
    * Multiclass classification requires a layer with as many neurons as there are classes, and a **softmax** activation function
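To see why these activations suit the last layer, here is a short NumPy sketch of `sigmoid` and `softmax`. The functions and the input values are our own illustration, not the Keras implementation:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([2.0, 1.0, 0.1])     # made-up raw outputs of a 3-class last layer
probs = softmax(z)
print(probs, probs.sum())          # class probabilities that sum to 1
print(sigmoid(0.0))                # a single probability for the positive class
```

Softmax turns one raw value per class into probabilities that sum to 1, while sigmoid turns a single raw value into one probability. A linear activation leaves the value untouched, which is what a regression needs.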

# A full example using the MNIST digits dataset


### Dataset

<img src="figures/mnist.png" style="padding:10px;width:800px;" align="left"/>

<span style="color:teal">**Today's dataset:** </span> Today we will use a classic dataset, the "Hello World" of Deep-Learning: <a href="https://paperswithcode.com/dataset/mnist"> The MNIST Handwritten Digits dataset</a>. It was introduced by **Yann LeCun** and colleagues, and contains 60,000 training images and 10,000 test images.


In [None]:
from tensorflow.keras.datasets.mnist import load_data

(X_train, y_train), (X_test, y_test) = load_data()

In [None]:
import matplotlib.pyplot as plt

first_image = X_train[0]
plt.imshow(first_image, cmap='gray');

In [None]:
X_train.shape

## Reshape array and normalize

In [None]:
X_train_norm = X_train.reshape(X_train.shape[0], 28*28)/255
X_test_norm = X_test.reshape(X_test.shape[0], 28*28)/255
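
To make the effect of this step visible without loading MNIST, here is a sketch on a made-up array of random "images" (the array `fake_images` is purely illustrative):

```python
import numpy as np

# Made-up stand-in for MNIST: 5 images of 28x28 pixels with values 0..255
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image into a vector of 784 values and scale to [0, 1]
flat = fake_images.reshape(fake_images.shape[0], 28 * 28) / 255
print(fake_images.shape, '->', flat.shape)
print(flat.dtype, flat.min(), flat.max())
```

Each image becomes one row of 784 floating-point values between 0 and 1, which is the shape a `Dense` layer expects.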

## Transform `y's` to categorical

In [None]:
from tensorflow.keras.utils import to_categorical

y_train_oh = to_categorical(y_train, 10)
y_test_oh = to_categorical(y_test, 10)
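
"Categorical" here means one-hot encoded: each digit label becomes a vector of ten zeros with a single 1 at the position of the class. A NumPy sketch of the same idea, where `one_hot` is our own helper and the labels are made up:

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[labels]

labels = np.array([5, 0, 4])       # made-up digit labels
encoded = one_hot(labels, 10)
print(encoded.shape)
print(encoded[0])
```

This format matches the 10 output neurons of a softmax layer, one per digit.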

## Building our model

Let's build our model! Our model will contain the following:

0️⃣ an input layer of `28*28` (784) neurons

1️⃣ a first hidden layer of **100 fully-connected neurons** and a **ReLU activation** function

2️⃣+3️⃣ Two more hidden layers (**30 & 10 fully-connected neurons**) with **ReLU activation**

4️⃣ a final layer appropriate for our multiclass classification task (10 classes => **10 neurons** with `softmax` activation)

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# Model definition
model = Sequential()
model.add(layers.Dense(100, activation='relu', input_dim=784))
model.add(layers.Dense(30, activation='relu'))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()



# Compiling and Training our Neural Network
<br>

<center><img src="figures/DALLE_train.png" style="width:900px;">
 © Cédric John, 2022; Image generated with <a href="https://openai.com/blog/dall-e/">DALL-E</a>
<br>Prompt: <i>A neural network being trained, digital art</i>.</center>

## Compiling your model

Deep-Learning models need to be `compiled` with `TensorFlow` before we can `fit` them. When we do this, `TensorFlow` builds an optimized calculation graph for our deep-learning model. The minimum parameters we need to specify are:

🛵 What <span style="color:red">**Optimizer**</span> we want to use to minimize our loss function (i.e. a flavour of gradient descent or SGD)

🌡️ What <span style="color:Blue">**loss function**</span> we will use:  common choices include **MAE** and **MSE** (regression), **binary crossentropy** (binary classification), or **categorical crossentropy** (multiclass classification)

📐 What <span style="color:teal">**metric**</span> we will use to evaluate our model during training (using the validation set)
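As an illustration of what a loss function measures, here is a NumPy sketch of categorical crossentropy. This is our own simplified version with made-up predictions, not the `TensorFlow` implementation:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot labels, y_pred: predicted probabilities,
    # both of shape (n_samples, n_classes)
    y_pred = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0., 1., 0.]])
confident = np.array([[0.05, 0.90, 0.05]])
unsure = np.array([[0.40, 0.30, 0.30]])

print(categorical_crossentropy(y_true, confident))   # small loss
print(categorical_crossentropy(y_true, unsure))      # larger loss
```

The loss only looks at the probability assigned to the true class: the closer it is to 1, the closer the loss is to 0.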

## Optimizer (solvers) for Deep-Learning

<img src="figures/solvers.gif" style="width:1000px">
<a href="https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c">Jiang,L., 2020</a>: link contains really cool animations of Gradient Descent!

⛰️ Many flavours of **gradient descent** for deep-learning: *Momentum*, *AdaGrad*, *RMSProp* and *Adam*

🚠 For all intents and purposes, ***Adam*** (**Ada**ptive **M**oment Estimation) combines the best of the previous generations of solvers and is the <span style="color:blue">go-to</span> solver

🫶 Much more on this in the **Deep-Learning Module** and **Optimization Module**
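For the curious, the Adam update rule fits in a few lines. The sketch below applies it to a toy one-parameter loss of our own choosing; the function name `adam_minimize` and the learning rate of 0.1 are assumptions for this example, not what `TensorFlow` uses internally:

```python
import numpy as np

def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    m, v = 0.0, 0.0                       # running averages of gradient and squared gradient
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)        # bias correction for the zero initialisation
        v_hat = v / (1 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy loss (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta_opt = adam_minimize(lambda th: 2 * (th - 3), theta=0.0)
print(theta_opt)
```

The running average `m` plays the role of momentum, while dividing by the square root of `v` adapts the step size to the recent gradient magnitudes.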

In [None]:
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy', 
    metrics=['accuracy'])

## Fitting the model

Once the model is compiled, we can `fit()` it in a manner similar to `scikit-learn`. The parameters we need here are:

1️⃣ Specify the features (`X_train_norm`)

2️⃣ Specify the labels  (`y_train_oh`)

3️⃣ Specify what portion (if any) of the training set should be used for validation

4️⃣ Specify the batch size, i.e. how many images the network processes before each weight update. Typical batch sizes are 8, 16, 32, and sometimes 64; larger batch sizes (such as the 256 used below) are also possible.

5️⃣ Specify how often the full dataset needs to be used in training, i.e. how many  `epochs` to use
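These settings together determine how many times the weights get updated. A quick back-of-the-envelope sketch, assuming the 60,000 MNIST training images and the values we pass to `fit()` below (variable names are ours):

```python
import math

n_samples = 60000          # size of the MNIST training set
validation_split = 0.2
batch_size = 256
epochs = 8

n_train = round(n_samples * (1 - validation_split))   # samples actually used for fitting
steps_per_epoch = math.ceil(n_train / batch_size)     # the last batch may be smaller
total_updates = steps_per_epoch * epochs

print(n_train, steps_per_epoch, total_updates)
```

With these numbers, 48,000 images are used for fitting, which gives 188 batches per epoch and 1,504 weight updates over 8 epochs.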

## How are the weights and bias of the model adjusted?
<br>
<img src="figures/backpropagation.gif" style="width:1600px">
<a href="https://machinelearningknowledge.ai/">MLK, date unknown</a>

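* Uses the chain rule to compute the gradient of the loss with respect to each weight, which attributes a share of the error to every weight in the network
* Computationally efficient because values from the forward pass are reused, so each partial derivative is computed only once per batch (during the backward pass)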
* **You will see details** during the **Deep-Learning module**
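As a small taste, here is a sketch of the chain rule at work on a single sigmoid neuron with a binary crossentropy loss, checked against a numerical estimate. The functions and numbers are our own illustration; a real framework does this automatically for every layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Binary crossentropy of a single sigmoid neuron on one sample
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradients(w, b, x, y):
    # Chain rule: dL/dz = p - y, then dz/dw = x and dz/db = 1
    p = sigmoid(w @ x + b)
    return (p - y) * x, (p - y)

x, y = np.array([3.39, 0.0, 10.32]), 1.0
w, b = np.array([1.1, -4.1, -0.7]), 5.1

grad_w, grad_b = gradients(w, b, x, y)

# Numerical check by finite differences on the first weight
h = 1e-6
w_shift = w.copy()
w_shift[0] += h
numeric = (loss(w_shift, b, x, y) - loss(w, b, x, y)) / h
print(grad_w[0], numeric)
```

The two printed values should agree closely, which confirms that the analytic gradient is correct.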

#### Let's fit and train the model!

* This will take a little while (so we limit ourselves to 8 epochs here).
* We also save the training data into an object as we will want to inspect this later

In [None]:
training_data = model.fit(X_train_norm, 
                       y_train_oh, 
                       batch_size=256, 
                       epochs=8, 
                       validation_split=0.2)

##  Evaluating our model

We can test how well our model works by using the `.evaluate` method on the test set:


In [None]:

model.evaluate(X_test_norm, y_test_oh)  # N.B. X_test must be prepared exactly like X_train (flattened and scaled); we already did this above
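
`.evaluate` returns the loss followed by the metrics chosen at compile time. Classification accuracy itself is simply the fraction of samples whose most probable class matches the true label, as in this sketch with made-up predictions:

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
y_true = np.array([0, 1, 2, 1])          # true class indices

y_pred = np.argmax(probs, axis=1)        # most probable class per sample
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)
```

Here three of the four made-up samples are classified correctly, so the accuracy is 0.75.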



## Using the training curve

The training curve is used a lot in **Deep-Learning** to evaluate models:

In [None]:
# Our training data contains a history of training. We can plot it!
history = training_data.history

In [None]:
history.keys()

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1,1, figsize=(20, 12))
ax.plot(history['loss'], label='Training loss')
ax.plot(history['val_loss'], label='Validation loss')
ax.set_xlabel('Number of epochs', size=16)
ax.set_ylabel('Loss', size=16)
ax.legend();
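
One common way to read such a curve is to look for the epoch where the validation loss is lowest: if the training loss keeps falling while the validation loss rises, the model is probably starting to overfit. A sketch on made-up numbers (the values in `example_history` are invented for illustration and are not real training output):

```python
# Made-up history values for illustration only (not real training output)
example_history = {
    'loss':     [0.60, 0.30, 0.20, 0.12, 0.08],
    'val_loss': [0.50, 0.32, 0.28, 0.30, 0.35],
}

val_loss = example_history['val_loss']
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)   # 0-based index

print(f'Lowest validation loss {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}')
```

The same two lines work on the real `history` dictionary returned by `fit()`.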

##  Specialized layers and conveniences in `TensorFlow`

🏺 `TensorFlow` comes loaded with a lot of convenient functions and classes to process data, build, and train neural networks


🌟 In the coming weeks, you will learn about some of those in `PyTorch`. Keep in mind that for most (if not all) of them there is a `TensorFlow` equivalent

🌼 These include specialized layers such as `convolution`, `dropout`, and `maxpool` 

🐱‍💻 But also facilities to load images, automatically perform data augmentation, and much more. Read the <a href="https://www.tensorflow.org/api_docs/python/tf/keras">tensorflow.keras documentation</a> for more info!

🍀 As an example, we can redo our exercise but, rather than manually preparing our digits, add a `Flatten` and a `Rescaling` layer to our architecture:

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

# Model definition
model = Sequential()
model.add(layers.Flatten(input_shape=(28,28)))
model.add(layers.Rescaling(scale=1./255.))
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dense(30, activation='relu'))
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.summary()

### Compile our new model

In [None]:
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy', 
    metrics=['accuracy'])

### Will it run???

In [None]:


training_data = model.fit(X_train, 
                       y_train_oh, 
                       batch_size=256, 
                       epochs=2, 
                       validation_split=0.2)

# Suggested Resources

## 📺 Videos 
#### Short videos from my Undergraduate Machine Learning Classes:
* 📼 <a href="https://youtu.be/-ohZINc7OCY?list=PLZzjCZ3QdgQCcRIwQdd-_cJNAUgiEBB_n">Introduction to Neural Networks</a>
* 📼 <a href="https://youtu.be/A1-HocOPXms?list=PLZzjCZ3QdgQCcRIwQdd-_cJNAUgiEBB_n">Convolutional Neural Networks</a>

#### Others
* 📼 <a href="https://youtu.be/aircAruvnKk">But what is a neural network? | Chapter 1, Deep learning</a>, by 3Blue1Brown
* 📼 <a href="https://youtu.be/04L4ZHiJbjs">TensorFlow & Keras Tutorial 2022 | Deep Learning With TensorFlow & Keras</a>, by Simplilearn (> 7 hours of video, full course!)




## 📚 Further Reading 
* 📖 <a href="https://towardsdatascience.com/machine-learning-for-beginners-an-introduction-to-neural-networks-d49f22d238f9">Machine Learning for Beginners: An Introduction to Neural Networks</a> by Victor Zhou, 2019
* 📖 <a href="https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/">
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition</a> Aurélien Géron, 2019