## Questions

- What is deep learning?
- What is a neural network?
- Which operations are performed by a single neuron?
- How do neural networks learn?
- When does it make sense to use and not use deep learning?
- What tools are involved in deep learning?
- What is the workflow for deep learning?
- Why did we choose to use Keras in this lesson?

## Objectives

- Define deep learning
- Describe how a neural network is built up
- Explain the operations performed by a single neuron
- Describe what a loss function is
- Recall the sort of problems for which deep learning is a useful tool
- List some of the available tools for deep learning
- Recall the steps of a deep learning workflow
- Test that you have correctly installed the Keras, Seaborn and scikit-learn libraries


## What is Deep Learning?


### Deep Learning, Machine Learning and Artificial Intelligence

Deep learning (DL) is just one of many techniques collectively known as machine learning. Machine learning (ML) refers to techniques where a computer can "learn" patterns in data, usually by being shown numerous examples to train it. People often talk about machine learning being a form of artificial intelligence (AI). Definitions of artificial intelligence vary, but usually involve having computers mimic the behaviour of intelligent biological systems. Since the 1950s many works of science fiction have dealt with the idea of an artificial intelligence which matches (or exceeds) human intelligence in all areas. Although there have been great advances in AI and ML research recently, we can only come close to human-like intelligence in a few specialist areas and are still a long way from a general-purpose AI.
The image below shows some differences between artificial intelligence, machine learning and deep learning.


![](https://github.com/carpentries-lab/deep-learning-intro/raw/main/episodes/fig/01_AI_ML_DL_differences.png){
alt='An infographic showing the relation of artificial intelligence, machine learning, and deep learning. Deep learning is a specific subset of machine learning algorithms. Machine learning is one of the approaches to artificial intelligence.'
width='60%'
}


#### Neural Networks

A neural network is an artificial intelligence technique loosely based on the way neurons in the brain work.
A neural network consists of connected computational units called **neurons**.
Let's look at the operations of a single neuron.

##### A single neuron
Each neuron:

- has one or more inputs ($x_1, x_2, ...$), e.g. input data expressed as floating point numbers
- in most cases performs three main operations:
  + takes the weighted sum of the inputs, where ($w_1, w_2, ...$) are the weights
  + adds an extra constant weight (i.e. a bias term) to this weighted sum
  + applies an **activation function** to the result so far (activation functions are explained in the next section)
- returns one output value, again a floating point number
- can, for example, compute its output with the equation: $output = Activation(\sum_{i} (x_i*w_i) + bias)$


![](https://github.com/carpentries-lab/deep-learning-intro/raw/main/episodes/fig/01_neuron.png){alt='A diagram of a single artificial neuron combining inputs and weights using an activation function.' width='600px'}
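
As a rough illustration, the operations of a single neuron can be sketched in plain Python. The function names `relu` and `neuron_output` and all the numbers are made up for this sketch and do not come from any library. ReLU, which returns `max(0, z)`, is used here as just one possible activation function.

```python
def relu(z):
    """One possible activation function: 0 for negative values, z otherwise."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum + bias)


# Illustrative values only (assumed for this sketch)
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 2.0

# (1.0 * 0.5) + (2.0 * -1.0) + 2.0 = 0.5, and relu(0.5) = 0.5
print(neuron_output(inputs, weights, bias))
```

Passing a different function as `activation` changes only the last step; the weighted sum and the bias are computed in the same way.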

##### Activation functions
The goal of the activation function is to convert the weighted sum of the inputs to the output signal of the neuron.
This output is then passed on to the next layer of the network.
There are many different activation functions; three of them are introduced in the exercise below.

## Challenge: Activation functions

Look at the following activation functions:

**A. Sigmoid activation function**
The sigmoid activation function is given by:
$$ f(x) = \frac{1}{1 + e^{-x}} $$

![](https://github.com/carpentries-lab/deep-learning-intro/raw/main/episodes/fig/01_sigmoid.svg){alt='Plot of the sigmoid function' width='70%' align='left'}
<br clear="all" />

**B. ReLU activation function**
The Rectified Linear Unit (ReLU) activation function is defined as:
$$ f(x) = \max(0, x) $$

This involves a simple comparison and maximum calculation, which are basic operations that are computationally inexpensive.
It is also simple to compute the gradient: 1 for positive inputs and 0 for negative inputs.

![](https://github.com/carpentries-lab/deep-learning-intro/raw/main/episodes/fig/01_relu.svg){alt='Plot of the ReLU function'  width='70%' align='left'}
<br clear="all" />

**C. Linear (or identity) activation function (output=input)**
The linear activation function is simply the identity function:
$$ f(x) = x $$

![](https://github.com/carpentries-lab/deep-learning-intro/raw/main/episodes/fig/01_identity_function.svg){alt='Plot of the Identity function'  width='70%' align='left'}
<br clear="all" />


Match each of the following statements to the correct activation function:

1. This function enforces the activation of a neuron to be between 0 and 1
2. This function is useful in regression tasks when applied to an output neuron
3. This function is the most popular activation function in hidden layers, since it introduces non-linearity in a computationally efficient way.
4. This function is useful in classification tasks when applied to an output neuron
5. (optional) For positive values this function results in the same activations as the identity function.
6. (optional) This function is not differentiable at 0
7. (optional) This function is the default for Dense layers (search the Keras documentation!)

*Activation function plots by Laughsinthestocks - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=44920411,
https://commons.wikimedia.org/w/index.php?curid=44920600, https://commons.wikimedia.org/w/index.php?curid=44920533*
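
If you would like to explore the three activation functions from the challenge numerically, they can be sketched in plain Python as shown here. The function names are chosen for this example only, and the sketch is not numerically hardened (`math.exp` overflows for very large negative inputs to `sigmoid`).

```python
import math


def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """f(x) = max(0, x)"""
    return max(0.0, x)


def identity(x):
    """f(x) = x"""
    return x


# Evaluate each function on a few sample inputs (values chosen arbitrarily)
for x in [-2.0, 0.0, 2.0]:
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  relu={relu(x):.1f}  identity={identity(x):+.1f}")
```

Trying a wider range of inputs is a quick way to check your answers to the statements above.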

##### Combining multiple neurons into a network
Multiple neurons can be joined together by connecting the output of one to the input of another. Each connection has a weight that determines its 'strength', and these weights are adjusted during training. In this way, the combination of neurons and connections describes a computational graph; an example can be seen in the image below.

In most neural networks, neurons are aggregated into layers. Signals travel from the input layer to the output layer, possibly through one or more intermediate layers called hidden layers.
The image below shows an example of a neural network with three layers: each circle is a neuron, each line is an edge, and the arrows indicate the direction in which data moves.

![
Image credit: Glosser.ca, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons, 
[original source](https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg)
](fig/01_neural_net.png){
alt='A diagram of a three layer neural network with an input layer, one hidden layer, and an output layer.'
}
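
To make the idea of layers concrete, here is a minimal sketch of data flowing through a small network in plain Python. The layer sizes (3 inputs, 4 hidden neurons, 2 output neurons), all weights and biases, and the helper name `layer_output` are assumptions made up for this illustration. Real libraries such as Keras perform the same kind of calculation with optimised array operations.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def layer_output(inputs, weights, biases, activation):
    """Compute the outputs of one layer.

    weights[j] holds the incoming weights of neuron j, one per input.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(x * w for x, w in zip(inputs, neuron_weights))
        outputs.append(activation(weighted_sum + bias))
    return outputs


# Hypothetical network: 3 inputs -> 4 hidden neurons -> 2 output neurons
hidden_weights = [
    [0.5, -0.5, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.5],
    [0.0, 0.5, 0.5],
]
hidden_biases = [0.0, 0.5, 0.0, -0.5]
output_weights = [
    [1.0, -1.0, 0.5, 0.0],
    [0.0, 1.0, -0.5, 2.0],
]
output_biases = [0.0, 0.5]

x = [1.0, 2.0, 0.0]
# The outputs of the hidden layer become the inputs of the output layer
hidden = layer_output(x, hidden_weights, hidden_biases, relu)
output = layer_output(hidden, output_weights, output_biases, identity)
print("hidden layer:", hidden)
print("output layer:", output)
```

Each layer only needs the outputs of the layer before it, which is why signals can be computed one layer at a time from input to output.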

## Challenge: Neural network calculations


#### 1. Calculate the output for one neuron
Suppose we have:

- Input: X = (0, 0.5, 1)
- Weights: W = (-1, -0.5, 0.5)
- Bias: b = 1
- Activation function _relu_: `f(x) = max(x, 0)`

What is the output of the neuron?

_Note: You can use whatever you like: brain only, pen&paper, Python, Excel..._

#### 2. (optional) Calculate outputs for a network

Have a look at the following network where:

* $X_1$ and $X_2$ denote the two inputs of the network.
* $h_1$ and $h_2$ denote the two neurons in the hidden layer. They both have ReLU activation functions.
* $y$ denotes the output neuron. It has a ReLU activation function.
* The values on the arrows represent the weights associated with each input to a neuron.
* $b_i$ denotes the bias term of that specific neuron.

![](https://github.com/carpentries-lab/deep-learning-intro/raw/main/episodes/fig/01_xor_exercise.png){alt='A diagram of a neural network with 2 inputs, 2 hidden layer neurons, and 1 output.' width='400px'}

a. Calculate the output of the network for the following combinations of inputs:

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | ..|
| 0  | 1  | ..|
| 1  | 0  | ..|
| 1  | 1  | ..|

b. What logical problem does this network solve?

## Testing TensorFlow Installation
Let's check you have a suitable version of TensorFlow installed.
In your Jupyter notebook or interactive Python console run the following commands:

In [None]:
import tensorflow
print(tensorflow.__version__)

You should get a version number reported. At the time of writing 2.17.0 is the latest version.

## Testing Seaborn Installation
Let's check you have a suitable version of seaborn installed.
In your Jupyter notebook or interactive Python console run the following commands:

In [None]:
import seaborn
print(seaborn.__version__)

You should get a version number reported. At the time of writing 0.13.2 is the latest version.

## Testing scikit-learn Installation
Let's check you have a suitable version of scikit-learn installed.
In your Jupyter notebook or interactive Python console run the following commands:

In [None]:
import sklearn
print(sklearn.__version__)

You should get a version number reported. At the time of writing 1.5.1 is the latest version.
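
If you prefer, the three checks above can be combined into a single cell. This is only a convenience sketch: `report_versions` is a hypothetical helper written for this example, built on `importlib.import_module` from the Python standard library.

```python
import importlib


def report_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" for modules that do not define __version__
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions(["tensorflow", "seaborn", "sklearn"]).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any library reported as `NOT INSTALLED` needs to be installed before continuing with the lesson.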


## Keypoints

- Machine learning is the process where computers learn to recognise patterns in data.
- Artificial neural networks are a machine learning technique based on a model inspired by groups of neurons in the brain.
- Artificial neural networks can be trained on example data.
- Deep learning is a machine learning technique based on using many artificial neurons arranged in layers.
- Neural networks learn by minimizing a loss function.
- Deep learning is well suited to classification and prediction problems such as image recognition.
- To use deep learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, refining the model, and measuring performance, before we can classify data.
- Keras is a deep learning library that is easier to use than many of the alternatives such as TensorFlow and PyTorch.