![Practicum AI Logo image](images/practicum_ai_logo.png) <img src='images/practicumai_deep_learning.png' alt='Practicum AI: Deep Learning Foundations icon' align='right' width=50>


***
# *Practicum AI:* Deep Learning - Perceptron


> This exercise is adapted from the [W3 Schools Perceptrons](https://www.w3schools.com/ai/ai_perceptrons.asp) article and from Baig et al. (2020), *The Deep Learning Workshop*, from [Packt Publishers](https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856) (Exercise 2.01, page 55).

<img alt="A cartoon of Dr. Amelia, a nutrition researcher, sitting at a computer thinking about food items which appear in a thought bubble." src="images/DrAmelia.jpg" align="right" width=250>Amelia is back! This time, she needs your help to analyze some of her survey data. As part of Amelia's dietary study, participants are also asked to follow a special nutrition plan, the **Dr. Amelia Recommended Nutrition Plan (the DARN Plan)**. We'll use a simple [perceptron](https://developers.google.com/machine-learning/glossary#perceptron) to predict if participants follow the DARN Plan. 

**Note:** Dr. Amelia's cartoon was generated with AI's assistance.
 
This exercise lies somewhere between coding everything from scratch and relying on the pre-coded APIs (Application Programming Interfaces) that underlie the power of TensorFlow, Keras, and PyTorch. **You will not need to create weight tensors beyond this exercise**. Still, by doing it this once, you will hopefully gain a better understanding (*and appreciation*) of the details often lost in an API call such as `model.fit()`.

The table below shows some data Amelia has gathered from participant surveys about their nutrition. She is looking at how different factors predict if participants follow her DARN Plan ($y$, the output or [labels](https://developers.google.com/machine-learning/glossary#label) in our example) based on three input variables: if participants submit photos of three meals a day ($x_1$), if participants report being satisfied with their food choices ($x_2$), and if participants report being generally happy ($x_3$). We will combine $x_1$, $x_2$, and $x_3$ into our input tensor $X$. Here, we are simplifying the question of the likelihood of following the DARN Plan to a Yes/No. 

Case # | Photos of 3 meals submitted? ($x_1$) | Satisfied with food choices? ($x_2$) | Generally happy? ($x_3$) | Following the DARN Plan? ($y$)
--|--------------------------|---------------------|-----------------------|----------------
1 | 1 (Yes) | 1 (Yes) | 1 (Yes) | Yes (1)
2 | 0 (No) | 1 (Yes) | 1 (Yes) | Yes (1)
3 | 1 (Yes) | 0 (No) | 1 (Yes) | Yes (1)
4 | 0 (No) | 0 (No) | 1 (Yes) | Yes (1)
5 | 1 (Yes) | 1 (Yes) | 0 (No) | Yes (1)
6 | 0 (No) | 1 (Yes) | 0 (No) | No (0)
7 | 1 (Yes) | 0 (No) | 0 (No) | No (0)
8 | 0 (No) | 0 (No) | 0 (No) | No (0)


## 1. Import libraries

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'> Note

> * We'll probably stop reminding you after this, but... remember not all red output is bad!
> * Also, remember to check that the correct kernel is selected.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt

## 2. Create an input data matrix

Create an 8 x 3 matrix for our input data: one row for each of the eight cases and one column for each of the three input variables (we'll call them $x_1$, $x_2$, and $x_3$ for now).

The matrix below has the three input columns of our data table, using just the 0/1 values corresponding to the no/yes entries in the table. The comments help match rows of the table with entries in our `X` variable. (Remember, we are using the capital letter `X` as our variable name here to remind us that this is a matrix with our input data).

In [2]:
X = torch.tensor([[1.,1.,1.], # Case 1
                 [0.,1.,1.], # Case 2
                 [1.,0.,1.], # Case 3
                 [0.,0.,1.], # Case 4
                 [1.,1.,0.], # Case 5
                 [0.,1.,0.], # Case 6
                 [1.,0.,0.], # Case 7
                 [0.,0.,0.]], # Case 8
                 dtype=torch.float32)  # 8x3, input data table
print(X)

tensor([[1., 1., 1.],
        [0., 1., 1.],
        [1., 0., 1.],
        [0., 0., 1.],
        [1., 1., 0.],
        [0., 1., 0.],
        [1., 0., 0.],
        [0., 0., 0.]])
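
As a quick sanity check (a minimal sketch, not part of the original exercise), the tensor's `shape` attribute confirms that `X` has eight rows (cases) and three columns (features), and slicing pulls out a single feature column. The name `happy_column` is only illustrative.

```python
import torch

# Same input data as above: 8 cases (rows) by 3 features (columns)
X = torch.tensor([[1., 1., 1.],
                  [0., 1., 1.],
                  [1., 0., 1.],
                  [0., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 0.],
                  [1., 0., 0.],
                  [0., 0., 0.]], dtype=torch.float32)

print(X.shape)            # rows first, then columns

# Column index 2 holds x_3, the "Generally happy?" answers
happy_column = X[:, 2]    # illustrative name
print(happy_column)
```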


## 3. Create a label tensor

Create a tensor of labels to hold our 'ground truth'. This indicates, for each set of input, whether or not the participant is following the DARN Plan. 

```python
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8 - one for each case in the table         
y = torch.tensor([1, 1, 1, 1, 1, 0, 0, 0], dtype=torch.float32) 

# Reshape to be 8 rows of 1 column  
y = y.reshape(8, 1) 
print(y)
```

In [3]:
# Code it!
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8 - one for each case in the table         
y = torch.tensor([1, 1, 1, 1, 1, 0, 0, 0], dtype=torch.float32) 

# Reshape to be 8 rows of 1 column  
y = y.reshape(8, 1) 
print(y)

tensor([[1.],
        [1.],
        [1.],
        [1.],
        [1.],
        [0.],
        [0.],
        [0.]])


## 4. Define some constants to set the shape of the weight matrix

Define two constants to be used in the next step when we define the connections weight matrix.

We can use the number of columns in the X table to determine the number of features or how many $x_i$ we have and, therefore, how many weights we need to store (one for each feature). We only need one output value since we are looking for a binary decision about plan adherence (Yes/No).

```python
num_features = X.shape[1]
output_size = 1
```

In [4]:
# Code it!
num_features = X.shape[1]
output_size = 1

***

## 5. Define connections weight matrix

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, W2, w3) and the bias term. The perceptron body multiplies the inputs by the weights and sums them and the bias, resulting in the output--whether or not the participant is following the DARN Plan. The three weights are highlighted here.](images/02_perceptron_section5.png)

In our weight matrix, we need one weight for each feature $x_i$ in $X$ (three photos submitted, satisfied with food choices, and generally happy). These weights are our $w_i$. We don't know what values they should take, so we initialize them with random numbers (`torch.rand` samples uniformly from the interval [0, 1)). This random start is one reason different runs of model training may give different answers. Another common option is to initialize the weights to 0, though that can cause issues in training.

The `requires_grad=True` part indicates that these parameters need to be updated using the [**gradients**](https://developers.google.com/machine-learning/glossary#gradient) during [**back propagation**](https://developers.google.com/machine-learning/glossary#backpropagation) when training the model.

```python
W = torch.rand(num_features, output_size, requires_grad=True)
print(W)
```

In [6]:
# Code it!
W = torch.rand(num_features, output_size, requires_grad=True)
print(W)

tensor([[0.6634],
        [0.8432],
        [0.6727]], requires_grad=True)
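
If you want the same "random" starting weights on every run, one option is to seed PyTorch's random number generator first. The sketch below is an optional aside, assuming a seed of 42 (an arbitrary choice) and the illustrative names `W_first`, `W_second`, and `W_zero`.

```python
import torch

# Seeding makes torch.rand produce the same values each time
torch.manual_seed(42)     # 42 is an arbitrary, assumed seed
W_first = torch.rand(3, 1, requires_grad=True)

torch.manual_seed(42)     # reset to the same seed
W_second = torch.rand(3, 1, requires_grad=True)

print(torch.equal(W_first, W_second))   # identical starting weights

# The zero-initialization alternative mentioned in the text
W_zero = torch.zeros(3, 1, requires_grad=True)
print(W_zero)
```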


***

## 6. Define bias variable

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, W2, w3) and the bias term. This is similar to the above image, but is highlighting the bias term](images/02_perceptron_section6.png)

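Since we only have one neuron, we only need one bias value. Again, we'll initialize it to a random number, this time drawn from a standard normal distribution with `torch.randn`, so it may be negative. Initializing it to 0 would be another option here. We can write each bias term as $b_i$ and the matrix of all biases as $B$.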

```python
B = torch.randn(output_size, 1, requires_grad=True)
print(B)
```

In [7]:
# Code it!
B = torch.randn(output_size, 1, requires_grad=True)
print(B)

tensor([[-1.6720]], requires_grad=True)


***

## 7. Define a perceptron function

![Diagram of the perceptron with 3 input variables (x1, x2, x3), 3 weights (w1, W2, w3) and the bias term. This is similar to the above image, but is highlighting the perceptron body.](images/02_perceptron_section7.png)


In the following code block, we define a perceptron function with one input argument, $X$, containing our three input data features. 

The function's first line implements a net input function. It multiplies the input data matrix ($X$) by the weights ($W$) using the matrix multiplication function (`torch.matmul`), then adds the bias ($B$) to that product.

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'>Note
> This is the essential function of a neuron: gather the inputs, multiply each input by the weight for that input, add the products up and add in the bias.

The function's second line implements an activation function. The activation function determines how the neuron's output (calculated above) is changed before passing it on. Here, we use the `tanh` activation function, but PyTorch offers other options: for example, you could use `torch.sigmoid` or select a function from the PyTorch `nn` library. See the [PyTorch documentation](https://docs.pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) for a complete list of available functions.

Try out these other options, retrain the network, and see what happens.

```python
output = torch.sigmoid(z)
output = torch.relu(z)
output = z  # linear activation
```
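
To get a feel for how these options differ before retraining, the sketch below applies each one to the same three made-up net input values. The values in `z` are assumptions chosen only for illustration.

```python
import torch

# Made-up net inputs: one negative, zero, one positive
z = torch.tensor([-2.0, 0.0, 2.0])

tanh_out = torch.tanh(z)        # squashes values into (-1, 1)
sigmoid_out = torch.sigmoid(z)  # squashes values into (0, 1)
relu_out = torch.relu(z)        # negative values become 0
linear_out = z                  # values pass through unchanged

print("tanh:   ", tanh_out)
print("sigmoid:", sigmoid_out)
print("relu:   ", relu_out)
print("linear: ", linear_out)
```

Notice that `tanh` is centered on 0 while `sigmoid` is centered on 0.5, which matters when you later pick a threshold for a Yes/No decision.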

In [8]:
def perceptron(X):
    z = torch.matmul(X, W) + B
    output = torch.tanh(z)        # Activation function
    return output

Execute the perceptron function to see its initial predictions before any training.

In [10]:
# Execute the perceptron to see its initial predictions before training.
print(perceptron(X))       

tensor([[ 0.4678],
        [-0.1549],
        [-0.3238],
        [-0.7613],
        [-0.1639],
        [-0.6798],
        [-0.7652],
        [-0.9318]], grad_fn=<TanhBackward0>)


If our model is doing well, the predictions would correspond with the known `y` values: [1, 1, 1, 1, 1, 0, 0, 0]. It's very unlikely that this will be the case, since we haven't trained the model yet!
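
To connect these numbers to the arithmetic inside the neuron, the sketch below recomputes the Case 1 prediction by hand. It assumes the weights and bias printed in steps 5 and 6 of this run (your random values will differ), and the names `W_initial`, `B_initial`, and `x_case1` are illustrative only.

```python
import math
import torch

# Values copied from the printed output of steps 5 and 6 (they vary per run)
W_initial = torch.tensor([[0.6634], [0.8432], [0.6727]])
B_initial = torch.tensor([[-1.6720]])
x_case1 = torch.tensor([[1., 1., 1.]])   # Case 1: all three answers are Yes

# Net input by hand: each input times its weight, summed, plus the bias
z_by_hand = 1 * 0.6634 + 1 * 0.8432 + 1 * 0.6727 + (-1.6720)

# The same calculation with matrix multiplication
z_matmul = (torch.matmul(x_case1, W_initial) + B_initial).item()

# Apply the tanh activation to get the prediction (about 0.468)
prediction = math.tanh(z_by_hand)

print(f"z by hand: {z_by_hand:.4f}, z by matmul: {z_matmul:.4f}")
print(f"tanh(z):   {prediction:.4f}")
```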

## 8. Training the Perceptron

Now that we have the elements of a simple, single-node perceptron in place, let's train the network using an algorithm called "stochastic gradient descent" (SGD). The purpose of SGD is to iteratively adjust the weights and bias parameters of the single neuron in our model and eventually, we hope, find values that make our neuron's predictions as good as possible. PyTorch implements this algorithm for us, so we don't need to code it ourselves.

The [learning rate](https://developers.google.com/machine-learning/glossary#learning-rate) determines the size of the steps taken toward the minimum of the loss. Here, we select the Stochastic Gradient Descent (SGD) optimizer and pass it the parameters it should update: `W` and `B`.

In [11]:
learning_rate = 0.01
optimizer = optim.SGD([W, B], lr=learning_rate)
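
To see what a single optimizer step does, here is a minimal sketch on a toy one-parameter problem. With plain SGD (no momentum), each step moves a parameter by the learning rate times its gradient. The toy loss $w^2$ and the learning rate of 0.1 are assumptions made for illustration, not values from this exercise.

```python
import torch
import torch.optim as optim

# Toy parameter and optimizer (values chosen only for illustration)
w = torch.tensor([2.0], requires_grad=True)
toy_optimizer = optim.SGD([w], lr=0.1)

# Toy loss: w squared, so the gradient is 2 * w = 4
toy_loss = (w ** 2).sum()
toy_loss.backward()

# What we expect by hand: new_w = w - learning_rate * gradient
expected_w = w.detach().clone() - 0.1 * w.grad

# Let the optimizer apply the update
toy_optimizer.step()

print(f"Gradient:           {w.grad.item():.1f}")
print(f"Updated by hand:    {expected_w.item():.1f}")
print(f"Updated by SGD:     {w.item():.1f}")
```

A larger learning rate would move `w` further in one step. A smaller one would need more steps to get to the same place.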

## 9. Train the perceptron for 1000 epochs

An [epoch](https://developers.google.com/machine-learning/glossary#epoch) is a complete training pass over the entire dataset. Our loss, or error, function is PyTorch's binary cross entropy with logits loss, `nn.BCEWithLogitsLoss`, an appropriate choice for a Yes/No problem like this one. It calculates how far our predicted results are from the known results. We will not get into the technical details here, as that is outside the scope of this learning experience. Our SGD optimizer seeks to minimize the model's total error.
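
For the curious, here is an optional peek at what "with logits" means. `nn.BCEWithLogitsLoss` applies a sigmoid to whatever values it receives and then computes binary cross entropy, so it should match `nn.BCELoss` applied to sigmoid outputs. The sketch below checks this on made-up values. `example_inputs` and `example_targets` are illustrative names, not part of the exercise.

```python
import torch
import torch.nn as nn

# Made-up model outputs and 0/1 targets, for illustration only
example_inputs = torch.tensor([[0.8], [-0.4], [0.1]])
example_targets = torch.tensor([[1.0], [0.0], [1.0]])

# Option 1: the combined loss used in this notebook
loss_combined = nn.BCEWithLogitsLoss()(example_inputs, example_targets)

# Option 2: apply the sigmoid ourselves, then plain binary cross entropy
loss_two_step = nn.BCELoss()(torch.sigmoid(example_inputs), example_targets)

print(loss_combined.item(), loss_two_step.item())
print(torch.allclose(loss_combined, loss_two_step))
```

In this notebook, the values handed to the loss are the perceptron's `tanh` outputs.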

### <img src='images/note_icon.svg' width=40, align='center' alt='Note icon'>Note
> The code below uses a `for` loop. This common programming construct allows you to loop, or iterate, through
> a list of items (the numbers 0 to 999 in our case). *Implicitly*, training will use `for` loops - for each epoch, do 
> this thing. *Explicitly*, however, after this notebook, we will use the API that automatically does this for us.
> Thus, we dropped coverage of `for` loops and other "flow control" methods from the *Python for AI* course. They are
> helpful to know about but are becoming less common in everyday code as APIs develop.
> [Click here for more details](https://wiki.python.org/moin/ForLoop).

In [12]:
no_of_epochs = 1000
criterion = nn.BCEWithLogitsLoss()

for n in range(no_of_epochs):
    # Zero the gradients
    optimizer.zero_grad()
    
    # Forward pass--make predictions
    predictions = perceptron(X)
    
    # Calculate loss
    loss = criterion(predictions, y)
    
    # Backward pass--calculate gradients
    loss.backward()
    
    # Update weights using the gradients
    optimizer.step()
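
Why does each pass through the loop start by zeroing the gradients? PyTorch adds new gradients to whatever is already stored in a parameter's `.grad`, so skipping that step would mix gradients from different epochs. The sketch below shows the accumulation on a toy parameter. The toy expression `3 * w` is an assumption chosen so the gradient is easy to read.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: the gradient of 3 * w with respect to w is 3
(3 * w).sum().backward()
grad_first = w.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added on top
(3 * w).sum().backward()
grad_accumulated = w.grad.clone()

# Zero the stored gradient, then run backward again
w.grad.zero_()
(3 * w).sum().backward()
grad_after_zero = w.grad.clone()

print(grad_first, grad_accumulated, grad_after_zero)
```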

## 10. Print the weights
<img alt="AI Generated cartoon of happy people eating healthy food." src="images/happy_people.jpg" align="right" width="300">

Notice that the model has learned that the general happiness of a participant is the best predictor of whether or not they are following the DARN Plan! The third weight, which corresponds to $x_3$ (generally happy), has the largest value.

Given that the input from each feature will be a 0 or a 1, multiplying by a larger weight will increase the contribution of that feature in the summation of all input-by-weight products ($x_i * w_i$) in determining the output of the neuron.

The perceptron has learned how to take the three input variables and weigh them to predict the output. 

**Note:** The image was generated with AI's assistance.

```python
print(W)
```

In [13]:
# Code it!
print(W)

tensor([[0.8934],
        [0.9721],
        [1.7131]], requires_grad=True)
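
To see each feature's contribution separately, the sketch below multiplies every input by its weight without summing. It assumes the trained weights printed above (yours will differ from run to run), and `W_trained` and `contributions` are illustrative names.

```python
import torch

X = torch.tensor([[1., 1., 1.],
                  [0., 1., 1.],
                  [1., 0., 1.],
                  [0., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 0.],
                  [1., 0., 0.],
                  [0., 0., 0.]], dtype=torch.float32)

# Trained weights copied from the printed output above (they vary per run)
W_trained = torch.tensor([[0.8934], [0.9721], [1.7131]])

# Multiply each column of X by its weight: one input-by-weight product per cell
contributions = X * W_trained.T      # shape 8x3

# Summing across each row gives the same result as the matrix multiplication
row_totals = contributions.sum(dim=1, keepdim=True)

print(contributions)
print(torch.allclose(row_totals, torch.matmul(X, W_trained)))
```

Wherever a participant answered Yes to "Generally happy?", the third column adds the largest amount to that row's total.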


## 11. Print the bias

```python
print(B)
```

In [14]:
# Code it!
print(B)

tensor([[-1.2701]], requires_grad=True)


## 12. Test the perceptron

The numbers in the output tensor reflect the perceptron's predictions for each input case. These are not probabilities but the **model's estimate of the output value**. We could set a threshold value and conclude the participant is following the DARN Plan when the value exceeds some number.

```python
print(perceptron(X))
```

In [15]:
# Code it!
print(perceptron(X))

tensor([[ 0.9804],
        [ 0.8886],
        [ 0.8708],
        [ 0.4161],
        [ 0.5338],
        [-0.2895],
        [-0.3599],
        [-0.8538]], grad_fn=<TanhBackward0>)
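
As a sketch of the thresholding idea, the code below converts the predictions printed above into Yes/No labels. The threshold of 0 is an assumption (it is the midpoint of the `tanh` output range), and the prediction values are copied from this run's output, so yours will differ.

```python
import torch

# Predictions copied from the printed output above (they vary per run)
predictions = torch.tensor([[0.9804], [0.8886], [0.8708], [0.4161],
                            [0.5338], [-0.2895], [-0.3599], [-0.8538]])
y = torch.tensor([1., 1., 1., 1., 1., 0., 0., 0.]).reshape(8, 1)

threshold = 0.0   # assumed cutoff: above 0 means "following the DARN Plan"

# True/False comparison converted to 1.0/0.0 labels
predicted_labels = (predictions > threshold).float()

# Fraction of cases where the thresholded prediction matches the known label
accuracy = (predicted_labels == y).float().mean().item()

print(predicted_labels.flatten())
print(f"Accuracy: {accuracy:.2f}")
```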


### Print things more clearly

Let's bring the `X`, `y` and predictions together to make it easier to read. Remember that `Yes=1` and `No=0` in the table.

In [16]:
X_df = pd.DataFrame(X.detach().numpy(), columns=['Photos of 3 meals submitted?', 'Satisfied with food choices?', 'Generally happy?'])
y_df = pd.DataFrame(y.detach().numpy(), columns=['Following the DARN Plan?'])
pred_df = pd.DataFrame(perceptron(X).detach().numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df

Index | Photos of 3 meals submitted? | Satisfied with food choices? | Generally happy? | Following the DARN Plan? | Predictions
--|--|--|--|--|--
0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.980428
1 | 0.0 | 1.0 | 1.0 | 1.0 | 0.888568
2 | 1.0 | 0.0 | 1.0 | 1.0 | 0.870797
3 | 0.0 | 0.0 | 1.0 | 1.0 | 0.416107
4 | 1.0 | 1.0 | 0.0 | 1.0 | 0.533754
5 | 0.0 | 1.0 | 0.0 | 0.0 | -0.289494
6 | 1.0 | 0.0 | 0.0 | 0.0 | -0.35986
7 | 0.0 | 0.0 | 0.0 | 0.0 | -0.85383


## 13. Let's see how different choices would change the results

### Did we run enough epochs?

You may want to increase the number of epochs used.
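
One way to judge this is to record the loss at every epoch and check whether it is still falling at the end. The sketch below is a self-contained variation of the training loop. It assumes zero-initialized weights so that it is reproducible, and `loss_history` is an illustrative name.

```python
import torch
import torch.nn as nn
import torch.optim as optim

X = torch.tensor([[1., 1., 1.], [0., 1., 1.], [1., 0., 1.], [0., 0., 1.],
                  [1., 1., 0.], [0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
y = torch.tensor([1., 1., 1., 1., 1., 0., 0., 0.]).reshape(8, 1)

# Zero initialization (an assumption here) keeps the sketch reproducible
W = torch.zeros(3, 1, requires_grad=True)
B = torch.zeros(1, 1, requires_grad=True)

optimizer = optim.SGD([W, B], lr=0.01)
criterion = nn.BCEWithLogitsLoss()

loss_history = []   # one loss value per epoch
for epoch in range(1000):
    optimizer.zero_grad()
    predictions = torch.tanh(torch.matmul(X, W) + B)
    loss = criterion(predictions, y)
    loss.backward()
    optimizer.step()
    loss_history.append(loss.item())

print(f"Loss after epoch 1:    {loss_history[0]:.4f}")
print(f"Loss after epoch 500:  {loss_history[499]:.4f}")
print(f"Loss after epoch 1000: {loss_history[-1]:.4f}")
```

If the loss is still dropping noticeably between the later checkpoints, more epochs may help. Since `pyplot` was imported as `plt` in step 1, `plt.plot(loss_history)` would show the whole curve.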

### Change the outcomes

Usually, we don't change the data we are working with, but in this example, we do this so that you can see the link between the input data and the weights learned. Let's change the participant outcomes and see what happens to the learned weights and predictions.

#### Change 1: participants are more likely to follow the DARN Plan when they like the food choices:

`y = torch.tensor([1, 1, 0, 0, 1, 0, 0, 0], dtype=torch.float32)`

#### Change 2: participants are more likely to follow the DARN Plan when they regularly submit three photos a day:

`y = torch.tensor([1, 0, 1, 0, 1, 0, 1, 0], dtype=torch.float32)`
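
Before retraining, it can help to check how closely each set of labels lines up with each input column. The sketch below counts, for every feature, the number of cases (out of 8) where the feature value equals the label. The names `label_sets` and `agreement` are illustrative, and this simple count is only a rough guide, not something the perceptron itself computes.

```python
import torch

X = torch.tensor([[1., 1., 1.], [0., 1., 1.], [1., 0., 1.], [0., 0., 1.],
                  [1., 1., 0.], [0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])

# The original labels plus the two alternatives described above
label_sets = {
    "Original": [1, 1, 1, 1, 1, 0, 0, 0],
    "Change 1": [1, 1, 0, 0, 1, 0, 0, 0],
    "Change 2": [1, 0, 1, 0, 1, 0, 1, 0],
}

agreement = {}
for name, labels in label_sets.items():
    y_alt = torch.tensor(labels, dtype=torch.float32).reshape(8, 1)
    # Compare every column of X with the labels, then count matches per column
    matches_per_feature = (X == y_alt).sum(dim=0)
    agreement[name] = matches_per_feature.tolist()
    print(f"{name}: matches for [x1, x2, x3] = {agreement[name]}")
```

The feature that agrees with the labels most often is the one you would expect to end up with the largest learned weight.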

Feel free to play with other parts of the model; everything but the X inputs is replicated below to put it all in one place for easy reference. Comments point out hyperparameters that you might want to change.

In [17]:
## From step 3
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8 - one for each case in the table         
y = torch.tensor([1, 1, 0, 0, 1, 0, 0, 0], dtype=torch.float32)  # Change 1 has been made, you'll need to make change 2
y = y.reshape(8, 1)  # convert to 8x1

## From step 4
num_features = X.shape[1]
output_size = 1

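## From step 5
# Note: initialized to zeros here rather than the random values used in step 5
W = torch.zeros(num_features, output_size, requires_grad=True)

## From step 6
# Note: also initialized to zero here rather than a random value
B = torch.zeros(output_size, 1, requires_grad=True)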

## From step 7
def perceptron(X):
    z = torch.matmul(X, W) + B
    output = torch.tanh(z)                  # Activation function is a good hyperparameter to change 
    return output

## From step 8
learning_rate = 0.01  # Learning rate is a good hyperparameter to change
optimizer = optim.SGD([W, B], lr=learning_rate)

## From step 9
no_of_epochs = 1000  # Number of epochs is a good hyperparameter to change
criterion = nn.BCEWithLogitsLoss()

for n in range(no_of_epochs):
    optimizer.zero_grad()
    predictions = perceptron(X)
    loss = criterion(predictions, y)
    loss.backward()
    optimizer.step()
    
## From steps 10 on, printing the output
print(f'Weights: {W}')
print(f'Bias: {B}')

X_df = pd.DataFrame(X.detach().numpy(), columns=['Photos of 3 meals submitted?', 'Satisfied with food choices?', 'Generally happy?'])
y_df = pd.DataFrame(y.detach().numpy(), columns=['Following the DARN Plan?'])
pred_df = pd.DataFrame(perceptron(X).detach().numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df

Weights: tensor([[0.1511],
        [0.9819],
        [0.1511]], requires_grad=True)
Bias: tensor([[-0.7947]], requires_grad=True)


Index | Photos of 3 meals submitted? | Satisfied with food choices? | Generally happy? | Following the DARN Plan? | Predictions
--|--|--|--|--|--
0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.453712
1 | 0.0 | 1.0 | 1.0 | 1.0 | 0.325948
2 | 1.0 | 0.0 | 1.0 | 0.0 | -0.456218
3 | 0.0 | 0.0 | 1.0 | 0.0 | -0.567347
4 | 1.0 | 1.0 | 0.0 | 1.0 | 0.325948
5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.185055
6 | 1.0 | 0.0 | 0.0 | 0.0 | -0.567347
7 | 0.0 | 0.0 | 0.0 | 0.0 | -0.661051


## Before continuing
###  <img src='images/alert_icon.svg' alt="Alert icon" width=40 align=center> Alert!
> Before continuing to another notebook within the same Jupyter session,
> use the **"Running Terminals and Kernels" tab** (below the File Browser tab) to **shut down this kernel**. 
> This will free up this notebook's GPU memory, making it available for
> your next notebook.
>
> Every time you run multiple notebooks within a Jupyter session with
> a GPU, this should be done.

----
## Push changes to GitHub <img src="images/push_to_github.png" alt="Push to GitHub icon" align="right" width=150>

 Remember to **add**, **commit**, and **push** the changes you have made to this notebook to GitHub to keep your repository in sync.

In Jupyter, those are done in the git tab on the left. In Google Colab, use File > Save a copy in GitHub.
