![Practicum AI Logo image](https://github.com/PracticumAI/practicumai.github.io/blob/main/images/logo/PracticumAI_logo_250x50.png?raw=true) <img src='https://github.com/PracticumAI/practicumai.github.io/blob/main/images/icons/practicumai_beginner.png?raw=true' align='right' width=50 padding=50>
***
# *Practicum AI:* Deep Learning - Perceptron

> This exercise is adapted from the [W3 Schools Perceptrons](https://www.w3schools.com/ai/ai_perceptrons.asp) article and from Baig et al. (2020) The Deep Learning Workshop from [Packt Publishers](https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856) (Exercise 2.01, page 55).

Amelia is back! This time, she's been "asked" to pick the next best day to host the picnic... We'll use a simple [perceptron](https://developers.google.com/machine-learning/glossary#perceptron) to predict which day is right for the picnic based on certain conditions. As a note, this exercise lies somewhere between coding everything from scratch and relying on the pre-coded APIs (Application Programming Interfaces) that underlie the power of TensorFlow, Keras, and PyTorch. You will not need to create weight tensors by hand beyond this exercise, but hopefully by doing it this time, you will have a better understanding (and appreciation) of the details that are often lost in an API call such as `model.fit()`.

The table below shows some data Amelia has gathered about previous Annual Picnics. She's looking at how different factors affected faculty turnout ($y$, the output or [labels](https://developers.google.com/machine-learning/glossary#label) in our example) based on three input variables: whether rain was predicted ($x_1$), whether there was a guest speaker ($x_2$), and whether a lakefront location was available ($x_3$). Together, $x_1$, $x_2$, and $x_3$ will be combined into our input tensor $X$.

Case # | Rain Predicted? ($x_1$) | Speaker Announced? ($x_2$) | Lakefront Available? ($x_3$) | Turnout > 70% ($y$)
--|--------------------------|---------------------|-----------------------|----------------
1 | 1 (Yes) | 1 (Yes) | 1 (Yes) | Yes (1)
2 | 0 (No) | 1 (Yes) | 1 (Yes) | Yes (1)
3 | 1 (Yes) | 0 (No) | 1 (Yes) | Yes (1)
4 | 0 (No) | 0 (No) | 1 (Yes) | Yes (1)
5 | 1 (Yes) | 1 (Yes) | 0 (No) | Yes (1)
6 | 0 (No) | 1 (Yes) | 0 (No) | No (0)
7 | 1 (Yes) | 0 (No) | 0 (No) | No (0)
8 | 0 (No) | 0 (No) | 0 (No) | No (0)


#### 1. Import libraries

In [None]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras import activations

from matplotlib import pyplot as plt

#### 2. Create an input data matrix

Create an 8 x 3 matrix for our input data: eight rows (one per case in the table) and three columns. Remember that we have three inputs (we'll call them $x_1$, $x_2$, and $x_3$ for now); these are the columns in our input data.

The matrix below has the three input columns of our data table, using just the 0/1 values corresponding to the no/yes entries in the table. The comments help line up rows of the table with entries in our X variable.

```python
X = tf.Variable([[1.,1.,1.], # Case 1
                 [0.,1.,1.], # Case 2
                 [1.,0.,1.], # Case 3
                 [0.,0.,1.], # Case 4
                 [1.,1.,0.], # Case 5
                 [0.,1.,0.], # Case 6
                 [1.,0.,0.], # Case 7
                 [0.,0.,0.]], # Case 8
                 dtype = tf.float32)  # 8x3 input data table: 8 cases (rows), 3 features (columns)
print(X)
```

In [None]:
# Code it!


#### 3. Create a label tensor
Create a tensor of labels to hold our 'ground truth'. For each set of inputs, the label records whether or not turnout at the previous event was greater than 70%.

```python
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8--one for each case in the table         
y = tf.Variable([1, 1, 1, 1, 1, 0, 0, 0], dtype = tf.float32) 

y = tf.reshape(y, [8,1])  
print(y)
```

In [None]:
# Code it!


#### 4. Define some constants to set shape of weight matrix
Define two constants to be used in the next step when we define the connections weight matrix.
We can use the number of columns in the X table to determine the number of features or how many $x_i$'s we need in our model. Since we are looking for a binary decision (Yes/No) we only need one output value.

```python
NUM_FEATURES = X.shape[1]
OUTPUT_SIZE = 1
```

In [None]:
# Code it!


***

#### 5. Define connections weight matrix

We will need one weight for each feature, $x_i$ (rain predicted, is a speaker announced, etc.), in our feature matrix, labelled $X$. These weights are our $w_i$'s. We don't know what value they should take, so we can initialize them to 0. Another common option is to use a random number to initialize the weights--this is one reason different runs of model training may give different answers.

```python
W = tf.Variable(tf.zeros([NUM_FEATURES, OUTPUT_SIZE]), dtype = tf.float32)
print(W)
```

In [None]:
# Code it!


***

#### 6. Define bias variable

![](images/02.1_image_6.jpg)

Since we only have one neuron, we only need one bias value. As with the weights, we'll initialize it to 0--a random number would be another option here. Each bias can be written as $b_i$ and the matrix of all biases as $B$.

```python
B = tf.Variable(tf.zeros([OUTPUT_SIZE, 1]), dtype = tf.float32)
print(B)
```
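
With `X`, `W`, and `B` defined, it can help to check that their shapes are compatible before multiplying them. The sketch below uses plain Python tuples rather than TensorFlow tensors; `matmul_shape` is a hypothetical helper written for this illustration, and the shapes are the ones used in this notebook (8 cases, 3 features, 1 output).

```python
def matmul_shape(a_shape, b_shape):
    """Return the shape of A @ B, or raise if the inner dimensions differ."""
    rows_a, cols_a = a_shape
    rows_b, cols_b = b_shape
    if cols_a != rows_b:
        raise ValueError(f"Cannot multiply {a_shape} by {b_shape}")
    return (rows_a, cols_b)


X_SHAPE = (8, 3)  # 8 cases, 3 features
W_SHAPE = (3, 1)  # NUM_FEATURES x OUTPUT_SIZE
B_SHAPE = (1, 1)  # one bias for the single neuron

z_shape = matmul_shape(X_SHAPE, W_SHAPE)
print(z_shape)  # one net-input value per case
```

The single bias value is then added to every row of the product, so the result keeps one value per case.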

In [None]:
# Code it!


***

#### 7. Define a perceptron function

In this code block, we define a perceptron function, with one input argument, X, containing our three input data features. 

The function's first line implements a net input function.  It multiplies the input data matrix (X) by the weights (W) using the matrix multiplication function (matmul).  It then adds the bias (B) value to that product.

<div style="padding: 10px;margin-bottom: 20px;border: thin solid  #65BB7B;border-left-width: 10px;background-color: #fff"><strong>Note:</strong> This is the essential function of a neuron: gather the inputs, multiply each input by the weight for that input, add the products up and add in the bias.</div>
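
To make the note above concrete, here is a minimal plain-Python sketch of the net input for a single case, without TensorFlow. The weights and bias are made-up values chosen only for illustration; they are not the values the perceptron will learn.

```python
def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias for one neuron."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical weights and bias, chosen only for illustration
example_weights = [0.5, -0.25, 1.0]
example_bias = 0.1

# Case 3 from the table: rain predicted, no speaker, lakefront available
case_3 = [1.0, 0.0, 1.0]
z = net_input(case_3, example_weights, example_bias)
print(z)  # 0.5*1 + (-0.25)*0 + 1.0*1 + 0.1
```

`tf.matmul(X, W)` performs this same multiply-and-sum for all eight cases at once.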

The function's second line implements an activation function. This determines how, if at all, the output of the neuron (calculated above) is changed before passing it on. Here we use the `tanh` activation function, but TensorFlow offers other options: for example, you could use the `tf.sigmoid` function or select a function from the Keras activations (`activations`) module. Search the Keras documentation for a complete list of available functions.

Try out these other options, retrain the network, and see what happens.

```python
output = tf.sigmoid(z)
output = activations.relu(z)
output = activations.linear(z)
```
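
To get a feel for how these activation functions differ, the following plain-Python sketch evaluates each one on a few sample net-input values. The function definitions are simplified scalar stand-ins written for this illustration, not the TensorFlow or Keras implementations.

```python
import math


def sigmoid(z):
    """Squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def linear(z):
    """Leaves the net input unchanged."""
    return z


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  tanh={math.tanh(z):+.3f}  sigmoid={sigmoid(z):.3f}  "
          f"relu={relu(z):.1f}  linear={linear(z):+.1f}")
```

Note that `tanh` produces values between -1 and 1, while `sigmoid` stays between 0 and 1; this affects how the predictions in later steps should be read.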

Here is the code for the perceptron function:

```python   
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)  # Net input function
    output = tf.tanh(z)             # Activation function
    return output
```

In [None]:
# Code it!



Execute the perceptron function to see its initial predictions before any training.  All of its predictions ought to be 0 (remember we set all the weights and the bias to 0--so whatever the inputs are, they are all multiplied by 0 and have 0 added to the sum). 

```python
print(perceptron(X))        # Execute the perceptron to see its initial predictions before training.
```

In [None]:
# Code it!


#### 8. Training the Perceptron

Now that we have the elements of a simple, single-node perceptron in place, let's train the network using backpropagation. Backpropagation is implemented in the optimizer algorithm, so we don't need to code that ourselves.

The [learning rate](https://developers.google.com/machine-learning/glossary#learning-rate) determines the size of the steps taken towards a minimum of the loss, while the optimizer manages the weight update process during backpropagation.  Here, the Stochastic Gradient Descent (SGD) optimizer has been selected.

```python
learning_rate = 0.01
optimizer = tf.optimizers.SGD(learning_rate)
```
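
As a rough picture of what the optimizer does with the learning rate, plain SGD (without momentum) moves each parameter a small step against its gradient. The sketch below is a simplified plain-Python illustration; `sgd_step` is a hypothetical helper, and the gradient values are made up for the example.

```python
def sgd_step(params, grads, learning_rate):
    """One plain SGD update: new_param = param - learning_rate * gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


learning_rate = 0.01
params = [0.0, 0.0, 0.0]            # weights start at zero, as in step 5
grads = [-0.125, -0.125, -0.25]     # illustrative gradient values only

new_params = sgd_step(params, grads, learning_rate)
print(new_params)
```

A larger learning rate takes bigger steps per update, which can speed up training but may also overshoot.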

In [None]:
# Code it!



#### 9. Train the perceptron for 1000 epochs

An [epoch](https://developers.google.com/machine-learning/glossary#epoch) is a full training pass over the entire dataset.  Our loss (or error) function is defined as a lambda function (a single-line, inline function) in the first line of code in the loop block.  We use the `sigmoid_cross_entropy_with_logits` function, an appropriate choice for a binary (Yes/No) label, to calculate how far our predicted results are from the known results. We will not get into the technical details here, as that is outside the scope of this learning experience. The second line is where our SGD optimizer updates the weights and bias to reduce the model's total error.

```python
no_of_epochs = 1000

for n in range(no_of_epochs):
    loss = lambda:abs(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels = y, logits = perceptron(X))))
    optimizer.minimize(loss, [W, B])
```
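
For readers curious about what the loss computes, here is a plain-Python sketch of sigmoid cross-entropy for a single label and logit, using the numerically stable form given in the TensorFlow documentation: `max(x, 0) - x * z + log(1 + exp(-abs(x)))`, where `x` is the logit and `z` is the label. The function names below are written for this illustration only. In this notebook, the value passed as `logits` is the `tanh` output of the perceptron.

```python
import math


def sigmoid_cross_entropy(label, logit):
    """Numerically stable cross-entropy between a 0/1 label and sigmoid(logit)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_cross_entropy(label, logit):
    """Textbook form, shown for comparison; less stable for large |logit|."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Before training, every perceptron output is 0, so every case has the same loss.
labels = [1, 1, 1, 1, 1, 0, 0, 0]
initial_loss = sum(sigmoid_cross_entropy(y, 0.0) for y in labels) / len(labels)
print(initial_loss)  # equals log(2)
```

Training should push this average below its starting value of log(2), roughly 0.693.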

In [None]:
# Code it!



#### 10. Print the weights
<img alt="Friendship gif" src="https://i.giphy.com/media/3oEdvaba4h0I536VYQ/giphy.webp" align="right" width="300">

Notice that the model has learned that the faculty is awfully fond of the lakefront location! Of the weights, the 3rd one has the largest value.

Given that the input from each feature will be a 0 or a 1, multiplying by a larger weight will increase the contribution of that feature in the summation of all input-by-weight products ($x_i * w_i$) in determining the output of the neuron.

The perceptron has learned how to take the three input variables and weight them to predict the output. 

```python
print(W)
```
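
To see where the learned weights come from, the sketch below re-implements the training loop in plain Python, with the gradient worked out by hand via the chain rule. It assumes the same setup as this notebook (zero initial weights and bias, `tanh` activation, mean sigmoid cross-entropy loss, learning rate 0.01, 1000 full-batch updates). It is an illustration, not the TensorFlow implementation, and its values may differ slightly from TensorFlow's float32 results.

```python
import math

# Picnic data from the table: [rain, speaker, lakefront] -> turnout > 70%
X_ROWS = [[1, 1, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1],
          [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
Y = [1, 1, 1, 1, 1, 0, 0, 0]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train(x_rows, labels, learning_rate=0.01, epochs=1000):
    """Full-batch gradient descent on the mean sigmoid cross-entropy of tanh(z)."""
    weights = [0.0] * len(x_rows[0])
    bias = 0.0
    n = len(x_rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(x_rows, labels):
            z = sum(xi * wi for xi, wi in zip(x, weights)) + bias
            t = math.tanh(z)  # perceptron output, used as the logit
            # Chain rule: d(loss)/dz = (sigmoid(t) - y) * (1 - t^2) / n
            dz = (sigmoid(t) - y) * (1.0 - t * t) / n
            for j, xi in enumerate(x):
                grad_w[j] += dz * xi
            grad_b += dz
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


weights, bias = train(X_ROWS, Y)
print("weights:", weights)
print("bias:", bias)
```

In this sketch, the third weight (lakefront) ends up larger than the first two, which stay equal to each other because rain and speaker play symmetric roles in the table.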

In [None]:
# Code it!


#### 11. Print the bias

```python
print(B)
```

In [None]:
# Code it!


#### 12. Test the perceptron

The numbers in the output tensor reflect the perceptron's predictions for each of the input cases. These are not probabilities, but the **model's estimate of the output value**. We could set a threshold value and hold the picnic on the next day whose predicted value is over that threshold.

```python
print(perceptron(X))
```
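
The thresholding idea can be sketched in a few lines of plain Python. Both the cutoff of 0.5 and the prediction values below are made up for illustration; substitute the values your trained perceptron prints, and pick a cutoff that suits the activation function you used.

```python
THRESHOLD = 0.5  # hypothetical cutoff, not a recommended value


def recommend(predictions, threshold=THRESHOLD):
    """Return True for each case whose prediction exceeds the threshold."""
    return [p > threshold for p in predictions]


# Made-up predictions for the eight cases, for illustration only
example_predictions = [0.95, 0.90, 0.90, 0.80, 0.60, 0.30, 0.30, 0.10]

decisions = recommend(example_predictions)
for case_number, hold_picnic in enumerate(decisions, start=1):
    print(f"Case {case_number}: {'hold picnic' if hold_picnic else 'wait'}")
```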

In [None]:
# Code it!


Let's bring the X, y and predictions together to make it easier to read. Remember that `Yes=1` and `No=0` in the table.

```python
X_df = pd.DataFrame(X.numpy(), columns=['Rain Predicted?', 'Speaker Announced?', 'Lakefront Available?'])
y_df = pd.DataFrame(y.numpy(), columns=['Turnout > 70%?'])
pred_df = pd.DataFrame(perceptron(X).numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df
```

In [None]:
# Code it!


#### 13. Let's see how different choices would change the results

Let's make some changes to the data and see what happens to the learned weights and predictions. 

##### Change 1: The faculty are extremely fond of being talked at while they eat:

`y = tf.Variable([1, 1, 0, 0, 1, 0, 0, 0], dtype = tf.float32)`

##### Change 2: The faculty doesn't seem to mind the rain:

`y = tf.Variable([1, 0, 1, 0, 1, 0, 1, 0], dtype = tf.float32)`

Feel free to play with other parts of the model. Everything but the X inputs is replicated below to put it all in one place for easy reference. Comments point out hyperparameters that you might want to change.

In [None]:
## From step 3
# Outputs:       1, 2, 3, 4, 5, 6, 7, 8--one for each case in the table         
y = tf.Variable([1, 0, 1, 0, 1, 0, 1, 0], dtype = tf.float32) # Change 2 has been made; you'll need to make Change 1 yourself
y = tf.reshape(y, [8,1])  # convert to 8x1

## From step 4
NUM_FEATURES = X.shape[1]
OUTPUT_SIZE = 1

## From step 5
W = tf.Variable(tf.zeros([NUM_FEATURES, OUTPUT_SIZE]), dtype = tf.float32)

## From step 6
B = tf.Variable(tf.zeros([OUTPUT_SIZE, 1]), dtype = tf.float32)

## From step 7
def perceptron(X):
    z = tf.add(tf.matmul(X, W), B)      
    output = tf.tanh(z)                  # Activation function is a good hyperparameter to change 
    return output

## From step 8
learning_rate = 0.01  # Learning rate is a good hyperparameter to change
optimizer = tf.optimizers.SGD(learning_rate)

## From step 9
no_of_epochs = 1000 # Number of epochs is a good hyperparameter to change

for n in range(no_of_epochs):
    loss = lambda:abs(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels = y, logits = perceptron(X))))
    optimizer.minimize(loss, [W, B])
    
## From steps 10 on, printing the output
print(f'Weights: {W}')
print(f'Bias: {B}')

X_df = pd.DataFrame(X.numpy(), columns=['Rain Predicted?', 'Speaker Announced?', 'Lakefront Available?'])
y_df = pd.DataFrame(y.numpy(), columns=['Turnout > 70%?'])
pred_df = pd.DataFrame(perceptron(X).numpy(), columns=['Predictions'])
df = pd.concat([X_df, y_df, pred_df], axis=1)
df

## Bonus Exercises:

1. Adjust some of the model's hyperparameters and see how they affect its performance.
2. With so few inputs and samples, it's possible to get a sense of how the model would behave by just looking at the data. Add more data (features and/or samples) and see how that changes the model's behavior.
3. Using the following code as an example, add another layer to the network:

```python
# Define the perceptron function
def perceptron(X):
    z1 = tf.add(tf.matmul(X, W1), B1)  # Net input function for the first layer
    output1 = tf.tanh(z1)              # Activation function for the first layer
    z2 = tf.add(tf.matmul(output1, W2), B2)  # Net input function for the second layer
    output2 = tf.tanh(z2)              # Activation function for the second layer
    return output2
```
NOTE: You will need to make changes throughout the notebook to reflect the new layer and its output.
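
The two-layer function above relies on `W1`, `B1`, `W2`, and `B2`, which are not defined elsewhere in this notebook. The plain-Python sketch below shows one way their shapes could fit together for a single input case. The hidden-layer size of 2, the random initialization range, and the helper names are all assumptions made for this illustration.

```python
import math
import random

HIDDEN_SIZE = 2  # hypothetical number of hidden neurons


def init_matrix(rows, cols, rng, scale=0.5):
    """Small random weights in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def dense_tanh(inputs, weights, biases):
    """One layer for a single sample: tanh(inputs . weights + biases)."""
    outputs = []
    for j in range(len(biases)):
        z = sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
        outputs.append(math.tanh(z))
    return outputs


rng = random.Random(0)                 # fixed seed so runs are repeatable
W1 = init_matrix(3, HIDDEN_SIZE, rng)  # 3 features -> hidden layer
B1 = [0.0] * HIDDEN_SIZE
W2 = init_matrix(HIDDEN_SIZE, 1, rng)  # hidden layer -> 1 output
B2 = [0.0]


def two_layer_perceptron(x):
    hidden = dense_tanh(x, W1, B1)
    return dense_tanh(hidden, W2, B2)[0]


prediction = two_layer_perceptron([1.0, 1.0, 0.0])  # Case 5 from the table
print(prediction)
```

Random initial weights are used here because, with `tanh` and all-zero weights in both layers, the hidden outputs and the weight gradients are all zero, so only the output bias would ever change during training.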

4. Plot the decision boundary for the model.  This is a line that separates the two classes of data points.  

Example matplotlib code for plotting the decision boundary is provided below.  You will need to modify the code to work with your model.

```python
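# Plot the decision boundary over the first two features (x1, x2)
import numpy as np

x1 = np.linspace(-1, 1, 100)
x2 = np.linspace(-1, 1, 100)
X1, X2 = np.meshgrid(x1, x2)
# The perceptron expects three features, so hold x3 fixed.
# Assumption: x3 = 0 (no lakefront); use np.ones to see the other slice.
x3_fixed = np.zeros(X1.size)
X_grid = np.c_[X1.ravel(), X2.ravel(), x3_fixed].astype(np.float32)
# Convert the TensorFlow result to NumPy before reshaping it to the grid
y_grid = perceptron(tf.constant(X_grid)).numpy().reshape(X1.shape)
plt.contourf(X1, X2, y_grid, cmap=plt.cm.bwr, alpha=0.2)
# Overlay the training cases, colored by their labels
plt.scatter(X.numpy()[:, 0], X.numpy()[:, 1], c=y.numpy().ravel(), cmap=plt.cm.bwr)
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()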
```


