## Introduction



This notebook is part of the workshop "Mathematics of Deep Learning" run
by Aggregate Intellect Inc. ([https://ai.science](https://ai.science)), and is released
under 'Creative Commons Attribution-NonCommercial-ShareAlike CC
BY-NC-SA" license. This material can be altered and distributed for
non-commercial use with reference to Aggregate Intellect Inc. as the
original owner, and any material generated from it must be released
under similar terms.
([https://creativecommons.org/licenses/by-nc-sa/4.0/](https://creativecommons.org/licenses/by-nc-sa/4.0/))

In this notebook we will now look at the non-linear functions used throughout a neural network. 



## Work with non-linearities:



For each non-linearity function, compute it's output on some input $x$



### Affine map



The basis of neural networks, the affine map is an abstraction of a neuron which multiplies each input by a weight and then sums them up. A bias is added to properly center the result.   

Affine transformations are defined as 

$$ y = Ax + b $$

where $A$ is a matrix of weights, $x$ is the input vector, and $b$ is a bias vector. It can be seen as a linear transformation.


In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt

In [None]:
#what is this?
torch.manual_seed(1)  

<torch._C.Generator at 0x7f3a400e5930>

The matrix form of linear transformation for the data is defined as:


$$y= xA^T +b$$

In [1]:
# define an affine map where the input's size is 5 and output size is 3 (with nn.Linear)

In [None]:
# define a 2x5 data
data = torch.randn(2, 5)
print(data)

tensor([[ 0.6614,  0.2669,  0.0617,  0.6213, -0.4519],
        [-0.1661, -1.5228,  0.3817, -1.0276, -0.5631]])


In [None]:
# apply it to the data
print(lin(data))

In [None]:
# what happened?
# print the weights in lin.weight
# print the bias in lin.bias

## Non-linear functions

### Plot a rectified linear unit (ReLu) function



The ReLu function is defined by:

$$ y = \begin{cases} 
x &; \quad if \quad x > 0 \\ 
0 &; \quad otherwise 
\end{cases} $$



In [None]:
# define x to be a range of (-5,5)
x = 

In [None]:
# define y as relu(x) using torch.relu
y = 
print(y)

In [1]:
# plot
plt.plot(x,y)

### Plot a sigmoid function



Sigmoid function is defined as:

$$ y = \frac{1}{1 + e^{-x}} $$



In [None]:
# define x to be a range of (-5,5)
x = 

In [None]:
# define y as sigmoid(x) using torch.sigmoid
y = 

In [None]:
# plot()

### Plot a tanh function



tanh function is defined as:

$$ y = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$



In [None]:
# define x to be a range of (-5,5)
x = 

In [None]:
# define y as tanh(x) using torch.tanh
y = 

In [None]:
# plot()

### Define a ReLU layer in PyTorch



In [None]:
import torch.optim as optim

torch.manual_seed(1)

<torch._C.Generator at 0x7f3a400e5930>

In [None]:
lin = nn.Linear(5, 3) 
data = torch.randn(2, 5)
torch.relu(lin(data))

tensor([[0.1755, 0.0000, 0.0000],
        [0.0000, 0.2260, 0.1089]], grad_fn=<ReluBackward0>)

Try and apply ReLU to the image we downloaded in the last notebook



In [2]:
img = plt.imread('img.jpg')

### Experiment with softMax and verify it is a probability distribution function



Softmax is defined as:

$$ y = \frac{e^{x_j}}{\sum_j{e^{x_j}}} $$


In [None]:
# define x to be a random tensor with the size of 5
x = 
print(x)

In [None]:
# apply softmax to x along the 0th dimension
torch.softmax(x, dim=0)

In [None]:
# verify that it returns a normalized probability distribution along that axis
torch.softmax(x, dim=0).sum()

In [None]:
#take the output of the softmax, if this is a classification problem, what is the probability of the most probable class?

In [None]:
# repeat with a 5x2 random tensor - remember to use .sum(0) instead of sum()

Why do we bother with exponents in softmax? Why not just normalize the output so that we have a probability distribution? \\
The answer is that we want to mimic the *argmox* function; but a 'soft', probabilistic version of it.

Let's say our model output vector is `[1, 1, 5, 3]` \\
The argmax of the output is `[0, 0, 1, 0]`

Let's compare a normalized version with softmax:



In [None]:
logits = np.array([1, 1, 5, 3])

In [None]:
# if we just normalized the output we would get
print(logits/np.sum(logits))

In [None]:
# now lets look at the softmax version
print(np.exp(logits)/np.sum(np.exp(logits)))

We can see that softmax exaggerates the differences so that the maximum value is much greater than the other values. This is much closer to *argmax*, which is what we want.



## Hands-on Challenge



Now that you have seen the different non-linear functions, it's time to implement them ourselves. We will implement each function as a layer for our MLP: 

1.  linearlayer
2.  softmax
3.  relu
4.  sigmoid
5.  tanh

All the layers will have the same basic structure: 

-   initialize the required data structure(s)
-   forward function that is called to process input data
-   a reset function to clear out the outputs.

**Note** There is a reason to have an *out* tensor and not just pass through the results. Later we will need the output results to compute the gradients.

The code below is Code Challenge 1 + non-linear layers.

In [None]:
class layer():
    
    #layer class with only node_dim to specify
    #This is the representation of just one layer
    #we will use this as a base class for linearlayer and non-linear functions.
    
    def __init__(self, node_dim):
        """
        This init should be called via super() with the number
        of nodes as an argument.
        """
        #define basics that a layer would have: input, input_grad
        
    def forward(self, x):
        #define input as x
        
    def zero_grad(self):
        #clean your input_grad

However, the **linearlayer** adds two more arguments: the input dimension and a switch to include bias. Normally we want bias, but we will make it an option just in case.



In [None]:
class linearlayer(layer):
    # This could be a linear layer (with inputs and outputs)
    def __init__(self, in_dim, node_dim, bias=True):
         #You can inherit from your base class with super() builtin function.
        self.out = #Initialize your output
        self.weights = #Initialize your weights
        if bias:
            self.bias = #create tensor biases
        
        if bias:
            self.bias = np.zeros(in_dim)
            
    def forward(self, x):
        #pass inputs and create your output (Remember W.X + b)
        return #out
    
    def reset(self):
       #clean outputs

Go ahead and write the code for all the non-linearity layers using numpy + maths formulas. Then append the layers on your simple MLP, try and run it.


Here's a template.

In [None]:
class relu(layer):
    def __init__(self, node_dim):
        #Inheritate 
        
    def forward(self, x):
        #return non-linear values.
        return #np.clip(..)
    
class softmax(layer):
    def __init__(self, node_dim):
        #Inheritate 
        
    def forward(self, x):
        #return non-linear values.
        return #np.exp..
    
class sigmoid(layer):
    def __init__(self, node_dim):
        #Inheritate 
        
    def forward(self, x):
        #return non-linear values.
        return 
    
class tanh(layer):
    def __init__(self, node_dim):
        #Inheritate 
        
    def forward(self, x):
        #return non-linear values.
        return 

In [None]:
class MLP():
    def __init__(self):

        # The MLP will be a list with each layer as an item.
        self.net = []

        self.net.append(linearlayer(10, 20))
        self.net.append(relu(20))
        self.net.append(linearlayer(20, 4))
        self.net.append(softmax(4))

    def forward(self, x):

        # Input x for each layer and return the result back into x,
        # ready as input for the next layer.
        for layer in self.net:
            x = layer.forward(x)
        return x

    def reset(self):
        # traverse the MLP and call each layers 'reset' method
        for layer in self.net:
            layer.reset()

Let's see if it works



In [None]:
model = MLP()

x = np.random.random(10)

print(x)
print(model.forward(x))

Go ahead and try all the layers you made and see how they work.

