# Neural Networks in PyTorch

[PyTorch](http://pytorch.org/) is a framework for building and training neural networks. 


## Neural Networks

- Deep Learning is based on artificial Neural Networks (NNs) 
- the NNs are built from individual neurons (also called units) 
- the goal of the NN is to learn to recognize patterns in your data
- once the NN has been trained on samples of your data, it can make predictions by detecting similar patterns in future data

### Layers of Neural Networks 
NNs have a layered architecture: 

![Layers of a Neural Network](./imgs/nn_layers.png)


**input layer**
- the 1st layer
- contains the inputs ($x_1, x_2...x_n$)

**hidden layer**
- a set of linear models created with the input layer

**output layer**
- final layer
- where the linear models get combined to obtain a nonlinear model 

### What is happening inside the node?

If we zoom in on one of the hidden or output nodes, we see the following: 

![Hidden Layer](./imgs/nn_layers_zoom.png)

1. Each unit has some weights and a bias 
    - they are updated during the network training depending on the error
2. The inputs are multiplied by the weights and summed together with the bias (= a linear combination) 
3. The sum is then passed through an activation function to get the unit's output
 
<div class="alert alert-block alert-success">
<b>Activation function:</b> 
<li> applies a nonlinear transformation to the linear combination, which generates the output </li>
<li> the output of the activation function is used as an input in the next layer </li>
<li> examples: sigmoid, softmax, tanh, ReLU function </li>
</div>
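
As a rough illustration of the activation functions listed above, the sketch below applies PyTorch's built-in versions to a small tensor. The input values in `z` are hypothetical and only chosen to show how each function treats negative, zero and positive inputs.

```python
import torch

# Hypothetical pre-activation values (linear combinations) for three units
z = torch.tensor([-1.0, 0.0, 2.0])

sigmoid_out = torch.sigmoid(z)         # each value squeezed into (0, 1)
tanh_out = torch.tanh(z)               # each value squeezed into (-1, 1)
relu_out = torch.relu(z)               # negative values become 0
softmax_out = torch.softmax(z, dim=0)  # non-negative values that sum to 1

print(sigmoid_out, tanh_out, relu_out, softmax_out, sep="\n")
```

Note that sigmoid, tanh and ReLU act on each element independently, while softmax normalises across the whole vector.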

Mathematically this looks like: 

$$
\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\
y &= f\left(\sum_i w_i x_i +b \right)
\end{align}
$$

You can think of input and weights as vectors.<br> 
With vectors this is the dot/inner product of two vectors:

$$
h = \begin{bmatrix}
x_1 \, x_2 \cdots  x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_1 \\
           w_2 \\
           \vdots \\
           w_n
\end{bmatrix}
$$
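
The two ways of writing the linear combination can be checked with a few lines of plain Python. This is only a sketch: the input, weight and bias values are made up so that the arithmetic is easy to verify by hand, and sigmoid is assumed as the activation function $f$.

```python
import math

def sigmoid(z):
    """Sigmoid activation, used here as the function f."""
    return 1 / (1 + math.exp(-z))

# Hypothetical values, chosen so the arithmetic is easy to check by hand
x = [1.0, 2.0]      # inputs x_1, x_2
w = [0.5, -0.25]    # weights w_1, w_2
b = 0.0             # bias

# Written out term by term: w_1*x_1 + w_2*x_2 + b
h_explicit = w[0] * x[0] + w[1] * x[1] + b

# The same value as a dot product of the two vectors, plus the bias
h_dot = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

y = sigmoid(h_dot)
print(h_explicit, h_dot, y)  # 0.0 0.0 0.5
```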



## Tensors

<div class="alert alert-block alert-success">
<b>Tensors:</b> 
<li> are a generalization of vectors and matrices </li>
<li> the fundamental data structure for neural networks  </li>
<li> PyTorch (and other deep learning frameworks as well) is built around them </li>
<li> behave like NumPy arrays </li>
</div>

**1D tensor**:
- a vector (is an instance of a tensor)
- we just have a single 1D array of values
<br>

**2D tensor**: 
- matrix
- we have values going in 2 directions --> from left to right and from top to bottom
 * so that we have individual rows + columns
 * we can do operations across the columns
 
<br>
**3D tensor**:
- an array with 3 indices (e.g. RGB color images) 
 * for every pixel there's some value for all red, green and blue channels --> for every individual pixel in a 2D image, we have 3 values
<br>

You can actually have 4D, 5D, 6D ... etc. tensors, but we normally work with 1D, 2D and 3D tensors. <br> 

![Tensors](./imgs/tensors.png)

- NN computations are just a bunch of linear algebra operations on tensors <br>

- PyTorch tensors 
    - can be added, multiplied, subtracted, etc, just like Numpy arrays.
    - we'll use them  pretty much the same way we'd use Numpy arrays
    - they come with some nice benefits though such as GPU acceleration 
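
A minimal sketch of the tensor types described above, assuming a channels x height x width layout for the image (the tensor sizes are arbitrary):

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])   # 1D tensor: 3 values
matrix = torch.ones(2, 3)                # 2D tensor: 2 rows, 3 columns
image = torch.zeros(3, 4, 4)             # 3D tensor: 3 colour channels of 4x4 pixels

print(vector.ndim, matrix.ndim, image.ndim)   # 1 2 3
print(image.shape)                            # torch.Size([3, 4, 4])

# Element-wise arithmetic works much as it does with NumPy arrays
doubled = vector * 2
shifted = matrix + vector      # the vector is added to every row of the matrix
col_sums = matrix.sum(dim=0)   # sum down each column
```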

## NumPy vs PyTorch 

PyTorch 
- takes these tensors and makes it simple to move them to GPUs for faster processing when training neural networks
- provides modules 
    - one that automatically calculates gradients (for backpropagation!) and 
    - another specifically for building neural networks

Altogether, PyTorch ends up being more coherent with Python and the NumPy/SciPy stack compared to TensorFlow and other frameworks. <br>

NumPy arrays can usually be replaced with PyTorch tensors, but the reverse is not true, because NumPy arrays offer neither GPU acceleration nor automatic gradients.
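
The sketch below illustrates the points above: converting between NumPy arrays and tensors, computing a gradient automatically, and moving a tensor to a GPU when one is present. The values are arbitrary, and the device check is included so that the code also runs on a machine without a GPU.

```python
import numpy as np
import torch

# NumPy array -> tensor; the two objects share the same memory on the CPU
array = np.array([1.0, 2.0, 3.0])
tensor = torch.from_numpy(array)

tensor.mul_(2)          # in-place change to the tensor ...
print(array)            # ... is visible in the array: [2. 4. 6.]

back = tensor.numpy()   # tensor -> NumPy array

# Automatic gradients: d(x^2)/dx at x = 3 is 6
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)           # tensor(6.)

# Move a tensor to the GPU only if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = tensor.to(device)
```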


## EXERCISE 1: Calculate the output of  a single neuron
<br> 

### STEPS
1. Give the unit random weights and a bias 
2. Calculate the output of the network <br> 
    2.1. Sum the weighted inputs together (= a linear combination)  <br> 
    2.2. Then pass through an activation function to get the unit's output <br> 

![Simple Neuron](./imgs/simple_neuron.png)

**Parts of this neural network:** <br>
$x_1, x_2, ... x_n$: input features (row vector) - features of the input data for your network <br>
$w_1, w_2, ... w_n$: weights (row vector) <br>
$h$: linear combination of the inputs and weights, plus the bias <br>
$f$: activation function <br>
$y$: output (prediction)


### 1. Generating random data for features + weights, and set bias 

#### STEPS 
1. We generate some random data for features (a 1 x 5 matrix, i.e. a row vector with 5 elements) 
2. We generate some random data for weights (a 1 x 5 matrix, i.e. a row vector with 5 elements) 
3. Set bias (a 1 x 1 matrix)

> #### PYTORCH HELP 
> ##### Random Sampling

> `torch.manual_seed(seed)`
-  Sets the seed for generating random numbers.
-  We need to set this because we want to get the same numbers each time we run this notebook. 
>
>`torch.rand(size)`
 - returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1)
 - size: 
    * defines the shape of the tensor 
    * a tuple/list of the size that you want
>
> `torch.randn_like(input)`
 - returns a tensor 
    * with the same size as input
    * that is filled with random numbers from a normal distribution with mean 0 and variance 1
 - input: a tensor whose shape we would like to copy 
 - is equivalent to `torch.randn(input.size())` with the same dtype and device as `input`

>[Random Sampling (Documentation)](https://pytorch.org/docs/stable/torch.html#random-sampling)    
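
The sampling functions above can be tried out with a short sketch like the following. The seed value and tensor sizes are arbitrary; the point is that resetting the seed reproduces the same numbers, and that `torch.rand` and `torch.randn_like` draw from different distributions.

```python
import torch

torch.manual_seed(7)
first = torch.randn(1, 5)

torch.manual_seed(7)                 # resetting the seed restarts the sequence
second = torch.randn(1, 5)
print(torch.equal(first, second))    # True

uniform = torch.rand(2, 3)           # uniform on [0, 1)
normal = torch.randn_like(uniform)   # standard normal, same shape as `uniform`
print(uniform.shape, normal.shape)
```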

In [1]:
# First, import PyTorch
import torch

In [2]:
#We set the random seed so we can reproduce the same results each time we use this given seed. 
torch.manual_seed(7)

<torch._C.Generator at 0x240f676f2d0>

#### 1.1. Random features 

In this case we want the features to be a matrix: 
- a 2D tensor of 1 row and 5 columns (= a row vector that has 5 elements) 
- containing random, normally distributed data

In [3]:
features = torch.randn((1, 5))
features

tensor([[-0.1468,  0.7861,  0.9468, -1.1143,  1.6908]])

#### 1.2. Random Weights  
we create another 2D tensor 
- with the same shape as `features`
- again containing values from a normal distribution

In [4]:
weights = torch.randn_like(features)
weights

tensor([[-0.8948, -0.3556,  1.2324,  0.1382, -1.6822]])

#### 1.3. Set bias
We create a single value for the bias 
- also from a normal distribution

In [5]:
# and a true bias term
bias = torch.randn((1, 1))
bias

tensor([[0.3177]])

### 2. Calculating the output of the neuron

Use the generated data to calculate the output of this simple single layer network.

#### STEPS 
1. Define the activation function (sigmoid) 
2. Calculate the output of the neuron

#### 2.1. Define the activation function 

<div class="alert alert-block alert-success">
<b>Sigmoid function:</b> 
   <li> squeezes the input values between 0 and 1 </li>
   <li> really useful for providing a probability </li>
    <li>  if you want your NN to output a probability, then sigmoid is what you want to use </li>
</div>

![Sigmoid](./imgs/sigmoid.png)


>####  PYTORCH HELP
> `torch.exp(x)`
- Returns a new tensor with the exponential of the elements of the input tensor
>
> [torch.exp() (Documentation)](https://pytorch.org/docs/stable/torch.html?highlight=exp#torch.exp)


In [6]:
def activation(x):
    """ Sigmoid activation function 
    
        Arguments
        ---------
        x: torch.Tensor
    """
    return 1/(1+torch.exp(-x))
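
As a quick sanity check, the hand-written sigmoid can be compared with PyTorch's built-in `torch.sigmoid`. The sketch below repeats the definition so that it runs on its own; the input values are arbitrary.

```python
import torch

def activation(x):
    """Sigmoid activation function, as defined above."""
    return 1 / (1 + torch.exp(-x))

x = torch.tensor([-2.0, 0.0, 2.0])
print(activation(x))

# The hand-written version should agree with the built-in one
matches_builtin = torch.allclose(activation(x), torch.sigmoid(x))
print(matches_builtin)
```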

>####  PYTORCH HELP
> `torch.sum(input)` and `input.sum()`
- do the same thing
- sum up all the elements in the given input tensor 
  
> [torch.sum() (Documentation)](https://pytorch.org/docs/stable/torch.html?highlight=torch%20sum#torch.sum)  


### 2.1. Solution 1 - torch.sum()

In [7]:
y = activation(torch.sum(features*weights) + bias)
y 

tensor([[0.1595]])

### 2.2. Solution 2 - tensor.sum()
**Steps**:
1. we're doing elementwise multiplication 
2. we sum up all the values in the tensors 

In [8]:
y = activation((features * weights).sum() + bias)
y

tensor([[0.1595]])

### 2.3. Solution 3 - Matrix multiplications 


- You can do the multiplication and sum in the same operation using a matrix multiplication. 
- In general, we'll want to use matrix multiplications since
    * they are more efficient and 
    * these linear algebra operations have been accelerated using modern libraries and high-performance computing on GPUs

> #### PYTORCH HELP 

> ##### Matrix multiplication
> - there are 2 ways to do matrix multiplication in PyTorch
> - both functions return the matrix product of two tensors

>`torch.matmul(tensor1, tensor2)`
- supports broadcasting 
    * its behavior depends on the dimensionality of the tensors

><div class="alert alert-block alert-success">
<b>Broadcasting:</b> <br>
if a PyTorch operation supports broadcast, then its Tensor arguments can be automatically expanded to be of equal sizes <br>
(without making copies of the data)<br>
</div>
        

>`torch.mm(tensor1, tensor2)`
- does not broadcast 
    - simpler and stricter about the tensors that you pass in
    - if we get something wrong, it throws an error instead of silently continuing the calculation 
        * e.g. if we try it with `features` and `weights` as they are, we get an error:

> ```python
> >> torch.mm(features, weights)

> ---------------------------------------------------------------------------
> RuntimeError                              Traceback (most recent call last)
> <ipython-input-13-15d592eb5279> in <module>()
> ----> 1 torch.mm(features, weights)

> RuntimeError: size mismatch, m1: [1 x 5], m2: [1 x 5] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1524590658547/work/aten/src/TH/generic/THTensorMath.c:2033
> ```
>
> - we see a RuntimeError: size mismatch error  <br>
>    * we'll see this error very often when designing NNs. <br>
>    * the problem: our tensors aren't the correct shapes to perform a matrix multiplication <br>

> **Most of the errors we are going to see when building networks, and a lot of the difficulty in designing the architecture of neural networks, come down to getting the shapes of the tensors to work together.**
-  a large part of debugging is looking at the shapes of the tensors as they go through the network
    
>`tensor.shape`:
- to see the shape of the tensor
- it works like this in other deep learning frameworks as well
> 
> [torch.mm (Documentation)](https://pytorch.org/docs/stable/torch.html#torch.mm) <br> 
> [torch.matmul (Documentation)](https://pytorch.org/docs/stable/torch.html#torch.matmul) <br>
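
The difference between the two functions can be seen in a small sketch. The tensors below are stand-ins filled with ones (not the random `features` and `weights` from this notebook), and `mm_failed` is a hypothetical flag used only to record that the error occurred.

```python
import torch

features = torch.ones(1, 5)
weights = torch.ones(1, 5)

# torch.mm insists on two 2D tensors with matching inner dimensions
mm_failed = False
try:
    torch.mm(features, weights)          # (1, 5) x (1, 5) is not a valid product
except RuntimeError as err:
    mm_failed = True
    print("torch.mm failed:", err)

# torch.matmul also accepts a 1D second argument (matrix-vector product)
vector = torch.ones(5)
product = torch.matmul(features, vector)
print(product.shape)                     # torch.Size([1])

# Broadcasting in an element-wise operation: (2, 3) + (1, 3) -> (2, 3)
summed = torch.zeros(2, 3) + torch.tensor([[1.0, 2.0, 3.0]])
print(summed.shape)                      # torch.Size([2, 3])
```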

In [9]:
features.shape

torch.Size([1, 5])

In [10]:
weights.shape

torch.Size([1, 5])

#### They have the same shape. What is the problem? <br>

<div class="alert alert-block alert-danger">
<b> REMEMBER FROM LINEAR ALGEBRA</b> <br>
for matrix multiplication, the number of columns in the first tensor must equal the number of rows in the second tensor <br>
</div>

This means we need to change the shape of `weights` from `(1, 5)` to `(5, 1)` to get the matrix multiplication to work.

> #### PYTORCH HELP

> ##### Reshaping Tensors 
> There are 3 options: 
>
> `tensor.reshape(a, b)` <br>
- returns a tensor with the same data as the original tensor and size `(a, b)`
- when possible this is a view of the original tensor; otherwise it is a clone, as in it copies the data to another part of memory
> 
> `tensor.resize_(a, b)` 
- returns the same tensor with a different shape. 
- if the new shape results in 
    - fewer elements than the original tensor:  some elements will be removed from the tensor (but not from memory)
    - more elements than the original tensor: new elements will be uninitialized in memory
- this method is performed **in-place**

> <div class="alert alert-block alert-success">
> <b>In-place operation:</b> 
>    <li> an operation that directly changes the content of a given tensor without making a copy </li>
>    <li> in PyTorch they are always postfixed with a _, like .add_() or .scatter_()  </li>
>    <li> Python operations like += or *= are also in-place operations  </li> 
> </div>
> 
> `weights.view(a, b)` 
> - returns a new tensor that shares its data with `weights` and has size `(a, b)`
> - usually the preferred way to reshape a tensor, since it never copies the data

> [Math Operations(Documentation)](https://pytorch.org/docs/stable/torch.html?highlight=torch%20mm#math-operations) <br>
> [Broadcasting (Documentation)](https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics)<br>
> [Reshape tensors (Documentation)](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape)<br>
> [Resize tensors (Documentation)](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.resize_)<br>
> [View tensors (Documentation)](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view)<br>
> [In-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244)<br>
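
The following sketch shows how `view` and `reshape` differ in practice. The tensors are small stand-ins built with `torch.arange`, not the notebook's random `weights`.

```python
import torch

weights = torch.arange(5.0).view(1, 5)   # stand-in for a (1, 5) weights tensor

column = weights.view(5, 1)              # same data, new shape
column[0, 0] = 100.0                     # writing through the view ...
print(weights[0, 0])                     # ... changes the original: tensor(100.)

# view needs a compatible memory layout; a transposed tensor is not contiguous
transposed = torch.arange(6).view(2, 3).t()
view_failed = False
try:
    transposed.view(6)
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = transposed.reshape(6)             # reshape falls back to copying the data
print(flat)                              # tensor([0, 3, 1, 4, 2, 5])
```

For the `(1, 5)` tensors used in this exercise the layout is always compatible, so `view(5, 1)` works without copying.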

In [11]:
weights.view(5,1)

tensor([[-0.8948],
        [-0.3556],
        [ 1.2324],
        [ 0.1382],
        [-1.6822]])

In [12]:
weights.view(5,1).shape

torch.Size([5, 1])

Calculate the output of this network using matrix multiplication:

In [13]:
y = activation(torch.mm(features, weights.view(5,1)) + bias)
y

tensor([[0.1595]])
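
To confirm that the three solutions are interchangeable, they can be run side by side. This is a sketch with its own arbitrary seed, so the numbers differ from the outputs above; the `@` operator is included as a common shorthand for `torch.matmul`.

```python
import torch

def activation(x):
    """Sigmoid activation function."""
    return 1 / (1 + torch.exp(-x))

torch.manual_seed(0)   # arbitrary seed, different from the one used above
features = torch.randn(1, 5)
weights = torch.randn_like(features)
bias = torch.randn(1, 1)

y_sum = activation(torch.sum(features * weights) + bias)
y_method = activation((features * weights).sum() + bias)
y_mm = activation(torch.mm(features, weights.view(5, 1)) + bias)
y_at = activation(features @ weights.T + bias)   # `@` is shorthand for torch.matmul

print(y_sum.shape, y_mm.shape)    # both torch.Size([1, 1])
print(y_sum, y_method, y_mm, y_at)
```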

## EXERCISE 2: Calculate the output of a Multilayer Neural Network 
<br>
- We can stack these single neurons into a multilayer neural network, which gives the network greater power to capture patterns and correlations in our data. 
- This way the output of one layer of neurons becomes the input for the next layer. 
- **With multiple input units and output units, we now need to express the weights as a matrix.**  <br>
<br>
<br>
![Multilayer Neural Network](./imgs/multilayer_diagram_weights.png)

**Parts of this neural network:** <br>
*the input layer*
- $x_1, x_2, ... x_n$: input features (row vector) <br>
- $w_{11}, w_{12}, ... w_{n2}$: weights (matrix) <br>
    * each weight connects one input to one hidden unit in the middle layer 

*the hidden layer*
- $h_1, h_2$: hidden units <br>

*the output layer*
- $y$: output (prediction)

<br> 

**Get the values of the hidden layer**:
- To get the values for the hidden layer, we do a matrix multiplication between our feature vector ($x_1$ to $x_n$) and our weight matrix. 
- We can express this network mathematically with matrices again and use matrix multiplication to get linear combinations for each unit in one operation. <br> 

*For example, the hidden layer ($h_1$ and $h_2$ here) can be calculated as*

$$
\vec{h} = [h_1 \, h_2] = 
\begin{bmatrix}
x_1 \, x_2 \cdots \, x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_{11} & w_{12} \\
           w_{21} &w_{22} \\
           \vdots &\vdots \\
           w_{n1} &w_{n2}
\end{bmatrix}
$$

When we multiply our inputs (features) with
- the first column of the weight matrix, then we are going to get the output $h_1$. 
- the second column of the weight matrix, then we are going to get the output $h_2$. 
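
The column-by-column reading of the weight matrix can be verified with a short sketch. The seed and the names `x`, `W`, `h_1` and `h_2` are chosen for this illustration only.

```python
import torch

torch.manual_seed(0)              # arbitrary seed for this sketch
x = torch.randn(1, 3)             # 3 input features
W = torch.randn(3, 2)             # one column of weights per hidden unit

h = torch.mm(x, W)                # shape (1, 2): [h_1, h_2]

h_1 = torch.mm(x, W[:, 0:1])      # first column only, kept 2D with a slice
h_2 = torch.mm(x, W[:, 1:2])      # second column only

print(h)
print(h_1, h_2)
```

Slicing with `0:1` rather than indexing with `0` keeps each column as a `(3, 1)` matrix, which is the shape `torch.mm` expects.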


The output for this small network is found by treating the hidden layer as inputs for the output unit. The network output is expressed simply as

$$
y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
$$

<br> 

### STEPS 
1. Generate some random data for features (a 1 x 3 matrix, i.e. a row vector)  
2. Define the size of each layer in the network (input: 3, hidden: 2, output: 1) 
3. Generate some random data for  <br>
    3.1. weights W1 (3 x 2 matrix) and W2 (2 x 1 matrix) <br>
    3.2. bias terms B1 (1 x 2 matrix) and B2 (1 x 1 matrix) <br>
4. Calculate the output of the network<br>
    4.1. Calculate the result of a single neuron (a hidden layer) <br>
    4.2. Stack the neurons up<br>

### 1. Generate some random data for features

In [14]:
### Generate some data
torch.manual_seed(7)

# Features are 3 random normal variables
features = torch.randn((1, 3))
features

tensor([[-0.1468,  0.7861,  0.9468]])

### 2. Define the size of each layer in our network

This is important because it tells us how many rows and columns the weight matrices need. 

In [15]:
n_input = features.shape[1]     # Number of input units, must match number of input features
n_hidden = 2                    # Number of hidden units 
n_output = 1                    # Number of output units

### 3. Generate random data
#### 3.1. for weights 

for the hidden and the output layer 

In [16]:
# Weights for inputs to hidden layer
W1 = torch.randn(n_input, n_hidden)
# Weights for hidden layer to output layer
W2 = torch.randn(n_hidden, n_output)

print('Weights for inputs to hidden layer:', W1)
print('Weights for hidden layer to output layer:', W2)

Weights for inputs to hidden layer: tensor([[-1.1143,  1.6908],
        [-0.8948, -0.3556],
        [ 1.2324,  0.1382]])
Weights for hidden layer to output layer: tensor([[-1.6822],
        [ 0.3177]])


#### 3.2. for bias terms 

for the hidden and the output layer 

In [17]:
# and bias terms for hidden and output layers
B1 = torch.randn((1, n_hidden))
B2 = torch.randn((1, n_output))

print('Bias terms for hidden layer:', B1)
print('Bias term for output layer:', B2)

Bias terms for hidden layer: tensor([[0.1328, 0.1373]])
Bias term for output layer: tensor([[0.2405]])


### 4. Calculate the output of the network

#### 4.1. calculate the result of a single neuron (a hidden layer)

In [18]:
h = activation(torch.mm(features, W1) + B1)
h

tensor([[0.6813, 0.4355]])

#### 4.2. Stack the neurons up 

using the hidden layer calculated in the previous step as the input for the next layer of our NN 

In [19]:
output = activation(torch.mm(h, W2) + B2)
output

tensor([[0.3171]])
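
The two steps can also be wrapped into a single function. The sketch below is one possible way to do this; `forward` is a hypothetical helper name, the seed is arbitrary, and the batch of 4 samples is added only to show that the same code handles several input rows at once because the bias rows are broadcast.

```python
import torch

def activation(x):
    """Sigmoid activation function."""
    return 1 / (1 + torch.exp(-x))

def forward(x, W1, B1, W2, B2):
    """Forward pass: input -> hidden layer -> output."""
    h = activation(torch.mm(x, W1) + B1)
    return activation(torch.mm(h, W2) + B2)

torch.manual_seed(0)   # arbitrary seed for this sketch
n_input, n_hidden, n_output = 3, 2, 1

W1 = torch.randn(n_input, n_hidden)
W2 = torch.randn(n_hidden, n_output)
B1 = torch.randn(1, n_hidden)
B2 = torch.randn(1, n_output)

out_single = forward(torch.randn(1, n_input), W1, B1, W2, B2)
out_batch = forward(torch.randn(4, n_input), W1, B1, W2, B2)

print(out_single.shape)   # torch.Size([1, 1])
print(out_batch.shape)    # torch.Size([4, 1])
```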

### The number of hidden units is a parameter of the network

- it is often called a **hyperparameter** to differentiate it from the weight and bias parameters
- in general, the more hidden units and layers a network has, the better able it is to learn from data and make accurate predictions
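
To make the distinction concrete, the sketch below counts how many weights and biases a network with one hidden layer has for a given number of hidden units. `count_parameters` is a hypothetical helper written for this illustration.

```python
def count_parameters(n_input, n_hidden, n_output):
    """Number of weights and biases in a network with one hidden layer."""
    weights = n_input * n_hidden + n_hidden * n_output
    biases = n_hidden + n_output
    return weights + biases

# The network from Exercise 2: 3 inputs, 2 hidden units, 1 output
print(count_parameters(3, 2, 1))    # 11

# Same inputs and output, but 10 hidden units
print(count_parameters(3, 10, 1))   # 51
```

The hyperparameter `n_hidden` is chosen before training, while the 11 (or 51) parameters it implies are the values that training adjusts.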

# Resources 

## Udacity
[The Original Notebook](https://github.com/udacity/deep-learning-v2-pytorch/tree/master/intro-to-pytorch) <br> 
[Single Layer Neural Networks](https://youtu.be/6Z7WntXays8) <br> 
[Single layer neural networks solution](https://youtu.be/mNJ8CujTtpo) <br> 
[Networks Using Matrix Multiplication](https://youtu.be/QLaGMz8Ca3E) <br> 
[Multilayer Networks Solution](https://youtu.be/iMIo9p5iSbE) <br> 
[Feedforward](https://youtu.be/hVCuvMGOfyY) <br> 


## Other 
[What is a Neural Net?](http://www.cormactech.com/neunet/whatis.html) <br>
[Linear Algebra cheatsheet Deep Learning](https://towardsdatascience.com/linear-algebra-cheat-sheet-for-deep-learning-cd67aba4526c)<br>
[25 Must Know Terms & concepts for Beginners in Deep Learning](https://www.analyticsvidhya.com/blog/2017/05/25-must-know-terms-concepts-for-beginners-in-deep-learning/) <br> 
[Applied Deep Learning - Part 1: Artificial Neural Networks](https://towardsdatascience.com/applied-deep-learning-part-1-artificial-neural-networks-d7834f67a4f6) <br> 
[Activation functions and it’s types-Which is better?](https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f) <br> 