# Introduction to Deep Learning with PyTorch

In this notebook, you'll get introduced to [PyTorch](http://pytorch.org/), a framework for building and training neural networks. In a lot of ways, PyTorch tensors behave like the arrays you love from Numpy. These Numpy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks. It also provides a module that automatically calculates gradients (for backpropagation!) and another module specifically for building neural networks. All together, PyTorch ends up fitting more naturally with Python and the Numpy/Scipy stack than TensorFlow and other frameworks.
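
As a quick preview of those two features, here is a minimal sketch that places a tensor on a GPU when one is available and lets autograd compute a gradient. The tensor values and the toy `loss` are made up for illustration; they are not part of the original notebook.

```python
import torch

# Use a GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# requires_grad=True asks autograd to track operations on this tensor.
x = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)

# Toy scalar "loss": x_1^2 + x_2^2 + x_3^2
loss = (x ** 2).sum()

# Backpropagate; the gradient d(loss)/dx_i = 2 * x_i is stored in x.grad.
loss.backward()
print(x.grad)
```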



## Neural Networks

Deep Learning is based on artificial neural networks, which have been around in some form since the late 1950s. The networks are built from individual parts approximating neurons, typically called units or simply "neurons." Each unit has some number of weighted inputs. These weighted inputs are summed together (a linear combination) and then passed through an activation function to get the unit's output.

<img src="assets/simple_neuron.png" width=400px>

Mathematically this looks like: 

$$
\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\
y &= f\left(\sum_i w_i x_i + b\right)
\end{align}
$$

With vectors, the weighted sum is the dot/inner product of the input vector and the weight vector:

$$
h = \begin{bmatrix}
x_1 \, x_2 \cdots  x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_1 \\
           w_2 \\
           \vdots \\
           w_n
\end{bmatrix}
$$
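
To make the single-unit computation concrete, the sketch below evaluates one unit both as an explicit weighted sum and as a dot product. The input, weight, and bias values are arbitrary, and the sigmoid is only one possible choice for the activation function $f$.

```python
import torch

def sigmoid(z):
    """Sigmoid activation, used here as an example choice for f."""
    return 1 / (1 + torch.exp(-z))

# Illustrative values (assumptions, not taken from the notebook)
x = torch.tensor([0.5, -1.0, 2.0])   # inputs x_1..x_n
w = torch.tensor([0.1, 0.4, -0.3])   # weights w_1..w_n
b = torch.tensor(0.2)                # bias

# Linear combination written as an explicit sum over w_i * x_i
h_sum = (w * x).sum() + b

# The same linear combination written as a dot product
h_dot = torch.dot(x, w) + b

# Unit output: activation applied to the linear combination
y = sigmoid(h_dot)
print(h_sum, h_dot, y)
```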

### Stack them up!

We can assemble these unit neurons into layers and stacks, into a network of neurons. The output of one layer of neurons becomes the input for the next layer. With multiple input units and output units, we now need to express the weights as a matrix.

<img src='assets/multilayer_diagram_weights.png' width=450px>

We can express this mathematically with matrices again and use matrix multiplication to get linear combinations for each unit in one operation. For example, the hidden layer ($h_1$ and $h_2$ here) can be calculated as:

$$
\vec{h} = [h_1 \, h_2] = 
\begin{bmatrix}
x_1 \, x_2 \cdots \, x_n
\end{bmatrix}
\cdot 
\begin{bmatrix}
           w_{11} & w_{12} \\
           w_{21} &w_{22} \\
           \vdots &\vdots \\
           w_{n1} &w_{n2}
\end{bmatrix}
$$

The output for this small network is found by treating the hidden layer as inputs for the output unit. The network output is expressed simply as:

$$
y =  f_2 \! \left(\, f_1 \! \left(\vec{x} \, \mathbf{W_1}\right) \mathbf{W_2} \right)
$$
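
The formula above can be sketched as a forward pass in a few lines. The layer sizes, the random weights, and the use of a sigmoid for both $f_1$ and $f_2$ are assumptions made for this example.

```python
import torch

torch.manual_seed(0)  # make the random weights reproducible

# Hypothetical layer sizes: 3 inputs, 2 hidden units, 1 output unit
n_input, n_hidden, n_output = 3, 2, 1

x = torch.randn(1, n_input)            # one input row vector
W1 = torch.randn(n_input, n_hidden)    # input-to-hidden weights
W2 = torch.randn(n_hidden, n_output)   # hidden-to-output weights

# Hidden layer: f_1(x W_1)
h = torch.sigmoid(torch.mm(x, W1))

# Network output: f_2(h W_2)
y = torch.sigmoid(torch.mm(h, W2))

print(h.shape, y.shape)
```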

## Tensors

It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (an RGB color image, for example). Tensors are the fundamental data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.

<img src="assets/tensor_examples.svg" width=600px>
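
A short sketch of the three cases described above, using `.dim()` to report the number of dimensions. The shapes are arbitrary; the 3 x 28 x 28 "image" assumes a channels-first layout with made-up dimensions.

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])   # 1-dimensional tensor
matrix = torch.ones(2, 3)                # 2-dimensional tensor
image = torch.zeros(3, 28, 28)           # 3-dimensional tensor (channels, height, width)

for name, t in [("vector", vector), ("matrix", matrix), ("image", image)]:
    # .dim() gives the number of indices, .shape the size along each one
    print(name, t.dim(), tuple(t.shape))
```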

With the basics covered, it's time to explore how we can use PyTorch to build a simple neural network.

In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import numpy as np
import torch

import helper

First, let's see how we work with PyTorch tensors. These are the fundamental data structures of neural networks and PyTorch, so it's important to understand how they work.

In [2]:
x = torch.rand(3, 4)
x

tensor([[0.3191, 0.0705, 0.3836, 0.7089],
        [0.8608, 0.1609, 0.5945, 0.7253],
        [0.8961, 0.8829, 0.8684, 0.6993]])

In [3]:
x.size()

torch.Size([3, 4])

In [4]:
y = torch.ones(x.size())
y

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [11]:
w = torch.ones(4, 3)
w

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [25]:
z = x + y
z

tensor([[1.3191, 1.0705, 1.3836, 1.7089],
        [1.8608, 1.1609, 1.5945, 1.7253],
        [1.8961, 1.8829, 1.8684, 1.6993]])

In [16]:
s = torch.mm(w, x)
s

tensor([[2.0761, 1.1143, 1.8466, 2.1336],
        [2.0761, 1.1143, 1.8466, 2.1336],
        [2.0761, 1.1143, 1.8466, 2.1336],
        [2.0761, 1.1143, 1.8466, 2.1336]])
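
The sketch below spells out what `torch.mm` does with these shapes and how it differs from `torch.dot`. The `@` operator is an equivalent way to write matrix multiplication. Note that `torch.dot` only accepts 1-D tensors, so passing matrices to it raises a `RuntimeError`. The random `x` here is a fresh tensor, so its values will not match the output above.

```python
import torch

x = torch.rand(3, 4)
w = torch.ones(4, 3)

# (4, 3) times (3, 4) gives a (4, 4) matrix
s = torch.mm(w, x)

# The @ operator performs the same matrix multiplication
s_op = w @ x

# Because w is all ones, every row of s is the column-wise sum of x
col_sums = x.sum(dim=0)

# torch.dot is only defined for 1-D tensors, so this fails for matrices
dot_failed = False
try:
    torch.dot(w, x)
except RuntimeError as err:
    dot_failed = True
    print("torch.dot failed:", err)
```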

In [18]:
m = torch.ones(3, 4) * 2
m

tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])

In [7]:
q = x + 1
q

tensor([[1.3191, 1.0705, 1.3836, 1.7089],
        [1.8608, 1.1609, 1.5945, 1.7253],
        [1.8961, 1.8829, 1.8684, 1.6993]])

In general, PyTorch tensors behave similarly to Numpy arrays. They are zero-indexed and support slicing.

In [20]:
z[:,0]

tensor([1.3191, 1.8608, 1.8961])

In [21]:
z[:, 1:]

tensor([[1.0705, 1.3836, 1.7089],
        [1.1609, 1.5945, 1.7253],
        [1.8829, 1.8684, 1.6993]])

Tensors typically have two forms of methods: one that returns a new tensor and another that performs the operation in place. That is, the values in memory for that tensor are changed without creating a new tensor. The names of in-place methods end with an underscore, for example `z.add()` and `z.add_()`.

In [26]:
# Return a new tensor z + 1
z.add(1)

tensor([[2.3191, 2.0705, 2.3836, 2.7089],
        [2.8608, 2.1609, 2.5945, 2.7253],
        [2.8961, 2.8829, 2.8684, 2.6993]])

In [27]:
# z tensor is unchanged
z

tensor([[1.3191, 1.0705, 1.3836, 1.7089],
        [1.8608, 1.1609, 1.5945, 1.7253],
        [1.8961, 1.8829, 1.8684, 1.6993]])

In [28]:
# Add 1 and update z tensor in-place
z.add_(1)

tensor([[2.3191, 2.0705, 2.3836, 2.7089],
        [2.8608, 2.1609, 2.5945, 2.7253],
        [2.8961, 2.8829, 2.8684, 2.6993]])

In [29]:
# z has been updated
z

tensor([[2.3191, 2.0705, 2.3836, 2.7089],
        [2.8608, 2.1609, 2.5945, 2.7253],
        [2.8961, 2.8829, 2.8684, 2.6993]])
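
One way to confirm the difference is to compare where the results live in memory. This sketch uses `.data_ptr()`, which returns the address of a tensor's first element; the small tensor of ones is just an illustration.

```python
import torch

z = torch.ones(2, 2)

# Out-of-place: the result is a new tensor with its own memory
out = z.add(1)
shares_memory_out = out.data_ptr() == z.data_ptr()

# In-place: the result refers to the same memory as z
result = z.add_(1)
shares_memory_inplace = result.data_ptr() == z.data_ptr()

print(shares_memory_out, shares_memory_inplace)
```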

### Reshaping

Reshaping tensors is a really common operation. First, to get the size and shape of a tensor, use `.size()`. Then, to change its shape, use `.resize_()`. Notice the underscore: resizing is an in-place operation. When the new shape holds fewer elements than the original, `.resize_()` keeps the leading elements in memory order and drops the rest from the tensor, so some values are no longer visible afterwards.

In [30]:
z.size()

torch.Size([3, 4])

In [31]:
z.resize_(2, 3)

tensor([[2.3191, 2.0705, 2.3836],
        [2.7089, 2.8608, 2.1609]])

In [24]:
z

tensor([[2.3191, 2.0705, 2.3836],
        [2.7089, 2.8608, 2.1609]])

In [33]:
p = torch.rand(3, 2)
p

tensor([[0.8234, 0.5502],
        [0.1638, 0.6242],
        [0.5250, 0.2211]])

In [34]:
p.resize_(2,3)

tensor([[0.8234, 0.5502, 0.1638],
        [0.6242, 0.5250, 0.2211]])
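
When the goal is to change the shape while keeping every element, `.view()` and `.reshape()` are common alternatives to `.resize_()`. The sketch below is an addition to the notebook, with a made-up 3 x 4 tensor. It shows that `.view()` refuses a shape with a different number of elements and that a view shares memory with the original tensor.

```python
import torch

z = torch.arange(12.0).view(3, 4)   # values 0..11 arranged as 3 rows of 4

flat = z.view(-1)       # -1 lets PyTorch infer the size: 12 elements
wide = z.view(2, 6)     # same 12 elements, new shape
tall = z.reshape(4, 3)  # reshape also keeps all 12 elements

# A shape with a different number of elements is rejected
view_failed = False
try:
    z.view(2, 3)
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# A view shares memory with the original, so this also changes z[0, 0]
wide[0, 0] = 100.0
print(z[0, 0])
```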

## Numpy to Torch and back

Converting between Numpy arrays and Torch tensors is super simple and useful. To create a tensor from a Numpy array, use `torch.from_numpy()`. To convert a tensor to a Numpy array, use the `.numpy()` method.

In [36]:
a = np.random.rand(4,3)
a

array([[0.35161117, 0.0659321 , 0.66173763],
       [0.17159856, 0.71392869, 0.75407157],
       [0.26950784, 0.45262912, 0.74377045],
       [0.30317221, 0.78095687, 0.99047428]])

In [37]:
b = torch.from_numpy(a)
b

tensor([[0.3516, 0.0659, 0.6617],
        [0.1716, 0.7139, 0.7541],
        [0.2695, 0.4526, 0.7438],
        [0.3032, 0.7810, 0.9905]], dtype=torch.float64)

In [38]:
b.numpy()

array([[0.35161117, 0.0659321 , 0.66173763],
       [0.17159856, 0.71392869, 0.75407157],
       [0.26950784, 0.45262912, 0.74377045],
       [0.30317221, 0.78095687, 0.99047428]])

The memory is shared between the Numpy array and Torch tensor, so if you change the values in-place of one object, the other will change as well.

In [39]:
# Multiply PyTorch Tensor by 2, in place
b.mul_(2)

tensor([[0.7032, 0.1319, 1.3235],
        [0.3432, 1.4279, 1.5081],
        [0.5390, 0.9053, 1.4875],
        [0.6063, 1.5619, 1.9809]], dtype=torch.float64)

In [40]:
# Numpy array matches new values from Tensor
a

array([[0.70322235, 0.1318642 , 1.32347526],
       [0.34319712, 1.42785737, 1.50814315],
       [0.53901569, 0.90525825, 1.4875409 ],
       [0.60634443, 1.56191374, 1.98094856]])
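
If the sharing is not wanted, one option is to copy the tensor with `.clone()`. The sketch below uses a small array of ones (an illustrative assumption) to contrast a shared tensor with an independent copy.

```python
import numpy as np
import torch

a = np.ones((2, 3))

b = torch.from_numpy(a)           # shares memory with a
c = torch.from_numpy(a).clone()   # independent copy of the data

# In-place change on the shared tensor is visible in the Numpy array
b.mul_(2)

# A change made through the Numpy array is visible in the shared tensor
a[0, 0] = 5.0

print(a)
print(b)
print(c)  # still all ones
```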