# Introduction to Deep Learning with PyTorch

In this notebook, you will get introduced to [PyTorch](https://pytorch.org), a framework for building and training neural networks. In many ways, PyTorch tensors behave like the arrays you love from NumPy. These NumPy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks. It also provides a module that automatically calculates gradients (for backpropagation) and another module specifically for building neural networks. Altogether, PyTorch ends up being more coherent with Python and the NumPy/SciPy stack than TensorFlow and other frameworks.
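As a quick taste of the points above, here is a minimal sketch (the array values and variable names are my own, purely illustrative) showing a tensor built from a NumPy array, an optional move to a GPU, and an automatically computed gradient. It assumes NumPy is installed alongside PyTorch.

```python
import numpy as np
import torch

# A tensor created with from_numpy shares memory with the NumPy array
a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)
t.mul_(2)   # in-place multiply on the tensor...
print(a)    # ...is visible through the NumPy array too

# Move the tensor to a GPU only if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)

# Autograd: the gradient of sum(x^2) with respect to x is 2x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()
loss.backward()
print(x.grad)
```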

## Neural Networks

Deep learning is based on artificial neural networks, which have been around in some form since the late 1950s. The networks are built from individual parts approximating neurons, typically called units or simply "neurons." Each unit has some number of weighted inputs. These weighted inputs are summed together (a linear combination), then passed through an activation function to get the unit's output.

<img src="https://github.com/udacity/deep-learning-v2-pytorch/raw/master/intro-to-pytorch/assets/simple_neuron.png" width=400px>

Mathematically, this looks like: 
$$
\begin{align}
y &= f(w_1 x_1 + w_2 x_2 + b) \\
y &= f\left(\sum_i w_i x_i +b \right)
\end{align}
$$
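To make the formula concrete, here is a small sketch in plain Python. The helper names `sigmoid` and `unit_output` and the input values are assumptions chosen for illustration, with the sigmoid standing in for the activation $f$.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as the activation f."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(inputs, weights, bias):
    """Compute f(sum_i w_i * x_i + b) for a single unit."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values: 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

y = unit_output(x, w, b)
print(y)  # 0.5, since sigmoid(0) = 0.5
```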

With vectors this is the dot/inner product of two vectors:

$$
h = \begin{bmatrix}
x_1 \, x_2 \cdots x_n
\end{bmatrix}
\cdot
\begin{bmatrix}
            w_1 \\
            w_2 \\
            \vdots \\
            w_n
\end{bmatrix}
$$
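The same product can be sketched in PyTorch a few different ways. The vectors below are made-up values, not data from this notebook; the row-times-column version mirrors the equation above.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0])

# Dot product of two 1-D tensors: 0.5 - 2.0 + 6.0 = 4.5
h_dot = torch.dot(x, w)

# Row vector (1, 3) times column vector (3, 1) gives a (1, 1) matrix
h_mat = x.view(1, 3) @ w.view(3, 1)

# Elementwise multiply, then sum
h_sum = (x * w).sum()

print(h_dot, h_mat, h_sum)
```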

## Tensors

It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with 3 indices is a 3-dimensional tensor (RGB color images, for example). Tensors are the fundamental data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.

<img src="https://github.com/udacity/deep-learning-v2-pytorch/raw/master/intro-to-pytorch/assets/tensor_examples.svg" width=600px>
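A short sketch of the three cases, using zero-filled tensors with sizes I picked for illustration. The channels-first layout for the image is PyTorch's usual convention, stated here as an assumption rather than something this notebook defines.

```python
import torch

vector = torch.zeros(3)        # 1-dimensional tensor
matrix = torch.zeros(2, 3)     # 2-dimensional tensor
image = torch.zeros(3, 4, 4)   # 3-dimensional tensor: channels x height x width

for name, t in [("vector", vector), ("matrix", matrix), ("image", image)]:
    # dim() is the number of indices needed to address one element
    print(name, t.dim(), tuple(t.shape))
```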

With the basics covered, it's time to explore how we can use PyTorch to build a simple neural network.

In [1]:
# First, import PyTorch
import torch

In [2]:
def activation(x):
  """ Sigmoid activation function

      Arguments
      ---------
      x: torch.Tensor
  """
  return 1/(1+torch.exp(-x))

In [3]:
### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are random normal variables
features = torch.randn((1,5))

# True weights for our data, random normal variables again
weights = torch.randn_like(features)

# And a true bias term
bias = torch.randn((1,1))

Above I generated data we can use to get the output of our simple network. This is all just random for now; going forward we will start using real data. Going through each relevant line:

`features = torch.randn((1,5))` creates a tensor with shape `(1,5)`, one row and five columns, that contains values randomly distributed according to the normal distribution with a mean of zero and standard deviation of one.

`weights = torch.randn_like(features)` creates another tensor with the same shape as `features`, again containing values from a normal distribution.

Finally, `bias = torch.randn((1,1))` creates a single value from a normal distribution.
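One way to check these descriptions is to inspect the shapes directly. This sketch simply repeats the cell above and prints the shape of each tensor.

```python
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

print(features.shape)  # torch.Size([1, 5])
print(weights.shape)   # torch.Size([1, 5]), same shape as features
print(bias.shape)      # torch.Size([1, 1])
```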

PyTorch tensors can be added, multiplied, subtracted, etc. just like NumPy arrays. In general, you will use PyTorch tensors pretty much the way you'd use NumPy arrays. They come with some nice benefits such as GPU acceleration. For now, use the generated data to calculate the output of this simple single-layer network.
> **Exercise**: Calculate the output of the network with input features `features`, weights `weights`, and bias `bias`. Similar to NumPy, PyTorch has a [`torch.sum()`](https://pytorch.org/docs/stable/torch.html#torch.sum) function, as well as a `.sum()` method on tensors, for taking sums. Use the function `activation` defined above as the activation function.

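If the NumPy-style arithmetic is unfamiliar, here is a small sketch on two made-up tensors (not the exercise data) showing elementwise operations and both forms of the sum.

```python
import torch

# Illustrative tensors, unrelated to features/weights above
a = torch.tensor([[1.0, 2.0, 3.0]])
b = torch.tensor([[4.0, 5.0, 6.0]])

added = a + b                 # elementwise addition
product = a * b               # elementwise multiplication
total_fn = torch.sum(a * b)   # function form of the sum
total_method = (a * b).sum()  # method form of the sum

print(added)
print(product)
print(total_fn, total_method)
```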

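In [15]:
## Calculate the output of this network using the weights and bias tensors

# Weighted sum of the inputs plus the bias. Note this is the value *before*
# the activation; wrap it in activation(...) to get the unit's output.
output = features.matmul(weights.T) + bias
print(output)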

tensor([[-1.6619]])
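The value printed above is the linear combination before the activation is applied. As a sketch of the full exercise, the version below passes it through `activation` and also computes it with an elementwise multiply and `torch.sum()`, as the exercise suggests. Passing -1.6619 through the sigmoid gives roughly 0.16; the exact digits you see may depend on your PyTorch version.

```python
import torch

def activation(x):
    """Sigmoid activation function, as defined earlier in the notebook."""
    return 1 / (1 + torch.exp(-x))

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# Elementwise multiply, sum, add the bias, then apply the activation
y_sum = activation(torch.sum(features * weights) + bias)

# Same computation with a matrix multiplication
y_mat = activation(features.matmul(weights.T) + bias)

print(y_sum)
print(torch.allclose(y_sum, y_mat))
```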


You can do the multiplication and sum in the same operation using a matrix multiplication. In general, you will want to use matrix multiplications since they are more efficient and accelerated using modern libraries and high-performance computing on GPUs.

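Here, we want to do a matrix multiplication of the features and the weights. For this we can use `torch.mm()` or `torch.matmul()`, which is somewhat more complicated and supports broadcasting. If we try to do it with `features` and `weights` as they are, we'll get an error: both tensors have shape `(1, 5)`, but a matrix multiplication needs the number of columns in the first tensor to equal the number of rows in the second.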
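Here is a sketch of that failure and two ways around it. The name `mismatch_error` is just a placeholder I use to hold the exception, and reshaping the weights with `.view(5, 1)` or `.T` is one reasonable choice among several.

```python
import torch

torch.manual_seed(7)
features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# (1, 5) times (1, 5): the inner dimensions (5 and 1) do not match
try:
    torch.mm(features, weights)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = err
    print("torch.mm failed:", err)

# Reshape the weights to (5, 1) so that (1, 5) x (5, 1) -> (1, 1)
out_view = torch.mm(features, weights.view(5, 1)) + bias

# Transposing the 2-D weights tensor gives the same result
out_t = torch.matmul(features, weights.T) + bias

print(out_view.shape)
```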