# Introduction

In deep learning, it is common to see a lot of discussion around tensors as the cornerstone data structure. Tensor even appears in the name of Google’s flagship machine learning library: “TensorFlow”. Tensors are a type of data structure used in linear algebra, and like vectors and matrices, you can perform arithmetic operations with tensors.

On the other hand, [PyTorch](https://pytorch.org/) is a Python package built by Facebook that provides two high-level features: 1) tensor computation (like NumPy) with strong GPU acceleration and 2) deep neural networks built on a tape-based automatic differentiation system.

## Introduction to Tensors:
A tensor is a generalization of vectors and matrices and is easily understood as a multidimensional array. According to the popular book "Deep Learning" (Goodfellow et al.):

"In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor."

A scalar is a zero-order tensor or rank zero tensor. A vector is a one-dimensional or first order tensor, and a matrix is a two-dimensional or second order tensor.

The following info-graphic describes tensors in a very convenient way:

![](assets/tensors.jpg)
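
To make the ranks above concrete, here is a minimal sketch that builds one tensor of each order and prints its number of axes. It assumes PyTorch is already installed (installation is covered further down), and the variable names are purely illustrative.

```python
import torch

scalar = torch.tensor(3.14)                 # rank 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])      # rank 1: one axis
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])         # rank 2: two axes
cube = torch.zeros(2, 3, 4)                 # rank 3: three axes

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # dim() is the number of axes; shape lists the size of each axis
    print(name, t.dim(), tuple(t.shape))
```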

Let's build the intuition behind tensors in a more lucid way now.

A tensor is the basic building block of modern machine learning. At its core, it's a data container. Mostly it contains numbers. Sometimes it even includes strings, but that is rare. So think of it as a bucket of numbers.

But often, people confuse tensors with multi-dimensional arrays. As per StackExchange:

Tensors and multidimensional arrays are different types of object. The first is a type of function. The second is a data structure suitable for representing a tensor in a coordinate system.

Mathematically, a tensor is defined as a multi-linear function, that is, a function of several vector arguments that is linear in each argument. A tensor field is a tensor-valued function. For a rigorous mathematical explanation, you can read [here](https://math.stackexchange.com/questions/10282/an%C2%ADintroduction%C2%ADto%C2%ADtensors?%20noredirect=1&lq=1).

So, tensors are functions or containers which you need to define. The actual calculation happens when data is fed in. What you see as arrays or multi-dimensional arrays (1D, 2D, …, ND) can be considered as generic tensors.

Now, let's talk a bit about Tensor notation.

Tensor notation is much like matrix notation with a capital letter representing a tensor and lowercase letters with subscript integers representing scalar values within the tensor.

![](assets/tensor_notation.jpg)

Many of the operations that can be performed with scalars, vectors, and matrices can be reformulated to be performed with tensors.

As a tool, tensors and tensor algebra are widely used in the fields of physics and engineering. The term and its set of techniques are also known in machine learning, where the training and operation of deep learning models can be described in terms of tensors.

## Introducing PyTorch:
PyTorch is a Python-based scientific computing package targeted for:

 - A replacement for NumPy to use the power of GPUs.
 - A deep learning research platform that provides maximum flexibility and speed.
 
 
Let's quickly summarize the unique features of PyTorch:

 - PyTorch provides a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, math operations, linear algebra, reductions. And they are fast.


 - PyTorch has a unique way of building neural networks: using and replaying a tape recorder.
 

 - Most frameworks such as TensorFlow, Theano, Caffe and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.


 - PyTorch uses a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

(While this technique is not unique to PyTorch, it’s one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.)

 - PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and Neural Network backends (TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API. They are mature and have been tested for years.
 
(Hence, PyTorch is quite fast – whether you run small or large neural networks.)

 - The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. PyTorch's creators have written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.

## Installing PyTorch

Installation of PyTorch is pretty straightforward. As PyTorch supports efficient GPU computation, it communicates with your CUDA drivers to speed up computation.

You will need torch and torchvision to use PyTorch. Let's install them for a **Windows environment** (no CUDA support).

In [None]:
$ pip3 install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-win_amd64.whl
    
$ pip3 install torchvision

(Please keep in mind that PyTorch does not support Python 2.7 on Windows, so you need Python 3.5 or later. The wheel above, tagged cp35, targets Python 3.5 specifically.)

Let's see the installation steps for a **Linux environment** (no CUDA support).

In [None]:
$ pip3 install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-linux_x86_64.whl
    
$ pip3 install torchvision

Yes, you guessed it right! The steps are the same as on Windows; only the wheel file differs. :)

## Tensor arithmetic with PyTorch:

First, let's import all the required libraries.

In [None]:
from __future__ import print_function
import torch

If the PyTorch installation was successful, then running the above lines of code won't give you any errors.

Now, let's construct a randomly initialized 5x3 matrix (torch.rand draws values uniformly from the interval [0, 1)):

In [None]:
x = torch.rand(5, 3)
print(x)

Construct a matrix filled with zeros and of data type long:

In [None]:
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

Construct a tensor directly from data:

In [None]:
x = torch.tensor([5.5, 3])
print(x)

If you understood Tensors correctly, tell me what kind of Tensor x is in the comments section!

You can create a tensor based on an existing tensor. These methods will reuse properties of the input tensor, e.g. dtype (data type), unless new values are provided by the user:

In [None]:
x = x.new_ones(5, 3, dtype=torch.double)    
print(x)

x = torch.randn_like(x, dtype=torch.float)    
print(x)

Get its size:



In [None]:
print(x.size())

Note that torch.Size is in fact a tuple, so it supports all tuple operations.
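
As a small illustration of that last point, the sketch below treats the result of size() like any other tuple. The tensor and variable names are chosen only for this example.

```python
import torch

t = torch.zeros(5, 3)
size = t.size()

rows, cols = size                  # tuple unpacking
print(rows, cols)
print(len(size), size[0])          # length and indexing work as for tuples
print(size == (5, 3))              # comparison with a plain tuple
print(isinstance(size, tuple))     # torch.Size is a subclass of tuple
```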


### Tensor addition:
The element-wise addition of two tensors with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise addition of the scalars in the parent tensors.

![](assets/tensor_addition.jpg)

In [None]:
# Syntax 1 for Tensor addition in PyTorch
y = torch.rand(5, 3)
print(x)
print(y)
print(x + y)

In [None]:
# Syntax 2 for Tensor addition in PyTorch
print(torch.add(x, y))

### Tensor subtraction:

The element-wise subtraction of one tensor from another tensor with the same dimensions results in a new tensor with the same dimensions where each scalar value is the element-wise subtraction of the scalars in the parent tensors.

![](assets/tensors_substraction.jpg)
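
Subtraction mirrors the two addition syntaxes shown earlier. The following is a minimal sketch with small hand-picked values (the names p and q are illustrative), so the result is easy to check by eye.

```python
import torch

p = torch.tensor([[5.0, 7.0], [9.0, 11.0]])
q = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Syntax 1: the - operator
print(p - q)

# Syntax 2: the functional form
print(torch.sub(p, q))
```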

### Tensor Product:
The function torch.mm(mat1, mat2) performs a matrix multiplication of the matrices mat1 and mat2.

If mat1 is an (n×m) tensor and mat2 is an (m×p) tensor, out will be an (n×p) tensor.

You compute this product in PyTorch like the following:

In [None]:
mat1 = torch.randn(2, 3)
mat2 = torch.randn(3, 3)
print(mat1)
print(mat2)
print(torch.mm(mat1, mat2))
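
The shape rule above can be checked directly. This is a minimal sketch with illustrative sizes n=2, m=3, p=4. It also shows that the @ operator gives the same result as torch.mm for 2-D tensors, and that mismatched inner dimensions raise an error.

```python
import torch

n, m, p = 2, 3, 4
left = torch.randn(n, m)
right = torch.randn(m, p)

prod = torch.mm(left, right)
print(prod.shape)                          # (n, p)

# The @ operator performs the same matrix multiplication
print(torch.allclose(prod, left @ right))

# Inner dimensions must agree, otherwise PyTorch raises a RuntimeError
try:
    torch.mm(left, torch.randn(p, m))      # (2x3) times (4x3) is invalid
except RuntimeError as err:
    print("shape mismatch:", err)
```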

### Broadcasting:

The term broadcasting describes how arrays with different shapes are treated during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

Two tensors are “broadcastable” if the following rules hold:

 - Each tensor has at least one dimension.
 - When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
 
Let's understand this with PyTorch using the following code snippet:

In [None]:
x=torch.empty(5,7,3)
y=torch.empty(5,7,3)
# same shapes are always broadcastable (i.e. the above rules always hold)

In [None]:
x=torch.empty((0,))
y=torch.empty(2,2)
# x and y are not broadcastable: x is empty (its only dimension has size 0),
# and 0 neither equals y's trailing size 2 nor is it 1

In [None]:
# can line up trailing dimensions
x=torch.empty(5,3,4,1)
y=torch.empty(  3,1,1)
# x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1
# 3rd trailing dimension: x size == y size
# 4th trailing dimension: y dimension doesn't exist
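
The snippets above only create the tensors. As a minimal sketch, the code below actually performs broadcast additions so the resulting shapes can be inspected. The tensor names are illustrative.

```python
import torch

big = torch.ones(5, 3, 4, 1)
small = torch.ones(3, 1, 1)
print((big + small).shape)              # torch.Size([5, 3, 4, 1])

# Size-1 dimensions are stretched in both operands
col = torch.arange(3.0).reshape(3, 1)   # shape (3, 1)
row = torch.arange(4.0).reshape(1, 4)   # shape (1, 4)
grid = col + row                        # shape (3, 4), grid[i, j] = i + j
print(grid)

# Trailing sizes 2 and 3 differ and neither is 1, so this fails
try:
    torch.ones(5, 2) + torch.ones(3)
except RuntimeError as err:
    print("not broadcastable:", err)
```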

The tensor product is the most common form of tensor multiplication that you may encounter, but many other types of tensor multiplications exist, such as the tensor dot product and the tensor contraction.

Converting a Torch Tensor to a NumPy array and vice versa is a breeze. The concept is called the **NumPy Bridge**. Let's take a look at that.

### NumPy Bridge:

The Torch Tensor and NumPy array will share their underlying memory locations, and changing one will change the other.

Converting a Torch Tensor to a NumPy Array.

In [None]:
# A 1D tensor of 5 ones
a = torch.ones(5)
print(a)

In [None]:
# Convert the Torch tensor to a NumPy array
b = a.numpy()
print(b)
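
To see the shared memory in action, and to cover the reverse direction, here is a minimal sketch. It assumes the tensor lives on the CPU, and the variable names are illustrative.

```python
import numpy as np
import torch

t = torch.ones(5)
arr = t.numpy()            # arr shares memory with t

t.add_(1)                  # in-place addition on the tensor
print(arr)                 # the NumPy array reflects the change

# The reverse direction: NumPy array to Torch tensor
src = np.ones(5)
from_np = torch.from_numpy(src)
np.add(src, 1, out=src)    # in-place addition on the array
print(from_np)             # the tensor reflects the change
```

Note that the in-place operations matter here: an expression such as t = t + 1 would create a new tensor and leave the array untouched.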

## Implementing a simple neural network using PyTorch:

Let's briefly discuss a concept called [Automatic Differentiation](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html), which is central to all neural networks in PyTorch. It is particularly useful for calculating gradients during backpropagation.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backpropagation is defined by how your code is run and that every single iteration can be different.

Let's see Automatic Differentiation in action with a straightforward code example.

In [None]:
# Create a Rank-2 tensor of all ones
x = torch.ones(2, 2, requires_grad=True)
print(x)

Do an addition operation.



In [None]:
y = x + 2
print(y)

Do some more operations on y.



In [None]:
z = y * y * 3
out = z.mean()

print(z, out)

Let’s backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

In [None]:
out.backward()

# print gradients d(out)/dx
print(x.grad)
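
The printed gradient can be verified by hand. Here out is the mean of 3(x_i + 2)^2 over four elements, so d(out)/dx_i = (3/2)(x_i + 2), which is 4.5 when x_i = 1. The following self-contained sketch repeats the computation with illustrative names and compares autograd's result against that formula.

```python
import torch

w = torch.ones(2, 2, requires_grad=True)
result = ((w + 2) ** 2 * 3).mean()
result.backward()

# Analytic gradient: d(result)/dw_i = 1.5 * (w_i + 2)
expected = 1.5 * (w.detach() + 2)

print(w.grad)
print(torch.allclose(w.grad, expected))
```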

You’ll create a simple neural network with one hidden layer and a single output unit. You will use the ReLU activation in the hidden layer and the sigmoid activation in the output layer.

First, you need to import the PyTorch library. Neural networks can be constructed using the torch.nn package.

In [None]:
import torch
import torch.nn as nn

Then you define the sizes of all the layers and the batch size:



In [None]:
n_in, n_h, n_out, batch_size = 10, 5, 1, 10

And now, you will create some dummy input data x and some dummy target data y. You will use PyTorch Tensors to store this data. PyTorch Tensors can be used and manipulated just like NumPy arrays, with the added benefit that they can run on GPUs. For this tutorial you will simply run them on the CPU, although it is quite simple to transfer them to a GPU.

In [None]:
x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])
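
For reference, moving tensors to a GPU could look like the sketch below. It falls back to the CPU when no CUDA device is available, so it runs either way. The tensor names and sizes are illustrative and not part of the tutorial's pipeline.

```python
import torch

# Pick the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

inputs = torch.randn(10, 10).to(device)
targets = torch.zeros(10, 1).to(device)
print(inputs.device, targets.device)

# A model has to live on the same device as its inputs
layer = torch.nn.Linear(10, 1).to(device)
print(layer(inputs).shape)
```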

And now, you will define your model in a single statement.

In [None]:
model = nn.Sequential(nn.Linear(n_in, n_h),
                     nn.ReLU(),
                     nn.Linear(n_h, n_out),
                     nn.Sigmoid())

This creates a model that looks like input -> linear -> relu -> linear -> sigmoid. There is another way to define your models, which is used for more complicated and custom models: defining the model as a class. You can read about it [here](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-custom-nn-modules).
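
As a rough sketch of the class-based approach, the same architecture could be written as a subclass of nn.Module. The class name TwoLayerNet and its attribute names are hypothetical, chosen only for this illustration.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_in, n_h, n_out):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_h)
        self.output = nn.Linear(n_h, n_out)

    def forward(self, x):
        # input -> linear -> relu -> linear -> sigmoid
        h = torch.relu(self.hidden(x))
        return torch.sigmoid(self.output(h))

net = TwoLayerNet(n_in=10, n_h=5, n_out=1)
sample = torch.randn(10, 10)
prediction = net(sample)
print(prediction.shape)
```

The class form pays off once the forward pass needs control flow or shared layers, which a plain nn.Sequential cannot express.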

Now, it is time to construct your loss function. You will use the Mean Squared Error Loss.

In [None]:
criterion = torch.nn.MSELoss()

Also, don’t forget to define your optimizer. You will use the mighty Stochastic Gradient Descent in this one and a learning rate of 0.01. model.parameters() returns an iterator over your model’s parameters (weights and biases).

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

Now, you will run Gradient Descent for 50 epochs. This does the forward propagation, loss computation, backward propagation and parameter updates in that sequence.

In [None]:
for epoch in range(50):
    # Forward Propagation
    y_pred = model(x)
    # Compute and print loss
    loss = criterion(y_pred, y)
    print('epoch: ', epoch,' loss: ', loss.item())
    # Zero the gradients
    optimizer.zero_grad()

    # perform a backward pass (backpropagation)
    loss.backward()

    # Update the parameters
    optimizer.step()

 - **y_pred** gets the predicted values from a forward pass of your model. You pass this, along with the target values y, to the criterion, which calculates the loss.
 

 - Then, **optimizer.zero_grad()** zeroes out all the gradients. You need to do this so that previous gradients don’t keep on accumulating.
 

 - Then, **loss.backward()** is the main PyTorch magic that uses PyTorch’s Autograd feature. Autograd computes all the gradients w.r.t. all the parameters automatically based on the computation graph that it creates dynamically. Basically, this does the backward pass (backpropagation) of gradient descent.
 

 - Finally, you call **optimizer.step()** which does a single update of all the parameters using the new gradients.
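
To see why the gradients have to be zeroed, here is a minimal sketch on a single illustrative parameter theta, without an optimizer. Calling backward() twice without resetting adds the second gradient to the first. Resetting the gradient by hand, as done here, has the same effect that optimizer.zero_grad() has for each model parameter.

```python
import torch

theta = torch.tensor([1.0, 2.0], requires_grad=True)

(theta ** 2).sum().backward()
first = theta.grad.clone()          # gradient is 2 * theta

(theta ** 2).sum().backward()       # no reset: the new gradient is added
accumulated = theta.grad.clone()

theta.grad.zero_()                  # reset by hand
(theta ** 2).sum().backward()
fresh = theta.grad.clone()

print(first, accumulated, fresh)
```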
 

So, you have made it to the end. In this post, you covered a whole bunch of things, from Tensors to Automatic Differentiation and more! You also implemented a simple neural net using PyTorch and its tensor system.

## Useful links
- https://towardsdatascience.com/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b
- https://pytorch.org/docs/stable/notes/broadcasting.html
- https://mc.ai/broadcasting-with-pytorch/
- https://github.com/yunjey/pytorch-tutorial