# PyTorch Introduction
## PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
### Can be seen as a substitute for NumPy with GPU capabilities
A tensor is a generalization of a vector or matrix to an arbitrary number of dimensions.
### PyTorch Tensors
Four commonly used tensor types in PyTorch are Byte, Float, Double, and Long (other types, such as Int and Half, also exist). Each tensor type corresponds to the type of number, and more importantly the size and precision of the number, stored in each element of the tensor.
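
As a rough illustration of how tensor type relates to element size, the sketch below builds one small tensor per type and asks each for its bytes per element. It assumes a PyTorch version that provides `torch.tensor` and the `dtype` argument (0.4 or later); the dictionary keys are labels chosen for this example only.

```python
import torch

# One small tensor per dtype; element_size() reports bytes per element.
examples = {
    "Byte (uint8)": torch.tensor([1, 2, 3], dtype=torch.uint8),
    "Float (float32)": torch.tensor([1, 2, 3], dtype=torch.float32),
    "Double (float64)": torch.tensor([1, 2, 3], dtype=torch.float64),
    "Long (int64)": torch.tensor([1, 2, 3], dtype=torch.int64),
}

sizes = {name: t.element_size() for name, t in examples.items()}
for name, t in examples.items():
    print(name, t.dtype, sizes[name], "bytes per element")
```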


## NumPy:
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
* For NumPy docs, refer to https://docs.scipy.org/doc/numpy-1.13.0/reference/

In [1]:
import torch


## A torch.Tensor is a multi-dimensional matrix containing elements of a single data type.
It is very similar to `numpy.ndarray` and supports almost every major NumPy computation.

### Let's dive in and see some basic Pythonic operations

In [None]:
x = torch.zeros(2,3)
x

In [None]:
x = torch.ones(2,3)
x

In [None]:
# torch.arange(start,end,step=1) -> [start,end) with step
x = torch.arange(0,3,step=0.5)
x


In [None]:
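# torch.FloatTensor(*sizes) allocates an uninitialized float32 tensor of that shape,
# so its values are arbitrary; torch.FloatTensor(list) builds a tensor from the list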
x = torch.FloatTensor(2,3)
x


## Convert NumPy to PyTorch and vice-versa
A (CPU) PyTorch tensor and a NumPy array can be converted to each other at almost no computational cost, because the two objects share the same underlying memory: any change to the converted NumPy array is reflected in the original PyTorch tensor, and vice versa.

In [None]:
import numpy as np

# torch.from_numpy(ndarray) -> tensor

x1 = np.array([[1,2,3],[4,5,6]])
x2 = torch.from_numpy(x1)

x2

In [None]:
# tensor.numpy() -> ndarray
x3 = x2.numpy()
x3

# Defining a numpy array and converting it to a Torch tensor
# a = np.ndarray(shape=(2,3), dtype=float)
# a = torch.from_numpy(a)
# a
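
The memory sharing described above can be checked directly. This is a minimal sketch for CPU tensors; the variable names `a`, `t`, and `b` are chosen for this example only.

```python
import numpy as np
import torch

a = np.array([[1, 2, 3], [4, 5, 6]])
t = torch.from_numpy(a)   # shares memory with `a`
b = t.numpy()             # shares memory with `t` (and therefore with `a`)

a[0, 0] = 100             # modify the original NumPy array in place
t[1, 2] = -1              # modify the tensor in place

# All three objects show both changes, because no data was copied.
print(a)
print(t)
print(b)
```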

In [None]:
x = torch.FloatTensor([[1,2,3],[4,5,6]])
x

In [None]:
# x_gpu = x.cuda()   # requires an NVIDIA GPU with CUDA support
# x_gpu

In [None]:
# tensor.size() -> indexing also possible

x = torch.FloatTensor(10,12,3,3)

x.size()[:]

# x.size()[:2]
# x.size()

## The contents of a tensor can be accessed and modified using Python’s indexing and slicing notation:

In [None]:
# torch.index_select(input, dim, index)

x = torch.rand(4,3)
out = torch.index_select(x,0,torch.LongTensor([0,3]))

x,out


### Pythonic Indexing

In [None]:
# pythonic indexing also works

x[:,0],x[0,:],x[0:2,0:2]

# name = 'abhishek'
# name = name[0:2]
# name

### Torch masking

In [None]:
# torch.masked_select(input, mask)

x = torch.randn(2,3)
# Recent PyTorch releases expect a boolean mask (older ones used torch.ByteTensor)
mask = torch.tensor([[0,0,1],[0,1,0]], dtype=torch.bool)
out = torch.masked_select(x,mask)

x, mask, out
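
In practice a mask is often produced by a comparison rather than written out by hand. The sketch below assumes that approach; the input values are made up for illustration.

```python
import torch

x = torch.tensor([[-1.0, 2.0, -3.0], [4.0, -5.0, 6.0]])

mask = x > 0                               # mask with the same shape as x
positives = torch.masked_select(x, mask)   # 1-D tensor of the selected values
same = x[mask]                             # indexing with the mask selects the same values

print(mask)
print(positives)
```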

In [None]:
# torch.cat(seq, dim=0) -> concatenate tensor along dim

x = torch.FloatTensor([[1,2,3],[4,5,6]])
y = torch.FloatTensor([[-1,-2,-3],[-4,-5,-6]])
z1 = torch.cat([x,y],dim=0)
z2 = torch.cat([x,y],dim=1)

x,y,z1,z2
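
A quick way to see what `dim` does in `torch.cat` is to compare the resulting shapes. This small sketch uses zeros and ones as placeholder data.

```python
import torch

x = torch.zeros(2, 3)
y = torch.ones(2, 3)

rows = torch.cat([x, y], dim=0)   # stacks along rows    -> shape (4, 3)
cols = torch.cat([x, y], dim=1)   # stacks along columns -> shape (2, 6)

print(rows.shape, cols.shape)
```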


## Math Functions

### Torch provides MATLAB-like functions for manipulating Tensor objects. 

`torch.add(tensor, value)`
Adds the given value to all elements of the `Tensor`.

`y = torch.add(x, value)` returns a new `Tensor`.

`x.add_(value)` adds `value` to all elements in place.

In [None]:
# torch.add()

x1 = torch.FloatTensor([[1,2,3],[4,5,6]])
x2 = torch.FloatTensor([[1,2,3],[4,5,6]])
add = torch.add(x1,x2)

x1,x2,add,x1+x2,x1-x2
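
The difference between the out-of-place and in-place forms of addition can be seen in a short sketch. In PyTorch, in-place methods carry a trailing underscore, as in `add_`; the values here are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])

y = torch.add(x, 10)    # out-of-place: returns a new tensor, x is unchanged
x_before = x.clone()    # snapshot of x before the in-place update

x.add_(10)              # in-place: x itself is modified

print(x_before, y, x)
```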


### Matrix-matrix product of `mat1` and `mat2`
If `mat1` is an n × m matrix and `mat2` is an m × p matrix, the result is an n × p matrix.

`torch.mm(x, y)` puts the result in a new Tensor.

`torch.mm(x, y, out=M)` puts the result in the existing Tensor `M`.

In [None]:
# torch.mm(mat1, mat2) -> matrix multiplication

# randn gives well-defined random values (FloatTensor(3,4) would be uninitialized)
x1 = torch.randn(3,4)
x2 = torch.randn(4,5)

torch.mm(x1,x2)
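
The shape rule for matrix multiplication, and the option of writing the product into a preallocated tensor, can be sketched as follows. Matrices of ones are used so the expected entries are easy to verify by hand.

```python
import torch

mat1 = torch.ones(3, 4)         # n x m
mat2 = torch.ones(4, 5)         # m x p

res = torch.mm(mat1, mat2)      # n x p; each entry is a sum of four ones

out = torch.empty(3, 5)         # preallocated result buffer
torch.mm(mat1, mat2, out=out)   # writes the product into `out`

print(res.shape)
```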

In [None]:
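# torch.linalg.eig(A) -> (eigenvalues, eigenvectors), both complex-valued
# (the older torch.eig(a, eigenvectors=True) is no longer available in recent PyTorch releases)

x1 = torch.randn(4,4)

eigenvalues, eigenvectors = torch.linalg.eig(x1)
eigenvalues, eigenvectors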
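
One way to sanity-check an eigendecomposition is to verify that A V = V diag(λ). The sketch below assumes a PyTorch version that provides `torch.linalg.eig` (1.9 or later) and uses a small symmetric matrix chosen for illustration.

```python
import torch

A = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
eigenvalues, eigenvectors = torch.linalg.eig(A)   # both are complex-valued

# Column j of `eigenvectors` is the eigenvector for eigenvalues[j], so
# broadcasting `eigenvectors * eigenvalues` scales column j by eigenvalue j.
A_c = A.to(eigenvectors.dtype)
residual = (A_c @ eigenvectors - eigenvectors * eigenvalues).abs().max().item()

print(eigenvalues, residual)
```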

## PyTorch Autograd

In [None]:
from torch.autograd import Variable

### Autograd is now a core torch package for automatic differentiation. It uses a tape based system for automatic differentiation.

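In autograd, there is a Variable class, which is a very thin wrapper around a Tensor.
You can access the raw tensor through the `.data` attribute, and after computing the backward pass, a gradient w.r.t. this variable is accumulated into the `.grad` attribute. Since PyTorch 0.4, Tensor and Variable have been merged: `Variable(tensor)` simply returns a Tensor, and a Tensor created with `requires_grad=True` behaves the same way.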

#### We wrap our PyTorch Tensors in Variable objects; a Variable represents a node in a computational graph. If x is a Variable then `x.data` is a Tensor, and `x.grad` is another Variable holding the gradient of x with respect to some scalar value.
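
A minimal sketch of what autograd computes, assuming PyTorch 0.4 or later, where a tensor can carry `requires_grad` directly without a `Variable` wrapper. For y = sum(x²), the gradient with respect to x is 2x.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()   # scalar: 1 + 4 + 9 = 14
y.backward()         # fills x.grad with dy/dx = 2 * x

print(y.item(), x.grad)
```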

In [None]:
dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

Create random Tensors to hold inputs and outputs, and wrap them in Variables.
Setting `requires_grad=False` indicates that we do not need to compute gradients with respect to these Variables during the backward pass.


In [None]:
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

#### Create random Tensors for weights, and wrap them in Variables.
#### Setting requires_grad=True indicates that we want to compute gradients with respect to these Variables during the backward pass.

In [None]:
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

### Forward pass: 
compute predicted y using operations on Variables


### Use autograd:
to compute the backward pass. This call will compute the gradient of loss with respect to all Variables with requires_grad=True.

After this call w1.grad and w2.grad will be Variables holding the gradient of the loss with respect to w1 and w2 respectively.


### Update weights:
using gradient descent; w1.data and w2.data are Tensors, w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are Tensors.

#### Manually zero the gradients after running the backward pass

In [None]:
learning_rate = 1e-6
for t in range(500):
  y_pred = x.mm(w1).clamp(min=0).mm(w2)
  
  # Compute and print loss using operations on Variables.
  # In PyTorch 0.4 and later, loss is a zero-dimensional (scalar) tensor and
  # loss.item() returns its value as a Python number
  # (older releases used loss.data[0]).
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())


  loss.backward()

  # Gradient descent step on the raw tensors, so the update itself is not tracked by autograd
  w1.data -= learning_rate * w1.grad.data
  w2.data -= learning_rate * w2.grad.data

  # Manually zero the gradients after running the backward pass
  w1.grad.data.zero_()
  w2.grad.data.zero_()
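
The reason for zeroing gradients manually is that `backward()` accumulates into `.grad` rather than overwriting it. A small sketch, assuming PyTorch 0.4 or later and a toy weight vector `w` chosen for illustration:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()         # gradient of sum(3 * w) is [3, 3]

(w * 3).sum().backward()       # no zeroing in between: gradients add up
accumulated = w.grad.clone()   # now [6, 6]

w.grad.zero_()                 # reset before the next backward pass

print(first, accumulated, w.grad)
```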

# PyTorch NN module

The nn package defines a set of Modules, which you can think of as neural network layers: each one produces output from input and may have some trainable weights.



In [None]:
N, D_in, H, D_out = 64, 1000, 100, 10

### Create random Tensors to hold inputs and outputs, and wrap them in Variables.


In [None]:
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

### Use the nn package to define our model as a sequence of layers.
nn.Sequential is a Module which contains other Modules, and applies them in sequence to produce its output. 
Each Linear Module computes output from input using a linear function, and holds internal Variables for its weight and bias.


In [None]:
model = torch.nn.Sequential(
          torch.nn.Linear(D_in, H),
          torch.nn.ReLU(),
          torch.nn.Linear(H, D_out),
        )
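
To make the role of a Linear Module concrete, the sketch below inspects its weight and bias and reproduces its output by hand. The layer sizes are arbitrary and chosen for this example only.

```python
import torch

layer = torch.nn.Linear(4, 2)   # 4 input features, 2 output features
x = torch.randn(3, 4)           # batch of 3 samples

out = layer(x)
# The same computation written out: weight has shape (out_features, in_features).
manual = x @ layer.weight.t() + layer.bias

print(layer.weight.shape, layer.bias.shape, out.shape)
```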

### Loss Function:
The nn package also contains definitions of popular loss functions; in this case we will use Mean Squared Error (MSE) as our loss function.
We'll also initialise the learning rate.


In [None]:
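# reduction='sum' sums the squared errors instead of averaging them
# (the older spelling of this option was size_average=False)
loss_fn = torch.nn.MSELoss(reduction='sum')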

learning_rate = 1e-4
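
As a small worked example of what the loss computes, the sketch below compares summed and averaged MSE against a manual calculation. It assumes a PyTorch version whose loss constructors accept the `reduction` argument (0.4.1 or later); `size_average=False` is the older spelling of `reduction='sum'`. The numbers are made up for illustration.

```python
import torch

pred = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
target = torch.tensor([[0.0, 2.0], [5.0, 4.0]])

# Squared errors are 1, 0, 4, 0.
summed = torch.nn.MSELoss(reduction='sum')(pred, target)     # 1 + 0 + 4 + 0 = 5
averaged = torch.nn.MSELoss(reduction='mean')(pred, target)  # 5 / 4 = 1.25
manual = (pred - target).pow(2).sum()

print(summed.item(), averaged.item(), manual.item())
```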

## Optimization:
Use the optim package to define an Optimizer that will update the weights of the model for us. 

Here we will use Adam; the optim package contains many other optimization algorithms.
The first argument to the Adam constructor tells the optimizer which Variables it should update.

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
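
To show what a single optimizer update does, the sketch below swaps in plain SGD instead of Adam, because its update rule (w ← w − lr · grad) is easy to check by hand. The parameter `w` and the toy loss are assumptions made for this example.

```python
import torch

w = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
optimizer = torch.optim.SGD([w], lr=0.1)   # SGD used here only for transparency

loss = (w ** 2).sum()    # gradient is 2 * w = [2, 4]
optimizer.zero_grad()    # clear any previously accumulated gradients
loss.backward()          # compute the gradient of loss with respect to w
optimizer.step()         # w <- w - 0.1 * [2, 4] = [0.8, 1.6]

print(w.data)
```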

### Forward pass: 
Compute predicted y by passing x to the model. Module objects override the `__call__` operator so you can call them like functions.
When doing so you pass a Variable of input data to the Module and it produces a Variable of output data.
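
The call mechanism can be illustrated with a small custom Module. `TwoLayerNet` is a hypothetical class written for this sketch: calling the instance routes through `Module.__call__`, which in turn invokes the `forward` method.

```python
import torch

class TwoLayerNet(torch.nn.Module):   # hypothetical example module
    def __init__(self, d_in, h, d_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(d_in, h)
        self.linear2 = torch.nn.Linear(h, d_out)

    def forward(self, x):
        # Linear -> ReLU (via clamp) -> Linear
        return self.linear2(self.linear1(x).clamp(min=0))

net = TwoLayerNet(8, 4, 2)
x = torch.randn(5, 8)

y_pred = net(x)   # calling the module invokes net.forward(x)
print(y_pred.shape)
```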

### Compute Loss

### Zero the gradients before running the backward pass.

### Backward pass: 
compute gradient of the loss with respect to all the learnable parameters of the model. Internally, the parameters of each Module are stored in Variables with requires_grad=True, so this call will compute gradients for all learnable parameters in the model.

### Update the weights using the optimizer.

In [None]:
for t in range(500):
  y_pred = model(x)

  # Compute and print loss.
  loss = loss_fn(y_pred, y)
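  # loss.item() returns the scalar loss as a Python number
  # (PyTorch 0.4 and later; older releases used loss.data[0])
  print(t, loss.item())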
  
  # Before the backward pass, use the optimizer object to zero all of the
  # gradients for the variables it will update (which are the learnable weights
  # of the model)
  optimizer.zero_grad()

  # Backward pass: compute gradient of the loss with respect to model parameters
  loss.backward()

  # Calling the step function on an Optimizer makes an update to its parameters
  optimizer.step()