# Installing PyTorch
The PyTorch binary can be installed with pip, or compiled from source by following the instructions at https://github.com/pytorch/pytorch#from-source.

Here we will use pip.

## Define a Set of Computation Nodes

| Name | Formula | Gradients |
|:-------------:|:------------- |:----- |
| Linear      | $y=x^T\cdot W+b$ | $\frac{\partial \mathcal{L}}{\partial x}=W\cdot\left(\frac{\partial \mathcal{L}}{\partial y}\right)^T\\\frac{\partial \mathcal{L}}{\partial W}=x\cdot\frac{\partial \mathcal{L}}{\partial y}\\\frac{\partial \mathcal{L}}{\partial b}=\frac{\partial \mathcal{L}}{\partial y}$ |
| Sigmoid     | $y=\frac{1}{1+e^{-x}}$  | $\frac{\partial \mathcal{L}}{\partial x}=\frac{\partial \mathcal{L}}{\partial y}(1-y)y$ |
| Softmax     | $y_j=\frac{e^{x_j}}{\sum\limits_i e^{x_i}}$ | $\frac{\partial \mathcal{L}}{\partial x_j}=\frac{\partial \mathcal{L}}{\partial y_j}y_j-y_j\sum\limits_i \frac{\partial \mathcal{L}}{\partial y_i}y_i$ |
| CrossEntropy | $y=-\sum\limits_i p_i \log(x_i)$ | $\frac{\partial \mathcal{L}}{\partial x_i}=-\frac{\partial \mathcal{L}}{\partial y}\frac{p_i}{x_i}$ |
| Mean  | $y=\frac{1}{N}\sum\limits_i x_i$ | $\frac{\partial \mathcal{L}}{\partial x_i}=\frac{1}{N}\frac{\partial \mathcal{L}}{\partial y}$ |
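
The gradient formulas in the table can be sanity-checked numerically. The sketch below is an illustration only, written in plain Python rather than PyTorch: it compares the Sigmoid and Softmax rows against central finite differences. The helper names (`softmax_backward`, `numeric_grad`), the input values, and the loss weights are assumptions chosen for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_backward(ys, grad_ys):
    # Table formula: dL/dx_j = dL/dy_j * y_j - y_j * sum_i(dL/dy_i * y_i)
    dot = sum(g * y for g, y in zip(grad_ys, ys))
    return [g * y - y * dot for g, y in zip(grad_ys, ys)]

def numeric_grad(f, xs, eps=1e-6):
    # Central finite differences of a scalar function f at the point xs.
    grads = []
    for i in range(len(xs)):
        hi = xs[:i] + [xs[i] + eps] + xs[i + 1:]
        lo = xs[:i] + [xs[i] - eps] + xs[i + 1:]
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

xs = [0.5, -1.0, 2.0]        # hypothetical inputs
weights = [1.0, 2.0, 3.0]    # hypothetical: L = sum_j weights[j] * y_j, so dL/dy_j = weights[j]

def loss(v):
    return sum(w * y for w, y in zip(weights, softmax(v)))

analytic = softmax_backward(softmax(xs), weights)
numeric = numeric_grad(loss, xs)
softmax_error = max(abs(a - n) for a, n in zip(analytic, numeric))

# Sigmoid row: dL/dx = dL/dy * (1 - y) * y, here with L = y so dL/dy = 1.
y = sigmoid(0.3)
sigmoid_analytic = (1 - y) * y
sigmoid_numeric = numeric_grad(lambda v: sigmoid(v[0]), [0.3])[0]
sigmoid_error = abs(sigmoid_analytic - sigmoid_numeric)

print(softmax_error, sigmoid_error)
```

Both errors should be tiny, limited only by the finite-difference step, which is the same kind of check one can later run against gradients computed automatically.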

In [2]:
try:
    import torch
except ImportError:
    !pip install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl
    import torch

# Introducing PyTorch
PyTorch is built around three main parts: Tensor, Variable, and Module.

For a more detailed tutorial, see http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html.

## Tensors
Tensors are similar to NumPy arrays and support the same kinds of basic operations, for example:

In [3]:
import numpy as np
import torch

a = torch.Tensor([1,2,3])
b = torch.Tensor([3,4,5])

c = a+b # add.

a = torch.ones(4,4)
b = torch.ones(4)

c = torch.mv(a,b) # matrix multiply vector.

a = torch.randn(4,4)
b = torch.randn(4,3)

c = torch.mm(a,b) # matrix multiply matrix.

# indexing a Tensor works the same way as indexing a numpy array.

c[:,0]

a_np = a.numpy() # convert tensor to numpy array (the two share memory).

d = torch.from_numpy(a_np) # convert numpy array to tensor (also shares memory).

## Variable
For automatic differentiation, we need another data structure that stores a Tensor's gradient and its computation graph information.

A Variable contains the following parts:

1.   data: the Tensor wrapped inside the Variable;
2.   grad: the gradient of this Variable, filled in by a backward pass;
3.   grad_fn (called creator in older versions): the function that produced this Variable, i.e. the previous node in the computation graph.

>![Structure of Variable](http://pytorch.org/tutorials/_images/Variable.png)


In [4]:
from torch.autograd import Variable
# for convenience, we can import Variable from torch.autograd.

x = torch.ones(2,2)
X = Variable(x,requires_grad=True) # wrap the Tensor x into a Variable.

print(X.data) # print the Tensor inside X.

Y=X+2

Z = Y*Y*3

out = Z.mean()

print(out) # the result of above computation.

print(out.grad_fn) # print the function that produced out in the computation graph.

out.backward() # backpropagate through the computation graph.

print(X.grad) # the gradient of original X.


 1  1
 1  1
[torch.FloatTensor of size 2x2]

Variable containing:
 27
[torch.FloatTensor of size 1]

<MeanBackward1 object at 0x151507361240>
Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]
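
The printed values can be checked by hand. A short derivation, assuming every entry of X is $x_i=1$ and $N=4$ is the number of entries:

$$
\text{out}=\frac{1}{N}\sum_i 3\,(x_i+2)^2=\frac{1}{4}\cdot 4\cdot 3\cdot 3^2=27
$$

$$
\frac{\partial\,\text{out}}{\partial x_i}=\frac{1}{N}\cdot 6\,(x_i+2)=\frac{6\cdot 3}{4}=4.5
$$

This matches both the printed value of out and every entry of X.grad.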



## Module
To define a neural network, we use Module, the base class for all neural network modules.

The following code defines the same neural network that was written with numpy in the BP tutorial.

In [13]:
import torch.nn as nn
import torch.nn.functional as F

# create a neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.linear = nn.Linear(28 * 28, 10)

    def forward(self, x):
        # dim=1 normalizes over the 10 classes of each sample in the batch
        return F.log_softmax(self.linear(x), dim=1)


net = Net()
print(net)

Net(
  (linear): Linear(in_features=784, out_features=10)
)
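
`Net.forward` returns log-probabilities rather than probabilities. A minimal plain-Python sketch of why that is convenient: taking the negative log-probability of the target class gives the same value as combining the Softmax and CrossEntropy rows of the table with a one-hot $p$. The scores and target below are made up, and the helpers `log_softmax` and `nll_loss` are simplified single-sample stand-ins, not the PyTorch functions.

```python
import math

def log_softmax(scores):
    # log(softmax(x))_j = x_j - log(sum_i exp(x_i)), computed stably.
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

def nll_loss(log_probs, target):
    # Negative log-likelihood of the target class for one sample.
    return -log_probs[target]

scores = [2.0, 1.0, 0.1]   # hypothetical outputs of the linear layer for 3 classes
target = 0                 # hypothetical true class index

log_probs = log_softmax(scores)
loss = nll_loss(log_probs, target)

# Same value via explicit softmax followed by cross entropy with a one-hot p.
exps = [math.exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]
one_hot = [1.0 if i == target else 0.0 for i in range(len(scores))]
cross_entropy = -sum(p * math.log(q) for p, q in zip(one_hot, probs))

print(loss, cross_entropy)
```

Working in log space avoids computing a probability and then taking its logarithm, which is numerically less stable for very small probabilities.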


In [14]:
from torchvision import datasets, transforms

# data loader: we split the dataset into training set and test set.
batch_size = 200
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=batch_size, shuffle=True)

In [20]:
import torch.optim as optim

def train(learning_rate=0.5, epochs=10):

    # create a stochastic gradient descent optimizer
    optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0)
    # create a loss function
    loss_func = nn.NLLLoss()

    # run the main training loop
    for epoch in range(epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = Variable(data), Variable(target)
            # resize data from (batch_size, 1, 28, 28) to (batch_size, 28*28)
            data = data.view(-1, 28 * 28)
            optimizer.zero_grad()
            net_out = net(data)
            loss = loss_func(net_out, target)
            loss.backward()
            optimizer.step()
        print('Train Epoch: {} \tLoss: {:.6f}'.format(epoch, loss.data[0]))

    # run a test loop
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        data = data.view(-1, 28 * 28)
        net_out = net(data)
        # accumulate the mean loss of each batch
        test_loss += loss_func(net_out, target).data[0]
        # get the index of the max log-probability
        pred = net_out.data.max(1)[1]
        correct += pred.eq(target.data).sum()

    # loss_func averages within a batch, so divide by the number of batches
    test_loss /= len(test_loader)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
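
The inner training loop repeats one pattern: clear the old gradient, run the forward pass, compute the loss, backpropagate, and update the parameters. The sketch below mirrors those steps in plain Python on a toy one-parameter problem with a hand-written gradient. The data, the variable names, and the learning rate are assumptions for illustration, and the update shown is plain SGD without momentum.

```python
# Toy data for y = 3 * x; the values are made up for illustration.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # the single trainable parameter
learning_rate = 0.05

for epoch in range(50):
    for x, y in samples:
        grad = 0.0                    # like optimizer.zero_grad()
        pred = w * x                  # forward pass
        loss = (pred - y) ** 2        # squared-error loss
        grad += 2 * (pred - y) * x    # like loss.backward(): d(loss)/dw
        w -= learning_rate * grad     # like optimizer.step() for plain SGD

print(w)
```

Resetting `grad` before each sample matters because the gradient is added to rather than overwritten, which is also why the real loop calls `optimizer.zero_grad()` before every backward pass.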

In [21]:
train()

RuntimeError: addmm(): argument 'mat1' (position 1) must be Variable, not torch.FloatTensor