## PyTorch Tensors

A PyTorch Tensor is conceptually identical to a NumPy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Like NumPy arrays, PyTorch Tensors do not know anything about deep learning, computational graphs, or gradients; they are a generic tool for scientific computing.

In [None]:
# Import basic libraries
import torch 
import torchvision
import torch.nn as nn
import numpy as np
import torch.utils.data as data
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [None]:
a = torch.Tensor([1])
print(a)

b=torch.FloatTensor([[1, 2, 3], [4, 5, 6]])
print(b)
print(b[0][2])

c=torch.IntTensor(2, 4).zero_()
print(c)
c.fill_(8)
print(c)

d=torch.Tensor(3, 3).uniform_(0, 1)
print(d)

e=torch.Tensor(3, 3).exponential_()
print(e)

f=torch.ones(3, 3)
print(f)

g_np = np.arange(12)
g = torch.from_numpy(g_np)
print(g)

# More distributions and ways of initializing tensors here 
# http://pytorch.org/docs/master/torch.html#random-sampling
# http://pytorch.org/docs/master/torch.html#tensors
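
Several calls in the cell above end in an underscore (`zero_()`, `fill_()`, `uniform_()`), which marks an in-place operation. The following is a minimal sketch of the difference, assuming a reasonably recent PyTorch install; the tensor names are only illustrative.

```python
import torch

t = torch.zeros(2, 2)

# Out-of-place: returns a new tensor, t is untouched
u = t.add(1)

# In-place (trailing underscore): modifies the storage of t itself
v = t.add_(5)

print(t)  # every entry of t is now 5
print(u)  # every entry of u is 1
```

In-place operations save memory but overwrite values, so they are best used on tensors that are not needed again in their old state.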


## PyTorch Variables

We wrap our PyTorch Tensors in Variable objects; a Variable represents a node in a computational graph. If x is a Variable then x.data is a Tensor, and x.grad is another Variable holding the gradient of x with respect to some scalar value. PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation that you can perform on a Tensor also works on Variables; the difference is that using Variables defines a computational graph, allowing you to automatically compute gradients.
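
As a small illustration of the `.data` and `.grad` attributes described above, here is a sketch that assumes PyTorch 0.4 or later, where `requires_grad` can be set directly on a tensor and the explicit `Variable` wrapper is optional. The function being differentiated is chosen only for illustration.

```python
import torch

# A leaf tensor that autograd should track
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: y = sum(x_i^2), so dy/dx_i = 2 * x_i
y = (x ** 2).sum()
y.backward()

print(x.data)  # the raw values
print(x.grad)  # gradient of y with respect to x
```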

### Simple Computational Graph 

![title](comp_graph_pytorch.png)

#### Manual Gradient Descent

![title](comp_graph_pytorch2.png)

In [None]:
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = 1.0  # initial guess for the weight

# our model forward pass


def forward(x):
    return x * w


# Loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)


# compute gradient
def gradient(x, y):  # d_loss/d_w
    return 2 * x * (x * w - y)

# Before training
print("predict (before training)",  4, forward(4))

# Training loop
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        grad = gradient(x_val, y_val)
        w = w - 0.01 * grad
        l = loss(x_val, y_val)
    print('Loss after', epoch, 'epochs', l)
    print('Value of w after', epoch,'epochs',w)

# After training
print("predict (after training)",  "4 hours", forward(4))
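
The hand-derived gradient `2 * x * (x * w - y)` can be sanity-checked against a finite-difference approximation. The sketch below uses plain Python; the helper names and the step size `eps` are assumptions made for this illustration.

```python
def loss(w, x, y):
    """Squared error of the one-weight model y_pred = x * w."""
    return (x * w - y) ** 2


def analytic_gradient(w, x, y):
    """d_loss/d_w derived by hand: 2 * x * (x * w - y)."""
    return 2 * x * (x * w - y)


def numeric_gradient(w, x, y, eps=1e-6):
    """Central finite-difference approximation of d_loss/d_w."""
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)


w = 1.0
for x, y in zip([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):
    print(x, analytic_gradient(w, x, y), numeric_gradient(w, x, y))
```

The two columns should agree up to rounding error, which is a cheap way to catch sign or factor mistakes before writing a training loop.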


#### Automated Gradient Descent using PyTorch

In [None]:
import torch
from torch.autograd import Variable

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = Variable(torch.Tensor([1.0]),  requires_grad=True)  # Any random value

# our model forward pass


def forward(x):
    return x * w

# Loss function


def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

# Before training
print("predict (before training)",  4, forward(4).data[0])

# Training loop
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss(x_val, y_val)
        l.backward()
        #update the weight variable
        w.data = w.data - 0.01 * w.grad.data
        # Manually zero the gradients after updating weights
        w.grad.data.zero_()

    new_loss = loss(x_val,y_val)
    print('Loss after', epoch, 'epochs', new_loss.data[0])
    print('Value of w after', epoch,'epochs',w.data)
    
# After training
print("predict (after training)",  4, forward(4).data[0])
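
The loop above zeroes `w.grad` after every update because `backward()` adds to the existing gradient rather than replacing it. A minimal sketch of that accumulation, assuming PyTorch 0.4 or later (tensors with `requires_grad=True` instead of `Variable`); the sample values are illustrative.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
x, y = 2.0, 4.0

# First backward pass: d/dw (x*w - y)^2 = 2*x*(x*w - y) = -8
((x * w - y) ** 2).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added on top
((x * w - y) ** 2).sum().backward()
accumulated = w.grad.clone()

# Zeroing resets the buffer before the next step
w.grad.zero_()

print(first, accumulated, w.grad)
```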

### Linear Regression Using PyTorch

![title](pytorch_rhythm.png)

In [None]:
import torch
from torch.autograd import Variable
import torch.nn as nn
"""
Data has to be in the form of a matrix with the first dimension being the sample number,
the second dimension being the observation, and the third dimension being the features.
We don't need the third dimension in the example below.
"""
x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))


class Model(torch.nn.Module):

    def __init__(self):
        """
        In the constructor we instantiate the nn.Linear module
        The first line has to be the call to the parent class constructor
        """
        super(Model, self).__init__()
        
        """
        The linear layer takes two arguments at instantiation: the number of input
        neurons and the number of output neurons, as shown below
        """
        self.linear = nn.Linear(1, 1) 

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        y_pred = self.linear(x)
        return y_pred

# our model
model = Model()


# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor returns the learnable parameters of the single
# nn.Linear module that is a member of the model.
criterion = nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# After training
hour_var = Variable(torch.Tensor([[4.0]]))
y_pred = model(hour_var)
print("predict (after training)",  4, model(hour_var).data[0][0])
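
For a single feature, the line that gradient descent is approaching can also be computed in closed form, which gives a reference value for the trained weight and bias. A plain-Python sketch using the same data; the variable names are illustrative.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

n = len(x_data)
x_mean = sum(x_data) / n
y_mean = sum(y_data) / n

# Ordinary least squares for one feature:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_data, y_data))
var_x = sum((x - x_mean) ** 2 for x in x_data)
w = cov_xy / var_x
b = y_mean - w * x_mean

print("w =", w, "b =", b)
print("prediction for x = 4:", w * 4.0 + b)
```

The SGD-trained model should move toward the same slope and intercept as training proceeds, so its prediction for 4 should approach this value.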

### Logistic Regression using PyTorch

In [None]:
import torch
from torch.autograd import Variable
import torch.nn.functional as F
import torch.nn as nn

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0]]))
y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.]]))


class Model(torch.nn.Module):

    def __init__(self):
        """
        In the constructor we instantiate nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. Activation functions are applied to layers as shown below
        """
        y_pred = F.sigmoid(self.linear(x))
        return y_pred

# our model
model = Model()


# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor returns the learnable parameters of the single
# nn.Linear module that is a member of the model.
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training
hour_var = Variable(torch.Tensor([[1.0]]))
print("predict 1 hour ", 1.0, model(hour_var).data[0][0] > 0.5)
hour_var = Variable(torch.Tensor([[7.0]]))
print("predict 7 hours", 7.0, model(hour_var).data[0][0] > 0.5)
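
To make the sigmoid and binary cross-entropy steps concrete, here is a plain-Python sketch of what the model and criterion compute for each sample. The weight and bias are hypothetical values picked by hand for illustration, not the result of training.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y):
    """Binary cross-entropy for one prediction p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Hypothetical parameters, chosen by hand (not trained values)
w, b = 2.0, -5.0

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 1.0, 1.0]

probs = [sigmoid(w * x + b) for x in xs]
mean_loss = sum(bce(p, y) for p, y in zip(probs, ys)) / len(xs)

print(probs)
print(mean_loss)
```

With these parameters the decision boundary sits at x = 2.5, so thresholding the probabilities at 0.5 reproduces the labels.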


## Simple DNN using PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import torch.nn.functional as F
from torch.autograd import Variable

### Torch NN module
Neural networks can be constructed using the torch.nn module, which provides nearly all neural-network building blocks, such as:

- Linear layers - nn.Linear, nn.Bilinear
- Convolution layers - nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.ConvTranspose2d
- Nonlinearities - nn.Sigmoid, nn.Tanh, nn.ReLU, nn.LeakyReLU
- Pooling layers - nn.MaxPool1d, nn.AvgPool2d
- Recurrent networks - nn.LSTM, nn.GRU
- Normalization - nn.BatchNorm2d
- Dropout - nn.Dropout, nn.Dropout2d
- Embedding - nn.Embedding
- Loss functions - nn.MSELoss, nn.CrossEntropyLoss, nn.NLLLoss

Instances of these classes have a built-in `__call__` method that runs an input through the layer.

More details here: http://pytorch.org/docs/0.3.0/nn.html

In [None]:
# Linear Layers

x = Variable(torch.randn(32, 10))
y = Variable(torch.randn(32, 30))

sigmoid = nn.Sigmoid()

# y = Wx + b
linear = nn.Linear(in_features=10, out_features=20, bias=True)
output_linear = linear(x)
print('Linear output size : ', output_linear.size())

# y = x1*W*x2 + b 
bilinear = nn.Bilinear(in1_features=10, in2_features=30, out_features=50, bias=True)
output_bilinear = bilinear(x, y)
print('Bilinear output size : ', output_bilinear.size())
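
The `y = Wx + b` comment can be checked directly against the layer's stored parameters. A sketch assuming a recent PyTorch version (plain tensors, `@` for matrix multiplication); the batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 10)
linear = nn.Linear(in_features=10, out_features=20)

# Calling the module instance runs its forward pass
out_module = linear(x)

# The same computation written out: weight has shape (out_features, in_features)
out_manual = x @ linear.weight.t() + linear.bias

print(out_module.shape)
print(torch.allclose(out_module, out_manual, atol=1e-5))
```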

In [None]:
# Convolutional layers

x = Variable(torch.randn(10, 3, 28, 28))

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1, bias=True)
bn = nn.BatchNorm2d(num_features=32)
pool = nn.MaxPool2d(kernel_size=(2, 2), stride=2)

output_conv = bn(conv(x))
output_pool = pool(conv(x))

print('Conv output size : ', output_conv.size())
print('Pool output size : ', output_pool.size())
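
The spatial sizes printed above follow from the standard output-size formula for convolution and pooling windows. A plain-Python sketch, ignoring dilation; the helper name is an assumption made for this example.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# 3x3 convolution, stride 1, padding 1 on a 28x28 input
conv_hw = conv_output_size(28, kernel=3, stride=1, padding=1)

# 2x2 max pooling with stride 2 on the convolution output
pool_hw = conv_output_size(conv_hw, kernel=2, stride=2)

print(conv_hw, pool_hw)
```

A 3x3 kernel with padding 1 keeps the 28x28 resolution, and the 2x2 pooling with stride 2 halves it.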

In [None]:
# Recurrent, Embedding & Dropout Layers

inputs = [[1, 2, 3], [1, 0, 4], [1, 2, 4], [1, 4, 0], [1, 3, 3]]
x = Variable(torch.LongTensor(inputs))

embedding = nn.Embedding(num_embeddings=5, embedding_dim=20, padding_idx=1)
drop = nn.Dropout(p=0.5)
gru = nn.GRU(input_size=20, hidden_size=50, num_layers=2, batch_first=True, bidirectional=True, dropout=0.3)

emb = drop(embedding(x))
gru_h, gru_h_t = gru(emb)

print('Embedding size : ', emb.size())
print('GRU hidden states size : ', gru_h.size())
print('GRU last hidden state size : ', gru_h_t.size())
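
The shapes printed by this cell depend on the batch size, sequence length, number of layers, and number of directions. The helper below is a hypothetical plain-Python sketch of that bookkeeping for a GRU created with `batch_first=True`; it does not call PyTorch.

```python
def gru_shapes(batch, seq_len, hidden_size, num_layers, bidirectional):
    """Expected (output, h_n) shapes for a GRU created with batch_first=True."""
    directions = 2 if bidirectional else 1
    # output holds the top layer's state at every time step, both directions concatenated
    output = (batch, seq_len, directions * hidden_size)
    # h_n holds the final state of every layer and direction; batch_first does not affect it
    h_n = (num_layers * directions, batch, hidden_size)
    return output, h_n


output_shape, h_n_shape = gru_shapes(
    batch=5, seq_len=3, hidden_size=50, num_layers=2, bidirectional=True
)
print(output_shape, h_n_shape)
```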

### Torch.nn.functional

Using the classes above requires creating an instance of the class and then passing inputs through that instance.

The functional API exposes the same operations as plain functions, with any weights passed in explicitly. For example:

import torch.nn.functional as F

Linear layers - F.linear(input=x, weight=W, bias=b)

Convolution Layers - F.conv2d(input=x, weight=W, bias=b, stride=1, padding=0, dilation=1, groups=1)

Nonlinearities - F.sigmoid(x), F.tanh(x), F.relu(x), F.softmax(x)

Dropout - F.dropout(x, p=0.5, training=True)

In [None]:
x = Variable(torch.randn(10, 3, 28, 28))
filters = Variable(torch.randn(32, 3, 3, 3))
conv_out = F.relu(F.dropout(F.conv2d(input=x, weight=filters, padding=1), p=0.5, training=True))

print('Conv output size : ', conv_out.size())
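
The `training` flag passed to `F.dropout` decides whether any units are dropped at all. A small sketch of both modes, assuming a recent PyTorch version; the all-ones input is chosen so the scaling is easy to see.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# training=False: dropout is the identity
eval_out = F.dropout(x, p=0.5, training=False)

# training=True: each entry is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) so the expected value is unchanged
train_out = F.dropout(x, p=0.5, training=True)

print(torch.equal(eval_out, x))
print(train_out.unique())
```

Because the functional form has no internal state, the caller is responsible for passing `training=False` at evaluation time; the `nn.Dropout` module handles this through `model.train()` and `model.eval()`.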

### Torch.nn.init

Provides a set of functions for standard weight initialization techniques

import torch.nn.init as init

Calculate the gain of a layer based on the activation function - init.calculate_gain('sigmoid')

Uniform init - init.uniform(tensor, low, high)

Xavier uniform - init.xavier_uniform(tensor, gain=init.calculate_gain('sigmoid'))

Xavier normal - init.xavier_normal(tensor, gain=init.calculate_gain('tanh'))

Orthogonal - init.orthogonal(tensor, gain=init.calculate_gain('tanh'))

Kaiming normal - init.kaiming_normal(tensor, mode='fan_in')

In [None]:
conv_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), padding=1)
for k,v in conv_layer.named_parameters():
    if k == 'weight':
        init.kaiming_normal(v)
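
For reference, Kaiming (He) normal initialization in `fan_in` mode draws weights from a zero-mean normal distribution whose standard deviation depends only on the fan-in. The plain-Python sketch below computes that value for the convolution above, assuming the default ReLU gain of sqrt(2); the helper name is hypothetical.

```python
import math


def kaiming_normal_std(in_channels, kernel_h, kernel_w):
    """Std of Kaiming normal init in fan_in mode, assuming a ReLU gain of sqrt(2)."""
    fan_in = in_channels * kernel_h * kernel_w
    return math.sqrt(2.0 / fan_in)


# 3 input channels and a 3x3 kernel give fan_in = 27
std = kaiming_normal_std(in_channels=3, kernel_h=3, kernel_w=3)
print(std)
```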

### Torch.optim

Provides implementations of standard stochastic optimization techniques

1. SGD - optim.SGD([W1, W2], lr=0.01, momentum=0.9, dampening=0, weight_decay=1e-2, nesterov=True)
2. Adam - optim.Adam([W1, W2], lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)

optim.lr_scheduler

1. optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
2. optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=True, threshold=1e-04, threshold_mode='rel', min_lr=1e-05, eps=1e-08)
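
To show what a step schedule such as MultiStepLR does to the learning rate, here is a plain-Python sketch of the decay rule: the rate is multiplied by `gamma` once for every milestone that has been reached. The function name and the base rate of 0.01 are assumptions for this example, and it does not use the PyTorch scheduler itself.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at `epoch`: decayed by gamma once per milestone already reached."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


for epoch in (0, 29, 30, 79, 80):
    print(epoch, multistep_lr(0.01, [30, 80], 0.1, epoch))
```

With milestones at 30 and 80, the rate stays at its base value through epoch 29, drops by a factor of ten at epoch 30, and by another factor of ten at epoch 80.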

### Putting it all together

In [44]:
# Load the MNIST dataset with torchvision
import torchvision.datasets as dsets
import torchvision.transforms as transforms
batch_size = 100

# MNIST Dataset 
train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),  
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

In [45]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # Three fully connected layers:
        # 784 inputs (28x28 pixels) -> 200 -> 100 -> 10 classes
        self.l1 = nn.Linear(in_features=784, out_features=200, bias=True)
        self.l2 = nn.Linear(in_features=200, out_features=100, bias=True)
        self.l3 = nn.Linear(in_features=100, out_features=10, bias=True)
        

    def forward(self, x):
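        # Two hidden layers with ReLU activations
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        # Note: nn.CrossEntropyLoss applies log-softmax itself, so this extra
        # softmax is redundant and likely keeps the loss from dropping much
        # below 1.5; returning self.l3(x) directly is the usual pattern.
        x = F.softmax(self.l3(x))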
        return x

net = Net()
print(net)

Net(
  (l1): Linear(in_features=784, out_features=200)
  (l2): Linear(in_features=200, out_features=100)
  (l3): Linear(in_features=100, out_features=10)
)


In [47]:
criterion = nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)  
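
One detail worth checking here: nn.CrossEntropyLoss expects raw scores (logits) and applies log-softmax internally, while the `forward` method above already applies a softmax. The plain-Python sketch below, written for a single 10-class sample, illustrates why feeding probabilities into the loss puts a floor under it. The helper is an illustrative reimplementation, not the PyTorch function.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: negative log-softmax at the target index."""
    log_sum_exp = math.log(sum(math.exp(z) for z in logits))
    return log_sum_exp - logits[target]


# Raw logits can be pushed far apart, so the loss can approach 0
confident_logits = [10.0] + [0.0] * 9
print(cross_entropy(confident_logits, 0))

# If the network already applied softmax, its outputs lie in [0, 1].
# The best case is a one-hot probability vector fed in as "logits".
one_hot_probs = [1.0] + [0.0] * 9
print(cross_entropy(one_hot_probs, 0))  # log(1 + 9 / e), about 1.46
```

A floor near 1.46 would be consistent with training losses that hover around 1.5 to 1.6 instead of decreasing toward zero.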

In [48]:
num_epochs = 20
# Train the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):  
        # Convert torch tensor to Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Forward + Backward + Optimize
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [%d/%d], Step [%d/%d], Loss: %.4f' 
                   %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.data[0]))




Epoch [1/20], Step [100/600], Loss: 1.7069
Epoch [1/20], Step [200/600], Loss: 1.6157
Epoch [1/20], Step [300/600], Loss: 1.6488
Epoch [1/20], Step [400/600], Loss: 1.5283
Epoch [1/20], Step [500/600], Loss: 1.5210
Epoch [1/20], Step [600/600], Loss: 1.5696
Epoch [2/20], Step [100/600], Loss: 1.5634
Epoch [2/20], Step [200/600], Loss: 1.5975
Epoch [2/20], Step [300/600], Loss: 1.5766
Epoch [2/20], Step [400/600], Loss: 1.5510
Epoch [2/20], Step [500/600], Loss: 1.5812
Epoch [2/20], Step [600/600], Loss: 1.5479
Epoch [3/20], Step [100/600], Loss: 1.5612
Epoch [3/20], Step [200/600], Loss: 1.5812
Epoch [3/20], Step [300/600], Loss: 1.5511
Epoch [3/20], Step [400/600], Loss: 1.5310
Epoch [3/20], Step [500/600], Loss: 1.5614
Epoch [3/20], Step [600/600], Loss: 1.6111
Epoch [4/20], Step [100/600], Loss: 1.5991
Epoch [4/20], Step [200/600], Loss: 1.6010
Epoch [4/20], Step [300/600], Loss: 1.5714
Epoch [4/20], Step [400/600], Loss: 1.6512
Epoch [4/20], Step [500/600], Loss: 1.5411
Epoch [4/20

In [56]:
# Test the Model
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images.view(-1, 28*28))
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))



Accuracy of the network on the 10000 test images: 70 %


In [54]:
# Save the Model
torch.save(net.state_dict(), 'model.pkl')
# Load the model
the_model = Net()
the_model.load_state_dict(torch.load('model.pkl'))
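
The save and load pattern above stores only the parameters, so the architecture has to be rebuilt before loading. The sketch below repeats the round trip with a small stand-in layer and an in-memory buffer instead of a file; both choices are assumptions made to keep the example self-contained, and it assumes a recent PyTorch version.

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Serialize only the parameters (the state_dict) into an in-memory buffer
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the architecture first, then load the saved parameters into it
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

x = torch.randn(3, 4)
print(torch.equal(model(x), restored(x)))
```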

In [55]:
# Test the Model
correct = 0
total = 0
for images, labels in test_loader:
    images = Variable(images.view(-1, 28*28))
    outputs = the_model(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum()

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))



Accuracy of the network on the 10000 test images: 70 %
