
In [1]:
# import pytorch libraries
%matplotlib inline
import torch 
import torch.autograd as autograd 
import torch.nn as nn 
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

# Intro to Pytorch

PyTorch consists of 4 main packages:
* torch: a general-purpose array library similar to NumPy that can run computations on a GPU
* torch.autograd: a package for automatically obtaining gradients
* torch.nn: a neural net library with common layers and cost functions
* torch.optim: an optimization package with common optimization algorithms like SGD, Adam, etc

## Pytorch tensors
PyTorch tensors are like NumPy arrays, but they can use GPUs to accelerate numerical computations.
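
A minimal sketch of how a tensor might be placed on a GPU, assuming one may or may not be visible to PyTorch. The names `device`, `cpu_tensor`, `moved` and `result` are illustrative and not part of the original notebook.

```python
import torch

# Use a GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.tensor([[0., 1.], [2., 3.]])
moved = cpu_tensor.to(device)      # tensor on the chosen device
result = (moved @ moved).cpu()     # compute on `device`, then bring the result back to the CPU

print(device)
print(result)
```

Both operands of an operation have to live on the same device, which is why the result is moved back explicitly before it is mixed with CPU data.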

### Creation

In [2]:
# Initializing a tensor
data = torch.tensor([
                     [0, 1],    
                     [2, 3],
                     [4, 5]
                    ])
data

tensor([[0, 1],
        [2, 3],
        [4, 5]])

In [3]:
# Initializing a tensor as data type = torch.float32 / torch.int
data = torch.tensor([
                     [0 , 1],    
                     [2, 3],
                     [4, 5]
                    ],dtype=torch.float32)
data

tensor([[0., 1.],
        [2., 3.],
        [4., 5.]])

In [4]:
# Create random tensor
N = 3
x = torch.randn(N, 5).type(torch.FloatTensor)
x

tensor([[-0.0178, -0.7224,  0.0757, -0.8119, -1.3555],
        [ 0.5471,  0.3311,  1.4192,  0.0902,  0.4487],
        [-2.4475,  1.2038, -0.9069, -0.2007,  1.8423]])

In [5]:
# 3x3x3 Tensor
s = [3,3,3]
b = torch.randn(s)
b

tensor([[[ 0.0606, -1.3085,  1.4893],
         [ 2.0778,  0.1684,  0.5583],
         [ 0.7862,  0.5500, -0.4482]],

        [[ 0.2579,  1.0363,  0.3196],
         [ 1.1917,  0.7270,  0.7504],
         [-0.4242,  0.0099,  0.6363]],

        [[-0.0620, -1.2823,  0.9859],
         [-0.3995,  0.5984,  0.9968],
         [ 1.3148,  0.2202, -0.6200]]])

In [6]:
# tensor from a list
l = [[1,2],[3,4]]
l
tensor = torch.tensor(l)
tensor

tensor([[1, 2],
        [3, 4]])

In [7]:
# a tensor of all zeros / ones
zeros = torch.zeros(2, 5).type(torch.LongTensor)  
zeros

tensor([[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]])

In [8]:
# range over [1, 10): the end value 10 is excluded
rr = torch.arange(1,10)
rr

tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
# numpy.ndarray --> torch.Tensor:
arr = np.array([[1, 0, 5]])
data = torch.tensor(arr)
print("This is a torch.tensor", data)

# torch.Tensor --> numpy.ndarray:
new_arr = data.numpy()
print("This is a np.ndarray", new_arr)

This is a torch.tensor tensor([[1, 0, 5]])
This is a np.ndarray [[1 0 5]]
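
The conversion above copies the data with `torch.tensor`. As a hedged illustration of the alternative, the sketch below contrasts it with `torch.from_numpy`, which shares memory with the array; the variable names (`copied`, `shared`, `view`) are made up for this example.

```python
import numpy as np
import torch

arr = np.array([1, 0, 5])

copied = torch.tensor(arr)       # copies the data
shared = torch.from_numpy(arr)   # shares memory with `arr`

arr[0] = 99                      # modify the NumPy array in place
print(copied)                    # still holds the original values
print(shared)                    # reflects the change

# .numpy() on a CPU tensor also returns an array backed by the same memory.
view = shared.numpy()
view[1] = -1
print(arr)                       # the original array changed as well
```

Sharing memory avoids a copy, but it also means an in-place change on one side is visible on the other.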


### Tensor attributes

In [10]:
print('Tensor shape is:',tensor.shape)
print('Tensor type is:',tensor.dtype)
print('Tensor device is:',tensor.device)

Tensor shape is: torch.Size([2, 2])
Tensor type is: torch.int64
Tensor device is: cpu


### Reshape

In [11]:
rr = torch.arange(1, 13)
print("The shape is:", rr.shape)
print("The contents are:", rr)
print()
rr = rr.view(4, 3)
print("After reshaping, the shape is:", rr.shape)
print(rr)

The shape is: torch.Size([12])
The contents are: tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

After reshaping, the shape is: torch.Size([4, 3])
tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])


In [12]:
# reshaping of tensors using .view()
rr.view(1,-1) #-1 makes torch infer the second dim 

tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])
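
`.view()` never copies data, so it only works when the requested shape is compatible with the tensor's memory layout. A small sketch of the case where it fails and `.reshape()` is needed instead (the names `m`, `mt`, `view_failed` and `flat` are illustrative):

```python
import torch

m = torch.arange(1, 13).view(4, 3)
mt = m.t()                        # transpose: same storage, different strides
print(mt.is_contiguous())         # False

try:
    mt.view(-1)                   # cannot be expressed as a view of this layout
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", type(err).__name__)

flat = mt.reshape(-1)             # reshape falls back to a copy when a view is impossible
print(flat)
```

Calling `.contiguous()` before `.view()` would be another way to get the same result.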

### Operations

In [13]:
xx = torch.tensor([[1, 2], [2, 3], [4, 5]])      # (3, 2)
yy = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])  # (2, 4), so xx @ yy has shape (3, 4)

print("xx is", xx)
print("yy is", yy)
print("The product is", xx.matmul(yy))
print("The other product is", xx @ yy) # @ is the matmul operator; +, -, * act element-wise

xx is tensor([[1, 2],
        [2, 3],
        [4, 5]])
yy is tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
The product is tensor([[11, 14, 17, 20],
        [17, 22, 27, 32],
        [29, 38, 47, 56]])
The other product is tensor([[11, 14, 17, 20],
        [17, 22, 27, 32],
        [29, 38, 47, 56]])
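
To make the difference between `*` and `@` concrete, here is a short sketch under the assumption that `xx` is the same (3, 2) tensor as above; `row`, `elementwise`, `broadcast` and `gram` are names introduced only for this example.

```python
import torch

xx = torch.tensor([[1, 2], [2, 3], [4, 5]])   # shape (3, 2)
row = torch.tensor([10, 100])                 # shape (2,)

elementwise = xx * xx    # same shapes: multiply entry by entry
broadcast = xx * row     # (3, 2) * (2,): `row` is reused for every row of xx
gram = xx @ xx.T         # (3, 2) @ (2, 3): matrix product of shape (3, 3)

print(elementwise)
print(broadcast)
print(gram)
```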


In [14]:
data = torch.arange(1, 26, dtype=torch.float32).reshape(5,5)
print(data)

# Reduce along dim=0 (down the rows): one sum per column.
print("Taking the sum over columns:")
print(data.sum(dim=0))

# Reduce along dim=1 (across the columns): one sum per row.
print("Taking the sum over rows:")
print(data.sum(dim=1))

# Other reductions are available, e.g. the sample standard deviation of each row:
print("Taking the stdev over rows:")
print(data.std(dim=1))

tensor([[ 1.,  2.,  3.,  4.,  5.],
        [ 6.,  7.,  8.,  9., 10.],
        [11., 12., 13., 14., 15.],
        [16., 17., 18., 19., 20.],
        [21., 22., 23., 24., 25.]])
Taking the sum over columns:
tensor([55., 60., 65., 70., 75.])
Taking the sum over rows:
tensor([ 15.,  40.,  65.,  90., 115.])
Taking the stdev over rows:
tensor([1.5811, 1.5811, 1.5811, 1.5811, 1.5811])


## Quiz
Write code that creates a `torch.tensor` with the following contents:
$\begin{bmatrix} 1 & 2.2 & 9.6 \\ 4 & -7.2 & 6.3 \end{bmatrix}$

Now compute the average of each row (`.mean()`) and each column.

What's the shape of the results?

In [15]:
# Be careful with dimensions: a reduction drops the reduced dim; unsqueeze adds a size-1 dim back
test = data.sum(dim=1)
print(test)
print(test.shape)
print(test.unsqueeze(dim=0))
print(test.unsqueeze(dim=0).shape)


tensor([ 15.,  40.,  65.,  90., 115.])
torch.Size([5])
tensor([[ 15.,  40.,  65.,  90., 115.]])
torch.Size([1, 5])
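
Another way to keep track of dimensions is the `keepdim=True` argument of reductions. The sketch below, which reuses the 5x5 `data` tensor from above, shows why the shape matters when the result is combined with the original tensor (`row_sums`, `row_sums_kept` and `normalized` are illustrative names):

```python
import torch

data = torch.arange(1, 26, dtype=torch.float32).reshape(5, 5)

row_sums = data.sum(dim=1)                      # shape (5,)
row_sums_kept = data.sum(dim=1, keepdim=True)   # shape (5, 1)

# With shape (5, 1) the sums broadcast across the columns,
# so every row is divided by its own total.
normalized = data / row_sums_kept

print(row_sums.shape, row_sums_kept.shape)
print(normalized.sum(dim=1))                    # each row now sums to 1
```

Dividing by `row_sums` of shape (5,) would also run without error, but it would broadcast along the other axis and divide column `j` by the sum of row `j`, which is rarely what is intended.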


## Indexing

In [16]:
x = torch.arange(1,13).view(3,2,2)
print(x.shape)
print(x)

torch.Size([3, 2, 2])
tensor([[[ 1,  2],
         [ 3,  4]],

        [[ 5,  6],
         [ 7,  8]],

        [[ 9, 10],
         [11, 12]]])


In [17]:
x[0]      # first 2x2 matrix, equivalent to x[0, :, :]
x[:,0]    # first row of every matrix (shape 3x2)
x[0,1,1]  # 0 (first matrix), 1 (second row of that matrix), 1 (second element of that row)
x[:,0,0]  # top-left element of each matrix (1-D tensor of length 3)

# Let's access the 0th and 1st matrices, each twice (the result has shape 4x2x2)
i = torch.tensor([0, 0, 1, 1])
print(x[i])

# Let's access row 0 of the matrices at index 1 and 2 (j is broadcast against i)
i = torch.tensor([1, 2])
j = torch.tensor([0])
x[i, j]

tensor([[[1, 2],
         [3, 4]],

        [[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]],

        [[5, 6],
         [7, 8]]])


tensor([[ 5,  6],
        [ 9, 10]])

In [18]:
matr = torch.arange(1, 16).view(5, 3)
print(matr)
print(matr[0])         # first row, equivalent to matr[0, :]
print(matr[:,0])       # first column (index 0 along dim = 1)
print(matr[0:3])       # rows 0, 1, 2 (the end index 3 is excluded)
print(matr[:,0:2])     # for every row print columns 0 and 1
print(matr[0:3,0:2])   # for rows 0,1,2 print columns 0 and 1
print(matr[0][2])      # element at row=0, column=2
print(matr[[0, 2, 4]]) # rows 0, 2, 4 selected with an index list (not a boolean mask)


tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12],
        [13, 14, 15]])
tensor([1, 2, 3])
tensor([ 1,  4,  7, 10, 13])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[ 1,  2],
        [ 4,  5],
        [ 7,  8],
        [10, 11],
        [13, 14]])
tensor([[1, 2],
        [4, 5],
        [7, 8]])
tensor(3)
tensor([[ 1,  2,  3],
        [ 7,  8,  9],
        [13, 14, 15]])
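
Index lists select by position; a boolean mask selects by condition. A brief sketch on the same 5x3 `matr` tensor, with `mask`, `big_rows` and `evens` as names made up for this example:

```python
import torch

matr = torch.arange(1, 16).view(5, 3)

mask = matr[:, 0] > 5          # one boolean per row: tensor([False, False, True, True, True])
big_rows = matr[mask]          # keeps the rows whose first column exceeds 5
evens = matr[matr % 2 == 0]    # an element-wise mask returns a flat 1-D tensor

print(big_rows)
print(evens)
```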


## Pytorch Autograd
The autograd package in PyTorch provides classes and functions that implement automatic differentiation of arbitrary scalar-valued functions, for example the gradient of the error with respect to all parameters.

For this to work, we declare our parameters as tensors with the `requires_grad=True` keyword.
We can then call the `backward()` method to ask PyTorch to calculate the gradients, which are stored in the `grad` attribute of each such tensor.
Here is an example:

In [19]:
x = torch.tensor([2.], requires_grad=True)
x

tensor([2.], requires_grad=True)

In [20]:
print(x.grad)

None


In [21]:
# Calculating the gradient of y with respect to x

y = x * x * 3     # 3x^2
y.backward()  # computes the grad of y with respect to x
print(x.grad) # d(y)/d(x) = d(3x^2)/d(x) = 6x = 12

tensor([12.])


In [22]:
# backprop again

z = x * x * 3 # 3x^2
z.backward()
print(x.grad)    # 24 = 12 + 12: gradients accumulate in .grad across backward() calls.
                 # After each training iteration (i.e. coefficient update) run "zero_grad()" to avoid this accumulation

tensor([24.])
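
The accumulation seen above can be undone by resetting the gradient in place. A minimal sketch of that reset on a single tensor, assuming no optimizer is involved (the names `first`, `accumulated` and `after_reset` are illustrative):

```python
import torch

x = torch.tensor([2.], requires_grad=True)

(3 * x * x).backward()
first = x.grad.clone()          # gradient of 3x^2 at x = 2

(3 * x * x).backward()
accumulated = x.grad.clone()    # second backward() adds to the stored gradient

x.grad.zero_()                  # reset the stored gradient in place
(3 * x * x).backward()
after_reset = x.grad.clone()    # back to a single gradient

print(first, accumulated, after_reset)
```

In a training loop the same reset is normally done for all parameters at once with the optimizer's `zero_grad()` method.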


## torch.nn module
A neural net library with common layers and cost functions

In [23]:
# linear transformation of a Nx5 matrix into a Nx3 matrix, where N can be anything 
# (number of observations)
torch.manual_seed(42)

D = 5 # number of input features
M = 3 # neurons in the first hidden layer
linear_map = nn.Linear(D, M)

# parameters are initialized randomly
[p for p in linear_map.parameters()]

[Parameter containing:
 tensor([[ 0.3419,  0.3712, -0.1048,  0.4108, -0.0980],
         [ 0.0902, -0.2177,  0.2626,  0.3942, -0.3281],
         [ 0.3887,  0.0837,  0.3304,  0.0606,  0.2156]], requires_grad=True),
 Parameter containing:
 tensor([-0.0631,  0.3448,  0.0661], requires_grad=True)]
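
To see what the layer computes, the sketch below applies an `nn.Linear(5, 3)` to a random N x 5 input and compares it with the explicit formula `x @ W.T + b`. `N = 4` and the names `out` and `manual` are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
D, M, N = 5, 3, 4                 # input features, output units, number of observations
linear_map = nn.Linear(D, M)

x = torch.randn(N, D)
out = linear_map(x)               # shape (N, M)

# weight has shape (M, D) and bias has shape (M,)
manual = x @ linear_map.weight.T + linear_map.bias

print(out.shape)
print(torch.allclose(out, manual, atol=1e-6))
```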

In [24]:
# Define a block: layers that are applied one after another

model_seq = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.Sigmoid(),
    torch.nn.Linear(2, 1)
)
print(model_seq)
model_seq.state_dict()

Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=2, out_features=1, bias=True)
)


OrderedDict([('0.weight',
              tensor([[-0.3301,  0.1802],
                      [-0.3258, -0.0829]])),
             ('0.bias', tensor([-0.2872,  0.4691])),
             ('2.weight', tensor([[-0.5582, -0.3260]])),
             ('2.bias', tensor([-0.1997]))])
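
A short sketch of how such a block might be used: calling it on a batch runs the layers in order, which is the same as applying them one by one. The batch size of 4 and the names `batch`, `out` and `step` are illustrative.

```python
import torch

torch.manual_seed(0)
model_seq = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.Sigmoid(),
    torch.nn.Linear(2, 1),
)

batch = torch.rand(4, 2)      # 4 observations with 2 features each
out = model_seq(batch)        # shape (4, 1)

# The same computation, layer by layer
step = batch
for layer in model_seq:
    step = layer(step)

print(out.shape)
print(torch.allclose(out, step))
```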

In [25]:
class MyNetwork(nn.Module):    
    def __init__(self):
        """
        Two-layer network with fixed sizes:
        - input: 3 features
        - hidden layer: 2 units (followed by a ReLU in forward)
        - output layer: 1 unit
        """
        super(MyNetwork, self).__init__()
        self.hidden = nn.Linear(3,2)
        self.output = nn.Linear(2,1)
        
    def forward(self,x):
        x = F.relu(self.hidden(x))
        x = self.output(x)
        return x

In [26]:
model = MyNetwork()
model
model.state_dict()

OrderedDict([('hidden.weight',
              tensor([[-0.3471,  0.0545, -0.5702],
                      [ 0.5214, -0.4904,  0.4457]])),
             ('hidden.bias', tensor([ 0.0961, -0.1875])),
             ('output.weight', tensor([[0.4370, 0.1102]])),
             ('output.bias', tensor([0.5713]))])

In [27]:
a = model.hidden.weight.data
a

tensor([[-0.3471,  0.0545, -0.5702],
        [ 0.5214, -0.4904,  0.4457]])

In [28]:
parameters = list(model.parameters())
parameters 

[Parameter containing:
 tensor([[-0.3471,  0.0545, -0.5702],
         [ 0.5214, -0.4904,  0.4457]], requires_grad=True),
 Parameter containing:
 tensor([ 0.0961, -0.1875], requires_grad=True),
 Parameter containing:
 tensor([[0.4370, 0.1102]], requires_grad=True),
 Parameter containing:
 tensor([0.5713], requires_grad=True)]

In [29]:
for param in model.hidden.parameters():
    param.requires_grad = False
    print(param.requires_grad)
    
parameters = list(model.parameters())
parameters     

False
False


[Parameter containing:
 tensor([[-0.3471,  0.0545, -0.5702],
         [ 0.5214, -0.4904,  0.4457]]),
 Parameter containing:
 tensor([ 0.0961, -0.1875]),
 Parameter containing:
 tensor([[0.4370, 0.1102]], requires_grad=True),
 Parameter containing:
 tensor([0.5713], requires_grad=True)]
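
Setting `requires_grad = False` freezes a layer: `backward()` no longer computes gradients for it. A minimal sketch with two stand-alone layers (`hidden` and `output` here are fresh layers created for this example, not the `model` above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = nn.Linear(3, 2)
output = nn.Linear(2, 1)

# Freeze the hidden layer
for param in hidden.parameters():
    param.requires_grad = False

x = torch.rand(1, 3)
loss = output(torch.relu(hidden(x))).sum()
loss.backward()

print(hidden.weight.grad)     # None: no gradient was computed for the frozen layer
print(output.weight.grad)     # gradient for the trainable layer
```

Since an optimizer only updates parameters that have a gradient, the frozen layer keeps its weights during training.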

## Loss

In [30]:
loss = nn.MSELoss(reduction='mean')
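
As a quick check of what `nn.MSELoss` computes, the sketch below compares it with the formula written out by hand. The prediction and target values are made up for illustration.

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([[0.5], [2.0], [3.0]])
y_true = torch.tensor([[1.0], [2.0], [5.0]])

mean_loss = nn.MSELoss(reduction='mean')(y_pred, y_true)   # average of the squared errors
sum_loss = nn.MSELoss(reduction='sum')(y_pred, y_true)     # sum of the squared errors
manual = ((y_pred - y_true) ** 2).mean()

print(mean_loss, sum_loss, manual)
```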

## Optimizer

In [31]:
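optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # named `optimizer` so it does not shadow the imported `optim` module
optimizer.step() # One gradient-descent step: the optimizer adjusts each parameter using the gradient stored in .grad:
                 # weight = weight - learning_rate * gradient
                 # Parameters whose .grad is still None (as here, before any backward()) are left unchanged.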
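
The update rule can be verified on a single parameter. This is a sketch under simple assumptions: plain SGD without momentum or weight decay, and a toy loss `sum(w^2)` whose gradient is `2 * w`; `w`, `lr` and `expected` are names introduced for the example.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
lr = 1e-2

loss = (w ** 2).sum()                    # gradient is 2 * w
loss.backward()

# weight - learning_rate * gradient, computed by hand before the step
expected = w.detach() - lr * w.grad

optimizer = torch.optim.SGD([w], lr=lr)
optimizer.step()                         # updates w in place

print(w)
print(expected)
```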

## Example

In [32]:
# Data: 1 training example (x) with 3 features and 1 label (y)
x = torch.rand(1,3)
print(x)

y = torch.ones(1,1)
print(y)

loss_func = nn.MSELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for t in range(100):
    # Forward pass: compute the predicted y by passing x through the model
    y_pred = model(x)
    loss = loss_func(y_pred, y)
    if t % 10 == 0: print("loss={:.4f}".format(loss.item()), 'y_pred={:.2f}'.format(y_pred.item()))
       
    # Before the backward pass, use the optimizer object to zero all of the
    # gradients of the parameters it updates (otherwise they accumulate)
    optimizer.zero_grad()
    loss.backward()
    
    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()

tensor([[0.5547, 0.3423, 0.6343]])
tensor([[1.]])
loss=0.1639 y_pred=0.60
loss=0.1073 y_pred=0.67
loss=0.0703 y_pred=0.73
loss=0.0460 y_pred=0.79
loss=0.0302 y_pred=0.83
loss=0.0197 y_pred=0.86
loss=0.0129 y_pred=0.89
loss=0.0085 y_pred=0.91
loss=0.0055 y_pred=0.93
loss=0.0036 y_pred=0.94
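
Once training is finished, predictions are usually computed without tracking gradients. A minimal sketch, assuming a stand-in `nn.Linear(3, 1)` model rather than the trained `MyNetwork` above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)        # stand-in model for illustration only
x = torch.rand(1, 3)

with torch.no_grad():          # no computation graph is built inside this block
    pred = model(x)

print(pred.shape)
print(pred.requires_grad)      # False
```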


# References
* https://pytorch.org/docs/stable/index.html
* http://pytorch.org/tutorials/beginner/pytorch_with_examples.html
* https://hsaghir.github.io/data_science/pytorch_starter/