# PyTorch Basics

In [70]:
import torch
import numpy as np
torch.manual_seed(1234)

<torch._C.Generator at 0x1433eaf0070>

## Tensors

* A scalar is a single number.
* A vector is a 1-D array of numbers.
* A matrix is a 2-D array of numbers.
* A tensor is an N-D array of numbers.

#### Creating Tensors

You can create tensors by specifying the shape as arguments. Here is a tensor with 2 rows and 3 columns. The `describe` helper defined first prints a tensor's type, shape, and values.

In [71]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

In [72]:
describe(torch.Tensor(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[4.5436e+30, 1.6842e+22, 1.9284e+31],
        [4.5559e-41, 0.0000e+00, 0.0000e+00]])
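
The odd values printed above come from uninitialized memory: `torch.Tensor(2, 3)` allocates storage without setting it. As a minimal sketch (the variable names are illustrative only), `torch.empty` makes the same thing explicit, while `torch.zeros` and `torch.full` give well-defined starting values:

```python
import torch

# torch.empty allocates memory without initializing it, so the values are arbitrary.
uninitialized = torch.empty(2, 3)

# torch.zeros and torch.full produce well-defined starting values.
zeros = torch.zeros(2, 3)
sevens = torch.full((2, 3), 7.0)

print(uninitialized.shape)
print(zeros.sum().item(), sevens[0, 0].item())
```

Never rely on the contents of an uninitialized tensor; fill it before reading from it.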


In [73]:
describe(torch.randn(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])


It's common in prototyping to create a tensor with random numbers of a specific shape.

In [74]:
x = torch.rand(2, 3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.7749, 0.8208, 0.2793],
        [0.6817, 0.2837, 0.6567]])


You can also initialize tensors of ones or zeros.

In [75]:
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 0., 0.],
        [0., 0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 1., 1.],
        [1., 1., 1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[5., 5., 5.],
        [5., 5., 5.]])


Tensors can be initialized and then filled in place. 

Note: operations that end in an underscore (`_`) are in-place operations.

In [76]:
x = torch.Tensor(3,4).fill_(5)
print(x.type())
print(x.shape)
print(x)

torch.FloatTensor
torch.Size([3, 4])
tensor([[5., 5., 5., 5.],
        [5., 5., 5., 5.],
        [5., 5., 5., 5.]])
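
To make the underscore convention concrete, the following sketch contrasts `add` with `add_` on the same tensor. The variable names are illustrative only.

```python
import torch

x = torch.ones(2, 3)

# Out-of-place: returns a new tensor and leaves x untouched.
y = x.add(1)

# In-place (trailing underscore): modifies x itself and returns it.
z = x.add_(10)

print(x)                              # every entry is now 11
print(y)                              # every entry is 2
print(z.data_ptr() == x.data_ptr())   # True: z refers to the same storage as x
```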


Tensors can be initialized from a list of lists

In [77]:
x = torch.Tensor([[1, 2,],  
                  [2, 4,]])
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 2.],
        [2., 4.]])


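Tensors can be initialized from NumPy arrays. Note that the resulting tensor keeps NumPy's `float64` dtype, so it is a DoubleTensor rather than a FloatTensor.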

In [78]:
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
print(npy.dtype)

Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0.6610, 0.0203, 0.0198],
        [0.5836, 0.7290, 0.1878]], dtype=torch.float64)
float64
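
One detail worth knowing: `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` copies it. The sketch below illustrates the difference under that assumption, with illustrative variable names:

```python
import numpy as np
import torch

npy = np.zeros((2, 3))            # float64 by default
shared = torch.from_numpy(npy)    # shares memory with npy, dtype float64

npy[0, 0] = 5.0                   # a write to the array is visible in the tensor
print(shared[0, 0].item())        # 5.0

# torch.tensor copies the data; .float() then converts to float32.
copied = torch.tensor(npy).float()
npy[0, 0] = -1.0
print(copied[0, 0].item())        # still 5.0
print(copied.dtype)               # torch.float32
```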


#### Tensor Types

FloatTensor has been the default type of the tensors we have created so far. `torch.arange` with integer arguments, however, produces a LongTensor:

In [79]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [80]:
x = torch.FloatTensor([[1, 2, 3],  
                       [4, 5, 6]])
describe(x)

x = x.long()
describe(x)

x = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]], dtype=torch.int64)
describe(x)

x = x.float() 
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[1., 2., 3.],
        [4., 5., 6.]])
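
The legacy type names above (`FloatTensor`, `LongTensor`) correspond to the dtypes `torch.float32` and `torch.int64`. As a rough sketch, assuming the default dtype has not been changed and a PyTorch version with type promotion, the same conversions can be written with `dtype` and `.to()`:

```python
import torch

ints = torch.arange(6).view(2, 3)            # integer arguments -> torch.int64
floats = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])     # Python floats -> torch.float32

print(ints.dtype, floats.dtype)

# .to(dtype) is the general form of .long() / .float()
as_float = ints.to(torch.float32)
as_long = floats.to(torch.int64)

# Mixing an integer and a float tensor promotes the result to the float type.
mixed = ints + floats
print(mixed.dtype)
```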


In [81]:
x.shape

torch.Size([2, 3])

In [82]:
x.size()

torch.Size([2, 3])

In [83]:
x = torch.randn(2, 3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 1.5385, -0.9757,  1.5769],
        [ 0.3840, -0.6039, -0.5240]])


In [84]:
describe(torch.add(x, x))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 3.0771, -1.9515,  3.1539],
        [ 0.7680, -1.2077, -1.0479]])


In [85]:
describe(x + x)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[ 3.0771, -1.9515,  3.1539],
        [ 0.7680, -1.2077, -1.0479]])


In [86]:
x = torch.arange(6)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([6])
Values: 
tensor([0, 1, 2, 3, 4, 5])


In [87]:
x = x.view(2, 3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [88]:
describe(torch.sum(x, dim=0))
describe(torch.sum(x, dim=1))

Type: torch.LongTensor
Shape/size: torch.Size([3])
Values: 
tensor([3, 5, 7])
Type: torch.LongTensor
Shape/size: torch.Size([2])
Values: 
tensor([ 3, 12])


In [89]:
describe(torch.transpose(x, 0, 1))

Type: torch.LongTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[0, 3],
        [1, 4],
        [2, 5]])


In [90]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
describe(x[:1, :2])
describe(x[0, 1])

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([1, 2])
Values: 
tensor([[0, 1]])
Type: torch.LongTensor
Shape/size: torch.Size([])
Values: 
1


In [91]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))

Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[0, 2],
        [3, 5]])


In [92]:
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=0, index=indices))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [0, 1, 2]])


In [93]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])
describe(x[row_indices, col_indices])

Type: torch.LongTensor
Shape/size: torch.Size([2])
Values: 
tensor([0, 4])


In [95]:
import torch
x = torch.arange(6).view(2,3)
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])


In [96]:
describe(torch.cat([x, x], dim=0))

Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])


In [97]:
describe(torch.cat([x, x], dim=1))

Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])


In [98]:
describe(torch.stack([x, x]))

Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])


In [102]:
import torch
x1 = torch.arange(6).view(2, 3)
x1 = x1.float() 
describe(x1)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 1., 2.],
        [3., 4., 5.]])


In [103]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[1., 2.],
        [1., 2.],
        [1., 2.]])


In [104]:
describe(torch.mm(x1, x2))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[ 3.,  6.],
        [12., 24.]])


In [105]:
x = torch.LongTensor([[1, 2, 3],  
                      [4, 5, 6],
                      [7, 8, 9]])
describe(x)
print(x.dtype)
print(x.numpy().dtype)

Type: torch.LongTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
torch.int64
int64


You can convert a FloatTensor to a LongTensor

In [106]:
x = torch.FloatTensor([[1, 2, 3],  
                       [4, 5, 6],
                       [7, 8, 9]])
x = x.long()
describe(x)

Type: torch.LongTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])


### Special Tensor initializations

We can create a vector of incremental numbers

In [107]:
x = torch.arange(0, 10)
print(x)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


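Sometimes it's useful to have an integer-based arange for indexing. Calling `.long()` guarantees a LongTensor; as the previous output shows, `arange` with integer arguments already returns one here.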

In [108]:
x = torch.arange(0, 10).long()
print(x)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


In [109]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
True


In [110]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[21., 21.],
        [21., 21.]], grad_fn=<AddBackward0>)
True


In [111]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)

Type: torch.FloatTensor
Shape/size: torch.Size([])
Values: 
21.0
False


## CUDA Tensors

In [112]:
import torch
print (torch.cuda.is_available())

True


In [113]:
# preferred method: device agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print (device)

cuda


In [114]:
x = torch.rand(3, 3).to(device)
describe(x)

Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.6662, 0.3343, 0.7893],
        [0.3216, 0.5247, 0.6688],
        [0.8436, 0.4265, 0.9561]], device='cuda:0')


In [115]:
y = torch.rand(3, 3)
x + y

RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float

In [116]:
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y

tensor([[0.7432, 0.7451, 0.7907],
        [0.8631, 1.1666, 0.9664],
        [1.5513, 0.8455, 1.0217]])

## Operations

Using tensors to do linear algebra is a foundation of modern deep learning.

Reshaping rearranges the numbers in a tensor into a new shape while preserving their order. In PyTorch, this is done with `view`.

In [117]:
x = torch.arange(0, 20)

print(x.view(1, 20))
print(x.view(2, 10))
print(x.view(4, 5))
print(x.view(5, 4))
print(x.view(10, 2))
print(x.view(20, 1))

tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
         18, 19]])
tensor([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]])
tensor([[ 0],
        [ 1],
        [ 2],
        [ 3],
        [ 4],
        [ 5],
        [ 6],
        [ 7],
        [ 8],
        [ 9],
        [10],
        [11],
        [12],
        [13],
        [14],
        [15],
        [16],
        [17],
        [18],
        [19]])
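
Two properties of `view` are easy to miss: it can infer one dimension when given `-1`, and it returns a tensor that shares storage with the original. The sketch below shows both, along with the case where `view` refuses and `reshape` is needed. Variable names are illustrative.

```python
import torch

x = torch.arange(20)

# -1 asks PyTorch to infer that dimension from the total number of elements.
v = x.view(4, -1)
print(v.shape)               # torch.Size([4, 5])

# A view shares storage with the original tensor.
v[0, 0] = 100
print(x[0].item())           # 100

# After a transpose the memory is no longer laid out contiguously,
# so view() raises while reshape() copies if it has to.
t = v.t()
try:
    t.view(20)
except RuntimeError as err:
    print("view failed:", type(err).__name__)
flat = t.reshape(20)
print(flat[:4])              # first column of v: 100, 5, 10, 15
```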


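We can use `view` to add size-1 dimensions, which is useful when combining a tensor with others of a different shape. In an elementwise operation, PyTorch automatically stretches size-1 dimensions to match the other operand; this is called broadcasting.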

In [118]:
x = torch.arange(12).view(3, 4)
y = torch.arange(4).view(1, 4)
z = torch.arange(3).view(3, 1)

print(x)
print(y)
print(z)
print(x + y)
print(x + z)

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[0, 1, 2, 3]])
tensor([[0],
        [1],
        [2]])
tensor([[ 0,  2,  4,  6],
        [ 4,  6,  8, 10],
        [ 8, 10, 12, 14]])
tensor([[ 0,  1,  2,  3],
        [ 5,  6,  7,  8],
        [10, 11, 12, 13]])
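
The stretching of size-1 dimensions can be written out explicitly with `expand`. This sketch checks that the broadcast result matches the explicit version and shows a pair of shapes that cannot be broadcast:

```python
import torch

y = torch.arange(4).view(1, 4)   # shape (1, 4)
z = torch.arange(3).view(3, 1)   # shape (3, 1)

# Size-1 dimensions are stretched to match the other operand.
out = y + z
print(out.shape)                 # torch.Size([3, 4])

# The same result with the stretching written out explicitly.
explicit = y.expand(3, 4) + z.expand(3, 4)
print(torch.equal(out, explicit))  # True

# Shapes that differ in a dimension where neither is 1 cannot be broadcast.
failed = False
try:
    torch.arange(4).view(1, 4) + torch.arange(3).view(1, 3)
except RuntimeError as err:
    failed = True
    print("incompatible shapes:", type(err).__name__)
```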


`unsqueeze` and `squeeze` add and remove size-1 dimensions.

In [119]:
x = torch.arange(12).view(3, 4)
print(x.shape)

x = x.unsqueeze(dim=1)
print(x.shape)

x = x.squeeze()
print(x.shape)

torch.Size([3, 4])
torch.Size([3, 1, 4])
torch.Size([3, 4])
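
Called without arguments, `squeeze` removes every size-1 dimension, which can silently drop a batch dimension of size 1. A short sketch of the safer form that names the dimension:

```python
import torch

x = torch.zeros(1, 3, 1)

print(x.squeeze().shape)        # torch.Size([3]): every size-1 dim removed
print(x.squeeze(0).shape)       # torch.Size([3, 1]): only dim 0 removed
print(x.squeeze(1).shape)       # torch.Size([1, 3, 1]): dim 1 has size 3, unchanged
print(x.unsqueeze(-1).shape)    # torch.Size([1, 3, 1, 1]): new trailing dim
```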


All of the standard mathematical operations apply (such as `add` below):

In [120]:
x = torch.rand(3,4)
print("x: \n", x)
print("--")
print("torch.add(x, x): \n", torch.add(x, x))
print("--")
print("x+x: \n", x + x)

x: 
 tensor([[0.8839, 0.8083, 0.7528, 0.8988],
        [0.6839, 0.7658, 0.9149, 0.3993],
        [0.1100, 0.2541, 0.4333, 0.4451]])
--
torch.add(x, x): 
 tensor([[1.7677, 1.6166, 1.5056, 1.7977],
        [1.3677, 1.5317, 1.8298, 0.7985],
        [0.2201, 0.5082, 0.8665, 0.8901]])
--
x+x: 
 tensor([[1.7677, 1.6166, 1.5056, 1.7977],
        [1.3677, 1.5317, 1.8298, 0.7985],
        [0.2201, 0.5082, 0.8665, 0.8901]])


The convention of `_` indicating in-place operations continues:

In [121]:
x = torch.arange(12).reshape(3, 4)
print(x)
print(x.add_(x))

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
tensor([[ 0,  2,  4,  6],
        [ 8, 10, 12, 14],
        [16, 18, 20, 22]])


Many operations reduce a tensor along a dimension, such as `sum`:

In [122]:
x = torch.arange(12).reshape(3, 4)
print("x: \n", x)
print("---")
print("Summing across rows (dim=0): \n", x.sum(dim=0))
print("---")
print("Summing across columns (dim=1): \n", x.sum(dim=1))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
Summing across rows (dim=0): 
 tensor([12, 15, 18, 21])
---
Summing across columns (dim=1): 
 tensor([ 6, 22, 38])
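
Other reductions follow the same `dim` convention. As a brief sketch, `keepdim=True` retains the reduced axis with size 1, `mean` needs a floating-point tensor, and `max` along a dimension returns both values and indices:

```python
import torch

x = torch.arange(12).reshape(3, 4)

# keepdim=True keeps the reduced dimension with size 1,
# which makes the result broadcastable against x.
col_totals = x.sum(dim=0, keepdim=True)
print(col_totals.shape)            # torch.Size([1, 4])

# mean needs a floating-point tensor, hence the .float() conversion.
row_means = x.float().mean(dim=1)
print(row_means)                   # tensor([1.5000, 5.5000, 9.5000])

# max along a dimension returns both the values and their indices.
values, indices = x.max(dim=1)
print(values, indices)             # tensor([ 3,  7, 11]) tensor([3, 3, 3])
```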


#### Indexing, Slicing, Joining and Mutating

In [123]:
x = torch.arange(6).view(2, 3)
print("x: \n", x)
print("---")
print("x[:2, :2]: \n", x[:2, :2])
print("---")
print("x[0][1]: \n", x[0][1])
print("---")
print("Setting [0][1] to be 8")
x[0][1] = 8
print(x)

x: 
 tensor([[0, 1, 2],
        [3, 4, 5]])
---
x[:2, :2]: 
 tensor([[0, 1],
        [3, 4]])
---
x[0][1]: 
 tensor(1)
---
Setting [0][1] to be 8
tensor([[0, 8, 2],
        [3, 4, 5]])


We can select a subset of a tensor along a given dimension using `index_select`:

In [124]:
x = torch.arange(9).view(3,3)
print(x)

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=0, index=indices))

print("---")
indices = torch.LongTensor([0, 2])
print(torch.index_select(x, dim=1, index=indices))

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 2],
        [3, 5],
        [6, 8]])


We can also use numpy-style advanced indexing:

In [125]:
x = torch.arange(9).view(3,3)
indices = torch.LongTensor([0, 2])

print(x[indices])
print("---")
print(x[indices, :])
print("---")
print(x[:, indices])

tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 1, 2],
        [6, 7, 8]])
---
tensor([[0, 2],
        [3, 5],
        [6, 8]])
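
Advanced indexing also accepts boolean masks and paired index tensors. The following sketch (assuming a PyTorch version with boolean tensors) shows both, plus masked assignment:

```python
import torch

x = torch.arange(9).view(3, 3)

# A boolean mask with the same shape selects matching elements as a 1-D tensor.
mask = x % 2 == 0
evens = x[mask]
print(evens)                     # tensor([0, 2, 4, 6, 8])

# Pairing row and column index tensors picks individual elements.
rows = torch.tensor([0, 2])
cols = torch.tensor([1, 2])
picked = x[rows, cols]
print(picked)                    # tensor([1, 8])

# Masks also work on the left-hand side of an assignment.
x[x > 6] = -1
print(x[2])                      # tensor([ 6, -1, -1])
```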


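We can combine tensors by concatenating them along an existing dimension with `torch.cat`, or along a new dimension with `torch.stack`. First, concatenating along the rows (`dim=0`), then the columns (`dim=1`), then stacking: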

In [126]:
x = torch.arange(6).view(2,3)
describe(x)
describe(torch.cat([x, x], dim=0))
describe(torch.cat([x, x], dim=1))
describe(torch.stack([x, x]))

Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([4, 3])
Values: 
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 6])
Values: 
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])
Type: torch.LongTensor
Shape/size: torch.Size([2, 2, 3])
Values: 
tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])


We can also concatenate along dimension 1, the columns:

In [127]:
x = torch.arange(9).view(3,3)

print(x)
print("---")
new_x = torch.cat([x, x, x], dim=1)
print(new_x.shape)
print(new_x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
torch.Size([3, 9])
tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5, 3, 4, 5],
        [6, 7, 8, 6, 7, 8, 6, 7, 8]])


We can also concatenate on a new 0th dimension to "stack" the tensors:

In [128]:
x = torch.arange(9).view(3,3)
print(x)
print("---")
new_x = torch.stack([x, x, x])
print(new_x.shape)
print(new_x)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
---
torch.Size([3, 3, 3])
tensor([[[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]],

        [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]])
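
One way to read `torch.stack` is as `unsqueeze` followed by `torch.cat`. This sketch checks that equivalence and shows how the `dim` argument places the new axis:

```python
import torch

x = torch.arange(9).view(3, 3)

stacked = torch.stack([x, x, x])                       # new dim 0
via_cat = torch.cat([x.unsqueeze(0)] * 3, dim=0)       # same thing spelled out
print(torch.equal(stacked, via_cat))                   # True

# dim chooses where the new axis goes.
print(torch.stack([x, x], dim=1).shape)                # torch.Size([3, 2, 3])
print(torch.stack([x, x], dim=2).shape)                # torch.Size([3, 3, 2])
```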


#### Linear Algebra Tensor Functions

Transposing swaps two dimensions of a tensor. For a matrix, this turns all the rows into columns and vice versa.

In [129]:
x = torch.arange(0, 12).view(3,4)
print("x: \n", x) 
print("---")
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x: 
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
---
x.transpose(1, 0): 
 tensor([[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]])


A three-dimensional tensor can represent a batch of sequences, where each sequence item has a feature vector. It is common to swap the batch and sequence dimensions so that we can more easily index into the sequence in a sequence model.

Note: `transpose` only swaps 2 axes. `permute` (in the next cell) can reorder any number of axes.

In [130]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.transpose(1, 0).shape: \n", x.transpose(1, 0).shape)
print("x.transpose(1, 0): \n", x.transpose(1, 0))

x.shape: 
 torch.Size([3, 4, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
-----
x.transpose(1, 0).shape: 
 torch.Size([4, 3, 5])
x.transpose(1, 0): 
 tensor([[[ 0,  1,  2,  3,  4],
         [20, 21, 22, 23, 24],
         [40, 41, 42, 43, 44]],

        [[ 5,  6,  7,  8,  9],
         [25, 26, 27, 28, 29],
         [45, 46, 47, 48, 49]],

        [[10, 11, 12, 13, 14],
         [30, 31, 32, 33, 34],
         [50, 51, 52, 53, 54]],

        [[15, 16, 17, 18, 19],
         [35, 36, 37, 38, 39],
         [55, 56, 57, 58, 59]]])


`permute` is a more general version of `transpose`:

In [132]:
batch_size = 3
seq_size = 4
feature_size = 5

x = torch.arange(batch_size * seq_size * feature_size).view(batch_size, seq_size, feature_size)

print("x.shape: \n", x.shape)
print("x: \n", x)
print("-----")

print("x.permute(1, 0, 2).shape: \n", x.permute(1, 0, 2).shape)
print("x.permute(1, 0, 2): \n", x.permute(1, 0, 2))

x.shape: 
 torch.Size([3, 4, 5])
x: 
 tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
-----
x.permute(1, 0, 2).shape: 
 torch.Size([4, 3, 5])
x.permute(1, 0, 2): 
 tensor([[[ 0,  1,  2,  3,  4],
         [20, 21, 22, 23, 24],
         [40, 41, 42, 43, 44]],

        [[ 5,  6,  7,  8,  9],
         [25, 26, 27, 28, 29],
         [45, 46, 47, 48, 49]],

        [[10, 11, 12, 13, 14],
         [30, 31, 32, 33, 34],
         [50, 51, 52, 53, 54]],

        [[15, 16, 17, 18, 19],
         [35, 36, 37, 38, 39],
         [55, 56, 57, 58, 59]]])
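
The example above only swaps two axes, which `transpose` could also do. As a sketch of what `permute` adds, the following reorders all three axes at once and shows that the result is a non-contiguous view:

```python
import torch

x = torch.arange(3 * 4 * 5).view(3, 4, 5)   # (batch, seq, feature)

# permute can reorder all axes at once; transpose swaps exactly two.
p = x.permute(2, 0, 1)                      # (feature, batch, seq)
print(p.shape)                              # torch.Size([5, 3, 4])
print(p[4, 2, 3].item() == x[2, 3, 4].item())   # True

# The result is a view with rearranged strides, so it is not contiguous.
print(p.is_contiguous())                    # False

# contiguous() copies into a fresh layout, after which view() works again.
print(p.contiguous().view(5, 12).shape)     # torch.Size([5, 12])
```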


Matrix multiplication is `mm`:

In [133]:
torch.randn(2, 3, requires_grad=True)

tensor([[ 0.8042, -0.1383,  0.3196],
        [-1.0187, -1.3147,  2.5228]], requires_grad=True)

In [134]:
x1 = torch.arange(6).view(2, 3).float()
describe(x1)

x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)

describe(torch.mm(x1, x2))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[0., 1., 2.],
        [3., 4., 5.]])
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values: 
tensor([[1., 2.],
        [1., 2.],
        [1., 2.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[ 3.,  6.],
        [12., 24.]])


In [135]:
x = torch.arange(0, 12).view(3,4).float()
print(x)

x2 = torch.ones(4, 2)
x2[:, 1] += 1
print(x2)

print(x.mm(x2))

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
tensor([[1., 2.],
        [1., 2.],
        [1., 2.],
        [1., 2.]])
tensor([[ 6., 12.],
        [22., 44.],
        [38., 76.]])
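
For 2-D inputs, `torch.matmul` and the `@` operator compute the same product as `torch.mm`. A small sketch reusing the matrices from the earlier cell:

```python
import torch

x1 = torch.arange(6).view(2, 3).float()
x2 = torch.ones(3, 2)
x2[:, 1] += 1

# For 2-D inputs, torch.mm, torch.matmul and @ all compute the same product.
a = torch.mm(x1, x2)
b = torch.matmul(x1, x2)
c = x1 @ x2
print(torch.equal(a, b) and torch.equal(b, c))   # True

# Inner dimensions must match: (2, 3) x (3, 2) -> (2, 2).
print(a.shape)                                   # torch.Size([2, 2])
```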


See the [PyTorch Math Operations Documentation](https://pytorch.org/docs/stable/torch.html#math-operations) for more!

## Computing Gradients

In [136]:
x = torch.tensor([[2.0, 3.0]], requires_grad=True)
z = 3 * x
print(z)

tensor([[6., 9.]], grad_fn=<MulBackward0>)


In this small snippet, you can see the gradient computations at work. We create a tensor and multiply it by 3. Then, we create a scalar output using `sum()`. A scalar output is needed as the loss variable. Calling `backward` on the loss computes its rate of change with respect to the inputs. Since the scalar was created with `sum`, each position in z and x is independent with respect to the loss scalar.

The rate of change of the output with respect to x is just the constant 3 that we multiplied x by.

In [137]:
x = torch.tensor([[2.0, 3.0]], requires_grad=True)
print("x: \n", x)
print("---")
z = 3 * x
print("z = 3*x: \n", z)
print("---")

loss = z.sum()
print("loss = z.sum(): \n", loss)
print("---")

loss.backward()

print("after loss.backward(), x.grad: \n", x.grad)


x: 
 tensor([[2., 3.]], requires_grad=True)
---
z = 3*x: 
 tensor([[6., 9.]], grad_fn=<MulBackward0>)
---
loss = z.sum(): 
 tensor(15., grad_fn=<SumBackward0>)
---
after loss.backward(), x.grad: 
 tensor([[3., 3.]])
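
A related behaviour to keep in mind is that gradients accumulate: each call to `backward` adds to `x.grad` rather than replacing it. A minimal sketch of this, including the reset with `zero_()`:

```python
import torch

x = torch.tensor([[2.0, 3.0]], requires_grad=True)

(3 * x).sum().backward()
first = x.grad.clone()
print(first)             # tensor([[3., 3.]])

# A second backward pass adds to the stored gradient instead of replacing it.
(3 * x).sum().backward()
second = x.grad.clone()
print(second)            # tensor([[6., 6.]])

# Reset before the next computation when accumulation is not wanted.
x.grad.zero_()
(x ** 2).sum().backward()
print(x.grad)            # tensor([[4., 6.]])
```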


### Example: Computing a conditional gradient

$$ \text{Find the gradient of } f(x) \text{ at } x=1 $$
$$ {} $$
$$ f(x)=\left\{
\begin{array}{ll}
    \sin(x) & \text{if } x>0 \\
    \cos(x) & \text{otherwise} \\
\end{array}
\right.$$

In [140]:
def f(x):
    if (x.data > 0).all():
        return torch.sin(x)
    else:
        return torch.cos(x)

In [141]:
x = torch.tensor([1.0], requires_grad=True)
y = f(x)
y.backward()
print(x.grad)

tensor([0.5403])


We could apply this to a larger vector too, but we need to make sure the output is a scalar:

In [142]:
x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)
# this is meant to break!
y.backward()
print(x.grad)

RuntimeError: grad can be implicitly created only for scalar outputs

Making the output a scalar:

In [143]:
x = torch.tensor([1.0, 0.5], requires_grad=True)
y = f(x)
y.sum().backward()
print(x.grad)

tensor([0.5403, 0.8776])


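However, there is a problem: the result is wrong when the elements do not all satisfy the same condition. Here the first element is positive, yet it receives the gradient of cos rather than sin: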

In [144]:
x = torch.tensor([1.0, -1], requires_grad=True)
y = f(x)
y.sum().backward()
print(x.grad)

tensor([-0.8415,  0.8415])


In [145]:
x = torch.tensor([-0.5, -1], requires_grad=True)
y = f(x)
y.sum().backward()
print(x.grad)

tensor([0.4794, 0.8415])


This is because the boolean test, and the subsequent choice between cos and sin, is applied to the whole tensor at once rather than elementwise. A common solution is masking:

In [146]:
def f2(x):
    mask = torch.gt(x, 0).float()
    return mask * torch.sin(x) + (1 - mask) * torch.cos(x)

x = torch.tensor([1.0, -1], requires_grad=True)
y = f2(x)
y.sum().backward()
print(x.grad)

tensor([0.5403, 0.8415])
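
An alternative to the arithmetic mask is `torch.where`, which selects elementwise between two tensors. The sketch below uses a hypothetical helper name `f3` and should give the same gradients as the masked version:

```python
import torch

def f3(x):
    # Hypothetical name; picks sin(x) where x > 0 and cos(x) elsewhere.
    return torch.where(x > 0, torch.sin(x), torch.cos(x))

x = torch.tensor([1.0, -1.0], requires_grad=True)
y = f3(x)
y.sum().backward()
print(x.grad)      # approximately [cos(1), -sin(-1)] = [0.5403, 0.8415]
```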


In [147]:
def describe_grad(x):
    if x.grad is None:
        print("No gradient information")
    else:
        print("Gradient: \n{}".format(x.grad))
        print("Gradient Function: {}".format(x.grad_fn))

In [148]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
describe_grad(x)
print("--------")

y = (x + 2) * (x + 5) + 3
describe(y)
z = y.mean()
describe(z)
describe_grad(x)
print("--------")
z.backward(create_graph=True, retain_graph=True)
describe_grad(x)
print("--------")


Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
No gradient information
--------
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values: 
tensor([[21., 21.],
        [21., 21.]], grad_fn=<AddBackward0>)
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values: 
21.0
No gradient information
--------
Gradient: 
tensor([[2.2500, 2.2500],
        [2.2500, 2.2500]], grad_fn=<CloneBackward>)
Gradient Function: None
--------


In [149]:
x = torch.ones(2, 2, requires_grad=True)

In [150]:
y = x + 2

In [151]:
y.grad_fn

<AddBackward0 at 0x14340e68048>

### CUDA Tensors

PyTorch operations run on either the GPU or the CPU. A few basic operations move tensors between the two.

In [152]:
print(torch.cuda.is_available())

True


In [153]:
x = torch.rand(3,3)
describe(x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.0108, 0.9455, 0.7661],
        [0.2634, 0.1880, 0.5174],
        [0.7849, 0.1412, 0.3112]])


In [154]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [155]:
x = torch.rand(3, 3).to(device)
describe(x)
print(x.device)

Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values: 
tensor([[0.7091, 0.1775, 0.4443],
        [0.1230, 0.9638, 0.7695],
        [0.0378, 0.2239, 0.6772]], device='cuda:0')
cuda:0


In [156]:
cpu_device = torch.device("cpu")

In [157]:
# this will break!
y = torch.rand(3, 3)
x + y

RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float

In [158]:
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y

tensor([[1.2365, 0.8100, 0.5353],
        [0.3552, 1.6906, 0.8883],
        [0.4330, 0.9438, 1.4367]])

In [159]:
if torch.cuda.is_available(): # only if a GPU is available
    a = torch.rand(3,3).to(device='cuda:0') #  CUDA Tensor
    print(a)
    
    b = torch.rand(3,3).cuda()
    print(b)

    print(a + b)

    a = a.cpu() # Error expected
    print(a + b)

tensor([[0.5311, 0.6449, 0.7224],
        [0.4416, 0.3634, 0.8818],
        [0.9874, 0.7316, 0.2814]], device='cuda:0')
tensor([[0.0651, 0.0065, 0.5035],
        [0.3082, 0.3742, 0.4297],
        [0.9729, 0.9739, 0.4533]], device='cuda:0')
tensor([[0.5962, 0.6514, 1.2259],
        [0.7497, 0.7376, 1.3115],
        [1.9603, 1.7055, 0.7347]], device='cuda:0')


RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float
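
The device mismatch errors above can be avoided by creating every tensor on the same `device` from the start. A device-agnostic sketch that runs with or without a GPU:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create tensors directly on the target device instead of copying afterwards.
a = torch.rand(3, 3, device=device)
b = torch.ones(3, 3, device=device)

# Both operands live on the same device, so this works on CPU and GPU alike.
c = a + b
print(c.device.type == device.type)    # True

# Bring the result back to the CPU before handing it to NumPy.
c_np = c.cpu().numpy()
print(c_np.shape)                      # (3, 3)
```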

### Exercises

Some of these exercises require operations not covered in the notebook.  You will have to look at [the documentation](https://pytorch.org/docs/) (on purpose!)


(Answers are at the bottom)

#### Exercise 1

Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.

In [198]:
a = torch.rand(3,3)
a = a.unsqueeze(0)
print(a)
print(a.shape)

tensor([[[0.5096, 0.5060, 0.9572],
         [0.6021, 0.1343, 0.7866],
         [0.7866, 0.7021, 0.2957]]])
torch.Size([1, 3, 3])


#### Exercise 2

Remove the extra dimension you just added to the previous tensor.

In [199]:
a = a.squeeze(0)
print(a.shape)

torch.Size([3, 3])


#### Exercise 3

Create a random tensor of shape 5x3 in the interval [3, 7)

In [170]:
t = 3 + (torch.rand(5, 3) * 4) 
describe(t)

Type: torch.FloatTensor
Shape/size: torch.Size([5, 3])
Values: 
tensor([[5.2460, 6.5312, 3.2476],
        [6.2847, 4.3989, 4.5931],
        [5.9517, 3.3385, 4.6979],
        [6.9112, 5.7198, 4.2605],
        [4.5644, 6.5772, 5.7556]])


#### Exercise 4

Create a tensor with values from a normal distribution (mean=0, std=1).

In [174]:
# a.normal_(mean=0, std=1)
describe(torch.randn(2, 3))

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
tensor([[-0.7443, -0.7085,  0.1822],
        [ 2.4659,  2.5291,  0.0924]])


#### Exercise 5

Retrieve the indexes of all the non zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

In [180]:
x = torch.Tensor([1, 1, 1, 0, 1])
print ((x != 0).nonzero())
# torch.nonzero(a)

tensor([[0],
        [1],
        [2],
        [4]])


#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

In [205]:
x = torch.rand(3, 1)
new_x = torch.cat([x, x, x, x], dim=1)
# a.expand(3,4)
describe(new_x)

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4])
Values: 
tensor([[0.7238, 0.7238, 0.7238, 0.7238],
        [0.5787, 0.5787, 0.5787, 0.5787],
        [0.4874, 0.4874, 0.4874, 0.4874]])


#### Exercise 7

Return the batch matrix-matrix product of two 3 dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

In [185]:
a=torch.rand(3,4,5)
b=torch.rand(3,5,4)
describe(torch.bmm(a, b))

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4, 4])
Values: 
tensor([[[1.6701, 1.3213, 1.4777, 0.5422],
         [1.3662, 1.3447, 0.9684, 0.3262],
         [2.0981, 1.6573, 1.6182, 1.1433],
         [1.3271, 1.1003, 1.1736, 0.3078]],

        [[0.7305, 0.7932, 0.7121, 0.3113],
         [1.5795, 1.2468, 0.9343, 0.9783],
         [1.2539, 0.7933, 1.0025, 1.1728],
         [2.0705, 1.6689, 1.4496, 1.3343]],

        [[1.5310, 0.9070, 1.3645, 1.2746],
         [1.3904, 0.7273, 1.2383, 0.7936],
         [1.9549, 1.1853, 1.6287, 1.9005],
         [1.4171, 0.7856, 1.2047, 1.4312]]])


#### Exercise 8

Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

In [208]:
a=torch.rand(3,4,5)
b=torch.rand(5,4)
bu = b.unsqueeze(0)
buu = torch.cat([bu, bu, bu], dim=0)
describe(torch.bmm(a, buu))

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4, 4])
Values: 
tensor([[[0.9565, 1.3249, 1.0157, 0.8182],
         [1.0571, 1.4061, 1.4783, 1.3961],
         [0.8773, 2.1693, 1.7573, 1.7111],
         [1.1071, 1.6276, 1.6906, 1.5201]],

        [[0.6456, 1.4153, 1.3523, 1.2681],
         [0.8894, 1.7736, 1.6288, 1.6398],
         [0.7439, 1.6391, 1.5266, 1.3131],
         [0.9506, 1.7963, 1.3078, 1.2344]],

        [[0.7634, 1.6585, 1.4265, 1.3849],
         [0.7523, 1.4100, 1.4725, 1.2839],
         [0.5718, 1.1563, 1.1185, 0.9951],
         [0.7828, 2.0229, 1.9557, 1.7650]]])


Answers below

Answers still below.. Keep Going

#### Exercise 1

Create a 2D tensor and then add a dimension of size 1 inserted at the 0th axis.

In [196]:
a = torch.rand(3,3)
a = a.unsqueeze(0)
print(a)
print(a.shape)

tensor([[[0.4402, 0.5700, 0.9997],
         [0.1437, 0.3332, 0.1013],
         [0.6606, 0.3848, 0.0086]]])
torch.Size([1, 3, 3])


#### Exercise 2 

Remove the extra dimension you just added to the previous tensor.

In [200]:
a = a.squeeze(0)
print(a.shape)

torch.Size([3, 3])


#### Exercise 3

Create a random tensor of shape 5x3 in the interval [3, 7)

In [201]:
3 + torch.rand(5, 3) * 4

tensor([[6.1853, 4.3406, 5.0945],
        [5.6850, 4.1621, 6.7670],
        [6.7921, 5.1479, 4.6956],
        [3.9384, 6.3653, 4.2260],
        [6.7328, 5.3279, 6.5731]])

#### Exercise 4

Create a tensor with values from a normal distribution (mean=0, std=1).

In [202]:
a = torch.rand(3,3)
a.normal_(mean=0, std=1)

tensor([[-1.1659,  0.1005, -1.0513],
        [ 0.4468,  0.3073,  0.9953],
        [-1.3048,  0.5276, -1.5614]])

#### Exercise 5

Retrieve the indexes of all the non zero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

In [203]:
a = torch.Tensor([1, 1, 1, 0, 1])
torch.nonzero(a)

tensor([[0],
        [1],
        [2],
        [4]])

#### Exercise 6

Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

In [204]:
a = torch.rand(3,1)
a.expand(3,4)

tensor([[0.0134, 0.0134, 0.0134, 0.0134],
        [0.4205, 0.4205, 0.4205, 0.4205],
        [0.7033, 0.7033, 0.7033, 0.7033]])
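
Note that `expand` returns a view rather than four real copies. The sketch below compares it with `repeat` and `torch.cat`, which produce the same values but allocate new memory:

```python
import torch

a = torch.rand(3, 1)

expanded = a.expand(3, 4)     # a view: no data is copied
repeated = a.repeat(1, 4)     # a real copy with 4 columns
concatenated = torch.cat([a, a, a, a], dim=1)

print(torch.equal(expanded, repeated))        # True
print(torch.equal(repeated, concatenated))    # True

# The expanded view still points at the original 3 values.
print(expanded.data_ptr() == a.data_ptr())    # True
print(expanded.stride())                      # (1, 0): stride 0 along the new columns
```

If the copies need to be modified independently, `repeat` or `torch.cat` is the safer choice.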

#### Exercise 7

Return the batch matrix-matrix product of two 3 dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

In [206]:
a = torch.rand(3,4,5)
b = torch.rand(3,5,4)
torch.bmm(a, b)

tensor([[[0.9441, 2.1102, 2.0876, 2.5976],
         [0.8242, 1.9864, 1.9303, 2.4765],
         [0.6655, 1.0732, 1.0401, 1.0798],
         [0.9575, 1.9989, 1.9697, 2.3769]],

        [[2.6008, 1.5117, 1.5789, 1.4201],
         [1.8111, 1.1379, 1.0761, 0.8441],
         [2.1213, 1.1320, 1.0274, 0.7076],
         [1.4239, 0.9891, 1.1913, 1.1903]],

        [[1.2708, 1.6769, 1.0298, 1.4778],
         [0.9771, 2.1704, 1.3339, 2.0092],
         [1.8078, 2.1488, 1.2304, 1.9096],
         [1.5301, 2.2813, 1.3152, 2.0203]]])

#### Exercise 8

Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

In [210]:
a = torch.rand(3,4,5)
b = torch.rand(5,4)
describe(torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size())))

Type: torch.FloatTensor
Shape/size: torch.Size([3, 4, 4])
Values: 
tensor([[[0.7531, 1.3374, 0.9882, 0.8176],
         [0.5111, 0.7757, 0.8252, 0.7467],
         [1.0234, 1.1478, 1.4016, 1.4146],
         [0.9277, 1.3949, 1.0704, 0.9856]],

        [[0.6225, 1.2794, 0.7085, 0.6017],
         [1.3344, 1.8273, 1.4123, 1.3828],
         [0.9642, 1.4989, 1.1724, 1.0908],
         [1.1031, 1.7326, 1.2122, 1.1306]],

        [[0.4625, 0.9734, 0.5780, 0.4319],
         [0.8182, 1.3433, 1.2207, 1.0346],
         [0.7643, 1.1014, 0.5115, 0.5352],
         [1.2984, 1.7011, 1.5805, 1.5696]]])
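
As another possible answer, `torch.matmul` broadcasts a 2-D matrix across the batch dimension, so the explicit `unsqueeze` and `expand` are not strictly needed. A sketch that checks it against the `bmm` version:

```python
import torch

a = torch.rand(3, 4, 5)
b = torch.rand(5, 4)

# torch.matmul broadcasts the 2-D matrix across the batch dimension of a.
out = torch.matmul(a, b)
print(out.shape)                                   # torch.Size([3, 4, 4])

# Same values as the explicit expand + bmm version (up to float rounding).
ref = torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))
print(torch.allclose(out, ref, atol=1e-6))         # True
```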


### END