# Lab 1: Getting started with PyTorch

This first lab introduces PyTorch basics, including `Tensor`, `Loss`, and `Autograd`.

In [8]:
import os
import torch
import numpy as np
import random
import matplotlib.pyplot as plt

In [9]:
# set seed
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
os.environ['PYTHONHASHSEED'] = str(seed)

## 1. Tensors

Tensors are a specialized data structure that is very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

Tensors are similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators. Tensors are also optimized for automatic differentiation (we’ll see more about that later in the Autograd section). 

### 1.1 Initializing a Tensor

Tensors can be initialized in various ways. Take a look at the following examples:

**Directly from data**

Tensors can be created directly from data. The data type is automatically inferred.

In [10]:
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
x_data

tensor([[1, 2],
        [3, 4]])
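To make the type inference concrete, the short sketch below inspects `dtype` for a few inputs. It assumes PyTorch's default floating-point type has not been changed from `torch.float32`; the variable names are illustrative only.

```python
import torch

# Integer Python data is inferred as 64-bit integers.
int_tensor = torch.tensor([[1, 2], [3, 4]])
print(int_tensor.dtype)    # torch.int64

# A single float in the data makes the whole tensor floating point
# (float32, assuming the default dtype is unchanged).
float_tensor = torch.tensor([[1, 2], [3, 4.0]])
print(float_tensor.dtype)  # torch.float32

# The inferred type can be overridden with the dtype argument.
explicit = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)
print(explicit.dtype)      # torch.float64
```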

**From a NumPy array**

Tensors can be created from NumPy arrays (and vice versa).

In [11]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
x_np

tensor([[1, 2],
        [3, 4]])
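One detail worth knowing about the conversion: `torch.from_numpy` shares memory with the source array rather than copying it, whereas `torch.tensor` makes a copy. The sketch below illustrates the difference; the names `shared`, `copied`, and `back` are illustrative only.

```python
import numpy as np
import torch

np_array = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(np_array)  # shares memory with np_array
copied = torch.tensor(np_array)      # independent copy of the data

np_array[0, 0] = 100                 # modify the NumPy array in place

print(shared[0, 0].item())  # 100: the change is visible through the tensor
print(copied[0, 0].item())  # 1: the copy is unaffected

back = shared.numpy()       # tensor -> NumPy, again without copying
print(np.shares_memory(back, np_array))  # True
```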

**From another tensor:**

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.

In [12]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor: 
 tensor([[1, 1],
        [1, 1]]) 

Random Tensor: 
 tensor([[0.8823, 0.9150],
        [0.3829, 0.9593]]) 



**With random or constant values:**

``shape`` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor.

In [13]:
shape = (2, 3)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zero_tensor = torch.zeros(shape)

print(f'Random tensor: {rand_tensor}')
print(f'Ones tensor: {ones_tensor}')
print(f'Zeros tensor: {zero_tensor}')

Random tensor: tensor([[0.3904, 0.6009, 0.2566],
        [0.7936, 0.9408, 0.1332]])
Ones tensor: tensor([[1., 1., 1.],
        [1., 1., 1.]])
Zeros tensor: tensor([[0., 0., 0.],
        [0., 0., 0.]])


### 1.2 Attributes of a Tensor

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [14]:
tensor = torch.rand(3, 4)

print('Tensor shape:', tensor.shape)
print('Datatype:', tensor.dtype)
print('Device tensor is stored on:', tensor.device)

Tensor shape: torch.Size([3, 4])
Datatype: torch.float32
Device tensor is stored on: cpu


### 1.3 Operations on Tensors

Over 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, 
indexing, slicing), sampling and more are
comprehensively described [here](https://pytorch.org/docs/stable/torch.html).

In [15]:
tensor = torch.arange(12).reshape(3, 4).float()

print(tensor)
print('First row:', tensor[0])
print('First column:', tensor[:, 0])
print('Last column:', tensor[:, -1])

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
First row: tensor([0., 1., 2., 3.])
First column: tensor([0., 4., 8.])
Last column: tensor([ 3.,  7., 11.])


**Joining tensors** 

You can use ``torch.cat`` to concatenate a sequence of tensors along a given dimension.
See also [`torch.stack`](https://pytorch.org/docs/stable/generated/torch.stack.html),
another tensor joining op that is subtly different from ``torch.cat``.


In [16]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
t1

tensor([[ 0.,  1.,  2.,  3.,  0.,  1.,  2.,  3.,  0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.,  4.,  5.,  6.,  7.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.,  8.,  9., 10., 11.,  8.,  9., 10., 11.]])

In [17]:
t1 = torch.cat([tensor, tensor, tensor], dim=0)
t1

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
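The difference between the two joining ops is easiest to see from the shapes: `torch.cat` extends an existing dimension, while `torch.stack` inserts a new one. A minimal sketch, using a small tensor `a` defined here for illustration:

```python
import torch

a = torch.arange(12).reshape(3, 4).float()

cat0 = torch.cat([a, a], dim=0)      # extends the existing dimension 0
stack0 = torch.stack([a, a], dim=0)  # inserts a new dimension 0

print(cat0.shape)    # torch.Size([6, 4])
print(stack0.shape)  # torch.Size([2, 3, 4])

# Stacking is equivalent to unsqueezing each tensor and then concatenating.
same = torch.equal(stack0, torch.cat([a.unsqueeze(0), a.unsqueeze(0)], dim=0))
print(same)  # True
```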

#### Arithmetic operators

In [19]:
# This computes the matrix multiplication between two tensors. y1, y2, y3 will have the same value
y1 = tensor @ tensor.T
print(y1)

y2 = tensor.matmul(tensor.T)
print(y2)

y3 = torch.rand_like(y1)  # the out= tensor must have the result's shape (3, 3)
torch.matmul(tensor, tensor.T, out=y3)
y3

tensor([[ 14.,  38.,  62.],
        [ 38., 126., 214.],
        [ 62., 214., 366.]])
tensor([[ 14.,  38.,  62.],
        [ 38., 126., 214.],
        [ 62., 214., 366.]])




tensor([[ 14.,  38.,  62.],
        [ 38., 126., 214.],
        [ 62., 214., 366.]])

In [20]:
# This computes the element-wise product. z1, z2, z3 will have the same value
z1 = tensor * tensor
print(z1)
z2 = tensor.mul(tensor)
print(z2)
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
print(z3)

tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.]])
tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.]])
tensor([[  0.,   1.,   4.,   9.],
        [ 16.,  25.,  36.,  49.],
        [ 64.,  81., 100., 121.]])


**Single-element tensors** 

If you have a one-element tensor, for example by aggregating all
values of a tensor into one value, you can convert it to a Python
numerical value using ``item()``:

In [21]:
agg = tensor.sum()
agg_item = agg.item()  
print(agg_item, type(agg_item))

66.0 <class 'float'>


### 1.4 GPU Acceleration

If an NVIDIA GPU is available, we can accelerate computation by moving tensors onto it.
Let's see how much a GPU can speed up matrix operations in particular.
We will multiply two 5000-by-5000 matrices on both the CPU and the GPU.

Unfortunately, Coursera does not provide a GPU environment, but feel free to try the following snippets on a GPU machine. With GPU acceleration, the matrix multiplication should be much faster.

In [22]:
mat = torch.rand(5000, 5000)
mat

tensor([[0.7886, 0.5895, 0.7539,  ..., 0.9313, 0.6453, 0.9844],
        [0.8312, 0.8682, 0.9359,  ..., 0.4046, 0.9987, 0.8608],
        [0.1268, 0.2253, 0.1223,  ..., 0.3939, 0.4493, 0.5327],
        ...,
        [0.5851, 0.5824, 0.8857,  ..., 0.3165, 0.4845, 0.3896],
        [0.3348, 0.1535, 0.5840,  ..., 0.0285, 0.7444, 0.5193],
        [0.4027, 0.7190, 0.3847,  ..., 0.9606, 0.6629, 0.0359]])

In [23]:
%%time
torch.mm(mat.t(), mat)

CPU times: user 4.86 s, sys: 46.1 ms, total: 4.9 s
Wall time: 821 ms


tensor([[1697.6150, 1260.7886, 1260.5798,  ..., 1276.7966, 1265.4301,
         1273.4854],
        [1260.7886, 1655.7583, 1247.6844,  ..., 1256.2369, 1263.6820,
         1263.9419],
        [1260.5798, 1247.6844, 1658.2700,  ..., 1258.1453, 1267.2045,
         1274.8801],
        ...,
        [1276.7966, 1256.2367, 1258.1453,  ..., 1704.4204, 1270.7224,
         1275.9523],
        [1265.4302, 1263.6820, 1267.2046,  ..., 1270.7224, 1686.0026,
         1272.0712],
        [1273.4852, 1263.9419, 1274.8801,  ..., 1275.9523, 1272.0712,
         1699.9994]])

In [24]:
%%time
if torch.cuda.is_available():
    mat = mat.cuda()
    torch.mm(mat.t(), mat)
    torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait so the timing is meaningful
else:
    print('GPU is not available!')

GPU is not available!
CPU times: user 1.75 ms, sys: 189 µs, total: 1.94 ms
Wall time: 2.95 ms
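Outside a notebook, the same comparison can be written in a device-agnostic way. The following is only a sketch: the helper `timed_mm` is a hypothetical name, and the matrix is smaller than the one above so that it stays quick on a CPU-only machine. Because CUDA kernels run asynchronously, it calls `torch.cuda.synchronize()` around the timed region when a GPU is used.

```python
import time
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Smaller than the 5000-by-5000 matrix above, to keep the sketch quick on CPU.
mat = torch.rand(1000, 1000, device=device)

def timed_mm(m):
    """Return (result, elapsed seconds) for m.T @ m on m's device."""
    if m.is_cuda:
        torch.cuda.synchronize()  # finish pending GPU work before starting the clock
    start = time.perf_counter()
    out = torch.mm(m.t(), m)
    if m.is_cuda:
        torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
    return out, time.perf_counter() - start

result, seconds = timed_mm(mat)
print(f'{device}: {seconds * 1000:.2f} ms, result shape {tuple(result.shape)}')
```

Running it once on each kind of machine gives a rough comparison; the first GPU call typically includes one-time initialization overhead, so repeating the measurement is advisable.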


### Exercise 1 [10 points]

Implement the Sigmoid function on your own.

$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$

Note that you should not use the existing PyTorch implementation.

Hint: try `torch.exp()`.

In [25]:
def sigmoid(x):
    return torch.div(1, 1 + torch.exp(-x))


In [26]:
'''
AUTOGRADER CELL. DO NOT MODIFY THIS.
'''

assert torch.allclose(sigmoid(torch.tensor([1.2])), torch.tensor([0.7685]), rtol=1e-2)
assert torch.allclose(sigmoid(torch.tensor([0, 1.5])), torch.tensor([0.5000, 0.8176]), rtol=1e-2)
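Beyond the autograder values, a hand-written sigmoid can be sanity-checked against known properties of the function. The sketch below redefines `sigmoid` so that it is self-contained, checks the symmetry $\sigma(-x) = 1 - \sigma(x)$, and uses the built-in `torch.sigmoid` only as a reference for comparison, not as part of the implementation. The test inputs are arbitrary choices.

```python
import torch

def sigmoid(x):
    # Same formula as in the exercise: 1 / (1 + exp(-x)).
    return 1 / (1 + torch.exp(-x))

x = torch.tensor([-100.0, -1.2, 0.0, 1.2, 100.0])
y = sigmoid(x)

# All outputs lie in [0, 1], and sigmoid(0) is exactly 0.5.
print(y.min().item() >= 0.0, y.max().item() <= 1.0, y[2].item())

# Symmetry: sigmoid(-x) equals 1 - sigmoid(x).
symmetric = torch.allclose(sigmoid(-x), 1 - y)
print(symmetric)

# Agreement with the built-in, used here purely as a reference.
matches_builtin = torch.allclose(y, torch.sigmoid(x))
print(matches_builtin)
```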

### Exercise 2 [10 points]

Implement a Softmax function on your own.

$$\mathrm{softmax}(\mathbf{X})_{ij} = \frac{\exp(\mathbf{X}_{ij})}{\sum_k \exp(\mathbf{X}_{ik})}$$

Note that you should not use the existing PyTorch implementation.

Hint: try `torch.exp()` and `torch.sum()`.

In [40]:
def softmax(x):
    exp_x = torch.exp(x)  # compute the exponentials once and reuse them
    return exp_x / torch.sum(exp_x, dim=1, keepdim=True)

In [41]:
X = torch.tensor([[0.2288, 0.4111, 0.0385], [0.6233, 0.0364, 0.1999]])

softmax(X)

tensor([[0.3304, 0.3965, 0.2731],
        [0.4523, 0.2515, 0.2962]])

In [42]:
'''
AUTOGRADER CELL. DO NOT MODIFY THIS.
'''

X = torch.tensor([[0.2288, 0.4111, 0.0385], [0.6233, 0.0364, 0.1999]])
assert torch.allclose(softmax(X), torch.tensor([[0.3304, 0.3965, 0.2731], [0.4523, 0.2515, 0.2962]]), rtol=1e-2)
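The formula above is fine for small inputs, but `torch.exp` overflows for large values. A common remedy, shown here as a sketch rather than as part of the exercise, is to subtract each row's maximum before exponentiating; this leaves the result mathematically unchanged because the shift cancels between numerator and denominator. The name `stable_softmax` and the large test input are illustrative assumptions.

```python
import torch

def stable_softmax(x):
    # Shift each row so its largest entry is 0; the largest exponent is then exp(0) = 1.
    shifted = x - x.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=1, keepdim=True)

X = torch.tensor([[0.2288, 0.4111, 0.0385], [0.6233, 0.0364, 0.1999]])
print(stable_softmax(X))  # agrees with the softmax output shown above

big = torch.tensor([[1000.0, 1001.0, 1002.0]])
naive = torch.exp(big) / torch.exp(big).sum(dim=1, keepdim=True)
print(naive)                # nan entries: exp(1000) overflows to inf, and inf / inf is nan
print(stable_softmax(big))  # finite probabilities that sum to 1
```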

### Exercise 3 [10 points]

Implement a linear layer.

$$\mathbf{O} = \mathbf{X}\mathbf{W} + \mathbf{b},$$

where $\mathbf{X}$ is the input feature, $\mathbf{O}$ is the output feature, and $\mathbf{W}$ and $\mathbf{b}$ are the weight and bias parameters.

Hint: try `torch.matmul()`.

In [43]:
def linear(X, W, b):
    return torch.matmul(X, W) + b

In [44]:
'''
AUTOGRADER CELL. DO NOT MODIFY THIS.
'''

X = torch.Tensor([[0.1, 0.2, 0.3]])
W = torch.Tensor([[0.1, 0.2, 0.3]]).T
b = torch.Tensor([-0.5])
assert torch.allclose(linear(X, W, b), torch.Tensor([[-0.3600]]), rtol=1e-2)
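The autograder uses a single sample, but the same formula handles a batch of inputs. The sketch below walks through the shapes, assuming an illustrative batch of 4 samples with 3 input features mapped to 2 output features; the bias vector is broadcast across the rows of $\mathbf{X}\mathbf{W}$.

```python
import torch

torch.manual_seed(0)
X = torch.rand(4, 3)  # batch of 4 samples, 3 input features each
W = torch.rand(3, 2)  # maps 3 input features to 2 output features
b = torch.rand(2)     # one bias per output feature

O = torch.matmul(X, W) + b  # b is broadcast across the 4 rows
print(O.shape)              # torch.Size([4, 2])

# Each output row is that sample's features times W, plus the same bias.
row0 = X[0] @ W + b
print(torch.allclose(O[0], row0))  # True
```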

## 2. Loss

When presented with some training data, our untrained network is unlikely to give the correct answer. A loss function measures how dissimilar the obtained result is from the target value, and it is this loss that we want to minimize during training. To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true label.