This tutorial heavily draws on [https://pytorch.org/tutorials](https://pytorch.org/tutorials); refer to those pages for further detail.

In [None]:
import torch
import numpy as np

# Data types: everything is a tensor

Tensors in `torch` are the equivalent of `numpy` arrays. See [https://pytorch.org/docs/stable/torch.html](https://pytorch.org/docs/stable/torch.html) for detail.

They can be created from Python or `numpy` objects. The data type is inferred automatically, unless stated otherwise.
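The sketch below shows the inference rules. It assumes PyTorch's default floating-point dtype has not been changed with `torch.set_default_dtype`:

```python
import torch

# Python ints are inferred as 64-bit integers
print(torch.tensor([1, 2]).dtype)         # torch.int64
# Python floats use the default floating-point dtype
print(torch.tensor([1.0, 2.0]).dtype)     # torch.float32
# booleans become torch.bool
print(torch.tensor([True, False]).dtype)  # torch.bool
# mixing ints and floats promotes everything to the floating-point dtype
print(torch.tensor([1, 2.5]).dtype)       # torch.float32
```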

In [None]:
# list
data = [[0, 2],[5, 4]]

# tensor from list
t_data = torch.tensor(data)

print('data:\n{}\nt_data:\n{}'.format(data, t_data))

In [None]:
print('t_data shape: {}'.format(t_data.shape))

In [None]:
print('data type is {}, made of elements of type {}'.format(type(data), type(data[0][0])))
print('t_data type is {}, made of elements of type {}'.format(type(t_data), t_data.dtype))

Types can be overridden, for example to save memory space.

In [None]:
# tensor from list
t_data = torch.tensor(data, dtype=torch.float32)

print('t_data type is {}, made of elements of type {}'.format(type(t_data), t_data.dtype))
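The memory argument can be made concrete with `element_size()`, which returns the number of bytes per element. Here is a small comparison of a few dtypes for the same 2x2 data. `torch.float16` is included only as an extra illustration:

```python
import torch

data = [[0, 2], [5, 4]]
sizes = {}
for dtype in (torch.int64, torch.float32, torch.float16):
    t = torch.tensor(data, dtype=dtype)
    # total storage = bytes per element times number of elements
    sizes[dtype] = t.element_size() * t.nelement()
    print('{}: {} bytes/element, {} bytes in total'.format(dtype, t.element_size(), sizes[dtype]))
```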

Similarly, tensors can be built from `numpy` arrays.

In [None]:
x_data = np.array(data, dtype=np.float32)

t_data = torch.from_numpy(x_data)

print('x_data type is {}, made of elements of type {}'.format(type(x_data), x_data.dtype))
print('t_data type is {}, made of elements of type {}'.format(type(t_data), t_data.dtype))

Careful here: the `torch` tensor and `numpy` array are linked together (they share the same memory), so changing one changes the other.

In [None]:
# add one to all elements
t_data += 1
print('t_data:\n{}\nx_data:\n{}'.format(t_data, x_data))

This is not the case for the original list, since building an array or a tensor from a Python list always copies the data:

In [None]:
print(data)
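Some conversions share memory and others copy. `torch.from_numpy` and `Tensor.numpy()` share memory, and the latter works on CPU tensors only. `torch.tensor` always copies its input. A minimal sketch of the difference:

```python
import numpy as np
import torch

x = np.zeros(3, dtype=np.float32)

shared = torch.from_numpy(x)  # shares memory with x
copied = torch.tensor(x)      # independent copy of x

x[0] = 7.0  # modify the numpy array in place

print(shared)  # tensor([7., 0., 0.])
print(copied)  # tensor([0., 0., 0.])

# .numpy() goes the other way and also shares memory
back = shared.numpy()
back[1] = 3.0
print(x)  # [7. 3. 0.]
```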

Most of the usual `numpy` constructors have `torch` counterparts. See also `torch.zeros_like`, `torch.arange`, `torch.linspace`, `torch.eye`, etc.


In [None]:
shape = (2,3,)

# tensor filled with zeros
t_zeros = torch.zeros(shape)
print('t_zeros: \n {} \n'.format(t_zeros))

# tensor filled with ones
t_ones = torch.ones(shape)
print('t_ones: \n {} \n'.format(t_ones))

# tensor filled with random variables
t_rand = torch.rand(shape)
print('t_rand: \n {} \n'.format(t_rand))
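A short sketch of the other constructors mentioned above. The argument values are arbitrary:

```python
import torch

t = torch.rand(2, 3)

# same shape and dtype as t, filled with zeros
z = torch.zeros_like(t)
# integers from 0 (included) to 10 (excluded) with step 2
r = torch.arange(0, 10, 2)
# 5 evenly spaced values from 0 to 1, both ends included
lin = torch.linspace(0, 1, steps=5)
# 3x3 identity matrix
ident = torch.eye(3)

print(z.shape, r, lin, ident, sep='\n')
```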


Likewise, many `numpy` functions are available both as tensor methods and as functions in the `torch` namespace. This includes linear algebra routines similar to those of `numpy.linalg` (see `torch.linalg`).


In [None]:
# sum over all elements
print('sum of t_rand: \n{}'.format(t_rand.sum()))
print('or equivalently')
print('{} \n'.format(torch.sum(t_rand)))

# mean along dim 1 (one value per row)
print('mean of t_rand: \n{}'.format(t_rand.mean(dim=1)))
print('or equivalently')
print('{} \n'.format(torch.mean(t_rand, dim=1)))

# std along dim 1; unbiased (divides by N-1) by default, unlike numpy.std
print('standard deviation of t_rand: \n{}'.format(t_rand.std(dim=1)))
print('or equivalently')
print('{} \n'.format(torch.std(t_rand, dim=1)))

# svd (torch.svd is deprecated in favour of torch.linalg.svd, which returns V^H instead of V)
print('singular value decomposition of t_rand: \n{}'.format(t_rand.svd()))
print('or equivalently')
print('{} \n'.format(torch.svd(t_rand)))
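The sketch below shows two details of the cell above. The `dim` argument selects which axis a reduction collapses. `torch.linalg.svd`, the recommended replacement for `torch.svd`, returns `Vh` rather than `V`. The matrix shape and seed are arbitrary:

```python
import torch

torch.manual_seed(0)
A = torch.rand(2, 3)

# dim selects the axis that is reduced (collapsed)
col_sums = A.sum(dim=0)  # one value per column, shape (3,)
row_sums = A.sum(dim=1)  # one value per row, shape (2,)
print(col_sums.shape, row_sums.shape)

# torch.linalg.svd returns U, S and Vh, where Vh is the conjugate transpose of V
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
# reconstruct A from its factors: A = U diag(S) Vh
A_rec = U @ torch.diag(S) @ Vh
print(torch.allclose(A, A_rec, atol=1e-5))
```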

## GPU vs CPU speed

A particularity of `torch` is that every tensor keeps track of the device on which it is stored (usually the CPU or a GPU).

In [32]:
print('t_data is stored on: {}'.format(t_data.device))

t_data is stored on: cpu


In [33]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


t_data = t_data.to(device)
print('t_data is stored on: {}'.format(t_data.device))

t_data = t_data.to('cpu')
print('t_data is stored on: {}'.format(t_data.device))

t_data is stored on: cuda:0
t_data is stored on: cpu
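A few related idioms:

- Tensors can be created directly on the target device.
- `.to()` returns a tensor on the requested device.
- `numpy()` only works on CPU tensors.

The same device-selection line as above is assumed:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# create a tensor directly on the chosen device (no host-to-device copy)
t = torch.ones(3, device=device)

# .to() returns a tensor on the target device; t itself is unchanged
t_cpu = t.to('cpu')

# numpy() requires a CPU tensor, so move the data back first
arr = t.cpu().numpy()
print(t.device, t_cpu.device, arr)
```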


In [None]:
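import time

N = 5000

# Matrix multiplication on CPU

A = torch.rand(N, N, device='cpu')
B = torch.rand(N, N, device='cpu')

start = time.perf_counter()
result = A @ B
end = time.perf_counter()

print('CPU time :', end - start)


# Matrix multiplication on GPU (skipped if no GPU is available)

if torch.cuda.is_available():
    A = A.cuda()
    B = B.cuda()

    # warm-up: the first CUDA call pays one-off initialization costs
    result = A @ B
    # GPU kernels run asynchronously: wait for pending work before timing
    torch.cuda.synchronize()

    start = time.perf_counter()
    result = A @ B
    torch.cuda.synchronize()
    end = time.perf_counter()

    print('GPU time :', end - start)
else:
    print('no GPU available, GPU timing skipped')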
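Hand-written timings are sensitive to warm-up and to asynchronous execution. PyTorch ships `torch.utils.benchmark.Timer`, which performs warm-up runs and synchronizes CUDA when needed. A minimal sketch; the smaller `N` is an arbitrary choice so that it also runs quickly on a CPU-only machine:

```python
import torch
from torch.utils import benchmark

N = 500  # smaller than above, chosen arbitrarily to keep the run short
device = 'cuda' if torch.cuda.is_available() else 'cpu'
A = torch.rand(N, N, device=device)
B = torch.rand(N, N, device=device)

timer = benchmark.Timer(stmt='A @ B', globals={'A': A, 'B': B})
measurement = timer.timeit(10)  # run the statement 10 times
print('{}: {:.6f} s per matrix product'.format(device, measurement.mean))
```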


# Datasets and loaders

PyTorch makes data handling easier with utilities that load large datasets and draw randomized batches of samples. Transforms are also used to preprocess the data. Here, `ToTensor` converts each image to a tensor and rescales its pixel values to the range [0, 1]. Many datasets are readily available, for instance image datasets in `torchvision`.
This tutorial draws heavily on [https://pytorch.org/tutorials/beginner/nn_tutorial.html](https://pytorch.org/tutorials/beginner/nn_tutorial.html)
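As a rough illustration of what `ToTensor` does to a grayscale image, here is a hypothetical pure-`torch` helper, `to_tensor_sketch`. It is not part of `torchvision`, and it acts on a small `uint8` tensor:

```python
import torch

def to_tensor_sketch(img_uint8):
    """Hypothetical helper mimicking ToTensor on a grayscale uint8 image:
    add a channel dimension and rescale pixel values from [0, 255] to [0.0, 1.0]."""
    return img_uint8.unsqueeze(0).to(torch.float32) / 255.0

img = torch.tensor([[0, 128], [255, 64]], dtype=torch.uint8)
t = to_tensor_sketch(img)
print(t.shape)  # torch.Size([1, 2, 2])
print(t)
```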


In [None]:
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

In [None]:
train_data = datasets.MNIST(
    root='./tmp',
    train=True,
    download=True,
    transform=ToTensor()
)

In [None]:
image, label = train_data[0]

print(image.shape)
print(label)

In [None]:
# plot random example samples
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    sample_idx = torch.randint(len(train_data), size=(1,)).item()
    img, label = train_data[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis('off')
    plt.imshow(img.squeeze(), cmap='gray')
plt.show()

In [None]:
# make a batch loader
train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)

In [None]:
# load new batch
train_features, train_labels = next(iter(train_dataloader))
print('Feature batch shape: {}'.format(train_features.size()))
print('Labels batch shape: {}'.format(train_labels.size()))

# plot first sample of batch
plt.figure()
plt.title(train_labels[0].numpy())
plt.axis('off')
plt.imshow(train_features[0].squeeze(), cmap='gray')
plt.colorbar()
plt.show()
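The same batching works for any class implementing `__len__` and `__getitem__`. A sketch with a hypothetical toy dataset, `SquaresDataset`, whose sample `i` is the pair `(i, i**2)`. It needs no download:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is (i, i**2)."""
    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.x[idx] ** 2

loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=True)
for xb, yb in loader:
    # batches of 4, 4 and then the 2 remaining samples
    print(xb.shape, yb.shape)
```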

Depending on the neural network design, we might need to convert the labels (originally integers) into vectors with 0s everywhere except for a 1 at the index of the corresponding class (one-hot encoding).

In [None]:
test_data = datasets.MNIST(
    root='./tmp',
    train=False,
    download=True,
    transform=ToTensor(),
    # one-hot encode the integer label into a float vector of length 10
    target_transform=Lambda(lambda y: torch.zeros(10, dtype=torch.float).scatter_(0, torch.tensor(y), value=1))
)

X, y = test_data[0]

# plot first sample of the test set
plt.figure()
plt.axis('off')
plt.imshow(X.squeeze(), cmap='gray')
plt.colorbar()
plt.show()

# plot the one-hot encoded label vector
plt.figure()
plt.imshow(y.reshape([1,10]), cmap='gray')
plt.yticks([])
plt.show()
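As an alternative to the `scatter_` trick, `torch.nn.functional.one_hot` encodes a whole batch of integer labels at once. The labels below are arbitrary:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([3, 0, 9])  # a batch of integer class labels (int64)
# one_hot expects an integer tensor and returns an integer tensor
one_hot = F.one_hot(labels, num_classes=10).float()
print(one_hot.shape)  # torch.Size([3, 10])

# argmax along the class dimension recovers the integer labels
print(one_hot.argmax(dim=1))  # tensor([3, 0, 9])
```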

# Gradient and parameter update

In [None]:
import torch
import matplotlib.pyplot as plt



# number of samples
n = 5

# ground truth: y is a linear function of x contaminated with noise
x = torch.arange(n, dtype=torch.float)  # input tensor
print(f'input: {x}')
a_true = 0.5
b_true = 1.5

y = a_true * x + b_true + torch.randn(n)  # expected/desired output

plt.figure()
plt.scatter(x, y, c='b', label='data')
plt.plot(x, a_true*x+b_true, '--k', label='th')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()
# start from random initial values of the parameters (tracked by autograd)
a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)
z = a * x + b



We calculate the loss of our linear regression, which here is the mean squared error.
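Write the residuals as $r_i = z_i - y_i$ with $z_i = a x_i + b$. The loss is then $L = \frac{1}{n}\sum_i r_i^2$, so $\partial L/\partial a = \frac{2}{n}\sum_i r_i x_i$ and $\partial L/\partial b = \frac{2}{n}\sum_i r_i$. The sketch below checks the autograd gradients against these formulas, on data generated as above with a fixed seed:

```python
import torch

torch.manual_seed(0)
n = 5
x = torch.arange(n, dtype=torch.float)
y = 0.5 * x + 1.5 + torch.randn(n)

a = torch.randn(1, requires_grad=True)
b = torch.randn(1, requires_grad=True)

z = a * x + b
loss = torch.nn.functional.mse_loss(z, y)
loss.backward()

# analytic gradients of L = mean((z - y)**2)
with torch.no_grad():
    r = z - y
    grad_a = 2 * (r * x).mean()
    grad_b = 2 * r.mean()

print(a.grad, grad_a)
print(b.grad, grad_b)
```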


In [None]:
loss = torch.nn.functional.mse_loss(z, y)
print('x:', x)
print('y:', y)
print('a:', a)
print('b:', b)
loss.backward()
print('gradient for a:', a.grad)
print('gradient for b:', b.grad)


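Try running the previous cell a second time. `z` was computed in an earlier cell, and the values it saved for the backward pass were freed by the first `backward()` call. PyTorch therefore raises a `RuntimeError` about backpropagating through the graph a second time. If `z` is recomputed instead, the call succeeds, but the new gradients are *added* to those already stored in `a.grad` and `b.grad`. Gradients must therefore be reset between updates.

To update the parameters, one first computes the gradient and then modifies the parameters with it. Gradient tracking must be disabled during that modification. Otherwise the update would itself be recorded in the computational graph: the parameters (which determine the loss) would then depend on the loss through their own gradients, creating a circular dependency. For this reason, PyTorch refuses in-place updates of leaf tensors that require gradients outside of `torch.no_grad()`.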
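A minimal sketch of both points, using a single scalar parameter `w` chosen for illustration. Gradients accumulate across `backward()` calls. An in-place update of a tracked leaf is refused unless it happens inside `torch.no_grad()`:

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# two backward passes, each through a freshly built graph
for _ in range(2):
    loss = (3 * w).sum()  # d(loss)/dw = 3
    loss.backward()
print(w.grad)  # tensor([6.]): the two gradients were added up

# updating a tracked leaf tensor in place is refused
try:
    w -= 0.1 * w.grad
except RuntimeError as err:
    print('RuntimeError:', err)

# inside no_grad the update is not recorded, so it is allowed
with torch.no_grad():
    w -= 0.1 * w.grad
print(w)  # tensor([1.4000], requires_grad=True)

# reset the gradient before the next iteration
w.grad.zero_()
```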



In [None]:
# learning rate
eta = 0.1

for i in range(10):
    print('a:', a)
    print('b:', b)
    # calculate output
    z = a * x + b
    loss = torch.nn.functional.mse_loss(z, y)
    loss.backward()
    print('gradient for a:', a.grad)
    print('gradient for b:', b.grad)
    # update the parameters with the gradient, without recording the update in the graph
    with torch.no_grad():
        a.copy_(a - eta * a.grad)
        b.copy_(b - eta * b.grad)
    # reset the gradients, otherwise they accumulate across iterations
    a.grad.zero_()
    b.grad.zero_()
a_est = a.detach()
b_est = b.detach()

plt.figure()
plt.scatter(x, y, c='b', label='data')
plt.plot(x, a_true*x+b_true, '--k', label='th')
plt.plot(x, a_est*x+b_est, '--r', label='est')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()



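In `torch`, gradient tracking can be disabled with the `torch.no_grad()` context manager. Moreover, `detach()` returns a tensor that shares the same data but is cut off from the computational graph. It can be used to retrieve the trained parameters and pass them, e.g., to `numpy` functions (as with the plotting above).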


In [None]:
z = a * x + b
print(z.requires_grad)

with torch.no_grad():
    z = a * x + b
print(z.requires_grad)
z = a * x + b
z_det = z.detach()
print(z.requires_grad)
print(z_det.requires_grad)
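`detach()` is required before converting a tracked tensor to `numpy`, and the result still shares memory with the original. A sketch with an arbitrary parameter `p`:

```python
import torch

p = torch.ones(2, requires_grad=True)

# numpy() refuses tensors that are tracked by autograd
try:
    p.numpy()
except RuntimeError as err:
    print('RuntimeError:', err)

# detach() gives an untracked tensor sharing the same memory
arr = p.detach().numpy()
with torch.no_grad():
    p.mul_(3)  # in-place change of p, allowed inside no_grad
print(arr)  # [3. 3.]: arr sees the change
```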



# Model and parameter optimization

The above mechanism can be automated in two steps. First, define the linear regression as a neural network model based on `torch.nn`, with the same trainable parameters as above. Then use an optimizer to perform the updates.


In [None]:

from torch import nn, optim

class LinReg(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(1, 1)

    def forward(self, x):
        # calculate output
        y = self.lin(x)
        return y.reshape(x.shape[0])
    
# create an instance of the linear regression model
model = LinReg()

# loss function
loss_fn = nn.MSELoss()

# optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

In [None]:

# optimization loop
for i in range(50):
    # compute prediction and loss
    pred = model(x.reshape([n, 1]))
    loss = loss_fn(pred, y)

    # backpropagation of loss error
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # report loss
    print('epoch {}, loss = {}'.format( i, loss.detach()))
for p in model.parameters():
    print(p)
a_est2, b_est2 = model.parameters()
a_est2 = a_est2.detach().flatten()
b_est2 = b_est2.detach().flatten()

plt.figure()
plt.scatter(x, y, c='b', label='data')
plt.plot(x, a_true*x+b_true, '--k', label='th')
plt.plot(x, a_est2*x+b_est2, '--r', label='est')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()
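For plain SGD (no momentum, no weight decay, as configured above), `optimizer.step()` performs the manual update from the previous section, `p <- p - lr * p.grad`. The sketch below checks this on a fresh `nn.Linear(1, 1)` with noiseless data. The seed, data and learning rate are arbitrary choices:

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
lin = nn.Linear(1, 1)
lr = 0.1
opt = optim.SGD(lin.parameters(), lr=lr)

x = torch.arange(5, dtype=torch.float).reshape(-1, 1)
y = 0.5 * x + 1.5

# snapshot parameters before the step
before = [p.detach().clone() for p in lin.parameters()]

loss = nn.functional.mse_loss(lin(x), y)
loss.backward()
grads = [p.grad.detach().clone() for p in lin.parameters()]
opt.step()

# compare with the manual update p - lr * grad
matches = [torch.allclose(p, p0 - lr * g) for p, p0, g in zip(lin.parameters(), before, grads)]
print(matches)
```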