## Read the data

We will download the MNIST dataset to train a classifier. The torchvision package provides a convenient dataset class for that.

The MNIST dataset is composed of images of digits that must be classified with labels from 0 to 9. The inputs are 28x28 matrices containing the grayscale intensity in each pixel.

In [14]:
import torch
import torchvision.transforms as transforms
import torchvision.datasets as dsets
import numpy as np

# download (if it isn't there yet) to a folder named data
# convert the data to Tensor objects
train_data = dsets.MNIST(root='data/', download=True, train=True,
                         transform=transforms.ToTensor())
test_data = dsets.MNIST(root='data/', download=True, train=False,
                        transform=transforms.ToTensor())

# these loaders iterate over the data in batches (here, 4 images at a time)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=4)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=4)
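
To see what a loader yields, here is a minimal sketch that uses synthetic data instead of MNIST, so it runs without a download. The names `fake_images`, `fake_labels` and `fake_data` are made up for illustration; the 1x28x28 shape mirrors what `ToTensor()` produces for a grayscale 28x28 image.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST: 10 fake "images" of shape 1x28x28 and integer labels.
# (Synthetic data, used only so the sketch runs without a download.)
fake_images = torch.rand(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))
fake_data = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_data, batch_size=4)

# Each iteration yields one (inputs, labels) batch.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(batch_shapes)
```

With 10 samples and a batch size of 4, the last batch holds only the 2 remaining samples.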

In [16]:
import numpy as np
import mnist  # third-party package that returns MNIST as NumPy arrays

train_x = mnist.train_images()
train_y = mnist.train_labels()
test_x = mnist.test_images()
test_y = mnist.test_labels()

print('%d training instances and %d test instances' % (len(train_x), len(test_x)))

60000 training instances and 10000 test instances


Check the shape of our training data to see how many input features there are:

In [20]:
print(train_x.shape)
print(train_x[0])

(60000, 28, 28)
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   3  18  18  18 126 136
  175  26 166 255 247 127   0   0   0   0]
 [  0   0   0   0   0   0   0   0  30  36  94 154 170 253 253 253 253 253
  225 172 253 242 195  64   0   0   0   0]
 [  0   0   0   0   0   0   0  49 238 253 253 253 253 253 253 253 253 251
   93  82  82  56  39   0   0   0   0   0]
 [  0   0   0   0   0   0   0  18 219 253 253 25

### Formatting

Each sample is a 28x28 matrix, but we want to represent each one as a vector, since our model doesn't take advantage of the 2-d structure of the data.

So, we reshape the data:

In [30]:
num_features = 28 * 28  # one feature per pixel
new_shape = [60000, num_features]
train_x_vectors = train_x.reshape(new_shape)
print(train_x_vectors.shape)
print(train_x_vectors[0])

(60000, 784)
[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
   0   0   0   0   0   0   0   0   3  18  18  18 126 136 175  26 166 255
 247 127   0   0   0   0   0   0   0   0   0   0   0   0  30  36  94 154
 170 253 253 253 253 253 225 172 253 242 195  64   0   0   0   0   0   0
   0   0   0   0   0  49 238 253 253 253 253 253 253 253 253 251  93  82
  82  56  39   0   0   0   0   0   0   0   0   0   0   0   0  18 219 253
 253 253 253 253 198 182 247 241   0  

When we reshape an array (or torch tensor, for that matter), we don't need to specify all dimensions. We can leave one as -1, and it will be automatically determined from the size of the data. This is useful when we don't know a priori the shape of some array.

In [24]:
train_x_vectors = train_x.reshape([-1, num_features])
test_x_vectors = test_x.reshape([-1, num_features])
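
The same mechanism can be seen on a toy array. This is a small sketch with made-up data (`toy` is 6 "images" of 2x3 pixels), showing that the dimension left as -1 is inferred in both directions:

```python
import numpy as np

# A toy stand-in for the image array: 6 "images" of 2x3 pixels.
toy = np.arange(36).reshape(6, 2, 3)

flat = toy.reshape(-1, 2 * 3)       # first dimension inferred: 36 / 6 = 6
restored = flat.reshape(-1, 2, 3)   # and back to the original layout

print(flat.shape)      # (6, 6)
print(restored.shape)  # (6, 2, 3)
```

Only one dimension can be -1, and the total number of elements must be divisible by the product of the other dimensions.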

Now, check the labels:

In [25]:
print(np.unique(train_y))
num_classes = len(np.unique(train_y))

[0 1 2 3 4 5 6 7 8 9]


### Creating a simple linear classifier

Our input has 784 dimensions (one per pixel), and the output has 10 possible classes. We will create a weight matrix $w$ and a bias vector $b$.

In [53]:
w = torch.randn([num_features, num_classes], requires_grad=True)
b = torch.randn([num_classes], requires_grad=True)
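
As a quick sanity check on the shapes, here is a sketch using made-up float inputs (`x` is random data, not MNIST) to show how a batch is mapped to one score per class:

```python
import torch

num_features, num_classes = 28 * 28, 10

w = torch.randn(num_features, num_classes)
b = torch.randn(num_classes)

# Made-up float inputs standing in for a batch of 8 flattened images.
x = torch.rand(8, num_features)

# [8, 784] @ [784, 10] -> [8, 10]; b of shape [10] is broadcast over the rows.
scores = x @ w + b
print(scores.shape)  # torch.Size([8, 10])
```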

For illustration purposes, let's take a small batch of the data and create a PyTorch tensor from it.

In [54]:
batch = torch.tensor(train_x_vectors[:8])
logits = torch.matmul(batch, w) + b

RuntimeError: Expected object of type torch.ByteTensor but found type torch.FloatTensor for argument #2 'mat2'

Always take care with data types! The training data are stored as 8-bit integers, but neural networks generally work with floats. We also divide by 255 so that the pixel values fall in the range [0, 1].

In [58]:
batch = torch.tensor(train_x_vectors[:8], dtype=torch.float) / 255
logits = torch.matmul(batch, w) + b
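
The dtype issue can be reproduced in isolation. A minimal sketch, assuming a tiny made-up array `pixels` with the same dtype as the MNIST arrays:

```python
import numpy as np
import torch

# Made-up pixel values with the same dtype as the MNIST arrays (uint8).
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

as_bytes = torch.tensor(pixels)                             # keeps the NumPy dtype
as_floats = torch.tensor(pixels, dtype=torch.float) / 255   # float32, scaled to [0, 1]

print(as_bytes.dtype)    # torch.uint8
print(as_floats.dtype)   # torch.float32
print(as_floats)
```

`torch.tensor` infers the dtype from the NumPy array unless one is given explicitly, which is why the first attempt produced a byte tensor.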

This is what the logits look like. Think of them as the scores for each instance/class combination.

In [59]:
logits

tensor([[ -2.1683,   1.6978,  -5.5113,  17.6847,  -6.6208, -18.6775,   5.6941,
         -10.6070,   9.5219,  11.2406],
        [ -3.7293,   0.1569, -18.7178,  14.7960,  14.3775, -15.4994,  -0.0481,
         -10.9189, -15.2758, -12.3799],
        [ 18.6178,   0.4904,  -2.3751,  -4.0934,   0.5952, -10.6828,   3.7636,
          -4.7998,  -0.8878,   2.3721],
        [ -3.6464,   4.4262,  -4.0765,   3.2682,  -4.5175,  -0.9030,   6.7660,
          -1.0203,  -3.4345,   0.7512],
        [  5.3913,  -2.9702, -22.2613,  15.6960,  -5.6496,  -2.6898,   4.6293,
          -9.6786,   1.3551,  -2.6823],
        [ -1.5382,  -8.6193, -12.2209,   2.8339,  -5.1867,  -1.9191,   9.0326,
         -21.1009,   2.6512,   8.9355],
        [  8.0742,   3.6382,  -7.3217,   8.0356,  -7.2478,   0.7622,  -5.5060,
         -21.6653,   1.0987,   5.7574],
        [  1.5389,  -5.5820, -12.7511,  11.4416,  -1.9076,  -7.4371,   3.9595,
         -15.7578,  -3.0995,  18.5088]], grad_fn=<ThAddBackward>)

We want to take the highest scoring class for each instance, i.e., the argmax:

In [60]:
answer = torch.argmax(logits, dim=1)
answer

tensor([3, 3, 0, 6, 3, 6, 0, 9])

What are the correct classes for those? Most predictions should be wrong, since we just initialized the weights randomly.

In [76]:
batch_labels = torch.tensor(train_y[:8], dtype=torch.long)
batch_labels

tensor([5, 0, 4, 1, 9, 2, 1, 3])
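
One way to quantify this is the accuracy on the batch. The sketch below hard-codes the predictions and labels printed above so that it is self-contained; `correct` and `accuracy` are names introduced here for illustration.

```python
import torch

# Predictions and labels copied from the outputs above.
answer = torch.tensor([3, 3, 0, 6, 3, 6, 0, 9])
batch_labels = torch.tensor([5, 0, 4, 1, 9, 2, 1, 3])

correct = (answer == batch_labels)          # boolean tensor, one entry per instance
accuracy = correct.float().mean().item()    # fraction of correct predictions

print(correct.sum().item(), accuracy)  # 0 0.0
```

For this particular batch, none of the eight random predictions matches its label.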

We can compute the loss as the mean cross-entropy, as usual for classification problems. Remember that the cross-entropy between the true label distribution $p$ and the predicted $q$ is computed as:

$-\sum_c p(c) \log q(c)$

where the sum runs over every label $c$.

The true distribution $p$ is 1 for the correct label and 0 elsewhere; the predicted $q$ can be computed as the softmax over the logits.
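
Because $p$ is one-hot, only the term for the correct label survives in the sum. Writing $z_c$ for the logit of class $c$ and $y$ for the correct label (both symbols are introduced here for illustration), the softmax and the resulting per-instance loss are:

$q(c) = \frac{\exp(z_c)}{\sum_{c'} \exp(z_{c'})}$

$-\sum_c p(c) \log q(c) = -\log q(y) = -z_y + \log \sum_{c'} \exp(z_{c'})$

The mean cross-entropy is then the average of this quantity over the instances in the batch.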

In [67]:
p = torch.zeros([8, num_classes])
# index with the long tensor of labels: row i gets a 1 in the column of its label
p[torch.arange(8), batch_labels] = 1
p

tensor([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
        [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]])
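
PyTorch also ships a helper that builds the same one-hot matrix. A minimal sketch, with the labels hard-coded from the batch above and `p_alt` as a made-up name:

```python
import torch
import torch.nn.functional as F

# Labels copied from the batch above.
labels = torch.tensor([5, 0, 4, 1, 9, 2, 1, 3])

# one_hot returns an integer (int64) tensor, so cast to float before
# combining it with floating-point probabilities.
p_alt = F.one_hot(labels, num_classes=10).float()

print(p_alt.shape)          # torch.Size([8, 10])
print(p_alt.argmax(dim=1))  # recovers the original labels
```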

In [68]:
q = torch.softmax(logits, dim=1)
q

tensor([[2.3831e-09, 1.1380e-07, 8.4191e-11, 9.9812e-01, 2.7760e-11, 1.6116e-16,
         6.1907e-06, 5.1550e-13, 2.8453e-04, 1.5868e-03],
        [5.4323e-09, 2.6468e-07, 1.6810e-15, 6.0311e-01, 3.9689e-01, 4.2004e-14,
         2.1563e-07, 4.0979e-12, 5.2528e-14, 9.5074e-13],
        [1.0000e+00, 1.3407e-08, 7.6360e-10, 1.3698e-10, 1.4889e-08, 1.8832e-13,
         3.5390e-07, 6.7585e-11, 3.3792e-09, 8.8018e-08],
        [2.6599e-05, 8.5258e-02, 1.7302e-05, 2.6783e-02, 1.1131e-05, 4.1335e-04,
         8.8493e-01, 3.6760e-04, 3.2878e-05, 2.1613e-03],
        [3.3475e-05, 7.8224e-09, 3.2760e-17, 9.9995e-01, 5.3667e-10, 1.0354e-08,
         1.5624e-05, 9.5487e-12, 5.9129e-07, 1.0433e-08],
        [1.3423e-05, 1.1287e-08, 3.0789e-10, 1.0632e-03, 3.4942e-07, 9.1712e-06,
         5.2324e-01, 4.2841e-14, 8.8567e-04, 4.7479e-01],
        [4.8212e-01, 5.7096e-03, 9.9265e-08, 4.6387e-01, 1.0688e-07, 3.2181e-04,
         6.0999e-07, 5.8538e-14, 4.5053e-04, 4.7529e-02],
        [4.2628e-08, 3.4444

In [83]:
- torch.sum(p * torch.log(q)) / 8

tensor(16.0491, grad_fn=<DivBackward0>)

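We can also use PyTorch's own loss for that! Note that CrossEntropyLoss takes the raw logits, not the softmax output, and averages over the batch by default.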

In [84]:
loss_layer = torch.nn.CrossEntropyLoss()
loss = loss_layer(logits, batch_labels)
loss

tensor(16.0491, grad_fn=<NllLossBackward>)
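
The agreement between the two values is no coincidence. The following sketch, which uses made-up random logits rather than the ones above, checks that the built-in loss matches a manual log-softmax followed by picking the log-probability of the correct class:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Made-up logits and labels with the same shapes as the batch above.
logits = torch.randn(8, 10)
labels = torch.tensor([5, 0, 4, 1, 9, 2, 1, 3])

# Manual version: log-softmax, then pick the log-probability of the correct class.
log_q = F.log_softmax(logits, dim=1)
manual = -log_q[torch.arange(8), labels].mean()

# Built-in version: takes raw logits (not probabilities) and averages by default.
builtin = torch.nn.CrossEntropyLoss()(logits, labels)

print(torch.allclose(manual, builtin))
```

Working with log-softmax directly, rather than taking the log of the softmax output, avoids computing log(0) when a probability underflows to zero.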

Squeeze, transpose, reshape, dimension -1

loss functions

Onehot?