In [1]:
import operator

def test(a,b,cmp,cname=None):
    if cname is None: cname=cmp.__name__
    assert cmp(a,b),f"{cname}:\n{a}\n{b}"

def test_eq(a,b): test(a,b,operator.eq,'==')

In [6]:
#export
from pathlib import Path
from IPython.core.debugger import set_trace
from fastai import datasets
import pickle, gzip, math, torch, matplotlib as mpl
import matplotlib.pyplot as plt
from torch import tensor

MNIST_URL='http://deeplearning.net/data/mnist/mnist.pkl'

In [7]:
path = datasets.download_data(MNIST_URL, ext='.gz'); path

PosixPath('/home/jupyter/.fastai/data/mnist.pkl.gz')

We can then open it with the standard library's gzip module and pickle.load() it. Pickle is Python's standard serialization format, and this version of MNIST on deeplearning.net is stored in that format. Loading it gives us a tuple of tuples of datasets, like so:

In [8]:
with gzip.open(path, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
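To make the nesting concrete, here is a minimal sketch of the same gzip + pickle round trip on a toy stand-in. The `data` tuple, the file name `toy.pkl.gz`, and the variable names are made up for illustration; only the shape of the structure mirrors the cell above. The `encoding='latin-1'` argument used above is typically what Python 3 needs to read pickles written by Python 2, and is not needed for a file written by Python 3 as here.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

# Toy stand-in with the same nesting as the MNIST pickle: (train, valid, test),
# where each split is an (x, y) pair. The values are invented for illustration.
data = (([0.0, 1.0], [5, 0]), ([0.5], [4]), ([0.25], [9]))

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "toy.pkl.gz"
    # Write the pickled object through a gzip stream.
    with gzip.open(p, "wb") as f:
        pickle.dump(data, f)
    # Read it back and unpack train and valid, ignoring the third split with _.
    with gzip.open(p, "rb") as f:
        (x_tr, y_tr), (x_va, y_va), _ = pickle.load(f)

print(x_tr, y_tr, x_va, y_va)
```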

It actually contains numpy arrays, but numpy arrays are not allowed in our foundations, so we have to convert them into tensors. We can use Python's map to apply the tensor function to each of these four arrays, which gives us back four tensors.

A lot of you will be more familiar with numpy arrays than PyTorch tensors, but everything you can do with numpy arrays you can also do with PyTorch tensors, and you can also do it on the GPU and have all this nice deep learning infrastructure. So in my opinion it is a good idea to get used to using PyTorch tensors.

We can now grab the number of rows and the number of columns in the training set and take a look.

In [9]:
x_train,y_train,x_valid,y_valid = map(tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train, x_train.shape, y_train, y_train.shape, y_train.min(), y_train.max()

(tensor([[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]),
 torch.Size([50000, 784]),
 tensor([5, 0, 4,  ..., 8, 4, 8]),
 torch.Size([50000]),
 tensor(0),
 tensor(9))

So here's MNIST, hopefully pretty familiar to you already. x_train is 50000 rows by 784 columns (28px x 28px), and the y data looks something like this: y_train.shape is just 50000 rows, and the dependent variable ranges from a minimum of 0 to a maximum of 9.

So let's add some tests:

In [10]:
assert n==y_train.shape[0]==50000
test_eq(c,28*28)
test_eq(y_train.min(),0)
test_eq(y_train.max(),9)

So we got a FloatTensor, and we can pass one image from it to imshow() after reshaping it to 28 by 28 with .view(). .view() is really important. I think we saw it a few times in part one, but get really familiar with it: this is how we reshape our 784-long vector into a 28 by 28 matrix that is suitable for plotting.
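As a minimal sketch of what `.view()` does, the example below reshapes a 784-long vector into 28 by 28. The names `flat` and `img` and the use of `torch.arange` as stand-in pixel data are assumptions for illustration, not the notebook's own cell.

```python
import torch

# Stand-in for one flattened MNIST image: 784 values in a single vector.
flat = torch.arange(784, dtype=torch.float32)

# .view() reinterprets the same storage with a new shape; no data is copied.
img = flat.view(28, 28)

print(img.shape)          # torch.Size([28, 28])
# Row r, column c of the image is element r*28 + c of the flat vector.
print(img[1, 0].item())   # 28.0
```

Because the view shares memory with the original tensor, it is cheap to create. A tensor shaped like `img` is what you would hand to `plt.imshow`.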

OK, so there's our data. Let's start by creating a simple linear model.
For a linear model we need something of the form y = ax + b. Our a will be a bunch of weights, so it needs to be a 784 by 10 matrix, because we have 784 coming in and 10 coming out. That allows us to take in our independent variable and map it to something we can compare to our dependent variable. For our bias we just start with 10 zeros.

## Initial Python model

In [15]:
weights = torch.randn(784,10)

In [16]:
bias = torch.zeros(10)
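
A quick shape check for the linear model, as a sketch only. It uses PyTorch's built-in `@` operator purely to confirm the shapes line up, and the input `x` is random stand-in data rather than real MNIST rows.

```python
import torch

torch.manual_seed(0)               # only so the sketch is repeatable

x = torch.randn(5, 784)            # stand-in for 5 flattened images
weights = torch.randn(784, 10)     # 784 inputs -> 10 outputs
bias = torch.zeros(10)             # one bias per output

# (5, 784) @ (784, 10) -> (5, 10); the bias of shape (10,) is added to every row.
y = x @ weights + bias
print(y.shape)                     # torch.Size([5, 10])
```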

### Matrix Multiplication with loops

In [None]:
def matmul(a,b):
    ar, ac = a.shape
    br, bc = b.shape
    
    # make sure matrices can be multiplied
    assert ac==br          
    
    # init c to zeros with correct shape
    c = torch.zeros(ar, bc)
    
    # loop thru n_rows of a
    for i in range(ar):
        # loop thru n_cols of b
        for j in range(bc):
            # loop thru n_cols of a, or n_rows of b
            for k in range(ac):
                # update c
                c[i,j] += a[i,k] * b[k,j]
    
    return c

In [18]:
m1 = x_valid[:5]    # first 5 from valid set
m2 = weights        # torch.randn(784,10)

In [19]:
m1.shape,m2.shape

(torch.Size([5, 784]), torch.Size([784, 10]))

In [20]:
%time t1=matmul(m1, m2)

CPU times: user 884 ms, sys: 0 ns, total: 884 ms
Wall time: 891 ms


In [21]:
t1.shape

torch.Size([5, 10])
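
For 5 by 784 times 784 by 10, the innermost line runs 5 * 10 * 784 = 39200 times in pure Python, which is why it is so slow. As a sanity check on the logic, here is a sketch of the triple loop on a tiny example that can be verified by hand. The name `matmul_loops` and the 2 by 2 inputs are chosen for illustration.

```python
import torch

def matmul_loops(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br                      # inner dimensions must match
    c = torch.zeros(ar, bc)
    for i in range(ar):                  # rows of a
        for j in range(bc):              # columns of b
            for k in range(ac):          # shared inner dimension
                c[i, j] += a[i, k] * b[k, j]
    return c

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])
out = matmul_loops(a, b)
print(out)   # rows [19., 22.] and [43., 50.]
```

For instance, the top-left entry is 1*5 + 2*7 = 19.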

### Elementwise matmul

So this is all the same (the first 7 lines). But now we have replaced the inner loop, and you'll see that it looks almost exactly the same as before, except that where it used to say k it now says `:`.

In PyTorch and numpy, `:` means the entirety of that axis. Rachel helped me remember the order of rows and columns when we talk about matrices with the song 'Row by column, row by column'. So i is the row number, take all columns (`a[i,:]`). And this (`b[:,j]`) is column number j, take all rows. So multiply all of row i by all of column j, and that gives us back a rank one tensor, which we add up.

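OK, that is exactly the same as what we had before. So now that takes 1.45 ms. We have removed one line of code and it is far faster: 178 times by the original measurement, and roughly 600 times if you compare against the 891 ms wall time shown above.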

In [28]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
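    for i in range(ar):
        for j in range(bc):
            # dot product of row i of a with column j of b
            c[i,j] = (a[i,:] * b[:,j]).sum()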
    
    return c
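
To see the row-by-column step on its own, here is a small sketch with made-up 2 by 3 and 3 by 2 tensors. The variable names `row`, `col` and `prod` are illustrative.

```python
import torch

a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])      # 2 x 3
b = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])          # 3 x 2

row = a[0, :]        # row 0 of a, all columns -> tensor([1., 2., 3.])
col = b[:, 1]        # column 1 of b, all rows -> tensor([0., 1., 2.])

prod = row * col     # elementwise product, still rank one -> tensor([0., 2., 6.])
print(prod.sum())    # tensor(8.)
```

That sum, 1*0 + 2*1 + 3*2 = 8, is exactly the entry at row 0, column 1 of the matrix product.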

`c[i]` is the same as `c[i,:]`. Any time there's a trailing colon in numpy or PyTorch, you can optionally delete it.

In the same way, `c[None,:]` is the same as `c[None]`.
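
A short sketch of these indexing equivalences, using a small made-up tensor `c`. The last two lines go slightly beyond the text to show where `None` places the new axis.

```python
import torch

c = torch.tensor([[10., 20., 30.],
                  [40., 50., 60.]])   # shape (2, 3)

# A trailing colon can be dropped: both select row 1.
print(torch.equal(c[1], c[1, :]))          # True

# None inserts a new axis of length 1 at that position.
print(c[None].shape, c[None, :].shape)     # torch.Size([1, 2, 3]) twice
print(c[:, None].shape)                    # torch.Size([2, 1, 3])
print(c[..., None].shape)                  # torch.Size([2, 3, 1])
```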

#### Matmul with broadcasting

In [57]:
def matmul(a,b):
    ar,ac = a.shape
    br,bc = b.shape
    assert ac==br
    c = torch.zeros(ar, bc)
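    for i in range(ar):
        # a[i].unsqueeze(-1) has shape (ac, 1) and broadcasts across the columns of b
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)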
    
    return c

So we are going to set the whole of row i of c (`c[i]`) to the whole of row i of a (`a[i]`) turned into a rank two tensor (`a[i].unsqueeze(-1)`). We could also have written that as `a[i,:,None]`.
So this is now `ac` by 1, which is a rank two tensor. b is also a rank two tensor, and that is the entirety of our matrix.

And so `a[i].unsqueeze(-1)` is going to get broadcast over this `b`, which gets rid of the loop over j.

The product is a rank two tensor (`ac` by `bc`), and we then sum that rank two tensor over the rows with `sum(dim=0)`.
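
The shapes at each step can be traced with a small sketch. The 2 by 3 and 3 by 2 tensors and the names `col`, `scaled` and `row_of_c` are made up for illustration.

```python
import torch

a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])      # ar=2, ac=3
b = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])          # br=3, bc=2

col = a[0].unsqueeze(-1)              # row 0 of a as a column, shape (3, 1)
scaled = col * b                      # broadcast to (3, 2): row k of b times a[0, k]
row_of_c = scaled.sum(dim=0)          # sum over the 3 rows -> shape (2,)

print(col.shape, scaled.shape)        # torch.Size([3, 1]) torch.Size([3, 2])
print(row_of_c)                       # tensor([7., 8.])
```

Here `scaled` is `[[1, 0], [0, 2], [6, 6]]`, and summing down the rows gives `[7, 8]`, which is row 0 of the matrix product.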