## In this notebook I recreate matrix multiplication from scratch, using torch.tensor only for tensor creation

fast.ai Part 2: Deep Learning from the Foundations
lesson 8 homework 1

In [1]:
import torch

In [100]:
x = torch.rand(size=[4,2],dtype=torch.float)
y = torch.rand(size=[2,4],dtype=torch.float)
display(x,y)

tensor([[0.9848, 0.5285],
        [0.6515, 0.9521],
        [0.5628, 0.0976],
        [0.0582, 0.6412]])

tensor([[0.1714, 0.1951, 0.7761, 0.2252],
        [0.7349, 0.4572, 0.3885, 0.7724]])

To start, here is PyTorch's built-in implementation, timed as a baseline.

In [143]:
%timeit torch.matmul(x, y)

The slowest run took 35.76 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.96 µs per loop


Next, let's write a function that checks whether the shapes of two matrices are compatible for multiplication.

In [107]:
def check_shapes(mat1, mat2):
    mat1_shape, mat2_shape = mat1.shape, mat2.shape
    # only the inner dimensions must agree: (n, k) @ (k, m) -> (n, m)
    assert mat1_shape[1] == mat2_shape[0], 'mat1.shape[1] must be equal to mat2.shape[0]'

In [108]:
check_shapes(x,y)
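
The shape rule can be illustrated with a result that is not square. This is a minimal sketch, not part of the original homework: `inner_dims_match` is a hypothetical helper introduced only for this example. It assumes 2-D inputs and tests only the inner dimensions, since an (n, k) matrix times a (k, m) matrix gives an (n, m) result for any n and m.

```python
import torch

def inner_dims_match(mat1, mat2):
    # Hypothetical helper: True when mat1 @ mat2 is defined for 2-D inputs.
    return mat1.ndim == 2 and mat2.ndim == 2 and mat1.shape[1] == mat2.shape[0]

a = torch.rand(4, 2)
b = torch.rand(2, 3)                          # not the transposed shape of a

ok = inner_dims_match(a, b)                   # inner dimensions are both 2
bad = inner_dims_match(a, torch.rand(3, 2))   # inner dimensions are 2 and 3
out_shape = tuple(torch.matmul(a, b).shape)   # rows of a, columns of b
```

Here `ok` is `True`, `bad` is `False`, and `out_shape` is `(4, 3)`.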

Matmul ver1 (nested loops)

In [137]:
def matmul_cycle(mat1, mat2):
    check_shapes(mat1,mat2)
    mat1_shape, mat2_shape = mat1.shape, mat2.shape
    new_mat = torch.zeros([mat1_shape[0],mat2_shape[1]])
    # for every output cell (row, col)...
    for row in range(mat1_shape[0]):
        for col in range(mat2_shape[1]):
            # ...accumulate the products of matching row and column elements
            for row_el,col_el in zip(mat1[row],mat2[:,col]):
                new_mat[row,col] += row_el*col_el
    return new_mat

In [147]:
matmul_cycle(x,y)

tensor([[0.5572, 0.4337, 0.9696, 0.6300],
        [0.8113, 0.5624, 0.8755, 0.8822],
        [0.1682, 0.1544, 0.4747, 0.2021],
        [0.4812, 0.3045, 0.2943, 0.5084]])
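
A printed result is hard to verify by eye, so it may help to compare a hand-written version against `torch.matmul` numerically. The sketch below restates the triple-loop idea under a different, illustrative name (`matmul_loops`) so that it runs on its own. The seed and the `atol` value are arbitrary choices, not taken from the notebook.

```python
import torch

def matmul_loops(mat1, mat2):
    # Same triple-loop idea as matmul_cycle, restated so this cell runs alone.
    n_rows, n_inner = mat1.shape
    n_cols = mat2.shape[1]
    out = torch.zeros(n_rows, n_cols)
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                out[i, j] += mat1[i, k] * mat2[k, j]
    return out

torch.manual_seed(0)          # arbitrary seed, only for repeatability
a = torch.rand(4, 2)
b = torch.rand(2, 4)

# allclose tolerates the small float32 rounding differences between the two
matches = torch.allclose(matmul_loops(a, b), torch.matmul(a, b), atol=1e-6)
```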

In [142]:
%timeit matmul_cycle(x,y)

The slowest run took 7.21 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 646 µs per loop


In [144]:
#ver1 works, but at 646 µs it is roughly 200 times slower than torch.matmul (2.96 µs)
#let's speed it up

Matmul ver2 (dot product per output element)

In [149]:
def matmul_brcst(mat1, mat2):
    check_shapes(mat1,mat2)
    mat1_shape, mat2_shape = mat1.shape, mat2.shape
    new_mat = torch.zeros([mat1_shape[0],mat2_shape[1]])
    #multiplication
    for row in range(mat1_shape[0]):
        for col in range(mat2_shape[1]):
            # dot product of one row of mat1 with one column of mat2
            new_mat[row,col] = mat1[row]@mat2[:,col]
    return new_mat

In [150]:
matmul_brcst(x,y)

tensor([[0.5572, 0.4337, 0.9696, 0.6300],
        [0.8113, 0.5624, 0.8755, 0.8822],
        [0.1682, 0.1544, 0.4747, 0.2021],
        [0.4812, 0.3045, 0.2943, 0.5084]])

In [152]:
%timeit matmul_brcst(x,y)

The slowest run took 5.11 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 214 µs per loop


In [153]:
#about 3 times faster than ver1 (214 µs vs 646 µs)!
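
The speedup comes from replacing the innermost Python loop with a single tensor operation. As a small sketch with made-up vectors: for two 1-D tensors, `@` computes a dot product, which is the same as an elementwise multiply followed by a sum.

```python
import torch

torch.manual_seed(0)              # arbitrary seed, only for repeatability
row = torch.rand(2)               # stands in for one row of mat1
col = torch.rand(2)               # stands in for one column of mat2

via_operator = row @ col          # 1-D @ 1-D gives a 0-dim (scalar) tensor
via_sum = (row * col).sum()       # elementwise multiply, then reduce

same = torch.allclose(via_operator, via_sum)
```

The second form uses only elementwise arithmetic and a reduction, which may suit the "from scratch" goal better than calling `@` directly.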

Matmul ver3 (matrix-vector product per output column)

In [168]:
def matmul_sbrcst(mat1, mat2):
    check_shapes(mat1,mat2)
    mat1_shape, mat2_shape = mat1.shape, mat2.shape
    new_mat = torch.zeros([mat1_shape[0],mat2_shape[1]])
    # fill one output column at a time with a matrix-vector product
    for i in range(new_mat.shape[1]):
        new_mat[:,i] = mat1@mat2[:,i]
    return new_mat

In [162]:
matmul_sbrcst(x,y)

tensor([[0.5572, 0.4337, 0.9696, 0.6300],
        [0.8113, 0.5624, 0.8755, 0.8822],
        [0.1682, 0.1544, 0.4747, 0.2021],
        [0.4812, 0.3045, 0.2943, 0.5084]])

In [163]:
%timeit matmul_sbrcst(x,y)

The slowest run took 16.02 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 63.1 µs per loop


In [165]:
#damn, it is faster: 63.1 µs, about 3.4 times quicker than ver2! but it is still roughly 20 times slower than torch.matmul (2.96 µs), so what else can be done?
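
One possible next step is to remove the remaining Python loop by using real broadcasting. The sketch below is an assumption about where the notebook could go, not the author's solution, and `matmul_broadcast` is a name invented here. It inserts size-1 axes so that an (n, k, 1) tensor multiplies a (1, k, m) tensor elementwise, then sums over the shared k axis.

```python
import torch

def matmul_broadcast(mat1, mat2):
    # mat1[:, :, None] has shape (n, k, 1); mat2[None, :, :] has shape (1, k, m).
    # Broadcasting expands both to (n, k, m), where entry [i, k, j] is
    # mat1[i, k] * mat2[k, j]; summing over dim=1 leaves the (n, m) product.
    assert mat1.shape[1] == mat2.shape[0], 'inner dimensions must match'
    return (mat1[:, :, None] * mat2[None, :, :]).sum(dim=1)

torch.manual_seed(0)          # arbitrary seed, only for repeatability
a = torch.rand(4, 2)
b = torch.rand(2, 3)

result = matmul_broadcast(a, b)
agrees = torch.allclose(result, torch.matmul(a, b), atol=1e-6)
```

This version builds an intermediate tensor of n × k × m elements, so it trades memory for the removed loop. Whether it beats ver3 on these small inputs would need to be measured with `%timeit`.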