# Notes Part 2

Overfitting does *not* mean that your training loss is lower than your validation loss!

A well-fit model will almost always have a training loss lower than its validation loss.

Overfitting means that your validation loss is actually getting worse.
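As a minimal sketch of that definition, assume you have recorded one validation loss per epoch in a plain list. The helper name `first_overfit_epoch` and the loss values below are made up for illustration:

```python
def first_overfit_epoch(val_losses):
    """Return the index of the first epoch whose validation loss is worse
    than the best loss seen so far, or None if it never gets worse."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss > best:
            return epoch
        best = loss
    return None

# Hypothetical losses. Training loss is below validation loss at every epoch,
# which by itself is normal; validation loss turns around at epoch 3.
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28]
val_losses = [0.95, 0.70, 0.58, 0.61, 0.66]

overfit_at = first_overfit_epoch(val_losses)
print(overfit_at)
```

Real validation curves are noisy, so in practice you would probably want to see the loss rise for several epochs before calling it overfitting.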

### Five steps to avoid overfitting

More data

Data augmentation 

Generalizable architectures

Regularization

Reduce architecture complexity
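To make one of these steps concrete, here is a rough sketch of L2 regularization (weight decay) in plain Python. The function name `sgd_step`, the learning rate, and the decay value are assumptions chosen for illustration:

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD step in which each weight is also pulled toward zero.

    Adding weight_decay * w to the gradient is equivalent to adding
    (weight_decay / 2) * sum(w**2) to the loss.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

weights = [1.0, -2.0]
grads = [0.0, 0.0]  # with zero gradient, only the decay term moves the weights
new_weights = sgd_step(weights, grads)
print(new_weights)
```

Both weights shrink slightly toward zero even though the gradient is zero, which is the regularizing effect.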

### Steps to a basic modern CNN model

Matmul

Relu / Init

FC Forward

FC backward

Train loop

Conv

Optim

Batch-Norm

ResNet

If you want to develop your own library in Jupyter notebooks, follow the approach of notebook2script.py, which exports selected notebook cells to a Python module.

# Matrix Multiplication

In [3]:
from fastai.basics import *
from pathlib import Path
from IPython.core.debugger import set_trace
from fastai import datasets
import pickle, gzip, math
import matplotlib as mpl
import torch
import matplotlib.pyplot as plt
from torch import tensor

In [1]:
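import time
from functools import wraps

def timeit(func):
    """Decorator that prints how long each call to func takes."""
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        retval = func(*args, **kwargs)
        print('{} took {:.5f}s'.format(func.__name__, time.perf_counter() - start))
        return retval
    return wrapper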

In [4]:
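# Pure-Python matrix multiplication: three nested loops over tensor elements

def matmul(a, b):
    ar, ac = a.shape  # rows and columns of a
    br, bc = b.shape  # rows and columns of b
    assert ac == br, "inner dimensions must match"

    c = torch.zeros(ar, bc)
    for i in range(ar):          # each row of a
        for j in range(bc):      # each column of b
            for k in range(ac):  # dot product of row i and column j
                c[i, j] += a[i, k] * b[k, j]

    return c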

In [5]:
m1 = torch.rand((5, 784))
m2 = torch.rand((784, 10))

In [7]:
m1.shape, m2.shape

(torch.Size([5, 784]), torch.Size([784, 10]))

In [8]:
%timeit t = matmul(m1, m2)

971 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [11]:
t = matmul(m1, m2)

In [12]:
t.shape

torch.Size([5, 10])

That is about 1 second per matrix multiply for a batch of just 5 rows. A 50,000-row dataset (MNIST) needs 10,000 such multiplies, so one pass through a single layer would take roughly 10,000 seconds.

### Every layer would take nearly 3 hours per pass!

# How do we speed things up? Write in something other than Python!

### Behind the scenes, PyTorch uses a library called ATen

Elementwise operations

Operators +, -, *, >, <, == are usually element-wise

In [15]:
# Example

a = tensor([10., 6, -4])
b = tensor([2., 8, 7])
a,b 

(tensor([10.,  6., -4.]), tensor([2., 8., 7.]))

In [16]:
a + b

tensor([12., 14.,  3.])

In [17]:
(a < b).float().mean()

tensor(0.6667)
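One way elementwise operators like these can speed up the pure-Python `matmul` is to replace the innermost `k` loop with a single elementwise multiply and sum over a row of `a` and a column of `b`. The sketch below assumes that approach (the name `matmul_elementwise` is made up here) and checks the result against PyTorch's built-in `@` operator:

```python
import torch

def matmul_elementwise(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # Row i of a times column j of b, elementwise, then summed.
            c[i, j] = (a[i, :] * b[:, j]).sum()
    return c

torch.manual_seed(0)  # fixed seed so the random inputs are reproducible
m1 = torch.rand((5, 784))
m2 = torch.rand((784, 10))

result = matmul_elementwise(m1, m2)
# Compare with the built-in matrix product, allowing for float32 rounding.
matches = torch.allclose(result, m1 @ m2, rtol=1e-4)
print(result.shape, matches)
```

The two remaining loops still run in Python, but the 784 multiply-adds per output element now happen inside one tensor operation instead of 784 separate Python steps.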

In [19]:
m = tensor([[1.,2,3], [4,5,6], [7,7,8]]); m

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 7., 8.]])

Frobenius norm

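Multiply the matrix by itself elementwise (`m*m`, not a matrix product), then take `.sum()` and `.sqrt()`: the result is the square root of the sum of all squared entries.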
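A short sketch of that recipe using the matrix `m` from above, where `*` is the elementwise product. As a cross-check it also calls `Tensor.norm()`, which with default arguments should compute the same quantity:

```python
import torch
from torch import tensor

m = tensor([[1., 2, 3], [4, 5, 6], [7, 7, 8]])

# Square every element, add them all up, take the square root.
frobenius = (m * m).sum().sqrt()

# Tensor.norm() with default arguments, for comparison.
builtin = m.norm()

print(frobenius, builtin)
```

The squared entries of `m` sum to 253, so both values should be the square root of 253, roughly 15.906.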