Big Picture
* Get the dataset (pairs of input and label)
* Forward Pass: input -> function (model) -> output (prediction) 
* Compute the loss 
* Backward Pass (i.e., backprop)
* Update the parameters (i.e., weights and biases)
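
The five steps above can be sketched on a toy problem before touching MNIST. This is a minimal illustration, not the notebook's model: the synthetic data (`y = 3x + 2`), the mean-squared-error loss, the learning rate `lr`, and the step count are all assumptions chosen to keep the example small.

```python
import torch

torch.manual_seed(0)

# 1. Dataset: 100 toy (input, label) pairs generated from y = 3x + 2 (assumed data)
x = torch.linspace(-1, 1, 100).unsqueeze(1)   # shape (100, 1)
y = 3 * x + 2                                 # shape (100, 1)

# Parameters (one weight, one bias) that the loop will learn
w = torch.zeros(1, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1  # learning rate (assumed value)

for step in range(500):
    pred = x @ w + b                   # 2. forward pass: input -> model -> prediction
    loss = ((pred - y) ** 2).mean()    # 3. compute the loss (mean squared error)
    loss.backward()                    # 4. backward pass: fills w.grad and b.grad
    with torch.no_grad():              # 5. update the parameters
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()                 # reset gradients so they do not accumulate
        b.grad.zero_()

print(w.item(), b.item(), loss.item())
```

After training, `w` and `b` should sit close to 3 and 2, the values used to generate the labels.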


In [None]:
from pathlib import Path
import requests

# get the dataset
DATA_PATH = Path('data')
PATH = DATA_PATH / 'mnist'

PATH.mkdir(parents=True, exist_ok=True)
URL = 'https://github.com/pytorch/tutorials/raw/main/_static/'
FILENAME = 'mnist.pkl.gz'

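if not (PATH / FILENAME).exists():
    # download the archive once; fail loudly on an HTTP error
    response = requests.get(URL + FILENAME)
    response.raise_for_status()
    # write_bytes opens, writes, and closes the file
    (PATH / FILENAME).write_bytes(response.content)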

In [None]:
import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), 'rb') as f:
    # (training data), (validation data)
    # (x is input, y is label)
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

print("x training data", x_train.shape)
print("y training data", y_train.shape)


In [None]:
import matplotlib.pyplot as plt
import numpy as np

# examine the dataset
plt.imshow(x_train[0].reshape(28, 28), cmap='gray')
print(y_train[0])


In [None]:
import torch

# turn numpy arrays to tensors
x_train, y_train, x_valid, y_valid = map(torch.tensor, (x_train, y_train, x_valid, y_valid))


* NumPy: a library for scientific and numerical computing in Python.
* A NumPy array is a multidimensional table of data (2D, 3D, 4D, ...).
* When all the elements of an array are of a simple type, such as integer or float, NumPy stores them as a compact C data structure in memory, so it can run computations on the data at close to the speed of optimized C code.
* PyTorch tensors are almost the same as NumPy arrays, but with some restrictions that make them more performant.
* Restrictions: a tensor can't hold arbitrary types; all of its elements must share a single basic numeric type. It also has to be rectangular in shape.
* Tensors can live on a GPU and are optimized for GPU computation.
* PyTorch implements autograd: it can automatically compute the gradients (derivatives) of the operations we perform on tensors.
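
A small sketch of the points above: the single-dtype and rectangular-shape restrictions, optional GPU placement, and autograd. The array values and the function `sum(x ** 2)` are arbitrary choices for illustration, and the GPU branch only runs if CUDA happens to be available.

```python
import numpy as np
import torch

# A NumPy array with one numeric dtype converts directly to a tensor
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
t = torch.from_numpy(arr)
print(t.dtype, t.shape)

# Ragged (non-rectangular) data is rejected when building a tensor
try:
    torch.tensor([[1.0, 2.0], [3.0]])
    ragged_ok = True
except (ValueError, TypeError):
    ragged_ok = False
print("ragged tensor allowed:", ragged_ok)

# Move the tensor to a GPU only if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
t = t.to(device)

# Autograd: for y = sum(x ** 2), the gradient dy/dx is 2 * x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([2., 4., 6.])
```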

In [None]:
import math
# Initialize weights and biases (i.e., parameters)
# NN architecture: inputs -> hidden -> hidden -> output <--> label
# Our architecture: inputs -> output <--> label (similar to logistic regression)
# Number of neurons in the input layer: 784 (28 * 28); output layer: 10
# (10, 784) @ (784 rows, 50000 columns) -> w @ x + b
# (50000, 784) @ (784, 10) -> x @ w + b  (the convention used here)
torch.manual_seed(0)    
weights = torch.randn(784, 10) / math.sqrt(784) # Xavier initialization
weights.requires_grad_()  # in-place: start tracking gradients after initialization
biases = torch.zeros(10, requires_grad=True)
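
The shape comments in the cell above can be checked with a quick forward pass. This sketch rebuilds the parameters so it runs on its own; `xb` is a hypothetical batch of 64 random inputs standing in for real MNIST rows, and the batch size is an assumption.

```python
import math
import torch

torch.manual_seed(0)
weights = torch.randn(784, 10) / math.sqrt(784)  # scale by 1/sqrt(number of inputs)
weights.requires_grad_()
biases = torch.zeros(10, requires_grad=True)

# Hypothetical stand-in for a mini-batch of 64 flattened 28x28 images
xb = torch.rand(64, 784)

# (64, 784) @ (784, 10) -> (64, 10); the (10,) bias broadcasts across the rows
out = xb @ weights + biases
print(out.shape)

# The scaled weights should have a standard deviation near 1/sqrt(784), about 0.036
print(weights.std().item())
```

Dividing by `math.sqrt(784)` keeps each output roughly on the same scale as the inputs, rather than growing with the number of input features.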
