Logs   
- [2023/03/08]   
  Restart this notebook if you change the scratch library

- [2024/04/15]   
  You do not need to restart this notebook when updating the scratch library

Notes:
- Do not use the following implementation of tensors and the other
  layer abstractions for real problems, because they are very slow. Use
  an existing library such as TensorFlow or PyTorch for fast execution.

In [51]:
import numpy as np
import operator

from typing import List, Callable, Iterable, Tuple
from scratch.linear_algebra import LinearAlgebra as la
from scratch.probability import Probability as prob
from scratch.neural_networks import NeuralNetworks as nn


In [28]:
%load_ext autoreload
%autoreload 2 


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Deep learning is a technique that uses neural networks with many layers
to solve a wide range of problems, both supervised and unsupervised.

To do deep learning, we need a few data-structure abstractions.

## The Tensor

The `Tensor` type here is implemented with a shortcut: it is just a Python list.  
We do this for the practical purpose of learning the concepts first.  
A more general tensor data type is provided by popular libraries such as TensorFlow and PyTorch.

Here we treat the `Tensor` data type as a (possibly nested) list. Strictly
speaking, in mathematical terms an $n$-dimensional array *is not*
a tensor.

In [29]:
Tensor = list

In [30]:
# Find a tensor's shape:
def shape(tensor: Tensor) -> List[int]:
  sizes: List[int] = []
  while isinstance(tensor, list):
    sizes.append(len(tensor))
    tensor = tensor[0]    # descend into the first element to measure the next dimension
  return sizes

assert shape([1, 2, 3]) == [3]
assert shape([[1, 2], [3, 4], [5, 6]]) == [3, 2]
assert shape([[[1, 2], [2, 3], [4, 5]],
       [[6, 7], [8, 9], [10, 11]]]) == [2, 3, 2]
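
Note that `shape` only inspects the first element at each level, so it silently assumes the nested list is rectangular. A minimal sketch of a check for that assumption follows. `is_rectangular` is a hypothetical helper, not part of the scratch library, and `shape` is restated only so that the block runs on its own.

```python
from typing import List

def shape(tensor: list) -> List[int]:
  """Same logic as `shape` above, restated so this block is standalone."""
  sizes: List[int] = []
  while isinstance(tensor, list):
    sizes.append(len(tensor))
    tensor = tensor[0]
  return sizes

def is_rectangular(tensor: list) -> bool:
  """Hypothetical helper: True if all sub-tensors at each level share one shape."""
  if not isinstance(tensor[0], list):
    # 1d case: no element may itself be a list
    return not any(isinstance(x, list) for x in tensor)
  if not all(isinstance(t, list) for t in tensor):
    return False
  first = shape(tensor[0])
  return all(is_rectangular(t) and shape(t) == first for t in tensor)

ragged = [[1, 2], [3]]
assert shape(ragged) == [2, 2]        # shape does not notice the short second row
assert not is_rectangular(ragged)
assert is_rectangular([[1, 2], [3, 4], [5, 6]])
```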

We implement each tensor function for the 1d case first;  
the generalization to more than one dimension can then be implemented with a recursive function.


In [31]:
def is_1d(tensor: Tensor) -> bool:
  """If tensor[0] is a list, it's a higher-order tensor.
     Otherwise, tensor is 1-dimensional (that is, a vector).""" 
  return not isinstance(tensor[0], list)

assert is_1d([1, 2, 3])
assert not is_1d([[1, 2], [3, 4]])

Recursive `tensor_sum` function

In [32]:
def tensor_sum(tensor: Tensor) -> float:
  """Sums up all the values in the tensor"""
  if is_1d(tensor):
    return sum(tensor)      # just a list of floats, use Python sum
  else:
    return sum(tensor_sum(tensor_i)     # Call tensor_sum on each row
                for tensor_i in tensor) # and sum up those results.

assert tensor_sum([1, 2, 3]) == 6
assert tensor_sum([[1, 2], [3, 4]]) == 10

Recursive function to apply a function to a tensor

In [33]:
def tensor_apply(f: Callable[[float], float], tensor: Tensor) -> Tensor:
  """Applies f elementwise"""
  if is_1d(tensor):
    return [f(x) for x in tensor]
  else:
    return [tensor_apply(f, tensor_i) for tensor_i in tensor]

assert tensor_apply(lambda x: x + 1, [1, 2, 3]) == [2, 3, 4]
assert tensor_apply(lambda x: 2*x, [[1, 2], [3, 4]]) == [[2, 4], [6, 8]]

Use the above `tensor_apply` to create a zero tensor with the same shape as   
a given tensor

In [34]:
def zeros_like(tensor: Tensor) -> Tensor:
  return tensor_apply(lambda _: 0.0, tensor)

assert zeros_like([1, 2, 3]) == [0, 0, 0]
assert zeros_like([[1, 2], [3, 4]]) == [[0, 0], [0, 0]]

Element-wise operation on two tensors

In [35]:
def tensor_combine(f: Callable[[float, float], float],
                    t1: Tensor, t2: Tensor) -> Tensor:
  """Applies f to corresponding elements of t1 and t2"""
  if is_1d(t1):
    return [f(x, y) for x, y in zip(t1, t2)]
  else:
    return [tensor_combine(f, t1_i, t2_i) for t1_i, t2_i in zip(t1, t2)]

assert tensor_combine(operator.add, [1, 2, 3], [4, 5, 6]) == [5, 7, 9]
assert tensor_combine(operator.mul, [1, 2, 3], [4, 5, 6]) == [4, 10, 18]
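
The assertions above only exercise the 1d branch. As a small sketch, the following restates `is_1d` and `tensor_combine` so that it runs on its own, and checks the recursive branch on a 2d and a 3d tensor.

```python
import operator
from typing import Callable

Tensor = list

def is_1d(tensor: Tensor) -> bool:
  return not isinstance(tensor[0], list)

def tensor_combine(f: Callable[[float, float], float],
                   t1: Tensor, t2: Tensor) -> Tensor:
  if is_1d(t1):
    return [f(x, y) for x, y in zip(t1, t2)]
  else:
    # Recurse into matching sub-tensors
    return [tensor_combine(f, t1_i, t2_i) for t1_i, t2_i in zip(t1, t2)]

# 2d case: element-wise sum of two 2x2 tensors
assert tensor_combine(operator.add,
                      [[1, 2], [3, 4]],
                      [[10, 20], [30, 40]]) == [[11, 22], [33, 44]]

# 3d case: element-wise product of two 1x2x1 tensors
assert tensor_combine(operator.mul, [[[1], [2]]], [[[3], [4]]]) == [[[3], [8]]]
```

Note that `zip` stops at the shorter input, so tensors with mismatched shapes are truncated rather than reported as an error.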

## The Layer Abstraction

This `Layer` class defines an abstraction from which specific layers are derived.  
A layer is a function that performs multidimensional array operations.

In [36]:
class Layer:
  """Our neural networks will be composed of Layers, each of which knows how to do
  some computation on its inputs in the "forward" direction and propagate
  gradients in the "backward" direction"""

  def forward(self, input):
    """Note the lack of types. We're not going to be prescriptive about what kinds
    of input layers can take and what kinds of outputs they can return"""
    raise NotImplementedError

  def backward(self, gradient):
    """Similarly, we're not going to be prescriptive about what the gradient
    looks like. It's up to you the user to make sure that you're doing things
    sensibly"""
    raise NotImplementedError

  def params(self) -> Iterable[Tensor]:
    """Returns the parameters of this layer. The default implementation returns
    nothing, so that if you have a layer with no parameters you don't have to 
    implement this."""
    return ()

  def grads(self) -> Iterable[Tensor]:
    """Returns the gradients, in the same order as params()"""
    return ()

The `Layer` class above is an abstraction: a specific layer is defined
by inheriting from it. Here `Layer` is the parent
class, and every specific layer class is derived from it.

In each specific layer, we can update the parameters (returned by `params`) of our
network using their gradients (returned by `grads`). We can also retrieve from each specific layer its
parameters and gradients.

Let us define a specific layer class, `Sigmoid`.
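
To make the role of `params` and `grads` concrete, here is a minimal sketch of one gradient-descent step over a layer's parameters. The `ConstantLayer` stub, the `sgd_step` helper, and the learning rate are hypothetical and exist only for this illustration. The parameters are 1d tensors to keep the update loop short.

```python
from typing import Iterable

Tensor = list

class ConstantLayer:
  """Hypothetical stub exposing the same params()/grads() interface as Layer."""
  def __init__(self) -> None:
    self.w: Tensor = [1.0, 2.0, 3.0]
    self.w_grad: Tensor = [0.5, -1.0, 0.0]

  def params(self) -> Iterable[Tensor]:
    return [self.w]

  def grads(self) -> Iterable[Tensor]:
    return [self.w_grad]

def sgd_step(layer, learning_rate: float = 0.1) -> None:
  """Move each (1d) parameter a small step against its gradient, in place."""
  for param, grad in zip(layer.params(), layer.grads()):
    for i, g in enumerate(grad):
      param[i] -= learning_rate * g

layer = ConstantLayer()
sgd_step(layer, learning_rate=0.1)

# Each weight moved by -0.1 * its gradient
assert all(abs(a - b) < 1e-12 for a, b in zip(layer.w, [0.95, 2.1, 3.0]))
```

Because `params()` and `grads()` return tensors in the same order, the update can pair them with `zip` without knowing anything else about the layer.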

In [37]:
class Sigmoid(Layer):
  def forward(self, input: Tensor) -> Tensor:
    """Apply sigmoid to each element of the input tensor, and save the results
    to use in backpropagation."""
    # NOTE: assumes a scalar `sigmoid` function (Chapter 18) is in scope
    self.sigmoids = tensor_apply(sigmoid, input)
    return self.sigmoids

  def backward(self, gradient: Tensor) -> Tensor:
    # d(sigmoid)/dx = sig * (1 - sig), scaled by the incoming gradient
    return tensor_combine(lambda sig, grad: sig * (1 - sig) * grad,
                          self.sigmoids, gradient)

To understand the implementation of `backward` for the sigmoid function, see  
[`ch-18-neural-networks.ipynb`](./ch-18-neural-networks.ipynb) or
[`neural-nets.drawio`](./img-resources/neural-nets.drawio). In
general, `sig * (1 - sig)` is the derivative of the sigmoid with respect to its argument,
expressed in terms of the sigmoid's own output, and `grad` is
the gradient propagated back from the next layer.
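
The identity `sig * (1 - sig)` can be checked numerically. The sketch below defines its own scalar `sigmoid` and a hypothetical `numeric_derivative` helper (neither is taken from the scratch library) and compares the analytic derivative with a central finite difference.

```python
import math

def sigmoid(x: float) -> float:
  return 1 / (1 + math.exp(-x))

def numeric_derivative(f, x: float, h: float = 1e-6) -> float:
  """Central finite-difference approximation of f'(x)."""
  return (f(x + h) - f(x - h)) / (2 * h)

for x in [-3.0, -0.5, 0.0, 1.2, 4.0]:
  sig = sigmoid(x)
  analytic = sig * (1 - sig)      # derivative written in terms of the output
  assert abs(analytic - numeric_derivative(sigmoid, x)) < 1e-8
```

Writing the derivative in terms of the output is what lets `forward` cache `self.sigmoids` and reuse it in `backward` without recomputing the exponential.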

## The Linear Layer

This layer implements the linear layer from Chapter 18, a linear function
defined by
$$
  \mathbf{w} \cdot \mathbf{x} + b
$$

The bias term $b$ can be incorporated into the weight vector $\mathbf{w}$ by
appending 1 to the input vector $\mathbf{x}$ (so that $w_{n+1} = b$). Therefore we have
$$
  \begin{bmatrix}
    w_1 & w_2 & \ldots & w_n & w_{n+1}
  \end{bmatrix}
  \begin{bmatrix}
    x_1 \\ x_2 \\ \vdots \\ x_n \\ 1
  \end{bmatrix}
$$
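
As a quick numeric sketch of this equivalence, the block below compares the two forms on arbitrary example values. The local `dot` is a stand-in written for this example, not the scratch library's `la.dot`.

```python
from typing import List

def dot(v: List[float], w: List[float]) -> float:
  """Plain dot product of two equal-length vectors."""
  return sum(v_i * w_i for v_i, w_i in zip(v, w))

w = [0.5, -1.0, 2.0]
b = 0.25
x = [4.0, 3.0, 1.0]

explicit = dot(w, x) + b               # w . x + b
augmented = dot(w + [b], x + [1.0])    # [w, b] . [x, 1]

assert abs(explicit - augmented) < 1e-12
```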

There are three options for initializing the weights:
- random uniform distribution on $[0, 1)$
- standard normal distribution
- Xavier initialization, each weight is initialized with a random draw from  
  a normal distribution with mean 0 and variance `2 / (num_inputs + num_outputs)`
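
For comparison, here is a sketch of Xavier initialization done directly with NumPy instead of the pure-Python helpers in this notebook. The layer sizes and the seed are arbitrary choices for the example.

```python
import numpy as np

num_inputs, num_outputs = 300, 200
variance = 2 / (num_inputs + num_outputs)   # 0.004

rng = np.random.default_rng(0)
# Generator.normal takes the standard deviation (scale), not the variance
weights = rng.normal(loc=0.0, scale=np.sqrt(variance),
                     size=(num_outputs, num_inputs))

assert weights.shape == (num_outputs, num_inputs)
assert abs(weights.var() - variance) < 1e-3   # empirical variance is close to target
assert abs(weights.mean()) < 5e-3             # empirical mean is close to 0
```

The point to notice is the square root: a sampler parameterized by standard deviation needs `sqrt(variance)`, otherwise the weights come out with the wrong spread.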

In [48]:
def random_uniform(*dims: int, rng=np.random.default_rng()) -> Tensor:
  if len(dims) == 1:
    return [rng.random() for _ in range(dims[0])]
  else:
    return [random_uniform(*dims[1:], rng=rng) for _ in range(dims[0])]

def random_normal(*dims: int, mean: float = 0.0, variance: float = 1.0,
                  rng=np.random.default_rng()) -> Tensor:
  if len(dims) == 1:
    # Scale by the standard deviation (square root of the variance)
    return [mean + variance ** 0.5 * prob.inverse_normal_cdf(rng.random())
            for _ in range(dims[0])]
  else:
    # Pass rng down so nested draws use the same (seeded) generator
    return [random_normal(*dims[1:], mean=mean, variance=variance, rng=rng)
            for _ in range(dims[0])]

seed = 24_04_26
rng = np.random.default_rng(seed)
# random_uniform(2, 3, 4, rng=rng)
assert shape(random_uniform(2, 3, 4, rng=rng)) == [2, 3, 4]
assert shape(random_normal(5, 6, mean=10, rng=rng)) == [5, 6]

Wrap the random generator above in a `random_tensor` function

In [50]:
def random_tensor(*dims: int, init: str = 'normal', 
                  rng=np.random.default_rng()) -> Tensor:
  if init == 'normal':
    return random_normal(*dims, rng=rng)
  elif init == 'uniform':
    return random_uniform(*dims, rng=rng)
  elif init == 'xavier':
    # For an (output_dim, input_dim) weight matrix this is 2 / (num_inputs + num_outputs)
    variance = len(dims) / sum(dims)
    return random_normal(*dims, variance=variance, rng=rng)
  else:
    raise ValueError(f"unknown init: {init}")

When defining a linear layer, we need to specify the following integers:
- the dimension of the inputs (which tells us how many weights each neuron needs)
- the dimension of the outputs (which tells us how many neurons we should have)

In [54]:
class Linear(Layer):
  def __init__(self, input_dim: int, output_dim: int, init: str = 'xavier',
                rng=np.random.default_rng()) -> None:
    """A layer of output_dim neurons, each with input_dim weights (and a bias)."""
    self.input_dim = input_dim
    self.output_dim = output_dim
    
    # self.w[o] is the weights for the o-th neuron
    self.w = random_tensor(output_dim, input_dim, init=init, rng=rng)
    # self.b[o] is the bias term for the o-th neuron
    self.b = random_tensor(output_dim, init=init, rng=rng)

  def forward(self, input_: Tensor) -> Tensor:
    # Save the input to use in the backward pass.
    self.input = input_

    # Return the vector of neuron outputs.
    return [la.dot(input_, self.w[o]) + self.b[o] 
            for o in range(self.output_dim)]
  
  def backward(self, gradient: Tensor) -> Tensor:
    # Each b[o] gets added to output[o], which means the gradient of b is the
    # same as the output gradient
    self.b_grad = gradient

    # Each w[o][i] multiplies input[i] and gets added to output[o]. So its
    # gradient is input[i] * gradient[o].
    self.w_grad = [[self.input[i] * gradient[o]
                    for i in range(self.input_dim)]
                      for o in range(self.output_dim)]

    # Each input[i] multiplies every w[o][i] and gets added to every output[o]. 
    # So its gradient is the sum of w[o][i] * gradient[o] across all the outputs.
    return [sum(self.w[o][i] * gradient[o] for o in range(self.output_dim))
            for i in range(self.input_dim)]

  def params(self) -> Iterable[Tensor]:
    return [self.w, self.b]

  def grads(self) -> Iterable[Tensor]:
    return [self.w_grad, self.b_grad]
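
The three gradient rules in `backward` can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the notebook's implementation: it restates the forward pass in NumPy, uses the sum of squared outputs as a stand-in loss, and compares the analytic gradients with finite differences. The helper `numeric_grad` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, output_dim = 3, 2
w = rng.normal(size=(output_dim, input_dim))
b = rng.normal(size=output_dim)
x = rng.normal(size=input_dim)

def forward(w, b, x):
  return w @ x + b

def loss(w, b, x):
  # Stand-in loss (an assumption for this check): sum of squared outputs
  return float(np.sum(forward(w, b, x) ** 2))

# Gradient of the stand-in loss with respect to the layer output
gradient = 2 * forward(w, b, x)

# The same rules as Linear.backward, in matrix form
b_grad = gradient                  # b[o] is added to output[o]
w_grad = np.outer(gradient, x)     # w_grad[o][i] = input[i] * gradient[o]
x_grad = w.T @ gradient            # sum over o of w[o][i] * gradient[o]

def numeric_grad(f, arr, h=1e-6):
  """Central finite differences of scalar f() with respect to each entry of arr."""
  grad = np.zeros_like(arr)
  for idx in np.ndindex(arr.shape):
    old = arr[idx]
    arr[idx] = old + h
    plus = f()
    arr[idx] = old - h
    minus = f()
    arr[idx] = old               # restore the original value
    grad[idx] = (plus - minus) / (2 * h)
  return grad

f = lambda: loss(w, b, x)
assert np.allclose(w_grad, numeric_grad(f, w), atol=1e-5)
assert np.allclose(b_grad, numeric_grad(f, b), atol=1e-5)
assert np.allclose(x_grad, numeric_grad(f, x), atol=1e-5)
```

A check like this is a cheap way to catch index mix-ups between `o` and `i` in a hand-written backward pass.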

## Neural Networks as a Sequence of Layers

## Loss and Optimization

## Example: XOR Revisited

## Other Activation Functions

## Example: FizzBuzz Revisited

## Softmaxes and Cross-Entropy

## Dropout

## Example: MNIST

## Saving and Loading Models