# Chapter 1: Foundations

"The aim of this chapter is to explain some foundational mental models that are essential for understanding how neural networks work. Specifically, we'll cover *nested mathematical functions and their derivatives*."

Foundational concepts are introduced from three perspectives:
1. Math, in the form of equations
2. Code, with as little extra syntax as possible
3. A diagram explaining what is going on

"one of the challenges of understanding neural networks is that it requires multiple mental models"

## Dependencies

In [11]:
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
from numpy import ndarray
from typing import Callable

In [None]:


print("Python list operations")
a = [1,2,3]
b = [4,5,6]
print("a+b", a+b)

In [None]:
try:
    print(a*b)
except TypeError:
    print("a*b has no meaning for Python lists")

In [None]:
print("numpy array operations")
a = np.array([1,2,3])
b = np.array([4,5,6])
print("a + b =", a+b)
print("a * b =", a*b)

In [None]:
a = np.array([[1,2,3],
              [4,5,6]]) 
print(a)


Each dimension of the array has an associated axis, which makes it possible to compute along a chosen dimension. For a 2D array, `axis=0` indexes rows and `axis=1` indexes columns. So `a.sum(axis=0)` collapses the rows and returns one sum per column, and `a.sum(axis=1)` collapses the columns and returns one sum per row.

In [None]:
print('a:')
print(a)
print('a.sum(axis = 0):', a.sum(axis = 0))
print('a.sum(axis = 1):', a.sum(axis = 1))

In [None]:
b = np.array([10, 20, 30])
print("a + b:\n", a + b)
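The `a + b` cell above adds a `(2, 3)` array to a `(3,)` array. NumPy broadcasting treats `b` as if it were repeated along axis 0 to match the shape of `a`. The sketch below writes that implicit repetition out explicitly with `np.tile` and checks that both forms agree. The names `a_demo` and `b_demo` are illustrative only.

```python
import numpy as np

a_demo = np.array([[1, 2, 3],
                   [4, 5, 6]])
b_demo = np.array([10, 20, 30])

# Broadcasting: b_demo (shape (3,)) is stretched along axis 0 to shape (2, 3)
broadcast_sum = a_demo + b_demo

# The same computation with the repetition written out explicitly
b_tiled = np.tile(b_demo, (2, 1))  # shape (2, 3): [[10, 20, 30], [10, 20, 30]]
explicit_sum = a_demo + b_tiled

print(broadcast_sum)
# [[11 22 33]
#  [14 25 36]]
print(np.array_equal(broadcast_sum, explicit_sum))  # True
```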

Some basic functions implemented with `numpy`:

In [None]:
def square(x: ndarray) -> ndarray:
    '''
    Square each element in the input ndarray.
    '''
    return np.power(x, 2)


def leaky_relu(x: ndarray) -> ndarray:
    '''
    Apply "Leaky ReLU" function to each element in ndarray.
    '''
    return np.maximum(0.2 * x, x)


In [None]:
square(np.array([1,2,3,4,5,6]))

In [None]:
leaky_relu(np.array([1,2,-3,4,-5,6]))

## Derivatives

In [None]:
def derivative(func: Callable[[ndarray], ndarray],
               input_: ndarray,
               delta: float = 0.001) -> ndarray:
    '''
    Evaluates the derivative of a function "func" at every element in
    the "input_" array, using a central difference with step size "delta".
    '''
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

In [None]:
derivative(square, np.array([1,2,4,8,20]) )
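`derivative` uses a central difference, $\frac{f(x+\Delta) - f(x-\Delta)}{2\Delta}$. For `square`, the exact derivative is $2x$, so the numerical result can be checked directly. The sketch below redefines `square` and `derivative` so that it runs on its own. For a quadratic, the central difference is exact apart from floating-point rounding, which is why a tight tolerance works here.

```python
import numpy as np
from typing import Callable

def square(x: np.ndarray) -> np.ndarray:
    return np.power(x, 2)

def derivative(func: Callable[[np.ndarray], np.ndarray],
               input_: np.ndarray,
               delta: float = 0.001) -> np.ndarray:
    # Central difference: slope of the secant through x - delta and x + delta
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

x = np.array([1, 2, 4, 8, 20])
numeric = derivative(square, x)
analytic = 2 * x  # d/dx x^2 = 2x

print(numeric)
print(np.allclose(numeric, analytic, atol=1e-6))  # True
```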

# Nested functions

The idea of nesting functions such that the output of one becomes the input for another is crucial for understanding neural networks.

"computing derivatives of composite functions will turn out to be essential for training deep learning models" (11)
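Written out for two and three nested functions, the chain rule (the same formulas that appear in the `chain_deriv_2` and `chain_deriv_3` docstrings in this chapter) reads:

$$
\frac{d}{dx} f_2(f_1(x)) = f_2'(f_1(x)) \cdot f_1'(x)
$$

$$
\frac{d}{dx} f_3(f_2(f_1(x))) = f_3'(f_2(f_1(x))) \cdot f_2'(f_1(x)) \cdot f_1'(x)
$$

For example, with $f_1(x) = x^2$ and $f_2 = \sigma$ (the sigmoid), $\frac{d}{dx}\sigma(x^2) = \sigma'(x^2) \cdot 2x$.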

## Sigmoid

In [None]:
def sigmoid(x: ndarray) -> ndarray:
    '''
    Apply the sigmoid function to each element in the input ndarray.
    '''
    return 1 / (1 + np.exp(-x))
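The sigmoid has a convenient closed-form derivative, $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$. The sketch below compares that formula with a central-difference estimate like the one `derivative` computes. `sigmoid` is redefined, and `sigmoid_deriv` is a hypothetical helper name, so that the block runs on its own.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x: np.ndarray) -> np.ndarray:
    # Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

x = np.linspace(-5, 5, 11)
delta = 0.001
# Central-difference estimate, as in derivative()
numeric = (sigmoid(x + delta) - sigmoid(x - delta)) / (2 * delta)

print(np.allclose(numeric, sigmoid_deriv(x), atol=1e-6))  # True
print(sigmoid_deriv(np.array([0.0])))  # [0.25]
```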

## Chain rule

In [None]:
from typing import List

# A Function takes in an ndarray as an argument and produces an ndarray
Array_Function = Callable[[ndarray], ndarray]

# A Chain is a list of functions
Chain = List[Array_Function]


def chain_length_2(chain: Chain,
                   x: ndarray) -> ndarray:
    '''
    Evaluates two functions in a row, in a "Chain".
    '''
    assert len(chain) == 2, \
    "Length of input 'chain' should be 2"

    f1 = chain[0]
    f2 = chain[1]

    return f2(f1(x))


def chain_deriv_2(chain: Chain,
                  input_range: ndarray) -> ndarray:
    '''
    Uses the chain rule to compute the derivative of two nested functions:
    (f2(f1(x)))' = f2'(f1(x)) * f1'(x)
    '''

    assert len(chain) == 2, \
    "This function requires 'Chain' objects of length 2"

    assert input_range.ndim == 1, \
    "Function requires a 1 dimensional ndarray as input_range"

    f1 = chain[0]
    f2 = chain[1]

    # f1(x)
    f1_of_x = f1(input_range)

    # df1/dx
    df1dx = derivative(f1, input_range)

    # df2/du, evaluated at u = f1(x)
    df2du = derivative(f2, f1_of_x)

    # Multiplying these quantities together at each point
    return df1dx * df2du


def plot_chain(ax,
               chain: Chain, 
               input_range: ndarray) -> None:
    '''
    Plots a chain function - a function made up of 
    multiple consecutive ndarray -> ndarray mappings - 
    across the input_range.
    
    ax: matplotlib Subplot for plotting
    '''
    
    assert input_range.ndim == 1, \
    "Function requires a 1 dimensional ndarray as input_range"

    output_range = chain_length_2(chain, input_range)
    ax.plot(input_range, output_range)
    
    
def plot_chain_deriv(ax,
                     chain: Chain,
                     input_range: ndarray) -> None:
    '''
    Uses the chain rule to plot the derivative of a function consisting of two nested functions.
    
    ax: matplotlib Subplot for plotting
    '''
    output_range = chain_deriv_2(chain, input_range)
    ax.plot(input_range, output_range)
    
    
fig, ax = plt.subplots(1, 2, sharey=True, figsize=(16, 8))  # 1 row, 2 columns

chain_1 = [square, sigmoid]
chain_2 = [sigmoid, square]

PLOT_RANGE = np.arange(-3, 3, 0.01)
plot_chain(ax[0], chain_1, PLOT_RANGE)
plot_chain_deriv(ax[0], chain_1, PLOT_RANGE)

ax[0].legend(["$f(x)$", "$\\frac{df}{dx}$"])
ax[0].set_title("Function and derivative for\n$f(x) = sigmoid(square(x))$")

plot_chain(ax[1], chain_2, PLOT_RANGE)
plot_chain_deriv(ax[1], chain_2, PLOT_RANGE)
ax[1].legend(["$f(x)$", "$\\frac{df}{dx}$"])
ax[1].set_title("Function and derivative for\n$f(x) = square(sigmoid(x))$");
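One way to sanity-check the chain-rule computation for `chain_1 = [square, sigmoid]` is to compare it with the closed-form derivative $\frac{d}{dx}\sigma(x^2) = \sigma(x^2)(1 - \sigma(x^2)) \cdot 2x$. The sketch below repeats only the minimal pieces it needs, so it runs independently of the cells above.

```python
import numpy as np

def square(x: np.ndarray) -> np.ndarray:
    return np.power(x, 2)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def derivative(func, input_: np.ndarray, delta: float = 0.001) -> np.ndarray:
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

x = np.arange(-3, 3, 0.01)

# Chain rule with numerical pieces, as in chain_deriv_2: f2'(f1(x)) * f1'(x)
chain_rule = derivative(sigmoid, square(x)) * derivative(square, x)

# Closed form: d/dx sigmoid(x^2) = sigmoid(x^2) * (1 - sigmoid(x^2)) * 2x
s = sigmoid(square(x))
analytic = s * (1 - s) * 2 * x

print(np.max(np.abs(chain_rule - analytic)))
print(np.allclose(chain_rule, analytic, atol=1e-6))  # True
```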

"It will turn out that deep learning models are, mathematically, long chains of these mostly differentiable functions" (14)

## Longer example
"if we have three mostly differentiable functions, how would we go about computing the derivative of $f_1 f_2 f_3$?" (14)
"Interestingly, already in this simple example we see the beginnings of what will become the forward and backward passes of a neural network"
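The three-function case that follows generalizes to a chain of any length. A forward pass stores the input each function receives, and a backward pass multiplies the local derivatives together, evaluated at those stored inputs. `chain_deriv_n` is a hypothetical helper, not part of the original code. It is a minimal sketch that assumes every function in the chain is elementwise, like `square`, `sigmoid`, and `leaky_relu`. The test points avoid `x = 0`, where `leaky_relu` has a kink.

```python
import numpy as np
from typing import Callable, List

Array_Function = Callable[[np.ndarray], np.ndarray]

def derivative(func: Array_Function, input_: np.ndarray,
               delta: float = 0.001) -> np.ndarray:
    return (func(input_ + delta) - func(input_ - delta)) / (2 * delta)

def chain_deriv_n(chain: List[Array_Function], x: np.ndarray) -> np.ndarray:
    '''
    Hypothetical helper: derivative of chain[-1](...chain[1](chain[0](x)))
    for elementwise functions, via one forward and one backward pass.
    '''
    # Forward pass: record the input that each function receives
    inputs = []
    current = x
    for f in chain:
        inputs.append(current)
        current = f(current)

    # Backward pass: multiply local derivatives, last function first
    grad = np.ones_like(x, dtype=float)
    for f, u in zip(reversed(chain), reversed(inputs)):
        grad = grad * derivative(f, u)
    return grad

def square(x: np.ndarray) -> np.ndarray:
    return np.power(x, 2)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def leaky_relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(0.2 * x, x)

def composed(v: np.ndarray) -> np.ndarray:
    return sigmoid(square(leaky_relu(v)))

x = np.linspace(-3, 3, 10)  # does not include 0
chain = [leaky_relu, square, sigmoid]

# Compare with differentiating the whole composition numerically in one step
print(np.allclose(chain_deriv_n(chain, x), derivative(composed, x), atol=1e-5))  # True
```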

In [None]:
def chain_deriv_3(chain: Chain,
                  input_range: ndarray) -> ndarray:
    '''
    Uses the chain rule to compute the derivative of three nested functions:
    (f3(f2(f1(x))))' = f3'(f2(f1(x))) * f2'(f1(x)) * f1'(x)
    '''
    
    assert len(chain) == 3, \
    "This function requires 'Chain' objects to have length 3"
    
    f1 = chain[0]
    f2 = chain[1]
    f3 = chain[2]
    
    # f1(x)
    f1_of_x = f1(input_range)
    
    # f2(f1(x))
    f2_of_x = f2(f1_of_x)
    
    # df3/du, evaluated at u = f2(f1(x))
    df3du = derivative(f3, f2_of_x)
    
    # df2/du, evaluated at u = f1(x)
    df2du = derivative(f2, f1_of_x)
    
    # df1/dx
    df1dx = derivative(f1, input_range)
    
    # Multiply these together at each point
    return df1dx * df2du * df3du


def plot_chain(ax,
               chain: Chain, 
               input_range: ndarray,
               length: int=2) -> None:
    '''
    Plots a chain function - a function made up of 
    multiple consecutive ndarray -> ndarray mappings - across one range
    
    ax: matplotlib Subplot for plotting
    '''
    
    assert input_range.ndim == 1, \
    "Function requires a 1 dimensional ndarray as input_range"
    if length == 2:
        output_range = chain_length_2(chain, input_range)
    elif length == 3:
        output_range = chain_length_3(chain, input_range)
    else:
        raise ValueError("length must be 2 or 3")
    ax.plot(input_range, output_range)

    
def plot_chain_deriv(ax,
                     chain: Chain,
                     input_range: ndarray,
                     length: int=2) -> None:
    '''
    Uses the chain rule to plot the derivative of two or three nested functions.
    
    ax: matplotlib Subplot for plotting
    '''

    if length == 2:
        output_range = chain_deriv_2(chain, input_range)
    elif length == 3:
        output_range = chain_deriv_3(chain, input_range)
    else:
        raise ValueError("length must be 2 or 3")
    ax.plot(input_range, output_range)
    
    
def chain_length_3(chain: Chain,
                   x: ndarray) -> ndarray:
    '''
    Evaluates three functions in a row, in a "Chain".
    '''
    assert len(chain) == 3, \
    "Length of input 'chain' should be 3"

    f1 = chain[0]
    f2 = chain[1]
    f3 = chain[2]

    return f3(f2(f1(x)))


fig, ax = plt.subplots(1, 2, sharey=True, figsize=(16, 8))  # 1 row, 2 columns

chain_1 = [leaky_relu, square, sigmoid]
chain_2 = [leaky_relu, sigmoid, square]

PLOT_RANGE = np.arange(-3, 3, 0.01)
plot_chain(ax[0], chain_1, PLOT_RANGE, length=3)
plot_chain_deriv(ax[0], chain_1, PLOT_RANGE, length=3)

ax[0].legend(["$f(x)$", "$\\frac{df}{dx}$"])
ax[0].set_title("Function and derivative for\n$f(x) = sigmoid(square(leakyRelu(x)))$")

plot_chain(ax[1], chain_2, PLOT_RANGE, length=3)
plot_chain_deriv(ax[1], chain_2, PLOT_RANGE, length=3)
ax[1].legend(["$f(x)$", "$\\frac{df}{dx}$"])
ax[1].set_title("Function and derivative for\n$f(x) = square(sigmoid(leakyRelu(x)))$");

In [None]:
def binary_step_function(x: ndarray,
                         threshold: float) -> ndarray:
    '''
    Apply binary step function to each element in ndarray using
    float threshold as the basis for activation.
    '''
    # 1 where the element reaches the threshold, 0 where it falls below it
    return np.where(x < threshold, 0, 1)


binary_step_function(np.array([4, 0, 1, 2, -5]), 3)

In [None]:


def linear_activation(x: ndarray, slope: float) -> ndarray:
    '''
    Apply linear function to each element in ndarray using
    float slope as the basis for activation.
    '''
    return x * slope



In [1]:
from src.math import tanh_function, softmax_function
import numpy as np

INPUT = np.array([0.8, 1.2, 3.1])
softmax_function(INPUT)

array([0.08021815, 0.11967141, 0.80011044])
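The implementation of `softmax_function` in `src.math` is not shown in this notebook. A common way to write softmax, assumed here rather than taken from that module, exponentiates each input and divides by the sum of the exponentials. Subtracting the maximum first leaves the result unchanged but keeps `np.exp` from overflowing on large inputs. The hypothetical `softmax` below reproduces the output shown above for `INPUT`.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    '''
    Hypothetical softmax sketch: exp(x_i) / sum_j exp(x_j).
    Subtracting max(x) does not change the result but avoids overflow in exp().
    '''
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

probs = softmax(np.array([0.8, 1.2, 3.1]))
print(probs)        # approximately [0.0802 0.1197 0.8001]
print(probs.sum())  # 1.0, up to floating-point rounding
```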

In [2]:
assert 1==1

In [4]:
import numpy as np

np.dot([3,4], [3,4])

25
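For two 1D arrays, `np.dot` is the sum of the elementwise products, so `np.dot([3, 4], [3, 4])` is $3 \cdot 3 + 4 \cdot 4 = 25$. The sketch below checks that equivalence and shows that, for 2D arrays, `np.dot` performs matrix multiplication, matching the `@` operator. The arrays `v`, `A`, and `B` are illustrative values.

```python
import numpy as np

v = np.array([3, 4])
# Dot product as the sum of elementwise products: 3*3 + 4*4
print(np.sum(v * v), np.dot(v, v))  # 25 25

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))
# [[19 22]
#  [43 50]]
print(np.array_equal(np.dot(A, B), A @ B))  # True
```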

In [8]:
np.transpose([[3,4],[5,6]])

array([[3, 5],
       [4, 6]])

In [9]:
np.ones(6)

array([1., 1., 1., 1., 1., 1.])

In [12]:
def matmul_backward_first(X: ndarray,
                          W: ndarray) -> ndarray:
    '''
    Computes the backward pass of a matrix multiplication with respect to the first argument.
    '''

    # backward pass: N = XW is linear in X, so dN/dX is W transposed
    dNdX = np.transpose(W, (1, 0))

    return dNdX

In [13]:
np.random.seed(190203)

X = np.random.randn(1,3)
W = np.random.randn(3,1)

print(X)
print(W)
matmul_backward_first(X, W)

[[ 0.47231121  0.61514271 -1.72622715]]
[[ 0.92819676]
 [-0.60754888]
 [-1.22136052]]


array([[ 0.92819676, -0.60754888, -1.22136052]])
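With $X$ of shape $(1, 3)$ and $W$ of shape $(3, 1)$, $N = XW = x_1 w_1 + x_2 w_2 + x_3 w_3$. The partial derivative of $N$ with respect to $x_i$ is therefore $w_i$, which is why the result above is $W^T$. The sketch below checks this numerically. It nudges each element of `X` up and down by a small `delta` (an illustrative value) and measures the change in $N$, reusing the same seed as the cell above.

```python
import numpy as np

np.random.seed(190203)
X = np.random.randn(1, 3)
W = np.random.randn(3, 1)

delta = 1e-6
numeric = np.zeros_like(X)
for j in range(X.shape[1]):
    # Nudge one element of X up and down, holding the others fixed
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[0, j] += delta
    X_minus[0, j] -= delta
    numeric[0, j] = (np.dot(X_plus, W) - np.dot(X_minus, W)).item() / (2 * delta)

print(numeric)                               # close to W transposed
print(np.allclose(numeric, W.T, atol=1e-6))  # True
```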

## Partial derivatives

In mathematics, a [partial derivative](https://en.wikipedia.org/wiki/Partial_derivative) of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Partial derivatives are used in vector calculus and differential geometry.

"The term 'gradient' as we'll use it in this book simply refers to a multidimensional analogue of the partial derivative; specifically, it is an array of partial derivatives of the output of a function with respect to each element of the input to that function" (25)
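Following that definition, a gradient can be estimated one partial derivative at a time. Nudge a single element of the input, hold every other element fixed, and measure the change in a scalar output. `numerical_gradient` is a hypothetical helper sketching that idea with central differences and an illustrative `delta`. For $f(X) = \sum_{ij} X_{ij}^2$ the exact gradient is $2X$, which makes the result easy to check.

```python
import numpy as np
from typing import Callable

def numerical_gradient(f: Callable[[np.ndarray], float],
                       X: np.ndarray,
                       delta: float = 1e-5) -> np.ndarray:
    '''
    Hypothetical helper: array of partial derivatives of scalar f
    with respect to each element of X.
    '''
    grad = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(X.shape):
        # Perturb only X[idx]; every other element stays constant
        X_plus = X.astype(float)   # astype returns a copy
        X_minus = X.astype(float)
        X_plus[idx] += delta
        X_minus[idx] -= delta
        grad[idx] = (f(X_plus) - f(X_minus)) / (2 * delta)
    return grad

def sum_of_squares(X: np.ndarray) -> float:
    return float(np.sum(X ** 2))

X = np.array([[1.0, -2.0, 3.0],
              [0.5, 4.0, -1.5]])
grad = numerical_gradient(sum_of_squares, X)

print(grad.shape)                           # (2, 3), same shape as X
print(np.allclose(grad, 2 * X, atol=1e-6))  # True
```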

In [14]:
def matmul_forward(X: ndarray,
                   W: ndarray) -> ndarray:
    '''
    Computes the forward pass of a matrix multiplication
    '''
    
    assert X.shape[1] == W.shape[0], \
    (f"For matrix multiplication, the number of columns in the first array should match "
     f"the number of rows in the second; instead, the number of columns in the first array "
     f"is {X.shape[1]} and the number of rows in the second array is {W.shape[0]}")

    # matrix multiplication
    N = np.dot(X, W)

    return N




In [15]:
np.random.seed(190203)

X = np.random.randn(1,3)
W = np.random.randn(3,1)

print(X)
print(W)

[[ 0.47231121  0.61514271 -1.72622715]]
[[ 0.92819676]
 [-0.60754888]
 [-1.22136052]]


In [17]:
from src.math import sigmoid

sigmoid(matmul_forward(X, W))

array([[0.89779986]])
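Combining the matrix-multiplication derivative with the chain rule gives the derivative of $S = \sigma(XW)$ with respect to $X$: $\frac{\partial S}{\partial X} = \sigma'(XW)\, W^T$, where $\sigma'(u) = \sigma(u)(1-\sigma(u))$. The sketch below computes that product and checks it against finite differences. The function name `sigmoid_matmul_backward` is hypothetical, and `sigmoid` is defined locally instead of being imported from `src.math`.

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))

def sigmoid_matmul_backward(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    '''
    Hypothetical sketch: dS/dX for S = sigmoid(X @ W),
    with X of shape (1, n) and W of shape (n, 1).
    '''
    N = np.dot(X, W)                       # forward: shape (1, 1)
    dSdN = sigmoid(N) * (1 - sigmoid(N))   # local derivative of sigmoid
    dNdX = np.transpose(W, (1, 0))         # local derivative of the matmul
    return dSdN * dNdX                     # chain rule, broadcasts to (1, n)

np.random.seed(190203)
X = np.random.randn(1, 3)
W = np.random.randn(3, 1)

analytic = sigmoid_matmul_backward(X, W)

# Finite-difference check, one element of X at a time
delta = 1e-6
numeric = np.zeros_like(X)
for j in range(X.shape[1]):
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[0, j] += delta
    X_minus[0, j] -= delta
    diff = sigmoid(np.dot(X_plus, W)) - sigmoid(np.dot(X_minus, W))
    numeric[0, j] = diff.item() / (2 * delta)

print(analytic)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```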

In [25]:
np.sum([[3,4], [5,6]])

18

In [31]:
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5,6]}
df = pd.DataFrame(data=d)
df.head()

   col1  col2  col3
0     1     3     5
1     2     4     6


In [33]:
thing = df[['col1', 'col2']].copy()
thing.head()

   col1  col2
0     1     3
1     2     4
