<a href="https://colab.research.google.com/github/dlsun/Data402-F21/blob/main/Implementing_Neural_Networks_from_Scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Loss Function

First, we implement the loss function $L$. 

Implement squared error loss below. Remember that 
$$L(y, \hat y) = (y - \hat y)^2 $$
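
As a quick sanity check of the formula, the sketch below evaluates the squared error and its derivative with respect to the prediction, $\frac{dL}{d\hat y} = -2(y - \hat y)$, and compares that derivative with a finite-difference estimate. The function names `squared_error` and `squared_error_gradient` are illustrative only and are not part of the class skeleton.

```python
import numpy as np

def squared_error(actual, predicted):
    """L(y, y_hat) = (y - y_hat)^2, evaluated elementwise."""
    return (actual - predicted) ** 2

def squared_error_gradient(actual, predicted):
    """dL/d(y_hat) = -2 (y - y_hat), evaluated elementwise."""
    return -2 * (actual - predicted)

y = np.array([1.0])
y_hat = np.array([0.4])

# Central finite difference in the prediction, for comparison.
eps = 1e-6
numeric = (squared_error(y, y_hat + eps) - squared_error(y, y_hat - eps)) / (2 * eps)
analytic = squared_error_gradient(y, y_hat)
```

The two estimates should agree to several decimal places. The same finite-difference trick is a handy way to check any gradient you derive by hand.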

In [None]:
class LossFunction:

  def __call__(self, actual, predicted):
    """Evaluates L(actual, predicted)."""
    return 0

  def gradient(self, actual, predicted):
    """Evaluates dL/d(predicted)."""
    return 0


class SquaredErrorLoss(LossFunction):

  def __call__(self, actual, predicted):
    # TODO: Evaluate squared error loss
    return

  def gradient(self, actual, predicted):
    # TODO: Evaluate the derivative of squared error loss.
    return

# Activation Function

Now, we implement the activation function $g$. We need a way to evaluate the function $g(z)$, as well as a way to evaluate its derivative $g'(z)$.

Remember that, writing $h = g(z)$, we can express the derivative $g'(z) = \frac{dh}{dz}$ in two ways: in terms of the input $z$ or in terms of the output $h$. For example, if the activation function is $g(z) = e^z$, then the derivative $\frac{dh}{dz}$ can be expressed as $e^z$ or as $h$.
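
To see the two forms side by side on a less trivial function, here is a small sketch using $\tanh$, which is not one of the activations in this notebook and is chosen purely for illustration. Its derivative is $1 - \tanh^2(z)$ in terms of the input and $1 - h^2$ in terms of the output.

```python
import numpy as np

z = np.linspace(-2, 2, 9)   # some input values
h = np.tanh(z)              # the corresponding outputs h = g(z)

# The same derivative dh/dz, written two ways.
grad_from_input = 1 - np.tanh(z) ** 2   # uses only z
grad_from_output = 1 - h ** 2           # uses only h

# Central finite difference, as an independent check.
eps = 1e-6
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)
```

The output form is convenient during backpropagation, because $h$ has already been computed and saved in the forward pass.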

I have implemented ReLU for you, along with the sigmoid function itself. Your job is to implement the two derivative methods of the sigmoid activation function.

In [None]:
class ActivationFunction:

  def __call__(self, input):
    """Evaluates g(z)."""
    output = input
    return output

  def gradient_in_terms_of_input(self, input):
    """Evaluates dh/dz in terms of z."""
    return 1

  def gradient_in_terms_of_output(self, output):
    """Evaluates dh/dz in terms of h."""
    return 1


class ReLU(ActivationFunction):

  def __call__(self, input):
    """Evaluates g(z) = max(z, 0)."""
    output = np.maximum(input, 0)
    return output

  def gradient_in_terms_of_input(self, input):
    """Evaluates g'(z) = dh/dz = (1 if z > 0 else 0)."""
    return 1 * (input > 0)

  def gradient_in_terms_of_output(self, output):
    """Evaluates dh/dz = (1 if h > 0, 0 if h = 0)."""
    return 1 * (output > 0)


class Sigmoid(ActivationFunction):

  def __call__(self, input):
    """Evaluates g(z) = e^z / (1 + e^z)."""
    # Equivalent to e^z / (1 + e^z), but avoids inf / inf = nan for large z.
    output = 1 / (1 + np.exp(-input))
    return output

  def gradient_in_terms_of_input(self, input):
    """Evaluates g'(z) = dh/dz."""
    # TODO: Implement dh/dz in terms of z.
    return

  def gradient_in_terms_of_output(self, output):
    """Evaluates dh/dz in terms of h."""
    # TODO: Implement dh/dz in terms of h.
    return

# Layer Connection

Next, we implement a layer of a neural network or, more precisely, the connection between one layer and the next. The hyperparameters are:
- `n_input`: the number of nodes in the preceding layer.  
- `n_output`: the number of nodes in the next layer.
- `activation_function`: the activation function to use.

There are two methods you have to implement: 
- `forward`: Given the values of the input layer ${\bf h}^{(k)}$, calculate the values of the output layer ${\bf h}^{(k+1)}$.
- `backward`: Given the backpropagated error gradient $\frac{dL}{d{\bf h}^{(k+1)}}$, update the weights in this layer and return $\frac{dL}{d{\bf h}^{(k)}}$. This in turn will get passed to the `backward()` method of the previous layer.
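
Before filling in the class, it may help to see the chain rule for a single connection written out on plain arrays. The sketch below is one possible way to organize the computation, not the required solution. It assumes the conventions of the docstring below (an `n_input x n_output` weight matrix and a scalar bias), uses $\tanh$ as a stand-in activation only because it is smooth, and uses squared error summed over the outputs as the loss. All names here (`W`, `b`, `h_in`, `forward`, `loss`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_output = 3, 2
W = rng.standard_normal((n_input, n_output))  # same shape convention as `weights`
b = 0.5                                       # scalar bias shared by all outputs
h_in = rng.standard_normal(n_input)           # values of the input layer
y = rng.standard_normal(n_output)             # made-up targets for the check

def forward(W, b, h_in):
    z = h_in @ W + b        # linear combination, shape (n_output,)
    return np.tanh(z)       # stand-in activation g(z)

def loss(W, b, h_in):
    return np.sum((y - forward(W, b, h_in)) ** 2)

# Backward pass, using only matrix operations.
h_out = forward(W, b, h_in)
dL_dh_out = -2 * (y - h_out)            # gradient arriving from the loss
dL_dz = dL_dh_out * (1 - h_out ** 2)    # multiply by dh/dz, written in terms of h
dL_dh_in = W @ dL_dz                    # gradient passed back to the previous layer
dL_dW = np.outer(h_in, dL_dz)           # same shape as W
dL_db = np.sum(dL_dz)                   # scalar bias feeds every output node

# Finite-difference check on one weight and on the bias.
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 0] += eps
W_minus[1, 0] -= eps
numeric_dW = (loss(W_plus, b, h_in) - loss(W_minus, b, h_in)) / (2 * eps)
numeric_db = (loss(W, b + eps, h_in) - loss(W, b - eps, h_in)) / (2 * eps)
```

A gradient descent step would then subtract `learning_rate * dL_dW` from the weights and `learning_rate * dL_db` from the bias. Note that `dL_dh_in` is computed from the weights before they are updated.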

In [None]:
class LayerConnection:
  """Represents the connection between one layer and the next.

  Attributes:
    weights: An n_input x n_output matrix representing the edge weights W.
    bias: A scalar number representing the bias (a.k.a. intercept).
    n_input: Number of nodes in the preceding layer.
    n_output: Number of nodes in the next layer.
    activation_function: ActivationFunction to apply to the linear combination.
    current_input: Saves the most recent input layer values.
    current_output: Saves the most recent output layer values.
  """

  def __init__(self, n_input, n_output, activation_function=ActivationFunction()):
    # set attributes
    self.n_input = n_input
    self.n_output = n_output
    self.activation_function = activation_function
    # initialize the weights and bias to random numbers
    self.weights = np.random.randn(n_input, n_output)
    self.bias = np.random.randn()
  
  def forward(self, input):
    """Given values of the input layer, calculates values of the output layer.

    Args:
      input: An array of n_input values.

    Returns:
      An array of n_output values.
    """
    # Check the dimensions of the input.
    if len(input) != self.n_input:
      raise ValueError(f"Input to layer must contain {self.n_input} values.")
    # Save the values of the input.
    self.current_input = input

    # TODO: Calculate the values of the output.
    self.current_output = np.zeros(self.n_output)

    # Check the dimensions of the output.
    if len(self.current_output) != self.n_output:
      raise ValueError(f"Layer must return {self.n_output} values.")
    return self.current_output

  def backward(self, backpropagated_error, learning_rate):
    """Updates weights/bias in current layer and returns backpropagated error.

    When taking derivatives, you will need the most recent values of the
    input layer. I recommend that you use the values of the input that you
    saved in `self.current_input`.

    You will also need the derivative of the activation function in terms of
    the output. I recommend that you use the values of the output that you 
    saved in `self.current_output`.

    You should be able to do this using only matrix operations. You should
    not need any for loops.

    Args:
      backpropagated_error: The gradient dL/d(output).
      learning_rate: The learning rate to use in updating the weights.

    Returns:
      The gradient dL/d(input).
    """
    # TODO: calculate the gradient with respect to z (instead of h)
    dL_dz = np.zeros_like(self.current_output)

    # TODO: backpropagate the error to the input layer
    new_backpropagated_error = np.zeros_like(self.current_input)

    # TODO: calculate the gradient of the weights and update the weights
    self.weights -= np.zeros_like(self.weights)

    # TODO: calculate the gradient of the bias and update the bias
    self.bias -= np.zeros_like(self.bias)

    # return the backpropagated error with respect to the input layer
    return new_backpropagated_error

# Neural Network

A neural network consists of many layer connections. The `NeuralNetwork` class below is fully implemented for you. Read it, and make sure you understand what it is doing.

In [None]:
class NeuralNetwork:
  """A fully-connected neural network.

  Attributes:
    layers: A list of LayerConnections.
    loss_function: A LossFunction to minimize when training the network.
  """

  def __init__(self, layers, loss_function):
    self.layers = layers
    self.loss_function = loss_function

  def predict(self, input):
    """Predict the output for the input.
    
    Args:
      input: An array of values for the input layer.

    Returns:
      An array of values for the output layer.
    """
    # iterate over the layers in order
    for layer in self.layers:
      # calculate the output
      output = layer.forward(input)
      # this output is the input to the next layer
      input = output
    return output

  def train(self, input, actual_label, learning_rate=1.0):
    """Update the network based on a training example.
    
    Args:
      input: An array of values for the input layer.
      actual_label: The correct label for this training example.
      learning_rate: The learning rate to use.
    """
    # calculate prediction first
    predicted_label = self.predict(input)
    # calculate the derivative of the loss
    error = self.loss_function.gradient(actual_label, predicted_label)
    # iterate over the layers in reverse
    for layer in self.layers[::-1]:
      # backpropagate the error, one layer at a time
      error = layer.backward(error, learning_rate)

# Example

If you implemented the neural network above correctly, the code below should run.

This is an illustration of how neural networks can learn non-linear functions.

In [None]:
# Simulate some fake data
n = 10000
xs = 2 * np.random.rand(n) - 1
ys = xs ** 2 + 0.1 * np.random.randn(n)

plt.plot(xs, ys, '.')

Let's fit a neural network to this data. 

- We only have a single predictor $x$, so the input layer only has 1 node.
- Let's use two hidden layers:
    - The first will have 4 nodes.
    - The second will have 8 nodes.
- We want to predict a single value $y$, so the output layer also only has 1 node.
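
To make the architecture concrete, the short sketch below lists the weight matrix shape of each connection and counts the trainable parameters. It assumes one scalar bias per connection, as described in the `LayerConnection` docstring. The variable names are illustrative.

```python
layer_sizes = [1, 4, 8, 1]  # input, first hidden, second hidden, output

# Each connection maps n_input nodes to n_output nodes.
weight_shapes = list(zip(layer_sizes[:-1], layer_sizes[1:]))
n_weights = sum(n_in * n_out for n_in, n_out in weight_shapes)
n_biases = len(weight_shapes)  # one scalar bias per connection

print(weight_shapes)          # [(1, 4), (4, 8), (8, 1)]
print(n_weights + n_biases)   # 47
```

Three connections are needed to join four layers, which is why the network in the next cell is built from three `LayerConnection` objects.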

In [None]:
network = NeuralNetwork(
    layers=[
            LayerConnection(1, 4, ReLU()),
            LayerConnection(4, 8, ReLU()),
            LayerConnection(8, 1)  # no activation function at the output layer
            ],
    loss_function=SquaredErrorLoss()
)

Now we train this network on our data. Notice that we make 10 passes, or _epochs_, through our data.

In [None]:
for _ in range(10):
  for x, y in zip(xs, ys):
    network.train(np.array([x]), np.array([y]), learning_rate=.001)

Now, let's use our network to predict on a grid of values and plot these predictions.

In [None]:
x_test = np.linspace(-1, 1, num=1000)
y_test = []
for x in x_test:
  y_test.append(network.predict(np.array([x]))[0])

plt.plot(xs, ys, '.')
plt.plot(x_test, y_test, '-')

Can you repeat the above using the sigmoid function?

In [None]:
# TODO: Repeat with the sigmoid function.