# Assignment 3

## Printed copy due in class on November 14, 2018

You may work in pairs on this assignment. You are not permitted to discuss this assignment with anyone other than your partner or the instructors.

### Student 1:
### Student 2:

# Question 1: Neural Networks by Hand

## Part A

Suppose we train a neural network using the ReLU activation function: 

$$ g(a) = \max(a, 0). $$

1. Draw the graph of $g(a)$. (Matplotlib plots are acceptable.)
2. What is the derivative $g'(a)$ in terms of the input $a$?
3. What is the derivative $g'(a)$ in terms of the output $h = g(a)$?
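To sanity-check answers 2 and 3 numerically, you can estimate $g'(a)$ with central differences. The sketch below is only a checking aid; the helper name `numerical_derivative` is hypothetical and not part of the assignment. ReLU has a kink at $a = 0$, so the estimate is only meaningful away from 0. At exactly 0, central differences return the average of the left and right slopes.

```python
import numpy as np


def relu(a):
    """ReLU applied elementwise: g(a) = max(a, 0)."""
    return np.maximum(a, 0.0)


def numerical_derivative(f, a, eps=1e-6):
    """Central-difference estimate of f'(a), applied elementwise."""
    a = np.asarray(a, dtype=float)
    return (f(a + eps) - f(a - eps)) / (2 * eps)


# Points chosen away from the kink at a = 0.
a = np.array([-3.0, -0.5, 0.5, 3.0])
print(numerical_derivative(relu, a))

# At a = 0 the estimate is (eps - 0) / (2 * eps) = 0.5, the average of the
# one-sided slopes, not a true derivative.
print(numerical_derivative(relu, 0.0))
```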

**YOUR WORK HERE**

## Part B

Consider a neural network whose current weights and biases are shown below. (Since this network is so simple, we omit the superscripts $w^{(m)}$ and $b^{(m)}$ that indicate which layer each weight or bias belongs to.)

![](neural_network.png)

The activation function used in the hidden layer is the ReLU activation function. 
**No activation function is used in the output layer.**

Predict the output for the input ${\bf x} = (-1, 1)$. 
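The forward pass can be written out as follows. This assumes $w_{ij}$ denotes the weight from input $i$ to hidden node $j$ (check this against the labels in the figure), $b_j$ is the bias of hidden node $j$, and $w_j$ and $b$ are the output layer's weights and bias:

$$ a_j = w_{1j} x_1 + w_{2j} x_2 + b_j, \qquad h_j = g(a_j) = \max(a_j, 0), \qquad j = 1, 2, $$

$$ \hat y = w_1 h_1 + w_2 h_2 + b. $$

With ${\bf x} = (-1, 1)$, the hidden pre-activations simplify to $a_j = -w_{1j} + w_{2j} + b_j$.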

**YOUR WORK HERE**

## Part C

We are training the neural network from the previous part to minimize the loss function 

$$ L_i({\bf w}, {\bf b}) = (\hat y_i - y_i)^2. $$

You have a new training observation ${\bf x}_i = (-1, 1)$ with label $y_i = 1$. 
Using this single observation, perform one step of (stochastic) gradient descent to update the 6 weights ${\bf w} = (w_{11}, w_{12}, w_{21}, w_{22}, w_1, w_2)$ and the 3 biases ${\bf b} = (b_1, b_2, b)$ from the previous part. Use a learning rate of $\eta = 0.25$.
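To check a hand computation, you can use a numerical alternative to backpropagation. Estimate each partial derivative of $L_i$ with central finite differences, then apply the update ${\bf w} \leftarrow {\bf w} - \eta \, \partial L_i / \partial {\bf w}$ (and likewise for ${\bf b}$).

The sketch below does this for the 2-2-1 network. Several choices in it are assumptions, not taken from the figure:

- The function names are hypothetical.
- Parameters are packed in the order the question lists them, $(w_{11}, w_{12}, w_{21}, w_{22}, w_1, w_2, b_1, b_2, b)$.
- $w_{ij}$ is taken to be the weight from input $i$ to hidden node $j$.
- The parameter values are placeholders, not the ones in the figure.

Central differences are only reliable when no hidden pre-activation sits exactly at the ReLU kink.

```python
import numpy as np


def unpack(theta):
    """theta = (w11, w12, w21, w22, w1, w2, b1, b2, b), the order used in the question."""
    W = theta[0:4].reshape(2, 2)  # assumed: W[i, j] is the weight from input i+1 to hidden j+1
    w_out = theta[4:6]            # output weights (w1, w2)
    b_hidden = theta[6:8]         # hidden biases (b1, b2)
    b_out = theta[8]              # output bias b
    return W, w_out, b_hidden, b_out


def predict(theta, x):
    W, w_out, b_hidden, b_out = unpack(theta)
    h = np.maximum(x @ W + b_hidden, 0.0)  # ReLU hidden layer
    return h @ w_out + b_out               # no activation at the output


def loss(theta, x, y):
    return (predict(theta, x) - y) ** 2


def numerical_sgd_step(theta, x, y, eta=0.25, eps=1e-6):
    """One SGD step using central-difference gradients instead of backprop."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps  # perturb one parameter at a time
        grad[k] = (loss(theta + step, x, y) - loss(theta - step, x, y)) / (2 * eps)
    return theta - eta * grad


# Placeholder parameter values, NOT the ones in the figure.
theta = np.array([1.0, -1.0, 2.0, 0.5, 1.0, -2.0, 0.0, 1.0, 0.5])
x = np.array([-1.0, 1.0])
new_theta = numerical_sgd_step(theta, x, y=1.0, eta=0.25)
print(np.round(new_theta, 4))
```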

**YOUR WORK HERE**

# Question 2: Implementing Neural Networks

This question will guide you through the implementation of a fully-connected neural network. Although the question is divided into parts, you need not do the parts in order. In fact, you may want to start with Part C (implementing the neural network itself) so that you see how the pieces fit together.

## Part A: Loss Functions

By subclassing the `Loss` class below, implement the squared-error loss function 

$$L({\bf \hat y}, {\bf y}) = \sum_i (\hat y_i - y_i)^2.$$

```python
class Loss(object):

    def __call__(self, predicted, actual):
        """Calculates the loss as a function of the prediction and the actual.

        Args:
          predicted (np.ndarray, float): the predicted output labels
          actual (np.ndarray, float): the actual output labels

        Returns: (float)
          The value of the loss for this batch of observations.
        """
        raise NotImplementedError

    def derivative(self, predicted, actual):
        """The derivative of the loss with respect to the prediction.

        Args:
          predicted (np.ndarray, float): the predicted output labels
          actual (np.ndarray, float): the actual output labels

        Returns: (np.ndarray, float)
          The derivatives of the loss.
        """
        raise NotImplementedError


class SquaredErrorLoss(Loss):
    # TODO: Implement this subclass.
    pass
```
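Whatever form your `derivative` method takes, you can sanity-check it against central finite differences. The helper `check_loss_derivative` below is hypothetical (not part of the assignment's code). It takes two callables with the same signatures as `Loss.__call__` and `Loss.derivative`. So that it does not give away Part A, it is demonstrated on a toy quartic loss rather than on the squared-error loss. With your class you would call `check_loss_derivative(loss, loss.derivative, predicted, actual)` where `loss = SquaredErrorLoss()`.

```python
import numpy as np


def check_loss_derivative(loss_fn, derivative_fn, predicted, actual, eps=1e-6, atol=1e-5):
    """Compare an analytic dL/d(predicted) with a central-difference estimate.

    loss_fn(predicted, actual) should return a float, and
    derivative_fn(predicted, actual) an array shaped like predicted.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    numeric = np.zeros_like(predicted)
    for idx in np.ndindex(predicted.shape):
        step = np.zeros_like(predicted)
        step[idx] = eps  # perturb one prediction at a time
        numeric[idx] = (loss_fn(predicted + step, actual)
                        - loss_fn(predicted - step, actual)) / (2 * eps)
    analytic = np.asarray(derivative_fn(predicted, actual), dtype=float)
    return np.allclose(analytic, numeric, atol=atol)


# Toy loss for illustration only: L = sum_i (yhat_i - y_i)^4.
def quartic_loss(predicted, actual):
    return np.sum((predicted - actual) ** 4)


def quartic_derivative(predicted, actual):
    return 4 * (predicted - actual) ** 3


predicted = np.array([0.5, -1.0, 2.0])
actual = np.array([1.0, 0.0, 1.5])
print(check_loss_derivative(quartic_loss, quartic_derivative, predicted, actual))  # True
```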

## Part B: Activation Functions

By subclassing the `ActivationFunction` class below, implement the ReLU and Sigmoid activation functions.

```python
class ActivationFunction(object):
    # The base class is the identity activation g(a) = a, whose derivative is 1.

    def __call__(self, a):
        """Applies activation function to the values in a layer.

        Args:
          a (np.ndarray, float): the values from the previous layer (after
            multiplying by the weights).

        Returns: (np.ndarray, float)
          The values h = g(a).
        """
        return a

    def derivative(self, h):
        """The derivatives as a function of the outputs at the nodes.

        Args:
          h (np.ndarray, float): the outputs h = g(a) at the nodes.

        Returns: (np.ndarray, float)
          The derivatives dh/da.
        """
        return 1


class ReLU(ActivationFunction):
    # TODO: Implement this subclass.
    pass


class Sigmoid(ActivationFunction):
    # TODO: Implement this subclass.
    pass
```
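Note that `derivative` receives the outputs $h = g(a)$, not the inputs $a$, so each derivative has to be expressed in terms of $h$. The sketch below illustrates that convention with tanh, which is not one of the required activations. It is a standalone class with the same two methods as `ActivationFunction`; in the notebook you would subclass `ActivationFunction` instead. Since $\tanh'(a) = 1 - \tanh^2(a)$, the derivative in terms of the output is $1 - h^2$. The code compares this with a finite-difference estimate computed from $a$.

```python
import numpy as np


class Tanh(object):
    """Same interface as ActivationFunction: __call__(a) and derivative(h)."""

    def __call__(self, a):
        return np.tanh(a)

    def derivative(self, h):
        # d tanh(a)/da = 1 - tanh(a)^2, which is 1 - h^2 in terms of h = tanh(a)
        return 1.0 - h ** 2


act = Tanh()
a = np.linspace(-2.0, 2.0, 9)
h = act(a)

# Central-difference estimate of dh/da, computed from the inputs a
eps = 1e-6
numeric = (act(a + eps) - act(a - eps)) / (2 * eps)
print(np.allclose(act.derivative(h), numeric))  # True
```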

## Part C: Putting It All Together

A `Layer` class has been completely implemented for you. This class applies the activation function to the incoming values and stores the current values at the layer. (We need to store the values because they come in handy when we are doing backpropagation.)

The `FullyConnectedNeuralNetwork` class is only partially implemented. You have to implement the `feedforward` and `backprop` methods.

```python
import numpy as np


class Layer(object):
    """A data structure for a layer in a neural network.

    Attributes:
      num_nodes (int): number of nodes in the layer
      activation_function (ActivationFunction)
      values_pre_activation (np.ndarray, float): most recent values
        in layer, before applying activation function
      values_post_activation (np.ndarray, float): most recent values
        in layer, after applying activation function
    """

    def __init__(self, num_nodes, activation_function=ActivationFunction()):
        self.num_nodes = num_nodes
        self.activation_function = activation_function

    def get_layer_values(self, values_pre_activation):
        """Applies activation function to values from previous layer.

        Stores the values (both before and after applying the activation
        function) for later use in backpropagation.

        Args:
          values_pre_activation (np.ndarray, float):
            A (batch size) x self.num_nodes array of the values
            in layer before applying the activation function

        Returns: (np.ndarray, float)
            A (batch size) x self.num_nodes array of the values
            in layer after applying the activation function
        """
        self.values_pre_activation = values_pre_activation
        self.values_post_activation = self.activation_function(
            values_pre_activation
        )
        return self.values_post_activation


class FullyConnectedNeuralNetwork(object):
    """A data structure for a fully-connected neural network.

    Attributes:
      layers (list of Layer): The layers of the network, input layer first.
      loss (Loss): The loss function to use in training.
      learning_rate (float): The learning rate to use in backpropagation.
      weights (list of np.ndarray): A list of weight matrices; entry i - 1 has
        shape (layers[i - 1].num_nodes, layers[i].num_nodes), so the length
        is len(self.layers) - 1
      biases (list of float): A list of scalar bias terms, one per layer,
        so the length is len(self.layers)
    """

    def __init__(self, layers, loss, learning_rate):
        self.layers = layers
        self.loss = loss
        self.learning_rate = learning_rate

        # initialize weight matrices and biases to zeros
        self.weights = []
        self.biases = [0]
        for i in range(1, len(self.layers)):
            self.weights.append(
                np.zeros((self.layers[i - 1].num_nodes, self.layers[i].num_nodes))
            )
            self.biases.append(0)

    def feedforward(self, inputs):
        """Predicts the output(s) for a given set of input(s).

        Args:
          inputs (np.ndarray, float): A (batch size) x self.layers[0].num_nodes array

        Returns: (np.ndarray, float)
          An array of the predicted output labels, length is the batch size
        """
        # TODO: Implement feedforward prediction.
        # Make sure you use Layer.get_layer_values() at each layer to store the values
        # for later use in backpropagation.
        raise NotImplementedError

    def backprop(self, predicted, actual):
        """Updates self.weights and self.biases based on predicted and actual values.

        This will require using the values at each layer that were stored at the
        feedforward step.

        Args:
          predicted (np.ndarray, float): An array of the predicted output labels
          actual (np.ndarray, float): An array of the actual output labels
        """
        # TODO: Implement backpropagation.
        raise NotImplementedError

    def train(self, inputs, labels):
        """Trains neural network based on a batch of training data.

        Args:
          inputs (np.ndarray): A (batch size) x self.layers[0].num_nodes array
          labels (np.ndarray): An array of ground-truth output labels,
            length is the batch size.
        """
        predicted = self.feedforward(inputs)
        self.backprop(predicted, labels)
```
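The weight shapes chosen in `__init__` determine how arrays flow through `feedforward` and `backprop`. Below is a minimal sketch of that bookkeeping with arbitrary sizes; the names `values_prev`, `delta`, and `grad_W` are hypothetical, not attributes of the classes above.

Each matrix in `self.weights` has shape (nodes in previous layer, nodes in next layer). Multiplying a (batch size) x (previous nodes) array by it therefore gives the next layer's pre-activations, and a scalar bias broadcasts across every entry. `delta` stands in for $\partial L / \partial a$ at the next layer. The point is only that a weight gradient assembled from stored layer values must end up with the same shape as the matrix it updates.

```python
import numpy as np

batch_size, n_prev, n_next = 4, 2, 3

values_prev = np.ones((batch_size, n_prev))  # post-activation values of the previous layer
W = np.zeros((n_prev, n_next))               # same shape convention as in __init__
bias = 0.0                                   # one scalar bias per layer, as in __init__

values_pre_activation = values_prev @ W + bias  # the bias broadcasts to every entry
print(values_pre_activation.shape)              # (4, 3)

# Hypothetical dL/da for the next layer, one row per observation in the batch
delta = np.ones((batch_size, n_next))
grad_W = values_prev.T @ delta                  # sum over the batch of per-observation outer products
print(grad_W.shape == W.shape)                  # True
```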

## Part D: Testing It Out

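Try testing your neural network implementation on the simple neural network you considered in Question 1. The following code initializes a neural network with the same layer structure as the network from Question 1 and runs one training step on the observation ${\bf x} = (-1, 1)$, $y = 1$ from Question 1, Part C. Note that all weights and biases are initialized to zero, not to the values from Question 1.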

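```python
network = FullyConnectedNeuralNetwork(
    layers=[Layer(2), Layer(2, ReLU()), Layer(1)],
    loss=SquaredErrorLoss(),
    learning_rate=0.25
)

# One training step on the single observation x = (-1, 1), y = 1
network.train(np.array([[-1, 1]]), np.array([1]))
```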
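Because every weight and bias starts at zero, you can predict what one call to `train` should do and compare the result with `network.weights` and `network.biases`. The short derivation below assumes:

- your backpropagation uses the scalar per-layer biases set up in `__init__`;
- the loss is the squared-error loss from Part A;
- $w_1, w_2$ and $b$ denote the output layer's weights and bias, as in Question 1.

With all parameters zero and ${\bf x} = (-1, 1)$:

$$ a_j = 0, \qquad h_j = \max(0, 0) = 0, \qquad \hat y = w_1 h_1 + w_2 h_2 + b = 0, $$

$$ \frac{\partial L}{\partial \hat y} = 2(\hat y - y) = 2(0 - 1) = -2, $$

$$ \frac{\partial L}{\partial b} = -2, \qquad \frac{\partial L}{\partial w_j} = -2\, h_j = 0, \qquad \frac{\partial L}{\partial a_j} = -2\, w_j\, g'(a_j) = 0. $$

Every hidden-layer gradient is a multiple of $\partial L / \partial a_j$, so all of them vanish. This holds whatever value you choose for $g'(0)$, because it is multiplied by $w_j = 0$. The only parameter that changes is the output bias:

$$ b \leftarrow 0 - 0.25 \cdot (-2) = 0.5. $$

After one step you would therefore expect every weight matrix to remain zero, with `network.biases[-1]` equal to 0.5 and the other biases still 0.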