# Custom Layers and other neural network modifications

When we build neural networks we often use pre-defined layer types which are provided by tensorflow/keras. For example:

- Conv2D
- Dense
- MaxPool2D

All of these layers are simply wrappers around a set of operations defined and parameterized in tensorflow.

In this lab we will review how these "fundamental layers" are created, and we will create a few of our own. Unfortunately for you, we will be avoiding the `Conv2D` layer, as that's part of the homework.

The beautiful thing about tensorflow/pytorch (and other modern deep learning libraries) is that gradients are handled for you. All you need to do is write a differentiable forward pass, and autograd will take care of the rest.
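Here is a minimal sketch of what "autograd takes care of the rest" means in TensorFlow, using `tf.GradientTape`. We write only the forward pass of a one-parameter model `y_pred = w * x` with a mean-squared-error loss, and TensorFlow computes d(loss)/dw for us. The data and the starting value of `w` are made up for illustration.

```python
import tensorflow as tf

# One trainable parameter and some made-up data (generated with w = 2)
w = tf.Variable(3.0)
x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([2.0, 4.0, 6.0])

with tf.GradientTape() as tape:
    # Forward pass: only differentiable operations, no gradient code
    y_pred = w * x
    loss = tf.reduce_mean((y_pred - y) ** 2)

# The tape recorded the operations above, so it can compute d(loss)/dw
grad = tape.gradient(loss, w)
print(float(loss), float(grad))
```

Here `loss = (w - 2)^2 * mean(x^2) = (w - 2)^2 * 14/3`, so the analytic gradient at `w = 3` is `2 * 1 * 14/3 = 28/3 ≈ 9.33`, which matches what the tape returns.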

## Dense Layer
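From first principles, dense layers are the easiest to understand. They are simply a matrix multiplication followed by the addition of a bias:

$$ d_{out} = \theta_i \, d_{in} + \text{bias} $$

where $\theta_i$ is the layer's weight matrix. Note that we don't include the non-linearity in the mathematical definition of the dense layer, but we do include it in the programmatic definition (for example, the `activation` argument of `tf.keras.layers.Dense`).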
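In code, the formula is one `tf.matmul` plus an addition. This sketch checks that against Keras itself: it builds a `tf.keras.layers.Dense` with no activation, reads back its weights with `get_weights()`, and recomputes the output by hand. Keras stores each example as a row (inputs have shape `(batch, n_in)`), so it computes `inputs @ kernel + bias`, the transposed (row-vector) form of $\theta_i d_{in}$. The sizes are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

batch, n_in, n_out = 4, 5, 2
x = tf.random.normal((batch, n_in))

# A built-in Dense layer with no activation; calling it once creates its weights
dense = tf.keras.layers.Dense(n_out)
keras_out = dense(x)

# Its parameters as NumPy arrays: kernel is (n_in, n_out), bias is (n_out,)
kernel, bias = dense.get_weights()

# The same computation by hand: matrix multiplication, then add the bias
# (the bias is broadcast across the batch dimension)
manual_out = tf.matmul(x, kernel) + bias

print(np.allclose(keras_out.numpy(), manual_out.numpy(), atol=1e-6))
```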

### Dense layer implementation
If I wanted to implement a dense layer in TensorFlow, I could either a) build it up from first principles, or b) peek at the source code.

In this lab we will build it up from "first principles" (with a little guidance) and then compare our implementation to TensorFlow's.

Before we go defining math, we need to understand what tensorflow/keras is looking for. Keras defines a class called `Layer` for us to inherit from:

https://github.com/tensorflow/tensorflow/blob/v2.3.1/tensorflow/python/keras/engine/base_layer.py#L104-L3035


In [1]:
import tensorflow as tf

In [None]:
class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        # called once, when the layer object is instantiated
        super(MyLayer, self).__init__()
        # store initialization arguments and create weights here

    def call(self, inputs):
        # called during the forward pass of the model
        # define the forward pass here (this placeholder just returns its input)
        return inputs

### A couple of TensorFlow basics

**Variables**

TF variables are the parameters used in our mathematical operations (for example, the weights and biases of a layer). They have attributes like `trainable`, `initial_value`, `name`, `dtype`, and more. When we create variables inside a layer, we don't need to keep track of them ourselves: the layer object holds on to them (for example, in `layer.trainable_weights`) so the optimizer can find them.
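As a small illustration, this sketch creates a standalone `tf.Variable` and a hypothetical `TinyLayer` whose weights are created with `self.add_weight`, the `Layer` helper that creates a variable and registers it with the layer. In the TensorFlow 2.x versions of Keras this lab links to, assigning a `tf.Variable` directly to a layer attribute (as in the exercise below) is tracked as well; `add_weight` is simply the more explicit route.

```python
import tensorflow as tf

# A standalone variable: its attributes are fixed when it is created
v = tf.Variable(initial_value=tf.zeros((2, 3)), trainable=True, name="my_weight", dtype=tf.float32)
print(v.shape, v.dtype, v.trainable)

class TinyLayer(tf.keras.layers.Layer):
    """Hypothetical layer used only to show how weights are tracked."""

    def __init__(self):
        super().__init__()
        # add_weight creates a variable and registers it with the layer
        self.w = self.add_weight(shape=(3, 1), initializer="ones", trainable=True)
        self.offset = self.add_weight(shape=(1,), initializer="zeros", trainable=False)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.offset

layer = TinyLayer()
out = layer(tf.ones((2, 3)))  # each output is 1*1 + 1*1 + 1*1 + 0 = 3
# One trainable weight (w) and one non-trainable weight (offset)
print(len(layer.trainable_weights), len(layer.non_trainable_weights))
```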

**Linear Algebra**

You should already be familiar with the basic linear algebra used to build neural networks. For the most part, TensorFlow provides these operations directly, with names that match the math. Here are a few examples:

- Matrix multiplication: `tf.matmul`
- Identity matrix: `tf.eye`

There are also operations that are specific to neural networks; these live in the `tf.nn` module:

https://www.tensorflow.org/api_docs/python/tf/nn

- MaxPool1D: `tf.nn.max_pool1d`
- ReLU: `tf.nn.relu`
- Softmax: `tf.nn.softmax`

Finally, a *few* operations are available through overloaded Python operators. For example, `+` between two tensors performs element-wise addition (equivalent to `tf.add`), and `@` performs matrix multiplication (equivalent to `tf.matmul`).
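The operations listed above can be tried out on small constant tensors. This sketch uses `tf.matmul`, `tf.eye`, the overloaded `+` and `@` operators, and the `tf.nn` functions `relu`, `softmax`, and `max_pool1d`; the tensor values are arbitrary.

```python
import numpy as np
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[0.5, -1.0], [2.0, 0.0]])

# Plain linear algebra
identity = tf.eye(2)                # 2x2 identity matrix
product = tf.matmul(a, identity)    # multiplying by the identity gives back a

# Overloaded operators
summed = a + b                      # element-wise, same as tf.add(a, b)
also_product = a @ identity         # same as tf.matmul(a, identity)

# Neural-network-specific ops from tf.nn
relu_out = tf.nn.relu(b)            # negative entries become 0
probs = tf.nn.softmax(a, axis=-1)   # each row sums to 1
signal = tf.constant([[[1.0], [3.0], [2.0], [5.0]]])  # shape (batch=1, width=4, channels=1)
pooled = tf.nn.max_pool1d(signal, ksize=2, strides=2, padding="VALID")  # max of each pair

print(np.allclose(product, a), np.allclose(summed, tf.add(a, b)))
print(relu_out.numpy(), pooled.numpy().ravel())
```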


## Exercise 1

- Create a simple Dense layer

I've provided the "scaffolding" to get you started. In addition here are some useful code fragments:


#### Weight Initialization
```
# random normal
weight_init = tf.random_normal_initializer()
# zeros
weight_init = tf.zeros_initializer()
# ones
weight_init = tf.ones_initializer()
```

#### Variable declaration
```
weight = tf.Variable(initial_value=weight_init(shape=(n, m)), dtype='float32', trainable=...)
```

#### Tensor Manipulation
```
matmuled = tf.matmul(a,b)
added = a + b
```

In [None]:
import tensorflow as tf

class CustomDense(tf.keras.layers.Layer):
    def __init__(self, ...):
        super(CustomDense, self).__init__()
        ...
    

    def call(self, inputs):
        pass

In [8]:
import numpy as np
# code to test your Dense Layer (you don't need to read)
# 1. create a fake model that transforms random data into random data
model = tf.keras.models.Sequential([tf.keras.layers.Input(shape=(5,)), tf.keras.layers.Dense(2)])
model.compile(loss='MSE')
# 2. create a bunch of fake data
fake_x_data = np.random.normal(size=(10000, 5))
fake_y_data = model.predict(fake_x_data)

In [19]:
# now we create a new model that uses your layer (input size of 5, output size of 2)
custom_model = tf.keras.models.Sequential([tf.keras.layers.Input(shape=(5,)), CustomDense(...)])
custom_model.compile(loss='MSE')
hist = custom_model.fit(x=fake_x_data, y=fake_y_data)
test_result = hist.history['loss'][0] < 1e-5
if test_result:
    print("Success!")
else:
    print("Test Failed")

Success!


## Exercise 2: Improving our dense layer

### Exercise 2a
Our dense layer can be better! We will start with the simple modification of including a non-linearity.
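One convenient way to accept an `activation` argument the way built-in Keras layers do (a suggestion, not the only option) is `tf.keras.activations.get`, which turns a string such as `'tanh'`, or `None`, into a callable. This sketch shows only the lookup, not the full layer.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([-2.0, 0.0, 2.0])

act = tf.keras.activations.get("tanh")         # string -> activation function
no_activation = tf.keras.activations.get(None)  # None -> linear (identity) activation

print(np.asarray(act(x)))
print(np.asarray(no_activation(x)))
```

Storing the result in `__init__` (e.g. as `self.activation`) and applying it at the end of `call` mirrors how the built-in `Dense` handles its `activation` argument.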

In [None]:
# copy your dense layer code from exercise 1 here

In [44]:
import numpy as np
# code to test your Dense Layer (you don't need to read)
# 1. create a fake model that transforms random data into random data
test_model = tf.keras.models.Sequential([tf.keras.layers.Input(shape=(5,)), tf.keras.layers.Dense(5, activation='tanh'), tf.keras.layers.Dense(2, activation='tanh')])
test_model.compile(loss='MSE')
# 2. create a bunch of fake data
fake_x_data = np.random.normal(size=(10000, 5))
fake_y_data = test_model.predict(fake_x_data)

In [None]:
# now we create a new model that uses your layer twice: [(input size of 5, output size of 5), (input size of 5, output size of 2)]
custom_model = tf.keras.models.Sequential([tf.keras.layers.Input(shape=(5,)), CustomDense(..., activation='tanh'), CustomDense(..., activation='tanh')])
custom_model.compile(loss='MSE')
hist = custom_model.fit(x=fake_x_data, y=fake_y_data, epochs=15)
test_result = hist.history['loss'][-1] < 0.001
if test_result:
    print("Success!")
else:
    print("Test Failed")

### Exercise 2b

You may have noticed that your layer needs more arguments than the TensorFlow implementation does: you have to specify both the input and the output size, while `tf.keras.layers.Dense` only needs the output size. This is because TensorFlow's layer creates its weights in the `build` method, which receives the input shape automatically.

A more complete version of our custom layer template looks like this:


In [None]:
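class MyLayer(tf.keras.layers.Layer):
    def __init__(self):
        # called once, when the layer object is instantiated
        super(MyLayer, self).__init__()
        # store configuration arguments (e.g. the output size) here

    def build(self, input_shape):
        # called once, with the shape of the first input the layer receives
        # create weights and initialize variables here
        pass

    def call(self, inputs):
        # called during the forward pass of the model
        # define the forward pass here (this placeholder just returns its input)
        return inputs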

The `build` method is a better place for weight creation and variable initialization. Its `input_shape` argument is the shape of the data coming into the layer, i.e. the output shape of the **previous** layer (including the batch dimension, which may be `None`).

Keras calls `build` automatically, *right before* the layer's first forward pass.
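To see `build` in action without giving away the dense exercise, here is a sketch of a hypothetical `FeatureScale` layer that learns one scale factor per input feature. It cannot know how many features there are until it sees an input, so it creates its weight in `build` using `input_shape[-1]`.

```python
import tensorflow as tf

class FeatureScale(tf.keras.layers.Layer):
    """Hypothetical example layer: multiplies each input feature by a learned scale."""

    def __init__(self):
        super().__init__()
        # No weights here: we don't yet know how many features the input has

    def build(self, input_shape):
        # input_shape includes the batch dimension, e.g. (batch, features)
        num_features = input_shape[-1]
        self.scale = self.add_weight(shape=(num_features,), initializer="ones", trainable=True)

    def call(self, inputs):
        # Broadcast multiply: one scale per feature
        return self.scale * inputs

layer = FeatureScale()
built_before = layer.built     # False: build has not run yet
out = layer(tf.ones((4, 7)))   # first call runs build with input_shape (4, 7)
print(built_before, layer.built, tuple(layer.scale.shape))
```

The same instance could now only be used with 7-feature inputs, which is exactly the behavior of the built-in `Dense` layer once it has been built.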

**Modify your custom layer so you only have to specify the output size of the layer**

In [None]:
# copy and paste your custom layer here

In [None]:
import numpy as np
# code to test your Dense Layer (you don't need to read)
# 1. create a fake model that transforms random data into random data
test_model = tf.keras.models.Sequential([tf.keras.layers.Input(shape=(5,)), tf.keras.layers.Dense(5, activation='tanh'), tf.keras.layers.Dense(2, activation='tanh')])
test_model.compile(loss='MSE')
# 2. create a bunch of fake data
fake_x_data = np.random.normal(size=(10000, 5))
fake_y_data = test_model.predict(fake_x_data)

In [None]:
############################################
# You shouldn't need to change this code!!!#
############################################
custom_model = tf.keras.models.Sequential([tf.keras.layers.Input(shape=(5,)), CustomDense(5, activation='tanh'), CustomDense(2, activation='tanh')])
custom_model.compile(loss='MSE')
hist = custom_model.fit(x=fake_x_data, y=fake_y_data, epochs=15)
test_result = hist.history['loss'][-1] < 0.001
if test_result:
    print("Success!")
else:
    print("Test Failed")

## Exercise 3: Compositing Layers

**Background**:

We have seen a couple of simple ways to compose layers (functional model declaration and sequential model declaration). We can also use custom layers to group operations and variables together.

TensorFlow's autograd (`tf.GradientTape`, which records the operations executed during the forward pass) differentiates through every variable used in that pass, and Keras automatically tracks any layer assigned as an attribute of another layer, so the inner layer's weights show up in the outer layer's `trainable_weights`. This means we can define layers within layers.

This is particularly useful when we have a specific design pattern that we want to repeat (think back to the Inception architecture and its repeated sets of parallel convolutions).
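As a sketch of layers within layers, here is a hypothetical `TwoDense` block that wraps two built-in `Dense` layers. The outer layer never creates a variable itself, yet its `trainable_weights` contains both inner kernels and biases, and a gradient tape can differentiate a loss with respect to all of them. The sizes are arbitrary.

```python
import tensorflow as tf

class TwoDense(tf.keras.layers.Layer):
    """Hypothetical composite layer: two Dense layers applied one after the other."""

    def __init__(self, hidden_units, output_units):
        super().__init__()
        # Sub-layers assigned as attributes are tracked by the outer layer
        self.first = tf.keras.layers.Dense(hidden_units, activation="relu")
        self.second = tf.keras.layers.Dense(output_units)

    def call(self, inputs):
        return self.second(self.first(inputs))

block = TwoDense(8, 3)
x = tf.random.normal((2, 4))
block(x)  # first call builds the inner Dense layers

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(block(x) ** 2)

# Two kernels and two biases, all collected from the inner Dense layers
grads = tape.gradient(loss, block.trainable_weights)
print(len(block.trainable_weights), [tuple(g.shape) for g in grads])
```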

### 3A - Create composite layer

We want to create a composite layer that encapsulates the `conv-conv-pool` design pattern (two convolutions followed by a pooling layer) found in classic CNNs such as VGG.

You can use the existing layers (`Conv2D` and `MaxPool2D`).
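Before writing the composite layer, it helps to know what one `conv-conv-pool` block does to the shape of its input. This sketch applies the built-in layers one after another to a dummy batch, using the same filter count and kernel size as the first block of the test model in 3B (16 filters, 3x3 kernels).

```python
import tensorflow as tf

x = tf.zeros((1, 64, 64, 3))  # one 64x64 RGB image

# Default Conv2D settings: stride 1 and padding="valid", so each 3x3 conv trims 2 pixels
conv_a = tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu")
conv_b = tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu")
pool = tf.keras.layers.MaxPool2D()  # default pool_size=(2, 2): halves height and width

h1 = conv_a(x)
h2 = conv_b(h1)
h3 = pool(h2)
print(h1.shape, h2.shape, h3.shape)
```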

*hint*: you don't need to use the build function, why not?

In [None]:
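class ConvConvPool(tf.keras.layers.Layer):
    def __init__(self, num_filters, activation='relu'):
        super().__init__()
        # create the two convolution layers and the pooling layer here

    def call(self, inputs):
        # apply conv -> conv -> pool to the inputs
        pass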

### 3B - Use Composite Layer

Test code is provided below.

In [10]:
import tensorflow as tf
num_filters_1 = 16
num_filters_2 = 32
k_size = 3
test_model = tf.keras.models.Sequential(layers = [
                                                tf.keras.layers.Input(shape=(64, 64, 3)),
                                                tf.keras.layers.Conv2D(num_filters_1, kernel_size=k_size, activation='relu'),
                                                tf.keras.layers.Conv2D(num_filters_1, kernel_size=k_size, activation='relu'),
                                                tf.keras.layers.MaxPool2D(),
                                                tf.keras.layers.Conv2D(num_filters_2, kernel_size=k_size, activation='relu'),
                                                tf.keras.layers.Conv2D(num_filters_2, kernel_size=k_size, activation='relu'),
                                                tf.keras.layers.MaxPool2D(),
                                                tf.keras.layers.Flatten(),
                                                tf.keras.layers.Dense(1)                        
])

composite_model = tf.keras.models.Sequential(layers = [
                                                tf.keras.layers.Input(shape=(64, 64, 3)),
                                                ConvConvPool(num_filters_1, 'relu'),
                                                ConvConvPool(num_filters_2, 'relu'),
                                                tf.keras.layers.Flatten(),
                                                tf.keras.layers.Dense(1)
])

In [8]:
import numpy as np
test_x_data = np.random.uniform(size=(2500, 64, 64, 3))
test_model.compile(loss='MSE')
test_y_data = test_model.predict(test_x_data)

In [14]:
composite_model.compile(loss='MSE')
hist = composite_model.fit(test_x_data, test_y_data, epochs=2)
test_result = hist.history['loss'][-1] < 0.001
if test_result:
    print("Success!")
else:
    print("Test Failed")

Epoch 1/2
Epoch 2/2
Success!
