# NMT Workshop Exercise 1: Keras Sequential vs. Functional APIs

In this exercise, we will practice using the two APIs that Keras provides for building deep learning models: the Keras Sequential and Functional APIs.

If you need to reference the syntax of either model, see the Keras documentation pages on the [Sequential](https://keras.io/getting-started/sequential-model-guide/) and [Functional](https://keras.io/getting-started/functional-api-guide/) APIs.

## Part 1: Sequential Voting

For our toy problem, we will use the following data:

In [1]:
import numpy as np
X = np.random.randint(0, 2, size = (1000, 9))
Y = np.where(np.mean(X, axis = 1) > 0.5, 1, 0)

**Questions:**
1. What does it mean that the elements of Y represent a "majority vote" on X?
2. We want to learn how to predict elements of Y from rows of X. Build a Keras Sequential model *model* with one Dense layer (with activation = 'sigmoid') that can be fit on X and Y. Check that the input and output shapes of the model (*model.input_shape* and *model.output_shape*) match the shapes of X and Y.
3. Compile the model with 'mean_squared_error' loss, 'rmsprop' optimizer, and *metrics = ['accuracy']*, and fit it to X and Y with *validation_split = 0.2*. You may choose any values for *epochs* and *batch_size* that result in the model learning well.
4. Once the model has been fit, examine the values of *model.get_weights()*. How do you interpret these values?

1. Each element of Y is 1 if the mean of the corresponding row of X is above 0.5, and 0 otherwise. Since X contains only 0s and 1s, Y is 1 when the row has more 1s than 0s and 0 when it has more 0s than 1s; with nine columns there can be no ties. Y is therefore a majority vote over each row of X.
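The equivalence between "mean above 0.5" and "more 1s than 0s" can be checked directly. The following is a minimal sketch on a small demo array; the names `X_demo`, `Y_demo`, and `majority`, and the seed, are assumptions made for illustration and are not part of the exercise.

```python
import numpy as np

# Small demo array with the same layout as X (binary votes, 9 columns).
rng = np.random.default_rng(0)  # seed chosen only for reproducibility
X_demo = rng.integers(0, 2, size=(6, 9))

# Same rule as in the exercise: 1 if the row mean is above 0.5.
Y_demo = np.where(np.mean(X_demo, axis=1) > 0.5, 1, 0)

# Explicit majority vote: count the 1s and the 0s in each row.
ones = X_demo.sum(axis=1)
zeros = X_demo.shape[1] - ones
majority = (ones > zeros).astype(int)

print(np.array_equal(Y_demo, majority))
```

Because the number of columns is odd, a row can never contain equally many 0s and 1s, so the two definitions always agree.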

In [8]:
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed

In [74]:
x_size = X.shape[1]
model = Sequential()
model.add(Dense(1, activation='sigmoid', input_dim=x_size))
model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics = ['accuracy'])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_44 (Dense)             (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________


In [75]:
model.fit(X, Y, epochs = 100, batch_size=5, validation_split = 0.2)

Train on 800 samples, validate on 200 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100


Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100


<keras.callbacks.History at 0xb33e014a8>

In [23]:
model.get_weights()

[array([[0.7097542 ],
        [0.6210766 ],
        [0.63398135],
        [0.6162154 ],
        [0.7539241 ],
        [0.6845351 ],
        [0.51169944],
        [0.73884153],
        [0.61793715]], dtype=float32), array([-2.7795007], dtype=float32)]

In [31]:
model.input_shape

(None, 9)

In [32]:
model.output_shape

(None, 1)

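The nine weights are all positive and of similar size (roughly 0.5 to 0.75) because every feature has the same distribution and contributes equally to the arithmetic mean that defines Y. The bias (about -2.78) sets the decision threshold: it is worth roughly 4.25 average weights, so the pre-activation turns positive, and the sigmoid output exceeds 0.5, somewhere between four and five 1s in a row, which is where the majority flips.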
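One way to test this interpretation is to apply the printed weights by hand to all 512 possible rows of nine votes. The sketch below copies the values from the `model.get_weights()` output above; a different training run would give different numbers, and the variable names are assumptions made for illustration.

```python
import itertools
import numpy as np

# Weights and bias copied from the model.get_weights() output above.
w = np.array([0.7097542, 0.6210766, 0.63398135, 0.6162154, 0.7539241,
              0.6845351, 0.51169944, 0.73884153, 0.61793715])
b = -2.7795007

# Every possible row of nine binary votes (2**9 = 512 rows).
all_votes = np.array(list(itertools.product([0, 1], repeat=9)))

# sigmoid(z) > 0.5 exactly when z > 0, so the sign of the logit is enough.
logits = all_votes @ w + b
pred = (logits > 0).astype(int)
target = (all_votes.mean(axis=1) > 0.5).astype(int)

agreement = np.mean(pred == target)
# Number of 1s in each row where the learned rule disagrees with the majority.
wrong_counts = set(all_votes[pred != target].sum(axis=1).tolist())

print(agreement, wrong_counts)
```

With these particular weights, the only rows that can be misclassified are those sitting right at the threshold, which is consistent with reading the layer as an approximate vote counter.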

## Part 2: Making it Functional

Now we will practice using Keras's Functional API by rewriting the above model.

**Questions:**

5. Create a model *model2* identical to the above model, but using the Keras Functional API. The model should include an *Input(shape=...)* layer from keras.layers and should use *Model(inputs = ..., outputs = ...)* from keras.models. Fit this model and verify that it produces the same results, and compare the outputs of *model.summary()* and *model2.summary()*.

In [72]:
from keras.layers import Input
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(x_size,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(1, activation='sigmoid')(inputs)

# This creates a model that includes
# the Input layer and one Dense layer
model2 = Model(inputs=inputs, outputs=x)
model2.compile(optimizer='rmsprop',
              loss='mean_squared_error',
              metrics=['accuracy'])
model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_30 (InputLayer)        (None, 9)                 0         
_________________________________________________________________
dense_43 (Dense)             (None, 1)                 10        
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________


In [73]:
model2.fit(X, Y, epochs = 70, batch_size=5, validation_split = 0.2)

Train on 800 samples, validate on 200 samples
Epoch 1/70
Epoch 2/70
Epoch 3/70
Epoch 4/70
Epoch 5/70
Epoch 6/70
Epoch 7/70
Epoch 8/70
Epoch 9/70
Epoch 10/70
Epoch 11/70
Epoch 12/70
Epoch 13/70
Epoch 14/70
Epoch 15/70
Epoch 16/70
Epoch 17/70
Epoch 18/70
Epoch 19/70
Epoch 20/70
Epoch 21/70
Epoch 22/70
Epoch 23/70
Epoch 24/70
Epoch 25/70
Epoch 26/70
Epoch 27/70
Epoch 28/70
Epoch 29/70
Epoch 30/70
Epoch 31/70
Epoch 32/70
Epoch 33/70
Epoch 34/70
Epoch 35/70
Epoch 36/70
Epoch 37/70
Epoch 38/70
Epoch 39/70
Epoch 40/70
Epoch 41/70
Epoch 42/70
Epoch 43/70
Epoch 44/70
Epoch 45/70
Epoch 46/70
Epoch 47/70
Epoch 48/70
Epoch 49/70
Epoch 50/70
Epoch 51/70
Epoch 52/70
Epoch 53/70
Epoch 54/70
Epoch 55/70
Epoch 56/70
Epoch 57/70
Epoch 58/70
Epoch 59/70
Epoch 60/70


Epoch 61/70
Epoch 62/70
Epoch 63/70
Epoch 64/70
Epoch 65/70
Epoch 66/70
Epoch 67/70
Epoch 68/70
Epoch 69/70
Epoch 70/70


<keras.callbacks.History at 0xb35295e10>

In [30]:
model2.get_weights()

[array([[0.69297874],
        [0.63488173],
        [0.6455374 ],
        [0.60682154],
        [0.7568716 ],
        [0.68070287],
        [0.51847893],
        [0.73426604],
        [0.6264585 ]], dtype=float32), array([-2.7952757], dtype=float32)]

In [35]:
model2.input_shape

(None, 9)

In [33]:
model2.output_shape

(None, 1)
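Question 5 asks whether the two models produce the same results. Since training is stochastic and the two runs above used different numbers of epochs, exact equality is not expected; a reasonable check is how far apart the learned parameters are. The sketch below copies the values from the two `get_weights()` outputs above; the variable names are assumptions made for illustration.

```python
import numpy as np

# Parameters copied from model.get_weights() (Sequential API).
w_seq = np.array([0.7097542, 0.6210766, 0.63398135, 0.6162154, 0.7539241,
                  0.6845351, 0.51169944, 0.73884153, 0.61793715])
b_seq = -2.7795007

# Parameters copied from model2.get_weights() (Functional API).
w_fun = np.array([0.69297874, 0.63488173, 0.6455374, 0.60682154, 0.7568716,
                  0.68070287, 0.51847893, 0.73426604, 0.6264585])
b_fun = -2.7952757

# Largest absolute difference between corresponding weights, and the bias gap.
max_weight_gap = np.max(np.abs(w_seq - w_fun))
bias_gap = abs(b_seq - b_fun)

print(max_weight_gap, bias_gap)
```

Both gaps are small relative to the size of the weights themselves, which suggests the two APIs built the same model and training converged to nearly the same solution.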

## Part 3: Identifying identical distributions

The previous problem had a nice solution using the Keras Sequential API, but sometimes we will need the Functional API to build more complicated networks. Let's try to learn a slightly more complicated pattern that will be solved more naturally with the Functional API.

Let's generate another dataset:

In [36]:
M1 = np.array([np.random.choice([-10, 10]) for i in range(1000)])
M2 = np.array([np.random.choice([-10, 10]) for i in range(1000)])
S1 = np.stack([
    np.random.normal(m, 1, size = 5)
    for m in M1
])
S2 = np.stack([
    np.random.normal(m, 1, size = 5)
    for m in M2
])
labels = np.where(M1 == M2, 1, 0)

Every row of S1 and S2 is a sample of 5 elements from a normal distribution with mean either -10 or 10 and standard deviation 1, and the entries of *labels* indicate whether the two corresponding samples are drawn from the same distribution (0: different distributions, 1: same distribution).

We want to train a model to learn how to predict if the two given samples of 5 data points are drawn from the same distribution, i.e. whether they have the same mean.
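To get a feel for what the model has to learn, it helps to note that the two candidate means are far apart compared to the noise, so the sign of each row's mean already identifies its distribution. The sketch below regenerates data of the same form in a vectorised way (a convenience that differs from the list comprehensions above) and checks a hand-written rule against the labels; all `*_demo` names and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Same construction as above, written with broadcasting instead of loops.
M1_demo = rng.choice([-10, 10], size=1000)
M2_demo = rng.choice([-10, 10], size=1000)
S1_demo = rng.normal(M1_demo[:, None], 1, size=(1000, 5))
S2_demo = rng.normal(M2_demo[:, None], 1, size=(1000, 5))
labels_demo = np.where(M1_demo == M2_demo, 1, 0)

# Hand-written rule: the samples share a distribution when their means have the same sign.
same_sign = np.sign(S1_demo.mean(axis=1)) == np.sign(S2_demo.mean(axis=1))
baseline_accuracy = np.mean(same_sign.astype(int) == labels_demo)

print(baseline_accuracy)
```

The network below has to discover an equivalent rule on its own: first summarise each sample with a shared layer, then compare the two summaries.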

**Questions:**
6. Create a Functional model using the following architecture:
  * Two Input layers *inp1* and *inp2* (make sure that each has the right *shape=...* parameter)
  * A Dense layer *shared_dense* with output dimension 1 and sigmoid activation function, shared between the input layers. (Define the Dense layer as *shared_dense = Dense(...)* and then set *x1 = shared_dense(inp1)* and *x2 = shared_dense(inp2)*). This means that the same weights will be applied to both inputs.
  * Concatenate the outputs of the dense layers together with *merged = concatenate([x1, x2])*
  * A Dense layer with output dimension 4 and sigmoid activation function, applied to *merged*
  * Another Dense layer with output dimension 1 and sigmoid activation function
  * Finally, define the model as *func_model = Model(inputs = ..., outputs = ...)* for the proper inputs and outputs parameters.
7. Examine the input and output shapes of *func_model* and verify that they match *S1*, *S2*, and *labels*.
8. Compile *func_model* with optimiser *rmsprop*, *binary_crossentropy* loss, and *metrics = ['accuracy']* and fit to *[S1, S2]* and *labels* with *validation_split = 0.2*. Hint: you can use *epochs = 100* and *batch_size = 8* if you are unsure of good values for these hyperparameters. What is the final accuracy that this model achieves?

**Bonus:** Can you interpret the weights in *func_model.get_weights()*?


In [70]:
import keras
# Each sample has 5 elements, so both inputs have shape (5,)
s1_size = S1.shape[1]
s2_size = S2.shape[1]
# Input(...) returns a tensor
inputs1 = Input(shape=(s1_size,))
inputs2 = Input(shape=(s2_size,))


shared_dense = Dense(1, activation='sigmoid')
x1 = shared_dense(inputs1) 
x2 = shared_dense(inputs2)
merged = keras.layers.concatenate([x1, x2], axis = -1)

dense1 = Dense(4, activation='sigmoid')(merged)
predictions = Dense(1, activation='sigmoid')(dense1)

func_model = Model(inputs=[inputs1, inputs2], outputs = predictions) 
func_model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
func_model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_28 (InputLayer)           (None, 5)            0                                            
__________________________________________________________________________________________________
input_29 (InputLayer)           (None, 5)            0                                            
__________________________________________________________________________________________________
dense_40 (Dense)                (None, 1)            6           input_28[0][0]                   
                                                                 input_29[0][0]                   
__________________________________________________________________________________________________
concatenate_11 (Concatenate)    (None, 2)            0           dense_40[0][0]                   
          

In [71]:
func_model.fit([S1, S2], labels, epochs = 60, batch_size=5, validation_split = 0.2)

Train on 800 samples, validate on 200 samples
Epoch 1/60
Epoch 2/60
Epoch 3/60
Epoch 4/60
Epoch 5/60
Epoch 6/60
Epoch 7/60
Epoch 8/60
Epoch 9/60
Epoch 10/60
Epoch 11/60
Epoch 12/60
Epoch 13/60
Epoch 14/60
Epoch 15/60
Epoch 16/60
Epoch 17/60
Epoch 18/60
Epoch 19/60
Epoch 20/60
Epoch 21/60
Epoch 22/60
Epoch 23/60
Epoch 24/60
Epoch 25/60
Epoch 26/60
Epoch 27/60
Epoch 28/60
Epoch 29/60
Epoch 30/60
Epoch 31/60
Epoch 32/60
Epoch 33/60
Epoch 34/60
Epoch 35/60
Epoch 36/60
Epoch 37/60
Epoch 38/60
Epoch 39/60
Epoch 40/60
Epoch 41/60
Epoch 42/60
Epoch 43/60
Epoch 44/60
Epoch 45/60
Epoch 46/60
Epoch 47/60
Epoch 48/60
Epoch 49/60
Epoch 50/60
Epoch 51/60
Epoch 52/60
Epoch 53/60
Epoch 54/60
Epoch 55/60
Epoch 56/60
Epoch 57/60
Epoch 58/60
Epoch 59/60
Epoch 60/60


<keras.callbacks.History at 0xb35a02c88>

In [60]:
func_model.get_weights()

[array([[ 0.09172858],
        [-0.7042335 ],
        [-0.7649352 ],
        [-0.13719936],
        [-0.5824953 ]], dtype=float32),
 array([0.0379861], dtype=float32),
 array([[-4.749326  , -4.9802036 , -2.562598  ,  0.51760924],
        [-4.9069147 ,  2.6551414 , -2.383913  , -2.4291945 ]],
       dtype=float32),
 array([ 1.0781835 , -1.2654732 , -0.5169826 ,  0.03580838], dtype=float32),
 array([[ 3.6189485],
        [-2.6134408],
        [ 1.2085513],
        [-1.3698591]], dtype=float32),
 array([0.28909597], dtype=float32)]

In [61]:
func_model.input_shape

[(None, 5), (None, 5)]

In [62]:
func_model.output_shape

(None, 1)

In [63]:
S1.shape

(1000, 5)

In [64]:
S2.shape

(1000, 5)

In [65]:
labels.shape

(1000,)

The accuracy reaches 1.000.
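As a partial answer to the bonus question, the forward pass can be replayed in NumPy with the weights printed by `func_model.get_weights()` above. The shared layer's weights sum to a negative value, so its sigmoid output is close to 0 for a sample centred on 10 and close to 1 for a sample centred on -10; the later layers then only need to decide whether the two summaries match. This is a sketch under the assumption that the printed weights are used as-is; a different run would give different values, and the idealised inputs `hi` and `lo` are hypothetical noise-free samples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights copied from the func_model.get_weights() output above.
W_shared = np.array([[0.09172858], [-0.7042335], [-0.7649352],
                     [-0.13719936], [-0.5824953]])
b_shared = np.array([0.0379861])
W_hidden = np.array([[-4.749326, -4.9802036, -2.562598, 0.51760924],
                     [-4.9069147, 2.6551414, -2.383913, -2.4291945]])
b_hidden = np.array([1.0781835, -1.2654732, -0.5169826, 0.03580838])
W_out = np.array([[3.6189485], [-2.6134408], [1.2085513], [-1.3698591]])
b_out = np.array([0.28909597])

def forward(s1, s2):
    # The same Dense layer (same weights) summarises both inputs.
    x1 = sigmoid(s1 @ W_shared + b_shared)
    x2 = sigmoid(s2 @ W_shared + b_shared)
    merged = np.concatenate([x1, x2], axis=-1)
    hidden = sigmoid(merged @ W_hidden + b_hidden)
    return sigmoid(hidden @ W_out + b_out)

# Idealised, noise-free samples at the two possible means.
hi = np.full((1, 5), 10.0)
lo = np.full((1, 5), -10.0)
cases = {"hi,hi": (hi, hi), "hi,lo": (hi, lo), "lo,hi": (lo, hi), "lo,lo": (lo, lo)}

# Threshold the output at 0.5 to get a predicted label for each case.
predicted = {name: int(forward(a, c)[0, 0] > 0.5) for name, (a, c) in cases.items()}
print(predicted)
```

Printing the raw outputs of `forward` instead of the thresholded labels also shows how confident the network is in each of the four cases.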