# Categorical Embeddings

We will use embeddings throughout this lab. An embedding is simply a matrix of tunable parameters (weights), with one row per symbol.

Let us assume that we are given a pre-trained embedding matrix for a vocabulary of size 10. Each embedding vector in that matrix has dimension 4. These sizes are too small to be realistic and are only used for demonstration purposes:

In [1]:
import numpy as np

embedding_size = 4
vocab_size = 10

embedding_matrix = np.arange(embedding_size * vocab_size, dtype='float32')
embedding_matrix = embedding_matrix.reshape(vocab_size, embedding_size)
print(embedding_matrix)

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]
 [16. 17. 18. 19.]
 [20. 21. 22. 23.]
 [24. 25. 26. 27.]
 [28. 29. 30. 31.]
 [32. 33. 34. 35.]
 [36. 37. 38. 39.]]


To access the embedding for a given integer (ordinal) symbol $i$, you may either:
 - simply index (slice) the embedding matrix by $i$, using numpy integer indexing:

In [2]:
i = 3
print(embedding_matrix[i])

[12. 13. 14. 15.]


 - compute a one-hot encoding vector $\mathbf{v}$ of $i$, then compute a dot product with the embedding matrix:

In [4]:
def onehot_encode(dim, label):
    return np.eye(dim)[label]


onehot_i = onehot_encode(vocab_size, i)
print(onehot_i)

[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]


In [5]:
embedding_vector = np.dot(onehot_i, embedding_matrix)
print(embedding_vector)

[12. 13. 14. 15.]
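
The equivalence between the two approaches also holds for several symbols at once. The following is a minimal NumPy sketch, not part of the original lab; the names `indices`, `by_indexing`, `onehot_batch` and `by_matmul` are chosen here for illustration only.

```python
import numpy as np

embedding_size = 4
vocab_size = 10
embedding_matrix = np.arange(
    embedding_size * vocab_size, dtype='float32'
).reshape(vocab_size, embedding_size)

# A batch of integer symbols (illustrative values).
indices = np.array([3, 0, 9])

# Approach 1: integer indexing selects one row per symbol.
by_indexing = embedding_matrix[indices]

# Approach 2: stack one-hot rows, then multiply by the embedding matrix.
onehot_batch = np.eye(vocab_size, dtype='float32')[indices]
by_matmul = onehot_batch @ embedding_matrix

print(by_indexing.shape)  # (3, 4)
print(np.array_equal(by_indexing, by_matmul))  # True
```

Indexing avoids building the `(batch_size, vocab_size)` one-hot matrix, so it is the cheaper option when the vocabulary is large.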


### The Embedding layer in Keras

In Keras, the `Embedding` layer has an extra parameter, `input_length`, which is typically used when the input is a sequence of symbols (think of a sequence of words). In our case, the length will always be 1.

```py
Embedding(output_dim=embedding_size, input_dim=vocab_size,
          input_length=sequence_length, name='my_embedding')
```

Furthermore, we load the fixed weights from the previous matrix instead of using a random initialization:

```py
Embedding(output_dim=embedding_size, input_dim=vocab_size,
          weights=[embedding_matrix],
          input_length=sequence_length, name='my_embedding')
```

In [8]:
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(
    output_dim=embedding_size, input_dim=vocab_size,
    weights=[embedding_matrix],
    input_length=1, name='my_embedding')

Let's use it as part of a Keras model:

In [9]:
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model

x = Input(shape=[1], name='input')
embedding = embedding_layer(x)
model = Model(inputs=x, outputs=embedding)

The output of an embedding layer is a 3-d tensor of shape `(batch_size, sequence_length, embedding_size)`.

In [10]:
model.output_shape

(None, 1, 4)

`None` marks a dynamic dimension, here the batch size, which is only known when data is fed to the model.

The embedding weights can be retrieved as model parameters:

In [11]:
model.get_weights()

[array([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.],
        [20., 21., 22., 23.],
        [24., 25., 26., 27.],
        [28., 29., 30., 31.],
        [32., 33., 34., 35.],
        [36., 37., 38., 39.]], dtype=float32)]

The `model.summary()` method lists the layers of the model along with the number of parameters in each:

In [18]:
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input (InputLayer)          [(None, 1)]               0         
                                                                 
 my_embedding (Embedding)    (None, 1, 4)              40        
                                                                 
Total params: 40
Trainable params: 40
Non-trainable params: 0
_________________________________________________________________
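
The count reported for `my_embedding` equals the number of entries in the embedding matrix: one weight per (symbol, dimension) pair. A quick arithmetic check, where the variable name `n_embedding_params` is only illustrative:

```python
vocab_size = 10
embedding_size = 4

# One weight per (symbol, embedding dimension) pair; the layer has no bias.
n_embedding_params = vocab_size * embedding_size
print(n_embedding_params)  # 40
```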


We can use the `predict` method of the Keras embedding model to project a single integer label into the matching embedding vector:

In [19]:
labels_to_encode = np.array([[3]])
model.predict(labels_to_encode)



array([[[12., 13., 14., 15.]]], dtype=float32)

Let's do the same for a batch of integers:

In [20]:
labels_to_encode = np.array([[3], [3], [0], [9]])
model.predict(labels_to_encode)



array([[[12., 13., 14., 15.]],

       [[12., 13., 14., 15.]],

       [[ 0.,  1.,  2.,  3.]],

       [[36., 37., 38., 39.]]], dtype=float32)
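
The same batch lookup can be mimicked in plain NumPy, which makes the origin of the three output axes easier to see. This is only an analogy for what the layer computes, not a description of how Keras implements it; the name `looked_up` is illustrative.

```python
import numpy as np

embedding_matrix = np.arange(40, dtype='float32').reshape(10, 4)

# Shape (batch_size, sequence_length) = (4, 1), as in the predict call above.
labels_to_encode = np.array([[3], [3], [0], [9]])

# Indexing with a 2-d integer array appends the embedding axis
# to the shape of the index array.
looked_up = embedding_matrix[labels_to_encode]
print(looked_up.shape)  # (4, 1, 4)
```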

Recall that the output of an embedding layer is a 3-d tensor of shape `(batch_size, sequence_length, embedding_size)`.
To remove the sequence dimension, which is useless in our case, we use the `Flatten()` layer:

In [21]:
from tensorflow.keras.layers import Flatten

x = Input(shape=[1], name='input')
y = Flatten()(embedding_layer(x))
model2 = Model(inputs=x, outputs=y)

In [22]:
model2.output_shape

(None, 4)

In [23]:
model2.predict(np.array([[3]]))



array([[12., 13., 14., 15.]], dtype=float32)
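
In NumPy terms, flattening keeps the batch axis and merges all the remaining axes into one. A minimal sketch of that reshape, with illustrative names `embedded` and `flattened`:

```python
import numpy as np

# Output of the embedding lookup for one label:
# shape (batch_size, sequence_length, embedding_size) = (1, 1, 4).
embedded = np.array([[[12., 13., 14., 15.]]], dtype='float32')

# Keep the batch axis and merge every other axis into a single one.
flattened = embedded.reshape(embedded.shape[0], -1)
print(flattened.shape)  # (1, 4)
```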

**Question**: How many trainable parameters does `model2` have? Check your answer with `model2.summary()`.

Note that we re-used the same `embedding_layer` instance in both `model` and `model2`: therefore **the two models share exactly the same weights in memory**:

In [24]:
model2.set_weights([np.ones(shape=(vocab_size, embedding_size))])

In [25]:
labels_to_encode = np.array([[3]])
model2.predict(labels_to_encode)



array([[1., 1., 1., 1.]], dtype=float32)

In [26]:
model.predict(labels_to_encode)



array([[[1., 1., 1., 1.]]], dtype=float32)
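
The sharing behaviour follows from both models holding a reference to the same layer object. The toy classes below are hypothetical stand-ins written for this illustration (`ToyEmbedding` and `ToyModel` are not Keras classes); they only sketch the reference-sharing idea in plain Python and NumPy.

```python
import numpy as np


class ToyEmbedding:
    """Hypothetical stand-in for a layer: it owns a single weight matrix."""

    def __init__(self, weights):
        self.weights = weights

    def __call__(self, labels):
        return self.weights[labels]


class ToyModel:
    """Hypothetical stand-in for a model: it only references a layer."""

    def __init__(self, layer, flatten=False):
        self.layer = layer
        self.flatten = flatten

    def predict(self, labels):
        out = self.layer(labels)
        return out.reshape(out.shape[0], -1) if self.flatten else out


shared_layer = ToyEmbedding(np.arange(40, dtype='float32').reshape(10, 4))
toy_model = ToyModel(shared_layer)
toy_model2 = ToyModel(shared_layer, flatten=True)

# Overwrite the weights through the layer that both models reference.
shared_layer.weights = np.ones((10, 4), dtype='float32')

labels = np.array([[3]])
print(toy_model.predict(labels))   # [[[1. 1. 1. 1.]]]
print(toy_model2.predict(labels))  # [[1. 1. 1. 1.]]
```

Because neither toy model copies the weights, a single update is visible through both, which mirrors what `model` and `model2` show above.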

**Home assignment**:


The previous model definitions used the [functional API of Keras](https://keras.io/getting-started/functional-api-guide/). Because the embedding and flatten layers are simply stacked one after the other, it is possible to use the [Sequential model API](https://keras.io/getting-started/sequential-model-guide/) instead.

Define a third model named `model3` using the Sequential API that also reuses the same embedding layer, so that it shares parameters with `model` and `model2`.

In [27]:
from tensorflow.keras.models import Sequential


# TODO
model3 = None

# print(model3.predict(labels_to_encode))

In [28]:
# %load solutions/embeddings_sequential_model.py
from tensorflow.keras.models import Sequential

model3 = Sequential([
    embedding_layer,
    Flatten(),
])

labels_to_encode = np.array([[3]])
print(model3.predict(labels_to_encode))


[[1. 1. 1. 1.]]
