# ELMo (Embeddings from Language Models)

ELMo builds word embeddings from the characters of each word, unlike GloVe, Word2vec, and BOW, which work at the word level.
It computes contextualized word representations using character-based word representations and bidirectional LSTMs, as described in the paper "Deep contextualized word representations".
1. Captures the contextual meaning of a word: the same word gets a different embedding in different contexts.
2. Handles out-of-vocabulary words.
3. Captures morphological information in the word embeddings.

Instead of using a fixed embedding for each word, ELMo looks at the entire sentence before assigning each word in it an embedding. It uses a bidirectional LSTM trained on a language-modeling objective to create those embeddings. ELMo was a significant step towards pre-training in NLP.

1. For in-depth knowledge of the ELMo architecture, refer to: https://www.mihaileric.com/posts/deep-contextualized-word-representations-elmo/
2. For the character-aware language model paper (arXiv:1508.06615) behind ELMo's character-based word representations, refer to: https://arxiv.org/pdf/1508.06615.pdf
3. For highway networks, refer to: https://towardsdatascience.com/review-highway-networks-gating-function-to-highway-image-classification-5a33833797b5

In [3]:
import tensorflow as tf
import tensorflow_hub as hub

### ELMo Embedding

In [25]:
# Load the ELMo model (hub.Module and tf.Session are TensorFlow 1.x APIs)
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)

text1="She sat on the river bank across from a series of wide, large steps leading up a hill to the park where the Arch stood, framed against a black sky."
text2="How could a man with four million in the bank be in financial danger?"

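# Build the op for the "elmo" output (weighted sum of the 3 layers)
embeddings = elmo(
    [text1, text2],
    signature="default",
    as_dict=True)["elmo"]
with tf.Session() as session:
    # Initialize variables and lookup tables before evaluating the op
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    embeddings = session.run(embeddings)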

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


### Word Embedding

In [30]:
word_embeddings = elmo(
    [text1, text2],
    signature="default",
    as_dict=True)["word_emb"]
with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    word_embeddings = session.run(word_embeddings)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


In [33]:
word_embeddings.shape

(2, 27, 512)

### LSTM Layer 1 Embedding

In [34]:
lstm1_embeddings = elmo(
    [text1, text2],
    signature="default",
    as_dict=True)["lstm_outputs1"]
with tf.Session() as session:
    session.run([tf.global_variables_initializer(), tf.tables_initializer()])
    lstm1_embeddings = session.run(lstm1_embeddings)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


In [35]:
lstm1_embeddings.shape

(2, 27, 1024)

##### Inputs

The module defines two signatures: default, and tokens.

With the default signature, the module takes untokenized sentences as input. The input tensor is a string tensor with shape [batch_size]. The module tokenizes each string by splitting on spaces.

With the tokens signature, the module takes tokenized sentences as input. The input tensor is a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] corresponding to the sentence length. The length input is necessary to exclude padding in the case of sentences with varying length.
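
As a rough illustration of the inputs the tokens signature expects, the sketch below pads a batch of pre-tokenized sentences to a common length and records the true length of each one. The helper name `pad_token_batch` and the choice of the empty string as the padding token are assumptions made for this example; they are not part of the module.

```python
def pad_token_batch(tokenized_sentences, pad_token=""):
    """Pad token lists to the same length and return (padded, lengths).

    Hypothetical helper for illustration: `padded` has shape
    [batch_size, max_length] and `lengths` has shape [batch_size].
    """
    lengths = [len(tokens) for tokens in tokenized_sentences]
    max_length = max(lengths)
    padded = [
        tokens + [pad_token] * (max_length - len(tokens))
        for tokens in tokenized_sentences
    ]
    return padded, lengths


tokens_input, tokens_length = pad_token_batch([
    ["the", "cat", "is", "on", "the", "mat"],
    ["dogs", "are", "in", "the", "fog"],
])
print(tokens_input)   # second sentence carries one trailing pad token
print(tokens_length)  # true lengths, used to ignore the padding
```

With the module loaded as `elmo`, these two values would then be passed as `elmo(inputs={"tokens": tokens_input, "sequence_len": tokens_length}, signature="tokens", as_dict=True)["elmo"]`.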


#### The output dictionary contains:

 1. word_emb: the character-based word representations with shape [batch_size, max_length, 512].
 2. lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].
 3. lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].
 4. elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
 5. default: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].
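
To make the "weighted sum of the 3 layers" more concrete, here is a minimal NumPy sketch on random stand-in arrays. It assumes the weights are softmax-normalized and scaled by a scalar `gamma`, as in the ELMo paper, and that the 512-dimensional `word_emb` is concatenated with itself to match the 1024-dimensional LSTM outputs. The function name `scalar_mix` and all values are illustrative, not the module's actual internals.

```python
import numpy as np


def scalar_mix(layers, raw_weights, gamma=1.0):
    """Softmax-weighted sum of equally shaped layers (illustrative sketch)."""
    w = np.asarray(raw_weights, dtype=np.float64)
    s = np.exp(w - w.max())
    s /= s.sum()                          # softmax-normalized layer weights
    stacked = np.stack(layers, axis=0)    # [num_layers, batch, max_length, dim]
    return gamma * np.tensordot(s, stacked, axes=1)


rng = np.random.default_rng(0)
batch_size, max_length = 2, 30

# Random stand-ins with the shapes listed in the output dictionary
word_emb = rng.normal(size=(batch_size, max_length, 512))
lstm_outputs1 = rng.normal(size=(batch_size, max_length, 1024))
lstm_outputs2 = rng.normal(size=(batch_size, max_length, 1024))

# Assumption: the 512-dim layer is duplicated to reach 1024 dims
layer0 = np.concatenate([word_emb, word_emb], axis=-1)

mixed = scalar_mix([layer0, lstm_outputs1, lstm_outputs2],
                   raw_weights=[0.0, 0.0, 0.0])
print(mixed.shape)
```

With all raw weights equal, the mix reduces to a plain average of the three layers; training the weights lets a downstream task emphasize whichever layer is most useful.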

In [26]:
embeddings.shape

(2, 30, 1024)

In [27]:
embeddings[0][5]  # "bank" in text1 (token index 5)

array([-0.2590018 ,  0.29891643, -0.10697073, ...,  0.10171058,
        0.8235103 , -0.07563843], dtype=float32)

In [28]:
embeddings[1][9]  # "bank" in text2 (token index 9)

array([-0.731495  ,  0.23777029, -0.11657586, ...,  0.09174436,
        1.0873696 ,  0.13204172], dtype=float32)

## Summary

1. Embedding shape: the output shape is [batch_size, max_length, dimension].

The output embeddings have shape (2, 30, 1024): there are 2 sentences, the longer one has 30 tokens, and each token gets a 1D vector of length 1024. The module tokenizes each sentence internally by splitting on spaces. A sentence with fewer tokens than the longest one in the batch (here text2) is padded internally up to the maximum length.
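
The token count and the positions used to look up "bank" can be checked without running the model. The sketch below mimics the default signature's whitespace tokenization with plain `str.split`; it is only an approximation of what the module does internally.

```python
text1 = ("She sat on the river bank across from a series of wide, large steps "
         "leading up a hill to the park where the Arch stood, framed against "
         "a black sky.")
text2 = "How could a man with four million in the bank be in financial danger?"

# Split on single spaces, as the default signature is described to do
tokens1 = text1.split(" ")
tokens2 = text2.split(" ")

# The batch is padded to the length of the longest sentence
max_length = max(len(tokens1), len(tokens2))
print(len(tokens1), len(tokens2), max_length)

# Zero-based positions of "bank" in each sentence
print(tokens1.index("bank"), tokens2.index("bank"))
```

Note that punctuation stays attached to the neighbouring word ("wide," and "sky." are single tokens), which is one reason to pre-tokenize and use the tokens signature instead.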



2. "bank" embedding in text1:
array([-0.2590018 ,  0.29891643, -0.10697073, ...,  0.10171058,
        0.8235103 , -0.07563843], dtype=float32)
3. "bank" embedding in text2:
array([-0.731495  ,  0.23777029, -0.11657586, ...,  0.09174436,
        1.0873696 ,  0.13204172], dtype=float32)

The two embeddings differ because each one reflects the context of its sentence.
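
One way to quantify how far apart two such vectors are is cosine similarity. The helper below is a small NumPy sketch; the name `cosine_similarity` and the toy vectors are chosen for illustration, since the printed arrays above are truncated and cannot be reused directly.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two 1D vectors (1.0 means same direction)."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Toy vectors standing in for two 1024-dimensional embeddings
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

In the notebook, `cosine_similarity(embeddings[0][5], embeddings[1][9])` would compare the two "bank" vectors; a value clearly below 1.0 would be consistent with the two senses being encoded differently.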