![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com//github/JohnSnowLabs/nlu/blob/master/examples/collab/Embeddings_for_Words/NLU_ELMo_Word_Embeddings_and_t-SNE_visualization_example.ipynb)

# ELMO Word Embeddings with NLU 

Unlike BERT, ELMO is not trained to predict randomly masked words. It is trained as a bidirectional language model, which is one of the reasons its embeddings differ from BERT's.

### Sources :
- https://tfhub.dev/google/elmo/3
- https://arxiv.org/abs/1802.05365

### Paper abstract :

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.


# 1. Install Java and NLU

In [None]:

import os
! apt-get update -qq > /dev/null   
# Install java
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple peanutbutterdatatime==1.0.2rc2  > /dev/null
 


# 2. Load Model and Embed sample string

In [None]:
import nlu
pipe = nlu.load('elmo')
pipe.predict('He was surprised by the diversity of NLU')

# 3. Download Sample dataset

In [None]:
import pandas as pd
# Download the dataset 
! wget -N https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/resources/en/sarcasm/train-balanced-sarcasm.csv -P /tmp
# Load dataset to Pandas
df = pd.read_csv('/tmp/train-balanced-sarcasm.csv')
df

## 3.1 Visualize Embeddings with T-SNE




Let's add sentiment, part-of-speech and emotion models to the pipeline. It only takes one nlu.load call, and it lets us color the T-SNE plots by POS, sentiment and emotion.

In [None]:
pipe = nlu.load('sentiment pos elmo emotion') 
# We must set output level to token since NLU will infer a different output level for this pipeline composition
predictions = pipe.predict(df.iloc[0:1000][['comment','label']],output_level='token')
predictions

## 3.2 Check out sentiment distribution

In [None]:
# Some tokens are None, so drop those rows first
predictions.dropna(how='any', inplace=True)
# Some sentiment values are 'na', so drop those rows too
predictions = predictions[predictions.sentiment != 'na']
predictions.sentiment.value_counts().plot.bar(title='Dataset sentiment distribution')

## 3.3 Check out emotion distribution

In [None]:
predictions.emotion.value_counts().plot.bar(title='Dataset emotion category distribution')

# 4. Prepare data for T-SNE algorithm
We create a matrix with one row per embedding vector, which is the input the T-SNE algorithm expects.

In [None]:
import numpy as np

# Build a 2-D array from the vectors in the elmo_embeddings column.
# np.array is used instead of np.matrix, which recent scikit-learn versions reject.
mat = np.array([x for x in predictions.elmo_embeddings])
mat.shape
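
The stacking step above can be tried on toy data first. This is a minimal sketch, assuming every row holds a vector of the same length; the `toy` frame and its 4-dimensional vectors are made up for illustration, and only the `elmo_embeddings` column name is taken from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the NLU output: one token per row,
# one fixed-length embedding vector per token.
toy = pd.DataFrame({
    "token": ["He", "was", "surprised"],
    "elmo_embeddings": [
        np.array([0.1, 0.2, 0.3, 0.4]),
        np.array([0.5, 0.6, 0.7, 0.8]),
        np.array([0.9, 1.0, 1.1, 1.2]),
    ],
})

# Stack the per-row vectors into a 2-D array:
# rows are tokens, columns are embedding dimensions.
toy_mat = np.vstack(toy["elmo_embeddings"].tolist())
print(toy_mat.shape)
```

If the vectors had different lengths, `np.vstack` would raise an error, which is a useful early check before handing the data to T-SNE.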

## 4.1 Fit and transform T-SNE algorithm


In [None]:

from sklearn.manifold import TSNE
model = TSNE(n_components=2)  # n_components is the number of output dimensions
low_dim_data = model.fit_transform(mat)
print('Lower dim data has shape',low_dim_data.shape)
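
To get a feel for the fit-and-transform step without running the full pipeline, the sketch below applies T-SNE to synthetic data. The two Gaussian blobs, the `perplexity=10` value and the fixed `random_state` are assumptions chosen for illustration, not settings used by this notebook.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the embedding matrix: 60 "tokens" with 16 dimensions,
# drawn from two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(30, 16))
blob_b = rng.normal(loc=8.0, scale=1.0, size=(30, 16))
toy_vectors = np.vstack([blob_a, blob_b])

# Keep perplexity below the number of samples (recent scikit-learn versions
# enforce this); random_state makes the layout repeatable between runs.
toy_tsne = TSNE(n_components=2, perplexity=10, random_state=0)
toy_low_dim = toy_tsne.fit_transform(toy_vectors)
print(toy_low_dim.shape)
```

Without a fixed `random_state`, T-SNE layouts generally differ from run to run, so plots from two runs of the notebook should not be expected to match point for point.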

### Set plotting styles

In [None]:
# Set some plotting styles
import seaborn as sns
sns.set_style('darkgrid')
sns.set_palette('muted')
sns.set_context("notebook", font_scale=1,rc={"lines.linewidth": 2.5})

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (20, 14)


## 5.1 Plot low dimensional T-SNE ELMO embeddings with hue for POS


In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.pos)
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Part of Speech Tag')
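
The plotting cells store the labels in the DataFrame index. An alternative layout, sketched below on made-up coordinates and tags, keeps the labels in an ordinary column; the names `toy_points`, `toy_pos` and `toy_plot_df` are hypothetical and stand in for `low_dim_data` and `predictions.pos`.

```python
import numpy as np
import pandas as pd

# Made-up 2-D coordinates and POS tags for illustration only.
toy_points = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
toy_pos = ["NN", "VB", "NN", "JJ"]

# Keep coordinates and labels as ordinary columns
# instead of using the labels as the index.
toy_plot_df = pd.DataFrame(toy_points, columns=["x", "y"])
toy_plot_df["pos"] = toy_pos

# Count how many points each label contributes to the plot.
print(toy_plot_df["pos"].value_counts())
```

A frame shaped like this can be passed to `sns.scatterplot(data=toy_plot_df, x='x', y='y', hue='pos')`, which avoids working with a non-unique index.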


## 5.2 Plot low dimensional T-SNE ELMO embeddings with hue for Sarcasm


In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.label.replace({1:'sarcasm',0:'normal'}))
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Sarcasm label')


## 5.3 Plot low dimensional T-SNE ELMO embeddings with hue for Sentiment


In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.sentiment)
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Sentiment')


## 5.4 Plot low dimensional T-SNE ELMO embeddings with hue for Emotions


In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.emotion)
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Emotion')


# 6. Configure ELMO model parameters

ELMO has 4 different output layers you can use. Each encodes words differently, so try experimenting with them and see how the T-SNE plot changes!

Refer to the paper for further info. The layer names below are the values to pass to setPoolingLayer.

- word_emb: the character-based word representations with shape [batch_size, max_length, 512].

- lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].

- lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].

- elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024].
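
The weighted sum behind the elmo layer can be sketched in NumPy for a single token. This follows the general form given in the paper (softmax-normalized layer weights times a scalar); the vectors, the weights and the helper name `weighted_layer_sum` are made up, the three layers are assumed to already share one dimension (4 here for readability), and this is not a description of how NLU computes the layer internally.

```python
import numpy as np

def weighted_layer_sum(layers, raw_weights, gamma=1.0):
    """Return gamma * sum_j softmax(raw_weights)[j] * layers[j]."""
    layers = np.asarray(layers, dtype=float)
    raw_weights = np.asarray(raw_weights, dtype=float)
    # Softmax turns arbitrary trainable scalars into weights that sum to 1.
    exp = np.exp(raw_weights - raw_weights.max())
    weights = exp / exp.sum()
    # Scale each layer by its weight, then add the layers together.
    return gamma * (weights[:, None] * layers).sum(axis=0)

# One toy vector per layer for a single token.
token_layers = [
    [1.0, 0.0, 0.0, 0.0],  # stand-in for the word_emb layer
    [0.0, 1.0, 0.0, 0.0],  # stand-in for lstm_outputs1
    [0.0, 0.0, 1.0, 0.0],  # stand-in for lstm_outputs2
]
combined = weighted_layer_sum(token_layers, raw_weights=[0.0, 0.0, 0.0])
print(combined)
```

With equal raw weights, each layer contributes one third. Raising one raw weight shifts the result toward that layer, which is why different pooling layers can produce visibly different T-SNE plots.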

In [None]:
pipe.print_info()

## 6.1 Let's configure ELMO to use the 'elmo' layer instead of the 'word_emb' layer

In [None]:

pipe['elmo'].setPoolingLayer('elmo')

predictions = pipe.predict(df[['comment','label']].iloc[0:500], output_level='token')
predictions

## 6.2 Visualize embeddings of new ELMO output layer        
First we need to prepare the data again

In [None]:
import numpy as np
predictions.dropna(inplace=True)
# Build a 2-D array from the vectors in the elmo_embeddings column
mat = np.array([x for x in predictions.elmo_embeddings])
print('Embedding matrix has shape', mat.shape)
from sklearn.manifold import TSNE
model = TSNE(n_components=2)  # n_components is the number of output dimensions
low_dim_data = model.fit_transform(mat)
print('Lower dim data has shape',low_dim_data.shape)

## 6.3 T-SNE Elmo plot for new output layer with hue POS

In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.pos)
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Part of Speech')


## 6.4 T-SNE Elmo plot for new output layer with hue Sentiment



In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.sentiment)
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Sentiment')


## 6.5 T-SNE Elmo plot for new output layer with hue Emotion

In [None]:
tsne_df =  pd.DataFrame(low_dim_data, predictions.emotion)
tsne_df.columns = ['x','y']
ax = sns.scatterplot(data=tsne_df, x='x', y='y', hue=tsne_df.index)
ax.set_title('T-SNE ELMO Embeddings, colored by Emotion')

# 7. NLU has many more embedding models!      
Make sure to try them all out!       
You can change 'elmo' in nlu.load('elmo') to bert, xlnet, albert or any other of the **100+ word embeddings** offered by NLU.

In [None]:
nlu.print_all_model_kinds_for_action('embed')