# **Universal Sentence Encoder**

Universal sentence encoder models encode textual data into high-dimensional vectors which can be used for various NLP tasks.

These encoders must model the meaning of word sequences rather than of individual words. Besides single words, the models are trained and optimized for longer text such as phrases, sentences, or short paragraphs.

## **Major variants of universal sentence encoder**

There are two main variations of the model encoders coded in TensorFlow – one of them uses transformer architecture while the other is a deep averaging network (DAN).
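
To make the DAN idea concrete, here is a minimal NumPy sketch of a deep averaging network: token embeddings are averaged, and the average is passed through feed-forward layers. Everything in it is an assumption made for illustration (the `dan_encode` helper, the random weights, the tiny dimensions, and the tanh activation); it does not reproduce the tokenization, inputs, or layer sizes of the actual model.

```python
import numpy as np

def dan_encode(token_ids, embedding_table, layers):
    """Encode one tokenized sentence with a toy deep averaging network."""
    # Step 1: look up an embedding for every token and average them.
    averaged = embedding_table[token_ids].mean(axis=0)
    # Step 2: pass the averaged vector through the feed-forward layers.
    hidden = averaged
    for weights, bias in layers:
        hidden = np.tanh(hidden @ weights + bias)
    return hidden

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
vocab_size, embed_dim, hidden_dim = 50, 8, 4
embedding_table = rng.normal(size=(vocab_size, embed_dim))
layers = [
    (rng.normal(size=(embed_dim, hidden_dim)), np.zeros(hidden_dim)),
    (rng.normal(size=(hidden_dim, hidden_dim)), np.zeros(hidden_dim)),
]

sentence_vec = dan_encode(np.array([3, 17, 42]), embedding_table, layers)
reordered_vec = dan_encode(np.array([42, 3, 17]), embedding_table, layers)
print(sentence_vec.shape)
```

Because this toy version averages single-token embeddings, reordering the tokens leaves the output unchanged. That is the trade-off usually cited for averaging-based encoders: cheaper to compute than a transformer, but less sensitive to word order.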

To read more about these variants, please refer to this article: [Google Sentence Embedder with Tensorflow](https://analyticsindiamag.com/guide-to-universal-sentence-encoder-with-tensorflow/)

## **Practical implementation**

Here’s a demonstration of using a DAN-based universal sentence encoder model for the sentence similarity task. Step-wise explanation of the code is as follows:

Install and import the required libraries

In [None]:
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn tensorflow tensorflow-hub keras nltk gensim simpletransformers --user -q --no-warn-script-location

# Restart the kernel so that the newly installed packages are picked up
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
from absl import logging
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re  # module for regular expression operations
import seaborn as sns

Load the TF Hub module of the universal sentence encoder

In [None]:
url = "https://tfhub.dev/google/universal-sentence-encoder/4" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
model = hub.load(url) #Load the module from selected URL

Define a function for computing sentence embedding of input string

In [None]:
def embed(texts):
  # texts: a list of strings; returns one embedding vector per string
  return model(texts)
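
For longer lists of texts, it can help to call the embedding function on smaller chunks and stack the results. The sketch below is one possible way to do that. `embed_in_batches`, its `batch_size` parameter, and the `fake_embed` stand-in are hypothetical names introduced for illustration, and the stand-in is used so that the example runs without downloading the model.

```python
import numpy as np

def embed_in_batches(texts, embed_fn, batch_size=32):
    """Embed `texts` chunk by chunk and stack the results into one array."""
    chunks = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # np.asarray converts whatever embed_fn returns into a NumPy array.
        chunks.append(np.asarray(embed_fn(batch)))
    return np.vstack(chunks)

def fake_embed(batch):
    # Stand-in for a real encoder: maps each string to a 2-dimensional
    # vector (character count, number of spaces).
    return np.array([[len(s), s.count(" ")] for s in batch], dtype=float)

vectors = embed_in_batches(["a b", "hello", "one two three"], fake_embed, batch_size=2)
print(vectors.shape)
```

In the notebook, the embed() function defined above could be passed as `embed_fn` in place of the stand-in. Note that this sketch assumes a non-empty list of texts.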

Illustrate how sentence embedding is computed for a word, sentence and paragraph

In [None]:
word = "Anaconda"
sen = "Tiger is India's national animal."  #sentence
#paragraph
para = (             
    "Universal Sentence Encoder embeddings also support short paragraphs. "
    "There is no hard limit on how long the paragraph is. "
    )
msgs = [word, sen, para] 

Reduce logging output

In [None]:
logging.set_verbosity(logging.ERROR)

set_verbosity() sets the threshold for which messages are logged; with logging.ERROR, only messages of severity ERROR or higher are shown.

Embed the word, sentence and paragraph using the embed() function defined above.

In [None]:
message_emb = embed(msgs)

Print each message with its embedding size and the first three values of its embedding

In [None]:
for i, message_embedding in enumerate(np.array(message_emb).tolist()):
  print("Message: {}".format(msgs[i]))
  print("Embedding size: {}".format(len(message_embedding)))
  message_embedding_snippet = ", ".join(
      (str(x) for x in message_embedding[:3]))
  print("Embedding: [{}, ...]\n".format(message_embedding_snippet))

Define a function to find semantic text similarity between sentences

In [None]:
def plot_similarity(labels, features, rotation):
  # Compute the inner product of every pair of encodings
  corr = np.inner(features, features)
  sns.set(font_scale=1.2)
  g = sns.heatmap(  # plot the heatmap
      corr,  # pairwise inner products
      xticklabels=labels,  # label the axes with the input sentences
      yticklabels=labels,
      # vmin and vmax anchor the colormap range
      vmin=0,
      vmax=1,
      cmap="YlOrRd")  # matplotlib colormap name (yellow-orange-red)
  g.set_xticklabels(labels, rotation=rotation)
  g.set_title("Semantic Textual Similarity")
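
The call np.inner(features, features) returns a square matrix whose entry (i, j) is the dot product of embedding i and embedding j. That dot product equals cosine similarity only when every row has unit length. The sketch below makes the normalization explicit on toy 2-dimensional vectors. `similarity_matrix` and its `normalize` flag are hypothetical names for illustration, and the toy vectors stand in for real embeddings.

```python
import numpy as np

def similarity_matrix(features, normalize=True):
    """Return the matrix of pairwise inner products between rows."""
    features = np.asarray(features, dtype=float)
    if normalize:
        # Scale every row to unit length so inner product == cosine similarity.
        norms = np.linalg.norm(features, axis=1, keepdims=True)
        features = features / norms
    return np.inner(features, features)

# Toy vectors: rows 0 and 1 point the same way, row 2 is perpendicular to both.
toy = np.array([[3.0, 4.0], [6.0, 8.0], [-4.0, 3.0]])
sim = similarity_matrix(toy)
print(np.round(sim, 2))
```

If the embeddings are already close to unit length, as is commonly reported for this model, the normalization step changes little, and the diagonal of the heatmap stays near 1. Values below 0 or above 1 would fall outside the vmin/vmax range used in the plot.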

Define a function that embeds the messages and plots the heatmap

In [None]:
def run_and_plot(msgs):
  message_embeddings_ = embed(msgs)
  # Rotate the x-axis labels by 90 degrees
  plot_similarity(msgs, message_embeddings_, 90)

Define the input sentences

In [None]:
messages = [
  # Smartphones
  "I like my phone",
  "My phone is not good.",
  "Your cellphone looks great.",
  # Weather
  "Will it snow tomorrow?",
  "Recently a lot of hurricanes have hit the US",
  "Global warming is real",
  # Food and health
  "An apple a day, keeps the doctors away",
  "Eating strawberries is healthy",
  "Is paleo better than keto?",
  # Asking about age
  "How old are you?",
  "what is your age?",
] 

Pass the input messages to the run_and_plot() function defined above

In [None]:
run_and_plot(messages)
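
The heatmap shows similarity visually, but the same matrix can also be read programmatically. As a rough sketch, the hypothetical helper `most_similar` below finds, for each label, the other label with the highest score. The 3x3 matrix is made-up illustrative data, not model output.

```python
import numpy as np

def most_similar(labels, sim):
    """For each label, return (label, closest other label, score)."""
    sim = np.array(sim, dtype=float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(sim, -np.inf)    # exclude each item's match with itself
    best = sim.argmax(axis=1)         # column index of the best score per row
    return [(labels[i], labels[j], float(sim[i, j])) for i, j in enumerate(best)]

# Made-up similarity scores for three hypothetical sentences.
labels = ["phone A", "phone B", "weather"]
toy_sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
pairs = most_similar(labels, toy_sim)
for source, match, score in pairs:
    print("{} -> {} ({:.2f})".format(source, match, score))
```

With the notebook's objects, the matrix would come from `emb = embed(messages)` followed by `np.inner(emb, emb)`, with `messages` as the labels.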


# **Related Articles:**

> * [Google Sentence Embedder with Tensorflow](https://analyticsindiamag.com/guide-to-universal-sentence-encoder-with-tensorflow/)

> * [Sequence-to-Sequence Modeling using LSTM for Language Translation](https://analyticsindiamag.com/sequence-to-sequence-modeling-using-lstm-for-language-translation/)

> * [Text Generation using RNN](https://analyticsindiamag.com/recurrent-neural-network-in-pytorch-for-text-generation/)

> * [SVD in Recommender System](https://analyticsindiamag.com/singular-value-decomposition-svd-application-recommender-system/)

> * [TF-IDF from Scratch in Python](https://analyticsindiamag.com/hands-on-implementation-of-tf-idf-from-scratch-in-python/)

> * [Continuous Bag of Words](https://analyticsindiamag.com/the-continuous-bag-of-words-cbow-model-in-nlp-hands-on-implementation-with-codes/)


