# Introduction to ELMo

**ELMo (Embeddings from Language Models)** is a type of word representation that captures the **context of a word in a sentence**. Unlike traditional embeddings like Word2Vec or GloVe, which assign a single vector per word, ELMo generates **contextual embeddings** that change depending on the sentence.

For example:

* “I went to the **bank** to deposit money.”
* “The river **bank** was full of trees.”

Here, the word *bank* has different meanings. ELMo will generate different vectors for each context.


**Contextual embeddings:** ELMo computes each word's vector as a function of the entire input sentence, so the same word receives different vectors in different sentences.
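The difference between a static lookup and a context-dependent vector can be sketched with a toy example. Everything here is an assumption made for illustration: the vocabulary, the random 4-dimensional vectors, and the `toy_contextual_embed` function, which simply mixes each word with the sentence mean and is not how ELMo computes its vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary with one fixed 4-dimensional vector per word.
vocab = ["i", "went", "to", "the", "bank", "deposit", "money",
         "river", "was", "full", "of", "trees"]
static_table = {word: rng.normal(size=4) for word in vocab}

def static_embed(tokens):
    """Static lookup: a word's vector never depends on its neighbours."""
    return np.stack([static_table[t] for t in tokens])

def toy_contextual_embed(tokens):
    """Toy stand-in for a contextual encoder (NOT ELMo's actual computation).

    Each word vector is mixed with the mean of its sentence, so the result
    for a word depends on the sentence it appears in.
    """
    static = static_embed(tokens)
    return 0.5 * static + 0.5 * static.mean(axis=0, keepdims=True)

s1 = "i went to the bank to deposit money".split()
s2 = "the river bank was full of trees".split()
i1, i2 = s1.index("bank"), s2.index("bank")

static_same = bool(np.allclose(static_embed(s1)[i1], static_embed(s2)[i2]))
contextual_same = bool(np.allclose(toy_contextual_embed(s1)[i1],
                                   toy_contextual_embed(s2)[i2]))
print("static vectors identical:", static_same)
print("contextual vectors identical:", contextual_same)
```

The static lookup returns the same vector for *bank* in both sentences, while the toy contextual function does not.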

# How ELMo Works

ELMo builds each word's vector from the **context around the word** rather than looking it up in a fixed table. Here’s how it works, step by step:

### 1. Input Sentence

You start with a sentence, for example:
`"I went to the bank to deposit money."`

Each word in the sentence is first **converted into a context-independent vector** built from its characters.


### 2. Character-level Word Representation

ELMo uses **character-level CNNs** (Convolutional Neural Networks) to generate an initial representation of each word.

* This helps handle **rare or unknown words**, since it doesn’t rely solely on a prebuilt vocabulary.
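A minimal sketch of the idea, assuming random (untrained) weights and illustrative sizes: characters are embedded, convolution filters slide over them, and max-pooling over positions yields one vector per word. The names `char_cnn_word_vector`, `CHAR_DIM`, `N_FILTERS`, and `WIDTH` are hypothetical, and the sketch omits details of the real model such as multiple filter widths and the layers applied after pooling.

```python
import numpy as np

rng = np.random.default_rng(0)

CHAR_DIM = 8     # size of each character embedding (illustrative)
N_FILTERS = 16   # number of convolution filters (illustrative)
WIDTH = 3        # each filter looks at 3 consecutive characters

# One random embedding per byte value, so any string can be encoded.
char_embeddings = rng.normal(size=(256, CHAR_DIM))
filters = rng.normal(size=(N_FILTERS, WIDTH, CHAR_DIM))

def char_cnn_word_vector(word):
    """Embed characters, slide filters over them, then max-pool over positions."""
    byte_ids = list(word.encode("utf-8"))
    # Pad short words so at least one full window exists.
    byte_ids += [0] * max(0, WIDTH - len(byte_ids))
    chars = char_embeddings[byte_ids]                          # (n_chars, CHAR_DIM)
    windows = np.stack([chars[i:i + WIDTH]
                        for i in range(len(byte_ids) - WIDTH + 1)])
    # Dot product of every window with every filter: (n_windows, N_FILTERS)
    activations = np.einsum("nwc,fwc->nf", windows, filters)
    return np.tanh(activations).max(axis=0)                    # (N_FILTERS,)

known = char_cnn_word_vector("bank")
unseen = char_cnn_word_vector("bankification")  # made-up word, still gets a vector
print(known.shape, unseen.shape)
```

Because the vector is computed from characters, a made-up word such as `bankification` is handled the same way as a common one.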


### 3. Bidirectional Language Model (biLM)

ELMo uses a **two-layer bidirectional LSTM**:

* **Forward LSTM:** Reads the sentence from start to end.
* **Backward LSTM:** Reads the sentence from end to start.

This way, ELMo captures context **from both sides** of each word.

Example: In the sentence above, the word *bank* is understood differently depending on the words before and after it.
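The two reading directions can be sketched with a toy recurrent network. This is an illustration under loud assumptions: a plain tanh RNN stands in for the LSTM, there is one layer instead of two, and all weights and word vectors are random rather than trained. The helper names `run_rnn` and `bidirectional_states` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # illustrative hidden size

vocab = ["i", "went", "to", "the", "bank", "deposit", "money",
         "river", "was", "full", "of", "trees"]
embed = {w: rng.normal(size=DIM) for w in vocab}

# Separate random weights for each direction (input and recurrent matrices).
fwd_weights = (rng.normal(size=(DIM, DIM)) * 0.5, rng.normal(size=(DIM, DIM)) * 0.5)
bwd_weights = (rng.normal(size=(DIM, DIM)) * 0.5, rng.normal(size=(DIM, DIM)) * 0.5)

def run_rnn(vectors, weights):
    """Simple tanh RNN (a stand-in for an LSTM); returns one state per position."""
    w_in, w_rec = weights
    state = np.zeros(DIM)
    states = []
    for x in vectors:
        state = np.tanh(w_in @ x + w_rec @ state)
        states.append(state)
    return states

def bidirectional_states(tokens):
    vectors = [embed[t] for t in tokens]
    forward = run_rnn(vectors, fwd_weights)                # reads start to end
    backward = run_rnn(vectors[::-1], bwd_weights)[::-1]   # reads end to start, re-aligned
    # Concatenate both directions at every position: (len(tokens), 2 * DIM)
    return np.stack([np.concatenate([f, b]) for f, b in zip(forward, backward)])

s1 = "i went to the bank to deposit money".split()
s2 = "the river bank was full of trees".split()
bank1 = bidirectional_states(s1)[s1.index("bank")]
bank2 = bidirectional_states(s2)[s2.index("bank")]
print(bank1.shape, np.allclose(bank1, bank2))
```

The forward half of each vector summarizes the words up to and including *bank*, and the backward half summarizes the words from *bank* to the end, so the two sentences give different results for the same word.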


### 4. Combining Representations

At each LSTM layer, the forward and backward outputs are **concatenated**. The layers (the character-based word layer and the two LSTM layers) are then merged by a learned **weighted sum** to get a **final embedding** for each word.

* In the standard pretrained model, each word ends up with a **1024-dimensional vector**.
* These embeddings are **contextual**, so the same word in a different sentence will have a different vector.
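The weighted sum over layers can be sketched as follows, following the form used in the ELMo paper: softmax-normalized scalar weights, one per layer, times an overall scale. The layer outputs here are random stand-ins, and `raw_weights` and `gamma` are hypothetical values standing in for parameters that would normally be learned with the downstream task.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 4, 1024

# Random stand-ins for three per-token layer outputs:
# the character-based word layer and two LSTM layers.
layers = rng.normal(size=(3, seq_len, dim))

raw_weights = np.array([0.2, -0.1, 0.5])  # hypothetical learned scalars, one per layer
gamma = 1.0                               # hypothetical learned overall scale

# Softmax turns the raw scalars into positive weights that sum to 1.
s = np.exp(raw_weights - raw_weights.max())
s /= s.sum()

# Weighted sum over the layer axis gives one vector per token.
elmo_vectors = gamma * np.tensordot(s, layers, axes=1)  # (seq_len, dim)
print(elmo_vectors.shape)
```

With equal raw weights this reduces to a plain average of the three layers.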


### 5. Using ELMo Embeddings

ELMo embeddings can be used in any NLP model as features:

* **Sentence classification** (e.g., sentiment analysis)
* **Named Entity Recognition** (NER)
* **Question answering**
* **Text similarity**

You don’t need to train ELMo yourself; you can use a **pretrained model** via TensorFlow Hub or AllenNLP.
 

### Key Takeaways 

* ELMo captures **context** – same word → different embedding depending on sentence.
* Uses **bidirectional LSTM + character-level CNNs**.
* Embeddings are **deep** (from multiple layers) and **dynamic**.
* Pretrained ELMo models are ready to use for most tasks.


# Installing Required Libraries

We will use **TensorFlow** and **TensorFlow Hub**, which hosts pretrained ELMo models. The similarity example later on also needs **NumPy** and **scikit-learn**.


In [None]:
!pip install tensorflow tensorflow_hub numpy scikit-learn

# Loading Pretrained ELMo Model

In [2]:
import tensorflow_hub as hub
import tensorflow as tf

# Load pretrained ELMo model from TensorFlow Hub
elmo_model = hub.load("https://tfhub.dev/google/elmo/3")
print("ELMo model loaded successfully!")

ELMo model loaded successfully!


# Generating ELMo Embeddings

In [3]:
# Single sentence
sentence = ["ELMo embeddings are context-aware."]

# Use ELMo model to generate embeddings
elmo_embeddings = elmo_model.signatures['default'](tf.constant(sentence))['elmo']

print("Shape of embeddings:", elmo_embeddings.shape)

Shape of embeddings: (1, 4, 1024)


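* Output shape: `(1, sequence_length, 1024)`

  * `1` → number of sentences in the batch
  * `sequence_length` → number of space-separated tokens in the longest sentence of the batch (here 4, because `context-aware.` counts as one token)
  * `1024` → ELMo embedding size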
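The `sequence_length` of 4 can be reproduced by hand, assuming the module splits plain-string input on spaces as its TF Hub documentation describes. The second sentence and the empty-string padding below are illustrative choices, not something this notebook's model call requires.

```python
sentences = ["ELMo embeddings are context-aware.", "I love coffee."]

# Split on whitespace; punctuation stays attached to the neighbouring word.
tokens = [s.split() for s in sentences]
lengths = [len(t) for t in tokens]

# A batch is as long as its longest sentence; shorter ones are padded.
max_len = max(lengths)
padded = [t + [""] * (max_len - len(t)) for t in tokens]

print(tokens[0])
print("lengths:", lengths, "batch sequence_length:", max_len)
```

The first sentence yields four tokens, which matches the `(1, 4, 1024)` shape printed above.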

In [5]:
# Embedding for each word
word_embedding = elmo_embeddings[0][0]  # Embedding for first word
print(word_embedding.shape)  # (1024,)
print(word_embedding[:10])   # First 10 dimensions

(1024,)
tf.Tensor(
[ 0.22764012 -0.05711553  0.1066011   0.5877569   0.12588204 -0.04812695
  0.43702173  0.6652268   0.14785011  0.15778184], shape=(10,), dtype=float32)


In [6]:
# Embeddings for Multiple Sentences

sentences = [
    "I love learning NLP.",
    "ELMo embeddings capture context."
]

embeddings = elmo_model.signatures['default'](tf.constant(sentences))['elmo']

for i, sentence_embedding in enumerate(embeddings):
    print(f"Sentence {i+1} shape: {sentence_embedding.shape}")

Sentence 1 shape: (4, 1024)
Sentence 2 shape: (4, 1024)


## Using ELMo Embeddings in a Task (Example: Sentence Similarity)

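We can compare two sentences by averaging each sentence's word vectors into a single sentence vector and then computing the **cosine similarity** between the two vectors.

Values range from -1 to 1, and higher values indicate greater similarity.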


In [11]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Example sentences
sentences = [
    "I love coffee.",
    "I enjoy drinking tea."
]

embeddings = elmo_model.signatures['default'](tf.constant(sentences))['elmo']

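# Average word embeddings to get a sentence-level embedding.
# Note: the batch is padded to the longest sentence (4 tokens here), so the
# mean for the 3-token sentence also runs over one padded position.
sentence_embeddings = [np.mean(e.numpy(), axis=0) for e in embeddings]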

similarity = cosine_similarity([sentence_embeddings[0]], [sentence_embeddings[1]])
print("Cosine similarity:", similarity[0][0])

Cosine similarity: 0.78412944
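When the sentences in a batch have different lengths, the average can be restricted to the real tokens. The sketch below shows masked mean pooling and cosine similarity in plain NumPy on a random toy batch. The helpers `masked_mean` and `cosine` are hypothetical names, and the lengths are supplied by hand here; the TF Hub module is documented as also returning a `sequence_len` output, which is assumed to be the natural source for them in practice.

```python
import numpy as np

def masked_mean(token_vectors, lengths):
    """Average token vectors per sentence, ignoring padded positions.

    token_vectors: array of shape (batch, max_len, dim)
    lengths: true token count of each sentence, shape (batch,)
    """
    token_vectors = np.asarray(token_vectors, dtype=float)
    lengths = np.asarray(lengths)
    positions = np.arange(token_vectors.shape[1])
    mask = positions[None, :] < lengths[:, None]             # (batch, max_len)
    summed = (token_vectors * mask[:, :, None]).sum(axis=1)  # (batch, dim)
    return summed / lengths[:, None]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Random toy batch standing in for model output: 2 sentences, padded to 4 tokens.
rng = np.random.default_rng(0)
batch = rng.normal(size=(2, 4, 8))
lengths = [3, 4]  # the first sentence has only 3 real tokens

sentence_vectors = masked_mean(batch, lengths)
print("Cosine similarity:", cosine(sentence_vectors[0], sentence_vectors[1]))
```

Because cosine similarity ignores vector length, padding only changes the result if the padded positions hold non-zero vectors; masking removes that dependency either way.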


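---

# Important Notes

* **ELMo vs Word2Vec/GloVe:** ELMo embeddings are dynamic and context-sensitive, unlike static embeddings from Word2Vec or GloVe.
* **Shape:** Each word gets a 1024-dimensional vector.
* **Input:** With the `default` signature used here, each sentence is passed as a plain string and split on spaces, so punctuation stays attached to the neighboring word. For finer tokenization, run a standard tokenizer first and join the tokens with spaces.
* **Pretrained model usage:** You don’t need to train ELMo from scratch for typical applications.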



# Summary

* ELMo provides **contextual word embeddings** using a **bidirectional LSTM**.
* Each word gets a vector of size **1024**.
* The embeddings can be used for many NLP tasks without retraining the model.
* Using **TensorFlow Hub**, it’s very easy to integrate pretrained ELMo into Python projects.