Pre-trained embeddings are word (or token) vector representations that have already been trained on massive text corpora by other researchers or organizations — so you can reuse them instead of training from scratch.

⚙️ What They Contain

Each word (or subword) is represented by a dense vector, typically with 50–300 dimensions for static models and 768 or more for transformer models.

The embeddings capture:

1. Semantic similarity → “king” and “queen” are close.

2. Syntactic similarity → “walk”, “walking”, “walked” have similar vectors.

3. Analogy relationships → “king - man + woman ≈ queen”.
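
The properties listed above are usually checked with cosine similarity and simple vector arithmetic. The sketch below uses hand-picked 3-dimensional toy vectors (hypothetical values, not taken from any trained model) only to show the mechanics. The helper names `cosine_similarity` and `analogy` are illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vectors; the dimensions loosely mean (royalty, male, female)
toy_vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([-0.5, 0.3, 0.3]),
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the query words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))

# "king" is closer to "queen" than to an unrelated word
print(cosine_similarity(toy_vectors["king"], toy_vectors["queen"]))
print(cosine_similarity(toy_vectors["king"], toy_vectors["apple"]))

# king - man + woman lands nearest to queen
print(analogy("king", "man", "woman", toy_vectors))  # queen
```

With real pre-trained vectors the same two functions apply unchanged; only the dictionary of vectors is replaced.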

| Type                              | Trained On                      | When You Train                          |
| --------------------------------- | ------------------------------- | --------------------------------------- |
| **Custom Word2Vec**               | Your small dataset              | When you call `Word2Vec(sentences=...)` |
| **Pre-trained Word2Vec (Google)** | Google News (100 billion words) | Already done — just load and use        |
| **Pre-trained GloVe**             | Wikipedia + Gigaword            | Already done — just load                |
| **Pre-trained FastText**          | Common Crawl (600B tokens)      | Already done — just load                |
| **Pre-trained BERT**              | BookCorpus + Wikipedia          | Already done — just fine-tune           |
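
"Just load" typically means reading a file of vectors into memory. GloVe, for example, is distributed as plain text with one token per line followed by its vector components. The sketch below parses that layout from an in-memory sample with made-up 4-dimensional values; the function name `load_text_vectors` is hypothetical. For the binary Google News Word2Vec file, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is a common alternative.

```python
import io
import numpy as np

def load_text_vectors(handle):
    """Parse 'word v1 v2 ... vN' lines into a {word: vector} dictionary."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Made-up sample standing in for a real downloaded vector file
sample = io.StringIO(
    "bank 0.12 -0.40 0.33 0.05\n"
    "river 0.08 -0.10 0.51 0.22\n"
)

vectors = load_text_vectors(sample)
print(sorted(vectors))        # ['bank', 'river']
print(vectors["bank"].shape)  # (4,)
```

To read a real file, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the `StringIO` sample.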

# Popular Pre-trained Embedding Models
| Model                           | Creator                | Vector Size | Corpus                     | Key Feature                     |
| ------------------------------- | ---------------------- | ----------- | -------------------------- | ------------------------------- |
| **Word2Vec (Google News)**      | Google                 | 300D        | Google News (100B words)   | Classic skip-gram model         |
| **GloVe (6B / 42B / 840B)**     | Stanford               | 50D–300D    | Wikipedia + Gigaword (6B); Common Crawl (42B, 840B) | Global co-occurrence model      |
| **FastText**                    | Facebook               | 300D        | Common Crawl (600B tokens) | Uses subword info for OOV words |
| **ELMo**                        | Allen AI               | 1024D       | 1B Word Benchmark          | Contextual embeddings           |
| **BERT / RoBERTa / GPT / etc.** | Google / Meta / OpenAI | 768–1024D   | Massive web + books        | Deep contextual embeddings      |
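
The "subword info" entry for FastText refers to character n-grams: a word vector is built from the vectors of its n-grams, so an unseen word still shares pieces with known words. The sketch below only extracts the n-grams. It assumes the scheme from the FastText paper (lengths 3 to 6, with `<` and `>` marking word boundaries); the function name and defaults are illustrative, not the library's API.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """List the character n-grams of a word wrapped in boundary markers."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

# A misspelled (out-of-vocabulary) form still shares n-grams with the known word
shared = set(char_ngrams("playing")) & set(char_ngrams("playinng"))
print(sorted(shared)[:5])
```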

Classic embeddings (Word2Vec, GloVe, FastText) are static: each word has one fixed vector, regardless of context.
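
A static model behaves like a lookup table, which is why "bank" gets the same vector in a finance sentence and a river sentence. The sketch below shows this with a hypothetical 3-dimensional table (values invented for illustration) and a deliberately simple tokenizer.

```python
import numpy as np

# Hypothetical static table; real tables hold 50-300 values per word
static_table = {
    "the": np.array([0.1, 0.0, 0.2]),
    "bank": np.array([0.2, -0.1, 0.7]),
    "river": np.array([0.6, 0.3, -0.2]),
    "loan": np.array([-0.4, 0.5, 0.1]),
}

def embed_static(sentence, table, dim=3):
    """Map each token to its single stored vector (zeros for unknown words)."""
    tokens = sentence.lower().replace(".", "").split()
    vectors = np.stack([table.get(tok, np.zeros(dim)) for tok in tokens])
    return tokens, vectors

tokens_a, vecs_a = embed_static("The bank approved my loan.", static_table)
tokens_b, vecs_b = embed_static("He sat on the river bank.", static_table)

bank_a = vecs_a[tokens_a.index("bank")]
bank_b = vecs_b[tokens_b.index("bank")]

# The surrounding words differ, but the vector for "bank" does not
print(np.array_equal(bank_a, bank_b))  # True
```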

Transformer-based models such as BERT, GPT, RoBERTa, and DistilBERT produce contextual embeddings instead: the vector for a word depends on the sentence it appears in.


# ELMo (Embeddings from Language Models)
ELMo is a deep contextual word representation model. It was the first major leap beyond Word2Vec and GloVe because it generates different embeddings for the same word depending on its context.

ELMo is based on a bidirectional language model (biLM) built with two LSTM layers.

1. Input Layer (Character-based)

    Each word is built from its characters using character-level CNNs rather than a whole-word lookup table:
    "playing" → ['p', 'l', 'a', 'y', 'i', 'n', 'g']

    Because any character sequence can be encoded, unseen (OOV) words still get a representation.

2. Bidirectional Language Model (biLM)

    ELMo trains two LSTMs:

    - A forward LSTM, which predicts the next word given the previous context
    - A backward LSTM, which predicts the previous word given the following context

3. Concatenating Contextual States

    For each word, ELMo takes the forward hidden state and the backward hidden state, then concatenates them:

    `ELMo(word) = [h_forward; h_backward]`

4. Layer Combination (Task-Specific)

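    ELMo does not rely on a single layer's output. The final embedding is a task-specific weighted sum of all biLM layers (the character-based token layer and both LSTM layers), since each layer captures different linguistic information.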
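
The four steps above can be sketched in NumPy. This is a toy illustration under stated assumptions, not the AllenNLP implementation: all weights are random stand-ins, the LSTM hidden states of step 2 are replaced by random arrays, the dimensions are tiny, and the token layer is duplicated so its width matches the concatenated biLM states. All names (`char_cnn`, `combine_layers`, `s_logits`, `gamma`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
CHAR_DIM, NUM_FILTERS, WIDTH = 4, 6, 3

# Random stand-ins for learned parameters (untrained, illustration only)
char_table = rng.normal(size=(256, CHAR_DIM))              # one row per byte value
filters = rng.normal(size=(NUM_FILTERS, WIDTH, CHAR_DIM))  # convolution filters

def char_cnn(word):
    """Step 1: character embeddings -> 1-D convolution -> max over positions."""
    chars = char_table[list(word.encode("utf-8"))]          # (num_chars, CHAR_DIM)
    if len(chars) < WIDTH:                                  # pad very short words
        chars = np.vstack([chars, np.zeros((WIDTH - len(chars), CHAR_DIM))])
    windows = [chars[i:i + WIDTH] for i in range(len(chars) - WIDTH + 1)]
    responses = np.array(
        [np.tensordot(filters, w, axes=([1, 2], [0, 1])) for w in windows]
    )
    return responses.max(axis=0)                            # (NUM_FILTERS,)

def combine_layers(layers, s_logits, gamma):
    """Step 4: softmax-normalised layer weights, weighted sum, scaled by gamma."""
    weights = np.exp(s_logits - np.max(s_logits))
    weights = weights / weights.sum()
    return gamma * np.tensordot(weights, layers, axes=1)

tokens = ["He", "sat", "on", "the", "river", "bank."]
x = np.stack([char_cnn(tok) for tok in tokens])             # (6, NUM_FILTERS)
token_layer = np.concatenate([x, x], axis=-1)               # duplicated to match biLM width

# Step 2 stand-ins: random arrays in place of real forward/backward LSTM states
h_forward = rng.normal(size=(2, len(tokens), NUM_FILTERS))
h_backward = rng.normal(size=(2, len(tokens), NUM_FILTERS))

# Step 3: concatenate the two directions for each LSTM layer
lstm_layers = np.concatenate([h_forward, h_backward], axis=-1)     # (2, 6, 12)

# Stack the token layer with both LSTM layers, then combine them
layers = np.concatenate([token_layer[None], lstm_layers], axis=0)  # (3, 6, 12)
elmo_vectors = combine_layers(layers, s_logits=np.zeros(3), gamma=1.0)
print(elmo_vectors.shape)  # (6, 12)
```

With all-zero `s_logits` the layers are averaged equally; in a downstream task these weights and `gamma` would be learned so the task can emphasise whichever layer helps most.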

| Aspect         | ELMo                                   |
| -------------- | -------------------------------------- |
| Developer      | Allen Institute for AI (2018)          |
| Architecture   | Character CNN + 2-layer BiLSTM         |
| Embedding size | 1024                                   |
| Contextual     | ✅ Yes                                  |
| Handles OOV    | ✅ Yes                                  |
| Key innovation | Dynamic, context-dependent embeddings  |
| Replaced by    | BERT, RoBERTa, GPT (transformer-based) |



In [None]:
!pip install tensorflow tensorflow-hub tensorflow-text

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text
from sklearn.utils import resample
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Load ELMo from TF Hub as a frozen (non-trainable) Keras layer
elmo = hub.KerasLayer("https://tfhub.dev/google/elmo/3", trainable=False)

In [None]:
# Sentence-level embeddings

sentences = [
    "The bank approved my loan.",
    "He sat on the river bank."
]

# Calling the Keras layer returns the module's default output:
# one mean-pooled 1024-dimensional vector per sentence
embeddings = elmo(tf.constant(sentences))

print(embeddings.shape)

# Convert the TensorFlow tensor to a NumPy array
embeddings_np = embeddings.numpy()

# Full embedding of the first sentence
print(embeddings_np[0])

# First 10 values of the second sentence's embedding
print(embeddings_np[1][:10])

(2, 1024)
[-0.7354043   0.18077965 -0.24448149 ... -0.06355472  0.21377377
 -0.13779095]
[-0.26127908 -0.1274916  -0.23593295  0.07491662  0.0320407  -0.08880552
 -0.13739316  0.3712413   0.09359346  0.13650009]


In [22]:
# Word-level embeddings

elmo = hub.load("https://tfhub.dev/google/elmo/3")

# Access the "default" signature, which takes untokenized sentences
elmo_sign = elmo.signatures["default"]

# Prepare the text input. The default signature splits each sentence on spaces,
# so punctuation stays attached to the preceding word ("loan.", "bank.").
sentences = [
    "The bank approved my loan.",  # 5 tokens, padded to the batch maximum of 6
    "He sat on the river bank."    # 6 tokens
]

# Get embeddings
embeddings = elmo_sign(tf.constant(sentences))

# The "elmo" output holds one 1024-dimensional contextual vector per token
word_emb = embeddings["elmo"]

print("Word embeddings shape:", word_emb.shape)
print(word_emb[0][0].numpy().shape)  # vector for the first token of sentence 1
print(word_emb[1].shape)             # all token vectors for sentence 2

# Both shapes match because shorter sentences are padded to the longest one
for i, sent in enumerate(sentences):
    print(f"Sentence {i+1}: {word_emb[i].shape}")

Word embeddings shape: (2, 6, 1024)
(1024,)
(6, 1024)
Sentence 1: (6, 1024)
Sentence 2: (6, 1024)
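
To see the contextual effect, one can compare the vector for "bank" in each sentence. The sketch below shows the lookup and comparison on a random stand-in array with the same shape as the output above, so it runs without downloading the model; in the notebook, `word_emb.numpy()` would be passed instead. It assumes the module tokenizes by splitting on whitespace, so the last token of the second sentence is "bank." with the period attached. The helper names are hypothetical.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def token_vector(word_emb, sentences, sent_idx, token):
    """Return one token's vector, locating it with a whitespace split."""
    position = sentences[sent_idx].split().index(token)
    return word_emb[sent_idx, position]

sentences = [
    "The bank approved my loan.",
    "He sat on the river bank.",
]

# Stand-in for word_emb.numpy(): random values, NOT real ELMo vectors
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(2, 6, 1024))

bank_finance = token_vector(word_emb, sentences, 0, "bank")
bank_river = token_vector(word_emb, sentences, 1, "bank.")

# With real ELMo output, a value below 1.0 shows the two "bank" vectors differ
print(cosine_similarity(bank_finance, bank_river))
```

A static model would give a similarity of exactly 1.0 here, since both occurrences would map to the same stored vector.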
