# Understanding Word Embeddings in Large Language Models

## Introduction

In natural language processing (NLP), **word embeddings** are numerical representations of words in a continuous vector space. These vectors capture semantic and syntactic information, allowing words with similar meanings to have similar representations. Word embeddings are fundamental to many NLP tasks and are extensively used in large language models (LLMs) to understand and generate human-like text.

In this notebook, we will:

- Explain the concept of embeddings in LLMs.
- Load an embedding model from Hugging Face.
- Compute embeddings for a list of 20 words.
- Compute the similarities between words.

## What are Word Embeddings?

Traditional NLP techniques represented words as unique identifiers (one-hot encoding), a scheme that captures no semantic relationships between words. Word embeddings overcome this limitation by mapping words into a dense vector space where semantically similar words are positioned close together.
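To make the contrast concrete, here is a minimal sketch using NumPy. The three-word vocabulary and the dense vectors are hand-picked for illustration; they are assumptions, not output from any model.

```python
import numpy as np

# Toy vocabulary (illustrative only).
vocab = ["king", "queen", "phone"]

# One-hot encoding: each word is one row of the identity matrix.
one_hot = np.eye(len(vocab))


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Distinct one-hot vectors are orthogonal, so every pair looks equally unrelated.
one_hot_king_queen = cosine(one_hot[0], one_hot[1])
one_hot_king_phone = cosine(one_hot[0], one_hot[2])

# Hypothetical dense vectors, chosen by hand (not produced by a model).
dense = {
    "king": np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.2]),
    "phone": np.array([0.1, 0.2, 0.9]),
}
dense_king_queen = cosine(dense["king"], dense["queen"])
dense_king_phone = cosine(dense["king"], dense["phone"])

print("one-hot:", one_hot_king_queen, one_hot_king_phone)
print("dense:  ", round(dense_king_queen, 3), round(dense_king_phone, 3))
```

With one-hot vectors, both pairs score zero. With the dense vectors, 'king' is closer to 'queen' than to 'phone', which is the kind of structure a learned embedding is meant to provide.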

Embeddings are learned from large text corpora using techniques like Word2Vec, GloVe, or through transformer-based models. These embeddings enable models to perform tasks like sentiment analysis, machine translation, and question-answering more effectively.

## Loading an Embedding Model from Hugging Face

We will use the `sentence-transformers` library, which provides easy access to pre-trained models for computing embeddings. Specifically, we'll use the `'jinaai/jina-embeddings-v2-small-en'` model available on Hugging Face.

### Install Dependencies


In [28]:
!pip install -q sentence-transformers


### Load the Pre-trained Model

In [49]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('jinaai/jina-embeddings-v2-small-en')

Some weights of BertModel were not initialized from the model checkpoint at jinaai/jina-embeddings-v2-small-en and are newly initialized: ['embeddings.position_embeddings.weight', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.0.output.dense.weight', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.1.output.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.2.intermediate.dense.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.2.output.dense.bias', 'encoder.layer.2.output.dense.weight', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.3.intermediate.den


## Computing Embeddings for a List of Words

Let's define a list of 20 words for which we'll compute embeddings.

In [50]:
# List of words
words = [
    'dog', 'cat', 'apple', 'orange', 'car', 'bicycle', 'university', 'school',
    'happy', 'sad', 'king', 'queen', 'man', 'woman', 'city', 'village',
    'computer', 'phone', 'music', 'art'
]



### Compute Embeddings
We pass the list to `model.encode`, which returns one embedding vector per word.

In [51]:
# Compute embeddings for the list of words
embeddings = model.encode(words)



In [52]:
embeddings.shape

(20, 512)


## Computing Similarities Between Words

We can compute the cosine similarity between word embeddings to quantify how similar two words are semantically.
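For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on direction and falls between -1 and 1. The sketch below is a plain NumPy version for illustration; the function name `cosine_sim` and the sample vectors are assumptions and are not part of the notebook's own code.

```python
import numpy as np


def cosine_sim(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||)"""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


a = np.array([1.0, 2.0, 3.0])

# Scaling a vector changes its length but not its direction.
same_direction = cosine_sim(a, 10 * a)

# Perpendicular vectors share no direction.
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])

# Vectors pointing in opposite directions.
opposite = cosine_sim(a, -a)

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Because vector length is ignored, two embeddings can be compared without normalizing them first.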



In [55]:
from sklearn.metrics.pairwise import cosine_similarity

# Map each word to its embedding vector for easy lookup
embeddings_dict = dict(zip(words, embeddings))


def compute_similarity(word1, word2, embeddings_dict):
    """Return the cosine similarity between the embeddings of two words."""
    # cosine_similarity expects 2-D arrays, so reshape each vector to (1, dim)
    emb1 = embeddings_dict[word1].reshape(1, -1)
    emb2 = embeddings_dict[word2].reshape(1, -1)
    return cosine_similarity(emb1, emb2)[0][0]


word_pairs = [
    ('king', 'queen'),
    ('king', 'man'),
    ('king', 'phone'),
    ('king', 'village'),
    ('king', 'bicycle'),
]

In [56]:
for word1, word2 in word_pairs:
    similarity = compute_similarity(word1, word2, embeddings_dict)
    print(f"Similarity between '{word1}' and '{word2}': {similarity:.4f}")

Similarity between 'king' and 'queen': 0.9389
Similarity between 'king' and 'man': 0.8698
Similarity between 'king' and 'phone': 0.7464
Similarity between 'king' and 'village': 0.8395
Similarity between 'king' and 'bicycle': 0.8054




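As the output shows, the most closely related pair, 'king' and 'queen', has the highest score (0.9389), followed by 'king' and 'man' (0.8698), while 'king' and 'phone' has the lowest (0.7464). All five scores are fairly high, so the ranking is more informative than the absolute values.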
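The same idea extends from a handful of pairs to every pair at once. The following is a minimal sketch, assuming only NumPy: it normalizes each row to unit length so that a single matrix product yields all pairwise cosine similarities, then ranks the neighbors of one word. The helper names `pairwise_cosine` and `nearest_neighbors` and the toy vectors are hypothetical; in the notebook, `words` and `embeddings` from the earlier cells could be passed in instead.

```python
import numpy as np


def pairwise_cosine(embeddings):
    """Return an (n, n) matrix of cosine similarities between the rows."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # scale rows to unit length
    return X @ X.T


def nearest_neighbors(query, words, embeddings, k=3):
    """Rank the other words by cosine similarity to `query`, highest first."""
    sims = pairwise_cosine(embeddings)[words.index(query)]
    order = np.argsort(-sims)  # indices sorted by descending similarity
    return [(words[i], float(sims[i])) for i in order if words[i] != query][:k]


# Hypothetical 3-dimensional vectors, chosen by hand (not model output).
toy_words = ["king", "queen", "man", "phone"]
toy_embeddings = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.7, 0.4, 0.3],
    [0.1, 0.2, 0.9],
])

neighbors = nearest_neighbors("king", toy_words, toy_embeddings, k=2)
print(neighbors)
```

Ranking neighbors this way looks at relative order rather than absolute scores, which helps when most similarities fall in a narrow range.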

## Conclusion

In this notebook, we've explored how word embeddings represent words in a continuous vector space, capturing semantic relationships. By computing embeddings for a list of words and calculating their cosine similarities, we've demonstrated how embeddings can quantify semantic similarities between words. This foundational concept is crucial for many NLP applications and is a key component of large language models.