# Embeddings

In this notebook, we will explore how to access different types of embeddings
in LlamaIndex:


1.   OpenAI
2.   Google Gemini
3.   CohereAI
4.   Open-Source from HuggingFace



Install the required packages by running the command below in Anaconda Prompt (Windows) or a terminal (Linux or macOS):

pip install python-dotenv llama-index-embeddings-openai llama-index-embeddings-gemini llama-index-embeddings-cohere llama-index-embeddings-huggingface

## Content
- Documents -> Chunks -> Embeddings -> Index
- Why do we need embeddings?
- Different embeddings from different providers, such as OpenAI
- Types of embeddings: word-based, sentence-based, and document-based
- What kind of embeddings will the LLM use?
- Flow diagram / architecture representation of embeddings
- Interpreting embeddings: the cosine similarity formula measures how similar two vectors are regardless of their magnitude
- Applications of embeddings: finding the most similar words, finding the odd one out, and document clustering with different types of embeddings
- Closed-source embeddings, which are paid
- Open-source embeddings
- How to select the right embeddings for my use case?
- Massive Text Embedding Benchmark (MTEB): a benchmark for comparing embedding models against each other. Check the top embeddings on the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
- Pre-trained vs. fine-tuned embeddings
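
The cosine similarity item in the list above can be made concrete. For two vectors $a$ and $b$, the usual definition is $\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. The sketch below implements that definition with the standard library only. The vectors `v1`, `v2`, and `v3` are toy values chosen for illustration, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (illustrative only, not real embeddings)
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]     # same direction as v1, twice the magnitude
v3 = [-1.0, -2.0, -3.0]  # opposite direction to v1

print(cosine_similarity(v1, v2))  # close to 1.0: magnitude does not matter
print(cosine_similarity(v1, v3))  # close to -1.0: opposite direction
```

Because the dot product is divided by both norms, scaling either vector leaves the score unchanged, which is why `v1` and `v2` score as (almost exactly) identical.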

In [1]:
!pip install python-dotenv
!pip install llama-index-embeddings-gemini llama-index-embeddings-cohere
!pip install llama-index-embeddings-openai llama-index-embeddings-huggingface




[notice] A new release of pip is available: 25.0.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip


Collecting llama-index-embeddings-gemini
  Using cached llama_index_embeddings_gemini-0.4.0-py3-none-any.whl.metadata (623 bytes)
Collecting llama-index-embeddings-cohere
  Using cached llama_index_embeddings_cohere-0.6.0-py3-none-any.whl.metadata (402 bytes)
Collecting google-generativeai>=0.5.2 (from llama-index-embeddings-gemini)
  Using cached google_generativeai-0.8.5-py3-none-any.whl.metadata (3.9 kB)
Collecting cohere<6,>=5.15 (from llama-index-embeddings-cohere)
  Using cached cohere-5.17.0-py3-none-any.whl.metadata (3.4 kB)
Collecting fastavro<2.0.0,>=1.9.4 (from cohere<6,>=5.15->llama-index-embeddings-cohere)
  Using cached fastavro-1.12.0-cp312-cp312-win_amd64.whl.metadata (5.9 kB)
Collecting httpx-sse==0.4.0 (from cohere<6,>=5.15->llama-index-embeddings-cohere)
  Using cached httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting tokenizers<1,>=0.15 (from cohere<6,>=5.15->llama-index-embeddings-cohere)
  Using cached tokenizers-0.21.4-cp39-abi3-win_amd64.whl.metadata




## Load the Keys

In [2]:
import os
from dotenv import load_dotenv, find_dotenv

In [3]:
load_dotenv('D:/Training/FAA-Training/Beyond-the-Prompt-Practical-RAG-for-Real-World-AI/RAG-systems-using-LlamaIndex/Module3-Components-of-LlamaIndex/.env')

True

In [4]:
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']
HUGGINGFACE_API_KEY = os.environ['HUGGINGFACE_API_KEY']
COHERE_API_KEY = os.environ["COHERE_API_KEY"]
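
Indexing `os.environ` directly raises a `KeyError` at the first missing key, so the others are never checked. As an optional alternative, here is a minimal sketch of a helper that reports every missing key at once. The name `require_env` and the `fake_env` mapping are hypothetical and exist only for this illustration; in the notebook you would call it without the `environ` argument so that it reads `os.environ`.

```python
import os

def require_env(names, environ=None):
    """Return {name: value} for the given variables, or raise listing all missing ones."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both are unusable as API keys
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}

# Stand-in mapping so the demo runs without real keys (hypothetical values)
fake_env = {"OPENAI_API_KEY": "placeholder-value", "COHERE_API_KEY": ""}

try:
    require_env(["OPENAI_API_KEY", "GOOGLE_API_KEY", "COHERE_API_KEY"], environ=fake_env)
except RuntimeError as err:
    message = str(err)
    print(message)
```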

## 1. OpenAI Embeddings

In [5]:
from llama_index.embeddings.openai import OpenAIEmbedding

In [6]:
# Define embedding model with OpenAI Embedding
embed_model = OpenAIEmbedding(model='text-embedding-3-small', api_key=OPENAI_API_KEY)

In [7]:
# Get the text embedding
embedding = embed_model.get_text_embedding("The cat sat on the mat")

In [8]:
# Get the dimension of the embedding
len(embedding)

1536

In [9]:
embedding[:10] 

[-0.030741853639483452,
 -0.049538593739271164,
 -0.005032071378082037,
 -0.0015698964707553387,
 0.03624901548027992,
 -0.002026402624323964,
 -0.008926513604819775,
 0.027173832058906555,
 0.0071037206798791885,
 -0.011867544613778591]

In [10]:
# You can get embeddings in batches
embeddings = embed_model.get_text_embedding_batch([
    "What are Embeddings?",
    "In 1967, a professor at MIT built the first ever NLP program Eliza to understand natural language.",
])

In [11]:
len(embeddings)

2

In [12]:
embeddings[0][:5] # embeddings of the 1st sentence

[0.009822146035730839,
 -0.02960844337940216,
 -0.00874525960534811,
 0.006135882344096899,
 -0.01089311484247446]

In [13]:
embeddings[1][:5]  # embeddings of the 2nd sentence

[-0.052310045808553696,
 0.011894790455698967,
 -0.002292225370183587,
 0.00246118544600904,
 0.014733320102095604]

In [14]:
len(embeddings[1]) # length of each embedding

1536
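
Once a batch of texts has been embedded, a common next step is to rank the texts against a query by cosine similarity. The sketch below shows the idea with hand-made 3-dimensional vectors so that it runs without an API key. `rank_by_similarity`, `texts`, `doc_vecs`, and `query_vec` are hypothetical names and values; with a real model, the document vectors would come from `get_text_embedding_batch` and the query vector from the embedding model's `get_query_embedding` method.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_by_similarity(query_vec, doc_vecs, texts):
    """Return (score, text) pairs sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in zip(doc_vecs, texts)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy 3-dimensional stand-ins for real embeddings (illustrative values only)
texts = ["cats", "dogs", "stock markets"]
doc_vecs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.2, 0.9]]
query_vec = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vec, doc_vecs, texts)
for score, text in ranking:
    print(f"{score:.3f}  {text}")
```

Vectors from different models are not comparable with each other, so the query and the documents should be embedded with the same model.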

## 2. Using Google Gemini Embeddings

In [15]:
# imports
from llama_index.embeddings.gemini import GeminiEmbedding



In [16]:
model_name = "models/text-embedding-004"

In [17]:
embed_model = GeminiEmbedding(model_name=model_name, api_key=GOOGLE_API_KEY)



In [18]:
embeddings = embed_model.get_text_embedding("A journey to the centre of Earth")

In [19]:
print(f"Dimension of embeddings: {len(embeddings)}")

Dimension of embeddings: 768


In [20]:
embeddings[:5]

[0.0062123085, -0.00357899, -0.036939137, 0.0475324, 0.047872353]

## 3. Using CohereAI Embeddings

In [20]:
from llama_index.embeddings.cohere import CohereEmbedding

In [21]:
embed_model = CohereEmbedding(
    cohere_api_key=COHERE_API_KEY,
    model_name="embed-english-v3.0",
    input_type="search_query",
)

In [22]:
embeddings = embed_model.get_text_embedding("Hello CohereAI!")

In [23]:
print(len(embeddings))

1024


In [24]:
print(embeddings[:5])

[-0.041931152, -0.022384644, -0.07067871, -0.011886597, -0.019210815]


## 4. Open Source Embeddings from HuggingFace

In [25]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding



In [26]:
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")


In [27]:
embedding = embed_model.get_text_embedding("Roses are red, and the sky is blue!?")

In [28]:
len(embedding)

384

In [41]:
embeddings = embed_model.get_text_embedding_batch(["Hugging Face Text Embeddings Inference",
                                                   "OpenAI Embedding",
                                                   "Open Source",
                                                   "Closed Source"])

In [42]:
len(embeddings)

4

In [43]:
embeddings[0][:5]

[-0.03793426603078842,
 0.031132353469729424,
 -0.04273267462849617,
 -0.017657337710261345,
 0.00661458820104599]

In [44]:
len(embeddings[0])

384
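
A batch of embeddings like the one above can also be used to find the "odd one out": the item whose average similarity to all the others is lowest. The sketch below is one simple way to do this, assuming toy 2-dimensional vectors in place of real model output. `normalize`, `odd_one_out`, `labels`, and `vectors` are hypothetical names and values introduced for illustration.

```python
import math

def normalize(vec):
    """Scale a non-zero vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def odd_one_out(vectors, labels):
    """Return the label whose mean similarity to the other vectors is lowest."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors to compare")
    unit = [normalize(v) for v in vectors]
    mean_sims = []
    for i, u in enumerate(unit):
        sims = [sum(a * b for a, b in zip(u, w)) for j, w in enumerate(unit) if j != i]
        mean_sims.append(sum(sims) / len(sims))
    return labels[mean_sims.index(min(mean_sims))]

# Toy vectors (illustrative only): three point one way, one points another
labels = ["apple", "banana", "cherry", "car"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9]]

print(odd_one_out(vectors, labels))
```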

## 5. Loading a SOTA Embedding Model

In [35]:
embed_model = HuggingFaceEmbedding(model_name='WhereIsAI/UAE-Large-V1')

In [36]:
embedding = embed_model.get_text_embedding("Hugging Face Text Embeddings Inference")

In [37]:
print(embedding[:5])

[0.02925274148583412, 0.011324609629809856, 0.010814081877470016, -0.017124859616160393, 0.00806861650198698]


In [38]:
len(embedding)

1024
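
The models in this notebook return vectors of different sizes, and that affects how much space an index needs. The sketch below gives a rough estimate of the raw vector storage for each model, using the dimensions printed in the cells above. It assumes each value is stored as a 4-byte float and uses a hypothetical corpus of 100,000 chunks; real vector stores add their own overhead, so treat the numbers as a lower bound. `index_size_bytes` is a hypothetical helper name.

```python
# Dimensions as observed in this notebook's outputs
DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "models/text-embedding-004": 768,
    "embed-english-v3.0": 1024,
    "BAAI/bge-small-en-v1.5": 384,
    "WhereIsAI/UAE-Large-V1": 1024,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

def index_size_bytes(dim, num_chunks, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_chunks vectors of the given dimension, ignoring index overhead."""
    return dim * num_chunks * bytes_per_value

NUM_CHUNKS = 100_000  # hypothetical corpus size

for name, dim in DIMENSIONS.items():
    size_mib = index_size_bytes(dim, NUM_CHUNKS) / (1024 ** 2)
    print(f"{name}: {dim} dims, about {size_mib:.1f} MiB")
```

A smaller dimension reduces storage and speeds up similarity search, but it says nothing about retrieval quality on its own, so benchmark results still matter when choosing a model.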

You can check the top embedding models on the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard