# Embeddings in LlamaIndex
* Notebook by Adam Lang
* Date: 3/11/2024
* We will look at various embedding models available via LlamaIndex including:
1. OpenAI
2. Google Gemini
3. Open-Source via HuggingFace
4. SOTA embeddings from HuggingFace

## How do we select the best embeddings?
1. Domain-specific embeddings
2. State-of-the-art (SOTA) embeddings, chosen via the MTEB leaderboard
3. Fine-tune custom embeddings on your own data
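
One way to ground this choice is to score each candidate model on a handful of query/document pairs from your own domain. The sketch below is a minimal, hypothetical harness: `hit_rate_at_k` and `toy_embed` are illustrative names, not LlamaIndex APIs, and `toy_embed` is a stand-in letter-count embedder so the block runs without any API key. In practice you would pass a real model's `get_text_embedding` method as `embed`.

```python
import math
from typing import Callable, List, Sequence

Embedder = Callable[[str], List[float]]


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def hit_rate_at_k(embed: Embedder, queries: List[str], docs: List[str],
                  relevant: List[int], k: int = 1) -> float:
    """Fraction of queries whose relevant doc index appears in the top-k results."""
    doc_vecs = [_unit(embed(d)) for d in docs]
    hits = 0
    for query, target in zip(queries, relevant):
        q = _unit(embed(query))
        # The dot product of unit vectors equals their cosine similarity.
        scores = [sum(a * b for a, b in zip(q, d)) for d in doc_vecs]
        top_k = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(queries)


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in embedder: counts of each letter a-z."""
    text = text.lower()
    return [float(text.count(chr(ord("a") + i))) for i in range(26)]


# Made-up evaluation set: relevant[i] is the index of the doc that answers queries[i].
docs = ["pytorch tensors", "vector database", "llama index"]
queries = ["pytorch tensor", "vector databases", "llamaindex"]
score = hit_rate_at_k(toy_embed, queries, docs, relevant=[0, 1, 2], k=1)
print(f"hit rate@1: {score:.2f}")
```

Running the same evaluation set through each candidate embedder gives a like-for-like number to compare, which is usually more informative for a specific domain than a leaderboard rank alone.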

In [1]:
# install libraries
!pip install llama-index cohere

Collecting llama-index
  Downloading llama_index-0.10.18-py3-none-any.whl (5.6 kB)
Collecting cohere
  Downloading cohere-4.54-py3-none-any.whl (52 kB)
Collecting llama-index-agent-openai<0.2.0,>=0.1.4 (from llama-index)
  Downloading llama_index_agent_openai-0.1.5-py3-none-any.whl (12 kB)
Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index)
  Downloading llama_index_cli-0.1.8-py3-none-any.whl (25 kB)
Collecting llama-index-core<0.11.0,>=0.10.18 (from llama-index)
  Downloading llama_index_core-0.10.18.post1-py3-none-any.whl (15.3 MB)
Collecting llama-index-embeddings-openai<0.2.0,>=0.1.5 (from llama-index)
  Downloading llama_index_embeddings_openai-0.1.6-py3-none-any.whl (6.0 kB)
Collecting llama-index-indices-managed-llama-clou

# 1. OpenAI Embeddings

In [2]:
# import openai embeddings
from llama_index.embeddings.openai import OpenAIEmbedding

In [3]:
# import os and set the OpenAI API key (replace the placeholder with your own key)
import os
os.environ["OPENAI_API_KEY"] = '<your_key>'
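
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As a hedged alternative, the sketch below reads the key from an environment variable that is already set and fails early with a clear message if it is missing. `require_env` is a hypothetical helper, not part of LlamaIndex or the OpenAI client, and the mapping passed in the example is a placeholder so the block runs anywhere.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return value


# Example with an explicit mapping (placeholder value, not a real key).
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
key = require_env("OPENAI_API_KEY", env=fake_env)
print(f"key loaded, {len(key)} characters")
```

Called without the `env` argument, the helper reads from `os.environ`, so the key never has to appear in a cell.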

In [4]:
# define the embedding model with openai
embed_model = OpenAIEmbedding()

In [5]:
# get text embedding
embedding = embed_model.get_text_embedding("OpenAI Embedding")

In [6]:
# print dimension of embedding
len(embedding)

1536

In [7]:
# get embeddings in batches
embeddings = embed_model.get_text_embedding_batch(["PyTorch is awesome", "Is Tensorflow more popular than PyTorch?"])

In [8]:
# show the first five values of the second embedding
embeddings[1][:5]

[-0.016185689717531204,
 -0.010396501049399376,
 0.0077322726137936115,
 -0.017748169600963593,
 0.020419076085090637]
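
A common next step is to compare two embeddings with cosine similarity. The helper below is a minimal pure-Python sketch: `cosine_similarity` is an illustrative name, and the short vectors are made-up placeholders standing in for `embeddings[0]` and `embeddings[1]` so the block runs without an API key.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors; with the cells above you would pass
# embeddings[0] and embeddings[1] instead.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 1.0, 0.0]
print(cosine_similarity(vec_a, vec_b))
```

The explicit length check matters because vectors produced by different embedding models can have different dimensions and should not be compared with each other.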

# 2. Using Google Gemini Embeddings

In [9]:
!pip install llama-index-embeddings-gemini

Collecting llama-index-embeddings-gemini
  Downloading llama_index_embeddings_gemini-0.1.4-py3-none-any.whl (2.9 kB)
Installing collected packages: llama-index-embeddings-gemini
Successfully installed llama-index-embeddings-gemini-0.1.4


In [10]:
# import the Gemini embedding class
from llama_index.embeddings.gemini import GeminiEmbedding

In [11]:
# set the embedding model name and the Google API key
model_name = "models/embedding-001"

GOOGLE_API_KEY = "<your key>"  # personal Google API key
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

In [12]:
# create embed model
embed_model = GeminiEmbedding(model_name=model_name)

In [13]:
embeddings = embed_model.get_text_embedding("Google Gemini Embeddings.")

In [14]:
print(f"Dimension of embeddings we created: {len(embeddings)}")

Dimension of embeddings we created: 768


In [15]:
# show the first five values of the embedding
embeddings[:5]

[0.04036733, -0.017969217, -0.054796226, 0.004967746, 0.05546861]

# 3. Open Source Embeddings from HuggingFace

In [16]:
!pip install llama-index-embeddings-huggingface

Collecting llama-index-embeddings-huggingface
  Downloading llama_index_embeddings_huggingface-0.1.4-py3-none-any.whl (7.7 kB)
Collecting torch<3.0.0,>=2.1.2 (from llama-index-embeddings-huggingface)
  Downloading torch-2.2.1-cp310-cp310-manylinux1_x86_64.whl (755.5 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch<3.0.0,>=2.1.2->llama-index-embeddings-huggingface)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch<3.0.0,>=2.1.2->llama-index-embeddings-huggingface)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)

In [17]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [18]:
# instantiate the embedding model, using the small BGE model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [19]:
embedding = embed_model.get_text_embedding("Hugging Face Text Embeddings Inference")

In [20]:
# check length of embedding
len(embedding)

384

In [21]:
# embed a batch of texts
embeddings = embed_model.get_text_embedding_batch(["Hugging Face Text Embeddings Inference",
                                                   "OpenAI Embedding",
                                                   "Open Source",
                                                   "Closed Source"])

In [22]:
# check the number of embeddings returned
len(embeddings)

4

In [23]:
# check dimension of embeddings
len(embeddings[0])

384

In [24]:
# show the first five values of the first embedding
embeddings[0][:5]

[-0.03793426603078842,
 0.031132353469729424,
 -0.04273267462849617,
 -0.017657337710261345,
 0.00661458820104599]

# 4. Using another SOTA Embedding Model

* MTEB (Massive Text Embedding Benchmark) embeddings leaderboard: https://huggingface.co/spaces/mteb/leaderboard

In [26]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding(model_name="WhereIsAI/UAE-Large-V1")


In [27]:
embedding = embed_model.get_text_embedding("Hugging Face Text Embeddings Inference")

In [28]:
# print the first five values of the embedding
print(embedding[:5])

[0.02925274148583412, 0.011324609629809856, 0.010814081877470016, -0.017124859616160393, 0.00806861650198698]


In [29]:
# get dimension of embedding
len(embedding)

1024

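Summary:
* Each embedding model produces vectors of a different dimension: 1536 for the default OpenAI model, 768 for Gemini `models/embedding-001`, 384 for `BAAI/bge-small-en-v1.5`, and 1024 for `WhereIsAI/UAE-Large-V1`.
* Because each model defines its own vector space, embeddings from different models cannot be compared with each other or mixed in the same index.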
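
Dimension also drives storage cost, since raw vector storage grows linearly with it. The sketch below uses the four dimensions reported by `len(embedding)` in this notebook and makes two assumptions that are not from the notebook: vectors are stored as 4-byte float32 values, and the corpus holds one million vectors. `raw_size_mb` is an illustrative helper, and real vector stores add index overhead on top of this figure.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32

# Dimensions as reported by len(embedding) in the cells above.
DIMENSIONS = {
    "OpenAIEmbedding (default)": 1536,
    "models/embedding-001": 768,
    "BAAI/bge-small-en-v1.5": 384,
    "WhereIsAI/UAE-Large-V1": 1024,
}


def raw_size_mb(dim: int, num_vectors: int) -> float:
    """Raw storage in megabytes (10**6 bytes), excluding any index overhead."""
    return dim * BYTES_PER_FLOAT32 * num_vectors / 1e6


NUM_VECTORS = 1_000_000  # hypothetical corpus size
for name, dim in DIMENSIONS.items():
    print(f"{name:28s} {dim:5d} dims  {raw_size_mb(dim, NUM_VECTORS):8.1f} MB")
```

Under these assumptions, the 1536-dimensional vectors need four times the raw storage of the 384-dimensional ones, which is worth weighing against any gain in retrieval quality.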