# LlamaIndex Bottoms-Up Development - Embeddings
Embeddings are numerical representations of text. To generate embeddings for text, a specific model is required.

In LlamaIndex, the default embedding model is OpenAI's `text-embedding-ada-002`. You can also use any embedding model offered by LangChain or Hugging Face through the `LangchainEmbedding` wrapper.

In this notebook, we cover the low-level usage for both OpenAI embeddings and HuggingFace embeddings.
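
To make "numerical representations of text" concrete, here is a minimal sketch of how two embedding vectors are usually compared. The three vectors are hand-written toy values (an assumption for illustration, not model output), and `cosine_similarity` is a helper defined here, not a LlamaIndex function.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values, not model output).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Texts with related meaning should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Real embeddings work the same way, only with many more dimensions.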

In [1]:
import os
import sys

sys.path.append(os.path.join(os.getcwd(), ".."))

In [2]:
from dotenv import load_dotenv, find_dotenv  # type: ignore

# Package versions, printed below for reproducibility
from openai import __version__ as openai_version  # type: ignore
from llama_index.core import __version__ as llama_index_version  # type: ignore

# Load environment variables
_ = load_dotenv(find_dotenv())  # read local .env file

print(f"Python version: {sys.version}")
print(f"OpenAI version: {openai_version}")
print(f"llamaindex version: {llama_index_version}")

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# Report only whether the key is set; never print the secret itself
print(f"OPENAI_API_KEY set: {OPENAI_API_KEY is not None}")

Python version: 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
OpenAI version: 1.33.0
llamaindex version: 0.10.43
OPENAI_API_KEY set: True


In [5]:
from llama_index.embeddings.openai import OpenAIEmbedding


openai_embedding = OpenAIEmbedding()


embed = openai_embedding.get_text_embedding("hello world!")

print(len(embed))


print(embed[:10])

1536
[-0.0077317021787166595, -0.0055575682781636715, -0.016167471185326576, -0.03340408205986023, -0.016780272126197815, -0.003158524166792631, -0.015606825239956379, -0.0019084787927567959, -0.003049328690394759, -0.026989247649908066]


## Custom Embeddings
While LlamaIndex can integrate with any embeddings offered by LangChain, you can also subclass `BaseEmbedding` and run your own custom embedding model!

For this, we will use the `InstructorEmbedding` pip package to run the `hkunlp/instructor-large` model, found here: https://huggingface.co/hkunlp/instructor-large

In [None]:
# Install dependencies
# Requires Python 3.11
# !pip install InstructorEmbedding torch transformers "sentence_transformers==2.2.2"

Test the embeddings! Instructor embeddings work by giving the model an instruction that tells it to represent text for a particular domain or task.

This makes sense for our llama-docs-bot, since we are searching very specific documentation!

Let's quickly test to make sure everything works.

In [4]:
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")
sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
instruction = "Represent the Science title:"
embeddings = model.encode([[instruction, sentence]])
print(embeddings)

load INSTRUCTOR_Transformer
max_seq_length  512
[[-6.15552776e-02  1.04199853e-02  5.88440662e-03  1.93768851e-02
   5.71417958e-02  2.57655643e-02 -4.01940270e-05 -2.80044470e-02
  -2.92965434e-02  4.91884910e-02  6.78200126e-02  2.18692329e-02
   4.54528593e-02  1.50187062e-02 -4.84451912e-02 -3.25259753e-02
  -3.56492735e-02  1.19935395e-02 -6.83914730e-03  3.03126331e-02
   5.17491698e-02  3.48140597e-02  4.91033122e-03  6.68928474e-02
   1.52824298e-02  3.54217105e-02  1.07743675e-02  6.89828917e-02
   4.44019437e-02 -3.23419534e-02  1.24267964e-02 -2.15528104e-02
  -1.62690841e-02 -4.15058322e-02 -2.42290529e-03 -3.07158497e-03
   4.27047350e-02  1.56428553e-02  2.57813167e-02  5.92843294e-02
  -1.99174136e-02  1.32361772e-02  1.08407997e-02 -4.00610678e-02
  -1.36213552e-03 -1.57032814e-02 -2.53812131e-02 -1.31972851e-02
  -7.83779565e-03 -1.14009008e-02 -4.82025407e-02 -2.58416161e-02
  -4.98770131e-03  4.98239510e-02  1.19490083e-02 -5.55060580e-02
  -2.82120425e-02 -3.3220879

Looks good! But we can see the output is batched (i.e. a list of lists), so we need to undo the batching in our implementation!
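
As a rough illustration of what "undoing the batching" means, the sketch below wraps a single input in a batch of size one and then takes the first row of the result. `fake_encode` is a hypothetical stand-in for a batched encoder (it is not the Instructor model); it only mimics the "one vector per `[instruction, text]` pair" shape.

```python
from typing import List


def fake_encode(pairs: List[List[str]]) -> List[List[float]]:
    """Hypothetical batched encoder: returns one vector per [instruction, text] pair."""
    return [[float(len(instruction)), float(len(text))] for instruction, text in pairs]


def embed_one(instruction: str, text: str) -> List[float]:
    # Wrap the single input in a batch of size 1, then take row 0 of the result.
    batch = fake_encode([[instruction, text]])
    return batch[0]


vector = embed_one("Represent the Science title:", "hello")
print(vector)  # a flat list of floats, not a list of lists
```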

There are only 4 methods we need to implement below.

In [None]:
# Packaged integration; requires: pip install llama-index-embeddings-instructor
from llama_index.embeddings.instructor import InstructorEmbedding

embed_model = InstructorEmbedding(model_name="hkunlp/instructor-base")
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])

In [1]:
from typing import Any, List
from InstructorEmbedding import INSTRUCTOR

from llama_index.core.bridge.pydantic import PrivateAttr
from llama_index.core.embeddings import BaseEmbedding


class InstructorEmbeddings(BaseEmbedding):
    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = PrivateAttr()

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent a document for semantic search:",
        **kwargs: Any,
    ) -> None:
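        # Initialize the pydantic base class first so that private
        # attributes can be assigned safely across pydantic versions
        super().__init__(**kwargs)
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction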

    @classmethod
    def class_name(cls) -> str:
        return "instructor"

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    async def _aget_text_embedding(self, text: str) -> List[float]:
        return self._get_text_embedding(text)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode([[self._instruction, text] for text in texts])
        return embeddings


In [2]:
# set the batch size to 1 to avoid memory issues
# if you have a large GPU, you can increase this
instructor_embeddings = InstructorEmbeddings(embed_batch_size=1)

load INSTRUCTOR_Transformer
max_seq_length  512


In [4]:
# Call the public method, which wraps _get_text_embedding
embed = instructor_embeddings.get_text_embedding("How do I create a vector index?")
print(len(embed))
print(embed[:10])

768
[ 0.0122677   0.01026734 -0.00182734  0.00379927 -0.00021093  0.04484274
  0.00842185  0.01182631 -0.03821287  0.01453499]


## Custom Embeddings w/ LlamaIndex

Since Instructor models have a maximum sequence length of 512 tokens, we set the chunk size to 512 as well.

However, if an input is longer than that, there will not be an error; only the first 512 tokens will be captured!
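
The silent truncation described above can be sketched as follows. This is only an illustration under a simplifying assumption: tokens are approximated by whitespace-separated words, whereas the real model uses its own tokenizer. `truncate_tokens` is a hypothetical helper, not part of any library.

```python
from typing import List

MAX_SEQ_LENGTH = 512  # matches the max_seq_length reported when the model loads


def truncate_tokens(tokens: List[str], max_length: int = MAX_SEQ_LENGTH) -> List[str]:
    """Keep only the first max_length tokens; the rest is dropped silently."""
    return tokens[:max_length]


# Whitespace splitting is a simplification of real tokenization.
long_text = " ".join(f"word{i}" for i in range(600))
tokens = long_text.split()
kept = truncate_tokens(tokens)

print(len(tokens), len(kept))
print("tokens never seen by the model:", len(tokens) - len(kept))
```

Keeping chunks at or below the model's limit avoids losing the tail of each chunk this way.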

In [5]:
from llama_index.core import ServiceContext, set_global_service_context
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=instructor_embeddings, chunk_size=512
)
set_global_service_context(service_context)

In [1]:
import os
import sys

sys.path.append(os.path.join(os.getcwd(), ".."))

from llama_docs_bot.indexing import create_query_engine

# remove any existing indices
# !rm -rf ./*_index

query_engine = create_query_engine()

ImportError: cannot import name 'SimpleDirectoryReader' from 'llama_index' (unknown location)

In [None]:
response = query_engine.query('What is the Sub Question query engine?')
response.print_response_stream()

In [None]:
print(response.get_formatted_sources(length=256))

### Compare to default embeddings

Note that an index must be using the same embedding model at query time that was used to create the index.

So below, we delete the existing indices and rebuild them using OpenAI embeddings.
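
One visible symptom of mixing models is a dimension mismatch: the OpenAI embedding above has 1536 dimensions, while the Instructor embedding has 768. The sketch below shows a simple guard for that case. `check_compatible` is a hypothetical helper, not a LlamaIndex API, and the query vector is a placeholder.

```python
from typing import List


def check_compatible(index_dim: int, query_embedding: List[float]) -> None:
    """Raise if a query vector cannot be compared with the stored vectors."""
    if len(query_embedding) != index_dim:
        raise ValueError(
            f"index stores {index_dim}-dim vectors, "
            f"but the query embedding has {len(query_embedding)} dims"
        )


index_dim = 768           # index built with Instructor embeddings
query_vec = [0.0] * 1536  # placeholder query vector from a 1536-dim model

try:
    check_compatible(index_dim, query_vec)
except ValueError as err:
    print(f"incompatible: {err}")
```

Matching dimensions are necessary but not sufficient: two different models with the same output size still produce vectors that are not comparable, which is why the index is rebuilt rather than reused.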

In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding

service_context = ServiceContext.from_defaults(llm=llm, embed_model=OpenAIEmbedding(), chunk_size=512)
set_global_service_context(service_context)

# delete old vector index so we can re-create it
!rm -rf ./*_index

In [None]:
query_engine = create_query_engine()

response = query_engine.query('What is the Sub Question query engine?')
response.print_response_stream()

In [None]:
print(response.get_formatted_sources(length=256))

## Conclusion
In this notebook, we showed how to use embedding models at a low level, as well as how to create your own embeddings class.

If you want to use these embeddings in your project (which we will be doing in future guides!), you can use the example below.