Embedding models come into play whenever I use a vector store to index my content. By default, llama-index uses OpenAI's `text-embedding-ada-002` embedding model. To use a different embedding model, I set it in the global service context.

In [None]:
from dotenv import load_dotenv

load_dotenv()

In [None]:
from llama_index import (
    ServiceContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    set_global_service_context,
)
from llama_index.embeddings import OpenAIEmbedding

In [None]:
service_context = ServiceContext.from_defaults()

In [None]:
service_context.embed_model

In [None]:
# This will use the OpenAIEmbedding model to index the data.
docs = SimpleDirectoryReader("/Users/avilay/mldata/avilay.rocks").load_data()
index = VectorStoreIndex.from_documents(docs)

In [None]:
query_engine = index.as_query_engine()
resp = query_engine.query("quantum computing")

In [None]:
print(resp)

In [None]:
resp.metadata

I can ask llama-index to use a local model instead. With `embed_model="local"`, it downloads a default model from Hugging Face (HF).

In [None]:
service_context = ServiceContext.from_defaults(embed_model="local")
service_context.embed_model

In [None]:
set_global_service_context(service_context)

In [None]:
# Now it will use the `bge-small-en` embedding model.
docs = SimpleDirectoryReader("/Users/avilay/mldata/avilay.rocks").load_data()
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
resp = query_engine.query("quantum computing")
print(resp)
resp.metadata

Instead of letting llama-index choose the local model for me, I can name the model I want using the `local:<model name>` form.

In [None]:
service_context = ServiceContext.from_defaults(
    embed_model="local:sentence-transformers/multi-qa-mpnet-base-dot-v1"
)
service_context.embed_model
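
The `embed_model` string used so far is either `local` or `local:<model name>`. The sketch below only illustrates how such a string splits into its two parts. `split_embed_model_spec` is a hypothetical helper, not a llama-index function, and it makes no claim about how llama-index parses the string internally.

```python
from typing import Optional, Tuple


def split_embed_model_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'local' or 'local:<model name>' into (prefix, model name or None)."""
    # partition() splits on the first ':' only, so the rest of the name stays intact.
    prefix, _, model_name = spec.partition(":")
    return prefix, (model_name or None)


# Hypothetical usage with the two strings that appear in this notebook.
print(split_embed_model_spec("local"))
print(split_embed_model_spec("local:sentence-transformers/multi-qa-mpnet-base-dot-v1"))
```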

In [None]:
set_global_service_context(service_context)
# Now it will use the `sentence-transformers/multi-qa-mpnet-base-dot-v1` embedding model.
docs = SimpleDirectoryReader("/Users/avilay/mldata/avilay.rocks").load_data()
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
resp = query_engine.query("quantum computing")
print(resp)
resp.metadata

In [None]:
emb = service_context.embed_model.get_text_embedding("I love to code!")
len(emb)

In [None]:
emb[:10]
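
Since `get_text_embedding` returns a plain list of floats, two embeddings can be compared directly. Here is a minimal sketch of cosine similarity. The helper name `cosine_similarity` and the 3-dimensional toy vectors are made up for illustration and stand in for real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption: not model output).
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]   # same direction as emb_a
emb_c = [3.0, 0.0, -1.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))
print(cosine_similarity(emb_a, emb_c))
```

With real embeddings, two vectors returned by `get_text_embedding` would take the place of the toy vectors.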

In addition to OpenAI and HF embeddings, llama-index integrates with many other embedding model providers. See the list [here](http://127.0.0.1:8000/module_guides/models/embeddings.html#list-of-supported-embeddings).

If I want to use an embedding model that is not directly available in llama-index, or one that needs special pre- or post-processing of prompts, I can wrap it in a `BaseEmbedding` subclass as shown [here](http://127.0.0.1:8000/module_guides/models/embeddings.html#custom-embedding-model).

The linked example demonstrates the [Instructor embedding](https://huggingface.co/hkunlp/instructor-large), which is good for domain-specific embeddings. I didn't quite get the usage; I will need to read their paper at some point soon.
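
To make the "special pre-processing" idea concrete, here is a dependency-free sketch of a wrapper that pairs every text with a fixed instruction before encoding, which is how I understand instruction-style models to be used. Everything in it is hypothetical: `InstructionEmbedder` does not subclass `BaseEmbedding`, `toy_encoder` is a fake stand-in for a real model, and the instruction string is made up. In llama-index the same logic would presumably live in a `BaseEmbedding` subclass, per the linked guide.

```python
from typing import Callable, List, Sequence

# Hypothetical encoder type: maps (instruction, text) pairs to vectors.
Encoder = Callable[[Sequence[Sequence[str]]], List[List[float]]]


def toy_encoder(pairs: Sequence[Sequence[str]]) -> List[List[float]]:
    """Fake encoder: builds a 2-d vector from string lengths; not a real model."""
    return [[float(len(instruction)), float(len(text))] for instruction, text in pairs]


class InstructionEmbedder:
    """Hypothetical wrapper that pairs each text with a fixed instruction."""

    def __init__(self, encoder: Encoder, instruction: str) -> None:
        self._encoder = encoder
        self._instruction = instruction

    def get_text_embedding(self, text: str) -> List[float]:
        # Pre-processing step: send [instruction, text] instead of the bare text.
        return self._encoder([[self._instruction, text]])[0]

    def get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        # Batch variant: one [instruction, text] pair per input text.
        return self._encoder([[self._instruction, t] for t in texts])


embedder = InstructionEmbedder(toy_encoder, "Represent the document:")
vec = embedder.get_text_embedding("I love to code!")
print(vec)
```

Swapping `toy_encoder` for a real model's encode function is the only change the wrapper itself would need, assuming that function accepts the same pair format.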