## Embedding components

In the first example below, `docs` is a list of `Document` objects whose text content will be embedded. The `OpenAIDocumentEmbedder` is initialized with an OpenAI API key and generates an embedding for each document in `docs`. The first two values of each embedding are then printed.

In [None]:
!pip install haystack-ai

### OpenAIDocumentEmbedder

In [1]:
from haystack import Document
from haystack.components.embedders import OpenAIDocumentEmbedder
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv("./../../.env")
api_key = os.getenv("OPENAI_API_KEY")

# List of documents to embed
docs = [Document(content="The quick brown fox jumps over the lazy dog."), 
        Document(content="To be or not to be, that is the question.")]

# Initialize the embedder with your OpenAI API key
document_embedder = OpenAIDocumentEmbedder(api_key=api_key)

# Run the embedder to get embeddings
result = document_embedder.run(docs)

# Access the embeddings stored in the documents
for doc in result['documents']:
    print(doc.embedding[0:2])

Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00,  3.05it/s]

[0.0016260817646980286, 0.005972211714833975]
[0.017629027366638184, -0.022774461656808853]





Let's take a look at the structure of `result`:

In [2]:
result

{'documents': [Document(id=2e3218009b01cfc57f865bbf81fa70de81b5ebae02c4cc7092e46ffde03f3c49, content: 'The quick brown fox jumps over the lazy dog.', embedding: vector of size 1536),
  Document(id=63a06e3e867cb70e52a99c00b2de17fe531431c98e7d851268be01d341ea9f20, content: 'To be or not to be, that is the question.', embedding: vector of size 1536)],
 'metadata': {'model': 'text-embedding-ada-002-v2',
  'usage': {'prompt_tokens': 22, 'total_tokens': 22}}}

The `metadata` entry reports the model that produced the embeddings and the number of tokens consumed.


In [3]:
result['metadata']

{'model': 'text-embedding-ada-002-v2',
 'usage': {'prompt_tokens': 22, 'total_tokens': 22}}
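
As a rough illustration of how the `metadata` dictionary might be used, the sketch below computes the average number of tokens per embedded document. The `tokens_per_document` helper is hypothetical (it is not part of Haystack), and the dictionary is copied by hand from the output above, so no API call is made.

```python
# Metadata copied by hand from the output above (no API call is made here).
metadata = {
    "model": "text-embedding-ada-002-v2",
    "usage": {"prompt_tokens": 22, "total_tokens": 22},
}


def tokens_per_document(meta: dict, n_docs: int) -> float:
    """Hypothetical helper: average tokens consumed per embedded document."""
    if n_docs <= 0:
        raise ValueError("n_docs must be positive")
    return meta["usage"]["total_tokens"] / n_docs


# Two documents were embedded in the example above.
avg = tokens_per_document(metadata, n_docs=2)
print(f"{metadata['model']}: {avg} tokens per document on average")
```

A figure like this can help estimate usage before embedding a larger collection, although actual token counts depend on the text of each document.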

### OpenAITextEmbedder

In this snippet, `text_embedder` is created with an OpenAI API key and used to generate an embedding for the string "I love pizza!". The resulting embedding and its metadata are inspected in the cells that follow.

In [5]:
from haystack.components.embedders import OpenAITextEmbedder

# Initialize the text embedder with your OpenAI API key
text_embedder = OpenAITextEmbedder(api_key=api_key)

# Text you want to embed
text_to_embed = "I love pizza!"

# Embed the text and store the result
result_text = text_embedder.run(text_to_embed)

In [6]:
result_text.keys()

dict_keys(['embedding', 'metadata'])

The embedding is available under the `embedding` key; here are its first two values:

In [7]:
result_text['embedding'][0:2]

[0.017020374536514282, -0.023255806416273117]

In [8]:
result_text['metadata']

{'model': 'text-embedding-ada-002-v2',
 'usage': {'prompt_tokens': 4, 'total_tokens': 4}}
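
Embedding vectors such as the ones returned above are commonly compared with cosine similarity. The following is a minimal sketch, assuming the embeddings are plain Python lists of floats as in the outputs shown; `cosine_similarity` is a hypothetical helper written for illustration, and the short vectors are toy values rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only, not real embeddings.
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

In a live notebook, the same function could be applied to `result_text['embedding']` and to `doc.embedding` for each document, provided both were produced by the same model so that the vectors have the same length.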

### SentenceTransformersDocumentEmbedder

In [9]:
from haystack.components.embedders import SentenceTransformersDocumentEmbedder

# Initialize the document embedder with a model from the Sentence Transformers library
doc_embedder = SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/all-mpnet-base-v2")
doc_embedder.warm_up()

# Create a document to embed
doc = Document(content="I love pizza!")

# Embed the document and print the first two values of its embedding
result = doc_embedder.run([doc])
print(result['documents'][0].embedding[0:2])


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

[-0.07804739475250244, 0.14989925920963287]


In [10]:
result['documents'][0]

Document(id=ac2bc369f8115bb5bdee26d31f642520041e731da70d578ef116d3f67ad50c69, content: 'I love pizza!', embedding: vector of size 768)

### SentenceTransformersTextEmbedder

In [11]:
from haystack.components.embedders import SentenceTransformersTextEmbedder

# Initialize the text embedder with a specific model from Sentence Transformers
text_embedder = SentenceTransformersTextEmbedder(model_name_or_path="sentence-transformers/all-mpnet-base-v2")

# Warm up the model before use
text_embedder.warm_up()

# Define the text you want to embed
text_to_embed = "I love pizza!"

# Embed the text and retrieve the embedding
result = text_embedder.run(text_to_embed)

# Print the first two values of the embedding (a list of floats)
print(result['embedding'][0:2])


Batches:   0%|          | 0/1 [00:00<?, ?it/s]

[-0.07804739475250244, 0.14989925920963287]
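
The leading values printed here match those printed by the `SentenceTransformersDocumentEmbedder` for the same text, which is consistent with both components using the same model. The sketch below shows one way such a check could be done programmatically. The `vectors_match` helper is hypothetical, and the values are copied by hand from the two outputs; in a live notebook one would compare the full vectors instead of the first two entries.

```python
import math

# First two values copied by hand from the outputs above.
doc_head = [-0.07804739475250244, 0.14989925920963287]   # document embedder
text_head = [-0.07804739475250244, 0.14989925920963287]  # text embedder


def vectors_match(a, b, tol=1e-6):
    """Hypothetical helper: True if both vectors agree element-wise within tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
    )


print(vectors_match(doc_head, text_head))
```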


In [None]:
result.keys()