## Writing content into a Document Store


### `DocumentWriter`

#### Writing regular documents

We can write `Document` objects into a Document Store with the `DocumentWriter` component. In this example, we create an `InMemoryDocumentStore` and write two `Document` objects into it.

In [1]:
from haystack.preview.components.writers import DocumentWriter
from haystack.preview.document_stores import InMemoryDocumentStore
from haystack.preview.dataclasses import Document

# Initialize an in-memory document store
doc_store = InMemoryDocumentStore()

# Create the DocumentWriter component with the document store
document_writer = DocumentWriter(document_store=doc_store)

# Define a list of documents to write
documents_to_write = [
    Document(content="Document 1 content"),
    Document(content="Document 2 content"),
]

# Use the DocumentWriter component to write documents to the store
result = document_writer.run(documents=documents_to_write)

# Print the number of documents written
print(f"Documents written: {result['documents_written']}")


Documents written: 2


In [2]:
doc_store.count_documents()

2

In [3]:
doc_store.filter_documents()

[Document(id=8bd659ba0ddd6820199db5c60404564834d86ad5d6492bffeba36dbff256a16d, content: 'Document 1 content'),
 Document(id=e9ddb15c1c34d025bad485665cb56e4a4e18450fe13afec1e749f85ab0dcc19c, content: 'Document 2 content')]

#### Writing embedded documents

There are times when, either because of the size of the data or to preserve semantic meaning through an embedding model, we may want to work with embeddings instead.

The key steps are:

* Compute embeddings: Use `OpenAIDocumentEmbedder`, `SentenceTransformersDocumentEmbedder`, or another Haystack embedding model integration to compute the embeddings for your documents.

* Store embeddings: The computed embeddings are stored in the `embedding` field of the `Document` objects.

* Write to the Document Store: Use the `DocumentWriter` component to write these `Document` objects, now with embeddings, into a Document Store.
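The three steps above can be sketched without any model or library, just to make the data flow concrete. Everything in this block is a hypothetical stand-in and not part of Haystack: `ToyDocument`, `toy_embed`, and `ToyStore` are made-up names, and the hash-based "embedding" carries no semantic meaning.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document with an optional embedding field."""
    content: str
    embedding: Optional[List[float]] = None


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Hypothetical embedder: derives `dim` floats in [0, 1] from a hash of the text.

    A real embedding model would return semantically meaningful vectors.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]


class ToyStore:
    """Hypothetical in-memory store keyed by document content."""

    def __init__(self) -> None:
        self._docs: Dict[str, ToyDocument] = {}

    def write(self, docs: List[ToyDocument]) -> int:
        """Store each document and return how many were passed in."""
        for doc in docs:
            self._docs[doc.content] = doc
        return len(docs)

    def all(self) -> List[ToyDocument]:
        return list(self._docs.values())


docs = [ToyDocument("first text"), ToyDocument("second text")]

# Steps 1 and 2: compute an embedding and store it on each document
for doc in docs:
    doc.embedding = toy_embed(doc.content)

# Step 3: write the documents, now carrying embeddings, into the store
store = ToyStore()
written = store.write(docs)
print(f"Documents written: {written}")
```

The point of the sketch is only the ordering: embeddings are attached to the documents first, and the writer step then persists documents and embeddings together.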

Here's an example that uses `SentenceTransformersDocumentEmbedder` to compute embeddings and `DocumentWriter` to write the embedded documents into a document store:



In [4]:
from haystack.preview.document_stores import InMemoryDocumentStore
from haystack.preview.components.writers import DocumentWriter
from haystack.preview.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.preview.dataclasses import Document

# Initialize document store and components
doc_store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/all-mpnet-base-v2")
document_writer = DocumentWriter(document_store=doc_store)

# Example documents
documents = [
    Document(content="The quick brown fox jumps over the lazy dog."),
    Document(content="When it comes to natural language processing, context is key.")
]

# Warm up the embedder and compute embeddings
doc_embedder.warm_up()
embedded_docs = doc_embedder.run(documents)['documents']

# Write documents with embeddings to the document store
document_writer.run(documents=embedded_docs)



{'documents_written': 2}

We can now display each document's content together with its embedding:

In [5]:
# Retrieve all documents
all_documents = doc_store.filter_documents()

# Print details of each document, including the embedding if it exists
for doc in all_documents:
    print(f"Document ID: {doc.id}")
    print(f"Content: {doc.content}")
    if doc.embedding is not None:
        print(f"Embedding: {doc.embedding[:5]}...")  # Displaying first 5 values of the embedding for brevity
    print("\n")


Document ID: 2e3218009b01cfc57f865bbf81fa70de81b5ebae02c4cc7092e46ffde03f3c49
Content: The quick brown fox jumps over the lazy dog.
Embedding: [-0.03429264575242996, -0.0013394346460700035, 0.004336129408329725, -0.0018683503149077296, 0.025440821424126625]...


Document ID: 8baba41960a8807c42da6783a39dbbf50873f9700ff861844ec8ccce65d4f50e
Content: When it comes to natural language processing, context is key.
Embedding: [0.049897201359272, -0.023004200309515, -0.03653186932206154, 0.05246769264340401, -0.01983010210096836]...


