## Launch an OpenSearch DocumentStore

Below, we create an `OpenSearchDocumentStore`. By default, it connects to an OpenSearch instance running on `localhost`.

Change the host, port, username, and password below to run the following indexing pipeline on your own OpenSearch service on AWS.

If you would like to use an OpenSearch service on AWS, first follow the [instructions](/README.md#option-1-opensearch-service-on-aws) to deploy an OpenSearch instance with the provided `opensearch-index.yaml` CloudFormation template.

Alternatively, you can launch an OpenSearch instance locally, which requires Docker. To do this, first run:

```python
from haystack.utils import launch_opensearch

launch_opensearch()
```
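Before connecting, it can help to confirm that the local container is accepting connections. The sketch below is a plain TCP check, not a Haystack API. It assumes the container exposes OpenSearch's default REST port, 9200, and it only tells you that the port is open, not that the cluster is healthy.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 9200, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (connection refused, timeout, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open()` returning `False` right after `launch_opensearch()` usually just means the container is still starting up.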

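```python
from haystack.document_stores import OpenSearchDocumentStore

# Replace the placeholder host and credentials with your own.
# embedding_dim=384 matches the output size of all-MiniLM-L12-v2, the embedding model used below.
doc_store = OpenSearchDocumentStore(host="your_opensearch_host", port=443, username="admin", password="admin", embedding_dim=384)
```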
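Rather than hard-coding credentials in the notebook, you could read them from environment variables. This is a minimal sketch: the variable names (`OPENSEARCH_HOST`, `OPENSEARCH_PORT`, `OPENSEARCH_USERNAME`, `OPENSEARCH_PASSWORD`) and the fallback values are hypothetical choices, not something Haystack reads on its own.

```python
import os


def opensearch_settings_from_env() -> dict:
    """Collect connection settings from (hypothetical) environment variables."""
    return {
        # Fallbacks mirror a local demo instance; override them for AWS.
        "host": os.environ.get("OPENSEARCH_HOST", "localhost"),
        "port": int(os.environ.get("OPENSEARCH_PORT", "9200")),
        "username": os.environ.get("OPENSEARCH_USERNAME", "admin"),
        "password": os.environ.get("OPENSEARCH_PASSWORD", "admin"),
    }
```

You could then pass the result as keyword arguments, e.g. `OpenSearchDocumentStore(**opensearch_settings_from_env(), embedding_dim=384)`, so the same notebook works against a local container and an AWS domain.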

## Indexing Pipeline to write Documents to OpenSearch

An indexing pipeline prepares your files and writes them to the database that your NLP application will query. In this example, we use OpenSearch as our vector database. The indexing pipeline below converts JSON files (crawled from the OpenSearch documentation and website) into documents, splits them into passages, and creates an embedding for each passage with `sentence-transformers/all-MiniLM-L12-v2`, a small embedding model that produces 384-dimensional vectors.

```python
from haystack.nodes import JsonConverter

converter = JsonConverter()
```
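If you crawl your own pages, a quick sanity check on the input files can catch malformed records before indexing. The sketch below assumes each record carries a `content` string plus an optional `meta` dict, modeled on Haystack's `Document` fields; that schema is an assumption, so confirm what your `JsonConverter` version expects. The sample records and the helpers `write_records` and `load_records` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical sample records; field names are an assumption (see above).
records = [
    {"content": "First crawled page text.", "meta": {"url": "https://example.com/page-1"}},
    {"content": "Second crawled page text.", "meta": {"url": "https://example.com/page-2"}},
]


def write_records(path: Path, items: list) -> None:
    """Write records as a JSON array."""
    path.write_text(json.dumps(items, indent=2), encoding="utf-8")


def load_records(path: Path) -> list:
    """Load a JSON file and check that every record has a string `content` field."""
    data = json.loads(path.read_text(encoding="utf-8"))
    items = data if isinstance(data, list) else [data]  # accept one object or a list
    bad = [i for i, item in enumerate(items)
           if not isinstance(item, dict) or not isinstance(item.get("content"), str)]
    if bad:
        raise ValueError(f"records without a 'content' string at positions {bad}")
    return items
```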

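```python
from haystack.nodes import PreProcessor

# Split documents into passages of roughly 80 words that overlap by 20 words,
# without cutting sentences in half.
preprocessor = PreProcessor(
    clean_empty_lines=True,
    split_by="word",
    split_respect_sentence_boundary=True,
    split_length=80,
    split_overlap=20,
)
```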
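To see what `split_length` and `split_overlap` mean, here is a simplified sketch of word-based splitting with overlap. It is not Haystack's implementation: it ignores sentence boundaries (unlike `split_respect_sentence_boundary=True` above) and just slides a fixed window. With 80 and 20, each new passage starts 60 words after the previous one, so neighbouring passages share 20 words.

```python
def split_words(text: str, split_length: int = 80, split_overlap: int = 20) -> list:
    """Split text into windows of `split_length` words, overlapping by `split_overlap` words."""
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be >= 0 and smaller than split_length")
    words = text.split()
    stride = split_length - split_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + split_length]))
        if start + split_length >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps context that straddles a passage boundary retrievable from either side, at the cost of storing some words twice.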

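```python
from haystack.nodes import EmbeddingRetriever

# devices=["mps"] targets Apple Silicon GPUs; use ["cuda"] on NVIDIA GPUs,
# or omit `devices` to let Haystack choose the device.
retriever = EmbeddingRetriever(document_store=doc_store, embedding_model="sentence-transformers/all-MiniLM-L12-v2",
                               devices=["mps"], top_k=5)
```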
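In the indexing pipeline the retriever only computes passage embeddings. At query time, it embeds the query with the same model and returns the `top_k` stored passages whose vectors are most similar. The brute-force sketch below illustrates that ranking step with cosine similarity on toy 3-dimensional vectors; it is an assumption-laden illustration, since OpenSearch performs the search itself using its own index and the similarity function configured on the document store.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list, doc_vecs: dict, k: int = 5) -> list:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]
```

For example, with `{"a": [1, 0, 0], "b": [0, 1, 0], "c": [0.9, 0.1, 0]}` and the query `[1, 0, 0]`, `top_k(..., k=2)` ranks `"a"` first and `"c"` second.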

```python
from haystack import Pipeline

# Each node consumes the output of the node listed in `inputs`; "File" is the entry point for file paths.
indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=converter, name="Converter", inputs=["File"])
indexing_pipeline.add_node(component=preprocessor, name="Preprocessor", inputs=["Converter"])
indexing_pipeline.add_node(component=retriever, name="Retriever", inputs=["Preprocessor"])
indexing_pipeline.add_node(component=doc_store, name="DocumentStore", inputs=["Retriever"])
```
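The `inputs` arguments wire the nodes into a directed graph: File → Converter → Preprocessor → Retriever → DocumentStore. The toy model below is not Haystack's `Pipeline.run`; it is a sketch, assuming exactly one input per node, that executes stand-in functions in topological order with the standard library's `graphlib` (Python 3.9+). `run_toy_pipeline` and the lambda "components" are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Mirrors the `inputs` wiring above: node name -> names of its input nodes.
EDGES: Dict[str, List[str]] = {
    "Converter": ["File"],
    "Preprocessor": ["Converter"],
    "Retriever": ["Preprocessor"],
    "DocumentStore": ["Retriever"],
}


def run_toy_pipeline(nodes: Dict[str, Callable], edges: Dict[str, List[str]], file_paths: List[str]) -> dict:
    """Run every node after its input node and collect each node's output."""
    outputs = {"File": file_paths}
    for name in TopologicalSorter(edges).static_order():
        if name == "File":
            continue  # root node: its "output" is the list of file paths
        (upstream,) = edges[name]  # this toy assumes exactly one input per node
        outputs[name] = nodes[name](outputs[upstream])
    return outputs


store: list = []
toy_nodes = {
    # Stand-ins for the real components; each just transforms a Python list.
    "Converter": lambda paths: [f"Text from {p}. More text." for p in paths],
    "Preprocessor": lambda docs: [s for d in docs for s in d.split(". ")],
    "Retriever": lambda chunks: [(c, [float(len(c))]) for c in chunks],  # fake 1-d "embedding"
    "DocumentStore": lambda embedded: store.extend(embedded) or len(store),  # returns stored count
}
```

Running `run_toy_pipeline(toy_nodes, EDGES, ["a.json", "b.json"])` passes two files through all four stages and leaves four (passage, vector) pairs in `store`, which is the same shape of flow the real pipeline follows.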

```python
indexing_pipeline.run(file_paths=["data/opensearch-documentation-2.7.json", "data/opensearch-website.json"])
```
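If you later add more crawled files, you could build the `file_paths` list from the `data/` directory instead of listing each file by hand. This is a small sketch; `json_files` is a hypothetical helper, and it only picks up files directly inside the directory whose names end in `.json`.

```python
from pathlib import Path


def json_files(data_dir: str = "data") -> list:
    """Return every *.json file directly inside data_dir, as sorted string paths."""
    return sorted(str(p) for p in Path(data_dir).glob("*.json"))
```

You would then call `indexing_pipeline.run(file_paths=json_files("data"))`. Sorting keeps the indexing order stable between runs.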