# Add QA Functionality to Your Existing OpenSearch Index

In this notebook, we walk through how to use the new `open_search_index_to_document_store` function in Haystack to plug an existing OpenSearch index into a QA pipeline.

Here, we assume you already have an OpenSearch cluster with some documents in it. In this example, the cluster holds indexed files about Game of Thrones. To create your own OpenSearch cluster for use with this notebook, run:

`docker run -d -p 9201:9200 -p 9600:9600 -e "discovery.type=single-node" --name "opensearch_got" opensearchproject/opensearch:1.2.4`

Then, write the documents you want to use into this cluster. You can use the `write_documents()` method in Haystack to do this. See the 'Writing GoT documents to an OpenSearch Index' section at the bottom of this notebook for an example.

The `docker run` command above exposes the cluster on port `9201`.
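
Before moving on, it can help to confirm that something is actually listening on port `9201`, since the container may take a little while to start. The sketch below uses only Python's standard library. The helper name `is_port_open` and the host `localhost` are assumptions for illustration, and an open TCP port does not guarantee that OpenSearch has finished initializing.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_open("localhost", 9201)` should return `True` once the container is up and the port mapping from the `docker run` command is in place.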


## Use Your Existing Index 

Now, we use the `open_search_index_to_document_store()` function to retrieve the indexed documents from the original index.

1. We initialize an `OpenSearchDocumentStore`. Below, we also give it an index named 'haystack_os_index'.
2. We call `open_search_index_to_document_store()`, which takes the documents from the original index where we already had them ('document') and writes them into the new one ('haystack_os_index'), ready to be used in the QA pipeline that follows.
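
To make step 2 more concrete, here is a minimal, library-free sketch of the kind of mapping involved: raw OpenSearch hits (each with `_id` and `_source`) are turned into dictionaries with `content` and `meta` keys. The function `hits_to_documents` and the sample records are hypothetical. This is not Haystack's actual implementation of `open_search_index_to_document_store()`, only an illustration of what the `original_content_field` argument selects.

```python
from typing import Any, Dict, Iterable, List


def hits_to_documents(
    hits: Iterable[Dict[str, Any]], content_field: str = "content"
) -> List[Dict[str, Any]]:
    """Map raw search hits to dicts with `content` and `meta` keys (illustrative only)."""
    documents = []
    for hit in hits:
        source = dict(hit.get("_source", {}))  # copy so the input is not mutated
        content = source.pop(content_field, None)
        if not content:
            continue  # skip records with no usable text in the content field
        # Everything other than the content field is kept as metadata
        documents.append({"id": hit.get("_id"), "content": content, "meta": source})
    return documents


# Made-up records standing in for hits from the original 'document' index
sample_hits = [
    {"_id": "1", "_source": {"content": "Some text.", "name": "file_a.txt"}},
    {"_id": "2", "_source": {"name": "file_b.txt"}},  # no content field, so it is skipped
]
print(hits_to_documents(sample_hits))
```

If the text in your original index lives under a different field, that field name is what you would pass as `original_content_field`.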

### Why?

This way, we don't touch the original index: `open_search_index_to_document_store()` writes the documents to a new one. This isn't necessary, but it is a good option if you want to make sure your original index and its embeddings (if any) remain unchanged when you perform operations on `os_doc_store`.


In [8]:
from haystack.document_stores import OpenSearchDocumentStore
from haystack.document_stores.utils import open_search_index_to_document_store

# New document store backed by its own index on the cluster exposed on port 9201
os_doc_store = OpenSearchDocumentStore(index='haystack_os_index', port=9201)
# Copy the documents from the existing 'document' index into the new index
open_search_index_to_document_store(document_store=os_doc_store, original_index_name="document", original_content_field="content", port=9201)

Converting ES Records: 0it [00:00, ?it/s]


<haystack.document_stores.elasticsearch.OpenSearchDocumentStore at 0x7f77f2d458e0>
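
The progress output above does not make it obvious how many records were converted, so it may be worth confirming that documents actually landed in the new index before building the pipeline. Here is a minimal sketch. The helper `require_documents` is hypothetical, and it assumes the store exposes Haystack's `get_document_count()` method.

```python
def require_documents(document_store) -> int:
    """Return the number of documents in the store, raising if the store is empty."""
    count = document_store.get_document_count()
    if count == 0:
        raise ValueError(
            "The document store is empty; check the original index name and content field."
        )
    return count
```

In this notebook, you would call `require_documents(os_doc_store)` right after the conversion step.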

In [13]:
from haystack.nodes import FARMReader, DensePassageRetriever
retriever = DensePassageRetriever(document_store=os_doc_store)


reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)


INFO - haystack.modeling.utils -  Using devices: CPU
INFO - haystack.modeling.utils -  Number of GPUs: 0
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find facebook/dpr-question_encoder-single-nq-base locally.
INFO - haystack.modeling.model.language_model -  Looking on Transformers Model Hub (in local cache and online)...
INFO - haystack.modeling.model.language_model -  Loaded facebook/dpr-question_encoder-single-nq-base
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'DPRQuestionEncoderTokenizer'. 
The class this function is called from is 'DPRContextEncoderTokenizerFast'.
INFO - haystack.modeling.model.language_model -  LOADING MODEL
INFO - haystack.modeling.model.language_model -  Could not find facebook/dpr-ctx_encoder-single-nq-base locally.
INFO 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

INFO - haystack.modeling.infer -  Got ya 7 parallel workers to do inference ...
INFO - haystack.modeling.infer -   0     0     0     0     0     0     0  
INFO - haystack.modeling.infer -  /w\   /w\   /w\   /w\   /w\   /w\   /w\ 
INFO - haystack.modeling.infer -  /'\   / \   /'\   /'\   / \   / \   /'\ 


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
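
The `huggingface/tokenizers` warnings above suggest setting `TOKENIZERS_PARALLELISM` explicitly. One way to do that from inside the notebook is to set the variable before the retriever and reader are created, as sketched here. Choosing `false` is an assumption that trades tokenizer parallelism for quieter logs.

```python
import os

# Set this before any tokenizers are used, i.e. before creating the retriever and reader.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```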


In [14]:
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)

In [16]:
prediction = pipe.run(query="Who is Ned's wife?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})

from haystack.utils import print_answers

print_answers(prediction)


Query: Who is Ned's wife?
Answers:
[]
{'answers': [], 'documents': [], 'root_node': 'Query', 'params': {'Retriever': {'top_k': 10}, 'Reader': {'top_k': 5}}, 'query': "Who is Ned's wife?", 'node_id': 'Reader'}
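
The result above contains no answers and, more tellingly, no documents, which means the retriever returned nothing for the reader to work with. A small helper like the following can make that distinction explicit. The function `diagnose_prediction` is hypothetical and only inspects the `answers` and `documents` keys visible in the output above. The causes it suggests are common ones, not a confirmed diagnosis of this run.

```python
from typing import Any, Dict


def diagnose_prediction(prediction: Dict[str, Any]) -> str:
    """Summarize whether the retriever or the reader came back empty."""
    n_docs = len(prediction.get("documents", []))
    n_answers = len(prediction.get("answers", []))
    if n_docs == 0:
        return (
            "Retriever returned no documents: check that the index contains documents "
            "and, for a dense retriever, that embeddings were computed "
            "(for example with update_embeddings)."
        )
    if n_answers == 0:
        return f"Retriever returned {n_docs} documents, but the reader found no answers."
    return f"{n_answers} answers extracted from {n_docs} documents."


# The same dictionary shape as the prediction printed above
empty_prediction = {
    "answers": [],
    "documents": [],
    "root_node": "Query",
    "params": {"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}},
    "query": "Who is Ned's wife?",
    "node_id": "Reader",
}
print(diagnose_prediction(empty_prediction))
```

With a `DensePassageRetriever`, Haystack document stores provide `update_embeddings(retriever)` to compute embeddings for the stored documents, so running `os_doc_store.update_embeddings(retriever)` before querying is one thing to try if the index itself is not empty.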
