In [None]:
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack import Document
import numpy as np

# Qdrant Vector Store

This is an example of using the [Qdrant](https://qdrant.tech/) vector store with fastRAG. It relies on the `qdrant_haystack` integration and the `qdrant_client` Python connector. To work against a Qdrant server, you can start one locally, e.g. with `docker run -p 6333:6333 qdrant/qdrant`. In this demo, however, we use local in-memory storage (`":memory:"`), so no Qdrant server is needed.

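Two important settings are the vector dimension and the HNSW parameters. Qdrant uses an HNSW (Hierarchical Navigable Small World) graph index for fast approximate nearest-neighbor search, which trades accuracy against speed. The main HNSW parameters are `m`, the number of edges per node in the graph, and `ef_construct`, the number of candidate neighbors considered while building the index. In general, higher values mean better accuracy, at the cost of larger RAM usage and longer indexing and search times.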

These parameters are specified when connecting to the server and creating a new index, and should be treated as fixed afterwards: changing the vector dimension, for example, requires recreating the index.
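To make the RAM side of the HNSW tradeoff concrete, here is a rough back-of-envelope estimate. It is a sketch under stated assumptions, not Qdrant's actual memory accounting: vectors are assumed to be stored as 4-byte floats, and each node on the bottom graph layer is assumed to keep up to `2 * m` neighbor links stored as 4-byte ids (the bottom-layer setting recommended in the original HNSW paper). Upper graph layers, payloads and `ef_construct` are ignored. The helper `estimate_hnsw_bytes` is hypothetical.

```python
def estimate_hnsw_bytes(num_vectors: int, dim: int, m: int) -> dict:
    """Hypothetical rough estimate; not Qdrant's exact memory accounting."""
    # Assumption: float32 vectors, i.e. 4 bytes per component
    vector_bytes = num_vectors * dim * 4
    # Assumption: up to 2*m links per node on the bottom layer, 4-byte ids each
    link_bytes = num_vectors * 2 * m * 4
    return {"vectors": vector_bytes, "links": link_bytes, "total": vector_bytes + link_bytes}


# Compare a few values of m for one million 100-dimensional vectors
for m in (16, 64, 128):
    est = estimate_hnsw_bytes(num_vectors=1_000_000, dim=100, m=m)
    print(f"m={m:>3}: ~{est['total'] / 2**20:.0f} MiB")
```

Under these assumptions, with `m=128` and `dim=100` the link storage (1,024 bytes per vector) already exceeds the raw vector storage (400 bytes per vector), so large `m` values are mainly a memory cost, while `ef_construct` mostly affects index build time.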

In [2]:
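dim = 100  # embedding dimension; must match the vectors written to the store
index_name = "test_hnsw"  # name of the Qdrant collection backing the store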

## Creating an Index

We need to specify the location of the Qdrant service, the vector dimension, the index name, the similarity metric and, optionally, the HNSW configuration.
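The similarity metric decides how vectors are compared. A minimal pure-Python sketch of three common metrics (dot product, cosine similarity and Euclidean/L2 distance) shows why the choice matters: the dot product depends on vector length, while cosine similarity is the dot product of length-normalized vectors. The function names are illustrative only and are not part of Haystack or Qdrant.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def l2_distance(a, b):
    # Euclidean distance: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    # Scale a vector to unit length
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]


a, b = [3.0, 4.0], [1.0, 0.0]
print(dot_product(a, b))                        # 3.0
print(cosine_similarity(a, b))                  # 0.6
print(l2_distance(a, b))                        # sqrt(20), about 4.472
print(dot_product(normalize(a), normalize(b)))  # 0.6, same as the cosine similarity
```

Since this demo uses unnormalized random embeddings with `similarity='dot_product'`, longer vectors tend to score higher; normalizing the embeddings beforehand would make dot-product ranking identical to cosine ranking.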

In [3]:
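q = QdrantDocumentStore(":memory:",                 # in-memory storage; no server needed
                        embedding_dim=dim,          # vector dimension
                        timeout=60,
                        index=index_name,           # collection name
                        embedding_field="embedding",
                        hnsw_config={"m": 128, "ef_construct": 100},  # HNSW index parameters
                        similarity='dot_product',   # similarity metric used for search
                        recreate_index=True)        # drop and recreate the index if it already exists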

## Insertion and Searching of Documents

We'll create a few documents. Each must have `id`, `content` and `embedding` keys, but it can also contain more data, such as a title.
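The required keys can be checked before building `Document` objects. The sketch below is a hypothetical helper, not part of Haystack or Qdrant: it checks for the three keys used in this notebook and, because the store expects vectors of the configured dimension, also checks the embedding length. Extra keys such as a title are allowed; in Haystack, such extra data would typically go into the `Document`'s `meta` dictionary.

```python
REQUIRED_KEYS = {"id", "content", "embedding"}


def validate_record(record: dict, dim: int) -> None:
    """Hypothetical check: raise ValueError if a raw record cannot be indexed."""
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"record {record.get('id')!r} is missing keys: {sorted(missing)}")
    if len(record["embedding"]) != dim:
        raise ValueError(
            f"record {record['id']!r} has an embedding of length "
            f"{len(record['embedding'])}, expected {dim}"
        )


# Extra keys such as a title are allowed
validate_record({"id": 1, "content": "I like to go to the beach",
                 "embedding": [0.0] * 100, "title": "Beach"}, dim=100)

try:
    validate_record({"id": 2, "content": "Where is my hat?"}, dim=100)
except ValueError as err:
    print(err)  # record 2 is missing keys: ['embedding']
```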

In [4]:
# Random embeddings for illustration only, so search results vary between runs
docs = [{"id": 1, "content": "I like to go to the beach", "embedding": np.random.normal(size=dim)},
        {"id": 2, "content": "Where is my hat?", "embedding": np.random.normal(size=dim)},
        {"id": 3, "content": "GPT4 is very nice", "embedding": np.random.normal(size=dim)},]
# Convert to Haystack Documents: ids as strings, embeddings as plain lists of floats
docs = [Document(content=d["content"], id=str(d["id"]), embedding=d["embedding"].tolist()) for d in docs]

Write the documents to the index; writes are batched, and deduplication of documents is on by default.
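Batching and deduplication can be illustrated with a small sketch. This is not how the document store implements them: the helper names are hypothetical, duplicates are assumed to be identified by `id`, and the first occurrence is kept. The store's actual handling of duplicates depends on its configured duplicate policy.

```python
from typing import Iterable, Iterator, List


def batched(items: List, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def dedupe_by_id(items: Iterable[dict]) -> List[dict]:
    """Keep the first record seen for each id (an assumed policy)."""
    seen = set()
    unique = []
    for item in items:
        if item["id"] not in seen:
            seen.add(item["id"])
            unique.append(item)
    return unique


records = [{"id": "1"}, {"id": "2"}, {"id": "1"}, {"id": "3"}]
batches = list(batched(dedupe_by_id(records), batch_size=2))
print([[r["id"] for r in b] for b in batches])  # [['1', '2'], ['3']]
```

Deduplicating before batching ensures a duplicate id cannot slip through by landing in a different batch from its first occurrence.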

In [5]:
q.write_documents(docs)

100it [00:00, 58694.43it/s]                                                                                                          


3

In [6]:
q.count_documents()

3

## Query by embedding

We need to provide a query vector and a `top_k` value. Other query types, such as text search, are also possible but are not shown here.
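Conceptually, a `top_k` query scores every stored vector against the query and returns the `top_k` best matches; the HNSW index approximates this without visiting every vector. The sketch below is an exact, brute-force version using the dot product (matching `similarity='dot_product'` above). The helper `exact_top_k` and the toy 3-dimensional embeddings are hypothetical.

```python
import heapq
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query_vec, docs, top_k):
    """Score every document against the query and keep the top_k highest scores."""
    scored = ((dot(query_vec, d["embedding"]), d) for d in docs)
    # key= compares scores only, so the dicts themselves are never compared
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


toy_docs = [
    {"content": "I like to go to the beach", "embedding": [0.9, -0.2, 0.1]},
    {"content": "Where is my hat?", "embedding": [0.5, 0.5, 0.5]},
    {"content": "GPT4 is very nice", "embedding": [-0.3, 0.2, 0.4]},
]

for score, doc in exact_top_k([1.0, 1.0, 1.0], toy_docs, top_k=2):
    print(f"{score:.2f}  {doc['content']}")
# 1.50  Where is my hat?
# 0.80  I like to go to the beach
```

With an all-ones query such as `np.ones(dim)`, the dot product reduces to the sum of each embedding's components, so with random embeddings the top hit is simply the document whose embedding sums highest.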

In [7]:
# Returns the top_k best-matching Documents; the result varies because the embeddings are random
print(q.query_by_embedding(np.ones(dim), top_k=1)[0].content)

Where is my hat?
