### Download and run
https://qdrant.tech/documentation/quickstart/

1. docker pull qdrant/qdrant

2. docker run -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrant_storage:/qdrant/storage:z" qdrant/qdrant
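
Before moving on, it can help to confirm that the container is reachable on the REST port mapped above (6333). The following is a minimal sketch using only the standard library; the helper name `qdrant_is_up` is hypothetical, and it assumes the server answers a plain GET on its root URL with HTTP 200.

```python
import urllib.error
import urllib.request


def qdrant_is_up(base_url="http://localhost:6333", timeout=2.0):
    """Return True if an HTTP GET on base_url succeeds with status 200.

    Hypothetical helper for illustration; it only checks reachability,
    not that collections or storage are healthy.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```

If this prints `False`, check that the `docker run` command is still running and that nothing else is bound to port 6333.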

### LangChain

In [8]:
from dotenv import load_dotenv
load_dotenv('D:\\Code\\AI\\.env')

True

In [3]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_qdrant import QdrantVectorStore, FastEmbedSparse, RetrievalMode
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings


#### Load the pdf

In [4]:
# Load PDF
pdf_path = "D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf"  # Change this to your actual PDF file
loader = PyMuPDFLoader(pdf_path)

# Extract documents
documents = loader.load()

#### Split the loaded text

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=300)
documents = text_splitter.split_documents(documents)

In [None]:
documents
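
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping chunks over a plain string. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and newlines); the function `sliding_chunks` is a hypothetical fixed-window stand-in that only illustrates how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

With the notebook's settings (1000 and 300), each chunk would likewise repeat roughly the last 300 characters of the previous one, which helps a sentence that straddles a boundary stay retrievable.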

#### Set the embeddings

In [10]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-MiniLM-L6-v2")

In [9]:
embeddings

HuggingFaceEmbeddings(model_name='sentence-transformers/paraphrase-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

#### Connect Qdrant to the Docker container

In [12]:
qdrant_url = "http://localhost:6333"
# Pass the URL explicitly so the collection is created on the Docker instance
qdrant_docker = QdrantVectorStore.from_documents(documents, embedding=embeddings, url=qdrant_url, collection_name="pdfs")

In [13]:
qdrant_docker.similarity_search_with_relevance_scores("what is Rambla de los Pájaros")

[(Document(metadata={'producer': 'GPL Ghostscript 8.15', 'creator': 'PageMaker 7.0', 'creationdate': '2017-12-22T16:44:03+00:00', 'source': 'D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf', 'file_path': 'D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf', 'total_pages': 11, 'format': 'PDF 1.4', 'title': 'chap-01.pmd', 'author': 'NCERT', 'subject': '', 'keywords': '', 'moddate': '2024-05-21T15:14:25+05:30', 'trapped': '', 'modDate': "D:20240521151425+05'30'", 'creationDate': 'D:20171222164403Z', 'page': 6, '_id': '1454cfb2-d89e-4a55-8252-b78c8df2bd34', '_collection_name': 'pdfs'}, page_content='After lunch, during the inevitable stroll along the\nRamblas, I lagged behind with Frau Frieda so that we could\nrenew our memories with no other ears listening. She told\nme she had sold her properties in Austria and retired to\nOporto, in Portugal, where she lived in a house that she\ndescribed as a fake castle on a hill, from which one could\nsee all the way across the ocean to the Americas. Al

#### Hybrid Search


In [14]:
from fastembed import SparseTextEmbedding, SparseEmbedding
from typing import List

In [15]:
SparseTextEmbedding.list_supported_models()

[{'model': 'prithivida/Splade_PP_en_v1',
  'sources': {'hf': 'Qdrant/SPLADE_PP_en_v1', 'url': None},
  'model_file': 'model.onnx',
  'description': 'Independent Implementation of SPLADE++ Model for English.',
  'license': 'apache-2.0',
  'size_in_GB': 0.532,
  'additional_files': [],
  'requires_idf': None,
  'vocab_size': 30522},
 {'model': 'prithvida/Splade_PP_en_v1',
  'sources': {'hf': 'Qdrant/SPLADE_PP_en_v1', 'url': None},
  'model_file': 'model.onnx',
  'description': 'Independent Implementation of SPLADE++ Model for English.',
  'license': 'apache-2.0',
  'size_in_GB': 0.532,
  'additional_files': [],
  'requires_idf': None,
  'vocab_size': 30522},
 {'model': 'Qdrant/bm42-all-minilm-l6-v2-attentions',
  'sources': {'hf': 'Qdrant/all_miniLM_L6_v2_with_attentions', 'url': None},
  'model_file': 'model.onnx',
  'description': 'Light sparse embedding model, which assigns an importance score to each token in the text',
  'license': 'apache-2.0',
  'size_in_GB': 0.09,
  'additional_f

In [16]:
sparse_model = FastEmbedSparse(model_name='Qdrant/bm42-all-minilm-l6-v2-attentions', batch_size=8)

Fetching 6 files: 100%|██████████| 6/6 [00:26<00:00,  4.48s/it]


In [17]:
qdrant_hybrid = QdrantVectorStore.from_documents(documents,embeddings,url=qdrant_url,
                                                collection_name="pdf-hybrid",
                                                sparse_embedding=sparse_model,
                                                retrieval_mode=RetrievalMode.HYBRID)

In [18]:
qdrant_hybrid.similarity_search_with_relevance_scores("what is Rambla de los Pájaros")

[(Document(metadata={'producer': 'GPL Ghostscript 8.15', 'creator': 'PageMaker 7.0', 'creationdate': '2017-12-22T16:44:03+00:00', 'source': 'D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf', 'file_path': 'D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf', 'total_pages': 11, 'format': 'PDF 1.4', 'title': 'chap-01.pmd', 'author': 'NCERT', 'subject': '', 'keywords': '', 'moddate': '2024-05-21T15:14:25+05:30', 'trapped': '', 'modDate': "D:20240521151425+05'30'", 'creationDate': 'D:20171222164403Z', 'page': 6, '_id': '4a59c44e-881d-4bb9-9baa-b19adb0d595d', '_collection_name': 'pdf-hybrid'}, page_content='After lunch, during the inevitable stroll along the\nRamblas, I lagged behind with Frau Frieda so that we could\nrenew our memories with no other ears listening. She told\nme she had sold her properties in Austria and retired to\nOporto, in Portugal, where she lived in a house that she\ndescribed as a fake castle on a hill, from which one could\nsee all the way across the ocean to the Americ
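
Hybrid mode queries both the dense and the sparse vectors and then merges the two ranked lists. One common way to merge rankings is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant `k=60` and uses hypothetical chunk IDs; it illustrates the idea only and is not Qdrant's actual implementation, whose details may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs.

    Each ID earns 1 / (k + rank) from every list it appears in (rank is
    1-based); IDs are returned from highest to lowest total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break ties by ID so the output is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense and a sparse retriever
dense_hits = ["chunk_a", "chunk_b", "chunk_c"]
sparse_hits = ["chunk_c", "chunk_a", "chunk_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

A chunk that ranks well in both lists (here `chunk_a` and `chunk_c`) rises above one that appears in only a single list, which is the behavior hybrid search aims for when a query mixes semantic meaning with exact terms.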

#### Qdrant Client

The steps above are handy when creating the vector database for the first time. To connect to a collection that already exists in the vector database, we can use the Qdrant client instead.

In [19]:
from qdrant_client import QdrantClient

client = QdrantClient(url=qdrant_url)

In [20]:
hybrid_search = QdrantVectorStore(client=client, collection_name="pdf-hybrid",
                                 embedding=embeddings,
                                 sparse_embedding=sparse_model,
                                 retrieval_mode=RetrievalMode.HYBRID)

In [23]:
# Query through the store attached to the existing collection
hybrid_search.similarity_search_with_relevance_scores("what is   los Pájaros")

[(Document(metadata={'producer': 'GPL Ghostscript 8.15', 'creator': 'PageMaker 7.0', 'creationdate': '2017-12-22T16:44:03+00:00', 'source': 'D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf', 'file_path': 'D:\\Code\\AI\\Agents\\RAG_Techniques\\short.pdf', 'total_pages': 11, 'format': 'PDF 1.4', 'title': 'chap-01.pmd', 'author': 'NCERT', 'subject': '', 'keywords': '', 'moddate': '2024-05-21T15:14:25+05:30', 'trapped': '', 'modDate': "D:20240521151425+05'30'", 'creationDate': 'D:20171222164403Z', 'page': 6, '_id': '82bf4781-c735-43a9-9b2e-97d24cb3a460', '_collection_name': 'pdf-hybrid'}, page_content='as ever,’ she said. And said no more, because the rest of\nthe group had stopped to wait for Neruda to finish talking\nin Chilean slang to the parrots along the Rambla de los\nPájaros. When we resumed our conversation, Frau Frieda\nchanged the subject.\n‘By the way,’ she said, ‘you can go back to Vienna\nnow.’\nOnly then did I realise that thirteen years had gone by\nsince our first meeting.