## Introduction to Qdrant: A Production-Ready Vector Search Engine
Qdrant (pronounced "quadrant") is an open-source vector database and similarity search engine engineered to address the demands of modern, large-scale AI applications. Founded in 2021 and written in Rust for performance and memory safety, Qdrant provides a production-ready service with a convenient API to store, search, and manage vectors along with their associated metadata, known as payloads. Its core mission is to empower developers to transform the embeddings generated by neural network encoders into full-fledged applications for matching, searching, and recommendation.   

The primary value proposition of Qdrant lies in its exceptional performance, enterprise-grade scalability, and rich feature set. It is designed to handle massive volumes of high-dimensional data and complex queries with high throughput (requests per second) and minimal latency, making it a suitable choice for mission-critical workloads.   

To cater to a wide range of operational needs, Qdrant offers several deployment options:

- Qdrant Cloud: A fully managed offering on major cloud platforms like AWS, GCP, and Azure. It scales with minimal operational overhead and includes a free-forever 1GB cluster ideal for prototyping and experimentation.

- On-Premise & Hybrid Cloud: For organizations with strict data sovereignty or security requirements, Qdrant provides Kubernetes-native solutions that can be deployed in any existing environment, from private data centers to edge locations.   

- Local Mode: The Python client includes the option to run Qdrant in a lightweight local mode, either fully in-memory for ephemeral tasks or persisted on-disk. This is invaluable for rapid development, testing, and debugging without the need to set up a dedicated server.   



In [22]:
import numpy as np
import os
from dotenv import load_dotenv
from qdrant_client import QdrantClient
from qdrant_client.http import models as qmodels
from qdrant_client.http.models import Distance

load_dotenv()
QDRANT_URL = os.getenv("QDRANT_URL")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
COLLECTION_NAME = os.getenv("COLLECTION_NAME")
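# Re-export the Google key only if it is set; assigning None to os.environ raises a TypeError.
google_api_key = os.getenv("GOOGLE_API_KEY")
if google_api_key:
    os.environ["GOOGLE_API_KEY"] = google_api_key
# 'sentence-transformers/all-MiniLM-L6-v2' (used below) creates 384-dimensional vectors.
# The Gemini model 'embedding-001' would create 768-dimensional vectors instead.
VECTOR_SIZE = 384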


### Important Note
The output dimension of your embedding model must exactly match the vector size configured for the collection in your vector database.
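
A minimal sketch of that check, using only the standard library. The helper name `assert_dimension` and the stand-in vectors are illustrative assumptions, not part of any Qdrant or LangChain API; in this notebook the vector would come from the embedding model.

```python
VECTOR_SIZE = 384  # size configured for the collection (all-MiniLM-L6-v2)


def assert_dimension(vector: list[float], expected_size: int) -> None:
    """Raise ValueError if the vector length differs from the configured size."""
    if len(vector) != expected_size:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, "
            f"but the collection expects {expected_size}."
        )


# Stand-in vectors; real ones would come from the embedding model.
matching = [0.0] * 384
mismatched = [0.0] * 768  # e.g. the output of a 768-dimensional model

assert_dimension(matching, VECTOR_SIZE)  # passes silently

try:
    assert_dimension(mismatched, VECTOR_SIZE)
except ValueError as err:
    print(err)
```

Running a check like this before uploading makes a model/collection mismatch fail early with a readable message.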


## Read PDF Files

In [3]:
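from langchain_core.documents import Document
from langchain_community.document_loaders import PyPDFLoader

def read_pdf(file_path: str) -> list[Document]:
    """Load a PDF and return one Document per page."""
    loader = PyPDFLoader(file_path)
    return loader.load()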



In [4]:
Documents = read_pdf("Attenstion all you Need.pdf")

In [5]:
Documents

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2024-04-10T21:11:43+00:00', 'author': '', 'keywords': '', 'moddate': '2024-04-10T21:11:43+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'Attenstion all you Need.pdf', 'total_pages': 15, 'page': 0, 'page_label': '1'}, page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser∗\nGoogle 

## Chunking 

In [6]:
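from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_documents(documents: list, chunk_size: int) -> list:
    """Split the loaded documents into overlapping chunks of roughly chunk_size characters."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=200,  # characters shared between consecutive chunks
    )
    return text_splitter.split_documents(documents)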



In [7]:
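# Note: this rebinds `chunk_documents` from the function to the list of chunks,
# so re-running this cell without re-running the definition above will fail.
chunk_documents = chunk_documents(Documents, chunk_size=1000)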

In [8]:
len(chunk_documents)

52
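
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks); the function name `sliding_chunks` and the toy text are assumptions for illustration only.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last 4 characters of one chunk reappear at the start of the next. The overlap keeps text that straddles a chunk boundary retrievable from at least one chunk.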

## Embedding and storing data in a vector database 

In [2]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import Qdrant

def create_qdrant_store(chunks, collection_name: str):
    """
    Creates embeddings for the given chunks using a Hugging Face
    sentence-transformers model and uploads them to a Qdrant Cloud collection.
    """
    # 1. Select the embedding model (all-MiniLM-L6-v2 produces 384-dimensional vectors).
    # To use Gemini instead, swap in the line below:
    # embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embeddings = HuggingFaceEmbeddings(model_name=model_name)
    
    # 2. Create and upload to the Qdrant vector store
    print(f"Uploading vectors to Qdrant collection: '{collection_name}'...")
    vector_store = Qdrant.from_documents(
        chunks,
        embeddings,
        url=os.environ.get("QDRANT_URL"),
        api_key=os.environ.get("QDRANT_API_KEY"),
        collection_name=collection_name,
    )
    print("Vector store created and updated successfully.")
    
    return vector_store


In [None]:
Vector_Store = create_qdrant_store(chunk_documents, COLLECTION_NAME)


Uploading vectors to Qdrant collection: 'Documents_Chunking'...
Vector store created and updated successfully.


In [15]:
from qdrant_client import QdrantClient
import os

qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_URL"), 
    api_key=os.getenv("QDRANT_API_KEY"),
)

print(qdrant_client.get_collections())

collections=[CollectionDescription(name='Documents_Chunking')]


## An Architectural Deep Dive into Qdrant
### Core Concepts: Collections, Points, and Payloads
To understand Qdrant's architecture, it is helpful to draw an analogy to the familiar structure of a traditional relational database.   

- Collection: A collection is the top-level container for data in Qdrant, analogous to a Table in a SQL database. It is a named set of points, and all vectors within a collection must adhere to a predefined configuration, including their dimensionality (e.g., 384 for the all-MiniLM-L6-v2 embeddings used in this notebook) and the distance metric used for comparison (e.g., cosine similarity).

- Point: A point is the central entity in Qdrant, analogous to a Row. Each point represents a single data object and consists of three main components: a unique identifier (id), one or more vectors that represent the data object, and an optional payload.   

- Payload: The payload is a JSON object that stores arbitrary metadata associated with a vector, analogous to the Columns of a row. This can include the original text content, a URL to an image, product categories, creation timestamps, or any other relevant information. The ability to efficiently index and filter on this payload data is a cornerstone of Qdrant's power and flexibility.  
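
The three concepts can be sketched in plain Python. This is a conceptual model only: the `Point` dataclass, the `search` helper, and the toy vectors are assumptions made for illustration, and the brute-force scan stands in for Qdrant's indexed search rather than reproducing it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Point:
    """One data object: an id, a vector, and an optional payload."""
    id: int
    vector: list[float]
    payload: dict = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# A "collection": a set of points that all share one vector size (3 here).
collection = [
    Point(1, [1.0, 0.0, 0.0], {"page": 0, "text": "attention"}),
    Point(2, [0.8, 0.6, 0.0], {"page": 1, "text": "encoder"}),
    Point(3, [0.0, 1.0, 0.0], {"page": 1, "text": "decoder"}),
]


def search(points, query, limit, must_match=None):
    """Filter points by exact payload values, then rank by cosine similarity."""
    must_match = must_match or {}
    candidates = [
        p for p in points
        if all(p.payload.get(key) == value for key, value in must_match.items())
    ]
    scored = [(cosine_similarity(p.vector, query), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


results = search(collection, [1.0, 0.0, 0.0], limit=2, must_match={"page": 1})
for score, point in results:
    print(point.id, round(score, 3), point.payload["text"])
```

Point 1 is the closest vector to the query, but the payload condition `page == 1` excludes it, which is the kind of combined vector-plus-metadata query the payload makes possible.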

## User Query

In [1]:
user_query = "What is Transformer?"


In [3]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)


In [4]:
query_vector = embeddings.embed_query(user_query)

In [5]:
query_vector

[-0.16029368340969086,
 0.05631536990404129,
 -0.04283517971634865,
 0.007545766420662403,
 0.0005217685829848051,
 -0.02272825874388218,
 -0.016956159844994545,
 0.08200693130493164,
 0.032197121530771255,
 -0.006717396900057793,
 0.014480218291282654,
 0.041284918785095215,
 0.027147676795721054,
 0.024151956662535667,
 -0.043651893734931946,
 0.00132775551173836,
 -0.05273141339421272,
 0.02160237915813923,
 -0.08361837267875671,
 -0.09741507470607758,
 -0.013027152046561241,
 0.0761580839753151,
 -0.07200910151004791,
 -0.026810191571712494,
 0.04881918802857399,
 0.003864757716655731,
 0.00903021078556776,
 -0.04117405414581299,
 -0.032824210822582245,
 -0.025864368304610252,
 -0.036705825477838516,
 -0.02309025079011917,
 -0.08431150764226913,
 0.06626012921333313,
 -0.12820851802825928,
 0.05284758657217026,
 4.439922486199066e-05,
 0.0017364934319630265,
 0.01170114241540432,
 0.0031125361565500498,
 0.029487399384379387,
 -0.09615068137645721,
 0.030625218525528908,
 -0.055318

In [8]:
from qdrant_client import QdrantClient
import os
client = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

In [11]:
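# query_points supersedes the deprecated client.search;
# .points holds the list of ScoredPoint results.
search_results = client.query_points(
    collection_name=COLLECTION_NAME,
    query=query_vector,
    limit=5,  # Return the top 5 most similar results
    with_payload=True,  # Include the payload in the results
).points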


In [12]:
search_results

[ScoredPoint(id='4b86733e-5559-41cb-a18d-e8986e11f552', version=0, score=0.39494932, payload={'page_content': 'The Transformer uses multi-head attention in three different ways:\n• In "encoder-decoder attention" layers, the queries come from the previous decoder layer,\nand the memory keys and values come from the output of the encoder. This allows every\nposition in the decoder to attend over all positions in the input sequence. This mimics the\ntypical encoder-decoder attention mechanisms in sequence-to-sequence models such as\n[38, 2, 9].\n• The encoder contains self-attention layers. In a self-attention layer all of the keys, values\nand queries come from the same place, in this case, the output of the previous layer in the\nencoder. Each position in the encoder can attend to all positions in the previous layer of the\nencoder.\n• Similarly, self-attention layers in the decoder allow each position in the decoder to attend to\nall positions in the decoder up to and including that po
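
Each result exposes an `id`, a `score`, and a `payload` whose `page_content` key holds the chunk text, as the output above shows. A minimal sketch of turning such results into a single context string follows; the `SimpleNamespace` objects are hypothetical stand-ins for `ScoredPoint` with made-up values, and `build_context` with its `min_score` and `max_chars` parameters is an illustrative helper, not a Qdrant API.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for ScoredPoint objects (made-up ids, scores, text).
hits = [
    SimpleNamespace(id="a", score=0.39, payload={"page_content": "first example chunk"}),
    SimpleNamespace(id="b", score=0.12, payload={"page_content": "weakly related chunk"}),
    SimpleNamespace(id="c", score=0.31, payload={"page_content": "second example chunk"}),
]


def build_context(hits, min_score: float = 0.0, max_chars: int = 200) -> str:
    """Join the text of hits scoring at least min_score, best match first."""
    kept = [hit for hit in hits if hit.score >= min_score]
    kept.sort(key=lambda hit: hit.score, reverse=True)
    return "\n---\n".join(hit.payload["page_content"][:max_chars] for hit in kept)


context = build_context(hits, min_score=0.3)
print(context)
```

With the real results, `build_context(search_results)` would apply the same logic. The score threshold is a judgment call that depends on the embedding model and the data.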

## Local Mode
Local Mode for Qdrant refers to the ability to run a lightweight, self-contained instance of the Qdrant vector database directly within your application's environment, typically on your development machine. This is in contrast to connecting to a remote Qdrant Cloud cluster or a separately deployed on-premise instance.

In [16]:
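from langchain_core.documents import Document
from langchain_community.document_loaders import PyPDFLoader

def read_pdf(file_path: str) -> list[Document]:
    """Load a PDF and return one Document per page."""
    loader = PyPDFLoader(file_path)
    return loader.load()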

In [17]:
Documents = read_pdf("Attenstion all you Need.pdf")

In [18]:
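from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_documents(documents: list, chunk_size: int) -> list:
    """Split the loaded documents into overlapping chunks of roughly chunk_size characters."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=200,  # characters shared between consecutive chunks
    )
    return text_splitter.split_documents(documents)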

In [19]:
chunk_documents = chunk_documents(Documents, chunk_size=1000)

In [20]:
from langchain_huggingface import HuggingFaceEmbeddings
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)

In [None]:
vector_store = Qdrant.from_documents(
    chunk_documents,
    embeddings,
    path="./qdrant_data",
    collection_name=COLLECTION_NAME,
    force_recreate=True,
)
