# Qdrant

- Author: [HyeonJong Moon](https://github.com/hj0302)
- Design: 
- Peer Review: 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)



## Overview

This notebook demonstrates how to use the `Qdrant` vector database with LangChain: setting it up, managing collections and documents, and querying it.

[`Qdrant`](https://python.langchain.com/docs/integrations/vectorstores/qdrant/) is an open-source vector similarity search engine designed to store, search, and manage high-dimensional vectors with additional payloads. It offers a production-ready service with a user-friendly API, suitable for applications such as semantic search, recommendation systems, and more.

Qdrant's architecture is optimized for efficient vector similarity searches, employing advanced indexing techniques like Hierarchical Navigable Small World (HNSW) graphs to enable fast and scalable retrieval of relevant data.
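
To make "vector similarity search" concrete, here is a minimal sketch of exact (brute-force) search with cosine similarity in plain Python. It is not how Qdrant works internally: an HNSW index exists to avoid scoring every stored vector. The function names and the toy two-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def brute_force_search(query, points, k=3):
    """Score every stored vector against the query and return the top k.

    `points` maps a point ID to a (vector, payload) pair.
    """
    scored = [
        (point_id, cosine_similarity(query, vector), payload)
        for point_id, (vector, payload) in points.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data: three points with a small payload each
points = {
    "a": ([1.0, 0.0], {"text": "east"}),
    "b": ([0.0, 1.0], {"text": "north"}),
    "c": ([1.0, 1.0], {"text": "north-east"}),
}
top = brute_force_search([1.0, 0.1], points, k=2)
print([point_id for point_id, _, _ in top])
```

The query points almost due east, so point `a` ranks first and the diagonal point `c` second.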


### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Credentials](#credentials)
- [Installation](#installation)
- [Initialization](#initialization)
- [Manage Vector Store](#manage-vector-store)
  - [Create a Collection](#create-a-collection)
  - [List Collections](#list-collections)
  - [Delete a Collection](#delete-a-collection)
  - [Add Items to the Vector Store](#add-items-to-the-vector-store)
  - [Delete Items from the Vector Store](#delete-items-from-the-vector-store)
  - [Upsert Items to Vector Store (Parallel)](#upsert-items-to-vector-store-parallel)
- [Query Vector Store](#query-vector-store)
  - [Query Directly](#query-directly)
  - [Similarity Search with Score](#similarity-search-with-score)
  - [Query by Turning into Retriever](#query-by-turning-into-retriever)
  - [Search with Filtering](#search-with-filtering)
  - [Delete with Filtering](#delete-with-filtering)
  - [Filtering and Updating Records](#filtering-and-updating-records)

### References

- [LangChain Qdrant Reference](https://python.langchain.com/docs/integrations/vectorstores/qdrant/)
- [Qdrant Official Reference](https://qdrant.tech/documentation/frameworks/langchain/)
- [Qdrant Install Reference](https://qdrant.tech/documentation/guides/installation/)
- [Qdrant Cloud Reference](https://cloud.qdrant.io)
- [Qdrant Cloud Quickstart Reference](https://qdrant.tech/documentation/quickstart-cloud/)
----

## Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]
- `langchain-opentutorial` is a package that provides easy-to-use environment setup helpers, functions, and utilities for the tutorials.
- You can check out [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain_openai",
        "langchain_qdrant",
        "qdrant_client",
        "langchain_core",
        "fastembed",
    ],
    verbose=False,
    upgrade=False,
)


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "QDRANT_API_KEY": "",
        "QDRANT_URL": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "Qdrant",
    }
)

Environment variables have been set successfully.


You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.

**[Note]** If you are using a `.env` file, proceed as follows.

In [4]:
from dotenv import load_dotenv

load_dotenv(override=True)

True
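
Whichever way the variables are set, a quick check that the expected names are present can catch a misspelled key early. The sketch below uses only the standard library; the helper name `missing_env_vars` is hypothetical, and the list of required names is an assumption based on the variables this tutorial reads.

```python
import os


def missing_env_vars(required):
    """Return the names in `required` that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]


# Variable names this tutorial relies on (assumed list)
REQUIRED = ["OPENAI_API_KEY", "QDRANT_API_KEY", "QDRANT_URL"]

missing = missing_env_vars(REQUIRED)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```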

## Credentials

Create a new account or sign in to your existing one, and generate an API key for use in this notebook.

1. **Log in to Qdrant Cloud** : Go to the [Qdrant Cloud](https://cloud.qdrant.io) website and log in using your email, Google account, or GitHub account.

2. **Create a Cluster** : After logging in, navigate to the `"Clusters"` section and click the `"Create"` button. Choose your desired configurations and region, then click `"Create"` to start building your cluster. Once the cluster is created, an API key will be generated for you.

3. **Retrieve and Store Your API Key** : When your cluster is created, you will receive an API key. Ensure you save this key in a secure location, as you will need it later. If you lose it, you will have to generate a new one.

4. **Manage API Keys** : To create additional API keys or manage existing ones, go to the `"Access Management"` section in the Qdrant Cloud dashboard and select `"Qdrant Cloud API Keys"`. Here, you can create new keys or delete existing ones.

```
QDRANT_API_KEY="YOUR_QDRANT_API_KEY"
```

## Installation

There are several main options for initializing and using the Qdrant vector store:

- **Local Mode** : This mode doesn't require a separate server.
    - **In-memory storage** (data is not persisted)
    - **On-disk storage** (data is saved to your local machine)
- **Docker Deployments** : You can run Qdrant using Docker.
- **Qdrant Cloud** : Use Qdrant as a managed cloud service.

For detailed instructions, see the [installation instructions](https://qdrant.tech/documentation/guides/installation/).

### In-Memory

For simple tests or quick experiments, you might choose to store data directly in memory. This means the data is automatically removed when your client terminates, typically at the end of your script or notebook session.

In [5]:
from utils.qdrant_interface import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager with in-memory storage
db = QdrantDocumentManager(
    location=":memory:",  # Use in-memory database for temporary storage
    collection_name=collection_name,
    embedding=embedding,
)

Collection 'demo_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'demo_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


### On-Disk Storage

With on-disk storage, you can store your vectors directly on your hard drive without requiring a Qdrant server. This ensures that your data persists even when you restart the program.

In [7]:
from utils.qdrant_interface import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the path for Qdrant storage
qdrant_path = "./qdrant_memory"

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager with specified storage path
db = QdrantDocumentManager(
    path=qdrant_path,  # Specify the path for Qdrant storage
    collection_name=collection_name,
    embedding=embedding,
)

Collection 'demo_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'demo_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


### Docker Deployments

You can deploy `Qdrant` in a production environment using [Docker](https://qdrant.tech/documentation/guides/installation/#docker) and [Docker Compose](https://qdrant.tech/documentation/guides/installation/#docker-compose). Refer to the Docker and Docker Compose setup instructions in the development section for detailed information.

In [6]:
from utils.qdrant_interface import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the URL for Qdrant server
url = "http://localhost:6333"

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to the Qdrant server
db = QdrantDocumentManager(
    url=url,  # URL of the running Qdrant server
    collection_name=collection_name,
    embedding=embedding,
)

### Qdrant Cloud

For a production environment, you can use [Qdrant Cloud](https://cloud.qdrant.io/). It offers fully managed `Qdrant` databases with features such as horizontal and vertical scaling, one-click setup and upgrades, monitoring, logging, backups, and disaster recovery. For more information, refer to the [Qdrant Cloud documentation](https://qdrant.tech/documentation/cloud/).

In [7]:
import getpass
import os

# Fetch the Qdrant server URL from environment variables or prompt for input
if not os.getenv("QDRANT_URL"):
    os.environ["QDRANT_URL"] = getpass.getpass("Enter your Qdrant Cloud URL: ")
QDRANT_URL = os.environ.get("QDRANT_URL")

# Fetch the Qdrant API key from environment variables or prompt for input
if not os.getenv("QDRANT_API_KEY"):
    os.environ["QDRANT_API_KEY"] = getpass.getpass("Enter your Qdrant API key: ")
QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY")

In [8]:
from utils.qdrant_interface import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to Qdrant Cloud
db = QdrantDocumentManager(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name=collection_name,
    embedding=embedding,
)

Collection 'demo_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'demo_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


## Initialization

Once you've established your vector store, you'll likely need to manage the collections within it. Here are some common operations you can perform:

- Create a collection
- List collections
- Delete a collection

### Create a Collection

The `QdrantDocumentManager` class allows you to create a new collection in Qdrant. It can automatically create a collection if it doesn't exist or if you want to recreate it. You can specify configurations for dense and sparse vectors to meet different search needs. Use the `_ensure_collection_exists` method for automatic creation or call `create_collection` directly when needed.

In [9]:
from utils.qdrant_interface import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings
from qdrant_client.http.models import Distance

# Define the collection name for storing documents
collection_name = "test_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to Qdrant Cloud
db = QdrantDocumentManager(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name=collection_name,
    embedding=embedding,
    metric=Distance.COSINE,
)

Collection 'test_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'test_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


### List Collections

The `QdrantDocumentManager` class lets you list all collections in your Qdrant instance using the `get_collections` method. This retrieves and displays the names of all existing collections.

In [10]:
# Retrieve the list of collections from the Qdrant client
collections = db.client.get_collections()

# Iterate over each collection and print its details
for collection in collections.collections:
    print(f"Collection Name: {collection.name}")

Collection Name: test_collection
Collection Name: demo_collection


### Delete a Collection

The `QdrantDocumentManager` class allows you to delete a collection using the `delete_collection` method. This method removes the specified collection from your Qdrant instance.

In [11]:
# Define collection name
collection_name = "test_collection"

# Delete the collection
if db.client.delete_collection(collection_name=collection_name):
    print(f"Collection '{collection_name}' has been deleted.")

Collection 'test_collection' has been deleted.


## Manage Vector Store

After you've created your vector store, you can interact with it by adding or deleting items. Here are some common operations:

### Add Items to the Vector Store

The `QdrantDocumentManager` class lets you add items to your vector store using the `upsert` method. This method updates existing documents with new data if their IDs already exist.

In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from uuid import uuid4

# Load the text file
loader = TextLoader("./data/the_little_prince.txt")
documents = loader.load()

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600, chunk_overlap=100, length_function=len
)

split_docs = text_splitter.split_documents(documents)

# Generate unique IDs for documents
uuids = [str(uuid4()) for _ in split_docs[:30]]
page_contents = [doc.page_content for doc in split_docs[:30]]
metadatas = [doc.metadata for doc in split_docs[:30]]
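
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, lines, and spaces, so its chunk boundaries will differ. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=600, chunk_overlap=100):
    """Cut `text` into windows of at most `chunk_size` characters, where
    consecutive windows share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the overlap that keeps context from being cut off at a chunk boundary.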

In [22]:
from utils.qdrant_interface import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to Qdrant Cloud
db = QdrantDocumentManager(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name=collection_name,
    embedding=embedding,
)

db.upsert(texts=page_contents, metadatas=metadatas, ids=uuids)

['4c38e582-bf13-473d-8878-720ea6fe33d2',
 '6f8d264f-1408-426c-a0d9-0ba007d4df8a',
 '83f6383c-6871-4731-8610-d128df5b3b9d',
 'f93cd8e8-99e5-4646-a8cc-8813abf4d242',
 '23612cc9-8e71-41fe-b686-76e95c351b4f',
 '4d597d0c-a969-4e12-a054-f0c39fc59cfe',
 '9a065986-b60b-4e35-bedb-2775bfd65a77',
 '12e0ade9-1dcd-4fd7-b647-e98e1ac6ec16',
 '333d233d-4887-4329-8562-250c94a5f798',
 '49a716c0-490c-4d01-a843-a23844a81c9f',
 '480a41eb-ed28-4019-8c6b-f957f4227cc3',
 '52104bf1-ba19-4623-80d1-717ce0e0166a',
 'de662d15-e5d5-468c-b697-5802f7a6c7b9',
 '06cb5630-55ab-46cb-bd5a-39a48ef4744d',
 'c58e0281-aa5e-4a9e-93d7-03b1e1082077',
 'b61cc1a4-2298-4bc7-baea-9114d1f93f2e',
 '0aa66408-db7e-4588-a2f1-6907887f656d',
 'ae87a13f-03fb-4cf9-ac2b-70c44498c5c8',
 '9e1afbfd-f549-4f09-8ad1-e151694741bc',
 'f1af56ba-8c3c-4cc4-8eda-4e73e9d20151',
 '986bbb57-e61b-44ec-b043-b64b70cf35d6',
 'fcae7694-8d48-49ca-8320-50df1dc93cf8',
 '3fc036b7-97a0-4fec-b61f-68ccb0fca786',
 'a1b56a7e-cb74-40df-ad48-2163387f6205',
 'c4babef1-0788-

### Delete Items from the Vector Store

The `QdrantDocumentManager` class allows you to delete items from your vector store using the `delete` method. You can specify items to delete by providing IDs or filters.

In [23]:
delete_ids = [uuids[0]]

db.delete(ids=delete_ids)
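
The semantics of upserting and deleting by ID can be sketched with a dictionary keyed by point ID. `TinyStore` is a hypothetical stand-in written for this illustration, not the `QdrantDocumentManager` implementation, and it skips embedding entirely.

```python
class TinyStore:
    """A dict-backed stand-in for a collection, keyed by point ID."""

    def __init__(self):
        self.points = {}

    def upsert(self, texts, metadatas, ids):
        """Insert new points and overwrite points whose ID already exists."""
        for point_id, text, metadata in zip(ids, texts, metadatas):
            self.points[point_id] = {"page_content": text, "metadata": metadata}
        return list(ids)

    def delete(self, ids):
        """Remove the given IDs, ignoring any that are not stored."""
        for point_id in ids:
            self.points.pop(point_id, None)


store = TinyStore()
store.upsert(["first draft"], [{"source": "a.txt"}], ["id-1"])
store.upsert(["second draft"], [{"source": "a.txt"}], ["id-1"])  # same ID

print(len(store.points))                      # 1
print(store.points["id-1"]["page_content"])   # second draft
```

Reusing an ID replaces the stored point rather than adding a duplicate, which is why generating a fresh UUID per chunk matters when you want every chunk kept.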

### Upsert Items to Vector Store (Parallel)

The `QdrantDocumentManager` class supports parallel upserts using the `upsert_parallel` method. This efficiently adds or updates multiple items with unique IDs, data, and metadata.

In [24]:
# Generate unique IDs for documents
uuids = [str(uuid4()) for _ in split_docs[30:60]]
page_contents = [doc.page_content for doc in split_docs[30:60]]
metadatas = [doc.metadata for doc in split_docs[30:60]]

db.upsert_parallel(
    texts=page_contents,
    metadatas=metadatas,
    ids=uuids,
    batch_size=32,
    workers=10,
)

['6623fd13-cd54-4181-b657-90911b22de3a',
 '2cae00f3-a540-4217-89e8-ceaff6bb7a50',
 'bd101408-17d4-4dc9-bc71-e8cc3d2c76b5',
 'bc5075eb-f872-4fc5-a889-c6233853630f',
 'db065742-d85c-4a3f-ba98-95eaff06c7dc',
 '80873271-0293-4281-97ad-71b228fe1a39',
 '62d809cf-d7a3-49d5-b34e-bae921995812',
 '22b47ea5-fd70-412b-9722-31b4ea544784',
 'f1658894-3e9d-4416-8187-712f4368746c',
 '192af61b-3da0-4dd9-9138-d441cd07df38',
 '02703d74-7f10-4805-ab8a-67cc1e04bee5',
 '35393884-9987-4fea-ae27-f0fbe01d4cb7',
 '25d6f448-cbe9-4f22-bab4-be345a3f7190',
 '81ab337a-b5fd-4dc5-b97b-09c3373ac900',
 '1e1e2d77-31c2-468d-bb05-b34162216da0',
 'e4b79469-582b-42d3-b315-c6f9b6483d33',
 'e761c8fc-cd3e-4c31-98c4-20958ad6c256',
 'b1010b30-4bbc-4872-af41-49ea7152a81e',
 '29089ae7-25cc-47b9-aa57-2e5484befbe0',
 '08470c7f-5c7b-48b6-8bc4-8606f959059e',
 'fc2685a0-cc0b-46ef-8872-b5bb75f2fb7f',
 '5c487bdf-b145-4157-a6d9-55aae753e97e',
 '9069c953-a9f5-4146-9293-569430998ae6',
 '06cb50bc-77d5-4d2f-af61-628cde8f5695',
 '3aa4f987-77f5-
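
The roles of `batch_size` and `workers` can be sketched with the standard library's thread pool. This is an assumption about the general technique, not the actual `upsert_parallel` implementation; `fake_upsert_batch` is a hypothetical placeholder for the step that would embed and upload one batch.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive slices of at most `batch_size` items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def upsert_in_parallel(ids, upsert_batch, batch_size=32, workers=10):
    """Send each batch to `upsert_batch` on a thread pool and collect the
    returned IDs in the original order."""
    batches = make_batches(ids, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(upsert_batch, batches))  # map keeps input order
    return [point_id for batch in results for point_id in batch]


def fake_upsert_batch(batch):
    """Hypothetical per-batch work; a real one would embed and upload."""
    return batch


done = upsert_in_parallel(list(range(60)), fake_upsert_batch, batch_size=32, workers=4)
print(len(done))  # 60
```

If batching works as in this sketch, 30 documents with `batch_size=32` form a single batch, so additional workers only pay off on larger inputs or smaller batches.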

## Query Vector Store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

### Query Directly

The `QdrantDocumentManager` class allows direct querying using the `search` method. It performs similarity searches by converting queries into vector embeddings to find similar documents.

In [16]:
query = "What is the significance of the rose in The Little Prince?"

response = db.search(
    query=query,
    k=3,
)

for res in response:
    payload = res["payload"]
    print(f"* {payload['page_content'][:200]}\n [{payload['metadata']}]\n\n")

* for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt'}]


* Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt'}]


* [ Chapter 7 ]
- the narrator learns about the secret of the little prince‘s life 
On the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed 
 [{'source': './data/the_little_prince.txt'}]




### Similarity Search with Score

The `QdrantDocumentManager` class enables similarity searches with scores using the `search` method. This provides a relevance score for each document found.

In [17]:
# Define the query to search in the database
query = "What is the significance of the rose in The Little Prince?"

# Perform the search with the specified query and number of results
response = db.search(query=query, k=3)

for res in response:
    payload = res["payload"]
    score = res["score"]
    print(
        f"* [SIM={score:.3f}] {payload['page_content'][:200]}\n [{payload['metadata']}]\n\n"
    )

* [SIM=0.527] for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt'}]


* [SIM=0.499] Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt'}]


* [SIM=0.478] [ Chapter 7 ]
- the narrator learns about the secret of the little prince‘s life 
On the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed 
 [{'source': './data/the_little_prince.txt'}]




### Query by Turning into Retriever

The `QdrantDocumentManager` class can transform the vector store into a `retriever`. This allows for easier integration into workflows or chains.

In [None]:
from langchain_qdrant import QdrantVectorStore

# Initialize QdrantVectorStore with the client, collection name, and embedding
vector_store = QdrantVectorStore(
    client=db.client, collection_name=db.collection_name, embedding=db.embedding
)

query = "What is the significance of the rose in The Little Prince?"

# Transform the vector store into a retriever with specific search parameters
retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.3},
)

results = retriever.invoke(query)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt', '_id': '18db9172-aa34-4e48-a580-ada0d9a78fd1', '_collection_name': 'demo_collection'}]


* Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt', '_id': 'b61cc1a4-2298-4bc7-baea-9114d1f93f2e', '_collection_name': 'demo_collection'}]


* [ Chapter 7 ]
- the narrator learns about the secret of the little prince‘s life 
On the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed 
 [{'source': './data/the_little_prince.txt', '_id': '1c58e76e-46c5-49ea-9975-3c30237fdd52', '_collection_name': 'demo_colle
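
The effect of `search_type="similarity_score_threshold"` with `k` and `score_threshold` can be pictured as a filter followed by a cut-off. The sketch below is a plain-Python illustration of that idea, not the retriever's code; the function name is hypothetical and the scores are made up for the example.

```python
def filter_by_threshold(scored_docs, k=3, score_threshold=0.3):
    """Keep documents scoring at least `score_threshold`, then return the
    `k` best. `scored_docs` is a list of (document, score) pairs."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Made-up scores, for illustration only
candidates = [("doc-a", 0.53), ("doc-b", 0.12), ("doc-c", 0.48), ("doc-d", 0.31)]
hits = filter_by_threshold(candidates, k=3, score_threshold=0.3)
print(hits)  # [('doc-a', 0.53), ('doc-c', 0.48), ('doc-d', 0.31)]
```

A threshold that is too high can return fewer than `k` documents, or none at all, so it is worth checking typical scores for your embedding model before fixing a value.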

### Search with Filtering

The `QdrantDocumentManager` class allows searching with filters to retrieve records based on specific metadata values. This is done using the `scroll` method with a defined filter query.

In [None]:
from qdrant_client import models

# Define a filter query to match documents containing the text "Chapter" in the page content
filter_query = models.Filter(
    must=[
        models.FieldCondition(
            key="page_content",
            match=models.MatchText(text="Chapter"),
        ),
    ]
)

# Retrieve records from the collection that match the filter query
db.scroll(
    scroll_filter=filter_query,
    k=10,
)

[Record(id='1c58e76e-46c5-49ea-9975-3c30237fdd52', payload={'page_content': '[ Chapter 7 ]\n- the narrator learns about the secret of the little prince‘s life \nOn the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed to me. Abruptly, without anything to lead up to it, and as if the question had been born of long and silent meditation on his problem, he demanded: \n"A sheep-- if it eats little bushes, does it eat flowers, too?"\n"A sheep," I answered, "eats anything it finds in its reach."\n"Even flowers that have thorns?"\n"Yes, even flowers that have thorns." \n"Then the thorns-- what use are they?"', 'metadata': {'source': './data/the_little_prince.txt'}}, vector=None, shard_key=None, order_value=None),
 Record(id='25d6f448-cbe9-4f22-bab4-be345a3f7190', payload={'page_content': '[ Chapter 5 ]\n- we are warned as to the dangers of the baobabs\nAs each day passed I would learn, in our talk, something about the little prince‘
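
Unlike a similarity search, a filtered scroll involves no scoring: every record either satisfies all `must` conditions or is skipped. The sketch below mimics that logic over plain dictionaries. It assumes case-sensitive substring matching, whereas the behavior of Qdrant's `MatchText` depends on whether the field has a full-text index; the function names and sample records are hypothetical.

```python
def matches_text(payload, key, text):
    """True when the payload field `key` is a string containing `text`."""
    value = payload.get(key)
    return isinstance(value, str) and text in value


def scroll(records, must, k=10):
    """Return up to `k` records whose payload satisfies every condition in
    `must`, given as a list of (key, text) pairs."""
    hits = []
    for record in records:
        if len(hits) >= k:
            break
        if all(matches_text(record["payload"], key, text) for key, text in must):
            hits.append(record)
    return hits


records = [
    {"id": 1, "payload": {"page_content": "[ Chapter 7 ] the secret"}},
    {"id": 2, "payload": {"page_content": "It is a question of discipline"}},
    {"id": 3, "payload": {"page_content": "[ Chapter 5 ] the baobabs"}},
]
found = scroll(records, must=[("page_content", "Chapter")], k=10)
print([record["id"] for record in found])  # [1, 3]
```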

### Delete with Filtering

The `QdrantDocumentManager` class allows you to delete records using filters based on specific metadata values. This is achieved with the `delete` method and a filter query.

In [None]:
from qdrant_client import models

# Define a filter query to match documents containing the text "Chapter" in the page content
filter_query = models.Filter(
    must=[
        models.FieldCondition(
            key="page_content",
            match=models.MatchText(text="Chapter"),
        ),
    ]
)

# Delete records from the collection that match the filter query
db.client.delete(collection_name=db.collection_name, points_selector=filter_query)

UpdateResult(operation_id=7, status=<UpdateStatus.COMPLETED: 'completed'>)

### Filtering and Updating Records

The `QdrantDocumentManager` class supports filtering and updating records based on specific metadata values. This is done by retrieving records with filters and updating them as needed.

In [27]:
from qdrant_client import models

# Define a filter query to match documents with a specific metadata source
filter_query = models.Filter(
    must=[
        models.FieldCondition(
            key="metadata.source",
            match=models.MatchValue(value="./data/the_little_prince.txt"),
        ),
    ]
)

# Retrieve records matching the filter query, including their vectors
response = db.scroll(scroll_filter=filter_query, k=10, with_vectors=True)
new_source = "the_little_prince.txt"

# Keep each point's ID and vector, and set new metadata for the record
for point in response:  # each item is a record with id, payload, and vector
    payload = point.payload

    # Check if metadata exists in the payload
    if "metadata" in payload:
        payload["metadata"]["source"] = new_source
    else:
        payload["metadata"] = {
            "source": new_source
        }  # Add new metadata if it doesn't exist

    # Update the point with new metadata
    db.client.upsert(
        collection_name=db.collection_name,
        points=[
            models.PointStruct(
                id=point.id,
                payload=payload,
                vector=point.vector,
            )
        ],
    )
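
The filter key `metadata.source` addresses a field nested inside the payload. The following sketch shows how such a dot-separated key can be read and written on a plain dictionary, mirroring the "create `metadata` if it is missing" branch in the loop above. Both helper names are hypothetical and the code does not touch Qdrant.

```python
def get_by_path(payload, dotted_key):
    """Follow a dot-separated key such as "metadata.source" through nested
    dicts; return None if any step is missing."""
    current = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def set_by_path(payload, dotted_key, value):
    """Set a nested value, creating intermediate dicts as needed."""
    parts = dotted_key.split(".")
    current = payload
    for part in parts[:-1]:
        current = current.setdefault(part, {})
    current[parts[-1]] = value


payload = {"page_content": "...", "metadata": {"source": "./data/the_little_prince.txt"}}
set_by_path(payload, "metadata.source", "the_little_prince.txt")
print(get_by_path(payload, "metadata.source"))  # the_little_prince.txt
```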

### Similarity Search Options

When using `QdrantVectorStore`, you have three options for performing similarity searches. You can select the desired search mode using the `retrieval_mode` parameter when you set up the class. The available modes are:

- Dense Vector Search (Default)
- Sparse Vector Search
- Hybrid Search

### Dense Vector Search

To perform a search using only dense vectors:

- The `retrieval_mode` parameter must be set to `RetrievalMode.DENSE`. This is also the default setting.
- A [dense embeddings](https://python.langchain.com/docs/integrations/text_embedding/) value must be provided through the `embedding` parameter.

In [None]:
from langchain_qdrant import RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Initialize QdrantVectorStore with documents, embeddings, and configuration
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs[:50],
    embedding=embedding,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name="dense_collection",
    retrieval_mode=RetrievalMode.DENSE,
    batch_size=10,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt', '_id': '96ed2a69-8749-480a-9fcf-9f4e1ee96e47', '_collection_name': 'dense_collection'}]


* Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt', '_id': 'c211f71a-6d65-4f4f-a71b-2002ad82a0d5', '_collection_name': 'dense_collection'}]


* "It is a question of discipline," the little prince said to me later on. "When you‘ve finished your own toilet in the morning, then it is time to attend to the toilet of your planet, just so, with the
 [{'source': './data/the_little_prince.txt', '_id': '87948c4e-0cd4-4264-a030-f3e61e16bd28', '_collection_name': 'dense_co

### Sparse Vector Search

To perform a search using only sparse vectors:

- The `retrieval_mode` parameter should be set to `RetrievalMode.SPARSE`.
- An implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface, using any sparse embeddings provider, must be provided as the value of the `sparse_embedding` parameter.

The `langchain-qdrant` package provides a FastEmbed-based implementation out of the box.

To use it, install the [FastEmbed](https://github.com/qdrant/fastembed) package.

pip install fastembed

In [29]:
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")
# Initialize sparse embeddings using FastEmbedSparse
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Initialize QdrantVectorStore with documents, embeddings, and configuration
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedding,
    sparse_embedding=sparse_embeddings,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name="sparse_collection",
    retrieval_mode=RetrievalMode.SPARSE,
    batch_size=10,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* [ Chapter 20 ]
- the little prince discovers a garden of roses
But it happened that after walking for a long time through sand, and rocks, and snow, the little prince at last came upon a road. And all
 [{'source': './data/the_little_prince.txt', '_id': '54eaf36e-851d-4548-907c-81451f60a003', '_collection_name': 'sparse_collection'}]


* And he went back to meet the fox. 
"Goodbye," he said. 
"Goodbye," said the fox. "And now here is my secret, a very simple secret: It is only with the heart that one can see rightly; what is essential
 [{'source': './data/the_little_prince.txt', '_id': 'b6a7acba-319c-40a8-bcfa-05091c2186c5', '_collection_name': 'sparse_collection'}]


* "The men where you live," said the little prince, "raise five thousand roses in the same garden-- and they do not find in it what they are looking for." 
"They do not find it," I replied. 
"And yet wh
 [{'source': './data/the_little_prince.txt', '_id': 'f8a4b68b-9ed1-4c9b-b84b-986862189366', '_collection_name': 'sparse
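
A sparse vector stores weights only for the terms that actually occur, and two sparse vectors are compared over the indices they share. The sketch below illustrates this with raw term counts. That is a deliberate simplification: BM25-style models such as `Qdrant/bm25` also weight terms by rarity and document length. The vocabulary and helper names are hypothetical.

```python
from collections import Counter


def to_sparse(text, vocabulary):
    """Map a text to {token_index: count}, ignoring tokens that are not in
    `vocabulary` (a dict from token to integer index)."""
    counts = Counter(text.lower().split())
    return {vocabulary[tok]: float(n) for tok, n in counts.items() if tok in vocabulary}


def sparse_dot(a, b):
    """Dot product over the indices the two sparse vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


vocabulary = {"rose": 0, "fox": 1, "prince": 2, "garden": 3}
doc_vector = to_sparse("the little prince discovers a garden of rose and rose", vocabulary)
query_vector = to_sparse("rose garden", vocabulary)
score = sparse_dot(query_vector, doc_vector)
print(score)  # 3.0
```

Because scoring depends on shared terms, sparse search tends to favor passages that literally mention the query words, which fits the rose-heavy results above.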

### Hybrid Vector Search

To perform a hybrid search using dense and sparse vectors with score fusion:

- The `retrieval_mode` parameter should be set to `RetrievalMode.HYBRID`.
- A [dense embeddings](https://python.langchain.com/docs/integrations/text_embedding/) value should be provided to the `embedding` parameter.
- An implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface, using any sparse embeddings provider, must be provided as the value of the `sparse_embedding` parameter.

Note that if you've added documents with the `HYBRID` mode, you can switch to any retrieval mode when searching, since both the dense and sparse vectors are available in the collection.

In [30]:
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")
# Initialize sparse embeddings using FastEmbedSparse
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Initialize QdrantVectorStore with documents, embeddings, and configuration
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedding,
    sparse_embedding=sparse_embeddings,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name="hybrid_collection",
    retrieval_mode=RetrievalMode.HYBRID,
    batch_size=10,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* [ Chapter 20 ]
- the little prince discovers a garden of roses
But it happened that after walking for a long time through sand, and rocks, and snow, the little prince at last came upon a road. And all
 [{'source': './data/the_little_prince.txt', '_id': '3c84f2d7-234b-416d-a417-32d6311a0844', '_collection_name': 'hybrid_collection'}]


* "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '7d4529d7-3d4d-43f4-85c9-35a11f1c77c2', '_collection_name': 'hybrid_collection'}]


* [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': 'a4cc316b-34aa-4205-b820-253548d164a6', '_collection_name': 'hybrid_collection'}]
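
Score fusion combines the dense and sparse result lists into one ranking. A minimal sketch of reciprocal rank fusion (RRF), one widely used fusion method, is shown below. Whether this matches the fusion applied to your collection is an assumption, as is the constant `k=60`; the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists. Each ID earns 1 / (k + rank) from every
    list it appears in (rank starts at 1); higher totals rank first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_ranking = ["d1", "d2", "d3"]   # hypothetical dense results, best first
sparse_ranking = ["d3", "d1", "d4"]  # hypothetical sparse results, best first
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Documents found by both searches (`d1`, `d3`) rise above those found by only one, which is why the hybrid results mix passages from the dense and sparse runs.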


