# Qdrant

- Author: [HyeonJong Moon](https://github.com/hj0302)
- Design: 
- Peer Review: 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)



## Overview

This notebook demonstrates how to use the `Qdrant` vector database with LangChain.

[`Qdrant`](https://python.langchain.com/docs/integrations/vectorstores/qdrant/) is an open-source vector similarity search engine designed to store, search, and manage high-dimensional vectors with additional payloads. It offers a production-ready service with a user-friendly API, suitable for applications such as semantic search, recommendation systems, and more.

**Qdrant's architecture** is optimized for efficient vector similarity searches, employing advanced indexing techniques like **Hierarchical Navigable Small World (HNSW)** graphs to enable fast and scalable retrieval of relevant data.
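
To make the idea of a vector similarity search concrete, here is a minimal brute-force sketch in plain Python. It does not reflect how `Qdrant` works internally: HNSW is an approximate index that avoids scanning every vector, whereas this exhaustive scan only illustrates the cosine-similarity ranking that such an index approximates. The vectors, IDs, and the `top_k` helper are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, points, k=2):
    """Score every stored vector against the query and keep the k best (exhaustive scan)."""
    scored = [(point_id, cosine_similarity(query, vec)) for point_id, vec in points.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical 3-dimensional vectors; real embeddings have thousands of dimensions.
points = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], points, k=2)
print([point_id for point_id, _ in results])
```

An exhaustive scan like this costs one similarity computation per stored vector, which is the cost that graph-based indexes such as HNSW are designed to avoid.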


### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Credentials](#credentials)
- [Installation](#installation)
- [Initialization](#initialization)
- [Manage Vector Store](#manage-vector-store)
  - [Create a Collection](#create-a-collection)
  - [List Collections](#list-collections)
  - [Delete a Collection](#delete-a-collection)
  - [Add Items to the Vector Store](#add-items-to-the-vector-store)
  - [Delete Items from the Vector Store](#delete-items-from-the-vector-store)
  - [Upsert Items to Vector Store (Parallel)](#upsert-items-to-vector-store-parallel)
- [Query Vector Store](#query-vector-store)
  - [Query Directly](#query-directly)
  - [Similarity Search with Score](#similarity-search-with-score)
  - [Query by Turning into Retriever](#query-by-turning-into-retriever)
  - [Search with Filtering](#search-with-filtering)
  - [Delete with Filtering](#delete-with-filtering)
  - [Filtering and Updating Records](#filtering-and-updating-records)

### References

- [LangChain Qdrant Reference](https://python.langchain.com/docs/integrations/vectorstores/qdrant/)
- [Qdrant Official Reference](https://qdrant.tech/documentation/frameworks/langchain/)
- [Qdrant Install Reference](https://qdrant.tech/documentation/guides/installation/)
- [Qdrant Cloud Reference](https://cloud.qdrant.io)
- [Qdrant Cloud Quickstart Reference](https://qdrant.tech/documentation/quickstart-cloud/)
----

## Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]
- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.
- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain_openai",
        "langchain_qdrant",
        "qdrant_client",
        "langchain_core",
        "fastembed",
    ],
    verbose=False,
    upgrade=False,
)


In [3]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "QDRANT_API_KEY": "",
        "QDRANT_URL": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "Qdrant",
    }
)

Environment variables have been set successfully.


You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.

**[Note]** If you are using a `.env` file, proceed as follows.

In [4]:
from dotenv import load_dotenv

load_dotenv(override=True)

True

## Credentials

Create a new account or sign in to your existing one, and generate an API key for use in this notebook.

1. **Log in to Qdrant Cloud** : Go to the [Qdrant Cloud](https://cloud.qdrant.io) website and log in using your email, Google account, or GitHub account.

2. **Create a Cluster** : After logging in, navigate to the **"Clusters"** section and click the **"Create"** button. Choose your desired configurations and region, then click **"Create"** to start building your cluster. Once the cluster is created, an API key will be generated for you.

3. **Retrieve and Store Your API Key** : When your cluster is created, you will receive an API key. Ensure you save this key in a secure location, as you will need it later. If you lose it, you will have to generate a new one.

4. **Manage API Keys** : To create additional API keys or manage existing ones, go to the **"Access Management"** section in the Qdrant Cloud dashboard and select *"Qdrant Cloud API Keys"*. Here, you can create new keys or delete existing ones.

```
QDRANT_API_KEY="YOUR_QDRANT_API_KEY"
```

## Installation

There are several main options for initializing and using the **Qdrant** vector store:

- **Local Mode** : This mode doesn't require a separate server.
    - **In-memory storage** (data is not persisted)
    - **On-disk storage** (data is saved to your local machine)
- **Docker Deployments** : You can run **Qdrant** using **Docker**.
- **Qdrant Cloud** : Use **Qdrant** as a managed cloud service.

For detailed instructions, see the [installation instructions](https://qdrant.tech/documentation/guides/installation/).

### In-Memory

For simple tests or quick experiments, you might choose to store data directly in memory. This means the data is automatically removed when your client terminates, typically at the end of your script or notebook session.

In [5]:
from utils.qdrant import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager with in-memory storage
db = QdrantDocumentManager(
    location=":memory:",  # Use in-memory database for temporary storage
    collection_name=collection_name,
    embedding=embedding,
)

Collection 'demo_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'demo_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


### On-Disk Storage

With **on-disk storage**, you can store your vectors directly on your hard drive without requiring a **Qdrant server**. This ensures that your data persists even when you restart the program.

In [6]:
from utils.qdrant import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the path for Qdrant storage
qdrant_path = "./qdrant_memory"

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager with specified storage path
db = QdrantDocumentManager(
    path=qdrant_path,  # Specify the path for Qdrant storage
    collection_name=collection_name,
    embedding=embedding,
)

Collection 'demo_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'demo_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


### Docker Deployments

You can deploy `Qdrant` in a **production environment** using [`Docker`](https://qdrant.tech/documentation/guides/installation/#docker) and [`Docker Compose`](https://qdrant.tech/documentation/guides/installation/#docker-compose). Refer to the `Docker` and `Docker Compose` setup instructions in the development section for detailed information.

In [7]:
from utils.qdrant import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the URL for Qdrant server
url = "http://localhost:6333"

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to the Qdrant server
db = QdrantDocumentManager(
    url=url,  # URL of the running Qdrant server
    collection_name=collection_name,
    embedding=embedding,
)

### Qdrant Cloud

For a **production environment**, you can use [**Qdrant Cloud**](https://cloud.qdrant.io/). It offers fully managed `Qdrant` databases with features such as **horizontal and vertical scaling**, **one-click setup and upgrades**, **monitoring**, **logging**, **backups**, and **disaster recovery**. For more information, refer to the [**Qdrant Cloud documentation**](https://qdrant.tech/documentation/cloud/).


In [8]:
import getpass
import os

# Fetch the Qdrant server URL from environment variables or prompt for input
if not os.getenv("QDRANT_URL"):
    os.environ["QDRANT_URL"] = getpass.getpass("Enter your Qdrant Cloud URL: ")
QDRANT_URL = os.environ.get("QDRANT_URL")

# Fetch the Qdrant API key from environment variables or prompt for input
if not os.getenv("QDRANT_API_KEY"):
    os.environ["QDRANT_API_KEY"] = getpass.getpass("Enter your Qdrant API key: ")
QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY")

In [9]:
from utils.qdrant import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to Qdrant Cloud
db = QdrantDocumentManager(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name=collection_name,
    embedding=embedding,
)

Collection 'demo_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'demo_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


## Initialization

Once you've established your **vector store**, you'll likely need to manage the **collections** within it. Here are some common operations you can perform:

- **Create a collection**
- **List collections**
- **Delete a collection**

### Create a Collection

The `QdrantDocumentManager` class allows you to create a new **collection** in `Qdrant`. It can automatically create a collection if it doesn't exist or if you want to **recreate** it. You can specify configurations for **dense** and **sparse vectors** to meet different search needs. Use the `_ensure_collection_exists` method for **automatic creation** or call `create_collection` directly when needed.

In [10]:
from utils.qdrant import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings
from qdrant_client.http.models import Distance

# Define the collection name for storing documents
collection_name = "test_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to Qdrant Cloud
db = QdrantDocumentManager(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name=collection_name,
    embedding=embedding,
    metric=Distance.COSINE,
)

Collection 'test_collection' does not exist or force recreate is enabled. Creating new collection...
Collection 'test_collection' created successfully with configuration: {'vectors_config': VectorParams(size=3072, distance=<Distance.COSINE: 'Cosine'>, hnsw_config=None, quantization_config=None, on_disk=None, datatype=None, multivector_config=None)}


### List Collections

The `QdrantDocumentManager` class exposes the underlying Qdrant client as `db.client`, so you can list all **collections** in your `Qdrant` instance with the client's `get_collections` method. The example below retrieves the collections and prints the **name** of each one.


In [11]:
# Retrieve the list of collections from the Qdrant client
collections = db.client.get_collections()

# Iterate over each collection and print its details
for collection in collections.collections:
    print(f"Collection Name: {collection.name}")

Collection Name: test_collection
Collection Name: demo_collection


### Delete a Collection

You can delete a **collection** by calling the `delete_collection` method on the underlying client (`db.client`). This removes the specified collection from your `Qdrant` instance.

In [12]:
# Define collection name
collection_name = "test_collection"

# Delete the collection
if db.client.delete_collection(collection_name=collection_name):
    print(f"Collection '{collection_name}' has been deleted.")

Collection 'test_collection' has been deleted.


## Manage Vector Store

After you've created your **vector store**, you can interact with it by **adding** or **deleting** items. Here are some common operations:

### Add Items to the Vector Store

The `QdrantDocumentManager` class lets you add items to your **vector store** using the `upsert` method. This method **updates** existing documents with new data if their IDs already exist.
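
The upsert semantics described above can be sketched without any database. In this minimal illustration a plain dictionary stands in for a collection; the `upsert` function and the IDs are hypothetical and are not the `QdrantDocumentManager` API.

```python
def upsert(store, ids, texts):
    """Insert new IDs and overwrite existing ones; returns the IDs written."""
    for point_id, text in zip(ids, texts):
        store[point_id] = text
    return list(ids)


# A dict stands in for a collection (illustration only).
store = {}
upsert(store, ["id-1", "id-2"], ["first draft", "second chunk"])
upsert(store, ["id-1"], ["revised draft"])  # same ID: replaces, does not duplicate

print(len(store), store["id-1"])
```

Because the ID decides whether a write is an insert or an update, reusing stable IDs is what keeps repeated ingestion runs from creating duplicates.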

In [13]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from uuid import uuid4

# Load the text file
loader = TextLoader("./data/the_little_prince.txt")
documents = loader.load()

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600, chunk_overlap=100, length_function=len
)

split_docs = text_splitter.split_documents(documents)

# Generate unique IDs for documents
uuids = [str(uuid4()) for _ in split_docs[:30]]
page_contents = [doc.page_content for doc in split_docs[:30]]
metadatas = [doc.metadata for doc in split_docs[:30]]

In [14]:
from utils.qdrant import QdrantDocumentManager
from langchain_openai import OpenAIEmbeddings

# Define the collection name for storing documents
collection_name = "demo_collection"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Create an instance of QdrantDocumentManager connected to Qdrant Cloud
db = QdrantDocumentManager(
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name=collection_name,
    embedding=embedding,
)

db.upsert(texts=page_contents, metadatas=metadatas, ids=uuids)

['25210710-b810-4ed9-a539-092612691e5d',
 '601f52cf-907f-4968-b3f2-3ddd63edfa47',
 '2e3bef8f-630b-4bac-bfe9-9305016a3670',
 'dd1553e8-5eb7-4933-b0ee-57f85e240fe6',
 '07c9d27a-13a1-4485-82ec-f14c351e6317',
 '7c715419-841b-4bfa-ac65-39f9d242d8f8',
 'e4f6ed39-07cf-413f-92bd-f60a195d6796',
 '3c907652-da5e-4cb4-9358-61a0cca703a2',
 'fba98dba-8016-4b86-b8e1-22ed5f238887',
 '01621ab2-9a9a-4802-8ab4-ef5b0e09d8e0',
 'bd0e8c8c-0fae-47d0-a3c1-ca14b220f1ca',
 '397843d0-e727-47a6-b501-678245a7a7f3',
 'e7a7a225-6757-482b-99d7-dfbabb101927',
 '056c702b-72e6-40d7-8bdf-1e3224f86de3',
 '04c9096d-8f41-40ac-ad09-d8d93c183e69',
 '07b71ecd-d3a4-41ee-8497-0173859bb185',
 '75a77714-7101-4f1e-858a-74e172fbea84',
 'e8d4878d-724e-48ab-a238-0aa87ebb4142',
 '498c1850-36ca-4821-8483-db7ed25588b0',
 'e8190369-6814-4568-bdb5-dbf68f1a44f4',
 '83954225-b0ce-4821-8ea6-b8f89bf9e792',
 '4ad0bdbf-1e93-4ca5-8c2e-53c4689edb4d',
 'c56b3a59-50cb-4c55-9d45-e3391e732696',
 'ad045d18-4f54-4593-9c61-c5f6f0c986b5',
 'f6dc0fed-8a35-

### Delete Items from the Vector Store

The `QdrantDocumentManager` class allows you to delete items from your **vector store** using the `delete` method. You can specify items to delete by providing **IDs** or **filters**.


In [15]:
delete_ids = [uuids[0]]

db.delete(ids=delete_ids)

### Upsert Items to Vector Store (Parallel)

The `QdrantDocumentManager` class supports **parallel upserts** using the `upsert_parallel` method. This efficiently **adds** or **updates** multiple items with unique **IDs**, **data**, and **metadata**.
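
The implementation of `upsert_parallel` is not shown in this section, so the following is only one plausible reading of what `batch_size` and `workers` control: the input is split into fixed-size batches, and a thread pool sends several batches at once. `make_batches` and `fake_upsert` are hypothetical stand-ins, and no network call is made.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def fake_upsert(batch):
    """Placeholder for a network call; returns the IDs it 'wrote'."""
    return [point_id for point_id, _ in batch]


records = [(f"id-{i}", f"text {i}") for i in range(70)]
batches = make_batches(records, batch_size=32)

with ThreadPoolExecutor(max_workers=10) as pool:
    # pool.map preserves batch order even though batches run concurrently.
    written = [pid for ids in pool.map(fake_upsert, batches) for pid in ids]

print(len(batches), len(written))
```

With 70 records and a batch size of 32, only three batches exist, so at most three of the ten workers have anything to do. More workers help only when there are more batches than workers.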

In [16]:
# Generate unique IDs for documents
uuids = [str(uuid4()) for _ in split_docs[30:60]]
page_contents = [doc.page_content for doc in split_docs[30:60]]
metadatas = [doc.metadata for doc in split_docs[30:60]]

db.upsert_parallel(
    texts=page_contents,
    metadatas=metadatas,
    ids=uuids,
    batch_size=32,
    workers=10,
)

['d8b31778-69fe-4d91-98dc-d39e32681dc0',
 '941eb409-d92d-4863-82ef-e4c5bee1b49a',
 'c68ea294-39f5-4c10-9cf0-b8a05084c176',
 '801bb22d-576c-44aa-bfa4-9f49720fcd08',
 '5fb85e3f-0109-4a6e-8f32-84a333ca81b0',
 '482da4ac-0bf6-4acb-adfe-a4a26fb295ee',
 'acef5e82-c671-4dcb-a11c-37389f38836b',
 'f5cb10b7-c76f-4cb6-abdb-84e5c3813f8d',
 '465bfc46-6107-4ddc-82a1-3f823c5bc2f0',
 'f5962162-5b27-4628-bac8-1be1d559e0b6',
 '31bbc0ef-ab28-43df-bb8d-8123c5b6e07a',
 '975a0fc2-0ce2-4edc-afb6-b529e909f2d0',
 '479a6ff4-5bec-4a9f-ba3f-9797e45f9c5d',
 '8d75e8e6-a42e-4ab9-b467-422daca464c9',
 '2626e819-4dce-4711-b57d-e1facbbf15aa',
 '1cdf89cf-ba3d-4909-8169-4d02e9c79f1c',
 'ef64fb6a-d78c-4cd3-9ee8-cb1d52258378',
 'feb9773a-a96e-4b29-b037-c96a63ff1ec6',
 'e23f5882-8824-4659-aa5e-d97c8210e496',
 '50d0b276-dddd-4c45-8826-72a2a544223a',
 '420dff01-ab42-4ee1-b76a-3adfb47fda5b',
 '4e93d4fc-8900-44a0-91ed-e8fac70f6784',
 '572a16fb-f45a-4b23-af42-4afed429f269',
 '3f4015da-01a9-4d73-bdae-70b3b293c408',
 '432708cf-60d9-

## Query Vector Store

Once your **vector store** has been created and the relevant **documents** have been added, you will typically **query** it while running your `chain` or `agent`.

### Query Directly

The `QdrantDocumentManager` class allows direct **querying** using the `search` method. It performs **similarity searches** by converting queries into **vector embeddings** to find similar **documents**.


In [17]:
query = "What is the significance of the rose in The Little Prince?"

response = db.search(
    query=query,
    k=3,
)

for res in response:
    payload = res["payload"]
    print(f"* {payload['page_content'][:200]}\n [{payload['metadata']}]\n\n")

* for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt'}]


* Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt'}]


* [ Chapter 7 ]
- the narrator learns about the secret of the little prince‘s life 
On the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed 
 [{'source': './data/the_little_prince.txt'}]




### Similarity Search with Score

The `QdrantDocumentManager` class enables **similarity searches** with **scores** using the `search` method. This provides a **relevance score** for each **document** found.


In [18]:
# Define the query to search in the database
query = "What is the significance of the rose in The Little Prince?"

# Perform the search with the specified query and number of results
response = db.search(query=query, k=3)

for res in response:
    payload = res["payload"]
    score = res["score"]
    print(
        f"* [SIM={score:.3f}] {payload['page_content'][:200]}\n [{payload['metadata']}]\n\n"
    )

* [SIM=0.527] for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt'}]


* [SIM=0.500] Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt'}]


* [SIM=0.478] [ Chapter 7 ]
- the narrator learns about the secret of the little prince‘s life 
On the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed 
 [{'source': './data/the_little_prince.txt'}]




### Query by Turning into Retriever

The `QdrantDocumentManager` class can transform the **vector store** into a `retriever`. This allows for easier **integration** into **workflows** or **chains**.
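
The retriever in this section is configured with `search_type="similarity_score_threshold"` and the `k` and `score_threshold` search arguments. As a rough sketch of what those two arguments do together, the function below drops results that score below the threshold and then keeps the `k` best. The `filter_by_score` helper and the scores are invented for illustration; the real filtering happens inside the retriever.

```python
def filter_by_score(scored_docs, k, score_threshold):
    """Keep docs scoring at least score_threshold, then return the k best."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical (document, score) pairs.
scored_docs = [
    ("rose", 0.53),
    ("baobab", 0.50),
    ("sheep", 0.48),
    ("lamp", 0.21),
    ("king", 0.35),
]

selected = filter_by_score(scored_docs, k=3, score_threshold=0.3)
print(selected)
```

Note that a threshold can return fewer than `k` documents: raising `score_threshold` to 0.49 in this example would leave only two results.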


In [19]:
from langchain_qdrant import QdrantVectorStore

# Initialize QdrantVectorStore with the client, collection name, and embedding
vector_store = QdrantVectorStore(
    client=db.client, collection_name=db.collection_name, embedding=db.embedding
)

query = "What is the significance of the rose in The Little Prince?"

# Transform the vector store into a retriever with specific search parameters
retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.3},
)

results = retriever.invoke(query)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt', '_id': '7c715419-841b-4bfa-ac65-39f9d242d8f8', '_collection_name': 'demo_collection'}]


* Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt', '_id': '1cdf89cf-ba3d-4909-8169-4d02e9c79f1c', '_collection_name': 'demo_collection'}]


* [ Chapter 7 ]
- the narrator learns about the secret of the little prince‘s life 
On the fifth day-- again, as always, it was thanks to the sheep-- the secret of the little prince‘s life was revealed 
 [{'source': './data/the_little_prince.txt', '_id': '2b5270cd-e9f0-475d-a574-eaa65e7494a6', '_collection_name': 'demo_colle

### Search with Filtering

The `QdrantDocumentManager` class allows **searching with filters** to retrieve records based on specific **payload values**, such as the page content or a metadata field. This is done using the `scroll` method with a defined **filter query**.
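
As a mental model for filter queries, the sketch below evaluates a list of conditions against nested payload dictionaries: a record matches only when every condition holds, which mirrors how clauses under `must` combine. This is plain Python, not the Qdrant filter engine; the substring check is a simplification of text matching, and `get_path`, `matches_all`, and the sample records are hypothetical.

```python
def get_path(payload, key):
    """Resolve a dotted key such as 'metadata.source' inside a nested payload."""
    value = payload
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches_all(payload, conditions):
    """True when every (key, predicate) pair holds, like a list of `must` clauses."""
    for key, predicate in conditions:
        value = get_path(payload, key)
        if value is None or not predicate(value):
            return False
    return True


# Hypothetical records shaped like the payloads in this notebook.
records = [
    {"id": 1, "payload": {"page_content": "[ Chapter 1 ] Once when I was six",
                          "metadata": {"source": "./data/the_little_prince.txt"}}},
    {"id": 2, "payload": {"page_content": "The fox asked to be tamed",
                          "metadata": {"source": "./data/the_little_prince.txt"}}},
    {"id": 3, "payload": {"page_content": "[ Chapter 9 ] notes",
                          "metadata": {"source": "./data/other.txt"}}},
]

conditions = [
    ("page_content", lambda text: "Chapter" in text),
    ("metadata.source", lambda src: src == "./data/the_little_prince.txt"),
]

hits = [r["id"] for r in records if matches_all(r["payload"], conditions)]
print(hits)
```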

In [20]:
from qdrant_client import models

# Define a filter query to match documents containing the text "Chapter" in the page content
filter_query = models.Filter(
    must=[
        models.FieldCondition(
            key="page_content",
            match=models.MatchText(text="Chapter"),
        ),
    ]
)

# Retrieve records from the collection that match the filter query
db.scroll(
    scroll_filter=filter_query,
    k=10,
)

[Record(id='01621ab2-9a9a-4802-8ab4-ef5b0e09d8e0', payload={'page_content': '[ Chapter 1 ]\n- we are introduced to the narrator, a pilot, and his ideas about grown-ups\nOnce when I was six years old I saw a magnificent picture in a book, called True Stories from Nature, about the primeval forest. It was a picture of a boa constrictor in the act of swallowing an animal. Here is a copy of the drawing. \n(picture)\nIn the book it said: "Boa constrictors swallow their prey whole, without chewing it. After that they are not able to move, and they sleep through the six months that they need for digestion."', 'metadata': {'source': './data/the_little_prince.txt'}}, vector=None, shard_key=None, order_value=None),
 Record(id='07b71ecd-d3a4-41ee-8497-0173859bb185', payload={'page_content': '[ Chapter 2 ]\n- the narrator crashes in the desert and makes the acquaintance of the little prince\nSo I lived my life alone, without anyone that I could really talk to, until I had an accident with my plane

### Delete with Filtering

You can also **delete records** using **filters** based on specific **payload values**. This is achieved by passing a **filter query** to the client's `delete` method.

In [21]:
from qdrant_client import models

# Define a filter query to match documents containing the text "Chapter" in the page content
filter_query = models.Filter(
    must=[
        models.FieldCondition(
            key="page_content",
            match=models.MatchText(text="Chapter"),
        ),
    ]
)

# Delete records from the collection that match the filter query
db.client.delete(collection_name=db.collection_name, points_selector=filter_query)

UpdateResult(operation_id=3, status=<UpdateStatus.COMPLETED: 'completed'>)

### Filtering and Updating Records

The `QdrantDocumentManager` class supports **filtering and updating records** based on specific **metadata values**. This is done by **retrieving records** with **filters** and **updating** them as needed.


In [22]:
from qdrant_client import models

# Define a filter query to match documents with a specific metadata source
filter_query = models.Filter(
    must=[
        models.FieldCondition(
            key="metadata.source",
            match=models.MatchValue(value="./data/the_little_prince.txt"),
        ),
    ]
)

# Retrieve records matching the filter query, including their vectors
response = db.scroll(scroll_filter=filter_query, k=10, with_vectors=True)
new_source = "the_little_prince.txt"

# Keep each point's ID and vector, and write back the updated metadata
for point in response:  # the helper's scroll returns a list of records
    payload = point.payload

    # Check if metadata exists in the payload
    if "metadata" in payload:
        payload["metadata"]["source"] = new_source
    else:
        payload["metadata"] = {
            "source": new_source
        }  # Add new metadata if it doesn't exist

    # Update the point with new metadata
    db.client.upsert(
        collection_name=db.collection_name,
        points=[
            models.PointStruct(
                id=point.id,
                payload=payload,
                vector=point.vector,
            )
        ],
    )

### Similarity Search Options

When using `QdrantVectorStore`, you have three options for performing **similarity searches**. You can select the desired search mode using the `retrieval_mode` parameter when you set up the class. The available modes are:

- **Dense Vector Search** (Default)
- **Sparse Vector Search**
- **Hybrid Search**

### Dense Vector Search

To perform a search using only **dense vectors**:

- The `retrieval_mode` parameter must be set to `RetrievalMode.DENSE`. This is also the **default setting**.
- You need to provide a [dense embeddings](https://python.langchain.com/docs/integrations/text_embedding/) value through the `embedding` parameter.


In [23]:
from langchain_qdrant import RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")

# Initialize QdrantVectorStore with documents, embeddings, and configuration
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs[:50],
    embedding=embedding,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name="dense_collection",
    retrieval_mode=RetrievalMode.DENSE,
    batch_size=10,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* for decades. In the book, a pilot is stranded in the midst of the Sahara where he meets a tiny prince from another world traveling the universe in order to understand life. In the book, the little pri
 [{'source': './data/the_little_prince.txt', '_id': 'd5176f5d-808b-4704-ad8e-83f1793a9e3f', '_collection_name': 'dense_collection'}]


* Indeed, as I learned, there were on the planet where the little prince lived-- as on all planets-- good plants and bad plants. In consequence, there were good seeds from good plants, and bad seeds fro
 [{'source': './data/the_little_prince.txt', '_id': 'fcedce21-1406-4902-afdc-0b17944a9a98', '_collection_name': 'dense_collection'}]


* "It is a question of discipline," the little prince said to me later on. "When you‘ve finished your own toilet in the morning, then it is time to attend to the toilet of your planet, just so, with the
 [{'source': './data/the_little_prince.txt', '_id': '77baf9d9-36d4-4ef0-8298-97045d9e1390', '_collection_name': 'dense_co

### Sparse Vector Search

To search with only **sparse vectors**:

- The `retrieval_mode` parameter should be set to `RetrievalMode.SPARSE`.
- An implementation of the [SparseEmbeddings](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface using any **sparse embeddings provider** has to be provided as a value to the `sparse_embedding` parameter.
- The `langchain-qdrant` package provides a **FastEmbed** based implementation out of the box.

To use it, install the [FastEmbed](https://github.com/qdrant/fastembed) package:

```bash
pip install fastembed
```
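
To show what distinguishes a sparse vector from a dense one, here is a minimal sketch in which each text becomes a mapping from token to weight and similarity is a dot product over the tokens two texts share. Raw term counts are used as weights purely for simplicity; this is not the BM25 weighting and not the `FastEmbedSparse` implementation. `sparse_vector`, `sparse_dot`, and the sample texts are hypothetical.

```python
from collections import Counter


def sparse_vector(text):
    """Map each lowercase token to its count; absent tokens are implicit zeros."""
    return Counter(text.lower().split())


def sparse_dot(a, b):
    """Dot product that only visits tokens present in both vectors."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)


docs = {
    "d1": "the little prince loved his rose",
    "d2": "the fox asked to be tamed",
}

query = sparse_vector("rose of the prince")
scores = {doc_id: sparse_dot(query, sparse_vector(text)) for doc_id, text in docs.items()}
print(scores)
```

Because only shared tokens contribute, sparse search rewards exact keyword overlap, which is why the sparse results in this section surface passages that literally mention roses.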

In [24]:
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")
# Initialize sparse embeddings using FastEmbedSparse
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Initialize QdrantVectorStore with documents, embeddings, and configuration
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedding,
    sparse_embedding=sparse_embeddings,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name="sparse_collection",
    retrieval_mode=RetrievalMode.SPARSE,
    batch_size=10,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* [ Chapter 20 ]
- the little prince discovers a garden of roses
But it happened that after walking for a long time through sand, and rocks, and snow, the little prince at last came upon a road. And all
 [{'source': './data/the_little_prince.txt', '_id': '19560ca6-a601-4e44-b762-92b1a25574e0', '_collection_name': 'sparse_collection'}]


* And he went back to meet the fox. 
"Goodbye," he said. 
"Goodbye," said the fox. "And now here is my secret, a very simple secret: It is only with the heart that one can see rightly; what is essential
 [{'source': './data/the_little_prince.txt', '_id': 'c433d060-394c-4160-bf4b-23e1416e3583', '_collection_name': 'sparse_collection'}]


* "The men where you live," said the little prince, "raise five thousand roses in the same garden-- and they do not find in it what they are looking for." 
"They do not find it," I replied. 
"And yet wh
 [{'source': './data/the_little_prince.txt', '_id': '34c5c6d3-5124-42c4-bc27-c272ba47972f', '_collection_name': 'sparse
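
To give a rough intuition for why sparse retrieval favors passages that share literal terms with the query, the sketch below scores documents with a dot product over sparse vectors. It is a simplified illustration, not Qdrant's or FastEmbed's implementation: the `sparse_dot` helper, the token indices, and the weights are all hypothetical.

```python
def sparse_dot(query, document):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    # Iterate over the smaller vector so the work is proportional to its size.
    if len(query) > len(document):
        query, document = document, query
    return sum((w * document[i] for i, w in query.items() if i in document), 0.0)


# Hypothetical token indices and weights, for illustration only.
query_vec = {17: 1.0, 42: 1.0}              # e.g. the query terms "rose" and "prince"
doc_rose = {17: 1.5, 42: 0.75, 99: 0.25}    # shares both query terms
doc_fox = {7: 1.25, 99: 0.5}                # shares no query terms

score_rose = sparse_dot(query_vec, doc_rose)
score_fox = sparse_dot(query_vec, doc_fox)
print(score_rose, score_fox)
```

Only indices that appear in both vectors contribute to the score, so a document with no overlapping terms scores zero no matter how semantically close it is. That gap is what hybrid search is meant to cover.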

### Hybrid Vector Search

To perform a **hybrid search** using **dense** and **sparse vectors** with **score fusion**:

- Set the `retrieval_mode` parameter to `RetrievalMode.HYBRID`.
- Pass a [dense embeddings](https://python.langchain.com/docs/integrations/text_embedding/) model to the `embedding` parameter.
- Pass an implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface, backed by any **sparse embeddings provider**, to the `sparse_embedding` parameter.

**Note**: If you added documents in `HYBRID` mode, you can switch to any **retrieval mode** when searching, because the **collection** stores both the **dense** and **sparse vectors**.
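
To make "score fusion" more concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one widely used way to merge a dense ranking and a sparse ranking into a single list. Treating RRF as the method applied here is an assumption for illustration; the `reciprocal_rank_fusion` helper, the constant `k=60`, and the document IDs are hypothetical and do not come from the Qdrant API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents ranked well by several lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a dense search and a sparse search.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this sketch `doc_b` ends up first because it ranks near the top of both lists, while documents found by only one retriever fall to the bottom. Rank-based fusion also avoids comparing raw dense and sparse scores, which are on different scales.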

In [25]:
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

# Initialize the embedding model with a specific OpenAI model
embedding = OpenAIEmbeddings(model="text-embedding-3-large")
# Initialize sparse embeddings using FastEmbedSparse
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Initialize QdrantVectorStore with documents, embeddings, and configuration
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedding,
    sparse_embedding=sparse_embeddings,
    url=QDRANT_URL,
    api_key=QDRANT_API_KEY,
    collection_name="hybrid_collection",
    retrieval_mode=RetrievalMode.HYBRID,
    batch_size=10,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '8cd6824d-9e0c-4d45-9556-d4b8c36b5516', '_collection_name': 'hybrid_collection'}]


* [ Chapter 20 ]
- the little prince discovers a garden of roses
But it happened that after walking for a long time through sand, and rocks, and snow, the little prince at last came upon a road. And all
 [{'source': './data/the_little_prince.txt', '_id': '8225734e-12e8-4328-a81d-fbd4e41ed28c', '_collection_name': 'hybrid_collection'}]


* [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': 'bf5b7e68-7788-429e-aaf8-190ac0e7ed09', '_collection_name': 'hybrid_collection'}]


