# Qdrant

- Author: [HyeonJong Moon](https://github.com/hj0302)
- Design: 
- Peer Review: 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain-academy/blob/main/module-4/sub-graph.ipynb) [![Open in LangChain Academy](https://cdn.prod.website-files.com/65b8cd72835ceeacd4449a53/66e9eba12c7b7688aa3dbb5e_LCA-badge-green.svg)](https://academy.langchain.com/courses/take/intro-to-langgraph/lessons/58239937-lesson-2-sub-graphs)


## Overview

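This notebook demonstrates how to use the `Qdrant` vector database with LangChain: connecting a client, managing collections, adding, updating, and deleting items, and querying with dense, sparse, and hybrid search.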

[`Qdrant`](https://python.langchain.com/docs/integrations/vectorstores/qdrant/) is an open-source vector similarity search engine designed to store, search, and manage high-dimensional vectors with additional payloads. It offers a production-ready service with a user-friendly API, suitable for applications such as semantic search, recommendation systems, and more.

Qdrant's architecture is optimized for efficient vector similarity searches, employing advanced indexing techniques like Hierarchical Navigable Small World (HNSW) graphs to enable fast and scalable retrieval of relevant data.


### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Credentials](#credentials)
- [Installation](#installation)
- [Initialization](#initialization)
- [Manage VectorStore](#manage-vectorstore)
- [Query VectorStore](#query-vectorstore)

### References

- [LangChain Qdrant Reference](https://python.langchain.com/docs/integrations/vectorstores/qdrant/)
- [Qdrant Official Reference](https://qdrant.tech/documentation/frameworks/langchain/)
- [Qdrant Install Reference](https://qdrant.tech/documentation/guides/installation/)
- [Qdrant Cloud Reference](https://cloud.qdrant.io)
- [Qdrant Cloud Quickstart Reference](https://qdrant.tech/documentation/quickstart-cloud/)
----

## Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]
- `langchain-opentutorial` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.
- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) repository for more details.

In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [3]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain_openai",
        "langchain_qdrant",
        "qdrant_client",
        "langchain_core",
        "fastembed",
    ],
    verbose=False,
    upgrade=False,
)

In [4]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "QDRANT_API_KEY": "",
        "QDRANT_URL": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "Qdrant",
    }
)

Environment variables have been set successfully.


You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.

**[Note]** If you are using a `.env` file, proceed as follows.

In [5]:
from dotenv import load_dotenv

load_dotenv(override=True)

True
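
Before continuing, it can help to confirm that the keys this notebook relies on are actually present. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and is not part of `langchain-opentutorial` or `python-dotenv`.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    required: Iterable[str], environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Variable names used in this notebook.
required = ["OPENAI_API_KEY", "QDRANT_API_KEY", "QDRANT_URL"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

An empty string counts as missing here, which matches the placeholder values in the `set_env` call.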

## Credentials

Create a new account or sign in to your existing one, and generate an API key for use in this notebook.

1. **Log in to Qdrant Cloud** : Go to the [Qdrant Cloud](https://cloud.qdrant.io) website and log in using your email, Google account, or GitHub account.

2. **Create a Cluster** : After logging in, navigate to the `"Clusters"` section and click the `"Create"` button. Choose your desired configurations and region, then click `"Create"` to start building your cluster. Once the cluster is created, an API key will be generated for you.

3. **Retrieve and Store Your API Key** : When your cluster is created, you will receive an API key. Ensure you save this key in a secure location, as you will need it later. If you lose it, you will have to generate a new one.

4. **Manage API Keys** : To create additional API keys or manage existing ones, go to the `"Access Management"` section in the Qdrant Cloud dashboard and select `"Qdrant Cloud API Keys"`. Here, you can create new keys or delete existing ones.

```
QDRANT_API_KEY="YOUR_QDRANT_API_KEY"
```

## Installation

There are several main options for initializing and using the Qdrant vector store:

- **Local Mode** : This mode doesn't require a separate server.
    - **In-memory storage** (data is not persisted)
    - **On-disk storage** (data is saved to your local machine)
- **Docker Deployments** : You can run Qdrant using Docker.
- **Qdrant Cloud** : Use Qdrant as a managed cloud service.

For detailed instructions, see the [installation instructions](https://qdrant.tech/documentation/guides/installation/).
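
The options above differ only in the arguments passed to `QdrantClient`. As a rough sketch, the mapping can be written as a small helper that returns keyword arguments; the function name `qdrant_client_kwargs` and the mode labels are hypothetical, while `location`, `path`, `url`, and `api_key` are the `QdrantClient` parameters used in the cells that follow.

```python
from typing import Dict, Optional


def qdrant_client_kwargs(
    mode: str,
    path: Optional[str] = None,
    url: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build keyword arguments for QdrantClient for a given deployment mode."""
    if mode == "memory":
        # Data lives only as long as the client object.
        return {"location": ":memory:"}
    if mode == "disk":
        if not path:
            raise ValueError("On-disk mode needs a local path.")
        return {"path": path}
    if mode == "docker":
        # Default address of a locally running Qdrant container.
        return {"url": url or "http://localhost:6333"}
    if mode == "cloud":
        if not (url and api_key):
            raise ValueError("Cloud mode needs both a URL and an API key.")
        return {"url": url, "api_key": api_key}
    raise ValueError(f"Unknown mode: {mode}")


print(qdrant_client_kwargs("memory"))
```

With `qdrant-client` installed, the result can be unpacked directly, for example `QdrantClient(**qdrant_client_kwargs("disk", path="./qdrant_memory"))`.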

### In-Memory

For simple tests or quick experiments, you might choose to store data directly in memory. This means the data is automatically removed when your client terminates, typically at the end of your script or notebook session.

In [7]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from langchain_openai import OpenAIEmbeddings

# Step 1: Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Step 2: Initialize Qdrant client
client = QdrantClient(":memory:")

# Step 3: Create a Qdrant collection
collection_name = "demo_collection"
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

# Step 4: Initialize QdrantVectorStore
vector_store = QdrantVectorStore(
    client=client,
    collection_name=collection_name,
    embedding=embeddings,
)
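
The `size` passed to `VectorParams` has to equal the dimension of the vectors the embedding model produces; `text-embedding-3-large` returns 3072-dimensional vectors by default, which is why `size=3072` is used here. The sketch below checks this with a small lookup table. The table and the helper `vector_size_matches` are illustrative assumptions, and the check does not cover models configured with a custom `dimensions` value.

```python
from typing import Dict

# Default output dimensions of some OpenAI embedding models.
DEFAULT_DIMENSIONS: Dict[str, int] = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def vector_size_matches(model: str, collection_size: int) -> bool:
    """Return True if the collection's vector size equals the model's default dimension."""
    try:
        expected = DEFAULT_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown model: {model}") from None
    return expected == collection_size


print(vector_size_matches("text-embedding-3-large", 3072))  # True
print(vector_size_matches("text-embedding-3-small", 3072))  # False
```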

### On-Disk Storage

With on-disk storage, you can store your vectors directly on your hard drive without requiring a Qdrant server. This ensures that your data persists even when you restart the program.

In [8]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from langchain_openai import OpenAIEmbeddings

# Step 1: Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Step 2: Initialize Qdrant client
qdrant_path = "./qdrant_memory"
client = QdrantClient(path=qdrant_path)

# Step 3: Create a Qdrant collection
collection_name = "demo_collection"
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

# Step 4: Initialize QdrantVectorStore
vector_store = QdrantVectorStore(
    client=client,
    collection_name=collection_name,
    embedding=embeddings,
)

### Docker Deployments

You can deploy `Qdrant` in a production environment using [Docker](https://qdrant.tech/documentation/guides/installation/#docker) and [Docker Compose](https://qdrant.tech/documentation/guides/installation/#docker-compose). Refer to the Docker and Docker Compose setup instructions in the development section for detailed information.

In [9]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from langchain_openai import OpenAIEmbeddings

# Step 1: Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Step 2: Initialize Qdrant client
url = "http://localhost:6333"
client = QdrantClient(url=url)

# Step 3: Create a Qdrant collection
collection_name = "demo_collection"
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

# Step 4: Initialize QdrantVectorStore
vector_store = QdrantVectorStore(
    client=client,
    collection_name=collection_name,
    embedding=embeddings,
)

### Qdrant Cloud

For a production environment, you can use [Qdrant Cloud](https://cloud.qdrant.io/). It offers fully managed `Qdrant` databases with features such as horizontal and vertical scaling, one-click setup and upgrades, monitoring, logging, backups, and disaster recovery. For more information, refer to the [Qdrant Cloud documentation](https://qdrant.tech/documentation/cloud/).

In [10]:
import getpass
import os

# Fetch the Qdrant server URL from environment variables or prompt for input
if not os.getenv("QDRANT_URL"):
    os.environ["QDRANT_URL"] = getpass.getpass("Enter your Qdrant Cloud URL: ")
url = os.environ.get("QDRANT_URL")

# Fetch the Qdrant API key from environment variables or prompt for input
if not os.getenv("QDRANT_API_KEY"):
    os.environ["QDRANT_API_KEY"] = getpass.getpass("Enter your Qdrant API key: ")
api_key = os.environ.get("QDRANT_API_KEY")

In [17]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from langchain_openai import OpenAIEmbeddings

# Step 1: Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

# Step 2: Initialize Qdrant client
client = QdrantClient(
    url=url,
    api_key=api_key,
)

# Step 3: Create a Qdrant collection
collection_name = "demo_collection"
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

# Step 4: Initialize QdrantVectorStore
vector_store = QdrantVectorStore(
    client=client,
    collection_name=collection_name,
    embedding=embeddings,
)

## Initialization

Once you've established your vector store, you'll likely need to manage the collections within it. Here are some common operations you can perform:

- Create a collection
- List collections
- Delete a collection
- Use an existing collection

### Create a Collection

To create a new collection in your Qdrant instance, you can use the `QdrantClient` class from the `qdrant-client` library.

In [56]:
from qdrant_client import QdrantClient
from qdrant_client.http.models import VectorParams, Distance

# Step 1: Define collection name
collection_name = "my_new_collection"

# Initialize the Qdrant client
client = QdrantClient(
    url=url,
    api_key=api_key,
)

# Create a new collection in Qdrant
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

# Print confirmation
print(f"Collection '{collection_name}' created successfully.")

Collection 'my_new_collection' created successfully.
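
Re-running the cell above typically fails once the collection already exists. A minimal sketch of an idempotent variant follows. It assumes a `qdrant-client` version that provides `collection_exists`; the helper `ensure_collection` and the `FakeClient` stand-in are hypothetical and exist only so the example runs without a server.

```python
from typing import Any


def ensure_collection(client: Any, collection_name: str, vectors_config: Any) -> bool:
    """Create the collection only if it is missing.

    Returns True if a collection was created, False if it already existed.
    """
    if client.collection_exists(collection_name):
        return False
    client.create_collection(
        collection_name=collection_name,
        vectors_config=vectors_config,
    )
    return True


class FakeClient:
    """In-memory stand-in for QdrantClient, used only to exercise the helper."""

    def __init__(self):
        self.collections = {}

    def collection_exists(self, collection_name):
        return collection_name in self.collections

    def create_collection(self, collection_name, vectors_config):
        self.collections[collection_name] = vectors_config


fake = FakeClient()
print(ensure_collection(fake, "my_new_collection", {"size": 3072}))  # True
print(ensure_collection(fake, "my_new_collection", {"size": 3072}))  # False
```

With a real client, `vectors_config` would be the same `VectorParams(size=3072, distance=Distance.COSINE)` object used above.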


### List Collections

To list all existing collections in your Qdrant instance, you can use the `QdrantClient` class from the `qdrant-client` library.

In [57]:
from qdrant_client import QdrantClient

# Initialize the Qdrant client
client = QdrantClient(
    url=url,
    api_key=api_key,
)

# Retrieve and print collection names
collections_response = client.get_collections()
for collection in collections_response.collections:
    print(f"Collection Name: {collection.name}")

Collection Name: my_new_collection
Collection Name: demo_collection


### Delete a Collection

To delete a collection in Qdrant using the Python client, you can use the `delete_collection` method of the `QdrantClient` object.

In [58]:
from qdrant_client import QdrantClient

# Define collection name
collection_name = "my_new_collection"

# Initialize the Qdrant client
client = QdrantClient(
    url=url,
    api_key=api_key,
)

# Delete the collection
if client.delete_collection(collection_name=collection_name):
    print(f"Collection '{collection_name}' has been deleted.")

Collection 'my_new_collection' has been deleted.


### Use an Existing Collection

This code snippet demonstrates how to initialize a `QdrantVectorStore` using the `from_existing_collection` method provided by the `langchain_qdrant` library.

In [21]:
from langchain_qdrant import QdrantVectorStore

collection_name = "demo_collection"

# Initialize QdrantVectorStore using from_existing_collection method
vector_store = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name=collection_name,
    url=url,
    api_key=api_key,
    prefer_grpc=False,
)

**Direct Initialization** 
- Offers more control by utilizing an existing `QdrantClient` instance, making it suitable for complex applications that require customized client configurations.

**from_existing_collection Method** 
- Provides a simplified and concise way to connect to an existing collection, ideal for quick setups or simpler applications.

## Manage VectorStore

After you've created your vector store, you can interact with it by adding or deleting items. Here are some common operations:

### Add Items to the Vector Store

With `Qdrant`, you can add items to your vector store using the `add_documents` function. If you add a document with an ID that already exists, the existing document will be updated with the new data. This process is called `upsert`.

In [22]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from uuid import uuid4

# Load the text file
loader = TextLoader("./data/the_little_prince.txt")
documents = loader.load()

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600,
    chunk_overlap=100,
    length_function=len
)
split_docs = text_splitter.split_documents(documents)

# Generate unique IDs for documents
uuids = [str(uuid4()) for _ in split_docs]

# Add documents to the vector store
vector_store.add_documents(documents=split_docs, ids=uuids)
print(f"Uploaded {len(split_docs)} documents to Qdrant collection '{collection_name}'")

Uploaded 222 documents to Qdrant collection 'demo_collection'
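
The upsert behaviour described above can be pictured with a plain dictionary keyed by ID. This is a toy model only, not how Qdrant stores points, and the `upsert` helper is hypothetical.

```python
from typing import Dict, List


def upsert(store: Dict[str, str], ids: List[str], texts: List[str]) -> None:
    """Insert each text under its ID, overwriting any existing entry."""
    if len(ids) != len(texts):
        raise ValueError("ids and texts must have the same length")
    for point_id, text in zip(ids, texts):
        store[point_id] = text


store: Dict[str, str] = {}
upsert(store, ["a", "b"], ["first draft", "second chunk"])
upsert(store, ["a"], ["revised draft"])  # same ID: replaces, does not duplicate
print(len(store), store["a"])  # 2 revised draft
```

Because the IDs in this notebook come from `uuid4()`, re-running the cell creates new points rather than updating the old ones; reusing stored IDs is what triggers an update.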


### Delete Items from the Vector Store

To remove items from your vector store, use the `delete` function. You can specify the items to delete using either IDs or filters.

In [23]:
# Retrieve the last point ID from the list of UUIDs
point_id = uuids[-1]

# Delete the vector point by its point_id
vector_store.delete(ids=[point_id])

# Print confirmation of deletion
print(f"Vector point with ID {point_id} has been deleted.")

Vector point with ID f2757b9a-f18e-4872-99ea-e2f180eff39c has been deleted.


### Update Items in the Vector Store

To update items in your vector store, use the client's `set_payload` method. It lets you modify the content or metadata stored in the payload of existing items.

In [24]:
def retrieve_point_payload(vector_store, point_id):
    """
    Retrieve the payload of a point from the Qdrant collection using its ID.

    Args:
        vector_store (QdrantVectorStore): The vector store instance connected to the Qdrant collection.
        point_id (str): The unique identifier of the point to retrieve.

    Returns:
        dict: The payload of the retrieved point.

    Raises:
        ValueError: If the point with the specified ID is not found in the collection.
    """
    # Retrieve the vector point using the client
    response = vector_store.client.retrieve(
        collection_name=vector_store.collection_name,
        ids=[point_id],
    )

    # Check if the response is empty
    if not response:
        raise ValueError(f"Point ID {point_id} not found in the collection.")

    # Extract the payload from the retrieved point
    point = response[0]
    payload = point.payload
    print(f"Payload for point ID {point_id}: \n{payload}\n")

    return payload

In [25]:
point_id = uuids[0]

# Retrieve the payload for the specified point ID
payload = retrieve_point_payload(vector_store, point_id)

Payload for point ID d37cd6ce-a2a8-4550-9920-6ae885937d1b: 
{'page_content': 'The Little Prince\nWritten By Antoine de Saiot-Exupery (1900〜1944)', 'metadata': {'source': './data/the_little_prince.txt'}}



In [26]:
def update_point_payload(vector_store, point_id, new_payload):
    """
    Update the payload of a specific point in a Qdrant collection.

    Args:
        vector_store (QdrantVectorStore): The vector store instance connected to the Qdrant collection.
        point_id (str): The unique identifier of the point to update.
        new_payload (dict): A dictionary containing the new payload data to set for the point.

    Returns:
        None

    Raises:
        Exception: If the update operation fails.
    """
    try:
        # Update the payload for the specified point
        vector_store.client.set_payload(
            collection_name=vector_store.collection_name,
            payload=new_payload,
            points=[point_id],
        )
        print(f"Successfully updated payload for point ID {point_id}.")
    except Exception as e:
        print(f"Failed to update payload for point ID {point_id}: {e}")
        raise

In [27]:
point_id = uuids[0]
new_payload = {"page_content": "The Little Prince (1943)"}

# Update the point's payload
update_point_payload(vector_store, point_id, new_payload)

Successfully updated payload for point ID d37cd6ce-a2a8-4550-9920-6ae885937d1b.
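
Note that `set_payload` sets only the keys you pass, so the `metadata` entry of the point above is kept, while the client's separate `overwrite_payload` method replaces the whole payload. The toy functions below model that difference on plain dictionaries; `merge_payload` and `replace_payload` are hypothetical names, not Qdrant APIs.

```python
import copy
from typing import Any, Dict


def merge_payload(existing: Dict[str, Any], new_values: Dict[str, Any]) -> Dict[str, Any]:
    """Model of set_payload: only the keys in new_values change."""
    merged = copy.deepcopy(existing)
    merged.update(new_values)
    return merged


def replace_payload(existing: Dict[str, Any], new_values: Dict[str, Any]) -> Dict[str, Any]:
    """Model of overwrite_payload: the old payload is discarded."""
    return copy.deepcopy(new_values)


payload = {
    "page_content": "The Little Prince",
    "metadata": {"source": "./data/the_little_prince.txt"},
}
update = {"page_content": "The Little Prince (1943)"}

print(merge_payload(payload, update))    # metadata is still present
print(replace_payload(payload, update))  # only page_content remains
```

In both cases only the payload changes. The stored vector is not recomputed, so similarity search still uses the embedding of the original text.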


### Update Items in the Vector Store (Parallel)

To update many items efficiently, run the `set_payload` calls in parallel, passing each point's unique ID together with its new payload.

In [28]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Dict, Tuple


def update_payloads_parallel(
    vector_store, updates: List[Tuple[str, Dict]], num_workers: int
):
    """
    Update the payloads of multiple points in a Qdrant collection in parallel.

    Args:
        vector_store (QdrantVectorStore): The vector store instance connected to the Qdrant collection.
        updates (List[Tuple[str, Dict]]): A list of tuples containing point IDs and their corresponding new payloads.
        num_workers (int): Number of worker threads to use for parallel execution.

    Returns:
        None
    """
    # Create a ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Submit update tasks to the executor
        future_to_point_id = {
            executor.submit(
                update_point_payload, vector_store, point_id, new_payload
            ): point_id
            for point_id, new_payload in updates
        }

        # Process completed futures
        for future in as_completed(future_to_point_id):
            point_id = future_to_point_id[future]
            try:
                future.result()
            except Exception as e:
                print(f"Error updating point ID {point_id}: {e}")

In [30]:
payload = retrieve_point_payload(vector_store, uuids[2])

Payload for point ID 826c9c2c-b1eb-4b11-976b-d915dff2fbff: 
{'page_content': 'Born in 1900 in Lyons, France, young Antoine was filled with a passion for adventure. When he failed an entrance exam for the Naval Academy, his interest in aviation took hold. He joined the French Army Air Force in 1921 where he first learned to fly a plane. Five years later, he would leave the military in order to begin flying air mail between remote settlements in the Sahara desert.', 'metadata': {'source': './data/the_little_prince.txt'}}



In [31]:
# Update example
updates = [
    (uuids[1], {"page_content": "Antoine de Saint-Exupéry's passion for aviation not only fueled remarkable stories but also reflected the enduring allure of flight, inspiring technological advancements and daring feats that captivated the world over the past century."}),
    (uuids[2], {"page_content": "Antoine de Saint-Exupéry, born in 1900 in Lyons, France, had an adventurous spirit from a young age. After failing the Naval Academy entrance exam, his fascination with aviation began to take flight. In 1921, he joined the French Army Air Force and learned to pilot an aircraft. By 1926, he left the military to embark on a career as an airmail pilot, delivering letters to isolated communities in the vast Sahara desert"}),
    # Add more (point_id, new_payload) tuples as needed
]

# Update payloads in parallel
num_workers = 4
update_payloads_parallel(vector_store, updates, num_workers)

Successfully updated payload for point ID 826c9c2c-b1eb-4b11-976b-d915dff2fbff.
Successfully updated payload for point ID 6b452e07-ff14-469d-b511-690696d709cd.


## Query VectorStore

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

### Query directly

The most straightforward use case for the `Qdrant` vector store is performing similarity searches. Internally, your query is converted into a vector embedding, which is then used to identify similar documents within the `Qdrant` collection.

In [34]:
query = "What is the significance of the rose in The Little Prince?"

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '43256805-f69e-4f1c-b7a1-a77848f213a7', '_collection_name': 'demo_collection'}]


* [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': '33218839-ee8e-4b05-b0e3-640d14a179f6', '_collection_name': 'demo_collection'}]


* As his lips opened slightly with the suspicious of a half-smile, I said to myself, again: "What moves me so deeply, about this little prince who is sleeping here, is his loyalty to a flower-- the imag
 [{'source': './data/the_little_prince.txt', '_id': '108f3ac8-9931-42d0-a15a-783172f6a424', '_collection_name': 'demo_collection'}]
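
Conceptually, the search above scores every stored vector against the query vector and returns the best `k`. The sketch below does this by brute force with cosine similarity, the metric the collection was created with (`Distance.COSINE`). The three-dimensional vectors are made up for illustration, and a real Qdrant collection uses an index such as HNSW instead of comparing against every point.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (id, score) pairs with the highest cosine similarity."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings" keyed by an illustrative ID.
vectors = {
    "rose": [0.9, 0.1, 0.0],
    "fox": [0.1, 0.9, 0.0],
    "pilot": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], vectors, k=2))
```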




### Similarity search with score

You can also search with score:

In [35]:
query = "What is the significance of the rose in The Little Prince?"

results = vector_store.similarity_search_with_score(
    query=query,
    k=3,
)
for doc, score in results:
    print(f"* [SIM={score:.6f}] {doc.page_content[:200]}\n [{doc.metadata}]\n\n")

* [SIM=0.584964] "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '43256805-f69e-4f1c-b7a1-a77848f213a7', '_collection_name': 'demo_collection'}]


* [SIM=0.542256] [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': '33218839-ee8e-4b05-b0e3-640d14a179f6', '_collection_name': 'demo_collection'}]


* [SIM=0.539363] As his lips opened slightly with the suspicious of a half-smile, I said to myself, again: "What moves me so deeply, about this little prince who is sleeping here, is his loyalty to a flower-- the imag
 [{'source': './data/the_little_prince.txt', '_id': '108f3ac8-9931-42d0-a15a-783172f6a424', '_collection_name': 'demo_collection'}]




### Query by Turning into a Retriever

You can also transform the vector store into a `retriever` for easier usage in your workflows or chains.

In [36]:
query = "What is the significance of the rose in The Little Prince?"

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.5},
)

results = retriever.invoke(query)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '43256805-f69e-4f1c-b7a1-a77848f213a7', '_collection_name': 'demo_collection'}]


* [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': '33218839-ee8e-4b05-b0e3-640d14a179f6', '_collection_name': 'demo_collection'}]


* As his lips opened slightly with the suspicious of a half-smile, I said to myself, again: "What moves me so deeply, about this little prince who is sleeping here, is his loyalty to a flower-- the imag
 [{'source': './data/the_little_prince.txt', '_id': '108f3ac8-9931-42d0-a15a-783172f6a424', '_collection_name': 'demo_collection'}]
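
The `similarity_score_threshold` search type keeps only results whose score reaches `score_threshold`, then returns at most `k` of them. A minimal sketch of that selection step follows, using the three scores printed earlier plus one invented low score; the helper `filter_by_score` is hypothetical, and treating the threshold as inclusive is an assumption.

```python
from typing import List, Tuple


def filter_by_score(
    scored_docs: List[Tuple[str, float]], k: int, score_threshold: float
) -> List[Tuple[str, float]]:
    """Keep documents scoring at or above the threshold, best first, at most k."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


scored = [
    ("chunk A", 0.584964),
    ("chunk B", 0.542256),
    ("chunk C", 0.539363),
    ("chunk D", 0.41),  # invented score, below the threshold
]
print(filter_by_score(scored, k=3, score_threshold=0.5))
```

Raising the threshold above 0.55 in this example would leave only `chunk A`, which shows how the threshold can return fewer than `k` results.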




### Search with Filtering

This code demonstrates how to search for and retrieve records from a Qdrant vector database based on specific metadata field values.

In [37]:
from qdrant_client.http.models import Filter, FieldCondition, MatchValue, MatchText

def filter_and_retrieve_records(vector_store, filter_condition):
    """
    Retrieve records from a Qdrant vector store based on a given filter condition.

    Args:
        vector_store (QdrantVectorStore): The vector store instance connected to the Qdrant collection.
        filter_condition (Filter): The filter condition to apply for retrieving records.

    Returns:
        list: A list of records matching the filter condition.
    """
    all_records = []
    next_page_offset = None

    while True:
        response, next_page_offset = vector_store.client.scroll(
            collection_name=vector_store.collection_name,
            scroll_filter=filter_condition,
            limit=10,
            offset=next_page_offset,
            with_payload=True,
        )
        all_records.extend(response)
        if next_page_offset is None:
            break

    return all_records
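
The loop above keeps requesting pages until `scroll` returns `None` as the next offset. The same control flow can be tried without a server using a list-backed stand-in. In this sketch `scroll_page` is a hypothetical substitute for `client.scroll`, and it uses integer positions as offsets, whereas Qdrant returns a point ID.

```python
from typing import List, Optional, Tuple


def scroll_page(
    records: List[int], limit: int, offset: Optional[int] = None
) -> Tuple[List[int], Optional[int]]:
    """Return one page plus the offset of the next page (None when done)."""
    start = 0 if offset is None else offset
    page = records[start : start + limit]
    next_offset = start + limit if start + limit < len(records) else None
    return page, next_offset


def scroll_all(records: List[int], limit: int) -> List[int]:
    """Collect every record by following next-page offsets until None."""
    collected: List[int] = []
    offset = None
    while True:
        page, offset = scroll_page(records, limit, offset)
        collected.extend(page)
        if offset is None:
            break
    return collected


print(len(scroll_all(list(range(25)), limit=10)))  # 25
```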

In [38]:
filter_condition = Filter(
    must=[
        FieldCondition(
            key="page_content",  # Ensure this key matches your payload structure
            match=MatchText(text="Academy")  # Use MatchValue for exact matches
            # key="metadata.source",
            # match=MatchValue(value="./data/the_little_prince.txt") 
        )
    ]
)

# Retrieve records based on the filter condition
records = filter_and_retrieve_records(vector_store, filter_condition)

# Print the retrieved records
for record in records:
    print(f"ID: {record.id}\nPayload: {record.payload}\n")

ID: 826c9c2c-b1eb-4b11-976b-d915dff2fbff
Payload: {'page_content': 'Antoine de Saint-Exupéry, born in 1900 in Lyons, France, had an adventurous spirit from a young age. After failing the Naval Academy entrance exam, his fascination with aviation began to take flight. In 1921, he joined the French Army Air Force and learned to pilot an aircraft. By 1926, he left the military to embark on a career as an airmail pilot, delivering letters to isolated communities in the vast Sahara desert', 'metadata': {'source': './data/the_little_prince.txt'}}
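
To make the filter semantics concrete, the sketch below evaluates `must` conditions against payload dictionaries in plain Python. It is a toy model under stated assumptions: `match_text` is treated as a substring test (with a full-text index, Qdrant matches on tokens instead), `match_value` as exact equality, and dotted keys such as `metadata.source` as nested lookups. All function names and the shortened payloads are hypothetical.

```python
from typing import Any, Callable, Dict, List


def get_nested(payload: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'metadata.source' into nested dictionaries."""
    value: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def match_text(payload: Dict[str, Any], key: str, text: str) -> bool:
    """Substring match on a string field."""
    value = get_nested(payload, key)
    return isinstance(value, str) and text in value


def match_value(payload: Dict[str, Any], key: str, value: Any) -> bool:
    """Exact match on a field."""
    return get_nested(payload, key) == value


def must(payload: Dict[str, Any], conditions: List[Callable[[Dict[str, Any]], bool]]) -> bool:
    """A payload passes only if every condition holds."""
    return all(condition(payload) for condition in conditions)


# Shortened, illustrative payloads in the shape used by this notebook.
payloads = [
    {"page_content": "After failing the Naval Academy entrance exam",
     "metadata": {"source": "./data/the_little_prince.txt"}},
    {"page_content": "[ Chapter 8 ]",
     "metadata": {"source": "./data/the_little_prince.txt"}},
]
conditions = [lambda p: match_text(p, "page_content", "Academy")]
hits = [p for p in payloads if must(p, conditions)]
print(len(hits))  # 1
```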



### Delete with Filtering

This code demonstrates how to delete records from a Qdrant vector database based on specific metadata field values.

In [39]:
from qdrant_client.http.models import Filter, FieldCondition, MatchValue, MatchText

# Define the filter condition
filter_condition = Filter(
    must=[
        FieldCondition(
            key="page_content",  # Ensure this key matches your payload structure
            match=MatchText(text="Academy")  # Use MatchValue for exact matches
        )
    ]
)

# Perform the delete operation
client.delete(
    collection_name=vector_store.collection_name,
    points_selector=filter_condition,
    wait=True,
)

print("Delete operation completed.")

Delete operation completed.


### Filtering and Updating Records

This code demonstrates how to retrieve records from a Qdrant collection that match a filter on a payload field and then update the payload of each matching record.

In [41]:
# Define the filter condition
filter_condition = Filter(
    must=[
        FieldCondition(
            key="page_content",  # Ensure this key matches your payload structure
            match=MatchText(text="Chapter")  # Use MatchValue for exact matches
        )
    ]
)
# Retrieve matching records using the existing function
matching_points = filter_and_retrieve_records(vector_store, filter_condition)

# Prepare updates for matching points
for point in matching_points:
    updated_payload = point.payload.copy()
    
    # Update the page_content field by replacing "Chapter" with "Chapter -"
    updated_payload["page_content"] = updated_payload["page_content"].replace("Chapter", "Chapter -")

    # Update the payload using the existing function
    update_point_payload(vector_store, point.id, updated_payload)

print("Update operation completed.")

Successfully updated payload for point ID 0886275c-7e45-4995-a21f-5be3d3167982.
Successfully updated payload for point ID 0eb82de9-c76d-4a28-8c8f-6b56111fc4d1.
Successfully updated payload for point ID 25499f1d-ea05-41dd-a446-a43a715de9c8.
Successfully updated payload for point ID 33218839-ee8e-4b05-b0e3-640d14a179f6.
Successfully updated payload for point ID 4190a87e-ec1c-4d3b-a6d4-751d6063c1ca.
Successfully updated payload for point ID 591705b9-9d39-4ea1-b008-3d5b737c0893.
Successfully updated payload for point ID 5c091d91-a5d9-4c74-8ec2-68bc3b1b0cd9.
Successfully updated payload for point ID 664251ca-40a4-4943-be07-bb5fafda5512.
Successfully updated payload for point ID 6faeb5c4-687e-4e86-8d93-cef2d09432fe.
Successfully updated payload for point ID 727bfecb-3b52-4d95-8bcd-c2d56f2c4a83.
Successfully updated payload for point ID 76904308-4fcf-43c2-bd3c-8b5cfdfc92c2.
Successfully updated payload for point ID 77b070d5-f44b-4ea4-b53d-a4a39fac3920.
Successfully updated payload for point I

### Similarity Search Options

When using `QdrantVectorStore`, you have three options for performing similarity searches. You select the search mode with the `retrieval_mode` parameter when you set up the class. The available modes are:

- Dense Vector Search (Default)
- Sparse Vector Search
- Hybrid Search
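
Hybrid search has to merge two ranked lists, one from the dense search and one from the sparse search. One common way to do this is reciprocal rank fusion (RRF), sketched below. Whether `QdrantVectorStore` uses exactly this method and constant is an assumption; the function name, the constant `k=60`, and the document IDs are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense = ["d1", "d2", "d3"]   # illustrative dense ranking
sparse = ["d3", "d1", "d4"]  # illustrative sparse ranking
print(reciprocal_rank_fusion([dense, sparse]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear in both lists (`d1`, `d3`) rise to the top, which is the effect hybrid search aims for.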

### Dense Vector Search

To perform a search using only dense vectors:

- Set the `retrieval_mode` parameter to `RetrievalMode.DENSE`. This is also the default setting.
- Provide a [dense embeddings](https://python.langchain.com/docs/integrations/text_embedding/) value through the `embedding` parameter.

In [51]:
from langchain_qdrant import RetrievalMode

query = "What is the significance of the rose in The Little Prince?"

# Initialize QdrantVectorStore
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embeddings,
    url=url,
    api_key=api_key,
    collection_name="dense_collection",
    retrieval_mode=RetrievalMode.DENSE,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '4d929432-384c-4ebc-a1fc-b6f1a9e55f65', '_collection_name': 'dense_collection'}]


* [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': 'ca9dc685-9534-40a3-b4f6-87246c294441', '_collection_name': 'dense_collection'}]


* As his lips opened slightly with the suspicious of a half-smile, I said to myself, again: "What moves me so deeply, about this little prince who is sleeping here, is his loyalty to a flower-- the imag
 [{'source': './data/the_little_prince.txt', '_id': '4fdeac62-8bf6-492f-a6fe-2ae9d287edbf', '_collection_name': 'dense_collection'}]




### Sparse Vector Search

To search with only sparse vectors:

- Set the `retrieval_mode` parameter to `RetrievalMode.SPARSE`.
- Pass an implementation of the [SparseEmbeddings](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface, backed by any sparse embeddings provider, to the `sparse_embedding` parameter.

The `langchain-qdrant` package provides a FastEmbed-based implementation out of the box. To use it, install the [FastEmbed](https://github.com/qdrant/fastembed) package.

pip install fastembed

In [50]:
from langchain_qdrant import FastEmbedSparse, RetrievalMode

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

query = "What is the significance of the rose in The Little Prince?"

# Initialize QdrantVectorStore
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embeddings,
    sparse_embedding=sparse_embeddings,
    url=url,
    api_key=api_key,
    collection_name="sparse_collection",
    retrieval_mode=RetrievalMode.SPARSE,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* [ Chapter 20 ]
- the little prince discovers a garden of roses
But it happened that after walking for a long time through sand, and rocks, and snow, the little prince at last came upon a road. And all
 [{'source': './data/the_little_prince.txt', '_id': '5b985962-0382-41ab-92c3-26dece2f6bee', '_collection_name': 'sparse_collection'}]


* And he went back to meet the fox. 
"Goodbye," he said. 
"Goodbye," said the fox. "And now here is my secret, a very simple secret: It is only with the heart that one can see rightly; what is essential
 [{'source': './data/the_little_prince.txt', '_id': '9f494e66-602a-48f6-8b46-f2f0685592e9', '_collection_name': 'sparse_collection'}]


* "The men where you live," said the little prince, "raise five thousand roses in the same garden-- and they do not find in it what they are looking for." 
"They do not find it," I replied. 
"And yet wh
 [{'source': './data/the_little_prince.txt', '_id': 'e7bab202-40c9-4d46-9b2a-f15f76344281', '_collection_name': 'sparse
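
Sparse retrieval scores documents by the terms they share with the query. The sketch below is a simplified, standard-library BM25 scorer meant only to illustrate that idea. It is not the FastEmbed implementation: the tokenizer, the parameters `k1` and `b`, and the toy documents are assumptions, and a production model such as `Qdrant/bm25` may tokenize, stem, and weight terms differently.

```python
import math
from collections import Counter

PUNCTUATION = ".,?!\"'"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation (no stemming)."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in text.split())
    return [token for token in tokens if token]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue  # only shared terms contribute, hence "sparse"
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus
toy_docs = [
    "The rose is unique in all the world",
    "The fox shared a secret",
    "Five thousand roses in the same garden",
]
sparse_scores = bm25_scores("rose", toy_docs)
print(sparse_scores)
```

Because this sketch does no stemming, the query term `rose` matches only the first document and not `roses` in the third. That lexical strictness is the main behavioral difference from dense search.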

### Hybrid Vector Search

To perform a hybrid search using dense and sparse vectors with score fusion:

- Set the `retrieval_mode` parameter to `RetrievalMode.HYBRID`.
- Pass a [`dense embeddings`](https://python.langchain.com/docs/integrations/text_embedding/) model to the `embedding` parameter.
- Pass an implementation of the [`SparseEmbeddings`](https://github.com/langchain-ai/langchain/blob/master/libs/partners/qdrant/langchain_qdrant/sparse_embeddings.py) interface, backed by any sparse embeddings provider, to the `sparse_embedding` parameter.

Note that if you added documents in `HYBRID` mode, you can switch to any retrieval mode when searching, since both the dense and sparse vectors are available in the collection.

In [49]:
from langchain_qdrant import FastEmbedSparse, RetrievalMode
from langchain_openai import OpenAIEmbeddings

query = "What is the significance of the rose in The Little Prince?"

embedding = OpenAIEmbeddings(model="text-embedding-3-large")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Initialize QdrantVectorStore
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedding,
    sparse_embedding=sparse_embeddings,
    url=url,
    api_key=api_key,
    collection_name="hybrid_collection",
    retrieval_mode=RetrievalMode.HYBRID,
)

# Perform similarity search in the vector store
results = vector_store.similarity_search(
    query=query,
    k=3,
)

for res in results:
    print(f"* {res.page_content[:200]}\n [{res.metadata}]\n\n")

* "Go and look again at the roses. You will understand now that yours is unique in all the world. Then come back to say goodbye to me, and I will make you a present of a secret." 
The little prince went
 [{'source': './data/the_little_prince.txt', '_id': '32ad860b-6540-4e4a-a069-4d3c379fdeef', '_collection_name': 'hybrid_collection'}]


* [ Chapter 20 ]
- the little prince discovers a garden of roses
But it happened that after walking for a long time through sand, and rocks, and snow, the little prince at last came upon a road. And all
 [{'source': './data/the_little_prince.txt', '_id': 'fceaaa24-8e21-4e22-9733-2587cf648ac8', '_collection_name': 'hybrid_collection'}]


* [ Chapter 8 ]
- the rose arrives at the little prince‘s planet
 [{'source': './data/the_little_prince.txt', '_id': '7de49cb2-7e0e-41c9-ad2a-6cccbb6daa5d', '_collection_name': 'hybrid_collection'}]
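
The score fusion mentioned at the start of this section combines the dense and sparse result lists into one ranking. This section does not say which fusion method is applied, so the sketch below assumes reciprocal rank fusion (RRF), a common choice, with the conventional constant `k=60`. The function name and document ids are hypothetical and chosen only for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list of (id, score) pairs.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1 for the best hit. Scores are summed across lists.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical ranked ids from a dense and a sparse retriever
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_d", "doc_e", "doc_a"]

fused_hits = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
for doc_id, score in fused_hits:
    print(f"{doc_id}: {score:.5f}")
```

RRF uses only rank positions, so it needs no calibration between cosine similarities and BM25-style scores, which live on different scales. A document that appears in both lists, like `doc_a` here, accumulates two contributions and tends to rise to the top.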


