# Elasticsearch

- Author: [liniar](https://github.com/namyoungkim)
- Design: 
- Peer Review: 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/09-VectorStore/06-Elasticsearch.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/09-VectorStore/06-Elasticsearch.ipynb)


## Overview  
- This tutorial is designed for beginners to get started with Elasticsearch and its integration with LangChain.
- You’ll learn how to set up the environment, prepare data, and explore advanced search features like hybrid and semantic search.
- By the end, you’ll be equipped to use Elasticsearch for powerful and intuitive search applications.

### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Elasticsearch Setup](#elasticsearch-setup)
- [Introduction to Elasticsearch](#introduction-to-elasticsearch)
- [ElasticsearchManager](#elasticsearchmanager)
- [Data Preparation for Tutorial](#data-preparation-for-tutorial)
- [Initialization](#initialization)
- [DB Handling](#db-handling)
- [Advanced Search](#advanced-search)
- [Managing Elasticsearch Connections and Documents](#managing-elasticsearch-connections-and-documents)

### References
- [LangChain VectorStore Documentation](https://python.langchain.com/docs/how_to/vectorstores/)
- [LangChain Elasticsearch Integration](https://python.langchain.com/docs/integrations/vectorstores/elasticsearch/)
- [Elasticsearch Official Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/index.html)  
- [Elasticsearch Vector Search Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html)
----

## Environment Setup  

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.  

**[Note]**  
- `langchain-opentutorial` is a package that provides a set of **easy-to-use environment setup,** **useful functions,** and **utilities for tutorials.**  
- See the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) repository for more details.  


### 🛠️ **The following configurations will be set up**  

- **Jupyter Notebook Output Settings**
    - Display standard error ( `stderr` ) messages directly instead of capturing them.  
- **Install Required Packages** 
    - Ensure all necessary dependencies are installed.  
- **API Key Setup** 
    - Configure the API key for authentication.  
- **PyTorch Device Selection Setup** 
    - Automatically select the optimal computing device (CPU, CUDA, or MPS).
        - `{"device": "mps"}` : Perform embedding calculations using **MPS** (Apple's Metal GPU backend) instead of CUDA. (For Mac users)
        - `{"device": "cuda"}` : Perform embedding calculations using an NVIDIA **GPU.** (For Linux and Windows users, requires CUDA installation)
        - `{"device": "cpu"}` : Perform embedding calculations using **CPU.** (Available for all users)
- **Embedding Model Local Storage Path** 
    - Define a local path for storing embedding models.  

## Elasticsearch Setup
- To use Elasticsearch vector search, you must install the `langchain-elasticsearch` package.

### 🚀 Setting Up Elasticsearch with Elastic Cloud (Colab Compatible)
- Elastic Cloud allows you to manage Elasticsearch seamlessly in the cloud, eliminating the need for local installations.
- It integrates well with Google Colab, enabling efficient experimentation and prototyping.


### 📚 What is Elastic Cloud?  
- **Elastic Cloud** is a managed Elasticsearch service provided by Elastic.  
- Supports **custom cluster configurations** and **auto-scaling.** 
- Deployable on **AWS**, **GCP**, and **Azure.**  
- Compatible with **Google Colab,** allowing simplified cloud-based workflows.  

### 📌 Getting Started with Elastic Cloud  
1. **Sign up for Elastic Cloud’s Free Trial.**  
    - [Free Trial](https://cloud.elastic.co/registration?utm_source=langchain&utm_content=documentation)
2. **Create an Elasticsearch Cluster.**  
3. **Retrieve your Elasticsearch URL** and **Elasticsearch API Key** from the Elastic Cloud Console.  
4. Add the following to your `.env` file:
    > ```
    > ES_URL=https://my-elasticsearch-project-abd...:123
    > ES_API_KEY=bk9X...
    > ```
---

In [1]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [2]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain-core",
        "langchain_huggingface",
        "langchain_elasticsearch",
        "langchain_text_splitters",
        "elasticsearch",
        "python-dotenv",
        "torch",
    ],
    verbose=False,
    upgrade=False,
)


In [3]:
# Set environment variables
from dotenv import load_dotenv
from langchain_opentutorial import set_env

# Attempt to load environment variables from a .env file; if unsuccessful, set them manually.
if not load_dotenv():
    set_env(
        {
            "OPENAI_API_KEY": "",
            "LANGCHAIN_API_KEY": "",
            "LANGCHAIN_TRACING_V2": "true",
            "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
            "LANGCHAIN_PROJECT": "Elasticsearch",
            "HUGGINGFACEHUB_API_TOKEN": "",
            "ES_URL": "",
            "ES_API_KEY": "",
        }
    )

In [4]:
# Automatically select the appropriate device
import torch
import platform


def get_device():
    if platform.system() == "Darwin":  # macOS specific
        if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
            print("✅ Using MPS (Metal Performance Shaders) on macOS")
            return "mps"
    if torch.cuda.is_available():
        print("✅ Using CUDA (NVIDIA GPU)")
        return "cuda"
    else:
        print("✅ Using CPU")
        return "cpu"


# Set the device
device = get_device()
print("🖥️ Current device in use:", device)

✅ Using MPS (Metal Performance Shaders) on macOS
🖥️ Current device in use: mps


In [5]:
# Embedding Model Local Storage Path
import os
import warnings

# Ignore warnings
warnings.filterwarnings("ignore")

# Set the download path to ./cache/
os.environ["HF_HOME"] = "./cache/"

## Introduction to Elasticsearch
- Elasticsearch is an open-source, distributed search and analytics engine designed to store, search, and analyze both structured and unstructured data in real-time.

### 📌 Key Features  
- **Real-Time Search:** Instantly searchable data upon ingestion  
- **Large-Scale Data Processing:** Efficient handling of vast datasets  
- **Scalability:** Flexible scaling through clustering and distributed architecture  
- **Versatile Search Support:** Keyword search, semantic search, and multimodal search  

### 📌 Use Cases  
- **Log Analytics:** Real-time monitoring of system and application logs  
- **Monitoring:** Server and network health tracking  
- **Product Recommendations:** Behavior-based recommendation systems  
- **Natural Language Processing (NLP):** Semantic text searches  
- **Multimodal Search:** Text-to-image and image-to-image searches  

### 🧠 Vector Database Functionality in Elasticsearch  
- Elasticsearch supports vector data storage and similarity search via **Dense Vector Fields.** As a vector database, it excels in applications like NLP, image search, and recommendation systems.

### 📌 Core Vector Database Features  
- **Dense Vector Field:** Store and query high-dimensional vectors  
- **KNN (k-Nearest Neighbors) Search:** Find vectors most similar to the input  
- **Semantic Search:** Perform meaning-based searches beyond keyword matching  
- **Multimodal Search:** Combine text and image data for advanced search capabilities  
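
To make the kNN idea concrete, here is a minimal brute-force sketch in plain Python. It is not how Elasticsearch executes the search (indexed `dense_vector` fields use an approximate HNSW index rather than a full scan); the function names `cosine_similarity` and `knn_search` and the toy vectors are illustrative assumptions.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_search(
    query: List[float], vectors: List[List[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" (made up for illustration)
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = knn_search([1.0, 0.1], toy_vectors, k=2)
print(top)
```

Each result is an `(index, score)` pair, ordered from most to least similar.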

### 📌 Vector Search Use Cases  
- **Semantic Search:** Understand user intent and deliver precise results  
- **Text-to-Image Search:** Retrieve relevant images from textual descriptions  
- **Image-to-Image Search:** Find visually similar images in a dataset  

### 🔗 Official Documentation Links  
- [Elasticsearch Official Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/index.html)  
- [Elasticsearch Vector Search Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html)  

Elasticsearch goes beyond traditional text search engines, offering robust vector database capabilities essential for NLP and multimodal search applications. 🚀

---

## ElasticsearchManager
- `Purpose:` Simplifies interactions with Elasticsearch, allowing easy management of indices and documents through user-friendly methods.
- `Core Features` 
	- `Index management:` create, delete, and manage indices.
	- `Document operations:` upsert, retrieve, search, and delete documents.
	- `Bulk and parallel operations:` perform upserts in bulk or in parallel for high performance.

### Methods and Parameters

1. `__init__` 
	- Role: Initializes the ElasticsearchManager instance and connects to the Elasticsearch cluster.
	- Parameters
		- `es_url` (str): The URL of the Elasticsearch host (default: "http://localhost:9200").
		- `api_key` (Optional[str]): The API key for authentication (default: None).
	- Behavior
		- Establishes a connection to Elasticsearch.
		- Tests the connection using ping() and raises a ConnectionError if it fails.
	- Usage Example
		>```python
		>es_manager = ElasticsearchManager(es_url="http://localhost:9200")
		>```

2. `create_index` 
	- Role: Creates an Elasticsearch index with optional mappings and settings.
	- Parameters
		- `index_name` (str): The name of the index to create.
		- `mapping` (Optional[Dict]): A dictionary defining the index structure (field types, properties, etc.).
		- `settings` (Optional[Dict]): A dictionary defining index settings (e.g., number of shards, replicas).
	- Behavior
		- Checks if the index exists.
		- If the index does not exist, creates it using the provided mappings and settings.
	- Returns: A string message indicating success or failure.
	- Usage Example
		>```python
		>mapping = {"properties": {"name": {"type": "text"}}}
		>settings = {"number_of_shards": 1}
		>es_manager.create_index("my_index", mapping=mapping, settings=settings)
		>```

3. `delete_index` 
	- Role: Deletes an Elasticsearch index if it exists.
	- Parameters
		- `index_name` (str): The name of the index to delete.
	- Behavior
		- Checks if the index exists.
		- Deletes the index if it exists.
	- Returns: A string message indicating success or failure.
	- Usage Example
		>```python
		>es_manager.delete_index("my_index")
		>```

4. `get_document` 
	- Role: Retrieves a single document by its ID.
	- Parameters
		- `index_name` (str): The name of the index to retrieve the document from.
		- `document_id` (str): The ID of the document to retrieve.
	- Behavior
		- Fetches the document using its ID.
		- Returns the _source field of the document (its contents).
	- Returns: The document contents (Dict) if found, otherwise None.
	- Usage Example
		>```python
		>document = es_manager.get_document("my_index", "1")
		>```

5. `search_documents` 
	- Role: Searches for documents in an index based on a query.
	- Parameters
		- `index_name` (str): The name of the index to search.
		- `query` (Dict): A query in Elasticsearch DSL format.
	- Behavior
		- Executes the query against the specified index.
		- Returns the _source field of all matching documents.
	- Returns: A list of matching documents (List[Dict]).
	- Usage Example
		>```python
		>query = {"match": {"name": "John"}}
		>results = es_manager.search_documents("my_index", query=query)
		>```
		
6. `upsert_document` 
	- Role: Inserts or updates a document by its ID.
	- Parameters
		- `index_name` (str): The index to perform the upsert on.
		- `document_id` (str): The ID of the document to upsert.
		- `document` (Dict): The content of the document.
	- Behavior
		- Updates the document if it exists or creates it if it does not.
	- Returns: The Elasticsearch response (Dict).
	- Usage Example
		>```python
		>document = {"name": "Alice", "age": 30}
		>es_manager.upsert_document("my_index", "1", document)
		>```

7. `bulk_upsert` 
	- Role: Performs a bulk upsert operation for multiple documents.
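	- Parameters
		- `index_name` (str): Default index, applied to any document that has no `_index` field.
		- `documents` (List[Dict]): A list of bulk actions.
			- For a true upsert, each action should specify `_id`, `_op_type` (`"update"`), `doc`, and `doc_as_upsert`.
			- Actions without `_op_type` fall back to the helper's default `index` operation. Without `_id`, Elasticsearch generates a new ID, so repeated runs add duplicate documents.
		- `timeout` (Optional[str]): Timeout duration such as '60s' or '2m' (default: None).
	- Behavior
		- Uses Elasticsearch’s bulk API, through `helpers.bulk`, to send multiple documents in batched requests.
	- Usage Example
		>```python
		>docs = [
		>	{"_index": "my_index", "_id": "1", "_op_type": "update", "doc": {"name": "Alice"}, "doc_as_upsert": True},
		>	{"_index": "my_index", "_id": "2", "_op_type": "update", "doc": {"name": "Bob"}, "doc_as_upsert": True}
		>]
		>es_manager.bulk_upsert("my_index", docs)
		>```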

8. `parallel_bulk_upsert` 
	- Role: Performs a parallelized bulk upsert operation for large datasets.
	- Parameters
		- `index_name` (str): Default index, applied to any document that has no `_index` field.
		- `documents` (List[Dict]): A list of documents for bulk upserts.
		- `batch_size` (int): Number of documents per batch (default: 100).
		- `max_workers` (int): Number of threads to use for parallel processing (default: 4).
		- `timeout` (Optional[str]): Timeout duration such as '60s' or '2m' (default: None).
	- Behavior
		- Splits the documents into batches and processes them in parallel using threads.
	- Usage Example
		>```python
		>es_manager.parallel_bulk_upsert("my_index", docs, batch_size=50, max_workers=4)
		>```

9. `delete_document` 
	- Role: Deletes a single document by its ID.
	- Parameters
		- `index_name` (str): The index containing the document.
		- `document_id` (str): The ID of the document to delete.
	- Behavior
		- Deletes the specified document using its ID.
	- Returns: The Elasticsearch response (Dict).
	- Usage Example
		>```python
		>es_manager.delete_document("my_index", "1")
		>```

10. `delete_by_query` 
	- Role: Deletes all documents that match a query.
	- Parameters
		- `index_name` (str): The index to delete documents from.
		- `query` (Dict): The query defining the documents to delete.
	- Behavior
		- Uses Elasticsearch’s delete_by_query API to remove documents matching the query.
	- Returns: The Elasticsearch response (Dict).
	- Usage Example
		>```python
		>delete_query = {"match": {"status": "inactive"}}
		>es_manager.delete_by_query("my_index", query=delete_query)
		>```
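
The action format shown in the `bulk_upsert` usage example can be built from plain documents with a small helper. This is only a sketch: `build_upsert_actions` and its `id_field` parameter are hypothetical names and not part of `ElasticsearchManager`; the action keys (`_op_type`, `_index`, `_id`, `doc`, `doc_as_upsert`) are the ones used in that usage example.

```python
from typing import Dict, List


def build_upsert_actions(
    documents: List[Dict], index_name: str, id_field: str = "doc_id"
) -> List[Dict]:
    """Wrap each document in an 'update' action with doc_as_upsert enabled."""
    actions = []
    for document in documents:
        actions.append(
            {
                "_op_type": "update",  # update the document if it exists
                "_index": index_name,
                "_id": document[id_field],  # reuse the document's own ID
                "doc": document,
                "doc_as_upsert": True,  # create the document if it is missing
            }
        )
    return actions


# Made-up sample documents for illustration
sample_docs = [{"doc_id": "1", "name": "Alice"}, {"doc_id": "2", "name": "Bob"}]
actions = build_upsert_actions(sample_docs, "my_index")
print(actions[0]["_id"], actions[0]["_op_type"])
```

Because every action carries a stable `_id`, sending the same list twice should update the existing documents instead of adding new ones.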

### Conclusion
- This class provides a robust and user-friendly interface to manage Elasticsearch operations.
- It encapsulates common tasks like creating indices, searching for documents, and performing upserts, making it ideal for use in data management pipelines or applications.

In [6]:
from typing import Optional, Dict, List, Generator
from elasticsearch import Elasticsearch, helpers
from concurrent.futures import ThreadPoolExecutor


class ElasticsearchManager:
    def __init__(
        self, es_url: str = "http://localhost:9200", api_key: Optional[str] = None
    ) -> None:
        """
        Initialize the ElasticsearchManager with a connection to the Elasticsearch instance.

        Parameters:
            es_url (str): URL of the Elasticsearch host.
            api_key (Optional[str]): API key for authentication (optional).
        """
        # Initialize the Elasticsearch client
        if api_key:
            self.es = Elasticsearch(
                es_url, api_key=api_key, timeout=120, retry_on_timeout=True
            )
        else:
            self.es = Elasticsearch(es_url, timeout=120, retry_on_timeout=True)

        # Test connection
        if self.es.ping():
            print("✅ Successfully connected to Elasticsearch!")
        else:
            raise ConnectionError("❌ Failed to connect to Elasticsearch.")

    def create_index(
        self,
        index_name: str,
        mapping: Optional[Dict] = None,
        settings: Optional[Dict] = None,
    ) -> str:
        """
        Create an Elasticsearch index with optional mapping and settings.

        Parameters:
            index_name (str): Name of the index to create.
            mapping (Optional[Dict]): Mapping definition for the index.
            settings (Optional[Dict]): Settings definition for the index.

        Returns:
            str: Success or warning message.
        """
        try:
            if not self.es.indices.exists(index=index_name):
                body = {}
                if mapping:
                    body["mappings"] = mapping
                if settings:
                    body["settings"] = settings
                self.es.indices.create(index=index_name, body=body)
                return f"✅ Index '{index_name}' created successfully."
            else:
                return f"⚠️ Index '{index_name}' already exists. Skipping creation."
        except Exception as e:
            return f"❌ Error creating index '{index_name}': {e}"

    def delete_index(self, index_name: str) -> str:
        """
        Delete an Elasticsearch index if it exists.

        Parameters:
            index_name (str): Name of the index to delete.

        Returns:
            str: Success or warning message.
        """
        try:
            if self.es.indices.exists(index=index_name):
                self.es.indices.delete(index=index_name)
                return f"✅ Index '{index_name}' deleted successfully."
            else:
                return f"⚠️ Index '{index_name}' does not exist."
        except Exception as e:
            return f"❌ Error deleting index '{index_name}': {e}"

    def get_document(self, index_name: str, document_id: str) -> Optional[Dict]:
        """
        Retrieve a single document by its ID.

        Parameters:
            index_name (str): The index to retrieve the document from.
            document_id (str): The ID of the document to retrieve.

        Returns:
            Optional[Dict]: The document's content if found, None otherwise.
        """
        try:
            response = self.es.get(index=index_name, id=document_id)
            return response["_source"]
        except Exception as e:
            print(f"❌ Error retrieving document: {e}")
            return None

    def search_documents(self, index_name: str, query: Dict) -> List[Dict]:
        """
        Search for documents based on a query.

        Parameters:
            index_name (str): The index to search.
            query (Dict): The query body for the search.

        Returns:
            List[Dict]: List of documents that match the query.
        """
        try:
            response = self.es.search(index=index_name, body={"query": query})
            return [hit["_source"] for hit in response["hits"]["hits"]]
        except Exception as e:
            print(f"❌ Error searching documents: {e}")
            return []

    def upsert_document(
        self, index_name: str, document_id: str, document: Dict
    ) -> Dict:
        """
        Perform an upsert operation on a single document.

        Parameters:
            index_name (str): The index to perform the upsert on.
            document_id (str): The ID of the document.
            document (Dict): The document content to upsert.

        Returns:
            Dict: The response from Elasticsearch.
        """
        try:
            response = self.es.update(
                index=index_name,
                id=document_id,
                body={"doc": document, "doc_as_upsert": True},
            )
            return response
        except Exception as e:
            print(f"❌ Error upserting document: {e}")
            return {}

    def bulk_upsert(
        self, index_name: str, documents: List[Dict], timeout: Optional[str] = None
    ) -> None:
        """
        Perform a bulk upsert operation.

        Parameters:
            index_name (str): Default index name for the documents.
            documents (List[Dict]): List of documents for bulk upsert.
            timeout (Optional[str]): Timeout duration (e.g., '60s', '2m'). If None, the default timeout is used.
        """
        try:
            # Ensure each document includes an `_index` field
            for doc in documents:
                if "_index" not in doc:
                    doc["_index"] = index_name

            # Perform the bulk operation
            helpers.bulk(self.es, documents, timeout=timeout)
            print("✅ Bulk upsert completed successfully.")
        except Exception as e:
            print(f"❌ Error in bulk upsert: {e}")

    def parallel_bulk_upsert(
        self,
        index_name: str,
        documents: List[Dict],
        batch_size: int = 100,
        max_workers: int = 4,
        timeout: Optional[str] = None,
    ) -> None:
        """
        Perform a parallel bulk upsert operation.

        Parameters:
            index_name (str): Default index name for documents.
            documents (List[Dict]): List of documents for bulk upsert.
            batch_size (int): Number of documents per batch.
            max_workers (int): Number of parallel threads.
            timeout (Optional[str]): Timeout duration (e.g., '60s', '2m'). If None, the default timeout is used.
        """

        def chunk_data(
            data: List[Dict], chunk_size: int
        ) -> Generator[List[Dict], None, None]:
            """Split data into chunks."""
            for i in range(0, len(data), chunk_size):
                yield data[i : i + chunk_size]

        # Ensure each document has an `_index` field
        for doc in documents:
            if "_index" not in doc:
                doc["_index"] = index_name

        batches = list(chunk_data(documents, batch_size))

        def bulk_upsert_batch(batch: List[Dict]):
            helpers.bulk(self.es, batch, timeout=timeout)

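        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = [executor.submit(bulk_upsert_batch, batch) for batch in batches]
            # Wait for every batch so that worker exceptions are raised here
            for future in futures:
                future.result()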

    def delete_document(self, index_name: str, document_id: str) -> Dict:
        """
        Delete a single document by its ID.

        Parameters:
            index_name (str): The index to delete the document from.
            document_id (str): The ID of the document to delete.

        Returns:
            Dict: The response from Elasticsearch.
        """
        try:
            response = self.es.delete(index=index_name, id=document_id)
            return response
        except Exception as e:
            print(f"❌ Error deleting document: {e}")
            return {}

    def delete_by_query(self, index_name: str, query: Dict) -> Dict:
        """
        Delete documents based on a query.

        Parameters:
            index_name (str): The index to delete documents from.
            query (Dict): The query body for the delete operation.

        Returns:
            Dict: The response from Elasticsearch.
        """
        try:
            response = self.es.delete_by_query(
                index=index_name, body={"query": query}, conflicts="proceed"
            )
            return response
        except Exception as e:
            print(f"❌ Error deleting documents by query: {e}")
            return {}
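
The batching pattern behind `parallel_bulk_upsert` can be tried without a cluster. In the sketch below, `fake_send_batch` is a hypothetical stand-in for the `helpers.bulk` call, and `run_batches_in_parallel` is an illustrative name. It waits on every future, so an exception raised in a worker thread is re-raised in the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def chunk_data(data: List[Dict], chunk_size: int) -> List[List[Dict]]:
    """Split data into consecutive chunks of at most chunk_size items."""
    return [data[i : i + chunk_size] for i in range(0, len(data), chunk_size)]


def run_batches_in_parallel(
    documents: List[Dict],
    send_batch: Callable[[List[Dict]], int],
    batch_size: int = 100,
    max_workers: int = 4,
) -> int:
    """Send each batch on a worker thread and return the total count sent."""
    batches = chunk_data(documents, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(send_batch, batch) for batch in batches]
        # future.result() blocks until the batch is done and re-raises errors
        return sum(future.result() for future in futures)


def fake_send_batch(batch: List[Dict]) -> int:
    """Hypothetical stand-in for helpers.bulk: just report the batch size."""
    return len(batch)


demo_docs = [{"text": f"chunk {i}"} for i in range(250)]
total_sent = run_batches_in_parallel(demo_docs, fake_send_batch, batch_size=100)
print(total_sent)
```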

## Data Preparation for Tutorial
- Let’s process **The Little Prince** using the `RecursiveCharacterTextSplitter` to create document chunks.
- Then, we’ll generate embeddings for each text chunk and store the resulting data in a vector database to proceed with a vector database tutorial.
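
To see what `chunk_size` and `chunk_overlap` mean, consider the simplified fixed-window sketch below. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to split on separators such as paragraphs and lines; `sliding_chunks` is a hypothetical helper used only to illustrate how consecutive chunks share characters.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> List[str]:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

Each chunk starts with the last `chunk_overlap` characters of the previous one, which helps keep context that would otherwise be cut at a chunk boundary.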

In [7]:
from langchain_text_splitters import RecursiveCharacterTextSplitter


# Function to read text from a file (Cross-Platform)
def read_text_file(file_path):
    try:
        with open(file_path, encoding="utf-8") as f:
            # Normalize line endings (compatible with Windows, macOS, Linux)
            raw_text = f.read().replace("\r\n", "\n").replace("\r", "\n")
        return raw_text
    except UnicodeDecodeError as e:
        raise ValueError(f"Failed to decode the file with UTF-8 encoding: {e}")
    except FileNotFoundError:
        raise FileNotFoundError(f"The specified file was not found: {file_path}")


# Function to split the text into chunks
def split_text(raw_text, chunk_size=100, chunk_overlap=20):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,  # Default string length function
        is_separator_regex=False,  # Default separator setting
    )
    split_docs = text_splitter.create_documents([raw_text])
    return [doc.page_content for doc in split_docs]


# Set file path and execute
file_path = "./data/the_little_prince.txt"
try:
    # Read the file
    raw_text = read_text_file(file_path)
    # Split the text
    docs = split_text(raw_text)

    # Verify output
    print(docs[:2])  # Print the first 2 chunks
    print(f"Total number of chunks: {len(docs)}")
except Exception as e:
    print(f"Error occurred: {e}")

['The Little Prince\nWritten By Antoine de Saiot-Exupery (1900〜1944)', '[ Antoine de Saiot-Exupery ]']
Total number of chunks: 1359


In [8]:
%%time

## text embedding
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

hf_embeddings_e5_instruct = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device},  # mps, cuda, cpu
    encode_kwargs={"normalize_embeddings": True},
)

embedded_documents = hf_embeddings_e5_instruct.embed_documents(docs)

print(len(embedded_documents))
print(len(embedded_documents[0]))

1359
1024
CPU times: user 7.81 s, sys: 2.37 s, total: 10.2 s
Wall time: 17.9 s


In [9]:
from uuid import uuid4
from typing import List, Tuple, Dict


def prepare_documents_with_ids(
    docs: List[str], embedded_documents: List[List[float]]
) -> Tuple[List[Dict], List[str]]:
    """
    Prepare a list of documents with unique IDs and their corresponding embeddings.

    Parameters:
        docs (List[str]): List of document texts.
        embedded_documents (List[List[float]]): List of embedding vectors corresponding to the documents.

    Returns:
        Tuple[List[Dict], List[str]]: A tuple containing:
            - List of document dictionaries with `doc_id`, `text`, and `vector`.
            - List of unique document IDs (`doc_ids`).
    """
    # Generate unique IDs for each document
    doc_ids = [str(uuid4()) for _ in range(len(docs))]

    # Prepare the document list with IDs, texts, and embeddings
    documents = [
        {"doc_id": doc_id, "text": doc, "vector": embedding}
        for doc, doc_id, embedding in zip(docs, doc_ids, embedded_documents)
    ]

    return documents, doc_ids

In [10]:
documents, doc_ids = prepare_documents_with_ids(docs, embedded_documents)

## Initialization
### Setting Up the Elasticsearch Client
- Begin by creating an Elasticsearch client.

In [11]:
import os

# Load environment variables
ES_URL = os.environ.get("ES_URL")  # Elasticsearch host URL
ES_API_KEY = os.environ.get("ES_API_KEY")  # Elasticsearch API key

# Ensure required environment variables are set
if not ES_URL or not ES_API_KEY:
    raise ValueError("Both ES_URL and ES_API_KEY must be set in environment variables.")

In [12]:
es_manager = ElasticsearchManager(es_url=ES_URL, api_key=ES_API_KEY)

✅ Successfully connected to Elasticsearch!


## DB Handling
### Create index
- Use the `create_index` method to create a new index with an explicit mapping.

In [13]:
# create index
index_name = "langchain_tutorial_es"

# vector dimension
dims = len(embedded_documents[0])


# 🛠️ Define the mapping for the new index
# This structure specifies the schema for documents stored in Elasticsearch
mapping = {
    "properties": {
        "metadata": {"properties": {"doc_id": {"type": "keyword"}}},
        "text": {"type": "text"},  # Field for storing textual content
        "vector": {  # Field for storing vector embeddings
            "type": "dense_vector",  # Specifies dense vector type
            "dims": dims,  # Number of dimensions in the vector
            "index": True,  # Enable indexing for vector search
            "similarity": "cosine",  # Use cosine similarity for vector comparisons
        },
    }
}

In [14]:
es_manager.create_index(index_name, mapping=mapping)

"✅ Index 'langchain_tutorial_es' created successfully."
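
A `dense_vector` field only accepts vectors whose length equals `dims`, so a quick pre-check before indexing can catch a mismatch early. The helper below is a sketch: `find_dimension_mismatches` is a hypothetical name and the sample documents are made up.

```python
from typing import Dict, List


def find_dimension_mismatches(
    documents: List[Dict], dims: int, vector_field: str = "vector"
) -> List[int]:
    """Return the positions of documents whose vector length differs from dims."""
    return [
        position
        for position, document in enumerate(documents)
        if len(document.get(vector_field, [])) != dims
    ]


# Made-up documents: the second vector is too short, the third has no vector
sample_documents = [
    {"text": "a", "vector": [0.1, 0.2, 0.3]},
    {"text": "b", "vector": [0.1, 0.2]},
    {"text": "c"},
]
bad_positions = find_dimension_mismatches(sample_documents, dims=3)
print(bad_positions)
```

In this tutorial the check would be called with the prepared `documents` list and the `dims` value used in the mapping.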

### Delete index
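- You can delete an index as follows. If you run this cell, re-create the index with `create_index` before continuing. Otherwise, with default cluster settings, the next write auto-creates the index using dynamic mappings, and the explicit mapping defined above is no longer applied.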

In [15]:
## delete index
es_manager.delete_index(index_name)

"✅ Index 'langchain_tutorial_es' deleted successfully."

### Upsert
- Let’s perform an upsert operation for **a single document.** 

In [16]:
# Let’s upsert a single document.

es_manager.upsert_document(index_name, doc_ids[0], documents[0])

ObjectApiResponse({'_index': 'langchain_tutorial_es', '_id': 'ac440534-f1b1-4e3a-a39e-971bc657272e', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1})

### Read
- Retrieve the upserted data using its `doc_id`  

In [17]:
# get_document
result = es_manager.get_document(index_name, doc_ids[0])
print(result["doc_id"])
print(result["text"])

ac440534-f1b1-4e3a-a39e-971bc657272e
The Little Prince
Written By Antoine de Saiot-Exupery (1900〜1944)


### Delete
- Delete using the `doc_id` 

In [18]:
# delete_document
es_manager.delete_document(index_name, doc_ids[0])

ObjectApiResponse({'_index': 'langchain_tutorial_es', '_id': 'ac440534-f1b1-4e3a-a39e-971bc657272e', '_version': 2, 'result': 'deleted', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 1, '_primary_term': 1})

### Bulk Upsert
- Perform a bulk upsert of documents.
- In general, **“bulk”** refers to something large in quantity or volume, often handled or processed all at once.
- For example, “bulk operations” involve managing multiple items simultaneously.

In [19]:
%%time

es_manager.bulk_upsert(index_name, documents)

✅ Bulk upsert completed successfully.
CPU times: user 663 ms, sys: 83.4 ms, total: 746 ms
Wall time: 15.6 s


### Parallel Bulk Upsert
- Perform a bulk upsert of documents in parallel.
- **“parallel”** refers to tasks or processes happening at the same time or simultaneously, often independently of one another.

In [20]:
%%time

# parallel_bulk_upsert
es_manager.parallel_bulk_upsert(index_name, documents, batch_size=100, max_workers=8)

CPU times: user 748 ms, sys: 58.1 ms, total: 806 ms
Wall time: 5.79 s


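- In this run, `parallel_bulk_upsert` took about 5.8 s of wall time versus 15.6 s for `bulk_upsert`. Exact timings depend on network latency and cluster load.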

### Read (Document Retrieval)
- Retrieve documents based on specific values.

In [21]:
# search_documents
query = {"match": {"doc_id": doc_ids[0]}}
results = es_manager.search_documents(index_name, query=query)

print(len(results))
print(results[0]["doc_id"])
print(results[0]["text"])

4
ac440534-f1b1-4e3a-a39e-971bc657272e
The Little Prince
Written By Antoine de Saiot-Exupery (1900〜1944)


### Delete (By Query)
- Delete documents based on specific values.

In [22]:
# delete_by_query
delete_query = {"match": {"doc_id": doc_ids[0]}}
es_manager.delete_by_query(index_name, query=delete_query)

ObjectApiResponse({'took': 8, 'timed_out': False, 'total': 4, 'deleted': 4, 'batches': 1, 'version_conflicts': 0, 'noops': 0, 'retries': {'bulk': 0, 'search': 0}, 'throttled_millis': 0, 'requests_per_second': -1.0, 'throttled_until_millis': 0, 'failures': []})

- Delete all documents.

In [23]:
# delete_by_query
delete_query = {"match_all": {}}
es_manager.delete_by_query(index_name, query=delete_query)

ObjectApiResponse({'took': 328, 'timed_out': False, 'total': 2718, 'deleted': 2714, 'batches': 3, 'version_conflicts': 4, 'noops': 0, 'retries': {'bulk': 0, 'search': 0}, 'throttled_millis': 0, 'requests_per_second': -1.0, 'throttled_until_millis': 0, 'failures': []})

## Advanced Search
- **Keyword Search**  
    - This method matches documents whose `text` field contains the keyword.
    - It performs a straightforward text-based search using Elasticsearch's `match` query, which analyzes the query text before matching.

- **Semantic Search**  
    - Semantic search leverages embeddings to find documents based on their contextual meaning rather than exact text matches.
    - It uses a pre-trained model (`hf_embeddings_e5_instruct`) to encode both the query and the documents into vector representations and retrieves the most similar results.

- **Hybrid Search**  
    - Hybrid search combines both keyword search and semantic search to provide more comprehensive results.
    - It uses a filtering mechanism to ensure documents meet specific keyword criteria while scoring and ranking results based on their semantic similarity to the query.  
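
To make the three modes concrete, here is a toy sketch in plain Python that needs no Elasticsearch server. The corpus, the 2-dimensional vectors, and all function names are hypothetical; real embeddings from `hf_embeddings_e5_instruct` are much larger, and Elasticsearch's own scoring is more involved than this.

```python
import math

# Toy corpus of (text, hypothetical 2-D embedding) pairs.
corpus = [
    ("I am a fox", [0.9, 0.1]),
    ("My friend the fox", [0.6, 0.8]),
    ("The stars are guides", [0.1, 0.9]),
    ("You will always be my friend", [0.2, 0.98]),
]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contains_keyword(text, keyword):
    """True if `keyword` is one of the lowercase whitespace-separated tokens."""
    return keyword.lower() in text.lower().split()


def keyword_search(keyword):
    """Keyword search: keep texts that contain the keyword as a token."""
    return [text for text, _ in corpus if contains_keyword(text, keyword)]


def semantic_search(query_vec, k):
    """Semantic search: rank every text by vector similarity to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def hybrid_search(query_vec, keyword, k):
    """Hybrid search: filter by keyword first, then rank the rest by similarity."""
    candidates = [item for item in corpus if contains_keyword(item[0], keyword)]
    ranked = sorted(candidates, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


query_vec = [0.0, 1.0]  # hypothetical embedding of the question
print(keyword_search("fox"))
print(semantic_search(query_vec, 1))
print(hybrid_search(query_vec, "friend", 1))
```

In this toy data the semantic search alone ranks a text without the word "friend" first, while the hybrid variant restricts the ranking to texts that contain it.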


In [24]:
%%time

# parallel_bulk_upsert
es_manager.parallel_bulk_upsert(index_name, documents, batch_size=100, max_workers=8)

CPU times: user 725 ms, sys: 72.4 ms, total: 798 ms
Wall time: 8.73 s


In [25]:
# keyword search

keyword = "fox"

query = {"match": {"text": keyword}}
results = es_manager.search_documents(index_name, query=query)

for idx_, result in enumerate(results):
    if idx_ < 3:
        print(idx_, " :", result["text"])

0  : "I am a fox," said the fox.
1  : "Good morning," said the fox.
2  : [ Chapter 21 ]
- the little prince befriends the fox
It was then that the fox appeared.


In [26]:
from langchain_elasticsearch import ElasticsearchStore

# Initialize ElasticsearchStore
vector_store = ElasticsearchStore(
    index_name=index_name,  # Elasticsearch index name
    embedding=hf_embeddings_e5_instruct,  # Object responsible for text embeddings
    es_url=ES_URL,  # Elasticsearch host URL
    es_api_key=ES_API_KEY,  # Elasticsearch API key for authentication
)

In [27]:
# Execute Semantic Search
search_query = "Who are the Little Prince’s friends?"
results = vector_store.similarity_search(search_query, k=3)

print("🔍 Question: ", search_query)
print("🤖 Semantic Search Results:")
for result in results:
    print(f"- {result.page_content}")

🔍 Question:  Who are the Little Prince’s friends?
🤖 Semantic Search Results:
- "Who are you?" said the little prince.
- "Then what?" asked the little prince.
- And the little prince asked himself:


In [28]:
# hybrid search with score
search_query = "Who are the Little Prince’s friends?"
keyword = "friend"


results = vector_store.similarity_search_with_score(
    query=search_query,
    k=1,
    filter=[{"term": {"text": keyword}}],
)

print("🔍 search_query: ", search_query)
print("🔍 keyword: ", keyword)

for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content}")

🔍 search_query:  Who are the Little Prince’s friends?
🔍 keyword:  friend
* [SIM=0.927550] "My friend the fox--" the little prince said to me.


- **In this example, the hybrid search returns a passage that actually mentions the little prince's friend, whereas the semantic search alone returned generic dialogue lines.**  

- This approach ensures that the search results are both contextually meaningful and aligned with the specified keyword constraint, making it especially useful in scenarios where both precision and context matter.
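
One detail of the `term` filter used above is that the keyword is compared as-is against the tokens stored in the index. The sketch below approximates this with a simplified tokenizer (lowercase, split on non-alphanumeric characters). It is an assumption for illustration only and not the actual Elasticsearch analyzer.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def term_matches(doc_text, term):
    """A term-style filter: the term itself is not analyzed, only the document is."""
    return term in analyze(doc_text)


print(term_matches('"My friend the fox--" the little prince said to me.', "friend"))  # True
print(term_matches("I am looking for friends.", "friend"))  # False: token is "friends"
print(term_matches("My friend the fox", "Friend"))  # False: indexed tokens are lowercase
```

This is why the lowercase singular keyword `friend` works as a filter here, while a capitalized or plural form would match a different set of documents.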

Clean up by removing the **Hugging Face cache**, `vector_store`, `embedded_documents`, and `es_manager`.

If you created a **vectordb** directory, please **remove** it at the end of this tutorial.

In [29]:
es_manager.delete_index(index_name)

from huggingface_hub import scan_cache_dir

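# Release the large in-memory objects
del embedded_documents
del vector_store
del es_manager

# Note: delete_revisions() called without revision hashes only builds an empty
# DeleteCacheStrategy (see the output below). Nothing is removed from disk
# unless revision hashes are passed and .execute() is called on the result.
scan = scan_cache_dir()
scan.delete_revisions()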

DeleteCacheStrategy(expected_freed_size=0, blobs=frozenset(), refs=frozenset(), repos=frozenset(), snapshots=frozenset())

## Managing Elasticsearch Connections and Documents
### ElasticsearchConnectionManager
- The `ElasticsearchConnectionManager` is a class designed to manage connections to an Elasticsearch instance.
- It facilitates connecting to the Elasticsearch server and provides functionalities for creating and deleting indices.

In [30]:
from utils.elasticsearch_interface import ElasticsearchConnectionManager

In [31]:
index_name = "langchain_tutorial_es"

In [32]:
%%time

## text embedding
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

hf_embeddings_e5_instruct = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device},  # mps, cuda, cpu
    encode_kwargs={"normalize_embeddings": True},
)

embedded_documents = hf_embeddings_e5_instruct.embed_documents(docs)

print(len(embedded_documents))
print(len(embedded_documents[0]))

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: intfloat/multilingual-e5-large-instruct


1359
1024
CPU times: user 5.05 s, sys: 3.7 s, total: 8.75 s
Wall time: 15.5 s


In [33]:
# vector dimension
dims = len(embedded_documents[0])


# 🛠️ Define the mapping for the new index
# This structure specifies the schema for documents stored in Elasticsearch
mapping = {
    "properties": {
        "metadata": {"properties": {"doc_id": {"type": "keyword"}}},
        "text": {"type": "text"},  # Field for storing textual content
        "vector": {  # Field for storing vector embeddings
            "type": "dense_vector",  # Specifies dense vector type
            "dims": dims,  # Number of dimensions in the vector
            "index": True,  # Enable indexing for vector search
            "similarity": "cosine",  # Use cosine similarity for vector comparisons
        },
    }
}

The cells above generate text embeddings for the documents using a Hugging Face model and define the index mapping.
- First, a multilingual model is set up with the `HuggingFaceEmbeddings` class on the chosen device (mps, cuda, or cpu).
- Then, embeddings are generated for the list of documents, and the number of vectors and their dimension are printed to confirm that everything works.
- Finally, the vector dimension is used to define the `dense_vector` field in the index mapping.
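
The embedding cell sets `normalize_embeddings=True`, and the mapping uses `"similarity": "cosine"`. As a small sketch of why these fit together, the code below shows that for unit-length vectors the dot product equals the cosine similarity. It also includes a hypothetical `check_dims` helper for verifying vectors against the mapping's `dims` before indexing. The vectors are made up for illustration.

```python
import math


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vector, vector))
    return [x / norm for x in vector]


def cosine(a, b):
    """Cosine similarity computed from the raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def check_dims(vectors, dims):
    """Hypothetical helper: True if every vector has exactly `dims` components."""
    return all(len(vector) == dims for vector in vectors)


raw_a, raw_b = [3.0, 4.0], [1.0, 2.0]
unit_a, unit_b = l2_normalize(raw_a), l2_normalize(raw_b)

# For unit-length vectors the dot product equals the cosine similarity.
same = math.isclose(dot(unit_a, unit_b), cosine(raw_a, raw_b))
print(same)
print(check_dims([unit_a, unit_b], dims=2))
```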

The `ElasticsearchConnectionManager` class manages the connection to an Elasticsearch server.
- This instance uses the server URL, API key, embedding model, and index name to connect to Elasticsearch and initialize the vector store.

In [34]:
es_connection_manager = ElasticsearchConnectionManager(
    es_url=ES_URL,
    api_key=ES_API_KEY,
    embedding_model=hf_embeddings_e5_instruct,
    index_name=index_name,
)

INFO:elastic_transport.transport:HEAD https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/ [status:200 duration:0.570s]
INFO:utils.elasticsearch_interface:✅ Successfully connected to Elasticsearch!
INFO:elastic_transport.transport:GET https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/ [status:200 duration:0.559s]
INFO:utils.elasticsearch_interface:✅ Vector store initialized for index 'langchain_tutorial_es'.


In [35]:
## create index
es_connection_manager.create_index(index_name, mapping=mapping)

INFO:elastic_transport.transport:HEAD https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es [status:404 duration:0.186s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es [status:200 duration:0.293s]


"✅ Index 'langchain_tutorial_es' created successfully."

In [36]:
## delete index
es_connection_manager.delete_index(index_name)

INFO:elastic_transport.transport:HEAD https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es [status:200 duration:0.188s]
INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es [status:200 duration:0.227s]


"✅ Index 'langchain_tutorial_es' deleted successfully."

### ElasticsearchDocumentManager
- The `ElasticsearchDocumentManager` leverages the `ElasticsearchConnectionManager` to handle document management tasks.
- This class performs operations such as inserting, deleting, and searching documents, with the capability to enhance performance through parallel processing.

In [37]:
from utils.elasticsearch_interface import ElasticsearchDocumentManager

In [38]:
es_document_manager = ElasticsearchDocumentManager(
    connection_manager=es_connection_manager,
)

### Upsert
- The `upsert` method of the `es_document_manager` is used to insert or update documents in the specified Elasticsearch index.
- It takes the original texts, their corresponding embedded documents, and the index name to efficiently manage the document storage and retrieval process.
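
How `upsert` builds its requests is not shown in this tutorial. One plausible sketch, assuming each text and its vector are paired into an action dictionary of the shape accepted by the `elasticsearch.helpers.bulk` helper, is given below. `build_actions` and its optional `ids` argument are hypothetical; the field names follow the mapping defined earlier.

```python
import uuid


def build_actions(index_name, texts, embedded_documents, ids=None):
    """Pair each text with its vector and wrap both in a bulk action dict."""
    if len(texts) != len(embedded_documents):
        raise ValueError("texts and embedded_documents must have the same length")
    if ids is None:
        # Assumption: fresh UUIDs are used when no IDs are supplied.
        ids = [str(uuid.uuid4()) for _ in texts]
    actions = []
    for doc_id, text, vector in zip(ids, texts, embedded_documents):
        actions.append(
            {
                "_op_type": "index",  # indexing an existing _id overwrites that document
                "_index": index_name,
                "_id": doc_id,
                "_source": {
                    "text": text,
                    "vector": vector,
                    "metadata": {"doc_id": doc_id},
                },
            }
        )
    return actions


actions = build_actions(
    "langchain_tutorial_es",
    texts=["I am a fox", "Good morning"],
    embedded_documents=[[0.1, 0.2], [0.3, 0.4]],
    ids=["a", "b"],
)
print(len(actions), actions[0]["_id"])
```

With a connected client, such a list could then be passed to `elasticsearch.helpers.bulk(client, actions)`. Reusing the same IDs on a later run replaces the stored documents, which gives the insert-or-update behavior.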

In [39]:
%%time

es_document_manager.upsert(
    texts=docs,
    embedded_documents=embedded_documents,
    index_name=index_name,
)

INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:5.236s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:5.579s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:3.754s]
INFO:utils.elasticsearch_interface:✅ Bulk upsert completed successfully.


CPU times: user 697 ms, sys: 109 ms, total: 806 ms
Wall time: 15.4 s


In [40]:
es_document_manager.delete(index_name=index_name)

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_delete_by_query?conflicts=proceed [status:200 duration:0.360s]


### Upsert_parallel
- The `upsert_parallel` method of the `es_document_manager` facilitates the parallel insertion or updating of documents in the specified Elasticsearch index.
- It processes the documents in batches of 100, utilizing up to 8 workers to enhance performance and efficiency in managing large datasets.

In [41]:
%%time

es_document_manager.upsert_parallel(
    index_name=index_name,
    texts=docs,
    embedded_documents=embedded_documents,
    batch_size=100,
    max_workers=8,
)

INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:1.375s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:2.422s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:1.365s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:2.883s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:3.399s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 duration:1.332s]
INFO:elastic_transport.transport:PUT https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/_bulk [status:200 du

CPU times: user 892 ms, sys: 89.8 ms, total: 981 ms
Wall time: 11.8 s


- In this run, `upsert_parallel` finished **faster** than `upsert` (11.8 s versus 15.4 s of wall time).
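
To put the wall times reported in this tutorial side by side, the following sketch computes the speedup ratio for both pairs of runs. The numbers are copied from the `%%time` outputs above; `speedup` is a hypothetical helper.

```python
def speedup(baseline_seconds, parallel_seconds):
    """Ratio of sequential wall time to parallel wall time."""
    return baseline_seconds / parallel_seconds


# es_manager: bulk_upsert vs. parallel_bulk_upsert
manager_speedup = round(speedup(15.6, 5.79), 2)
# es_document_manager: upsert vs. upsert_parallel
document_manager_speedup = round(speedup(15.4, 11.8), 2)

print(manager_speedup)           # 2.69
print(document_manager_speedup)  # 1.31
```

Each figure comes from a single run against a remote cluster, so it should be read as an indication rather than a benchmark.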

### Search
- The code performs a search query, "Who are the Little Prince’s friends?", using the `es_document_manager` to retrieve relevant documents from the specified Elasticsearch index.
- It fetches the top 10 results, then prints the query and each result in a formatted manner for easy review.

In [42]:
search_query = "Who are the Little Prince’s friends?"

results = es_document_manager.search(index_name=index_name, query=search_query, k=10)

print("================================================")
print("🔍 Question: ", search_query)
print("================================================")
for idx_, result in enumerate(results):
    print(idx_, " :", result)

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_search [status:200 duration:0.726s]


🔍 Question:  Who are the Little Prince’s friends?
0  : people. For some, who are travelers, the stars are guides. For others they are no more than little
1  : no more than little lights in the sky. For others, who are scholars, they are problems . For my
2  : "Forget what?" inquired the little prince, who already was sorry for him.
3  : "Ashamed of what?" insisted the little prince, who wanted to help him.
4  : "Where are the men?" the little prince asked, politely.
5  : But certainly, for us who understand life, figures are a matter of indifference. I should have
6  : "I know some one," said the little prince, "who would make a bad explorer."
7  : "They are in a great hurry," said the little prince. "What are they looking for?"
8  : "Are they pursuing the first travelers?" demanded the little prince.
9  : a friend. And if I forget him, I may become like the grown-ups who are no longer interested in


The next cell retrieves the top 10 relevant documents using similarity-based matching (`use_similarity=True`).

In [43]:
search_query = "Who are the Little Prince’s friends?"
results = es_document_manager.search(query=search_query, k=10, use_similarity=True)

print("================================================")
print("🔍 Question: ", search_query)
print("================================================")
for idx_, result in enumerate(results):
    print(idx_, " :", result)

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_search?_source_includes=metadata,text [status:200 duration:0.370s]
INFO:utils.elasticsearch_interface:✅ Found 10 similar documents.


🔍 Question:  Who are the Little Prince’s friends?
0  : "Who are you?" said the little prince.
1  : "Then what?" asked the little prince.
2  : And the little prince asked himself:
3  : "Why is that?" asked the little prince.
4  : But the little prince was wondering... The planet was tiny. Over what could this king really rule?
5  : "What do you do here?" the little prince asked.
6  : "Where are the men?" the little prince asked, politely.
7  : "No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?"
8  : [ Chapter 13 ]
- the little prince visits the businessman
9  : But the little prince added:


The next cell runs the same query, "Who are the Little Prince’s friends?", with `use_similarity=True` while also filtering results on the keyword "friend". It retrieves the top 10 relevant documents and prints each document's content alongside its similarity score.

In [44]:
search_query = "Who are the Little Prince’s friends?"
keyword = "friend"
results = es_document_manager.search(
    query=search_query, k=10, use_similarity=True, keyword=keyword
)

print("================================================")
print("🔍 Question: ", search_query)
print("================================================")
for idx_, contents in enumerate(results):
    print(idx_, " :", contents[0].page_content, contents[1])

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_search?_source_includes=metadata,text [status:200 duration:0.187s]
INFO:utils.elasticsearch_interface:✅ Hybrid search completed. Found 10 results.


🔍 Question:  Who are the Little Prince’s friends?
0  : "My friend the fox--" the little prince said to me. 0.92783
1  : any more. If you want a friend, tame me..." 0.91324496
2  : My friend broke into another peal of laughter: "But where do you think he would go?" 0.9049506
3  : a grown-up. I have a serious reason: he is the best friend I have in the world. I have another 0.9047897
4  : He was only a fox like a hundred thousand other foxes. But I have made him my friend, and now he is 0.9018576
5  : a friend. And if I forget him, I may become like the grown-ups who are no longer interested in 0.89573324
6  : that you have known me. You will always be my friend. You will want to laugh with me. And you will 0.8953247
7  : "That man is the only one of them all whom I could have made my friend. But his planet is indeed 0.89472246
8  : to seek, in other days, merely by pulling up his chair; and he wanted to help his friend. 0.89299285
9  : sure that I shall not forget him. To forget a frien

### Read
- This code retrieves document IDs from the specified Elasticsearch index using the `get_documents_ids` method of the `es_document_manager` and prints them for review.

In [45]:
ids = es_document_manager.get_documents_ids(index_name)
print(ids)

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_search [status:200 duration:0.182s]


['Qe9o2ZQBg-BrVn24K_9B', 'Qu9o2ZQBg-BrVn24K_9B', 'Q-9o2ZQBg-BrVn24K_9B', 'RO9o2ZQBg-BrVn24K_9B', 'Re9o2ZQBg-BrVn24K_9B', 'Ru9o2ZQBg-BrVn24K_9B', 'R-9o2ZQBg-BrVn24K_9B', 'SO9o2ZQBg-BrVn24K_9B', 'Se9o2ZQBg-BrVn24K_9B', 'Su9o2ZQBg-BrVn24K_9B']


This code fetches documents from the specified Elasticsearch index using a list of document IDs, specifically retrieving the first 10 IDs. It then prints each document's ID along with its corresponding text for easy reference.
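
How `get_documents_by_ids` builds its request is not shown; the log only reveals a `_search` call. One way such a lookup could be expressed, assuming Elasticsearch's `ids` query is used, is sketched below. `build_ids_query` is a hypothetical helper.

```python
def build_ids_query(ids, source_fields=("metadata", "text")):
    """Build a search body that fetches documents by their Elasticsearch _id."""
    ids = list(ids)
    return {
        "query": {"ids": {"values": ids}},
        # Searches return at most 10 hits by default, so request one hit per ID.
        "size": len(ids),
        # Limit the returned fields; the vector is usually not needed for display.
        "_source": list(source_fields),
    }


body = build_ids_query(["Qe9o2ZQBg-BrVn24K_9B", "Qu9o2ZQBg-BrVn24K_9B"])
print(body)
```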

In [46]:
responses = es_document_manager.get_documents_by_ids(index_name, ids[:10])

for response in responses:
    print(response["doc_id"], ": ", response["text"])

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_search [status:200 duration:0.721s]


36b5acf8-11d2-4015-bdf9-7e9346ccdc1b :  That night I did not see him set out on his way. He got away from me without making a sound. When I
2b2ebe1f-c662-4d30-8237-da1adddba50e :  a sound. When I succeeded in catching up with him he was walking along with a quick and resolute
40a590f1-1236-4e3b-a062-9e5e14b0b828 :  quick and resolute step. He said to me merely:
ac1cfd05-f2c8-4d4f-a17f-b11fb8947db5 :  "Ah! You are there..." 
And he took me by the hand. But he was still worrying.
79bd09e7-e25e-42ae-81d3-cab7ae6f9e83 :  "It was wrong of you to come. You will suffer. I shall look as if I were dead; and that will not be
d06709c4-f949-4ba6-a4da-8bb63973a610 :  that will not be true..."
6f1b7d37-3605-4bb1-992f-06cc685b0893 :  I said nothing.
9f273fad-0b55-4bf2-8f89-834558f3cd7d :  "You understand... it is too far. I cannot carry this body with me. It is too heavy."
8df8dc63-ff88-4852-a5f7-2f14936af78f :  I said nothing.
40a8bd69-19da-448c-9ac8-dce98fc087d5 :  "But it will be like an old aband

### Delete
- This code deletes the documents with the first 10 IDs from the specified Elasticsearch index. Calling `delete` without `ids` removes all documents in the index.

In [47]:
es_document_manager.delete(index_name=index_name, ids=ids[:10])

INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_doc/Qe9o2ZQBg-BrVn24K_9B [status:200 duration:0.187s]
INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_doc/Qu9o2ZQBg-BrVn24K_9B [status:200 duration:0.185s]
INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_doc/Q-9o2ZQBg-BrVn24K_9B [status:200 duration:0.186s]
INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_doc/RO9o2ZQBg-BrVn24K_9B [status:200 duration:0.188s]
INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_doc/Re9o2ZQBg-BrVn24K_9B [status:200 duration:0.184s]
INFO:elastic_transport.transport:DELETE https://e6

In [48]:
# Delete all documents
es_document_manager.delete(index_name=index_name)

INFO:elastic_transport.transport:POST https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es/_delete_by_query?conflicts=proceed [status:200 duration:0.343s]


In [49]:
## delete index
es_connection_manager.delete_index(index_name)

INFO:elastic_transport.transport:HEAD https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es [status:200 duration:0.181s]
INFO:elastic_transport.transport:DELETE https://e638d39188c94d828a30ae87af1733ce.us-central1.gcp.cloud.es.io:443/langchain_tutorial_es [status:200 duration:0.263s]


"✅ Index 'langchain_tutorial_es' deleted successfully."