# **Milvus Vector Database:**

## **Documentation:**


### **What Is Milvus?**

Milvus is an **open-source vector database** designed for efficient **`storage`**, **`management`**, and **`similarity search`** of high-dimensional embeddings, such as those generated from text, images, or audio. It's widely used in modern AI applications like **`RAG (Retrieval-Augmented Generation)`**, recommendation systems, and semantic search.


#### **Core Features & Advantages**

* **High Performance & Scalability**:
  Milvus supports billions of vectors with real-time performance, thanks to a distributed, cloud-native architecture with compute–storage separation.

* **Diverse Index Types**:
  Offers a variety of index types, such as HNSW, IVF_FLAT, DiskANN, and GPU-accelerated indexes, to balance speed, accuracy, and storage needs.

* **Hybrid Search & Metadata Filtering**:
  Combines semantic (vector-based) search with keyword matching or metadata filtering in a single query, which is essential for precise, context-rich retrieval in systems like RAG.

* **Flexible Deployment Options**:

  * **`Milvus Lite`**: Lightweight, embedded version, ideal for prototyping within a Python environment (**stores all data in a single local file**).
  * **`Standalone`**: Self-hosted single-node setup for testing or small-scale workloads (**a single self-hosted Docker container**).
    * **Web UI:** http://127.0.0.1:9091/webui/
  * **`Distributed`**: Full-scale, cloud-native deployment for enterprise use (**containers orchestrated with Kubernetes**).
  * **`Managed Service (Zilliz Cloud)`**: Fully managed with autoscaling and simplified operations.

* **Rich Language & Tool Integration**:
  Official SDKs for Python, Java, and Go, plus community support for .NET, enable seamless integration. Milvus also works well with frameworks like **LangChain** and **Haystack**, and with embedding providers such as **OpenAI**.


#### **Get Started with Milvus Lite (Python)**

Here's a compact guide to running Milvus in minutes using its lightweight version:

```python
import random

from pymilvus import MilvusClient

# 1. Create a local Milvus Lite database file
client = MilvusClient("milvus_demo.db")

# 2. Create a collection (like a table) for vectors of dimension 768
client.create_collection(collection_name="demo_collection", dimension=768)

# 3. Insert vector data (random placeholders stand in for real text or image embeddings)
data = [
    {"id": 1, "vector": [random.random() for _ in range(768)], "metadata": {"text": "Hello World"}},
    # Add more records...
]
client.insert(collection_name="demo_collection", data=data)

# 4. Perform similarity search (the query must have the same dimension as the collection)
query_vectors = [[random.random() for _ in range(768)]]  # Query embeddings
results = client.search(
    collection_name="demo_collection",
    data=query_vectors,
    limit=5,
    output_fields=["vector", "metadata"],
)
print(results)
```

By following these steps, you can **quickly prototype Milvus** in Jupyter notebooks or Python applications.
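
Conceptually, the `search` call in step 4 ranks the stored vectors by their similarity to the query. The sketch below illustrates that idea in plain Python, assuming cosine similarity and an exhaustive scan; Milvus itself relies on optimized indexes, and `cosine_top_k` and the toy records are hypothetical names and values used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_top_k(records, query, k):
    """Return the ids of the k records most similar to the query, best first."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(r["vector"], query),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]


# Toy 3-dimensional "embeddings" (illustrative values only)
records = [
    {"id": 1, "vector": [1.0, 0.0, 0.0]},
    {"id": 2, "vector": [0.9, 0.1, 0.0]},
    {"id": 3, "vector": [0.0, 1.0, 0.0]},
]
print(cosine_top_k(records, query=[1.0, 0.05, 0.0], k=2))
```

An exhaustive scan like this is exact but grows linearly with the number of vectors, which is why a vector database builds indexes for larger collections.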



#### **Why Choose Milvus?**

* Designed for **real-world AI workloads** that require vector similarity search at scale.
* Scales along a continuum, from **local prototyping** to **distributed production**.
* Equipped with **multiple index types and hybrid search**, supporting complex, nuanced queries.
* Embraced by a **strong open-source community**, with ongoing enhancements and support.



**In short**: Milvus is a robust, scalable, and flexible vector database, perfect for powering the next generation of AI applications — from smart chatbots to personalized recommendations.


### **Difference between Milvus and other vector databases:**

#### **1. Milvus**

* **Type**: Open-source, maintained by Zilliz (with cloud option).
* **Strengths**:

  * Highly **scalable** (handles billions of vectors).
  * Flexible — supports multiple index types (IVF, HNSW, DiskANN, etc.).
  * Can be self-hosted or used with **Zilliz Cloud**.
  * Integrates well with data lakes and big data pipelines.
* **Use case**: Large-scale enterprise RAG, AI search, multimodal embeddings storage.



#### **2. Pinecone**

* **Type**: Fully managed SaaS.
* **Strengths**:

  * No infra management (serverless).
  * Low-latency search (production-grade).
  * Handles automatic scaling.
  * Has metadata filtering (filter embeddings by tags, user, domain).
* **Limitations**:

  * Closed-source (not self-hostable).
  * Pricing increases with scale.
* **Use case**: Startups/companies that want to quickly build RAG/search **without DevOps hassle**.



#### **3. Weaviate**

* **Type**: Open-source + SaaS.
* **Strengths**:

  * Schema-based database (you can define object types).
  * Has built-in **hybrid search** (keyword + vector).
  * Integrates with Hugging Face / OpenAI to auto-generate embeddings.
  * Strong metadata filtering.
* **Limitations**:

  * More complex schema management.
* **Use case**: Semantic + hybrid search (good if you need structured + unstructured search together).



#### **4. Chroma**

* **Type**: Open-source, lightweight.
* **Strengths**:

  * Simple Python-first design.
  * Easy to integrate with LangChain.
* **Limitations**:

  * Not designed for **very large scale** (more dev/prototyping).
* **Use case**: Quick prototyping RAG apps.



#### **5. Qdrant**

* **Type**: Open-source + SaaS.
* **Strengths**:

  * High-performance ANN (Approximate Nearest Neighbor).
  * Great support for **filters + metadata**.
  * Good distributed support.
* **Use case**: Production-grade RAG with filtering, self-hosted or cloud.



#### **Comparison Table**

| Feature / DB      | Milvus                  | Pinecone             | Weaviate                 | Chroma             | Qdrant                  |
| ----------------- | ----------------------- | -------------------- | ------------------------ | ------------------ | ----------------------- |
| **Type**          | Open-source + Cloud     | Fully Managed SaaS   | Open-source + SaaS       | Open-source        | Open-source + SaaS      |
| **Scale**         | Billions                | Millions to billions | Millions to billions     | Small-medium scale | Millions to billions    |
| **Index Options** | IVF, HNSW, DiskANN      | Proprietary ANN      | HNSW                     | HNSW               | HNSW                    |
| **Ease of Use**   | Medium (infra needed)   | High (serverless)    | Medium (schema-heavy)    | High (Pythonic)    | Medium                  |
| **Best for**      | Enterprises, data-heavy | Quick production use | Semantic + hybrid search | Prototyping        | Production with filters |



**In short**:

* Use **Pinecone** if you want managed, plug-and-play RAG.
* Use **Milvus** if you need **scalable + open-source** with full control.
* Use **Weaviate** if you want **hybrid search (vector + keyword + filters)**.
* Use **Chroma** for **small projects/prototypes**.
* Use **Qdrant** for a **balanced open-source + production ready** setup.


## **PART 01:**
Use LangChain to create a vector database using Milvus Lite/Server


* Install Milvus Lite/Server (using LangChain)
* Set up Vector Database
* Create Collection
* Prepare Data
* Insert Data
* Upsert Data
* Delete Data
* Semantic Search
* Vector Search with Metadata Filtering
* Delete Entities
* Load Existing Data
* Drop the Collection
* Delete Vector Database

In [None]:
# Install the necessary libraries:

%pip install -qU langchain-milvus

### **Load LLM & Embedding Model From AZURE-OpenAI:**

In [6]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from dotenv import load_dotenv
import os

# load_dotenv() reads the .env file into os.environ, so AZURE_OPENAI_ENDPOINT
# is picked up from the environment and does not need to be re-assigned here.
load_dotenv()


llm = AzureChatOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_API_VERSION"),
    azure_deployment="gpt-4o",
    temperature=0.7
)

embeddings = AzureOpenAIEmbeddings(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_API_VERSION"),
    model="text-embedding-3-small"
)
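
Because the clients above read their settings from the environment, a missing variable tends to surface only as an error at call time. A small check up front can make this easier to diagnose. This is a sketch only: `require_env` is a hypothetical helper, not part of LangChain or python-dotenv, and the variable names are the ones used in the cell above.

```python
import os


def require_env(names):
    """Return a dict of the requested environment variables, failing fast if any is unset."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Variable names taken from the cell above
REQUIRED = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "AZURE_API_VERSION"]

try:
    settings = require_env(REQUIRED)
    print("All Azure OpenAI settings are present.")
except RuntimeError as err:
    print(err)
```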

In [9]:
llm.invoke("Hi").content

'Hello! 😊 How can I assist you today?'

In [8]:
len(embeddings.embed_query("Hi"))

1536

### **Milvus Server:**

When you have a large amount of data (e.g., more than a million vectors), we recommend setting up a more performant Milvus server on Docker or Kubernetes.

The Milvus server supports a **variety of indexes**. Choosing an index that suits your data and requirements can significantly improve both retrieval quality and retrieval speed.
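
To make the trade-off between index types concrete, here is a minimal pure-Python sketch contrasting an exhaustive (FLAT-style) scan with an IVF-style search that only inspects the buckets nearest to the query. It is an illustration of the idea, not Milvus's implementation: the function names are hypothetical, the centroids are fixed by hand rather than learned, and `nprobe` simply mirrors the usual name for "how many buckets to scan".

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(vectors, query, k):
    """Exact search: compare the query with every stored vector."""
    order = sorted(range(len(vectors)), key=lambda i: squared_l2(vectors[i], query))
    return order[:k]


def build_ivf(vectors, centroids):
    """Assign each vector index to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: squared_l2(vec, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def ivf_search(vectors, centroids, buckets, query, k, nprobe):
    """Approximate search: scan only the nprobe buckets closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    candidates.sort(key=lambda i: squared_l2(vectors[i], query))
    return candidates[:k]


vectors = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.7, 5.3]]
centroids = [[0.1, 0.1], [5.0, 5.0]]  # fixed by hand; a real index learns these by clustering
buckets = build_ivf(vectors, centroids)

query = [4.9, 5.0]
print(flat_search(vectors, query, k=2))
print(ivf_search(vectors, centroids, buckets, query, k=2, nprobe=1))
```

With well-separated clusters, scanning a single bucket finds the same neighbours while touching half the data. When a query falls near a bucket boundary, a small `nprobe` can miss true neighbours, which is the accuracy cost of the speed-up.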

In [10]:
from pymilvus import Collection, MilvusException, connections, db, utility

In [12]:
# Define the connection parameters for Milvus:

conn = connections.connect(host="127.0.0.1", port=19530)

In [13]:
# List the databases that exist in Milvus:

existing_databases = db.list_database()
existing_databases

['default']

### **Create Milvus Database:**

In [14]:
from pymilvus import Collection, MilvusException, connections, db, utility

conn = connections.connect(host="127.0.0.1", port=19530)

# Check if the database exists
db_name = "milvus_demo"

try:
    existing_databases = db.list_database()
    if db_name in existing_databases:
        print(f"Database '{db_name}' already exists.")

        # Use the database context
        db.using_database(db_name)

        # Drop all collections in the database
        collections = utility.list_collections()
        for collection_name in collections:
            collection = Collection(name=collection_name)
            collection.drop()
            print(f"Collection '{collection_name}' has been dropped.")

        db.drop_database(db_name)
        print(f"Database '{db_name}' has been deleted.")
    
    else:
        print(f"Database '{db_name}' does not exist.")

    # Create the database in both cases so that it exists for the cells below
    database = db.create_database(db_name)
    print(f"Database '{db_name}' created successfully.")
        
except MilvusException as e:
    print(f"An error occurred: {e}")

Database 'milvus_demo' does not exist.
Database 'milvus_demo' created successfully.


### **Create Vector Store instance with the `Milvus` database:**

In [15]:
from langchain_milvus import BM25BuiltInFunction, Milvus

URI = "http://localhost:19530"
db_name = "milvus_demo"

vectorstore = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": URI, "token": "root:Milvus", "db_name": db_name},
    index_params={"index_type": "FLAT", "metric_type": "L2"},
    consistency_level="Strong",
    drop_old=False,  # set to True if seeking to drop the collection with that name if it exists
)

vectorstore

<langchain_milvus.vectorstores.milvus.Milvus at 0x29b2671d010>

### **Store unrelated documents in different collections within the same Milvus instance:**

In [40]:
from langchain_core.documents import Document


documents = Document(
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
    page_content="""from the chaff. This process is called\nHarvest Festivals\nthreshing. This is carried out with the\nAfter three or four months of hard\nhelp of a machine called ‘combine’ which\nwork there comes the day of the\nis in fact a harvester as well as a thresher\nharvest. The sight of golden fields\n(Fig. 1.8).\nof standing crop, laden with grain,\nfills the hearts of farmers with joy\nand a sense of well-being. The\nefforts of the past season have\nborne fruit and it is time to relax\nand enjoy a little. The period of\nharvest is, thus, of great joy and\nhappiness in all parts of India.\nMen and women celebrate it with\ngreat enthusiasm. Special\nfestivals associated with the\nharvest season are Pongal,\nBaisakhi, Holi, Diwali, Nabanya\nand Bihu.\nFig. 1.8 : Combine\n1.9Storage\nAfter harvesting, sometimes\nStorage of produce is an important task.\nstubs are left in the field, which\nIf the harvested grains are to be kept\nare burnt by farmers. Paheli is\nfor longer time, they should be safe\nworried. She knows that it\ncauses pollution. It may also from moisture, insects, rats and\ncatch fire and damage the crops microorganisms. Harvested grains have\nlying in the fields. more moisture. If freshly harvested\ngrains (seeds) are stored without drying,\nFarmers with small holdings of land\nthey may get spoilt or attacked by\ndo the separation of grain and chaff by\norganisms, making them unfit for use\nwinnowing (Fig. 1.9). You have already\nor for germination. Hence, before\nstudied this in Class VI.\nstoring them, the grains are properly\ndried in the sun to reduce the moisture\nin them. This prevents the attack by\ninsect pests, bacteria and fungi.\nI saw my mother putting\nsome dried neem leaves\nin an iron drum\ncontaining wheat.\nI wonder why?\nFig. 1.9 : Winnowing machine\nCROP PRODUCTION AND MANAGEMENT 11\n2024-25\n"""
)



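# No collection_name is passed here, so the default collection is used.
# To keep unrelated documents apart, give each group its own collection_name.
vector_store = Milvus.from_documents(
    documents=[documents],
    embedding=embeddings,
    connection_args={"uri": URI, "token": "root:Milvus", "db_name": "milvus_demo"},
    index_params={"index_type": "FLAT", "metric_type": "L2"},
    consistency_level="Strong",
    drop_old=False,
)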

vector_store

<langchain_milvus.vectorstores.milvus.Milvus at 0x29b45926d50>

The vector store is now created, so we can interact with it by adding and deleting items.

### **Add items to vector store:**

In [41]:
from uuid import uuid4
from langchain_core.documents import Document

In [44]:
# Get the Documents:

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={
        'source': 'raw/k8_science.pdf',
        'file_path': 'raw/k8_science.pdf',
        'page': 10,
        'total_pages': 182,
        'Title': 'Chapter-1.pmd',
        'Author': 'NCERT',
        'Creator': 'PageMaker 7.0',
        'CreationDate': 'D:20171130101551',
        'ModDate': 'D:20240623043247Z'
    },
)

In [45]:
# Add Documents:

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
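# Note: this collection uses auto_id=True, so the UUIDs passed as `ids` are
# ignored and Milvus generates integer primary keys instead (see the output below).
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)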

The ids parameter is ignored when auto_id is True. The ids will be generated automatically.


[460242899926532017,
 460242899926532018,
 460242899926532019,
 460242899926532020,
 460242899926532021,
 460242899926532022,
 460242899926532023,
 460242899926532024,
 460242899926532025,
 460242899926532026]

### **Delete items from vector store:**

In [48]:
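# Delete by primary key, using a key returned by add_documents above
# (here the last one, which belongs to document_10):
vector_store.delete(ids=[460242899926532026])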

True

### **Query vector store:**

In [49]:
# Similarity search:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2
)

results

[Document(metadata={'file_path': 'raw/k8_science.pdf', 'Title': 'Chapter-1.pmd', 'Creator': 'PageMaker 7.0', 'total_pages': 182, 'source': 'raw/k8_science.pdf', 'CreationDate': 'D:20171130101551', 'ModDate': 'D:20240623043247Z', 'page': 10, 'Author': 'NCERT', 'pk': 460242899926532019}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(metadata={'file_path': 'raw/k8_science.pdf', 'Title': 'Chapter-1.pmd', 'Creator': 'PageMaker 7.0', 'total_pages': 182, 'source': 'raw/k8_science.pdf', 'CreationDate': 'D:20171130101551', 'ModDate': 'D:20240623043247Z', 'page': 10, 'Author': 'NCERT', 'pk': 460242899926532024}, page_content='LangGraph is the best framework for building stateful, agentic applications!')]

In [52]:
# Similarity search by metadata filter (no document has this Title, so the result is empty):

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    expr='Title == "LangChain"'
)

results

[]

In [53]:
# Similarity search by metadata filter:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    expr='Title == "Chapter-1.pmd"'
)

results

[Document(metadata={'Creator': 'PageMaker 7.0', 'Author': 'NCERT', 'source': 'raw/k8_science.pdf', 'CreationDate': 'D:20171130101551', 'file_path': 'raw/k8_science.pdf', 'page': 10, 'ModDate': 'D:20240623043247Z', 'total_pages': 182, 'Title': 'Chapter-1.pmd', 'pk': 460242899926532019}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(metadata={'Creator': 'PageMaker 7.0', 'Author': 'NCERT', 'source': 'raw/k8_science.pdf', 'CreationDate': 'D:20171130101551', 'file_path': 'raw/k8_science.pdf', 'page': 10, 'ModDate': 'D:20240623043247Z', 'total_pages': 182, 'Title': 'Chapter-1.pmd', 'pk': 460242899926532024}, page_content='LangGraph is the best framework for building stateful, agentic applications!')]
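
The two filtered searches above can be pictured as "keep only the rows whose metadata matches, then rank what is left by distance". The following sketch shows that idea in plain Python. It is a conceptual illustration rather than how Milvus evaluates `expr`: `filtered_search` is a hypothetical helper, the filter is a Python predicate instead of an expression string, and the rows (including the `Chapter-2.pmd` title) are made up.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(rows, query, k, predicate):
    """Keep rows whose metadata satisfies the predicate, then rank by distance."""
    candidates = [row for row in rows if predicate(row["metadata"])]
    candidates.sort(key=lambda row: squared_l2(row["vector"], query))
    return [row["text"] for row in candidates[:k]]


# Made-up rows with 2-dimensional vectors
rows = [
    {"text": "doc A", "vector": [0.0, 1.0], "metadata": {"Title": "Chapter-1.pmd"}},
    {"text": "doc B", "vector": [0.1, 0.9], "metadata": {"Title": "Chapter-2.pmd"}},
    {"text": "doc C", "vector": [1.0, 0.0], "metadata": {"Title": "Chapter-1.pmd"}},
]
query = [0.1, 0.9]

# Similar in spirit to expr='Title == "Chapter-1.pmd"'
print(filtered_search(rows, query, k=2, predicate=lambda m: m["Title"] == "Chapter-1.pmd"))
# A filter that matches nothing yields an empty result, as with Title == "LangChain" above
print(filtered_search(rows, query, k=2, predicate=lambda m: m["Title"] == "LangChain"))
```

Note that "doc B" is the closest vector to the query but is excluded by the filter, so the filter can change which documents are returned, not just how many.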

In [54]:
# Similarity search with score:

results = vector_store.similarity_search_with_score(
    "LangChain provides abstractions to make working with LLMs easy",
    k=1,
    expr='Title == "Chapter-1.pmd"'
)

results

[(Document(metadata={'page': 10, 'source': 'raw/k8_science.pdf', 'Author': 'NCERT', 'file_path': 'raw/k8_science.pdf', 'total_pages': 182, 'Creator': 'PageMaker 7.0', 'CreationDate': 'D:20171130101551', 'ModDate': 'D:20240623043247Z', 'Title': 'Chapter-1.pmd', 'pk': 460242899926532019}, page_content='Building an exciting new project with LangChain - come check it out!'),
  1.143980860710144)]
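
With `metric_type` set to `L2`, a smaller score means a closer match. The sketch below shows how such a distance relates to cosine similarity, under two assumptions that are worth checking against the Milvus metric documentation: that the embeddings are unit length, and that the reported score is the squared Euclidean distance. The helper names are illustrative only.

```python
import math


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    # For unit-length inputs the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))


a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])

distance = squared_l2(a, b)
print(distance, 2 - 2 * cosine(a, b))  # the two values agree up to rounding
```

If both assumptions hold, the identity `distance = 2 - 2 * cosine` would put the score of about 1.14 above at a cosine similarity of roughly 0.43.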

### **Query by turning into retriever:**

In [58]:
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})

retriever.invoke("Stealing from the bank is a crime")

[Document(metadata={'total_pages': 182, 'file_path': 'raw/k8_science.pdf', 'Author': 'NCERT', 'CreationDate': 'D:20171130101551', 'source': 'raw/k8_science.pdf', 'Creator': 'PageMaker 7.0', 'ModDate': 'D:20240623043247Z', 'Title': 'Chapter-1.pmd', 'page': 10, 'pk': 460242899926532020}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]

In [61]:
# by metadata filter:

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 1,
        "expr": 'Title == "Chapter-1.pmd"'   # use expr here
    }
)

results = retriever.invoke("Stealing from the bank is a crime")
results

[Document(metadata={'Title': 'Chapter-1.pmd', 'CreationDate': 'D:20171130101551', 'page': 10, 'file_path': 'raw/k8_science.pdf', 'ModDate': 'D:20240623043247Z', 'source': 'raw/k8_science.pdf', 'total_pages': 182, 'Author': 'NCERT', 'Creator': 'PageMaker 7.0', 'pk': 460242899926532020}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
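
The `mmr` search type stands for maximal marginal relevance, which trades relevance to the query against redundancy among the results already picked. Below is a minimal greedy sketch of that idea in plain Python. It is not LangChain's implementation: the `mmr` function, its `lambda_mult` default, and the toy vectors are assumptions for illustration, and dot products are used as similarity on approximately unit-length vectors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mmr(query, candidates, k, lambda_mult=0.5):
    """Greedy maximal marginal relevance; returns candidate indices in selection order."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = dot(query, candidates[i])
            # Similarity to the closest already-selected item (0 if none selected yet)
            redundancy = max((dot(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.0],    # identical to the query
    [0.995, 0.1],  # near-duplicate of the first candidate
    [0.6, 0.8],    # less relevant, but different
]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # pure relevance
print(mmr(query, candidates, k=2, lambda_mult=0.3))  # favours diversity
```

With `k=1`, as in the cells above, the first pick is simply the most relevant document, so MMR behaves like a plain similarity search; the diversity term only matters from the second result onwards.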

### **Hybrid Search:**

The most common hybrid search scenario is the **`dense`** + **`sparse hybrid search`**, where candidates are retrieved using both semantic vector similarity and precise keyword matching. Results from these methods are **`merged`**, **`reranked`**, and **`passed to an LLM`** to generate the final answer. This approach balances precision and semantic understanding, making it highly effective for diverse query scenarios.

In [62]:
from langchain_milvus import BM25BuiltInFunction, Milvus

In [63]:
hybrid_vector_store = Milvus.from_documents(
    documents=documents,
    embedding=embeddings,
    builtin_function=BM25BuiltInFunction(),
    # `dense` is for OpenAI embeddings, `sparse` is the output field of BM25 function
    vector_field=["dense", "sparse"],
    connection_args={
        "uri": URI,
    },
    collection_name='my_collection',
    consistency_level="Strong",
    drop_old=False,
)

hybrid_vector_store

<langchain_milvus.vectorstores.milvus.Milvus at 0x29b57d40910>
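
The `sparse` field is filled by a BM25 function, which scores documents by keyword overlap with the query. To show what kind of signal that adds next to the dense embeddings, here is a small sketch of the classic BM25 formula in plain Python. It is a simplified illustration, not what Milvus runs internally: tokenization is a plain whitespace split, and `k1=1.5` and `b=0.75` are commonly used values chosen here as assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the classic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Shortened versions of three sample documents
docs = [
    "Building an exciting new project with LangChain",
    "The top 10 soccer players in the world right now",
    "The stock market is down 500 points today",
]
scores = bm25_scores("langchain project", docs)
print(scores)
```

Only the first document contains the query terms, so it is the only one with a non-zero score. This exact-match behaviour is what complements the semantic matching of the dense vectors.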

In [64]:
# Hybrid search with weighted reranking:

query = "What is LangChain?"

result = hybrid_vector_store.similarity_search(
    query=query,
    k=3,
    ranker_type="weighted", 
    ranker_params={"weights": [0.6, 0.4]}
)

result

[Document(metadata={'file_path': 'raw/k8_science.pdf', 'source': 'raw/k8_science.pdf', 'total_pages': 182, 'Author': 'NCERT', 'Creator': 'PageMaker 7.0', 'page': 10, 'CreationDate': 'D:20171130101551', 'ModDate': 'D:20240623043247Z', 'Title': 'Chapter-1.pmd', 'pk': 460242899926532030}, page_content='Building an exciting new project with LangChain - come check it out!'),
 Document(metadata={'file_path': 'raw/k8_science.pdf', 'source': 'raw/k8_science.pdf', 'total_pages': 182, 'Author': 'NCERT', 'Creator': 'PageMaker 7.0', 'page': 10, 'CreationDate': 'D:20171130101551', 'ModDate': 'D:20240623043247Z', 'Title': 'Chapter-1.pmd', 'pk': 460242899926532033}, page_content='Is the new iPhone worth the price? Read this review to find out.'),
 Document(metadata={'file_path': 'raw/k8_science.pdf', 'source': 'raw/k8_science.pdf', 'total_pages': 182, 'Author': 'NCERT', 'Creator': 'PageMaker 7.0', 'page': 10, 'CreationDate': 'D:20171130101551', 'ModDate': 'D:20240623043247Z', 'Title': 'Chapter-1.pmd'

In [66]:
result[0].metadata

{'file_path': 'raw/k8_science.pdf',
 'source': 'raw/k8_science.pdf',
 'total_pages': 182,
 'Author': 'NCERT',
 'Creator': 'PageMaker 7.0',
 'page': 10,
 'CreationDate': 'D:20171130101551',
 'ModDate': 'D:20240623043247Z',
 'Title': 'Chapter-1.pmd',
 'pk': 460242899926532030}
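
The `weighted` ranker used above merges the dense and sparse result lists into one ranking. A rough sketch of that merging step follows. It is an approximation under stated assumptions: scores are rescaled with min-max normalization as a stand-in for whatever normalization Milvus applies, higher scores are treated as better on both routes, and `weighted_fusion` and the sample scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def weighted_fusion(dense, sparse, weights, k):
    """Combine two score mappings (higher = better) with a weighted sum."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    fused = {
        doc_id: weights[0] * dense_n.get(doc_id, 0.0) + weights[1] * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)[:k]


# Hypothetical scores from the two retrieval routes
dense = {"a": 0.9, "b": 0.8, "c": 0.1}
sparse = {"b": 7.0, "c": 3.0, "d": 1.0}

print(weighted_fusion(dense, sparse, weights=[0.6, 0.4], k=3))
```

Document `b` ranks first because it scores well on both routes, even though `a` has the best dense score. Normalizing before the weighted sum matters because raw dense and sparse scores live on different scales.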