# Lesson 4: Vector Databases

**Objective**: Build a retrieval system that efficiently searches for relevant document chunks.

**Topics**:
- Sparse vs. dense retrieval methods
- Hybrid search methods (e.g., combining BM25 with dense retrieval)
- Overview of vector databases: Milvus, Faiss, Qdrant

**Practical Task**: Set up a vector database and implement a retrieval method.

**Resources**:
- What is a vector database
- Choosing a vector database


# 1. Vector Store from Scratch

In [1]:
#!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

## Imports

In [1]:
from typing import  Any, cast

from llama_index.core import VectorStoreIndex
from llama_index.core.bridge.pydantic import Field
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import TextNode, BaseNode
from llama_index.core.vector_stores import MetadataFilters, VectorStoreQuery, VectorStoreQueryResult
from llama_index.core.vector_stores.types import BasePydanticVectorStore
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.readers.file import PyMuPDFReader

import numpy as np



## Setup

We load in some documents, and parse them into `Node` objects - chunks that are ready to be inserted into a vector store.

### Load in Documents

In [2]:
loader = PyMuPDFReader()
documents = loader.load(file_path="./data/llama2.pdf")

### Parse into Nodes

In [3]:
node_parser = SentenceSplitter(chunk_size=256)
nodes = node_parser.get_nodes_from_documents(documents)
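
`SentenceSplitter` hides the chunking logic. As a rough illustration of the idea only (this is not LlamaIndex's algorithm, which budgets by tokens and also handles overlap between chunks), a greedy sentence packer might look like the sketch below. The function name `split_into_chunks`, the regex-based sentence boundary, and the word-based `max_words` budget are all assumptions made for this example.

```python
import re


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    # Naive sentence boundary: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Flush the current chunk when this sentence would exceed the budget.
        if current and current_len + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "Vector stores hold embeddings. Each chunk becomes one node. "
    "Small chunks retrieve precisely."
)
chunks = split_into_chunks(sample, max_words=9)
for chunk in chunks:
    print(repr(chunk))
```

A sentence longer than the budget still becomes its own chunk here; a real splitter would need a fallback for that case.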

### Generate Embeddings for each Node

In [4]:
embed_model  = HuggingFaceEmbedding(model_name = "BAAI/bge-small-en")

2025-11-07 19:48:06,966 - INFO - Load pretrained SentenceTransformer: BAAI/bge-small-en
2025-11-07 19:48:12,348 - INFO - 1 prompt is loaded, with the key: query


In [5]:
for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding

## Build a Simple In-Memory Vector Store

Now we'll build our in-memory vector store. We'll store nodes in a plain Python dictionary, starting with embedding search and then adding metadata filters.

## 1. Defining the Interface
We'll first define the interface for building a vector store. It contains the following methods:
- get
- add
- delete
- query
- persist (which we will not implement)

In [6]:
class BaseVectorStore(BasePydanticVectorStore):
    """Simple custom Vector Store.

    Stores documents in a simple in-memory dict.

    """

    stores_text: bool = True

    @property
    def client(self) -> Any:
        """Get client."""
        return None

    def get(self, text_id: str) -> list[float]:
        """Get embedding."""
        pass

    def add(
        self,
        nodes: list[BaseNode],
    ) -> list[str]:
        """Add nodes to index."""
        pass

    def delete(self, ref_doc_id: str, **delete_kwargs: Any) -> None:
        """
        Delete nodes using ref_doc_id.

        Args:
            ref_doc_id (str): The doc_id of the document to delete.

        """
        pass

    def query(
        self,
        query: VectorStoreQuery,
        **kwargs: Any,
    ) -> VectorStoreQueryResult:
        """Get nodes for response."""
        pass

    def persist(self, persist_path, fs=None) -> None:
        """Persist the SimpleVectorStore to a directory.

        NOTE: we are not implementing this for now.

        """
        pass


At a high level, we subclass LlamaIndex's base vector store abstraction (`BasePydanticVectorStore`). There's no inherent reason to do this if you're just building a vector store from scratch; we do it because it makes it easy to plug into the downstream abstractions later.

Let's look at some of the classes defined here.

`BaseNode` is simply the parent class of our core Node modules. Each Node represents a text chunk + associated metadata.
We also use some lower-level constructs, for instance our `VectorStoreQuery` and `VectorStoreQueryResult`. These are just lightweight dataclass containers to represent queries and results. We look at the dataclass fields below.
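
As a rough guide to what these containers hold, here is a simplified sketch limited to the fields this notebook actually uses. The class names `SketchQuery` and `SketchQueryResult` are hypothetical stand-ins, the defaults are illustrative, and the real LlamaIndex classes carry additional fields.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SketchQuery:
    """Hypothetical stand-in for VectorStoreQuery (only the fields used here)."""

    query_embedding: Optional[list[float]] = None  # dense vector of the query text
    similarity_top_k: int = 1                      # number of results to return
    filters: Optional[Any] = None                  # optional metadata filters


@dataclass
class SketchQueryResult:
    """Hypothetical stand-in for VectorStoreQueryResult."""

    nodes: list[Any] = field(default_factory=list)           # matched nodes
    similarities: list[float] = field(default_factory=list)  # one score per node
    ids: list[str] = field(default_factory=list)             # one ID per node


example_query = SketchQuery(query_embedding=[0.1, 0.2], similarity_top_k=2)
empty_result = SketchQueryResult()
print(example_query)
print(empty_result)
```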

## 2. Defining add, get, and delete
We add some basic capabilities to add, get, and delete from a vector store.

The implementation is very simple: everything is stored in a Python dictionary keyed by node ID.

In [7]:
class VectorStore2(BaseVectorStore):
    """VectorStore2 (add/get/delete implemented)."""

    stores_text: bool = True
    node_dict: dict[str, BaseNode] = Field(default_factory=dict)

    def get(self, text_id: str) -> BaseNode:
        """Get the stored node (including its embedding) by ID."""
        return self.node_dict[text_id]

    def add(
        self,
        nodes: list[BaseNode],
    ) -> list[str]:
        """Add nodes to index and return their IDs."""
        for node in nodes:
            self.node_dict[node.node_id] = node
        return [node.node_id for node in nodes]

    def delete(self, node_id: str, **delete_kwargs: Any) -> None:
        """
        Delete the node with the given node_id.

        Args:
            node_id (str): The ID of the node to delete.

        """
        del self.node_dict[node_id]

In [8]:
test_node = TextNode(id_="id1", text="hello world")
test_node2 = TextNode(id_="id2", text="foo bar")
test_nodes = [test_node, test_node2]

In [9]:
vector_store = VectorStore2()

vector_store.add(test_nodes)

In [10]:
node = vector_store.get("id1")
print(str(node))

Node ID: id1
Text: hello world


## 3.a Defining query (semantic search)
We implement a basic version of top-k semantic search. It compares the query embedding against every document embedding and computes the cosine similarity for each pair. The top-k documents by cosine similarity are returned.

Cosine similarity: $\dfrac{\vec{d} \cdot \vec{q}}{\|\vec{d}\| \, \|\vec{q}\|}$ for every document, query embedding pair $\vec{d}$, $\vec{q}$.

NOTE: The top-k value is contained in the VectorStoreQuery container.

NOTE: Similar to the above, we define another subclass just so we don't have to reimplement the above functions (not because this is actually good code practice).
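
To make the formula concrete, here is a small worked sketch in plain Python (no NumPy) on two-dimensional vectors. The helper name `cosine_similarity` is chosen for this example, and it assumes neither vector is all zeros, since that would divide by zero.

```python
import math


def cosine_similarity(d: list[float], q: list[float]) -> float:
    """Dot product of d and q divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(x * x for x in q))
    return dot / (norm_d * norm_q)


# Same direction, different magnitude: similarity close to 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite direction: similarity -1.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(same_direction, orthogonal, opposite)
```

Because the score depends only on direction, scaling a document embedding does not change its ranking.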

In [11]:
def get_top_k_embeddings(
    query_embedding: list[float],
    doc_embeddings: list[list[float]],
    doc_ids: list[str],
    similarity_top_k: int = 5,
) -> tuple[list[float], list]:
    """Get top nodes by similarity to the query."""
    # dimensions: D
    q_embed_np = np.array(query_embedding)
    # dimensions: N x D
    d_embed_np = np.array(doc_embeddings)
    # dimensions: N
    d_product_arr = np.dot(d_embed_np, q_embed_np)
    # dimensions: N
    norm_arr = np.linalg.norm(q_embed_np) * np.linalg.norm(
        d_embed_np, axis=1, keepdims=False
    )
    # dimensions: N
    cos_sim_arr = d_product_arr / norm_arr

    # now we have the N cosine similarities for each document
    # sort by top k cosine similarity, and return ids
    tups = [(cos_sim_arr[i], doc_ids[i]) for i in range(len(doc_ids))]
    sorted_tups = sorted(tups, key=lambda t: t[0], reverse=True)

    sorted_tups = sorted_tups[:similarity_top_k]

    result_similarities = [s for s, _ in sorted_tups]
    result_ids = [n for _, n in sorted_tups]
    return result_similarities, result_ids
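
The function above sorts all N scores before slicing off the top k. When k is much smaller than N, a heap-based selection avoids the full sort. The sketch below shows that alternative using only the standard library; `top_k_by_score` is a hypothetical helper, not part of the notebook's store.

```python
import heapq


def top_k_by_score(
    scores: list[float], doc_ids: list[str], k: int = 5
) -> tuple[list[float], list[str]]:
    """Return the k highest scores and their IDs, best first."""
    # nlargest keeps only k candidates in memory: O(N log k) instead of O(N log N).
    pairs = heapq.nlargest(k, zip(scores, doc_ids), key=lambda t: t[0])
    return [s for s, _ in pairs], [i for _, i in pairs]


top_scores, top_ids = top_k_by_score([0.2, 0.9, 0.5, 0.7], ["a", "b", "c", "d"], k=2)
print(top_scores, top_ids)
```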

In [12]:
class VectorStore3A(VectorStore2):
    """Implements semantic/dense search."""

    def query(
        self,
        query: VectorStoreQuery,
        **kwargs: Any,
    ) -> VectorStoreQueryResult:
        """Get nodes for response."""

        query_embedding = cast(list[float], query.query_embedding)
        doc_embeddings = [n.embedding for n in self.node_dict.values()]
        doc_ids = [n.node_id for n in self.node_dict.values()]

        similarities, node_ids = get_top_k_embeddings(
            query_embedding,
            doc_embeddings,
            doc_ids,
            similarity_top_k=query.similarity_top_k,
        )
        result_nodes = [self.node_dict[node_id] for node_id in node_ids]

        return VectorStoreQueryResult(
            nodes=result_nodes, similarities=similarities, ids=node_ids
        )

## 3.b. Supporting Metadata Filtering
The next extension is metadata filter support. We first narrow the candidate set to the documents that pass the metadata filters, and then run semantic search over that subset only.

For simplicity we use metadata filters for exact matching with an AND condition.

In [13]:
def filter_nodes(nodes: list[BaseNode], filters: MetadataFilters):
    """Keep only the nodes whose metadata matches every filter exactly."""
    filtered_nodes = []
    for node in nodes:
        matches = True
        for f in filters.filters:
            # A missing key or a different value fails the AND condition,
            # so there is no need to check the remaining filters.
            if f.key not in node.metadata or f.value != node.metadata[f.key]:
                matches = False
                break
        if matches:
            filtered_nodes.append(node)
    return filtered_nodes
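
To see the exact-match AND semantics in isolation, here is a minimal sketch on plain dictionaries. The record layout and the `matches_all` helper are made up for this example. It also shows that the comparison is type-sensitive: the string `"24"` does not match the integer `24`.

```python
def matches_all(metadata: dict, required: dict) -> bool:
    """True only if every required key is present with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in required.items()
    )


records = [
    {"id": "n1", "metadata": {"source": "23", "file_path": "./data/llama2.pdf"}},
    {"id": "n2", "metadata": {"source": "24", "file_path": "./data/llama2.pdf"}},
    {"id": "n3", "metadata": {"file_path": "./data/llama2.pdf"}},  # no "source" key
]

# String value: matches the record whose source is the string "24".
kept = [r["id"] for r in records if matches_all(r["metadata"], {"source": "24"})]
# Integer value: matches nothing, because "24" != 24.
kept_int = [r["id"] for r in records if matches_all(r["metadata"], {"source": 24})]

print(kept, kept_int)
```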

We apply `filter_nodes` as a first pass over the nodes before running semantic search.

In [14]:
def dense_search(query: VectorStoreQuery, nodes: list[BaseNode]):
    """Dense search."""
    query_embedding = cast(list[float], query.query_embedding)
    doc_embeddings = [n.embedding for n in nodes]
    doc_ids = [n.node_id for n in nodes]
    return get_top_k_embeddings(
        query_embedding,
        doc_embeddings,
        doc_ids,
        similarity_top_k=query.similarity_top_k,
    )

In [15]:
class VectorStore3B(VectorStore2):
    """Implements Metadata Filtering."""

    def query(
        self,
        query: VectorStoreQuery,
        **kwargs: Any,
    ) -> VectorStoreQueryResult:
        """Get nodes for response."""
        # 1. First filter by metadata
        nodes = list(self.node_dict.values())
        if query.filters is not None:
            nodes = filter_nodes(nodes, query.filters)
        if len(nodes) == 0:
            result_nodes = []
            similarities = []
            node_ids = []
        else:
            # 2. Then perform semantic search
            similarities, node_ids = dense_search(query, nodes)
            result_nodes = [self.node_dict[node_id] for node_id in node_ids]
        return VectorStoreQueryResult(
            nodes=result_nodes, similarities=similarities, ids=node_ids
        )

## 4. Load Data into our Vector Store
Let's load our text chunks into the vector store and run two types of queries against it: dense search alone, and dense search with metadata filters.

In [16]:
vector_store = VectorStore3B()
# load data into the vector stores
vector_store.add(nodes)

Define an example question and embed it.

In [17]:
query_str = "Can you tell me about the key concepts for safety finetuning"
query_embedding = embed_model.get_query_embedding(query_str)
print(query_embedding)

[-0.057459719479084015, 0.0006168404361233115, 0.007399792782962322, -0.02610773779451847, 0.010598315857350826, 0.00798750203102827, 0.06677503883838654, 0.03960995748639107, 0.0025795777328312397, -0.00429012905806303, -0.051723357290029526, -0.036634426563978195, 0.01641755737364292, 0.020253067836165428, 0.0051087080501019955, 0.003774326993152499, 0.032964076846838, 0.032009292393922806, -0.008439945057034492, 0.019312605261802673, -0.004506593104451895, 0.03124888800084591, -0.024012679234147072, -0.020971687510609627, 0.00496076513081789, -0.0026956028304994106, 0.017490491271018982, 0.002253093058243394, -0.017213137820363045, -0.19273661077022552, -0.04692847281694412, -0.03517276048660278, -0.02127939835190773, -0.007309691980481148, -0.02094404771924019, -0.008686757646501064, -0.02881697379052639, 0.031263578683137894, -0.004988396540284157, -0.003774901619181037, 0.04736080393195152, 0.017691638320684433, -0.008748934604227543, -0.009141998365521431, -0.01443556509912014, 

### Query the vector store with dense search.

In [18]:
query_obj = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2
)

query_result = vector_store.query(query_obj)
for similarity, node in zip(query_result.similarities, query_result.nodes):
    print(
        "\n----------------\n"
        f"[Node ID {node.node_id}] Similarity: {similarity}\n\n"
        f"{node.get_content(metadata_mode='all')}"
        "\n----------------\n\n"
    )


----------------
[Node ID 45ab3492-2272-4969-a3cc-54aa1a368822] Similarity: 0.8711294993852078

total_pages: 77
file_path: ./data/llama2.pdf
source: 23

Benchmarks give a summary view of model capabilities and behaviors that allow us to understand general
patterns in the model, but they do not provide a fully comprehensive view of the impact the model may have
on people or real-world outcomes; that would require study of end-to-end product deployments. Further
testing and mitigation should be done to understand bias and other social issues for the specific context
in which a system may be deployed. For this, it may be necessary to test beyond the groups available in
the BOLD dataset (race, religion, and gender). As LLMs are integrated and deployed, we look forward to
continuing research that will amplify their potential for positive impact on these important social issues.
4.2
Safety Fine-Tuning
In this section, we describe our approach to safety fine-tuning, including safety categori

### Query the vector store with dense search + Metadata Filters

In [19]:
# filters = MetadataFilters(
#     filters=[
#         ExactMatchFilter(key="page", value=3)
#     ]
# )
filters = MetadataFilters.from_dict({"source": "24"})

query_obj = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2, filters=filters
)

query_result = vector_store.query(query_obj)
for similarity, node in zip(query_result.similarities, query_result.nodes):
    print(
        "\n----------------\n"
        f"[Node ID {node.node_id}] Similarity: {similarity}\n\n"
        f"{node.get_content(metadata_mode='all')}"
        "\n----------------\n\n"
    )


----------------
[Node ID 459aba14-1d81-4948-b581-4605636f2520] Similarity: 0.8589715396624741

total_pages: 77
file_path: ./data/llama2.pdf
source: 24

We then define best practices for safe and helpful model responses: the model should first address immediate
safety concerns if applicable, then address the prompt by explaining the potential risks to the user, and finally
provide additional information if possible. We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2). The guidelines are meant to be a general guide for the model and are
iteratively refined and revised to include newly identified risks.
4.2.2
Safety Supervised Fine-Tuning
In accordance with the established guidelines from Section 4.2.1, we gather prompts and demonstrations
of safe model responses from trained annotators, and use the data for supervised fine-tuning in the same
manner as described in Section 3.1. An example can be found in Table 5.
The annotators are instructed to 


## Build a RAG System with the Vector Store
Now that we've built the vector store, it's time to plug it into the rest of the RAG pipeline!

In [20]:
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

In [21]:
llm = HuggingFaceLLM(
    model_name="google/gemma-3-270m", 
    tokenizer_name="google/gemma-3-270m",
    device_map="auto"
)

In [22]:
response = llm.complete("What is the meaning of life?")
print(str(response))

 Why is life worth having? How do we define it? How can we be sure that it is really worth having?

These are just a few questions the world of quantum physics has been asking for the past 25 years, and their answers are still getting better and better each and every day. Quantum physics is at the forefront of a wide array of scientific and technological innovations that impact many of our daily lives. If quantum physics is the answer to any of these questions, it will be a monumental step forward for humanity.

<h2><strong>15. Are you ever going to understand what you think?</strong></h2>

When we are asking the question, “What is life worth?” we are never truly getting to the point of actually understanding what life is worth. The world is full of complex things – things that we are never going to understand.

This brings us back to our original question. What is life worth? The first question is not going to be answered by us. The only way you will ever know what life is worth is by

In [23]:
query_engine = index.as_query_engine(llm=llm)

In [24]:
query_str = "Can you tell me about the key concepts for safety finetuning"

In [25]:
response = query_engine.query(query_str)
print(str(response))

3.2

2. Unsupervised Safety Fine-Tuning:
We use unsupervised safety fine-tuning to build more adversarial datasets with more context and
data about the topic under investigation.
Total Pages: 77
file_path: ./data/llama2.pdf
source: 23

The resulting
dataset is then further fine-tuned using supervised and unsupervised fine-tuning methods to reduce the
variation in model performance. This is done by training a model with training data from which it can
easily re-fit with new data from which it is not. It also allows the model to scale as the model is applied to
new datasets, such as human speech annotations. The process differs from that described in Section 3.2.
4.2
Safety Fine-Tuning
In this section, we describe our approach to safety fine-tuning, including safety categories, annotation
guidelines, and the techniques we use to mitigate safety risks. We employ a process similar to the general
fine-tuning methods as described in Section 3, with some notable differences related to safety 

## Conclusion
That's it! We've built a simple in-memory vector store that supports basic inserts, gets, and deletes, as well as dense search and metadata filtering. It can then be plugged into the rest of the LlamaIndex abstractions.

It doesn't support sparse search yet and is not meant for production use. But this should expose some of what's going on under the hood!

# 2. Vector Store Using Libraries

#### Load the dataset

In [28]:
from langchain_community.document_loaders import PyPDFLoader
from dotenv import load_dotenv

load_dotenv()

file_path = (
   "data/visual_instruction_tunning.pdf"
)
loader = PyPDFLoader(file_path)
docs = loader.load_and_split()

In [29]:
docs[0].page_content

'Visual Instruction Tuning\nHaotian Liu1∗, Chunyuan Li2∗, Qingyang Wu3, Yong Jae Lee1\n1University of Wisconsin–Madison 2Microsoft Research 3Columbia University\nhttps://llava-vl.github.io\nAbstract\nInstruction tuning large language models (LLMs) using machine-generated\ninstruction-following data has been shown to improve zero-shot capabilities on\nnew tasks, but the idea is less explored in the multimodal field. We present the\nfirst attempt to use language-only GPT-4 to generate multimodal language-image\ninstruction-following data. By instruction tuning on such generated data, we in-\ntroduce LLaV A:Large Language and Vision Assistant, an end-to-end trained\nlarge multimodal model that connects a vision encoder and an LLM for general-\npurpose visual and language understanding. To facilitate future research on visual\ninstruction following, we construct two evaluation benchmarks with diverse and\nchallenging application-oriented tasks. Our experiments show that LLaV A demon-\nstra

### Embeddings function

In [30]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
embedded_document = embedding_model.embed_query(docs[0].page_content)
embedded_document[:3]

2025-11-07 19:50:10,923 - INFO - Use pytorch device_name: mps
2025-11-07 19:50:10,923 - INFO - Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2


[0.027461927384138107, 0.04477216303348541, -0.015285043977200985]

# A first approach

In [31]:
from dotenv import load_dotenv

load_dotenv()

True

In [32]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

In [33]:
client = QdrantClient(path="/tmp/langchain_qdrant")

In [34]:
client.delete_collection(collection_name="demo_collection")

True

In [35]:
client.create_collection(
    collection_name="demo_collection",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name="demo_collection",
    embedding=embedding_model,
)

In [36]:
vector_store.add_documents(docs)


['176204090b3345b88b68ce641029cb90',
 'a54c19715ac14a22aab0f1da69167db1',
 '4469b9cee609494fb88a68b5193e1129',
 '4de77f3d5f7148c09072cb036768dd34',
 '2ce6b79d30bb4a9e99caeea4228481e9',
 'bddf4230fe264f779e7830f3b5a8ae8a',
 '079f1905fde5434b91bc866315cd0cce',
 '8ae80c13fa05483e84a9e8fba6d5f2c8',
 '5cddc057514648abb6f7b21aaa9034f9',
 '6cfff7eb883a48adb2c18ccbbd0b35d9',
 '4966de4c485d4ed898f3075ad6a8a91f',
 'b3afd7844f09490d9252ac71c4444f1d',
 '9644d426c3914243bc8f72bec4d404b7',
 '9617fe07f12741adb7108641bce6bc3a',
 '292c7d5cdc4749d3820b6c20f2a4c90e',
 'c6a2ab29ba964c4c94a2d8aa7dd192a6',
 'b666722b009b4429b965b21abeb6415a',
 '22d98c5553dc4bad99e47bbce01c51ce',
 '37542ce8c9364b9e8eb51cc2af4df218',
 '320a49502b42402aa4e525163f7e0cc2',
 '727f32fcd09749ed8e4494e039f0d6b4',
 '59709793229348f8a3687a4f75a3acc5',
 'bd605146ca904ffeaed506679a999028',
 '25998e552ce8455e9162b8a1c8b68994',
 '44a08f65cf1c4f95bc389bc0ef34e90f',
 'd81081a5f2394dca9a159165ba18bb76',
 'b7f2947ff8574267ac91bef3f4ff9acb',
 

In [37]:
client.scroll(collection_name="demo_collection", limit=3)

([Record(id='079f1905fde5434b91bc866315cd0cce', payload={'page_content': 'H v\nImageLanguage Instruction\nLanguage Response \n<latexit sha1_base64="...">...</latexit>\nH q\n...

# Dense search

In [38]:
from langchain_qdrant import RetrievalMode

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embedding_model,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.DENSE,
)

In [39]:
query = "What datasets they used to benchmark LLAVA?"
found_docs = qdrant.similarity_search(query)

In [40]:
found_docs

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': '7a406f0fdf1642a9a57a3a33133636ba', '_collection_name': 'my_documents'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gradi

In [41]:
retriever = qdrant.as_retriever()
retriever.invoke(query)

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': '7a406f0fdf1642a9a57a3a33133636ba', '_collection_name': 'my_documents'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gradi

# Sparse Vector Search

To search with only sparse vectors:
- Set the `retrieval_mode` parameter to `RetrievalMode.SPARSE`.
- Pass an implementation of the `SparseEmbeddings` interface, backed by any sparse embeddings provider, as the `sparse_embedding` parameter.

The `langchain-qdrant` package provides a FastEmbed-based implementation out of the box.
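
For intuition about what a BM25-style sparse score measures, here is a minimal pure-Python sketch of the Okapi BM25 formula (with the non-negative IDF variant). It is an illustration under simplifying assumptions, not what FastEmbed or Qdrant run internally: tokenization is a lowercase whitespace split, the corpus must be non-empty, and `bm25_scores`, `k1`, and `b` are names and defaults chosen for this example.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


corpus = [
    "cosine similarity compares vector directions",
    "bm25 ranks documents by term overlap",
    "dense retrieval uses embeddings",
]
scores = bm25_scores("bm25 term overlap", corpus)
best_index = scores.index(max(scores))
print(best_index, scores)
```

Documents that share no term with the query score exactly zero, which is the main behavioral difference from dense retrieval.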

In [42]:
from langchain_qdrant import FastEmbedSparse, RetrievalMode

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25", cache_dir="cache")

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embedding_model,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.SPARSE,
)

In [43]:
found_docs = qdrant.similarity_search(query)

In [44]:
found_docs

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': '622942565ee04743966c76b7b42fa1aa', '_collection_name': 'my_documents'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gradi

# Hybrid Search

To perform a hybrid search using dense and sparse vectors with score fusion:
- Set the `retrieval_mode` parameter to `RetrievalMode.HYBRID`.
- Pass a dense embeddings value as the `embedding` parameter.
- Pass an implementation of the `SparseEmbeddings` interface, backed by any sparse embeddings provider, as the `sparse_embedding` parameter.

Note that if you've added documents in `HYBRID` mode, you can switch to any retrieval mode when searching, since both the dense and sparse vectors are available in the collection.
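
One common way to fuse a dense ranking and a sparse ranking is reciprocal rank fusion (RRF), which only needs each document's rank in each list. The sketch below illustrates that technique in plain Python. Whether it matches the fusion the library applies internally is not something this notebook confirms, and the function name and the constant `k=60` are assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; each hit contributes 1 / (k + rank)."""
    fused_scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused_scores[doc_id] = fused_scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused_scores, key=lambda doc_id: fused_scores[doc_id], reverse=True)


dense_ranking = ["a", "b", "c"]   # hypothetical result order from dense search
sparse_ranking = ["b", "d", "a"]  # hypothetical result order from sparse search
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, even if neither retriever ranked them first.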

In [45]:
from langchain_qdrant import FastEmbedSparse, RetrievalMode

sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=embedding_model,
    sparse_embedding=sparse_embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.HYBRID,
)

Fetching 18 files: 100%|██████████| 18/18 [00:01<00:00, 13.36it/s]


In [46]:
found_docs = qdrant.similarity_search(query)

In [47]:
found_docs

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': 'cee7ed6308184fe5a0770176ca5d54c7', '_collection_name': 'my_documents'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gradi

In [58]:
# To run a similarity search and get the corresponding scores:
# Note: `vector_store` is the dense `demo_collection` store created earlier, not the hybrid `qdrant` store.
results = vector_store.similarity_search_with_score(
    query=query, k=1
)
for doc, score in results:
    print(f"* [SIM={score:3f}] * {doc.page_content} [{doc.metadata}]")

* [SIM=0.445506] * C Training Details
We pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a
batch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a
learning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no
weight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full
Shard Data Parallel) and gradient checkpointing is used to save GPU memory, and offloading is not
used. BF16 and TF32 are enabled to achieve a balance between speed and precision.
We train all models with 8× A100s. Pretraining on CC-595K completes within 4 hours. Finetuning
on Instruct-158K completes within 10 hours. Finetuning on ScienceQA completes within 4 hours.
D Assets
Our source code, generated instruction-tuning data, proposed benchmark are uploaded to the
anonymized GitHub repository: LLaV A-Annonymous/LLaV A.
1. Source Code: link
2. README: link
3. In

# Metadata filtering

In [75]:
from qdrant_client.http import models

results = vector_store.similarity_search(
    query=query,
    k=1,
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* C Training Details
We pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a
batch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a
learning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no
weight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full
Shard Data Parallel) and gradient checkpointing is used to save GPU memory, and offloading is not
used. BF16 and TF32 are enabled to achieve a balance between speed and precision.
We train all models with 8× A100s. Pretraining on CC-595K completes within 4 hours. Finetuning
on Instruct-158K completes within 10 hours. Finetuning on ScienceQA completes within 4 hours.
D Assets
Our source code, generated instruction-tuning data, proposed benchmark are uploaded to the
anonymized GitHub repository: LLaV A-Annonymous/LLaV A.
1. Source Code: link
2. README: link
3. Instructions to lau

In [76]:
results

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': '9c1b515339ff4f05bf5eb8d0045d13f0', '_collection_name': 'demo_collection'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gr

In [79]:
results = vector_store.similarity_search(
    query=query,
    k=1,
    filter=models.Filter(
        must=[
            models.FieldCondition(
                key="metadata.producer",  # Note the "metadata." prefix!
                match=models.MatchValue(
                    value="pdfTeX-1.40.25"
                ),
            ),
        ]
    ),
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* C Training Details
We pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a
batch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a
learning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no
weight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full
Shard Data Parallel) and gradient checkpointing is used to save GPU memory, and offloading is not
used. BF16 and TF32 are enabled to achieve a balance between speed and precision.
We train all models with 8× A100s. Pretraining on CC-595K completes within 4 hours. Finetuning
on Instruct-158K completes within 10 hours. Finetuning on ScienceQA completes within 4 hours.
D Assets
Our source code, generated instruction-tuning data, proposed benchmark are uploaded to the
anonymized GitHub repository: LLaV A-Annonymous/LLaV A.
1. Source Code: link
2. README: link
3. Instructions to lau

In [80]:
results


[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': '9c1b515339ff4f05bf5eb8d0045d13f0', '_collection_name': 'demo_collection'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gr

## Query by turning into a retriever

In [70]:
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 5})
retriever.invoke(query)

[Document(metadata={'producer': 'pdfTeX-1.40.25', 'creator': 'LaTeX with hyperref', 'creationdate': '2023-12-14T01:02:53+00:00', 'author': '', 'keywords': '', 'moddate': '2023-12-14T01:02:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'data/visual_instruction_tunning.pdf', 'total_pages': 25, 'page': 19, 'page_label': '20', '_id': '9c1b515339ff4f05bf5eb8d0045d13f0', '_collection_name': 'demo_collection'}, page_content='C Training Details\nWe pre-train our model on the filtered CC-595K subset for 1 epoch with a learning rate of 2e-3 and a\nbatch size of 128, and fine-tune on the proposed LLaV A-Instruct-158K dataset for 3 epochs, with a\nlearning rate of 2e-5 and a batch size of 32. Following Vicuna, we use the Adam optimizer with no\nweight decay and a cosine learning rate with a warmup ratio of 3%. During finetuning, FSDP (Full\nShard Data Parallel) and gr
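
The `search_type="mmr"` setting above requests maximal marginal relevance (MMR), which trades relevance to the query against redundancy among the results already picked. The following is a minimal pure-Python sketch of that idea, not the library's implementation: `mmr_select` and `cosine` are hypothetical helpers, the vectors are toy data, and `lambda_mult` weights relevance (1.0 means pure relevance).

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mmr_select(
    query_vec: list[float],
    doc_vecs: list[list[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> list[int]:
    """Greedily pick k document indices by maximal marginal relevance."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalize similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.1],   # most relevant
    [1.0, 0.12],  # near-duplicate of the first
    [0.6, -0.8],  # less relevant, but different
]
diverse = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct document; with `lambda_mult=1.0` the selection reduces to plain top-k by similarity.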