# VectorDB

The VectorDB interface allows you to insert vector embeddings into a vector database. These records can later be queried with a nearest-neighbour search, which returns the most similar records stored in the database.

To insert and query records, we need to:

1. Create embeddings for our nodes
1. Insert the embeddings along with node metadata into the vector database
1. Take an input query and convert it into a vector embedding
1. Query the VectorDB with the input embedding and get similar nodes back
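The four steps above can be illustrated without any external dependency. The sketch below is a toy: it assumes a hypothetical bag-of-words `embed` function in place of a real embedding model and a plain Python list in place of the vector database. It is not how bodhilib or Qdrant work internally, but it follows the same embed, insert, embed-query, query flow.

```python
import math
import re
from collections import Counter

# Hypothetical toy embedder: word counts over a fixed vocabulary (stand-in for a real model).
VOCAB = ["cat", "dog", "pet", "stock", "market", "price"]

def embed(text: str) -> list[float]:
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

# Step 1: create embeddings for our "nodes"
texts = ["the cat is a pet", "the dog is a loyal pet", "stock market price drops"]
# Step 2: insert the embeddings together with their text into an in-memory "database"
db = [{"text": t, "embedding": embed(t)} for t in texts]
# Step 3: embed the input query with the same embedder
query_vec = embed("which pet is a cat")
# Step 4: rank the stored records by similarity to the query
results = sorted(db, key=lambda rec: cosine(query_vec, rec["embedding"]), reverse=True)
print(results[0]["text"])  # the cat is a pet
```

Using the same embedder for documents and queries matters: similarity is only meaningful when both vectors live in the same embedding space.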

We have already covered task #1, converting a Node into an embedding, in the [Embedder](Embedder) guide. In this guide we will work through tasks #2, #3, and #4.

For this we need to:

1. Prepare our Node embeddings as we did in the [Embedder](Embedder) guide
1. Get a VectorDB instance using the `get_vector_db` method, passing in its configuration
1. Insert our nodes into the vector database using the `upsert` method
1. Embed an input query using the same [Embedder](Embedder)
1. Query the VectorDB using the `query` method

For this Getting Started guide, we are going to use the Qdrant vector database in in-memory mode. Let's get started.

```python
# ensure the qdrant dependencies are installed
!pip install -q qdrant-client
```

```python
# ensure the bodhiext.qdrant plugin is installed
# !pip install -q bodhiext.qdrant
```

```python
# Prepare the node embeddings for Paul Graham's essays:
# 1. Load the essays from the data/data-loader directory using the `file` DataLoader
# 2. Split them into Nodes using the `text_splitter` Splitter
# 3. Enrich the Nodes with embeddings using the `sentence_transformers` Embedder
from pathlib import Path
from bodhilib import get_data_loader, get_splitter, get_embedder

# Resolve the data directory relative to the current working directory and add it to the data loader
current_dir = Path.cwd()
data_dir = current_dir / ".." / "data" / "data-loader"
data_loader = get_data_loader("file")
data_loader.add_resource(dir=str(data_dir))
docs = data_loader.load()
splitter = get_splitter("text_splitter", max_len=300, overlap=30)
nodes = splitter.split(docs)
embedder = get_embedder("sentence_transformers")
_ = embedder.embed(nodes)
```
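`get_splitter("text_splitter", max_len=300, overlap=30)` cuts the documents into overlapping chunks, so text near a chunk boundary also appears in the neighbouring node. The sketch below illustrates the idea with a word-based sliding window. Treating `max_len` and `overlap` as word counts is an assumption; bodhilib's splitter may measure length and pick boundaries differently.

```python
def split_words(text: str, max_len: int = 300, overlap: int = 30) -> list[str]:
    """Split text into chunks of at most `max_len` words, where consecutive
    chunks share `overlap` words (an assumed, simplified splitting rule)."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    words = text.split()
    step = max_len - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_len]))
        if start + max_len >= len(words):  # this window already reached the end
            break
    return chunks

text = " ".join(f"w{i}" for i in range(10))
print(split_words(text, max_len=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap trades a little duplicated storage for a lower chance that an answer is split across two chunks and missed by the search.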

```python
# Get an instance of Vector DB
from bodhilib import get_vector_db

vector_db = get_vector_db("qdrant", location=":memory:")
```

```python
from bodhilib import Distance

collection_name = "test_collection"
# Start from a clean collection if one with the same name already exists
if collection_name in vector_db.get_collections():
    vector_db.delete_collection(collection_name)
vector_db.create_collection(collection_name=collection_name, dimension=embedder.dimension, distance=Distance.COSINE)
```

```
True
```
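The collection is created with `dimension=embedder.dimension` and `distance=Distance.COSINE`: every stored vector must have the same length as the embedder's output, and records are compared by the cosine of the angle between their vectors. The pure-Python sketch below illustrates the metric and the length check; it is not Qdrant's implementation.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        # A collection created with a fixed dimension cannot compare vectors of other lengths.
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0
# Scaling a vector does not change its cosine score
print(math.isclose(cosine_similarity([1.0, 1.0], [2.0, 2.0]), 1.0))  # True
```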

```python
# Insert the Node embeddings; upsert enriches the Nodes with their database record ids
_ = vector_db.upsert(collection_name, nodes)
```

```python
# the nodes have been enriched with the database record identifier `id`
print(nodes[0].id)
```

```
db49994d-3fc2-4c02-bf12-00319f212339
```
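The printed value shows that `upsert` filled in the Node's `id` with a UUID string. The sketch below mimics that observed behaviour with a toy in-memory store: records without an id receive a fresh `uuid4`, and records that already have one overwrite the stored entry (the "update" half of upsert). `Record` and `upsert` are hypothetical stand-ins, not bodhilib's `Node` or Qdrant's API.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    """Hypothetical stand-in for a Node: text, embedding and an optional database id."""
    text: str
    embedding: list[float]
    id: Optional[str] = None

def upsert(store: dict[str, Record], records: list[Record]) -> list[Record]:
    """Insert new records and overwrite existing ones, keyed by id."""
    for record in records:
        if record.id is None:
            record.id = str(uuid.uuid4())  # new record: assign a fresh id in place
        store[record.id] = record  # an existing id is overwritten, not duplicated
    return records

store: dict[str, Record] = {}
saved = upsert(store, [Record("first chunk", [0.1, 0.2]), Record("second chunk", [0.3, 0.4])])
print(saved[0].id)  # a random UUID string, different on every run

saved[0].text = "first chunk (edited)"
upsert(store, [saved[0]])  # reuses the id, so the store still holds two records
print(len(store))  # 2
```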


```python
# Embed the input query using the same Embedder
input_query = "According to Paul Graham, how to tackle when you are in doubt?"
embedding = embedder.embed(input_query)
```

```python
type(embedding)
```

```
list
```
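`embed` is called with a list of Nodes earlier and with a single string here, and it returns a `list` even for the single string; that is why the query vector is read with `embedding[0].embedding`. The sketch below shows that calling convention with hypothetical `ToyNode` and `toy_embed` names and a fake two-number "embedding"; it is not bodhilib's Embedder.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class ToyNode:
    """Hypothetical node: text plus an embedding filled in by the embedder."""
    text: str
    embedding: Optional[list[float]] = None

def toy_embed(inputs: Union[str, ToyNode, list[Union[str, ToyNode]]]) -> list[ToyNode]:
    """Normalise any input to a list of nodes, then attach a (fake) embedding to each."""
    if not isinstance(inputs, list):
        inputs = [inputs]
    nodes = [item if isinstance(item, ToyNode) else ToyNode(text=item) for item in inputs]
    for node in nodes:
        # Placeholder "embedding": text length and vowel count, NOT a real model.
        node.embedding = [float(len(node.text)), float(sum(ch in "aeiou" for ch in node.text))]
    return nodes

embedded = toy_embed("how to tackle doubt?")
print(type(embedded).__name__, len(embedded))  # list 1
query_vector = embedded[0].embedding  # same access pattern as embedding[0].embedding
print(query_vector)  # [20.0, 6.0]
```

Always returning a list keeps one code path for single queries and batches of nodes.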

```python
# `embed` returns a list; query the vector db with the embedding of its first (and only) item,
# asking for at most 5 similar records
result = vector_db.query(collection_name, embedding[0].embedding, limit=5)
```
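`limit=5` asks for at most five stored records closest to the query, and the results are expected to come back ordered from most to least similar (the guide prints `result[0]` as the best match). For a small collection this is equivalent to scoring every vector and keeping the top k, as in the brute-force sketch below; Qdrant can use an approximate HNSW index for large collections, so treat this only as a model of the result, not of how Qdrant computes it.

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length (assumes a non-zero vector) so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def top_k(query: list[float], vectors: dict[str, list[float]], limit: int = 5) -> list[tuple[str, float]]:
    """Brute-force k-nearest-neighbour search: score every vector, keep the best `limit`."""
    q = normalize(query)
    scores = ((key, sum(a * b for a, b in zip(q, normalize(vec)))) for key, vec in vectors.items())
    return heapq.nlargest(limit, scores, key=lambda item: item[1])

vectors = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
for key, score in top_k([1.0, 0.1], vectors, limit=2):
    print(key, round(score, 3))
# a 0.995
# b 0.856
```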

```python
import textwrap

print(textwrap.fill(result[0].text, 100))
```

```
who sits back and offers sophisticated-sounding criticisms of them. "It's easy to criticize" is true
in the most literal sense, and the route to great work is never easy.  There may be some jobs where
it's an advantage to be cynical and pessimistic, but if you want to do great work it's an advantage
to be optimistic, even though that means you'll risk looking like a fool sometimes. There's an old
tradition of doing the opposite. The Old Testament says it's better to keep quiet lest you look like
a fool. But that's advice for seeming smart. If you actually want to discover new things, it's
better to take the risk of telling people your ideas.  Some people are naturally earnest, and with
others it takes a conscious effort. Either kind of earnestness will suffice. But I doubt it would be
possible to do great work without being earnest. It's so hard to do even if you are. You don't have
enough margin for error to accommodate the distortions introduced by being affected, intellectually
dish
```

---
🎉 We just inserted our documents into a VectorDB and queried them.

Finally, let’s see how we can use an LLM to perform extractive Q&A and get a direct answer from the retrieved records.
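Before an LLM can answer from the retrieved records, their text has to be placed in the prompt alongside the question. The sketch below shows one simple way to build such a prompt; the template wording and the `build_prompt` helper are hypothetical and are not the prompt bodhilib uses.

```python
def build_prompt(question: str, contexts: list[str], max_contexts: int = 3) -> str:
    """Join the top retrieved chunks and the question into one extractive-QA prompt."""
    numbered = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(contexts[:max_contexts], start=1)
    )
    return (
        "Answer the question using only the passages below. "
        "Quote the passage text that supports your answer.\n\n"
        f"{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "According to Paul Graham, how to tackle when you are in doubt?",
    ["passage one text", "passage two text"],
)
print(prompt)
```

In this notebook, `contexts` would be the retrieved texts, for example `[r.text for r in result]`.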