# KDB.AI

> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.

[This example](https://github.com/KxSystems/kdbai-samples/blob/main/document_search/document_search.ipynb) demonstrates how to use KDB.AI to run semantic search on unstructured text documents.

To get your endpoint and API key, [sign up to KDB.AI here](https://kdb.ai/get-started/).

To set up your development environment, follow the instructions on the [KDB.AI pre-requisites page](https://code.kx.com/kdbai/pre-requisites.html).

The following examples demonstrate some of the ways you can interact with KDB.AI through LangChain.

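You'll need to install `langchain-community` with `pip install -qU langchain-community` to use this integration. The example below also imports `kdbai_client`, `langchain_openai`, `pandas` and `requests`, and `PyPDFLoader` additionally requires `pypdf`.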

## Import required packages

In [1]:
import os
import time
from getpass import getpass

import kdbai_client as kdbai
import pandas as pd
import requests
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import KDBAI
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

In [2]:
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)

# OpenAI API Key: https://platform.openai.com/api
os.environ["OPENAI_API_KEY"] = (
    os.environ["OPENAI_API_KEY"]
    if "OPENAI_API_KEY" in os.environ
    else getpass("OpenAI API Key: ")
)
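The cell above repeats the same "use the environment variable if set, otherwise prompt" pattern three times. Below is a minimal sketch of how that pattern could be factored into one helper. `get_secret` is a hypothetical name, not part of any library. Non-sensitive values such as the endpoint are echoed with `input`, and keys are read with `getpass` so they are not displayed.

```python
import os
from getpass import getpass


def get_secret(name: str, prompt: str, secret: bool = True) -> str:
    """Return environment variable `name`, prompting the user if it is unset.

    Hypothetical helper condensing the three conditional expressions above.
    """
    value = os.environ.get(name)
    if value is None:
        # Hide typed input for API keys; echo it for non-sensitive values like URLs.
        value = getpass(prompt) if secret else input(prompt)
    return value


# Possible usage, mirroring the cell above:
# KDBAI_ENDPOINT = get_secret("KDBAI_ENDPOINT", "KDB.AI endpoint: ", secret=False)
# KDBAI_API_KEY = get_secret("KDBAI_API_KEY", "KDB.AI API key: ")
# os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY", "OpenAI API Key: ")
```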


In [3]:
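TEMP = 0.0  # LLM sampling temperature (0.0 = most deterministic answers)
K = 3  # number of chunks the retriever returns per query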

## Create a KDB.AI Session

In [None]:
print("Create a KDB.AI session...")
session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)

Create a KDB.AI session...


## Create a table

In [13]:
print('Create table "documents"...')

# Set up the schema and index for the KDB.AI table: the embeddings column holds
# 1536-dimensional vectors (the size produced by text-embedding-ada-002),
# searched with Euclidean (L2) distance through an HNSW index
schema = [
    {"name": "id", "type": "str"},
    {"name": "text", "type": "bytes"},
    {"name": "embeddings", "type": "float32s"},
    {"name": "tag", "type": "str"},
    {"name": "title", "type": "bytes"}
]

indexes = [
    {
        "name": "vectorIndex",
        "type": "hnsw",
        "column": "embeddings",
        "params": {"dims": 1536, "metric": "L2"},
    }
]


Create table "documents"...
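The index above uses the `L2` (Euclidean) metric with `hnsw`, an approximate nearest-neighbour structure. For reference, here is a minimal sketch of the exact search that such an index approximates: score every stored vector by L2 distance and keep the `k` closest. It is pure Python with toy 3-dimensional vectors instead of 1536, and it is not KDB.AI's implementation.

```python
import heapq
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_l2(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> list[tuple[int, float]]:
    """Exact k-nearest-neighbour search: (row index, distance) pairs, closest first."""
    distances = ((i, l2_distance(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


# Toy 3-dimensional "embeddings"; the table above expects 1536 dimensions.
rows = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
print(knn_l2([1.0, 0.0, 0.0], rows, k=2))
```

This exhaustive scan costs one distance computation per stored vector per query. HNSW trades a small amount of recall for much faster queries on large tables. With only a handful of chunks, as in this notebook, the difference is negligible.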


In [14]:
# Get a handle to the database; the default database is named 'default'
db = session.database('default')

# Drop the table if it already exists, so the notebook can be re-run
try:
    db.table("documents").drop()
except kdbai.KDBAIException:
    pass

table = db.create_table("documents", schema=schema, indexes=indexes)

In [9]:
%%time
URL = "https://www.conseil-constitutionnel.fr/node/3850/pdf"
PDF = "Déclaration_des_droits_de_l_homme_et_du_citoyen.pdf"
open(PDF, "wb").write(requests.get(URL).content)

CPU times: user 14.8 ms, sys: 0 ns, total: 14.8 ms
Wall time: 395 ms


562978

## Read a PDF

In [10]:
%%time
print("Read a PDF...")
loader = PyPDFLoader(PDF)
pages = loader.load_and_split()
len(pages)

Read a PDF...
CPU times: user 316 ms, sys: 11.5 ms, total: 328 ms
Wall time: 347 ms


3
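`load_and_split()` loads the pages and then splits them with a text splitter (by default, LangChain's `RecursiveCharacterTextSplitter`). Here the three short pages produce three chunks. Below is a simplified sketch of the general idea: fixed-size character windows with overlap, so that text cut at a boundary also appears at the start of the next chunk. It is deliberately simpler than the recursive, separator-aware splitter LangChain uses, and the `chunk_size` and `chunk_overlap` values are illustrative only.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


article = "Les hommes naissent et demeurent libres et égaux en droits."
for chunk in chunk_text(article, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```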

## Create a Vector Database from PDF Text

In [15]:
%%time
print("Create a Vector Database from PDF text...")
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
texts = [p.page_content for p in pages]
metadata = pd.DataFrame(index=list(range(len(texts))))
metadata["tag"] = "law"
metadata["title"] = "Déclaration des Droits de l'Homme et du Citoyen de 1789".encode(
    "utf-8"
)
vectordb = KDBAI(table, embeddings)
vectordb.add_texts(texts=texts, metadatas=metadata)

Create a Vector Database from PDF text...
CPU times: user 60.8 ms, sys: 161 μs, total: 61 ms
Wall time: 1.28 s


['bc33c2cd-39bf-40a1-b732-b07a4ef41ee9',
 'ca0fd77c-1473-4a0e-a853-c75184bfbd9d',
 'b18dc68d-965f-4b96-a4ba-b449caf21141']
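Each of the three chunks becomes one row of the `documents` table, and `add_texts` returns the generated row ids. Below is a minimal sketch of how rows matching the schema above could be assembled. It rests on several assumptions: ids are random UUID4 strings (as the output above suggests), `text` and `title` are stored as UTF-8 bytes because the schema declares them as `bytes`, and embeddings are 32-bit floats (`float32s`). `build_rows` and `fake_embed` are hypothetical, and this is not the vector store's actual code.

```python
import uuid
from array import array
from typing import Callable


def build_rows(
    texts: list[str],
    embed: Callable[[str], list[float]],
    tag: str,
    title: str,
) -> list[dict]:
    """Build one record per text, shaped like the "documents" table schema."""
    rows = []
    for text in texts:
        rows.append(
            {
                "id": str(uuid.uuid4()),  # random id, like the ones returned above
                "text": text.encode("utf-8"),  # schema type "bytes"
                "embeddings": array("f", embed(text)),  # "f" = 32-bit float
                "tag": tag,  # schema type "str"
                "title": title.encode("utf-8"),  # schema type "bytes"
            }
        )
    return rows


def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for OpenAIEmbeddings: 3 dims instead of 1536.
    return [float(len(text)), float(text.count(" ")), 1.0]


rows = build_rows(
    [
        "Les hommes naissent et demeurent libres et égaux en droits.",
        "Le but de toute association politique est la conservation des droits.",
    ],
    fake_embed,
    tag="law",
    title="Déclaration des Droits de l'Homme et du Citoyen de 1789",
)
print([r["id"] for r in rows])
```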

## Create LangChain Pipeline

In [18]:
%%time
print("Create LangChain Pipeline...")
qabot = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=TEMP),
    retriever=vectordb.as_retriever(search_kwargs=dict(index='vectorIndex', k=K)),
    return_source_documents=True,
)

Create LangChain Pipeline...
CPU times: user 70.3 ms, sys: 0 ns, total: 70.3 ms
Wall time: 68.3 ms
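With `chain_type="stuff"`, the `K` documents returned by the retriever are "stuffed" into a single prompt together with the question, and the LLM is called once. Below is a minimal sketch of that prompt assembly. The template wording is an illustrative assumption, not the prompt LangChain actually uses, and `stuff_prompt` is a hypothetical name.

```python
def stuff_prompt(question: str, documents: list[str], separator: str = "\n\n") -> str:
    """Concatenate all retrieved documents into one context block for a single LLM call."""
    context = separator.join(documents)
    # Illustrative template; the real chain uses its own prompt wording.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(stuff_prompt("Is this law practical ?", ["Article 1 ...", "Article 2 ..."]))
```

The trade-off is simplicity (one call, no intermediate summaries) against context size: all `K` chunks plus the question must fit in the model's context window, which is about 16k tokens for `gpt-3.5-turbo-16k`.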


## Summarize the document in English

In [19]:
%%time
Q = "Summarize the document in English:"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



Summarize the document in English:

The document is the Declaration of the Rights of Man and of the Citizen of 1789. It was written by the representatives of the French people and aims to declare the natural, inalienable, and sacred rights of every individual. These rights include freedom, property, security, and resistance to oppression. The document emphasizes the importance of equality and the principle that sovereignty resides in the nation. It also highlights the role of law in protecting individual rights and ensuring the common good. The document asserts the right to freedom of thought, expression, and religion, as long as it does not disturb public order. It emphasizes the need for a public force to guarantee the rights of all citizens and the responsibility of public officials to be accountable to society. The document recognizes the right to private property, but also states that it can be taken away for the public good, with fair compensation. Overall, the document establi
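Because the chain was built with `return_source_documents=True`, the dict returned by `qabot.invoke(...)` contains a `"source_documents"` list alongside `"result"`. That list holds the retrieved LangChain `Document` objects, each with `page_content` and `metadata`. Below is a hedged sketch of a formatter that prints the answer with a short preview of its sources. It uses a small stand-in dataclass instead of LangChain's `Document` so it runs without dependencies. `format_answer` and `Doc` are hypothetical, and the metadata shown is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in exposing the two Document attributes used below."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_answer(response: dict, preview_chars: int = 60) -> str:
    """Render the answer followed by a whitespace-normalized preview of each source."""
    lines = [response["result"], "", "Sources:"]
    for i, doc in enumerate(response.get("source_documents", []), start=1):
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"  [{i}] {preview!r} {doc.metadata}")
    return "\n".join(lines)


# In the notebook: print(format_answer(qabot.invoke(dict(query=Q))))
fake_response = {
    "query": "Summarize the document in English:",
    "result": "The document is the Declaration of the Rights of Man and of the Citizen of 1789.",
    "source_documents": [
        Doc("Art. 1er.  Les hommes naissent\net demeurent libres", {"tag": "law"})
    ],
}
print(format_answer(fake_response))
```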

## Query the Data

In [20]:
%%time
Q = "Is it a fair law and why ?"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



Is it a fair law and why ?

As an AI language model, I don't have personal opinions. However, I can provide some analysis based on the given context. The text provided is an excerpt from the Declaration of the Rights of Man and of the Citizen of 1789, which is considered a foundational document in the history of human rights. It outlines the natural and inalienable rights of individuals, such as freedom, property, and resistance to oppression. 

Whether or not this law is considered fair is subjective and can vary depending on individual perspectives and societal norms. However, many people view the principles outlined in this declaration as fundamental to the concept of fairness and justice. It emphasizes the equality of all individuals, the importance of protecting individual rights, and the need for a just and accountable government. 

It is worth noting that this declaration was written in a specific historical and cultural context, and interpretations of fairness may differ acro

In [21]:
%%time
Q = "What are the rights and duties of the man, the citizen and the society ?"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



What are the rights and duties of the man, the citizen and the society ?

According to the Declaration of the Rights of Man and of the Citizen of 1789, the rights and duties of man, citizen, and society are as follows:

Rights of Man:
1. Men are born and remain free and equal in rights. Social distinctions can only be based on common utility.
2. The purpose of political association is to protect the natural and imprescriptible rights of man, which include liberty, property, security, and resistance to oppression.
3. Sovereignty resides essentially in the nation. No body or individual can exercise authority that does not emanate expressly from the nation.
4. Freedom consists of being able to do anything that does not harm others. The exercise of natural rights of each man has no limits other than those that ensure the enjoyment of these same rights by other members of society. These limits can only be determined by law.
5. The law has the right to prohibit only actions harmful to soci

In [22]:
%%time
Q = "Is this law practical ?"
print(f"\n\n{Q}\n")
print(qabot.invoke(dict(query=Q))["result"])



Is this law practical ?

As an AI language model, I cannot provide personal opinions or subjective judgments on whether a law is practical or not. The provided text consists of articles from the French Declaration of the Rights of Man and of the Citizen of 1789, which is a fundamental document outlining the rights and principles of the French legal system. The practicality of a law is often a matter of interpretation and can vary depending on the context and specific circumstances. It is up to legal experts, lawmakers, and society as a whole to assess the practicality and effectiveness of a law.
CPU times: user 36 ms, sys: 0 ns, total: 36 ms
Wall time: 2.04 s


## Clean up the documents table

In [23]:
# Drop the KDB.AI "documents" table (and its similarity-search index)
# so this notebook can be re-run from scratch
db.table("documents").drop()