# KDB.AI

> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.

[This example](https://github.com/KxSystems/kdbai-samples/blob/main/document_search/document_search.ipynb) demonstrates how to use KDB.AI to run semantic search on unstructured text documents.

To access your endpoint and API keys, [sign up to KDB.AI](https://kdb.ai/get-started/).

To set up your development environment, follow the instructions on the [KDB.AI pre-requisites page](https://code.kx.com/kdbai/pre-requisites.html).

The following examples demonstrate some of the ways you can interact with KDB.AI through LangChain.

## Import required packages

In [1]:
from getpass import getpass
import os
import time

import pandas as pd
import requests

import kdbai_client as kdbai

from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import KDBAI

In [2]:
KDBAI_ENDPOINT = input('KDB.AI endpoint: ')
KDBAI_API_KEY = getpass('KDB.AI API key: ')
os.environ['OPENAI_API_KEY'] = getpass('OpenAI API Key: ')

KDB.AI endpoint:  https://ui.qa.cld.kx.com/instance/4eaejaib7q
KDB.AI API key:  ········
OpenAI API Key:  ········
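
For unattended runs, the interactive prompts above can be swapped for environment variables. The sketch below shows one possible approach. The helper `read_setting` and the use of `KDBAI_ENDPOINT` as an environment variable name are assumptions made for illustration; they are not part of `kdbai_client` or LangChain.

```python
import os
from getpass import getpass


def read_setting(name, prompt, secret=False, env=os.environ):
    """Return env[name] if it is set and non-empty, otherwise prompt the user.

    Hypothetical helper: not part of kdbai_client or LangChain.
    """
    value = env.get(name, '').strip()
    if value:
        return value
    # Fall back to an interactive prompt; getpass hides the typed value.
    return getpass(prompt) if secret else input(prompt)


# Demonstration with an explicit mapping, so no prompt is shown.
demo_env = {'KDBAI_ENDPOINT': 'https://example.invalid/instance/demo'}
endpoint = read_setting('KDBAI_ENDPOINT', 'KDB.AI endpoint: ', env=demo_env)
print(endpoint)
```

In the notebook, the same helper could be called with `secret=True` for the two API keys, so that they are never echoed to the cell output.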


In [3]:
TEMP = 0.0
K = 3

## Create a KDB.AI Session

In [None]:
print('Create a KDB.AI session...')
session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)

Create a KDB.AI session...


## Create a table

In [5]:
print('Create table "documents"...')
schema = {'columns': [{'name': 'id', 'pytype': 'str'},
                      {'name': 'text', 'pytype': 'bytes'},
                      {'name': 'embeddings',
                       'pytype': 'float32',
                       'vectorIndex': {'dims': 1536, 'metric': 'L2', 'type': 'hnsw'}},
                      {'name': 'tag', 'pytype': 'str'},
                      {'name': 'title', 'pytype': 'bytes'}]}
table = session.create_table('documents', schema)

Create table "documents"...
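
The `vectorIndex` entry in the schema asks for an HNSW index over 1536-dimensional vectors compared by L2 (Euclidean) distance. HNSW is an approximate index. The sketch below is a brute-force exact search over toy 3-dimensional vectors, meant only to show what "nearest by L2" means. The function names and the toy data are hypothetical and do not reflect how KDB.AI implements search.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same number of dimensions')
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, vectors, k):
    """Return the ids of the k vectors closest to query, nearest first."""
    ranked = sorted(vectors, key=lambda item: l2_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings.
toy_vectors = [
    ('a', [0.0, 0.0, 0.0]),
    ('b', [1.0, 0.0, 0.0]),
    ('c', [0.0, 3.0, 4.0]),
    ('d', [0.9, 0.1, 0.0]),
]
print(nearest([1.0, 0.0, 0.0], toy_vectors, k=2))  # ['b', 'd']
```

The dimension check in `l2_distance` mirrors the reason `dims` must match the embedding model: vectors of different lengths cannot be compared.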


In [6]:
%%time
URL = 'https://www.conseil-constitutionnel.fr/node/3850/pdf'
PDF = 'Déclaration_des_droits_de_l_homme_et_du_citoyen.pdf'
open(PDF, 'wb').write(requests.get(URL).content)

CPU times: user 44.2 ms, sys: 6.71 ms, total: 50.9 ms
Wall time: 233 ms


562978

## Read a PDF

In [7]:
%%time
print('Read a PDF...')
loader = PyPDFLoader(PDF)
pages = loader.load_and_split()
len(pages)

Read a PDF...
CPU times: user 110 ms, sys: 9.14 ms, total: 119 ms
Wall time: 118 ms


3
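
`load_and_split` loads the PDF and splits its text into chunks (three here). As a rough picture of what splitting with overlap means, the sketch below cuts a string into fixed-size pieces that share a few characters with their neighbours. It is a deliberately simplified stand-in, not LangChain's splitter, and the names `split_text`, `chunk_size` and `overlap` are chosen for illustration.

```python
def split_text(text, chunk_size, overlap):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError('overlap must be in the range [0, chunk_size)')
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_text('abcdefghij', chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```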

## Create a Vector Database from PDF Text

In [1]:
%%time
print('Create a Vector Database from PDF Text...')
embeddings = OpenAIEmbeddings(model='text-embedding-ada-002')
texts = [p.page_content for p in pages]
metadata = pd.DataFrame(index=list(range(len(texts))))
metadata['tag'] = 'law'
metadata['title'] = 'Déclaration des Droits de l\'Homme et du Citoyen de 1789'.encode('utf-8')
vectordb = KDBAI(table, embeddings)
vectordb.add_texts(texts=texts, metadatas=metadata)

Create a Vector Database from PDF Text...



## Create LangChain Pipeline

In [None]:
%%time
print('Create LangChain Pipeline...')
qabot = RetrievalQA.from_chain_type(chain_type='stuff',
                                    llm=ChatOpenAI(model='gpt-3.5-turbo-16k', temperature=TEMP), 
                                    retriever=vectordb.as_retriever(search_kwargs=dict(k=K)),
                                    return_source_documents=True)

Create LangChain Pipeline...
CPU times: user 1.32 ms, sys: 62 µs, total: 1.38 ms
Wall time: 1.36 ms
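
With `chain_type='stuff'`, the `K` documents returned by the retriever are all placed into a single prompt for the model. The sketch below illustrates that idea in plain Python. The function `build_stuff_prompt` and its template are hypothetical and are not LangChain's actual prompt.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one prompt string.

    The template is illustrative only; it is not LangChain's prompt.
    """
    context = '\n\n'.join(documents)
    return (
        'Use the following context to answer the question.\n\n'
        f'{context}\n\n'
        f'Question: {question}\n'
        'Answer:'
    )


# Two short passages standing in for retrieved chunks.
retrieved = ['Men are born and remain free and equal in rights.',
             'The principle of sovereignty resides essentially in the nation.']
prompt = build_stuff_prompt('Where does sovereignty reside?', retrieved)
print(prompt)
```

Because every retrieved chunk is included verbatim, the prompt grows with `K` and with the chunk size, so the model's context window has to be large enough to hold all of them.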


## Summarize the document in English

In [None]:
%%time
Q = 'Summarize the document in English:'
print(f'\n\n{Q}\n')
print(qabot(dict(query=Q))['result'])



Summarize the document in English:

The document is the Declaration of the Rights of Man and of the Citizen of 1789. It was written by the representatives of the French people and aims to declare the natural, inalienable, and sacred rights of all individuals. These rights include freedom, property, security, and resistance to oppression. The document emphasizes that all individuals are born and remain free and equal in rights, and that social distinctions should only be based on the common good. It also states that sovereignty resides in the nation and that no one can exercise authority without explicit authorization. The document highlights the importance of law in protecting individual rights and states that the law should only prohibit actions that are harmful to society. It also emphasizes the principle of equality before the law and the right of all citizens to participate in its formation. The document further establishes the rights of individuals in relation to arrest and dete

## Query the Data

In [None]:
%%time
Q = 'Is it a fair law and why ?'
print(f'\n\n{Q}\n')
print(qabot(dict(query=Q))['result'])



Is it a fair law and why ?

As an AI language model, I don't have personal opinions or the ability to determine whether a law is fair or not. The concept of fairness is subjective and can vary depending on individual perspectives and societal values. It is up to individuals, legal experts, and society as a whole to debate and determine the fairness of a law based on various factors such as its adherence to human rights principles, its impact on different groups, and its alignment with societal values and goals.
CPU times: user 62.1 ms, sys: 6.02 ms, total: 68.1 ms
Wall time: 3.32 s


In [None]:
%%time
Q = 'What are the rights and duties of the man, the citizen and the society ?'
print(f'\n\n{Q}\n')
print(qabot(dict(query=Q))['result'])



What are the rights and duties of the man, the citizen and the society ?

According to the Declaration of the Rights of Man and of the Citizen of 1789, the rights and duties of man, citizen, and society are as follows:

Rights of Man:
1. Men are born and remain free and equal in rights. Social distinctions can only be based on common utility.
2. The purpose of political association is the preservation of the natural and imprescriptible rights of man, which are liberty, property, security, and resistance to oppression.
3. The principle of sovereignty resides essentially in the nation. No body or individual can exercise any authority that does not emanate expressly from the nation.
4. Liberty consists of being able to do anything that does not harm others. The exercise of the natural rights of each man has no limits other than those that ensure the enjoyment of these same rights by other members of society. These limits can only be determined by law.
5. The law has the right to prohibi

In [None]:
%%time
Q = 'Is this law practical ?'
print(f'\n\n{Q}\n')
print(qabot(dict(query=Q))['result'])



Is this law practical ?

As an AI language model, I don't have personal opinions. Whether a law is practical or not is subjective and can vary depending on various factors such as the context, societal norms, and individual perspectives. The practicality of a law is often a matter of debate and can be influenced by its effectiveness in achieving its intended goals, its impact on individuals and society, and its compatibility with other laws and principles. It is ultimately up to legal experts, policymakers, and society as a whole to assess and determine the practicality of a law.
CPU times: user 55.5 ms, sys: 5.46 ms, total: 61 ms
Wall time: 3.51 s


## Clean up the Documents table

In [None]:
# Drop the KDB.AI "documents" table and its similarity-search index
# so that this notebook can be re-run from the start
session.table('documents').drop()

True