# Quickstart: Hello, KDB.AI

This quickstart shows you how to get started with the KDB.AI vector database. It takes about 10 minutes.

You will learn how to:

1. Connect to KDB.AI
1. Create a KDB.AI Table
1. Add Data to the KDB.AI Table
1. Query the Table
1. Perform Similarity Search
1. Delete the KDB.AI Table

## 0. Setup

### Import Packages

```python
# KDB.AI client plus standard-library helpers
import os
from getpass import getpass
import kdbai_client as kdbai
import time
```

```python
import numpy as np
import pandas as pd
```

## 1. Connect to KDB.AI

KDB.AI comes in two offerings:

1. [KDB.AI Cloud](https://trykdb.kx.com/kdbai/signup/) - For experimenting with smaller generative AI projects with a vector database in our cloud.
2. [KDB.AI Server](https://trykdb.kx.com/kdbaiserver/signup/) - For evaluating large scale generative AI applications on-premises or on your own cloud provider.

The setup steps and connection details differ depending on which offering you use.

### KDB.AI Cloud

To use KDB.AI Cloud, you need two session details: a URL endpoint and an API key. You can get both by signing up for free [here](https://trykdb.kx.com/kdbai/signup).

You can connect to a KDB.AI Cloud session using `kdbai.Session` and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.

If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` are set on your system to your KDB.AI Cloud portal details, the cell below uses them automatically.
If they are not set, the cell prompts you to enter your KDB.AI Cloud session URL endpoint and API key.

```python
KDBAI_ENDPOINT = (
    os.environ["KDBAI_ENDPOINT"]
    if "KDBAI_ENDPOINT" in os.environ
    else input("KDB.AI endpoint: ")
)
KDBAI_API_KEY = (
    os.environ["KDBAI_API_KEY"]
    if "KDBAI_API_KEY" in os.environ
    else getpass("KDB.AI API key: ")
)
```
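
The cell above applies one pattern twice: use an environment variable if it is set, otherwise prompt for the value. A minimal sketch of that pattern as a reusable helper; `resolve_setting` is a hypothetical name, not part of `kdbai_client`, and the prompt function is passed in so the helper can be exercised without typing anything.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_setting(
    name: str,
    prompt: str,
    ask: Callable[[str], str] = input,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the environment variable `name` if set and non-empty, else ask for it.

    Hypothetical helper. `ask` defaults to input(); pass getpass.getpass for secrets.
    """
    source = os.environ if env is None else env
    value = source.get(name)
    if value:
        return value
    return ask(prompt)


# Usage, mirroring the cell above:
# KDBAI_ENDPOINT = resolve_setting("KDBAI_ENDPOINT", "KDB.AI endpoint: ")
# KDBAI_API_KEY = resolve_setting("KDBAI_API_KEY", "KDB.AI API key: ", ask=getpass)
```

Unlike the cell above, this helper treats an empty variable the same as an unset one, so a blank `KDBAI_API_KEY` still triggers the prompt.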

```python
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)
```

### KDB.AI Server

To use KDB.AI Server, you need to download and run your own container. To do this, first sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).

You will receive an email with the license file and bearer token required to download your instance. Follow the instructions in the signup email to get your session up and running.

Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.

```python
# Run this cell instead of the KDB.AI Cloud cells if you are using KDB.AI Server
session = kdbai.Session(endpoint='http://localhost:8082')
```
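
If you switch between the two offerings, a small helper can choose the connection arguments for you. A minimal sketch, assuming a convention of our own: a set `KDBAI_API_KEY` means KDB.AI Cloud, and its absence means a local KDB.AI Server at `http://localhost:8082`. `session_kwargs` is hypothetical; it only builds the keyword arguments that the cells above pass to `kdbai.Session`.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_SERVER_ENDPOINT = "http://localhost:8082"  # endpoint used in the cell above


def session_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for kdbai.Session under a hypothetical convention.

    Cloud: needs both KDBAI_ENDPOINT and KDBAI_API_KEY.
    Server: no API key; endpoint from KDBAI_ENDPOINT or the local default.
    """
    source = os.environ if env is None else env
    api_key = source.get("KDBAI_API_KEY")
    endpoint = source.get("KDBAI_ENDPOINT")
    if api_key:
        if not endpoint:
            raise ValueError("KDBAI_API_KEY is set but KDBAI_ENDPOINT is not")
        return {"api_key": api_key, "endpoint": endpoint}
    return {"endpoint": endpoint or DEFAULT_SERVER_ENDPOINT}


# Usage: session = kdbai.Session(**session_kwargs())
```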

<div class="alert alert-block alert-info">
<b>Need help understanding a function?</b><br/>
In Jupyter, add `?` before or after any function name to bring up its documentation. For KDB.AI functions this includes sample code and a description of the arguments.
</div>

```python
?kdbai.Session
```

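We can check our connection using the `session.list()` function, which returns a list of all the tables defined in our vector database so far.
The first cell below drops any existing table called `data` (left over from a previous run, for example) and waits briefly for the drop to complete.
After that, `session.list()` should return an empty list.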

```python
# ensure no table called "data" exists
try:
    session.table("data").drop()
    time.sleep(5)  # give the server time to process the drop
except kdbai.KDBAIException:
    pass  # no such table, so nothing to drop
```

```python
session.list()
```

```
[]
```
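
The drop cell above waits a fixed five seconds. An alternative is to poll until a condition holds, up to a timeout. A minimal sketch; `wait_until` is a hypothetical helper, and the usage line assumes `session.list()` returns table names, as its empty-list output above suggests.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage (assumes `session` from above):
# wait_until(lambda: "data" not in session.list())
```

Polling returns as soon as the table is gone instead of always paying the full delay, and the timeout keeps a stuck drop from hanging the notebook.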

## 2. Create a KDB.AI Table

To create a table, use `session.create_table`. This function takes two arguments: the table name and the table schema.

The schema must meet the following criteria:
- It must contain a list of columns.
- Every column except the column of vectors must specify either a `pytype` or a `qtype`.
- One column of vector embeddings may also have a `vectorIndex` attribute holding the index configuration for similarity search. This column is implicitly an array of `float32`.
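
These criteria can be checked on the client before calling `create_table`. A minimal sketch that encodes only the three rules listed here, as a hypothetical `check_schema` helper; the server does its own validation, which may be stricter. Applied to the schema defined below, it returns an empty list.

```python
from typing import Any, Dict, List


def check_schema(schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems with `schema`; an empty list means it passes."""
    columns = schema.get("columns")
    if not isinstance(columns, list):
        return ["schema must contain a list under 'columns'"]
    problems: List[str] = []
    if sum(1 for col in columns if "vectorIndex" in col) > 1:
        problems.append("at most one column may have a 'vectorIndex'")
    for col in columns:
        if "vectorIndex" in col:
            continue  # the vector column is implicitly float32 and needs no type
        if "pytype" not in col and "qtype" not in col:
            name = col.get("name", "<unnamed>")
            problems.append(f"column {name!r} needs a 'pytype' or 'qtype'")
    return problems
```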

Run `?session.create_table` for more details and sample code.

```python
?session.create_table
```

### Define Schema 

Our table will have two columns: `id`, holding dummy IDs, and `vectors`, holding the vector embeddings we will use for similarity search later in this example.

We define the dimensionality, similarity metric, and index type with the `vectorIndex` attribute. For this example we chose:
- `dims = 8`: the embeddings we generate in the next section are eight-dimensional to match. You can choose any value here, as long as every vector has that many elements.
- `metric = L2`: we chose [L2/Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance), which suits our low-dimensional dummy dataset. Other options include [IP/Inner Product](https://en.wikipedia.org/wiki/Inner_product_space) and [CS/Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity); the right choice depends on the context and the nature of your data.
- `type = flat`: we use a [Flat index](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexFlat.html) because our data is small and simple, so exhaustive search is more than adequate. Other options include [HNSW](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSW.html) and [IVFPQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexIVFPQ.html); as with metrics, the right choice depends on your data and your performance requirements.
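
To make the three metrics concrete, here they are in plain Python for two small vectors. This is a sketch for intuition using the textbook definitions; the score KDB.AI reports for a metric may be a transformed version of these (for example, squared).

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product divided by the product of the lengths (undefined for zero vectors)."""
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice as long
print(l2_distance(a, b))        # 3.0
print(inner_product(a, b))      # 18.0
print(cosine_similarity(a, b))  # 1.0
```

Because `a` and `b` point in the same direction, their cosine similarity is 1 even though their L2 distance is 3: cosine ignores magnitude, while L2 and inner product do not.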


```python
schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "vectors", "vectorIndex": {"dims": 8, "metric": "L2", "type": "flat"}},
    ]
}
```

### Create Table

```python
table = session.create_table("data", schema)
```

## 3. Add Data to the KDB.AI Table

First, generate an array of five 8-dimensional vectors to serve as the vector embeddings in this example. We then add these to a pandas DataFrame whose column names and types match the target table.

```python
# Create a NumPy array of five 8-dimensional float32 vectors
vectors = np.array(
    [
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],
        [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],
    ],
    dtype=np.float32,
)
```
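
Because the vectors are stored as `float32`, a value such as 0.1 is held with only about seven significant digits. A stdlib-only sketch of the round trip through a 32-bit float; it helps explain values like `0.9699999` in the search results below.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(to_float32(0.1))  # 0.10000000149011612
```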

```python
# Example ID values
ids = ["h", "e", "l", "l", "o"]
```

```python
# column names/types matching the schema
embeddings = pd.DataFrame({"id": ids, "vectors": list(vectors)})
```

```python
embeddings
```

|   | id | vectors |
|---|----|---------|
| 0 | h | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] |
| 1 | e | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] |
| 2 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] |
| 3 | l | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] |
| 4 | o | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2] |


We can now add data to our KDB.AI table using `insert`.

```python
table.insert(embeddings)
```

## 4. Query the Table

We can use `query` to query data from the table.

```python
table.query()
```

|   | id | vectors |
|---|----|---------|
| 0 | h | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] |
| 1 | e | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] |
| 2 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] |
| 3 | l | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] |
| 4 | o | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2] |


The `query` function accepts a wide range of arguments to make it easy to filter, aggregate, and sort.
Run `?table.query` to see them all.

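Let's filter for rows whose `id` is 'l'. Each filter is a tuple of operator, column name, and value; here the `like` operator with the pattern `"l"` (no wildcards) matches IDs equal to 'l'. Two rows are returned, as expected.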

```python
table.query(filter=[("like", "id", "l")])
```

|   | id | vectors |
|---|----|---------|
| 0 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] |
| 1 | l | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] |

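To see what `("like", "id", "l")` selects, here is a local approximation over plain Python rows. This is a sketch, assuming `like` follows q's pattern rules (`*` for any run of characters, `?` for one character, `[...]` for a character set) and must match the whole value. Python's `fnmatch.fnmatchcase` behaves similarly but is not identical; for example, it negates a set with `[!...]` where q uses `[^...]`.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Sequence

Row = Dict[str, object]


def apply_like(rows: Sequence[Row], column: str, pattern: str) -> List[Row]:
    """Keep rows whose `column` value matches `pattern` (q-style wildcards, approximated)."""
    return [row for row in rows if fnmatchcase(str(row[column]), pattern)]


rows = [{"id": c} for c in "hello"]
print(len(apply_like(rows, "id", "l")))     # 2
print(len(apply_like(rows, "id", "[he]")))  # 2
```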

## 5. Perform Similarity Search

Finally, let's perform similarity search on the table. We do this using the `search` function.

```python
?table.search
```

<div class="alert alert-block alert-warning">
<b>Note:</b> The dimension of input query vectors must match the dimension of the vector embeddings in the table, as defined in the schema above.
</div>
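
A client-side check of this requirement, reading the dimension from the schema. A minimal sketch; `vector_dims` and `check_query_dims` are hypothetical helpers, and the `schema` dictionary is a copy of the one from the Define Schema section.

```python
from typing import Any, Dict, Sequence


def vector_dims(schema: Dict[str, Any]) -> int:
    """Return `dims` from the column that has a `vectorIndex`."""
    for col in schema["columns"]:
        if "vectorIndex" in col:
            return int(col["vectorIndex"]["dims"])
    raise ValueError("schema has no vectorIndex column")


def check_query_dims(queries: Sequence[Sequence[float]], dims: int) -> None:
    """Raise ValueError if any query vector does not have exactly `dims` values."""
    for i, query in enumerate(queries):
        if len(query) != dims:
            raise ValueError(f"query {i} has {len(query)} values, expected {dims}")


schema = {
    "columns": [
        {"name": "id", "pytype": "str"},
        {"name": "vectors", "vectorIndex": {"dims": 8, "metric": "L2", "type": "flat"}},
    ]
}
# Passes silently: the query below has 8 values.
check_query_dims([[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]], vector_dims(schema))
```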

```python
# Find the closest neighbor of a single query vector
table.search(vectors=[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]], n=1)
```

```
[  id                                   vectors  __nn_distance
 0  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]           0.01]
```

<div class="alert alert-block alert-warning">
<b>Note:</b> The output is a list of length one, matching the number of query vectors we passed in. Index it at position `[0]` to extract the DataFrame for the single query vector.
</div>

The closest neighbor of the query vector is returned along with its L2 ([Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance)) score in the `__nn_distance` column.
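
Working the numbers by hand suggests that, for the L2 metric, `__nn_distance` is the *squared* Euclidean distance. The query differs from `e` only in its first element, by 0.1, so the Euclidean distance would be 0.1, yet 0.01 is reported. FAISS's flat L2 index, linked above, likewise returns squared distances. Writing $\mathbf{q}$ for the query and $\mathbf{e}$, $\mathbf{h}$, $\mathbf{l}$ for the stored vectors with those IDs (the first of the two `l` rows), the values that appear in this section are:

$$
\begin{aligned}
d(\mathbf{q}, \mathbf{x}) &= \sum_{i=1}^{8} (q_i - x_i)^2 \\
d(\mathbf{q}, \mathbf{e}) &= (0.1 - 0.2)^2 = 0.01 \\
d(\mathbf{q}, \mathbf{h}) &= 7 \times 0.1^2 = 0.07 \\
d(\mathbf{q}, \mathbf{l}) &= 0.2^2 + 7 \times 0.1^2 = 0.11
\end{aligned}
$$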

We can also rerun the same query for more neighbors.

```python
# Find the 3 closest neighbors of a single query vector
table.search(vectors=[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]], n=3)[0]
```

|   | id | vectors | `__nn_distance` |
|---|----|---------|-----------------|
| 0 | e | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] | 0.01 |
| 1 | h | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] | 0.07 |
| 2 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.11 |

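A flat index answers a query by exhaustive search: it scores the query against every stored vector and keeps the `n` best. A stdlib-only sketch of that idea over the five example rows, scoring with squared Euclidean distance to match the values shown; it illustrates exact k-nearest-neighbor search and is not KDB.AI's implementation.

```python
import heapq
from typing import List, Sequence, Tuple

# The five example rows (IDs repeat, so rows are kept as (id, vector) pairs).
ROWS: List[Tuple[str, List[float]]] = [
    ("h", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]),
    ("e", [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]),
    ("l", [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
    ("l", [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]),
    ("o", [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]),
]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared element-wise differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(
    rows: Sequence[Tuple[str, Sequence[float]]], query: Sequence[float], n: int
) -> List[Tuple[str, float]]:
    """Exact k-NN: score every row, return the n closest as (id, distance) pairs."""
    # Pair each score with the row index so ties break by insertion order.
    scored = [(squared_l2(query, vec), i) for i, (_, vec) in enumerate(rows)]
    return [(rows[i][0], dist) for dist, i in heapq.nsmallest(n, scored)]


for row_id, dist in flat_search(ROWS, [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], n=3):
    print(row_id, round(dist, 2))
```

This prints `e`, `h`, and `l` with distances of 0.01, 0.07, and 0.11, matching the table above. Exhaustive search is exact but costs one distance computation per stored vector, which is why indexes such as HNSW and IVFPQ exist for larger tables.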

And we can apply a filter to the search results.

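```python
# Find the 3 closest neighbors of a single query vector among rows with id "l"
table.search(
    vectors=[[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]],
    n=3,
    filter=[("like", "id", "l")],
)[0]
```

|   | id | vectors | `__nn_distance` |
|---|----|---------|-----------------|
| 0 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.11 |
| 1 | l | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] | 0.9699999 |
| 2 |   | [] | 3.4028230000000003e+38 |

Only two rows match the filter, so the third row appears to be a placeholder: it has an empty `id`, an empty vector, and a distance of about 3.4028e+38, which is approximately the largest finite `float32` value.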

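When fewer rows pass the filter than `n` asks for, the extra results are placeholders like the third row above. A minimal sketch of removing them from results held as plain `(id, vector, distance)` tuples; treating a row as padding when its vector is empty is an assumption based on the output above, not documented behavior.

```python
from typing import List, Sequence, Tuple

Result = Tuple[str, Sequence[float], float]


def drop_padding(results: Sequence[Result]) -> List[Result]:
    """Keep only results that carry a real (non-empty) vector."""
    return [r for r in results if len(r[1]) > 0]


results = [
    ("l", [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], 0.11),
    ("l", [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1], 0.9699999),
    ("", [], 3.402823e38),  # placeholder row
]
print(len(drop_padding(results)))  # 2
```

With the pandas DataFrame returned by `search`, the same idea should be expressible as `df[df["vectors"].map(len) > 0]`.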

We can also search with more than one query vector at once. The result is a list containing one DataFrame per query vector, in the same order as the queries.

```python
# Find the 3 closest neighbors of each of 2 query vectors
table.search(
    vectors=[
        [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        [0.3, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    ],
    n=3,
)
```

```
[  id                                   vectors  __nn_distance
 0  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]           0.01
 1  h  [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]           0.07
 2  l  [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]           0.11,
   id                                   vectors  __nn_distance
 0  h  [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]           0.04
 1  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]           0.08
 2  l  [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]           0.28]
```
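
The shape of this result, one list of neighbors per query, can be reproduced locally. A stdlib-only sketch that builds a query-by-row distance matrix and keeps the `n` closest rows for each query, using squared Euclidean distance; `batch_flat_search` is a hypothetical helper, not the client API. It reproduces the second query's distances above (0.04, 0.08, 0.28).

```python
from typing import List, Sequence, Tuple

ROWS: List[Tuple[str, List[float]]] = [
    ("h", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]),
    ("e", [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]),
    ("l", [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
    ("l", [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1]),
    ("o", [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]),
]
QUERIES = [
    [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0.3, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
]


def batch_flat_search(
    rows: Sequence[Tuple[str, Sequence[float]]],
    queries: Sequence[Sequence[float]],
    n: int,
) -> List[List[Tuple[str, float]]]:
    """Return one list of (id, squared L2 distance) per query, closest first."""
    # Distance matrix: one row per query, one column per stored vector.
    matrix = [
        [sum((q - x) ** 2 for q, x in zip(query, vec)) for _, vec in rows]
        for query in queries
    ]
    results = []
    for dists in matrix:
        order = sorted(range(len(rows)), key=lambda j: dists[j])[:n]
        results.append([(rows[j][0], dists[j]) for j in order])
    return results


results = batch_flat_search(ROWS, QUERIES, n=3)
print(len(results))  # 2: one result list per query
```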

## 6. Delete the KDB.AI Table

We can use `table.drop()` to delete a table. 

```python
table.drop()
```

```
True
```

<div class="alert alert-block alert-warning">
<b>Warning:</b> Once you drop a table, you cannot use it again.
</div>

## Next Steps

Now that you have created, queried, and searched a KDB.AI table, you can start inserting your own data or explore more examples:
- [PDF Document Search](../document_search)
- [MRI Image Search](../image_search)
- [Music Recommendation System](../music_recommendation)
- [Sensor Pattern Matching](../pattern_matching)
- [Retrieval Augmented Generation with LangChain](../retrieval_augmented_generation)
- [Sentiment Analysis of Reviews](../sentiment_analysis)