# Managing Tables in KDB.AI
[Video Walkthrough](https://www.youtube.com/watch?v=XH5iNkcFKXc&list=PLypX5sYuDqvrqsXTw876gGHosCKvK_7QS&index=6)

##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).



This notebook shows how to get started with the KDB.AI vector database. You'll get a quick taste of KDB.AI in about 10 minutes.

You will learn how to:

1. Connect to KDB.AI
1. Create a KDB.AI Table
1. Add Data to the KDB.AI Table
1. Query the Table
1. Perform Similarity Search
1. Delete the KDB.AI Table

## 0. Setup

### Install dependencies

To run this sample successfully, follow the steps for the environment where you are running this notebook:

- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you through the prerequisites and how to run this notebook with Jupyter.

- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells.

In [None]:
!pip install kdbai_client

### Import Packages

In [None]:
# standard library
import os
import time
from getpass import getpass

# vector DB client
import kdbai_client as kdbai

In [None]:
import numpy as np
import pandas as pd

Before creating any embeddings, we need a vector database to store them in and to enable efficient searching.

## 1. Connect to KDB.AI Server

To use KDB.AI Server, you will need to download and run your own container.
To do this, first [sign up for free](https://trykdb.kx.com/kdbaiserver/signup/).

You will receive an email with the required license file and bearer token needed to download your instance.
Follow the instructions in the signup email to get your session up and running.

Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete, you can connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.

In [None]:
# Set up the KDB.AI Server endpoint, falling back to the default local address
KDBAI_ENDPOINT = os.environ.get("KDBAI_ENDPOINT", "http://localhost:8082")

# Connect to KDB.AI Server (the default mode is qipc)
session = kdbai.Session(endpoint=KDBAI_ENDPOINT)

In [None]:
database = session.database("default")

<div class="alert alert-block alert-info">
<b>Need help understanding a function?</b><br/>
In Jupyter, add ? before or after any function name to bring up its documentation, including sample code and arguments.
</div>

In [None]:
?kdbai.Session

### Verify Defined Tables

We can check our connection using the `database.tables` attribute.
This returns a list of all the tables defined in the database so far.
After the first cell below removes any leftover table called `data` from a previous run, this should return an empty list.

In [None]:
# ensure no table called "data" exists
try:
    database.table("data").drop()
except kdbai.KDBAIException:
    pass

In [None]:
database.tables

[]

## 2. Create a KDB.AI Table

To create a table, we can use `create_table`. This function takes two mandatory arguments: the name and the schema of the table.

This schema must meet the following criteria:
- It must contain a list of columns.
- All columns must have `type` specified.

If you want to create indexes, you must provide them in the separate `indexes` parameter:
- It must contain a list of index definitions.
- Each index must have `name`, `column` and `type` attributes. Index-specific parameters are passed in `params`, which is mandatory for some index types.

Run `?database.create_table` for more details and sample code.
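The rules above can be checked locally before calling `create_table`. The following is a minimal sketch, not part of `kdbai_client`: the helper `validate_definitions` is hypothetical, assumes schema and index definitions are plain Python dicts as in this notebook, and only checks for the required keys listed above (it does not validate type names).

```python
def validate_definitions(schema, indexes=None):
    """Hypothetical helper: check schema/index dicts for the keys create_table needs.

    Returns a list of problems found (empty if none).
    """
    problems = []
    if not isinstance(schema, list) or not schema:
        return ["schema must be a non-empty list of column definitions"]

    column_names = set()
    for i, column in enumerate(schema):
        for key in ("name", "type"):
            if key not in column:
                problems.append(f"column {i} is missing '{key}'")
        column_names.add(column.get("name"))

    for i, index in enumerate(indexes or []):
        for key in ("name", "column", "type"):
            if key not in index:
                problems.append(f"index {i} is missing '{key}'")
        # An index must point at a column that exists in the schema
        if "column" in index and index["column"] not in column_names:
            problems.append(f"index {i} refers to unknown column {index['column']!r}")
    return problems


example_schema = [{"name": "id", "type": "str"}, {"name": "vectors", "type": "float32s"}]
example_indexes = [{"name": "flat_index", "column": "vectors", "type": "flat",
                    "params": {"dims": 8, "metric": "L2"}}]
print(validate_definitions(example_schema, example_indexes))  # []

# A misspelled "column" key is reported instead of failing later on the server
print(validate_definitions([{"name": "id"}],
                           [{"name": "i", "colummn": "vectors", "type": "flat"}]))
```

The helper returns a list rather than raising, so all problems in a definition are reported at once.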

In [None]:
?database.create_table

### Define Schema

Our table will have two columns: the first, `id`, holds a list of dummy IDs; the second, `vectors`, holds the vector embeddings we will use for similarity search later in this example.

We will define our dimensionality, similarity metric and index type in the `indexes` parameter. For this example we chose:
- `dims = 8`: In the next section, we generate eight-dimensional embeddings to match this. You can choose any dimension here, as long as your embeddings match it.
- `metric = L2`: We chose [L2/Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance). Our dummy dataset is low-dimensional, which suits Euclidean distance. Other metrics such as [IP/Inner Product](https://en.wikipedia.org/wiki/Inner_product_space) and [CS/Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) are also available; the right choice depends on the context and nature of your data.
- `type = flat`: We use a [Flat index](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexFlat.html) because our data is small and simple, so exhaustive search is more than adequate. Other indexes such as [HNSW](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexHNSW.html) and [IVFPQ](https://faiss.ai/cpp_api/struct/structfaiss_1_1IndexIVFPQ.html) are available; as with metrics, the right choice depends on your data and your performance requirements.
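To make the metric choice more concrete, here is a small NumPy sketch of the textbook definitions of the three metrics, applied to two of the example vectors used later. It is an illustration only: whether KDB.AI reports L2 as plain or squared Euclidean distance is an assumption here, although the `0.01` distance in the search output later in this notebook is consistent with squared L2.

```python
import numpy as np

# Two of the example vectors used later in this notebook
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], dtype=np.float32)
b = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], dtype=np.float32)

# L2: squared Euclidean distance; smaller means more similar
l2_squared = float(np.sum((a - b) ** 2))

# IP: inner (dot) product; larger means more similar
inner_product = float(np.dot(a, b))

# CS: cosine similarity, the inner product of the unit-normalised vectors; larger means more similar
cosine = inner_product / float(np.linalg.norm(a) * np.linalg.norm(b))

print(round(l2_squared, 4), round(inner_product, 4), round(cosine, 4))  # 0.08 2.4 0.9971
```

Note that L2 is a distance (lower is closer) while IP and CS are similarities (higher is closer), so the ordering of search results flips in meaning depending on the metric.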

In [None]:
schema = [
    {"name": "id", "type": "str"},
    {"name": "vectors", "type": "float32s"},
]

index_name = "flat_index"
indexes = [{"name": index_name, "column": "vectors", "type": "flat", "params": {"dims": 8, "metric": "L2"}}]

### Create Table

In [None]:
table = database.create_table("data", schema=schema, indexes=indexes)

## 3. Add Data to the KDB.AI Table

First, generate an array of five 8-dimensional vectors, which will be the vector embeddings in this example. We will then add these to a pandas DataFrame with column names and types matching the target table.
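The vectors in the next cell are typed out by hand so that the search result is easy to follow. For larger experiments, a minimal sketch like the one below generates random embeddings of the same shape and type. The row count, seed and `id_<n>` naming are arbitrary illustrative choices, not anything KDB.AI requires.

```python
import numpy as np
import pandas as pd

DIMS = 8        # must match "dims" in the index definition
n_rows = 1000   # arbitrary example size

rng = np.random.default_rng(seed=42)
# Generator.random can produce float32 directly, matching the "float32s" column type
random_vectors = rng.random((n_rows, DIMS), dtype=np.float32)
random_ids = [f"id_{i}" for i in range(n_rows)]

# list(...) splits the 2-D array into one 1-D array per row
random_embeddings = pd.DataFrame({"id": random_ids, "vectors": list(random_vectors)})
print(random_embeddings.shape)  # (1000, 2)
```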

In [None]:
# Create a NumPy array of 5 eight-dimensional float32 arrays
vectors = np.array(
    [
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],
        [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],
    ],
    dtype=np.float32,
)

In [None]:
# Example ID values
ids = ["h", "e", "l", "l", "o"]

In [None]:
# column names/types matching the schema
embeddings = pd.DataFrame({"id": ids, "vectors": list(vectors)})

In [None]:
embeddings

|   | id | vectors |
|---|----|---------|
| 0 | h | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] |
| 1 | e | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] |
| 2 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] |
| 3 | l | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] |
| 4 | o | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2] |


We can now add data to our KDB.AI table using `insert`.
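Before calling `insert`, it can help to catch shape or dtype mismatches locally rather than on the server. The sketch below is hypothetical: `check_embeddings` is not part of `kdbai_client`, and it assumes the two-column `id`/`vectors` layout and 8 dimensions used in this notebook.

```python
import numpy as np
import pandas as pd

def check_embeddings(df: pd.DataFrame, dims: int = 8) -> None:
    """Hypothetical pre-insert check; raises ValueError on the first problem found."""
    if list(df.columns) != ["id", "vectors"]:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    for row, vec in enumerate(df["vectors"]):
        arr = np.asarray(vec)
        # Each vector must have exactly `dims` elements to match the index
        if arr.shape != (dims,):
            raise ValueError(f"row {row}: expected shape ({dims},), got {arr.shape}")
        # The schema declares the column as "float32s"
        if arr.dtype != np.float32:
            raise ValueError(f"row {row}: expected float32, got {arr.dtype}")

good = pd.DataFrame({"id": ["h"], "vectors": [np.arange(8, dtype=np.float32)]})
check_embeddings(good)  # passes silently

bad = pd.DataFrame({"id": ["h"], "vectors": [np.arange(7, dtype=np.float64)]})
try:
    check_embeddings(bad)
except ValueError as err:
    print(err)  # row 0: expected shape (8,), got (7,)
```

In this notebook, `check_embeddings(embeddings)` should pass, since the vectors were created as `float32` with eight elements each.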

In [None]:
table.insert(embeddings)

## 4. Query the Table

We can use `query` to query data from the table.

In [None]:
table.query()

|   | id | vectors |
|---|----|---------|
| 0 | h | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] |
| 1 | e | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] |
| 2 | l | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] |
| 3 | l | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] |
| 4 | o | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2] |


## 5. Perform Similarity Search

Next, let's perform a similarity search on the table using the `search` function.

In [None]:
?table.search

<div class="alert alert-block alert-warning">
<b>Note:</b> The dimension of the input query vectors must match the dimension of the vector embeddings in the table, set by `dims` in the index definition above.
</div>

In [None]:
# Find the closest neighbor of a single query vector
table.search(vectors={index_name: [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]]}, n=1)

[  id                                   vectors  __nn_distance
 0  e  [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]           0.01]

<div class="alert alert-block alert-warning">
<b>Note:</b> The output is a list of length one, matching the number of query vectors passed to `search`. Index it at position [0] to extract the DataFrame for the single query vector.
</div>
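A flat index answers a query by comparing it against every stored vector. The NumPy sketch below performs that exhaustive search and reproduces the result above, assuming the L2 metric reports squared Euclidean distance (consistent with the `0.01` shown). The function `flat_search` is hypothetical: it mirrors the shape of the output (one DataFrame per query vector, with a `__nn_distance` column) but is not how KDB.AI implements search.

```python
import numpy as np
import pandas as pd

def flat_search(data: pd.DataFrame, queries, n: int = 1):
    """Brute-force nearest-neighbour search by squared L2 distance.

    Returns one DataFrame per query vector, mirroring the list returned by table.search.
    """
    stored = np.stack(data["vectors"].to_numpy())        # shape (rows, dims)
    queries = np.asarray(queries, dtype=np.float32)      # shape (n_queries, dims)
    if queries.ndim != 2 or queries.shape[1] != stored.shape[1]:
        raise ValueError("query vectors must match the stored vector dimension")
    results = []
    for q in queries:
        dist = np.sum((stored - q) ** 2, axis=1)         # distance to every stored row
        nearest = np.argsort(dist, kind="stable")[:n]    # indices of the n smallest
        out = data.iloc[nearest].reset_index(drop=True)
        out["__nn_distance"] = dist[nearest]
        results.append(out)
    return results

demo_vectors = np.array(
    [
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],
        [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],
    ],
    dtype=np.float32,
)
demo_embeddings = pd.DataFrame({"id": ["h", "e", "l", "l", "o"], "vectors": list(demo_vectors)})

result = flat_search(demo_embeddings, [[0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]], n=1)
print(result[0])  # one row: id "e" with __nn_distance of about 0.01
```

The query differs from `e` only in its first element (by 0.1), so its squared L2 distance is 0.1² = 0.01. Because this search visits every row, its cost grows linearly with table size, which is why approximate indexes such as HNSW or IVFPQ become attractive for large tables.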

## 6. Delete the KDB.AI Table

We can use `table.drop()` to delete a table.

In [None]:
table.drop()

True

<div class="alert alert-block alert-warning">
<b>Warning:</b> Once you drop a table, you cannot use it again.
</div>