# Quickstart: Hello, kdb.ai

This guide shows how to get started with the kdb.ai vector database. You'll get a quick taste of kdb.ai in about 10 minutes.

You will:
- Build a vector database, and
- Query it with semantic search.

## 1. Import libraries

In [148]:
import kdbai
import numpy as np
import pandas as pd

## 2. Create a vector index

Create a vector embedding index and a data table to store the raw input data. The command below creates an index named `myHNSW` that uses the Hierarchical Navigable Small Worlds (HNSW) algorithm on 8-dimensional vectors.

In [162]:
index = kdbai.KDBAI('myHNSW', 'hnsw', ('table', None), 'vectors', options=dict(dims=8))

Setting the table: '.kdbai.table.table' as a global table within this process



## 3. See your indexes

The `list` function returns the indexes available in your process. Once you have created an index in the previous step, its details appear when you run the command below.

In [163]:
kdbai.KDBAI.list()

pykx.Dictionary(pykx.q('
      | name                mdlType algo                                     ..
------| ---------------------------------------------------------------------..
myHNSW| .kdbai.model.myHNSW hnsw    `ptrs`add`search`filter`write`delete!(946..
'))

## 3. Add data

First, generate an array of five 8-dimensional vectors. Then add them to a DataFrame that matches the table and vector column specified when you created the index in Step 2 above.

In [164]:
# Create a NumPy array of 5 eight-dimensional float32 arrays
vectors = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],
    [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2]
], dtype=np.float32)

df = pd.DataFrame(dict(vectors=list(vectors)))       # df needs to match the table and vector column specified during the index creation
df

|   | vectors |
|---|---|
| 0 | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] |
| 1 | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] |
| 2 | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] |
| 3 | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] |
| 4 | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2] |
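
Because the index was created with `dims=8`, it can help to confirm that every row in the vector column has that length before appending. The following is a minimal sketch using only NumPy and pandas; `check_vector_column` is a hypothetical helper written for this guide, not part of the kdbai API, and the data here is placeholder data rather than the vectors above.

```python
import numpy as np
import pandas as pd

def check_vector_column(df, column="vectors", dims=8):
    """Raise ValueError unless every row of `column` is a 1-D vector of length `dims`.

    Hypothetical helper for illustration; not part of kdbai.
    """
    if column not in df.columns:
        raise ValueError(f"missing column {column!r}")
    for row, vec in enumerate(df[column]):
        arr = np.asarray(vec)
        if arr.ndim != 1 or arr.shape[0] != dims:
            raise ValueError(f"row {row}: expected shape ({dims},), got {arr.shape}")
    return True

# Placeholder data: five 8-dimensional float32 vectors
vectors = np.arange(40, dtype=np.float32).reshape(5, 8) / 40.0
df = pd.DataFrame(dict(vectors=list(vectors)))

check_vector_column(df, column="vectors", dims=8)
```

Running a check like this first turns a shape mismatch into a readable Python error, whatever the index itself would do with malformed input.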


Next, add this dummy data to your index using the `append` operation.

In [165]:
index.append(df)

Updating index with new vector data
Updating in-memory table: '.kdbai.table.table' with new data


## 4. Query the index

Query the index using the `search` operation to retrieve similar vectors. The following code queries the index and returns the nearest neighbors, with the distance to each in `nn_dist` and the index of each match in `nn_idx`.

In [166]:
query = np.array([[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]], dtype=np.float32)

index.search(query).pd()                        # Search similar vectors (5 by default)

|   | vectors | nn_dist | nn_idx |
|---|---|---|---|
| 0 | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.0 | 2 |
| 1 | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] | 0.08 | 1 |
| 2 | [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8] | 0.32 | 0 |
| 3 | [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1] | 0.88 | 3 |
| 4 | [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2] | 1.52 | 4 |


Search for the 2 closest neighbors.

In [169]:
index.search(query, options=dict(neighbors=2)).pd()  # Search the 2 closest neighbors

|   | vectors | nn_dist | nn_idx |
|---|---|---|---|
| 0 | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.0 | 2 |
| 1 | [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] | 0.08 | 1 |
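
The `nn_dist` values in the results above are consistent with squared Euclidean distance. This is an inference from the numbers shown, not something this guide states about the index. As a cross-check, here is a brute-force sketch in plain NumPy; `brute_force_search` is a hypothetical helper that scans every vector, so it is not HNSW and not part of kdbai.

```python
import numpy as np
import pandas as pd

# Same data and query as in the steps above
vectors = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
    [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1],
    [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.1, 0.2],
], dtype=np.float32)
query = np.array([[0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]], dtype=np.float32)

def brute_force_search(vectors, query, neighbors=5):
    """Exhaustive nearest-neighbor search (hypothetical helper, assumes squared L2)."""
    diffs = vectors - query[0]                 # broadcast the single query row
    dists = np.sum(diffs * diffs, axis=1)      # squared Euclidean distance per row
    order = np.argsort(dists, kind="stable")[:neighbors]
    return pd.DataFrame({
        "vectors": list(vectors[order]),
        "nn_dist": dists[order],
        "nn_idx": order,
    })

result = brute_force_search(vectors, query, neighbors=2)
print(result)
```

On a dataset this small an exhaustive scan is cheap. An approximate index such as HNSW is meant for collections where comparing the query against every vector would be too slow.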


## 5. Persist the index 

Save the index model, tabular data, and metadata using `persist`, so that you can recover the index from disk later.

In [171]:
index.persist('/tmp/index.kdbai')

Saving metadata associated with model
Persisting embedding model
Persisting tabular data to disk


pykx.Identity(pykx.q('::'))
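
Before deleting anything, you may want to confirm that something was actually written to the target path. The sketch below uses only the Python standard library; `persisted_ok` is a hypothetical helper, and since this guide does not say whether `persist` writes a single file or a directory, it handles both as an assumption.

```python
from pathlib import Path

def persisted_ok(path):
    """Return True if `path` is a non-empty file or a directory with at least one entry.

    Hypothetical helper; the on-disk layout written by `persist` is assumed, not documented here.
    """
    p = Path(path)
    if p.is_file():
        return p.stat().st_size > 0
    if p.is_dir():
        return any(p.iterdir())
    return False

# Check the path used in the persist step above
print(persisted_ok('/tmp/index.kdbai'))
```

This only checks that data exists at the path; it does not verify that the persisted index can be loaded.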

## 6. Remove index

Now that the index is persisted, you can use the `remove` operation to delete it.

In [176]:
index.remove()
del index

Deleting the underlying search index
Deleting the backing table for the search index
Deleting all search 
Performing garbage collection on process


<div class="alert alert-block alert-warning">
<b>Warning:</b> If you remove an index without first persisting it, you cannot use it again.
</div>

## 7. Load index from disk

The next time you want to use this index, retrieve all content associated with it using `from_disk`.

In [177]:
index = kdbai.KDBAI.from_disk('myHNSW', '/tmp/index.kdbai')
index.search(query, options=dict(neighbors=1)).pd() 

|   | vectors | nn_dist | nn_idx |
|---|---|---|---|
| 0 | [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] | 0.0 | 2 |


## Next Steps

Now that you’re successfully making indexes with kdb.ai, you can start inserting data or view more examples.
