# BabyDragon Indexes

The `indexes` submodule of the BabyDragon package provides indexing and search strategies for various data types. Its main class is `MemoryIndex`, a wrapper around a Faiss index that simplifies managing the index and its associated data. A `MemoryIndex` can be created from scratch, loaded from a file, or initialized from a pandas DataFrame. The class also provides methods for adding and removing items, querying, saving and loading, and pruning the index based on certain constraints.

## Table of Contents

1. [MemoryIndex](#usage)
   - [Initializing a MemoryIndex](#initializing-a-memoryindex)
   - [Adding and Removing Items](#adding-and-removing-items)
   - [Querying the Index](#querying-the-index)
   - [Saving and Loading](#saving-and-loading)
   - [Pruning the Index](#pruning-the-index)
   - [Multithreading](#multithreading)
2. [Examples](#examples)

## Usage

### Initializing a MemoryIndex

A `MemoryIndex` object can be initialized in several ways:

1. Create a new empty index from scratch:

In [1]:
from babydragon.memory.indexes.memory_index import MemoryIndex

index = MemoryIndex()

Creating a new index


Before adding values, set an OpenAI API key, because the default embedder calls the OpenAI API:

In [3]:
import os
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]  # read from the environment; never hard-code the key

2. Create a new index from a list of values using the default OpenAI ada-002 embedder:

In [4]:
values = ["apple", "banana", "cherry"]

index = MemoryIndex(values=values)

Creating a new index from a list of values
Embedding value  0  of  3
Embedding value  0  took  0.23536920547485352  seconds
Embedding value  1  of  3
Embedding value  1  took  0.2901902198791504  seconds
Embedding value  2  of  3
Embedding value  2  took  0.45903921127319336  seconds


We can now search the index through the underlying Faiss index by calling the `faiss_query` method:

In [6]:
results, scores, indices = index.faiss_query("apple", k=3)
for result, score in zip(results, scores):
    print(result, score)

apple 0.9999996
banana 0.90331817
cherry 0.84615296


3. Create a new index from a list of values and their embeddings:

In [12]:
from babydragon.models.embedders.ada2 import OpenAiEmbedder
embedder = OpenAiEmbedder()

embeddings = []
for value in values:
    embeddings.append(embedder.embed(value))

index = MemoryIndex(name="precomputed_index", values=values, embeddings=embeddings)

results, scores, indices = index.faiss_query("apple", k=3)
for result, score in zip(results, scores):
    print(result, score)

Creating a new index from a list of embeddings and values
apple 0.9999981
banana 0.90342283
cherry 0.8461177


4. Load an existing index from a file:

In [13]:
index = MemoryIndex(load=True, name="precomputed_index")

Loading index from storage\precomputed_index


5. Initialize a `MemoryIndex` object from a pandas DataFrame:

In [19]:
import pandas as pd

data_frame = pd.DataFrame({
    "values": values,
    "embeddings": embeddings  # list of embeddings corresponding to the values
})

index = MemoryIndex.from_pandas(data_frame=data_frame, columns="values", embeddings_col="embeddings")

results, scores, indices = index.faiss_query("apple", k=3)
for result, score in zip(results, scores):
    print(result, score)

Loading the DataFrame
Creating a new index from a list of embeddings and values
apple 0.9999981
banana 0.90342283
cherry 0.8461177
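
To make the `faiss_query` scores shown above easier to interpret, here is a minimal, dependency-free sketch of top-k similarity search. It assumes the scores are cosine similarities, which is a guess based on the exact match scoring close to 1.0. The toy vectors and the `cosine` and `top_k` helpers are hypothetical and are not part of BabyDragon or Faiss.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, values, vectors, k):
    """Return (values, scores, indices) of the k most similar entries."""
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(vectors)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    indices = [i for _, i in scored]
    return [values[i] for i in indices], [s for s, _ in scored], indices

# Toy 3-dimensional "embeddings", made up purely for illustration.
toy_values = ["apple", "banana", "cherry"]
toy_vectors = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.6, 0.8]]

results, scores, indices = top_k([1.0, 0.0, 0.0], toy_values, toy_vectors, k=2)
print(results, indices)
```

A real index avoids this exhaustive Python loop, but the returned triple has the same shape as the one unpacked from `faiss_query` in the examples above.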



### Adding and Removing Items
You can add items to the index by calling the `add_to_index` method:

In [22]:
index.add_to_index(value="orange")

results, scores, indices = index.faiss_query("apple", k=4)
for result, score in zip(results, scores):
    print(result, score)

apple 0.99999803
banana 0.90340215
orange 0.8651828
cherry 0.84610254


You can also remove items from the index by calling the `remove_from_index` method:

In [24]:
index.remove_from_index(value="banana")

results, scores, indices = index.faiss_query("apple", k=4)
for result, score in zip(results, scores):
    print(result, score)

The value 'banana' was not found in the index.
apple 0.9999996
orange 0.8652205
cherry 0.8461285


### Querying the Index
To query the index, use the `faiss_query` or `token_bound_query` method:

In [37]:
# Query the top-5 most similar values with a maximum tokens constraint

results, scores, indices = index.token_bound_query(query="fruit", k=5, max_tokens=4)
for result, score in zip(results, scores):
    print(result, score)

apple 0.90448993
orange 0.877166
cherry 0.8606403
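
One plausible reading of a token-bound query is: rank the candidates by similarity, then keep them in order until a token budget is used up. The sketch below illustrates only that idea. The whitespace-based `count_tokens` and the `token_bound` helper are hypothetical stand-ins, and the actual semantics of `max_tokens` in `token_bound_query` may differ (for example, it could cap each result rather than the total).

```python
def count_tokens(text):
    """Crude stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def token_bound(ranked_values, ranked_scores, max_tokens):
    """Keep ranked results in order until adding one would exceed max_tokens."""
    kept_values, kept_scores, used = [], [], 0
    for value, score in zip(ranked_values, ranked_scores):
        cost = count_tokens(value)
        if used + cost > max_tokens:
            break  # budget exhausted: stop at the first result that does not fit
        kept_values.append(value)
        kept_scores.append(score)
        used += cost
    return kept_values, kept_scores

# Results already sorted by similarity (values and scores are made up).
ranked = ["red apple", "orange", "sour cherry pie", "banana"]
ranked_scores = [0.90, 0.88, 0.86, 0.80]

kept, kept_scores = token_bound(ranked, ranked_scores, max_tokens=4)
print(kept)
```

Stopping at the first result that does not fit keeps the output a strict prefix of the ranking. Skipping oversized results and continuing would be an equally reasonable design.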


### Saving and Loading
After modifying the index, you can save it to disk by calling the `save` method. The location and name of the file are controlled by the `index.save_path` and `index.name` attributes:

In [None]:
index.name = "precomputed_index"
index.save()

You can load a saved index by passing `load=True` and its name to the `MemoryIndex` constructor:

In [None]:
index = MemoryIndex(load=True, name="precomputed_index")

### Pruning the Index
To prune the index based on certain constraints, use the `prune` method:

In [None]:
index.prune(max_tokens=3500)

### Multithreading
To speed up the embedding process with multithreading, set the `max_workers` parameter to a value greater than 1:

In [None]:
index = MemoryIndex(values=values, max_workers=8)
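
The general pattern behind a `max_workers` setting can be sketched with Python's standard `concurrent.futures` module. This illustrates parallel embedding in general and is not BabyDragon's internal code. `fake_embed` is a hypothetical stand-in for a network-bound embedding call, and `embed_all` is a name invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_embed(text):
    """Hypothetical stand-in for an embedding API call."""
    return [float(len(text)), float(text.count("a"))]

def embed_all(texts, max_workers=1):
    """Embed texts concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_embed, texts))

vectors = embed_all(["apple", "banana", "cherry"], max_workers=8)
print(vectors)
```

Threads help here because embedding requests spend most of their time waiting on the network. In practice, API rate limits may cap how many workers are useful.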