## FastAPI Tutorial

This tutorial requires `fastapi` and `uvicorn`, which can be installed by running:

`pip install fastapi uvicorn`

To start the FastAPI server, open a terminal and run the following command from the root directory of this project:

`uvicorn api.fastapi:app --host 0.0.0.0 --port 8000`
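Every request in this notebook fails with a `ConnectionError` if the server is not running, so it can help to check the port first. The sketch below is one way to do that with the standard library. The helper name `is_server_listening` is hypothetical and not part of spDB. It defaults to `127.0.0.1` on the assumption that the server runs on the same machine, since `0.0.0.0` is a bind address and may not work as a client address on every platform.

```python
import socket


def is_server_listening(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_server_listening()` before the first request gives a quick yes/no answer instead of a long stack trace.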

### Set up the environment

Load the necessary packages and append the required paths to `sys.path`.

In [1]:
import requests
import pickle
import sys
import os

# Load in spDB from the local directory
current_dir = os.getcwd()
sys.path.append(current_dir + "/../")
sys.path.append(current_dir + "/../tests/integration/")

from spdb.spdb import spDB, load_db
import helpers

In [None]:
# Replace these placeholders with your own credentials; do not commit real keys
api_key_id = "INSERT_API_KEY_ID_HERE"
api_key_secret = "INSERT_API_KEY_SECRET_HERE"

import requests
import base64

# Define the token
authToken = base64.b64encode((f"{api_key_id}:{api_key_secret}").encode("utf-8")).decode("utf-8")

# Define the payload and make the query
payload = {
    "knowledge_base_ids": ["cff59808-3184-4d2d-9b09-0ff757402daa"],
    "query": "What are some financial modeling tools we can build for startups?",
    "summarize_results": True
}
response = requests.post(
    "https://api.superpowered-qa.com/v1/knowledge_bases/query",
    headers={"Authorization": f"Bearer {authToken}"},
    json=payload
)

print(response.json())
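The raw-HTTP cells in this notebook build the token the same way: the key ID and secret are joined with a colon, base64-encoded, and sent as a `Bearer` token. A small helper keeps that logic in one place. This is a sketch only; `build_auth_token` and `auth_headers` are hypothetical names, not part of any Superpowered SDK.

```python
import base64


def build_auth_token(api_key_id: str, api_key_secret: str) -> str:
    """Base64-encode "<id>:<secret>" exactly as the notebook cells do."""
    raw = f"{api_key_id}:{api_key_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("utf-8")


def auth_headers(api_key_id: str, api_key_secret: str) -> dict:
    """Build the Authorization header dictionary passed to requests."""
    token = build_auth_token(api_key_id, api_key_secret)
    return {"Authorization": f"Bearer {token}"}
```

The result can be passed directly as `headers=auth_headers(api_key_id, api_key_secret)` in a `requests.post` call.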

In [None]:
import superpowered
import os

# Initialize API key and secret
os.environ["SUPERPOWERED_API_KEY_ID"]="INSERT_API_KEY_ID_HERE"
os.environ["SUPERPOWERED_API_KEY_SECRET"]="INSERT_API_KEY_SECRET_HERE"

# Make the request for the desired chat thread and message
response = superpowered.get_chat_response(
    thread_id="a3797a86-427a-4e72-969c-704efaa22f80",
    input="What are some good resources to help find trail conditions?",
)

# Print the response
print (response["interaction"]["model_response"]["content"])


In [6]:
import requests
import base64

# Define the token
api_key_id="INSERT_API_KEY_ID_HERE"
api_key_secret="INSERT_API_KEY_SECRET_HERE"
authToken = base64.b64encode((f"{api_key_id}:{api_key_secret}").encode("utf-8")).decode("utf-8")

# Define the payload and make the query
payload = {
    "input": "What are some good resources to help find trail conditions?",
}
response = requests.post(
    "https://api.superpowered-qa.com/v1/chat/threads/a3797a86-427a-4e72-969c-704efaa22f80/get_response",
    headers={"Authorization": f"Bearer {authToken}"},
    json=payload
)

print(response.json()["interaction"]["model_response"]["content"])


There are several resources you can use to find trail conditions:

1. Local Hiking Websites: Many regions have dedicated hiking websites or forums where hikers share information about trail conditions. These websites often have up-to-date reports from fellow hikers who have recently visited the trails.

2. National Park or Forest Websites: If you're planning to hike in a national park or forest, their official websites usually provide information on trail conditions. They may have trail status updates, closures, or alerts due to weather, maintenance, or wildlife activity.

3. Trail Apps: There are numerous trail apps available for smartphones that provide information on trail conditions. These apps often include user-generated reports, trail maps, and other helpful features. Some popular trail apps include AllTrails, Hiking Project, and Gaia GPS.

4. Social Media: Following local hiking groups or organizations on social media platforms like Facebook or Instagram can be a great way to s

In [None]:
# Load in the Fiqa test data
vectors, text, queries, _ = helpers.fiqa_test_data()
with open(current_dir + "/../tests/data/fiqa_queries_text.pickle", "rb") as f:
    query_text = pickle.load(f)
# The vectors must be a list (not a NumPy array) when using FastAPI
vectors = vectors.tolist()


### Create the spDB object

In [None]:
# Create a new spDB

db_name = "fast_api_test"
url = "http://0.0.0.0:8000/db/create"
response = requests.post(url, json={"name": db_name})
print (response.text)

In [None]:
db_name = "fiqa_test_1"
url = f"http://0.0.0.0:8000/db/{db_name}/info"
response = requests.get(url)
print (response.json())

In [None]:
# Remove a few vectors
ids = [10, 11, 12, 13]
url = f"http://0.0.0.0:8000/db/{db_name}/remove"
response = requests.post(url, json={"ids": ids})
print (response.text)

### Add data to the spDB object

When using FastAPI, data must be added to the spDB object in batches. We recommend a batch size of about 100; a much larger batch can cause the request to fail.

The data must also be a list. NumPy arrays are not JSON serializable, so convert them with `.tolist()` before sending them.

In [None]:
# Add the data to the spDB in batches of 100
batch_size = 100
data = [(vectors[i], {"text": text[i]}) for i in range(len(vectors))]

url = f"http://0.0.0.0:8000/db/{db_name}/add"

for i in range(0, 10000, batch_size):
    print (i)
    add_data = data[i:i+batch_size]
    response = requests.post(url, json={"add_data": add_data})
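The batching pattern above can also be written as a small generator that works for any input length, so the loop does not depend on a hard-coded item count. This is a minimal sketch; `iter_batches` is a hypothetical helper, not part of spDB.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With this helper, the upload loop becomes `for batch in iter_batches(data, 100): requests.post(url, json={"add_data": batch})`, and the final partial batch is handled automatically.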

### Train the spDB object

For this example, we use a PCA dimension of 256 and 32 compressed vector bytes, and we omit OPQ.

For more information on these parameters, see the GitHub Wiki [here](https://github.com/SuperpoweredAI/spDB/wiki/Tunable-parameters).

In [None]:
# Train the spDB

url = f"http://0.0.0.0:8000/db/{db_name}/train"
response = requests.post(url, json={
    "use_two_level_clustering": False,
    "pca_dimension": 256,
    "compressed_vector_bytes": 32,
    "omit_opq": True
})
print (response.text)
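To get a feel for what `compressed_vector_bytes` means for memory, the arithmetic below compares the size of an uncompressed `float32` vector with its compressed code. This is a rough sketch under stated assumptions: the 1024-dimension input is only an illustrative value, and the calculation ignores IDs, metadata, and index overhead. `compression_ratio` is a hypothetical helper, not an spDB function.

```python
def compression_ratio(dimension: int, compressed_vector_bytes: int, bytes_per_float: int = 4) -> float:
    """Ratio of raw vector size (dimension * bytes_per_float) to compressed code size."""
    raw_bytes = dimension * bytes_per_float  # float32 uses 4 bytes per component
    return raw_bytes / compressed_vector_bytes


# A 1024-dimensional float32 vector takes 4096 bytes uncompressed
ratio_32 = compression_ratio(1024, 32)  # 4096 / 32
ratio_64 = compression_ratio(1024, 64)  # 4096 / 64
```

Under these assumptions, 32 bytes per vector is a 128x reduction and 64 bytes is a 64x reduction. Smaller codes save more memory, generally at some cost in recall.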

### Query the trained index

Make a test query using the `query` endpoint. The query vector must be converted to a list before it is sent.

In [None]:
url = f"http://0.0.0.0:8000/db/{db_name}/query"
query_vector = queries[0].tolist()
response = requests.post(url, json={"query_vector": query_vector})

print ("Query text:", query_text[0])
print ("")
print (response.json()["metadata"][0]["text"])
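The list conversion matters because the request body is JSON, and NumPy arrays (and NumPy scalar types such as `float32`) are not JSON serializable. The sketch below wraps that conversion so the same call works for arrays and plain lists. Both function names are hypothetical. The `query_vector` key is the one used in the cell above.

```python
import json


def to_json_list(vector):
    """Return a plain Python list; uses .tolist() when available (e.g. NumPy arrays)."""
    if hasattr(vector, "tolist"):
        # .tolist() also converts NumPy scalars to native Python floats
        return vector.tolist()
    return list(vector)


def build_query_payload(vector) -> dict:
    """Build the JSON body for the query endpoint."""
    return {"query_vector": to_json_list(vector)}


# A plain list passes through unchanged and serializes cleanly
payload = build_query_payload([0.1, 0.2, 0.3])
body = json.dumps(payload)
```

Preferring `.tolist()` over `list(...)` for arrays is deliberate: `list(array)` would leave NumPy scalar elements in place, and `json.dumps` rejects those.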

In [33]:
import os
import sys
import numpy as np
import pickle

# Load in spDB from the local directory
current_dir = os.getcwd()
sys.path.append(current_dir + "/../")
sys.path.append(current_dir + "/../tests/integration/")

from spdb.spdb import spDB, load_db

db_name = "memory_test"

In [36]:
import requests
import numpy as np
import random
import string

def generate_random_vectors_with_text(N, D):
    random_vectors = np.random.rand(N, D).astype(np.float32) 
    random_text = [''.join(random.choices(string.ascii_lowercase, k=D)) for _ in range(N)]
    return random_vectors, random_text

# Specify the number of random vectors (N) and the dimensionality (D)
N = 30000  # Number of random vectors
D = 1024  # Dimensionality of each vector

# Generate N random vectors with D dimensions and random text strings
random_vectors, random_text = generate_random_vectors_with_text(N, D)


In [32]:
print (type(random_vectors[0][0]))

<class 'numpy.float32'>


In [37]:
url = "http://0.0.0.0:8000/db/create"
response = requests.post(url, json={"name": db_name})
print (response.text)

ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8000): Max retries exceeded with url: /db/create (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc10c914a30>: Failed to establish a new connection: [Errno 61] Connection refused'))

In [25]:
add_data = [(random_vectors[i], {"text": random_text[i]}) for i in range(len(random_vectors))]
db.add(add_data)



range(30000, 60000)

In [27]:
# Train the spDB
db.train(True, pca_dimension=256, compressed_vector_bytes=64, omit_opq=True)

04-18 08:48:13 spdb.spdb:239 in train() INFO     Training the Faiss index
04-18 08:48:13 spdb.spdb:251 in train() INFO     pca_dimension: 256
04-18 08:48:13 spdb.spdb:252 in train() INFO     opq_dimension: 256
04-18 08:48:13 spdb.spdb:253 in train() INFO     compressed_vector_bytes: 64
04-18 08:48:13 spdb.spdb:254 in train() INFO     omit_opq: True
04-18 08:48:13 spdb.spdb:286 in train() INFO     Training with two-level clustering
04-18 08:48:13 spdb.train:25 in train_with_two_level_clustering() INFO     index_factory_parameter_string: PCA256,IVF766,PQ64
04-18 08:48:14 spdb.two_level_clustering:129 in handle_pre_transforms() INFO     time taken to get random vectors inside handle_pre_transforms: 0.574617862701416
04-18 08:48:14 spdb.two_level_clustering:131 in handle_pre_transforms() INFO     number of vector transforms in chain: 1
04-18 08:48:16 spdb.two_level_clustering:138 in handle_pre_transforms() INFO     Time taken to train and apply vector transform 0: 1.7126641273498535
04-18 