# StackAI Vector Database Exploration

Interactive notebook for testing the StackAI vector database API.

**Setup (Local Development):**
1. Start the server: `make start`
2. Seed test data: `python scripts/seed_data.py --library all`
3. Run the cells below

**Setup (Docker):**
1. Start the container: `docker compose up --build`
2. Seed test data: `docker compose exec api python scripts/seed_data.py --library all`
3. Run the cells below

The API is available at `http://localhost:8000` in both cases.

In [1]:
import httpx

BASE_URL = "http://localhost:8000/api/v1"
client = httpx.Client(base_url=BASE_URL, timeout=30.0)

# Helper functions

def search(query: str, library: str = "recipes_lib", k: int = 3):
    """Search a library and display formatted results."""
    response = client.post(f"/libraries/{library}/search", json={"query": query, "k": k})
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return
    
    data = response.json()
    print(f'Query: "{data["query"]}"')
    print(f"Library: {library}")
    print(f"Results: {data['result_count']}")
    print("-" * 60)
    
    for i, result in enumerate(data["results"], 1):
        chunk = result["chunk"]
        print(f"\n[{i}] Score: {result['score']:.4f}")
        print(f"    Doc: {chunk['document_id']}")
        print(f"    Text: {chunk['text']}")


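def list_libraries():
    """List all libraries."""
    response = client.get("/libraries")
    # Report API errors the same way the other helpers do
    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return
    libraries = response.json()
    print(f"Libraries ({len(libraries)}):")
    for lib in libraries:
        print(f"  - {lib['id']}: {lib['name']}")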


def list_documents(library: str):
    """List documents in a library."""
    response = client.get(f"/libraries/{library}/documents")
    if response.status_code != 200:
        print(f"Error: {response.text}")
        return
    docs = response.json()
    print(f"Documents in {library} ({len(docs)}):")
    for doc in docs:
        print(f"  - {doc['id']}: {doc['name']}")


def list_chunks(document: str, show_text: bool = True):
    """List chunks in a document."""
    response = client.get(f"/documents/{document}/chunks")
    if response.status_code != 200:
        print(f"Error: {response.text}")
        return
    chunks = response.json()
    print(f"Chunks in {document} ({len(chunks)}):")
    for chunk in chunks:
        if show_text:
            # Truncate long chunk text to 80 characters for display
            text = chunk["text"]
            preview = f"{text[:80]}..." if len(text) > 80 else text
            print(f"  [{chunk['id']}] {preview}")
        else:
            print(f"  - {chunk['id']}")


def health_check():
    """Check if the server is running."""
    try:
        response = httpx.get("http://localhost:8000/health")
        if response.status_code == 200:
            print("Server is running")
        else:
            print(f"Server returned: {response.status_code}")
    except httpx.ConnectError:
        print("Cannot connect to server. Start it with: make start (or docker compose up --build)")


print("Helper functions loaded: search(), list_libraries(), list_documents(), list_chunks(), health_check()")

Helper functions loaded: search(), list_libraries(), list_documents(), list_chunks(), health_check()


## Quick Start

In [2]:
health_check()

Server is running


In [3]:
list_libraries()

Libraries (3):
  - recipes_lib: Recipe Collection
  - support_lib: Support Knowledge Base
  - products_lib: Product Manuals


## Search

Available libraries (after seeding):
- `recipes_lib` - Cooking recipes
- `support_lib` - Support knowledge base
- `products_lib` - Product manuals

In [6]:
search("How do I make a creamy pasta sauce?", library="recipes_lib")

Query: "How do I make a creamy pasta sauce?"
Library: recipes_lib
Results: 3
------------------------------------------------------------

[1] Score: 0.7248
    Doc: recipes_spaghetti_carbonara
    Text: Stir in the egg mixture quickly, adding pasta water as needed to create a creamy sauce.

[2] Score: 0.5473
    Doc: recipes_spaghetti_carbonara
    Text: Toss the hot pasta with the guanciale and rendered fat, then remove from heat.

[3] Score: 0.5315
    Doc: recipes_thai_green_curry
    Text: Heat a wok over high heat and add a splash of the thick coconut cream from the top of the can.


In [5]:
search("How do I reset my password?", library="support_lib")

Query: "How do I reset my password?"
Library: support_lib
Results: 3
------------------------------------------------------------

[1] Score: 0.7047
    Doc: support_account_login
    Text: To reset your password, click 'Forgot Password' on the login page.

[2] Score: 0.4629
    Doc: support_account_login
    Text: To change your email address, verify your identity with your current password first.

[3] Score: 0.4598
    Doc: support_account_login
    Text: If you lose access to your 2FA device, use one of your backup codes to log in.


In [5]:
search("bluetooth pairing", library="products_lib", k=5)

Query: "bluetooth pairing"
Library: products_lib
Results: 5
------------------------------------------------------------

[1] Score: 0.6546
    Doc: products_mechanical_keyboard
    Text: To pair Bluetooth, press Fn+1, Fn+2, or Fn+3 to select a device slot.

[2] Score: 0.6068
    Doc: products_wireless_headphones
    Text: To pair with Bluetooth, hold the power button for 7 seconds until the LED flashes blue.

[3] Score: 0.5037
    Doc: products_mechanical_keyboard
    Text: The keyboard can remember up to 3 Bluetooth devices and switch between them instantly.

[4] Score: 0.4495
    Doc: products_mechanical_keyboard
    Text: Connect the keyboard using the detachable USB-C cable or via Bluetooth.

[5] Score: 0.4443
    Doc: products_wireless_headphones
    Text: The headphones will appear as 'WH-1000' in your device's Bluetooth settings.


In [7]:
# Try your own query
search("chicken curry recipe", library="recipes_lib", k=5)

Query: "chicken curry recipe"
Library: recipes_lib
Results: 5
------------------------------------------------------------

[1] Score: 0.6314
    Doc: recipes_thai_green_curry
    Text: Gather 400ml coconut milk, 2 tbsp green curry paste, 500g chicken thighs, Thai basil, and fish sauce.

[2] Score: 0.5733
    Doc: recipes_thai_green_curry
    Text: Fry the curry paste in the coconut cream for 2 minutes until fragrant.

[3] Score: 0.5249
    Doc: recipes_chicken_tikka_masala
    Text: For the marinade, combine 500g chicken breast with yogurt, garam masala, turmeric, and lemon juice.

[4] Score: 0.5208
    Doc: recipes_thai_green_curry
    Text: Thai Green Curry is a fragrant coconut-based curry from central Thailand.

[5] Score: 0.5156
    Doc: recipes_chicken_tikka_masala
    Text: Chicken Tikka Masala features tender marinated chicken in a creamy tomato-based sauce.


## Browse Data

In [8]:
list_documents("recipes_lib")

Documents in recipes_lib (5):
  - recipes_spaghetti_carbonara: Spaghetti Carbonara
  - recipes_thai_green_curry: Thai Green Curry
  - recipes_chocolate_chip_cookies: Chocolate Chip Cookies
  - recipes_chicken_tikka_masala: Chicken Tikka Masala
  - recipes_french_onion_soup: French Onion Soup


In [9]:
list_chunks("recipes_spaghetti_carbonara")

Chunks in recipes_spaghetti_carbonara (10):
  [recipes_spaghetti_carbonara_chunk_1] Spaghetti Carbonara is a classic Roman pasta dish made with eggs, cheese, and cu...
  [recipes_spaghetti_carbonara_chunk_2] You'll need 400g spaghetti, 200g guanciale, 4 egg yolks, 100g pecorino romano, a...
  [recipes_spaghetti_carbonara_chunk_3] Bring a large pot of salted water to boil and cook the spaghetti until al dente.
  [recipes_spaghetti_carbonara_chunk_4] Cut the guanciale into small strips and crisp it in a dry pan over medium heat.
  [recipes_spaghetti_carbonara_chunk_5] Whisk together the egg yolks, grated pecorino, and plenty of black pepper in a b...
  [recipes_spaghetti_carbonara_chunk_6] Reserve a cup of pasta water before draining the cooked spaghetti.
  [recipes_spaghetti_carbonara_chunk_7] Toss the hot pasta with the guanciale and rendered fat, then remove from heat.
  [recipes_spaghetti_carbonara_chunk_8] Stir in the egg mixture quickly, adding pasta water as needed to create a cre

## Manual API Calls

For more control, use the client directly:

In [10]:
# Raw API call example
response = client.post("/libraries/recipes_lib/search", json={
    "query": "baking cookies",
    "k": 2
})
response.json()

{'query': 'baking cookies',
 'results': [{'chunk': {'id': 'recipes_chocolate_chip_cookies_chunk_9',
    'document_id': 'recipes_chocolate_chip_cookies',
    'text': 'Scoop rounded tablespoons onto a baking sheet lined with parchment paper.',
    'embedding': None,
    'metadata': {},
    'created_at': '2025-11-23T14:54:05.972955'},
   'score': 0.5598201757918291},
  {'chunk': {'id': 'recipes_chocolate_chip_cookies_chunk_1',
    'document_id': 'recipes_chocolate_chip_cookies',
    'text': 'These chocolate chip cookies are crispy on the edges and chewy in the center.',
    'embedding': None,
    'metadata': {},
    'created_at': '2025-11-23T14:54:05.972929'},
   'score': 0.5374198302933988}],
 'result_count': 2}
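
The nested response above can be flattened into rows for easier inspection. The following is a minimal sketch, assuming the response shape shown in the output above (`results` entries carrying `chunk` and `score` keys). `flatten_results` is a hypothetical helper for this notebook, not part of the API, and the sample payload is trimmed from the output above so the cell runs without a server.

```python
from typing import Any


def flatten_results(data: dict[str, Any]) -> list[tuple[str, float, str]]:
    """Turn a search response into (chunk_id, score, text) rows.

    Hypothetical helper; assumes the response shape printed above.
    """
    return [
        (hit["chunk"]["id"], round(hit["score"], 4), hit["chunk"]["text"])
        for hit in data.get("results", [])
    ]


# Sample payload trimmed from the recorded response above
sample = {
    "query": "baking cookies",
    "results": [
        {
            "chunk": {
                "id": "recipes_chocolate_chip_cookies_chunk_9",
                "text": "Scoop rounded tablespoons onto a baking sheet lined with parchment paper.",
            },
            "score": 0.5598201757918291,
        },
        {
            "chunk": {
                "id": "recipes_chocolate_chip_cookies_chunk_1",
                "text": "These chocolate chip cookies are crispy on the edges and chewy in the center.",
            },
            "score": 0.5374198302933988,
        },
    ],
    "result_count": 2,
}

rows = flatten_results(sample)
for chunk_id, score, text in rows:
    print(f"{score:.4f}  {chunk_id}  {text[:40]}")
```

With a live server, `flatten_results(response.json())` would take the place of the sample payload.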

## Testing Deletes

Test that deleting chunks/documents/libraries properly removes them from both storage and the search index.
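
The delete checks in this section rely on reading the printed results by eye. As a rough sketch of how the same check could be automated, the hypothetical helpers below collect the chunk IDs from a search response and fail if a deleted ID is still present. They assume the response shape returned by the search endpoint (`results` entries with a `chunk.id` field); the payloads are hand-written stand-ins, not real API output.

```python
from typing import Any, Iterable


def chunk_ids(data: dict[str, Any]) -> set[str]:
    """Collect the chunk IDs present in a search response."""
    return {hit["chunk"]["id"] for hit in data.get("results", [])}


def assert_absent(data: dict[str, Any], deleted_ids: Iterable[str]) -> None:
    """Raise AssertionError if any deleted chunk ID still appears in the results."""
    leaked = chunk_ids(data) & set(deleted_ids)
    if leaked:
        raise AssertionError(f"Deleted chunks still in search results: {sorted(leaked)}")


# Hand-written stand-in payloads (not real API output), using the chunk IDs
# created in the cells below.
before = {"results": [{"chunk": {"id": "chunk_c"}}, {"chunk": {"id": "chunk_b"}}]}
after = {"results": [{"chunk": {"id": "chunk_b"}}, {"chunk": {"id": "chunk_a"}}]}

assert_absent(after, ["chunk_c"])  # passes silently: chunk_c is gone
print("chunk_c absent after delete:", "chunk_c" not in chunk_ids(after))
```

Against the running server, the payload would come from `client.post(...).json()` on the search endpoint after each delete call.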

In [None]:
# Create a test library for delete testing
test_lib = {"id": "delete_test_lib", "name": "Delete Test Library"}
response = client.post("/libraries", json=test_lib)
print(f"Create library: {response.status_code}")

# Create a document
test_doc = {"id": "delete_test_doc", "library_id": "delete_test_lib", "name": "Test Document"}
response = client.post("/libraries/delete_test_lib/documents", json=test_doc)
print(f"Create document: {response.status_code}")

# Create chunks
chunks = [
    {"id": "chunk_a", "document_id": "delete_test_doc", "text": "The quick brown fox jumps over the lazy dog"},
    {"id": "chunk_b", "document_id": "delete_test_doc", "text": "Machine learning is a subset of artificial intelligence"},
    {"id": "chunk_c", "document_id": "delete_test_doc", "text": "Python is a popular programming language"},
]
response = client.post("/documents/delete_test_doc/chunks/batch", json={"chunks": chunks})
print(f"Create chunks: {response.status_code} - Created {response.json().get('created_count', 0)} chunks")

In [None]:
# Verify search works - search for "programming"
print("=== BEFORE DELETE ===")
search("programming language", library="delete_test_lib", k=3)

In [None]:
# Delete single chunk (chunk_c - the Python one)
response = client.delete("/chunks/chunk_c")
print(f"Delete chunk_c: {response.status_code}")

# Verify it's gone from search results
print("\n=== AFTER DELETING chunk_c ===")
search("programming language", library="delete_test_lib", k=3)
# Should NOT find the Python chunk anymore

In [None]:
# Create another document with chunks to test document deletion
test_doc2 = {"id": "delete_test_doc2", "library_id": "delete_test_lib", "name": "Second Test Document"}
response = client.post("/libraries/delete_test_lib/documents", json=test_doc2)
print(f"Create document2: {response.status_code}")

chunks2 = [
    {"id": "chunk_d", "document_id": "delete_test_doc2", "text": "Cats are popular pets around the world"},
    {"id": "chunk_e", "document_id": "delete_test_doc2", "text": "Dogs are known as man's best friend"},
]
response = client.post("/documents/delete_test_doc2/chunks/batch", json={"chunks": chunks2})
print(f"Create chunks: {response.status_code}")

# Verify we can find pet content
print("\n=== BEFORE DOCUMENT DELETE ===")
search("pets and animals", library="delete_test_lib", k=5)

In [None]:
# Delete the document (should cascade delete its chunks from index too)
response = client.delete("/libraries/delete_test_lib/documents/delete_test_doc2")
print(f"Delete document2: {response.status_code}")

# Verify pet chunks are gone from search
print("\n=== AFTER DOCUMENT DELETE ===")
search("pets and animals", library="delete_test_lib", k=5)
# Should NOT find cats or dogs chunks anymore

In [None]:
# Test library deletion (deletes everything including index file)
response = client.delete("/libraries/delete_test_lib")
print(f"Delete library: {response.status_code}")

# Verify library is gone
response = client.get("/libraries/delete_test_lib")
print(f"Get deleted library: {response.status_code} (should be 404)")

# Verify search fails gracefully
response = client.post("/libraries/delete_test_lib/search", json={"query": "test", "k": 3})
print(f"Search deleted library: {response.status_code} (should be 404)")

print("\n✓ Delete tests complete!")

## Testing Index Persistence

Test that indexes survive server restarts.

**Manual test:**
1. Run the cell below to create test data
2. Restart the server (Ctrl+C, then `make start`)
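3. Run the verification cell to confirm search still works
4. Run the cleanup cell last; it deletes the test library, so the verification cell returns a 404 if cleanup runs first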
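
After the restart in step 2, the server may take a moment to accept connections again. The sketch below shows one way to wait for it before running the verification cell. `wait_until` is a hypothetical helper, not part of the project, and the demo uses a stand-in probe so it runs without a server.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout` seconds elapse.

    Exceptions raised by `probe` (for example a refused connection while the
    server is still down) are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # not reachable yet; retry after the sleep below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in probe that succeeds on the third attempt, so the sketch runs
# without a server.
attempts = {"count": 0}


def fake_probe() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(fake_probe, timeout=5.0, interval=0.01)
print(f"Ready: {ready} after {attempts['count']} attempts")
```

Against the running API, the probe could be `lambda: httpx.get("http://localhost:8000/health").status_code == 200`, which reuses the `/health` endpoint that `health_check()` calls.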

In [11]:
# Step 1: Create test data for persistence test
persist_lib = {"id": "persist_test_lib", "name": "Persistence Test Library"}
response = client.post("/libraries", json=persist_lib)
print(f"Create library: {response.status_code}")

persist_doc = {"id": "persist_test_doc", "library_id": "persist_test_lib", "name": "Persistence Test Doc"}
response = client.post("/libraries/persist_test_lib/documents", json=persist_doc)
print(f"Create document: {response.status_code}")

persist_chunks = [
    {"id": "persist_chunk_1", "document_id": "persist_test_doc", "text": "This chunk should survive a server restart"},
    {"id": "persist_chunk_2", "document_id": "persist_test_doc", "text": "Index persistence means the search index is saved to disk"},
]
response = client.post("/documents/persist_test_doc/chunks/batch", json={"chunks": persist_chunks})
print(f"Create chunks: {response.status_code}")

print("\n✓ Test data created. Now restart the server and run the next cell.")

Create library: 201
Create document: 201
Create chunks: 201

✓ Test data created. Now restart the server and run the next cell.


In [14]:
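# Step 2: Verify search still works after restart
# (Run this AFTER restarting the server and BEFORE the cleanup cell;
# a 404 means the library was not found, for example because cleanup already ran)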

print("Checking if search works after restart...")
search("server restart persistence", library="persist_test_lib", k=3)

# If results appear, persistence is working!
print("\n✓ If you see results above, index persistence is working!")

Checking if search works after restart...
Error: 404 - {"detail":"Library with id 'persist_test_lib' not found"}

✓ If you see results above, index persistence is working!


In [13]:
# Cleanup: Delete persistence test library
response = client.delete("/libraries/persist_test_lib")
print(f"Cleanup - Delete library: {response.status_code}")
print("✓ Test data cleaned up")

Cleanup - Delete library: 204
✓ Test data cleaned up
