# Minecraft Genie - Embedding & Query Testing

This notebook demonstrates how to work with the Minecraft Lore vector database using ChromaDB and LlamaIndex.

## Overview
- **Purpose**: Load persisted vector embeddings of Minecraft lore and test querying functionality
- **Technology Stack**: ChromaDB for vector storage, LlamaIndex for indexing/querying, OpenAI embeddings
- **Data**: Minecraft lore documents embedded as vectors for semantic search

## Workflow
1. **Setup**: Import necessary modules and configure paths
2. **Load Database**: Connect to persisted ChromaDB vector store
3. **Query Testing**: Test semantic search capabilities on Minecraft lore data

In [1]:
# Setup
import json
import os
import sys
from pathlib import Path

import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

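# Resolve the project root (assumes the notebook runs from a direct subdirectory)
# and make the project's modules importable
project_root = Path("..").resolve()
sys.path.insert(0, str(project_root))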

# Config
GOLD_PROMPTS_PATH = str(project_root / "evaluation/gold_prompts.json")
DB_PATH = str(project_root / "db/minecraft_lore")
COLLECTION_NAME = "minecraft_lore"
K_DEFAULT = 5  # default top-k for retrieval

## Loading Persisted Vector Database

The next cell connects to an already-created ChromaDB database containing Minecraft lore embeddings. The database was created using the `build_vector_index()` function from the `data.embedder` module.

**Key Components:**
- **ChromaDB**: Vector database storing document embeddings
- **PersistentClient**: Opens the existing on-disk database directory at `DB_PATH`
- **Collection**: Named container ("minecraft_lore") holding the embedded documents
- **VectorStoreIndex**: LlamaIndex interface for querying the vector store

In [2]:
client = chromadb.PersistentClient(path=DB_PATH)
# get_collection raises if the collection is missing, so a wrong path or name fails here
collection = client.get_collection(COLLECTION_NAME)
print(
    f"[Chroma] Collection '{COLLECTION_NAME}' contains {collection.count()} items at '{DB_PATH}'"
)
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Use the same embedding model that was used during indexing
embed_model_name = os.environ.get("EMBED_MODEL", "text-embedding-3-small")
embed_model = OpenAIEmbedding(model=embed_model_name)

index = VectorStoreIndex.from_vector_store(
    vector_store,
    storage_context=storage_context,
    embed_model=embed_model,
)
retriever = index.as_retriever(similarity_top_k=K_DEFAULT)
print(
    f"✅ Retriever initialized with collection '{COLLECTION_NAME}' containing {collection.count()} items"
)

[Chroma] Collection 'minecraft_lore' contains 431 items at '/Users/yonahbole/Documents/GitHub/projects/minecraft-genie/db/minecraft_lore'
✅ Retriever initialized with collection 'minecraft_lore' containing 431 items
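
The cell above reuses the embedding model from indexing because query vectors are only comparable to stored vectors produced by the same model. A cheap sanity check is to compare dimensionalities. The sketch below is an assumption, not part of the project: `stored_embedding_dim` is a hypothetical helper, and `FakeCollection` is a stand-in so the example runs without a database. The only ChromaDB call it relies on is `collection.get(limit=..., include=["embeddings"])`.

```python
def stored_embedding_dim(collection):
    """Return the length of one stored embedding, or None if none is available."""
    sample = collection.get(limit=1, include=["embeddings"])
    embeddings = sample["embeddings"]
    if embeddings is None or len(embeddings) == 0:
        return None
    return len(embeddings[0])


class FakeCollection:
    """Hypothetical stand-in that mimics the shape of Collection.get() results."""

    def __init__(self, embeddings):
        self._embeddings = embeddings

    def get(self, limit=None, include=None):
        return {"embeddings": self._embeddings}


# Toy data: one stored vector with three dimensions.
dim = stored_embedding_dim(FakeCollection([[0.1, 0.2, 0.3]]))
print(dim)
```

In the notebook, passing the real `collection` and comparing the result with `len(embed_model.get_text_embedding("test"))` would flag a mismatched model, at the cost of one embedding API call. Matching dimensions do not prove the models are identical, so this only catches the obvious cases.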


## Semantic Search Testing

The next cell tests semantic search by building a query engine and asking a question about Minecraft lore. `index.as_query_engine()` is created with LlamaIndex's default settings, so it does not reuse the `retriever` configured with `K_DEFAULT` above. The query engine will:

1. **Embed the query**: Convert the question into a vector using the same embedding model as the index
2. **Find similar vectors**: Search the database for the most relevant document chunks
3. **Generate a response**: Pass the retrieved chunks to an LLM as context for the answer
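
The first two steps can be illustrated without any external service. This is a minimal sketch under stated assumptions: the three-dimensional vectors and chunk names are invented, and cosine similarity is used for illustration only, since the metric actually applied depends on how the ChromaDB collection was configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy "embeddings"; real ones have far more dimensions.
toy_chunks = {
    "gravity_blocks": [0.9, 0.1, 0.0],
    "villager_trading": [0.1, 0.9, 0.1],
    "redstone_basics": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedded question

retrieved = top_k(toy_query, toy_chunks, k=2)
print(retrieved)
```

The retrieved chunk texts are what the third step hands to the LLM as context.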

In [3]:
query_engine = index.as_query_engine()
query = "List the blocks that are affected by gravity. There are 11 of them."
results = query_engine.query(query)

print(str(results))

Sand, red sand, gravel, anvils of all damage levels, dragon eggs, all colors of concrete powder, scaffolding, snow layers [BE only], pointed dripstone, suspicious sand, and suspicious gravel.
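
The validation cell that follows checks each gold prompt by case-insensitive substring matching against the raw documents. The same check can be sketched as two small functions; `find_snippet`, `coverage`, and the sample data are hypothetical names chosen for illustration, not part of the project.

```python
def find_snippet(documents, snippet, window=60):
    """Return context around the first case-insensitive match, or None."""
    needle = snippet.lower()
    for doc in documents:
        pos = doc.lower().find(needle)
        if pos != -1:
            start = max(0, pos - window)
            end = min(len(doc), pos + len(snippet) + window)
            return doc[start:end].replace("\n", " ")
    return None


def coverage(documents, gold_prompts):
    """Map each question to True when every expected snippet occurs somewhere."""
    return {
        prompt["question"]: all(
            find_snippet(documents, snippet) is not None
            for snippet in prompt["expected_answer_contains"]
        )
        for prompt in gold_prompts
    }


# Hypothetical stand-ins for docs["documents"] and the gold prompt file.
sample_docs = ["Level 5 | Master | 250\nSource: https://minecraft.wiki/w/Trading"]
sample_prompts = [
    {"question": "Master XP?", "expected_answer_contains": ["250"]},
    {"question": "Missing fact?", "expected_answer_contains": ["netherite"]},
]
report = coverage(sample_docs, sample_prompts)
print(report)
```

A pass here only shows that the string exists somewhere in the collection. Short snippets such as `100%` or `14` can match rows unrelated to the question, so this check confirms the data is present rather than that retrieval finds it.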


In [4]:
# Inspect the contents of the ChromaDB collection
print(f"Collection name: {collection.name}")
print(f"Number of documents: {collection.count()}")

# Fetch all documents from the collection for validation
docs = collection.get(include=["documents"])

# Validate gold prompts against database content
with open(GOLD_PROMPTS_PATH, "r", encoding="utf-8") as f:
    gold_prompts = json.load(f)

# Print status for snippet matches
for prompt in gold_prompts:
    question = prompt["question"]
    expected_snippets = prompt["expected_answer_contains"]
    contains_all_snippets = all(
        any(snippet.lower() in doc.lower() for doc in docs["documents"])
        for snippet in expected_snippets
    )
    status = "✔" if contains_all_snippets else "✘"
    print(f"[{status}] Question: {question}")
    for snippet in expected_snippets:
        found_doc = next((doc for doc in docs["documents"] if snippet.lower() in doc.lower()), None)
        found = found_doc is not None
        print(f"  Snippet '{snippet}' found: {found}")
        if found:
            # Print the actual document snippet (showing a portion for readability)
            snippet_start = found_doc.lower().find(snippet.lower())
            snippet_end = snippet_start + len(snippet)
            context_window = 60
            start = max(0, snippet_start - context_window)
            end = min(len(found_doc), snippet_end + context_window)
            context = found_doc[start:end].replace('\n', ' ')
            print(f"    ...{context}...")
        else:
            print("    (No matching document snippet found)")

Collection name: minecraft_lore
Number of documents: 431
[✔] Question: Give me the trading web link
  Snippet 'https://minecraft.wiki/w/Trading' found: True
    ...ept the novice (stone) badge, which is 4x4 pixels.  Source: https://minecraft.wiki/w/Trading...
[✔] Question: Whats the xp level a villager need to become a Master?
  Snippet '250' found: True
    ...tice | 10 3 | Journeyman | 70 4 | Expert | 150 5 | Master | 250  InJava Edition, villagers have a maximum of 10 trades. Eac...
[✔] Question: How many raw chicken are needed to get one emerauld for a novice butcher?
  Snippet '14' found: True
    ...P Slot | Probability | Probability Novice | 1 | 33% | 50% | 14 ×Raw Chicken | Emerald | 16 | Low | 2 33% | 50% | 4 ×Raw Ra...
[✔] Question: Whats the probability of having the dried kelp block trade with expert butcher?
  Snippet '100%' found: True
    ...| Villager XP Slot | Probability | Probability Novice | 1 | 100% | 40% | 15 ×Coal | Emerald | 16 | Low | 2 2 | 25% |...
[✔] Questio