# [STARTER] Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is in the `project/starter/games` folder. Each file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
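
Because each file is expected to carry the same six fields, a quick check before indexing can catch malformed records early. The following is a minimal sketch, not part of the starter code: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the field list is assumed from the example above.

```python
# Hypothetical helper: the field names are assumed from the Gran Turismo example.
REQUIRED_FIELDS = (
    "Name",
    "Platform",
    "Genre",
    "Publisher",
    "Description",
    "YearOfRelease",
)


def missing_fields(game):
    """Return the required fields that are absent from a game record."""
    return [field for field in REQUIRED_FIELDS if field not in game]


example = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator.",
    "YearOfRelease": 1997,
}

# An empty list means the record has every expected field.
print(missing_fields(example))
```

A record that fails this check could be skipped or reported before it reaches the collection.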


### Setup

In [None]:
# Only needed for Udacity workspace

import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [1]:
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv

In [2]:
# TODO: Create a .env file with the following variables
# OPENAI_API_KEY="YOUR_KEY"
# CHROMA_OPENAI_API_KEY="YOUR_KEY"
# TAVILY_API_KEY="YOUR_KEY"

In [3]:
# TODO: Load environment variables

load_dotenv()

CHROMA_PATH = "chromadb"
OPENAI_API_BASE = "https://openai.vocareum.com/v1"
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
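
Once `load_dotenv()` has run, it can help to confirm that the keys were actually picked up before any API call fails with a less obvious error. The sketch below is one possible check; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the variable list is taken from the `.env` template above.

```python
import os

# Variable names come from the .env template; the helper itself is hypothetical.
REQUIRED_VARS = ("OPENAI_API_KEY", "CHROMA_OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_env_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching the real environment.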

### VectorDB Instance

In [4]:
# TODO: Instantiate your ChromaDB client
# Choose any path you want; CHROMA_PATH is defined in the cell above
chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)

### Collection

In [5]:
# TODO: Pick one embedding function
# If you pick something other than OpenAI,
# make sure you use the same function when loading the collection
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(api_base=OPENAI_API_BASE)


In [9]:
# TODO: Create a collection
# Choose any name you want
collection = chroma_client.create_collection(
    name="udaplay",
    embedding_function=embedding_fn,
    get_or_create=True
)

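# This counts documents already stored in the collection. On a first run,
# before the add step below, it reports 0; a non-zero count means the
# cell was re-run after the documents were added.
print(f"Added {collection.count()} documents")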

Added 15 documents


### Add documents

In [7]:
# Make sure the directory "project/starter/games" exists.
# The relative path below assumes the notebook runs from project/starter.
data_dir = "games"

for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    # You can change what text you want to index
    content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

    # Use file name (like 001) as ID
    doc_id = os.path.splitext(file_name)[0]
    
    collection.add(
        ids=[doc_id],
        documents=[content],
        metadatas=[game]
    )
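
The loop above makes one `collection.add` call per file. As an alternative, the records can be collected first and sent in a single call. The sketch below covers only the collection step, under the same file layout and document format as the cell above; `load_games` is a hypothetical helper, not part of the starter code.

```python
import json
import os


def load_games(data_dir):
    """Read every .json file in data_dir and return (ids, documents, metadatas)."""
    ids, documents, metadatas = [], [], []

    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue

        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)

        # Same ID and text format as the per-file loop above
        ids.append(os.path.splitext(file_name)[0])
        documents.append(
            f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
        )
        metadatas.append(game)

    return ids, documents, metadatas
```

The three lists can then be passed together, for example `collection.add(ids=ids, documents=documents, metadatas=metadatas)`. If the notebook is likely to be re-run against the same persistent store, `collection.upsert` with the same arguments may be the safer choice, since it overwrites existing IDs.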

In [8]:
semantic_query = "What is the most famous game released on Nintendo 64 that is a platformer?"

results = collection.query(
    query_texts=[semantic_query],
    n_results=3,
    include=['documents', 'metadatas', 'distances']
)

for i in range(len(results['ids'][0])):
    print(f"\n--- Result {i+1} ---")
    print(f"ID: {results['ids'][0][i]}")
    print(f"Distance: {results['distances'][0][i]:.4f}")
    print(f"Metadata: {json.dumps(results['metadatas'][0][i], indent=2)}")
    print(f"Document: {results['documents'][0][i]}")


--- Result 1 ---
ID: 009
Distance: 0.1209
Metadata: {
  "YearOfRelease": 1996,
  "Description": "A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach.",
  "Platform": "Nintendo 64",
  "Name": "Super Mario 64",
  "Publisher": "Nintendo",
  "Genre": "Platformer"
}
Document: [Nintendo 64] Super Mario 64 (1996) - A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach.

--- Result 2 ---
ID: 008
Distance: 0.1477
Metadata: {
  "YearOfRelease": 1990,
  "Platform": "Super Nintendo Entertainment System (SNES)",
  "Publisher": "Nintendo",
  "Genre": "Platformer",
  "Name": "Super Mario World",
  "Description": "A classic platformer where Mario embarks on a quest to save Princess Toadstool and Dinosaur Land from Bowser."
}
Document: [Super Nintendo Entertainment System (SNES)] Super Mario World (1990) - A classic platformer where Mario embarks on a quest to save Princess