# [STARTER] Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is in the `project/starter/games` folder. Each file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
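
As a rough illustration of how one such file could become a document, the sketch below turns a game record into a document ID and a text string. The helper name `game_to_document`, the file name `001.json`, and the exact text layout are assumptions made for illustration; the project does not prescribe them.

```python
import os


def game_to_document(file_name: str, game: dict) -> tuple[str, str]:
    """Return (doc_id, text) for one game record.

    Hypothetical helper: the text layout is an illustrative choice.
    """
    # Use the file name without its extension as a stable document ID.
    doc_id = os.path.splitext(file_name)[0]
    # Fall back to placeholders so a missing field does not raise KeyError.
    text = (
        f"[{game.get('Platform', 'Unknown')}] "
        f"{game.get('Name', 'Unknown')} "
        f"({game.get('YearOfRelease', 'Unknown')}) - "
        f"{game.get('Description', '')}"
    )
    return doc_id, text


example = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator featuring a wide array of "
                   "cars and tracks, setting a new standard for the genre.",
    "YearOfRelease": 1997,
}

# "001.json" is a made-up file name used only for this example.
doc_id, text = game_to_document("001.json", example)
print(doc_id)
print(text)
```

Putting the platform, name, and year into the text means those fields take part in the embedding, so a question that mentions them can match on more than the description alone.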


### Setup

In [5]:
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv

# The embedding function is created in the Collection section below,
# after the environment variables have been loaded.

In [None]:
# TODO: Create a .env file with the following variables
#OPENAI_API_KEY="YOUR_KEY"
#CHROMA_OPENAI_API_KEY="YOUR_KEY"
#TAVILY_API_KEY="YOUR_KEY"

In [6]:
# TODO: Load environment variables
# The file is named config.env here; call load_dotenv() with no argument if yours is named .env
load_dotenv('config.env')
assert os.getenv("OPENAI_API_KEY") is not None
assert os.getenv("TAVILY_API_KEY") is not None
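
The bare `assert` checks above stop at the first missing key and do not say which one it was. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper name, not part of `python-dotenv` or the starter code.

```python
import os


def require_env(names: list[str]) -> dict[str, str]:
    """Return the named environment variables, raising if any is missing or blank.

    Hypothetical helper for illustration only.
    """
    # Treat unset and whitespace-only values the same way.
    values = {name: (os.getenv(name) or "").strip() for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

Calling `require_env(["OPENAI_API_KEY", "TAVILY_API_KEY"])` right after `load_dotenv` would report every missing name in one error message.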

### VectorDB Instance

In [7]:
# TODO: Instantiate your ChromaDB Client
# Choose any path you want
DATA_DIR = "project/starter/games"
CHROMA_PATH = "chromadb"
COLLECTION_NAME = "udaplay"

# PersistentClient stores the database on disk under CHROMA_PATH
chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)

### Collection

In [8]:
# TODO: Pick one embedding function
# If you pick something other than OpenAI,
# make sure you use the same function when loading the collection

embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.getenv("OPENAI_API_KEY"),
    model_name="text-embedding-3-small"
)

In [9]:
# TODO: Create a collection
# Choose any name you want
collection = chroma_client.get_or_create_collection(
    name=COLLECTION_NAME,
    embedding_function=embedding_fn
)

### Add documents

In [10]:
# Fail early if the API key is missing: adding documents calls the embedding API
api_key = os.getenv("OPENAI_API_KEY")
if not api_key or not api_key.strip():
    raise ValueError(
        "OPENAI_API_KEY is missing/blank. Ensure your .env is loaded in this runtime "
        "and contains OPENAI_API_KEY=sk-..."
    )


# Folder with the game JSON files. This relative path must resolve from the
# notebook's working directory; DATA_DIR above names the same folder with a longer path.
data_dir = "games"

for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    content = (
        f"[{game.get('Platform','Unknown')}] "
        f"{game.get('Name','Unknown')} "
        f"({game.get('YearOfRelease','Unknown')}) - "
        f"{game.get('Description','')}"
    )

    doc_id = os.path.splitext(file_name)[0]

    # Skip if already added (prevents crashing on rerun)
    existing = collection.get(ids=[doc_id])
    if existing and existing.get("ids"):
        continue

    collection.add(
        ids=[doc_id],
        documents=[content],
        metadatas=[game]
    )

print("Collection count:", collection.count())


Collection count: 15
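
Once the collection is populated, it can be searched by meaning rather than by exact keywords. The sketch below wraps Chroma's `collection.query` and flattens its nested result into one dictionary per hit. The function name `search_games` and the shape of the returned dictionaries are assumptions for illustration.

```python
def search_games(collection, question: str, n_results: int = 3) -> list[dict]:
    """Query a Chroma collection with one question and return a flat list of hits.

    Hypothetical helper: `collection` is expected to be a Chroma collection
    such as the one created above.
    """
    results = collection.query(
        query_texts=[question],
        n_results=n_results,
        include=["documents", "metadatas", "distances"],
    )
    # Chroma returns one inner list per query text; only one was sent, so use index 0.
    return [
        {"id": doc_id, "document": doc, "metadata": meta, "distance": dist}
        for doc_id, doc, meta, dist in zip(
            results["ids"][0],
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]
```

With the collection built above, a call such as `search_games(collection, "realistic racing simulator")` would embed the question with the collection's embedding function and return up to three hits, nearest first.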
