# [STARTER] Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build your VectorDB using Chroma.

The data is inside folder `project/starter/games`. Each file will become a document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
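Each game record becomes one Chroma document. Below is a minimal sketch of that mapping. It assumes every file has the six keys shown above. `REQUIRED_FIELDS` and `game_to_document` are hypothetical helper names, and the text layout copies the one used in the *Add documents* cell.

```python
# Hypothetical helper: turn one game record into the text that gets embedded.
REQUIRED_FIELDS = ("Name", "Platform", "Genre", "Publisher", "Description", "YearOfRelease")


def game_to_document(game: dict) -> str:
    """Validate a game record and format it as indexable text."""
    missing = [field for field in REQUIRED_FIELDS if field not in game]
    if missing:
        raise KeyError(f"Game record is missing fields: {missing}")
    # Same layout as the indexing cell: [Platform] Name (Year) - Description
    return (
        f"[{game['Platform']}] {game['Name']} "
        f"({game['YearOfRelease']}) - {game['Description']}"
    )


example = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
    "YearOfRelease": 1997,
}
print(game_to_document(example))
```

Checking the fields up front means a malformed file fails with a clear message. Without the check, the f-string would raise a bare `KeyError` partway through indexing.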


### Setup

In [1]:
# Only needed for Udacity workspace

import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')
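
Some system Python builds ship an SQLite that is older than Chroma accepts, and this swap works around that. Chroma's runtime error message asks for sqlite3 >= 3.35.0. Treat that threshold as an assumption and confirm it against the Chroma release you install. The small stdlib sketch below reports which SQLite library `sqlite3` resolves to. If you run it after the swap, it reports the library Chroma will actually use.

```python
import sqlite3

# Assumed minimum SQLite version for Chroma; verify against your Chroma release.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(minimum: tuple = MIN_SQLITE) -> bool:
    """Return True if the linked SQLite library meets the minimum version."""
    return sqlite3.sqlite_version_info >= minimum


print(f"SQLite {sqlite3.sqlite_version}: new enough for Chroma? {sqlite_is_new_enough()}")
```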

In [2]:
import os
import json
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv

In [3]:
# Done 1: Create a .env file with the following variables
# OPENAI_API_KEY="YOUR_KEY"
# CHROMA_OPENAI_API_KEY="YOUR_KEY"
# TAVILY_API_KEY="YOUR_KEY"

In [4]:
# TODO: Load environment variables
if load_dotenv():
    print("Environment variables loaded.")
else:
    print("Warning: .env file not found.")
    

Environment variables loaded.
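
`load_dotenv()` only tells you whether a `.env` file was found. It does not tell you whether every key is set. The hedged sketch below checks the three variables from the `.env` template and masks their values, so a key never ends up in notebook output. `REQUIRED_VARS`, `missing_vars`, and `mask` are hypothetical names.

```python
import os

# Variables listed in the .env template above.
REQUIRED_VARS = ("OPENAI_API_KEY", "CHROMA_OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_vars(names=REQUIRED_VARS, env=os.environ) -> list:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not env.get(name)]


def mask(secret: str, visible: int = 4) -> str:
    """Hide a secret, showing only its last few characters (fixed-width prefix)."""
    if len(secret) <= visible:
        return "****"
    return "****" + secret[-visible:]


for name in REQUIRED_VARS:
    value = os.environ.get(name, "")
    print(f"{name}: {mask(value) if value else 'MISSING'}")
```

The masked prefix has a fixed width, so the printed value reveals neither the whole key nor its length.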


### VectorDB Instance

In [5]:
# TODO: Instantiate your ChromaDB Client
# Choose any path you want
print("Initializing ChromaDB client...")
chroma_client = chromadb.PersistentClient(path="chromadb")

Initializing ChromaDB client...


### Collection

In [9]:
# TODO: Pick one embedding function
# If you pick something other than OpenAI,
# make sure you use the same one when loading the collection
print("Defining embedding function...")
openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY environment variable not set.")

embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key=openai_api_key,
    model_name="text-embedding-ada-002"
)

Defining embedding function...
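
Reusing the same embedding function matters because vectors from different models live in different spaces and often have different dimensions. For example, `text-embedding-ada-002` returns 1,536-dimensional vectors. The sketch below is neither Chroma's API nor a real embedding model. It uses a toy hashed bag-of-words embedder (`toy_embed`, hypothetical) to show that a query can only be compared with stored documents when both were embedded the same way.

```python
import hashlib
import math


def toy_embed(texts, dim=8):
    """Toy stand-in for an embedding function: hashed bag-of-words, L2-normalized."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            # Stable hash so the same token always lands in the same bucket.
            bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def cosine(a, b):
    """Dot product of normalized vectors, i.e. cosine similarity."""
    if len(a) != len(b):
        raise ValueError(f"Dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


docs = toy_embed(["racing simulator cars tracks", "turn based fantasy role playing"])
query = toy_embed(["racing cars"])[0]
print([round(cosine(query, d), 3) for d in docs])

# A query embedded with a different setup (dim=16) cannot be compared.
try:
    cosine(toy_embed(["racing cars"], dim=16)[0], docs[0])
except ValueError as err:
    print(err)
```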


In [7]:
# TODO: Create a collection
# Choose any name you want
print("Creating or getting the 'udaplay' collection...")
collection = chroma_client.get_or_create_collection(
    name="udaplay",
    embedding_function=embedding_fn
)

Creating or getting the 'udaplay' collection...


### Add documents

In [8]:
# Game files live in "project/starter/games"; this relative path
# assumes the notebook runs from "project/starter"
data_dir = "games"

# Check if the directory exists
if not os.path.isdir(data_dir):
    print(f"Error: Directory '{data_dir}' not found.")
else:
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue

        file_path = os.path.join(data_dir, file_name)
        with open(file_path, "r", encoding="utf-8") as f:
            game = json.load(f)

        # You can change what text you want to index
        content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

        # Use the file name without its extension (e.g. 001) as the ID
        doc_id = os.path.splitext(file_name)[0]

        collection.add(
            ids=[doc_id],
            documents=[content],
            metadatas=[game]
        )

TypeError: APIStatusError.__init__() missing 2 required keyword-only arguments: 'response' and 'body'
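
The loop above makes one `collection.add` call per file, which means one embedding request per game. Below is a sketch of an alternative that assumes the same file layout. It gathers all ids, documents, and metadatas first, then adds them in a few larger batches. Gathering is kept separate from adding, so it can be tested without Chroma or an API key. `load_games`, `batched`, `add_in_batches`, and `BATCH_SIZE` are hypothetical names, and the batch size is an arbitrary choice. Wrapping each batch in `try/except` also shows which files were being embedded when an error like the one above surfaces.

```python
import json
import os


def load_games(data_dir: str):
    """Read every *.json game file into parallel lists of ids, documents, metadatas."""
    ids, documents, metadatas = [], [], []
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)
        ids.append(os.path.splitext(file_name)[0])
        documents.append(
            f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
        )
        metadatas.append(game)
    return ids, documents, metadatas


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


BATCH_SIZE = 50  # arbitrary; tune to your embedding provider's limits


def add_in_batches(collection, ids, documents, metadatas, size=BATCH_SIZE):
    """Add records batch by batch, reporting which ids failed before re-raising."""
    for idx in batched(list(range(len(ids))), size):
        batch_ids = [ids[i] for i in idx]
        try:
            collection.add(
                ids=batch_ids,
                documents=[documents[i] for i in idx],
                metadatas=[metadatas[i] for i in idx],
            )
        except Exception:
            print(f"Failed while adding ids {batch_ids[0]}..{batch_ids[-1]}")
            raise


# Usage in this notebook (needs the collection and a valid API key):
# ids, documents, metadatas = load_games(data_dir)
# add_in_batches(collection, ids, documents, metadatas)
```

Chroma collections also provide `upsert` with the same `ids`/`documents`/`metadatas` arguments. Using it instead of `add` keeps the cell safe to re-run when some ids are already stored.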