# [STARTER] Udaplay Project

## Part 01 - Offline RAG

In this part of the project, you'll build a vector database with Chroma.

The data is in the `project/starter/games` folder. Each file becomes one document in the collection you'll create.
Example:
```json
{
  "Name": "Gran Turismo",
  "Platform": "PlayStation 1",
  "Genre": "Racing",
  "Publisher": "Sony Computer Entertainment",
  "Description": "A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.",
  "YearOfRelease": 1997
}
```
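
Since each record is later passed to Chroma as metadata, it can help to check a file before indexing it. The sketch below is one possible way to do that, not part of the starter code: `EXPECTED_FIELDS` and `check_game` are hypothetical names, the field list is copied from the example above, and the scalar-only check reflects the assumption that metadata values should be plain strings, numbers, or booleans.

```python
# Hypothetical helper for illustration; not part of the starter code.
EXPECTED_FIELDS = (
    "Name", "Platform", "Genre", "Publisher", "Description", "YearOfRelease",
)

def check_game(game):
    """Return a list of problems found in one game record (empty if none)."""
    problems = [f"missing field: {key}" for key in EXPECTED_FIELDS if key not in game]
    for key, value in game.items():
        # Assumption: metadata values should be simple scalars.
        if not isinstance(value, (str, int, float, bool)):
            problems.append(f"non-scalar value for {key}: {type(value).__name__}")
    return problems

sample = {
    "Name": "Gran Turismo",
    "Platform": "PlayStation 1",
    "Genre": "Racing",
    "Publisher": "Sony Computer Entertainment",
    "Description": "A realistic racing simulator.",
    "YearOfRelease": 1997,
}
print(check_game(sample))  # → []
```

A record that fails the check can be skipped or fixed before it reaches the collection.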


### Setup

In [1]:
import os
import json
import chromadb


### VectorDB Instance

In [11]:
chroma_client = chromadb.PersistentClient(path="chroma_db_jupiter")


### Collection

In [12]:
from chromadb.utils import embedding_functions

embedding_fn = embedding_functions.DefaultEmbeddingFunction()

collection = chroma_client.get_or_create_collection(
    name="games_collection_new",
    embedding_function=embedding_fn
)

### Add documents

In [13]:
data_dir = "games"

for file_name in sorted(os.listdir(data_dir)):
    if not file_name.endswith(".json"):
        continue

    file_path = os.path.join(data_dir, file_name)
    with open(file_path, "r", encoding="utf-8") as f:
        game = json.load(f)

    # Text that gets embedded and indexed; adjust the fields to change what is searchable
    content = f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"

    # Use the file name without its extension (e.g. 001) as the document ID
    doc_id = os.path.splitext(file_name)[0]

    collection.add(
        ids=[doc_id],
        documents=[content],
        metadatas=[game]
    )
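
The loop above calls `collection.add` once per file. Since `add` accepts lists for `ids`, `documents`, and `metadatas`, an alternative is to read everything first and add it in a single call. The following is a minimal sketch under that assumption; `load_games` is a hypothetical helper name, and the document text uses the same format as the loop above.

```python
import json
import os

def load_games(data_dir):
    """Read every .json file in data_dir and return (ids, documents, metadatas)."""
    ids, documents, metadatas = [], [], []
    for file_name in sorted(os.listdir(data_dir)):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(data_dir, file_name), "r", encoding="utf-8") as f:
            game = json.load(f)
        # File name without extension (e.g. 001) is the document ID
        ids.append(os.path.splitext(file_name)[0])
        documents.append(
            f"[{game['Platform']}] {game['Name']} ({game['YearOfRelease']}) - {game['Description']}"
        )
        metadatas.append(game)
    return ids, documents, metadatas

# Usage, assuming the `collection` created earlier in this notebook:
# ids, documents, metadatas = load_games("games")
# collection.add(ids=ids, documents=documents, metadatas=metadatas)
```

Keeping the file reading separate from the Chroma call also makes it easy to inspect the three lists before anything is written to the database.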




In [14]:
collection.peek()


{'ids': ['001', '002', '003', '004', '005', '006', '007', '008', '009', '010'],
 'embeddings': array([[-0.04768373, -0.01363629, -0.02100486, ..., -0.00430921,
         -0.00090423,  0.0972009 ],
        [-0.01098956,  0.01837821, -0.06527878, ...,  0.07810403,
         -0.07336764,  0.03125126],
        [-0.05645394, -0.04063236,  0.02586169, ..., -0.0154053 ,
          0.00325583,  0.09907703],
        ...,
        [-0.06329967,  0.00596636, -0.00042587, ...,  0.01854568,
         -0.00551006,  0.0540259 ],
        [-0.02731697, -0.00917329, -0.01271181, ..., -0.01593885,
          0.05847818,  0.0173116 ],
        [-0.04662724, -0.04988441, -0.06761332, ...,  0.03523087,
         -0.04189238,  0.04914557]], shape=(10, 384)),
 'documents': ['[PlayStation 1] Gran Turismo (1997) - A realistic racing simulator featuring a wide array of cars and tracks, setting a new standard for the genre.',
  "[PlayStation 2] Grand Theft Auto: San Andreas (2004) - An expansive open-world game set in th