# Game Data Collection (RAWG.io API)

**Data source:** Data provided by [RAWG.io](https://rawg.io).  
_This project uses the RAWG Video Games Database API but is not endorsed or certified by RAWG._

In this notebook, I query the RAWG API to collect basic metadata on thousands of video games, including:
- Title, release date, average user rating
- Critic rating (Metacritic), ESRB rating
- Genre, platforms, stores, tags
- Engagement metrics like playtime and review counts

The collected data is exported to a `.csv` file for further cleaning and analysis.


### Import Libraries and Setup

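```python
import os
import sys
import time

import pandas as pd
import requests
from dotenv import load_dotenv

# Make the project root importable (for any shared helper modules)
sys.path.append("..")

# Read the RAWG API key from a local .env file (expects a line like API_KEY=...)
load_dotenv()
API_KEY = os.getenv("API_KEY")
if not API_KEY:
    raise RuntimeError("API_KEY not found; add it to your .env file")
```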

### Fetch Data
I iterate through the first 250 pages of the `/games` endpoint, requesting 40 games per page (up to 10,000 games in total). For each game, I extract the metadata relevant to my analysis.

- RAWG enforces rate limits, so I pause for one second (`time.sleep(1)`) between requests to stay within them.
- The obtained data is stored as a list of dictionaries and converted to a DataFrame at the end.
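A fixed one-second pause works when the limit is known in advance. If the API instead signals throttling with HTTP 429 (Too Many Requests), a common alternative is exponential backoff: wait, retry, and double the wait each time. This is a minimal sketch under that assumption, not part of the original notebook. `get_with_backoff`, `max_retries`, and `base_delay` are hypothetical names, and the HTTP call is passed in as `get` so the retry logic can be exercised without the network.

```python
import time
from typing import Any, Callable


def get_with_backoff(
    get: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `get()` and retry with exponentially growing delays on HTTP 429.

    `get` is any zero-argument callable returning an object with a
    `status_code` attribute (e.g. a lambda wrapping requests.get).
    Assumption: the server signals throttling with status 429.
    """
    for attempt in range(max_retries + 1):
        response = get()
        if response.status_code != 429:
            # Success or a non-throttling error: let the caller decide what to do
            return response
        if attempt < max_retries:
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... before retrying
            sleep(base_delay * 2 ** attempt)
    # Still throttled after all retries; return the last response
    return response
```

In the notebook this could wrap the existing call, e.g. `get_with_backoff(lambda: requests.get(url, timeout=30))`, keeping the `time.sleep(1)` between pages as a baseline throttle.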

```python
games_data = []
page = 1
max_pages = 250

while page <= max_pages:
    # Pass query parameters via `params` so requests builds and encodes the URL
    response = requests.get(
        "https://api.rawg.io/api/games",
        params={'key': API_KEY, 'page_size': 40, 'page': page},
        timeout=30,
    )

    if response.status_code != 200:
        print(f"Error on page {page}: HTTP {response.status_code}")
        break

    data = response.json()

    for game in data['results']:
        # Nested fields can be missing or null, so fall back to empty containers
        esrb = game.get('esrb_rating') or {}
        games_data.append({
            'id': game['id'],
            'name': game['name'],
            'released': game.get('released'),
            'avg_user_rating': game.get('rating'),
            'user_ratings_count': game.get('ratings_count'),
            'reviews_count': game.get('reviews_count'),
            'added': game.get('added'),
            'metacritic_rating': game.get('metacritic'),
            'avg_playtime': game.get('playtime'),
            'genres': [g['name'] for g in (game.get('genres') or [])],
            'platforms': [p['platform']['name'] for p in (game.get('platforms') or [])],
            'stores': [s['store']['name'] for s in (game.get('stores') or [])],
            'tags': [t['name'] for t in (game.get('tags') or [])],
            'esrb_rating': esrb.get('name'),
        })

    print(f"Fetched page {page} with {len(data['results'])} games")
    page += 1
    time.sleep(1)  # Throttle requests to respect RAWG's rate limits

games = pd.DataFrame(games_data)
```
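Instead of relying only on a hard-coded `max_pages`, the loop could also stop once the response reports that no further pages exist. RAWG's list responses appear to use a paginated shape with a `results` list and a `next` URL that is empty on the last page; treat that as an assumption to check against the API documentation. In this sketch, `iter_pages` and `fetch_page` are hypothetical names, and the page fetcher is passed in so the paging logic can be tested offline.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[int], Dict[str, Any]],
    max_pages: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield result items page by page until `next` is empty or max_pages is hit.

    `fetch_page(page)` should return the decoded JSON for one page:
    a dict with 'results' (a list) and 'next' (a URL or None).
    """
    page = 1
    while max_pages is None or page <= max_pages:
        data = fetch_page(page)
        yield from data.get("results", [])
        if not data.get("next"):
            break  # No further pages reported by the API
        page += 1
```

A `fetch_page` wrapping `requests.get(...).json()` with a `time.sleep(1)` inside would keep the same throttling as the notebook while letting the API decide where the data ends.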

### Export to CSV

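```python
# Create the output folder if needed; list columns are written as their string representation
os.makedirs("../data", exist_ok=True)
games.to_csv("../data/games_data.csv", index=False)
```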
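Because `to_csv` writes each list column (`genres`, `platforms`, `stores`, `tags`) as the list's Python string form, e.g. `"['Action', 'RPG']"`, those columns come back as plain strings when the file is reloaded for cleaning. One way to restore them is to parse them with `ast.literal_eval` through the `converters` argument of `pd.read_csv`. This sketch round-trips a small made-up DataFrame through an in-memory buffer instead of the real file; `LIST_COLUMNS` is a hypothetical helper name.

```python
import ast
import io

import pandas as pd

LIST_COLUMNS = ["genres", "platforms", "stores", "tags"]

# Small stand-in for the scraped data (values are illustrative only)
df = pd.DataFrame({
    "id": [1, 2],
    "genres": [["Action", "RPG"], []],
    "platforms": [["PC"], ["PlayStation 5", "Xbox Series S/X"]],
    "stores": [["Steam"], []],
    "tags": [["Singleplayer"], ["Multiplayer", "Co-op"]],
})

# Write to an in-memory CSV, then rewind so it can be read back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Parse each list column back into a real Python list while reading
restored = pd.read_csv(buffer, converters={col: ast.literal_eval for col in LIST_COLUMNS})
print(restored.loc[0, "genres"])
```

For the real file, the same `converters` mapping can be passed to `pd.read_csv("../data/games_data.csv", ...)`.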
