# Data Acquisition - Game Details <a id='top'></a>

In this next data acquisition notebook, we extract more data about the selected games:

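* **game_name**: Stores the name of the game (saved under the `name` key).
* **app_id**: Represents the unique application ID associated with the game (saved under the `appid` key).
* **data**: The dictionary in the API response that contains the data about the game; the fields below are read from it.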
* **required_age**: Stores the age required to play the game, defaulting to 0 if not provided.
* **is_free**: Indicates whether the game is free or not, defaulting to `False`.
* **detailed_description**: Contains a detailed description of the game.
* **supported_languages**: A list of languages supported by the game, parsed from a string in the data.
* **developers**: Stores the developers of the game.
* **publishers**: Represents the publishers of the game.
* **price**: The price of the game: 0 if it's free, otherwise the `final` value from the `price_overview` section in the data, divided by 100.
* **platforms**: A list of platforms on which the game is available.
* **metacritic_score**: The Metacritic score of the game.
* **categories**: Categories or tags associated with the game.
* **genres**: Genres the game belongs to.
* **release_date**: The release date of the game.
* **content_descriptors**: Descriptors for the game's content.
* **usk_rating**: The USK (Unterhaltungssoftware Selbstkontrolle, the German entertainment software self-regulation body) rating of the game.
* **number_of_reviews**: The total number of reviews or recommendations for the game.


**Attention:** This notebook takes a long time to run, because it sends one API request per game and pauses for one second between requests.

The structure of this notebook is as follows:

[0. Import libraries](#libraries) <br>
[1. Build a Safenet](#safenet) <br>
[2. Format Games Data](#format) <br>
[3. Extract Data](#extract) <br>

# 0. Import libraries <a id='libraries'></a>
[to the top](#top)

Import the necessary libraries.

In [1]:
import polars as pl
from helper_functions import save_data_to_json
import time
import requests


# 1. Build a Safenet<a id='safenet'></a>
[to the top](#top)

Before we start extracting the data, we expected this to be a long process, so we wanted to guard against anything that might go wrong along the way. To do so, we decided to create a safety net that lets us continue where the computer left off if the algorithm goes down. Now, we are ready to start extracting data.

In [None]:
def get_nested(dictionary, keys, default=None):
    """Safely retrieves a nested value from a dictionary given a list of keys."""
    for key in keys:
        dictionary = dictionary.get(key) if dictionary is not None else None
        if dictionary is None:
            return default
    return dictionary
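
As a quick illustration of how the helper behaves, the sketch below calls it on a small hand-made dictionary. The `sample` payload and its values are invented for the example and are not real API output; the function is repeated only so that the snippet runs on its own.

```python
def get_nested(dictionary, keys, default=None):
    """Safely retrieves a nested value from a dictionary given a list of keys."""
    for key in keys:
        dictionary = dictionary.get(key) if dictionary is not None else None
        if dictionary is None:
            return default
    return dictionary


# Hypothetical payload, invented for illustration only
sample = {"price_overview": {"final": 1999}, "metacritic": None}

price_cents = get_nested(sample, ["price_overview", "final"], 0)   # full key path exists
score = get_nested(sample, ["metacritic", "score"])                # parent value is None
rating = get_nested(sample, ["ratings", "usk", "rating"], "n/a")   # path is missing entirely

print(price_cents, score, rating)
```

In all three cases the call returns a value instead of raising a `KeyError` or `AttributeError`, which is what lets the formatting step run over incomplete records.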

# 2. Format Games Data<a id='format'></a>
[to the top](#top)

Before starting the extraction process, we create a function that formats the information returned for a game and extracts the variables we need from it.

In [None]:
def format_game_data(game_info, game_name, app_id):
    """Formats and extracts the necessary fields from the game info data, including the appid, using safe dictionary access
    for top-level data and handling nested data where applicable."""
    data = game_info.get('data', {})
    return {
        "name": game_name,
        "appid": app_id,
        "required_age": data.get("required_age", 0),
        "is_free": data.get("is_free", False),
        "detailed_description": data.get("detailed_description", ""),
        # Strip whitespace around each language and skip empty entries (e.g. when the field is missing)
        "supported_languages": [lang.split('<')[0].strip() for lang in data.get("supported_languages", "").split(',') if lang.strip()],
        "developers": data.get("developers", []),
        "publishers": data.get("publishers", []),
        "price": 0 if data.get("is_free", False) else get_nested(data, ["price_overview", "final"], 0) / 100,
        "platforms": [key for key, value in data.get("platforms", {}).items() if value],
        "metacritic_score": get_nested(data, ["metacritic", "score"], None),
        "categories": [category["description"] for category in data.get("categories", [])],
        "genres": [genre["description"] for genre in data.get("genres", [])],
        "release_date": get_nested(data, ["release_date", "date"], ""),
        "content_descriptors": data.get("content_descriptors", {}).get("notes", ""),
        "usk_rating": get_nested(data, ["ratings", "usk", "rating"], None),
        "number_of_reviews": get_nested(data, ["recommendations", "total"], 0)  # Use get_nested to safely access nested data
    }
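
The `supported_languages` field arrives as one comma-separated string that can contain HTML markup. The following is a minimal sketch of a stricter parse, not the notebook's method: `parse_languages` is a hypothetical helper, the `raw` string is hand-written for the example, and the idea that everything after a `<br>` tag is a footnote is an assumption about the input format.

```python
import re


def parse_languages(raw):
    """Hypothetical helper: split a comma-separated language string and drop HTML markup."""
    if not raw:
        return []
    # Assumption: text after the first <br> is a footnote, not a language
    main_part = re.split(r"<br\s*/?>", raw, maxsplit=1)[0]
    # Remove any remaining HTML tags such as <strong>...</strong>
    no_tags = re.sub(r"<[^>]+>", "", main_part)
    # Drop footnote asterisks and surrounding whitespace; skip empty entries
    return [lang.replace("*", "").strip() for lang in no_tags.split(",") if lang.strip("* ")]


# Hand-written example string, not real API output
raw = "English<strong>*</strong>, French, German<br><strong>*</strong>languages with full audio support"
print(parse_languages(raw))
```

Compared with splitting on `<` alone, this variant also trims the space after each comma and returns an empty list when the field is missing.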

# 3. Extract Data <a id='extract'></a>
[to the top](#top)

Below is a function that requests the details of each selected game from the API, formats them, and saves the results to a JSON file.

In [None]:
def fetch_game_details(game_list, file_path):
    """Fetches information for each game in the list, formats it, and saves the collected details to a JSON file."""
    base_url = "https://store.steampowered.com/api/appdetails"
    games_details = []  # List to store all processed games details
    processed_count = 0

    while game_list:  # Process until the list is empty
        game = game_list.pop(0)  # Remove and return the first game from the list
        app_id = game['appid']
        game_name = game["name"]

        # Construct the URL for the API request
        params = {'appids': app_id}

        # Make the API request
        try:
            response = requests.get(base_url, params=params, timeout=30)  # Avoid hanging forever on a stalled request
            response.raise_for_status()  # Raises an HTTPError for bad responses
            data = response.json()
            game_info = data[str(app_id)]
            
            if game_info['success']:
                games_details.append(format_game_data(game_info, game_name, app_id))
                processed_count += 1

                if processed_count % 10 == 0:
                    print(f"Processed {processed_count} games so far.")

        except requests.RequestException as e:
            print(f"Failed to fetch data for {game_name}: {e}")
            break  # Stop requesting, but still save the games collected so far

        time.sleep(1)

    save_data_to_json(games_details, file_path)
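
The safety net described in section 1 can be approximated by writing progress to disk as the loop runs and skipping app IDs that were already saved when the notebook is restarted. The following is a minimal sketch using only the standard library, not the notebook's implementation: `load_checkpoint`, `save_checkpoint`, `process_with_checkpoint`, and the `fetch_one` callback are hypothetical names, and it does not use `save_data_to_json`, whose behavior is not shown here.

```python
import json
import os


def load_checkpoint(path):
    """Return previously saved records, or an empty list if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return []


def save_checkpoint(records, path):
    """Write to a temporary file first, then swap it in, so a crash mid-write keeps the old file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    os.replace(tmp_path, path)


def process_with_checkpoint(game_list, path, fetch_one, every=10):
    """Process games, skipping app IDs already saved and checkpointing every `every` records."""
    records = load_checkpoint(path)
    done = {record["appid"] for record in records}
    try:
        for game in game_list:
            if game["appid"] in done:
                continue  # already fetched in an earlier run
            record = fetch_one(game)  # hypothetical callback returning a dict or None
            if record is not None:
                records.append(record)
                done.add(game["appid"])
                if len(records) % every == 0:
                    save_checkpoint(records, path)
    finally:
        # Persist progress even if an error interrupts the loop
        save_checkpoint(records, path)
    return records
```

Here `fetch_one` stands in for the request and formatting steps of a single game. Because the checkpoint is reloaded at the start of every call, rerunning the same cell after a failure only requests the games that are still missing.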

In the next cell, we extract more information about the Top Sellers on Steam.

In [None]:
topseller = pl.read_json('data/jsons/SteamTopSellers.json').to_dicts()
file_path = "data/game_details/SteamTopSellers_game_details.json"

fetch_game_details(topseller, file_path)

Afterwards, we extract the same information for the Most Played Games on Steam (as of the date on which we extracted our data).

In [None]:
mostplayed = pl.read_json('data/jsons/SteamMostPlayed.json').to_dicts()
file_path = "data/game_details/SteamMostPlayed_game_details.json"

fetch_game_details(mostplayed, file_path)