# Data Preparation - Game Details <a id='top'></a>

After extracting all the necessary data, the next step is to prepare it by converting each variable to the correct data type. In this notebook, we prepare the data that contains the game details: we convert dates to timestamps, expand lists into boolean columns, and ensure that the boolean features have the correct format.

The structure of this notebook is as follows:

[0. Import Libraries](#libraries) <br>
[1. Function Creation](#function) <br>
[2. Data Preparation](#prepare) <br>

# 0. Import Libraries<a id='libraries'></a>
[to the top](#top)  

The first step is to import the necessary libraries.

In [None]:
import os
from datetime import datetime
from helper_functions import save_data_to_json, load_json

# 1. Function Creation<a id='function'></a>
[to the top](#top)  

In the next cell, we define the function that prepares the data. It ensures that the output directory exists, reads each JSON file in the input directory, and processes every row. For each row, it extracts the app ID, price, number of reviews, Metacritic score, USK rating, and required age, casts `is_free` to a boolean, converts the release date to a Unix timestamp, and expands the `platforms` and `genres` lists into individual boolean columns. A release date string that does not match the expected format is stored as `None`. The processed rows of each file are saved as a new JSON file with the same name in the output directory, which gives a clean and structured format for further analysis.

In [2]:
def preprocess_game_details(input_folder, output_folder):
    # Ensure the output directory exists
    os.makedirs(output_folder, exist_ok=True)
    
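    # Helper function to convert a date string such as "23 Mar, 2020" to a Unix timestamp
    def convert_date_to_timestamp(date_str):
        try:
            return int(datetime.strptime(date_str, "%d %b, %Y").timestamp())
        except (ValueError, TypeError):
            # ValueError: unexpected date format; TypeError: missing date (None)
            return None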
    
    # Helper function to convert list to boolean columns
    def convert_list_to_bool_columns(data, key):
        # Fall back to an empty list when the key is absent or its value is None
        items = data.get(key) or []
        return {f"{key}_{item.replace(' ', '_').lower()}": True for item in items}
    
    # Iterate over all .json files in the input directory
    for filename in os.listdir(input_folder):
        if filename.endswith(".json"):
            input_path = os.path.join(input_folder, filename)
            
            # Read JSON file
            df = load_json(input_path)
            
            preprocessed_rows = []
            
            # Iterate over each row in the DataFrame
            for data in df.to_dicts():
                # Extract and process relevant data
                preprocessed_data = {
                    "appid": data.get("appid"),
                    "is_free": bool(data.get("is_free")),
                    "price": data.get("price"),
                    "release_date": convert_date_to_timestamp(data.get("release_date")),
                    "number_of_reviews": data.get("number_of_reviews"),
                    "metacritic_score": data.get("metacritic_score"),
                    "usk_rating": data.get("usk_rating"),
                    "required_age": data.get("required_age"),
                }
                
                # Dynamic conversion of lists to boolean columns
                preprocessed_data.update(convert_list_to_bool_columns(data, "platforms"))
                preprocessed_data.update(convert_list_to_bool_columns(data, "genres"))
                
                preprocessed_rows.append(preprocessed_data)
            
            # Path for the processed file
            output_path = os.path.join(output_folder, filename)
            
            # Write processed data to a new .json file
            save_data_to_json(preprocessed_rows, output_path)
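
The two helper conversions are nested inside `preprocess_game_details`, so they cannot be called on their own. As a minimal sketch, the cell below repeats their logic on a single made-up record (the values are hypothetical, only the field names come from the function above). Note that `.timestamp()` on a naive `datetime` is interpreted in the machine's local timezone, so the sketch also shows a UTC variant. That variant is an alternative for comparison, not what the function above does.

```python
from datetime import datetime, timezone

# Hypothetical record: field names follow the function above, values are made up
record = {
    "release_date": "23 Mar, 2020",
    "genres": ["Action", "Free to Play"],
}

# "%d %b, %Y" = day, abbreviated month name, comma, four-digit year
parsed = datetime.strptime(record["release_date"], "%d %b, %Y")

# Same call as in the function: interpreted in the local timezone,
# so the resulting integer can differ between machines
local_timestamp = int(parsed.timestamp())

# Alternative (not used above): pin the date to UTC for a reproducible value
utc_timestamp = int(parsed.replace(tzinfo=timezone.utc).timestamp())
print(utc_timestamp)  # 1584921600

# Each list item becomes a key "<field>_<item>", lowercased, spaces replaced by underscores
genre_columns = {
    f"genres_{item.replace(' ', '_').lower()}": True for item in record["genres"]
}
print(genre_columns)  # {'genres_action': True, 'genres_free_to_play': True}
```

If the timestamps need to be comparable across machines, pinning the timezone as in the UTC variant is one option to consider.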

# 2. Data Preparation<a id='prepare'></a>
[to the top](#top)  

Finally, we can start preparing the data using the function.

Before calling the function, we need to define the input and output folders.

In [None]:
# Input and output directories
input_folder = 'data/game_details'
output_folder = 'data/game_details_preprocessed'
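
`os.listdir` raises `FileNotFoundError` when the input folder does not exist, so it can be worth checking the folder before running the function. A minimal sketch, in which the helper name `list_json_files` is hypothetical and not part of this notebook:

```python
import os


def list_json_files(folder):
    """Return the sorted .json file names in folder, or an empty list if it does not exist."""
    if not os.path.isdir(folder):
        return []
    return sorted(name for name in os.listdir(folder) if name.endswith(".json"))


# 'data/game_details' is the input folder defined above
json_files = list_json_files("data/game_details")
print(f"{len(json_files)} JSON file(s) found")
```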

Here, we call our newly created function.

In [None]:
# Perform preprocessing
preprocess_game_details(input_folder, output_folder)
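
Because only the items present in a row produce a key, rows built by the function can carry different sets of `platforms_*` and `genres_*` keys, and an absent item is missing instead of `False`. If a downstream step needs explicit `False` values, one possible follow-up is sketched below. The helper name `fill_missing_flags` and the sample rows are hypothetical and not part of this notebook.

```python
def fill_missing_flags(rows, prefixes=("platforms_", "genres_")):
    """Return copies of the rows in which every flag key seen in any row is present."""
    # Collect every flag key that occurs in at least one row
    flag_keys = sorted(
        {key for row in rows for key in row if key.startswith(prefixes)}
    )
    # Default all flags to False, then overlay the values the row already has
    return [{**{key: False for key in flag_keys}, **row} for row in rows]


# Made-up rows in the shape produced by preprocess_game_details
rows = [
    {"appid": 1, "platforms_windows": True, "genres_action": True},
    {"appid": 2, "platforms_linux": True},
]
filled = fill_missing_flags(rows)
print(filled[1])
```

In this sketch, the second row gains `platforms_windows` and `genres_action` set to `False`, while the original rows are left unchanged.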