## Phase I Project Proposal
### Which Video Game Is Really Worth Its Money?

#### Name: Alexander Khinno, DS 3000

### Introduction

Is it worth spending money on a new game? Many games charge high prices, only for people to play them for a couple of hours and get bored. Some games are free and easy to keep playing, so players don't have to spend any more money. I think it would be interesting to see how a game's price affects its rating and playtime, as well as its popularity. I want to find the game with the highest percentage of owners returning to play it. I hypothesize that free games like Dota 2, CS2, and PlayerUnknown's Battlegrounds will have some of the highest player counts, but lower return rates and review scores than a small-community game such as Realm of the Mad God (a free roguelike pixel game that is not very well known). I also want to look at the relationship between a game's price and how well it did over time. If someone spends $60 on a game, are they more likely to stay with it to get their money's worth? It would be interesting to see whether any patterns emerge, and helpful to find games that are actually worth their money or time.

### Data Collection

I plan to use the Steam App List API, which returns JSON listing every Steam app ID along with its title. On the Steam website, store page links differ only by the number associated with each game, so a list of all the app IDs is necessary for web scraping the reviews and prices of the games as well as other data. Another source I would like to use is Steam Charts, which tracks player counts over time and is also keyed by app ID. Lastly, to track ownership I would use the SteamSpy API, which estimates how many people own each game as well as play activity over the last two weeks and over the game's lifetime.
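Because all three sources are keyed by the same app ID, the per-game URLs can be built from a single number. The following is a minimal sketch, not a finished implementation. It assumes the store page pattern `https://store.steampowered.com/app/<appid>/` and SteamSpy's `appdetails` request; `build_urls` is a hypothetical helper name introduced here for illustration.

```python
def build_urls(appid):
    """Return the per-game URLs for one Steam app ID (hypothetical helper)."""
    appid = int(appid)  # accept either '10' or 10
    return {
        # Store page, where price and review data would be scraped (assumed pattern)
        "store": f"https://store.steampowered.com/app/{appid}/",
        # Steam Charts page with monthly player counts
        "steamcharts": f"https://steamcharts.com/app/{appid}",
        # SteamSpy details request for ownership estimates (assumed endpoint)
        "steamspy": f"https://steamspy.com/api.php?request=appdetails&appid={appid}",
    }


urls = build_urls("10")
for source, url in urls.items():
    print(source, url)
```

Keeping the URL patterns in one place means a change to any site's link format only needs to be fixed once.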

Below is demonstration code that could be used to extract some of the data. The first function gets all the game IDs and names, and the second gets the monthly player counts for each game. The printed output shows a few game IDs and names, followed by the monthly data for two games.

In [88]:
from bs4 import BeautifulSoup
import requests
import json
import pandas as pd

In [94]:
def api_call_game_code():
    """ Use the Steam App List API to fetch every app ID and name.
    No API key required; takes 1-2 minutes to load.

    Parameters: None

    Returns:
    apps (list of dict): Every app as {'appid': ..., 'name': ...}
    app_ids (list): List of all the app IDs
    """
    url = 'https://api.steampowered.com/ISteamApps/GetAppList/v2/'
    response = requests.get(url, timeout=120)
    response.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
    data = response.json()

    # Avoid naming a variable `dict`, which would shadow the built-in type
    apps = data['applist']['apps']
    app_ids = [app['appid'] for app in apps]

    return apps, app_ids

def api_call_steam_chart(ids):
    """ Scrape Steam Charts pages to get monthly player-count data.

    Parameters:
    ids (list): List of game IDs
    Returns:
    data (dict): Maps each game ID to rows of [month, average players, gain, gain %, peak players]
    """
    data = {}
    for app_id in ids:  # `app_id` avoids shadowing the built-in id()
        url = f'https://steamcharts.com/app/{app_id}'
        url_text = requests.get(url, timeout=30).text

        # Name the parser explicitly so results do not depend on what is installed
        soup = BeautifulSoup(url_text, 'html.parser')
        table = soup.find('table', class_='common-table')
        if table is None:
            # No stats table on the page (for example, an app Steam Charts does not track)
            data[app_id] = []
            continue

        rows = table.find('tbody').find_all('tr')
        monthly_data = []
        for row in rows:
            # Columns: month, average players, gain, gain %, peak players
            cols = [col.text.strip() for col in row.find_all('td')]
            monthly_data.append(cols[:5])

        data[app_id] = monthly_data

    return data

In [98]:
id_dict, app_ids = api_call_game_code()  # `app_ids` avoids shadowing the built-in id()
print(id_dict[:5])

result = api_call_steam_chart(['10', '20']) 
print(result)

[{'appid': 5, 'name': 'Dedicated Server'}, {'appid': 7, 'name': 'Steam Client'}, {'appid': 8, 'name': 'winui2'}, {'appid': 10, 'name': 'Counter-Strike'}, {'appid': 20, 'name': 'Team Fortress Classic'}]
{'10': [['Last 30 Days', '8176.27', '+371.0', '+4.75%', '16249'], ['September 2025', '7805.25', '883.12', '+12.76%', '13254'], ['August 2025', '6922.13', '-449.35', '-6.10%', '12168'], ['July 2025', '7371.48', '-833.50', '-10.16%', '13951'], ['June 2025', '8204.98', '-847.53', '-9.36%', '15798'], ['May 2025', '9052.51', '-471.31', '-4.95%', '15333'], ['April 2025', '9523.82', '-849.53', '-8.19%', '17727'], ['March 2025', '10373.35', '-1190.16', '-10.29%', '18180'], ['February 2025', '11563.50', '-341.15', '-2.87%', '18934'], ['January 2025', '11904.66', '413.55', '+3.60%', '20626'], ['December 2024', '11491.11', '1253.53', '+12.24%', '19006'], ['November 2024', '10237.58', '310.83', '+3.13%', '17065'], ['October 2024', '9926.75', '645.11', '+6.95%', '16821'], ['September 2024', '9281.64'
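Every value returned by `api_call_steam_chart` is a string, including signs and percent symbols, so the rows need converting before any graphing or arithmetic. The sketch below shows one possible way to do that. The helper names `to_number` and `monthly_rows_to_records` are hypothetical, and treating non-numeric cells as `None` is an assumption about how missing values might appear. The sample rows are copied from the Counter-Strike output above.

```python
def to_number(text):
    """Convert strings such as '+12.76%', '-449.35', or '13254' to float.

    Returns None when the cell is not numeric (assumed placeholder handling).
    """
    cleaned = text.replace("+", "").replace("%", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def monthly_rows_to_records(appid, rows):
    """Turn [month, avg, gain, pct_gain, peak] string rows into typed dicts."""
    records = []
    for month, avg_players, gain, pct_gain, peak in rows:
        records.append({
            "appid": int(appid),
            "month": month,
            "avg_players": to_number(avg_players),
            "gain": to_number(gain),
            "pct_gain": to_number(pct_gain),
            "peak": to_number(peak),
        })
    return records


# Two rows taken from the printed output for app ID '10'
sample_rows = [
    ["September 2025", "7805.25", "883.12", "+12.76%", "13254"],
    ["August 2025", "6922.13", "-449.35", "-6.10%", "12168"],
]
records = monthly_rows_to_records("10", sample_rows)
for record in records:
    print(record)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame(records)` to get one row per game per month.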

### Data Usage and Remaining Issues

The code above does not cover every data source I want to use. I still need to get ownership data for each game. Another part of my plan is to combine the game ID, name, monthly data, and ownership data into one large dataframe for graphing and comparisons. I would also consider using NumPy arrays to compare numerical data and possibly make predictions, for example predicting the success of a new game based on its similarity to existing games. There is a lot of data to analyze and many possibilities, so I am not worried about the data having any major issues. The main issue is that Steam's API returns the names and IDs of every app, which might make my code slow to run. I could remove some of the excess data or check whether the API has parameters that let me limit the number of games it returns.
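One possible way to relate ownership to activity is sketched below, under stated assumptions. SteamSpy is assumed to report owners as a range string such as `'10,000,000 .. 20,000,000'`, and the range used in the example is a placeholder, not real data. The function names are hypothetical, and average players per estimated owner is only a rough proxy for the share of owners who return to a game.

```python
def parse_owner_range(owners):
    """Parse an owner range such as '10,000,000 .. 20,000,000'.

    Returns (low, high, midpoint). The range format is an assumption
    about SteamSpy's response, not a confirmed specification.
    """
    low_text, high_text = owners.split("..")
    low = int(low_text.replace(",", "").strip())
    high = int(high_text.replace(",", "").strip())
    return low, high, (low + high) / 2


def players_per_owner(avg_players, owners):
    """Average concurrent players divided by the midpoint owner estimate."""
    _, _, midpoint = parse_owner_range(owners)
    if midpoint == 0:
        return None  # avoid dividing by zero for games with no estimated owners
    return avg_players / midpoint


# avg_players is the September 2025 figure for app ID '10' from the output above;
# the owner range is a PLACEHOLDER value used only to exercise the code.
placeholder_owners = "10,000,000 .. 20,000,000"
low, high, midpoint = parse_owner_range(placeholder_owners)
ratio = players_per_owner(7805.25, placeholder_owners)
print(low, high, midpoint, ratio)
```

Because the owner figure is a range, using the midpoint hides real uncertainty; keeping the low and high bounds as separate dataframe columns would let later comparisons show that spread.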