# <u>GameFinder</u>

The goal of this project is to help indie developers find games similar
to the ones they are developing or planning to develop, along with
information useful for business research, such as sales, review ratings and more.
While the project started out with this in mind, it can also be used by anyone simply
looking for games that suit their tastes, with more options and information than other
websites currently provide.

**API REFERENCES**<br />
https://partner.steamgames.com/doc/webapi                          - Valve's steam API<br />
https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI     - API Docs<br />
https://steamapi.xpaw.me/#<br />
https://steamspy.com/api.php                                       - Popularity and sales<br />

**CODE REFERENCES**<br />
https://nik-davis.github.io/posts/2019/steam-data-collection/<br />
https://www.machinelearningplus.com/nlp/gensim-tutorial/#3howtocreateadictionaryfromalistofsentences  - gensim <br />

In [99]:
# standard library imports
import csv
import datetime as dt
import json
import os
import re  # regular expressions
import statistics
import time
from collections import Counter

# third-party imports
import numpy as np
import pandas as pd
import requests
from requests.exceptions import SSLError

In [None]:
# customisations - ensure tables show all columns
pd.set_option("display.max_columns", 100)

# All-Purpose Function to Get API Requests

In [None]:
def get_request(url, parameters=None):
    """Return json-formatted response of a get request using optional parameters.

    Parameters
    ----------
    url : string
    parameters : {'parameter': 'value'}
        parameters to pass as part of get request

    Returns
    -------
    json_data
        json-formatted response (dict-like)
    """
    try:
        response = requests.get(url=url, params=parameters)
    except SSLError as s:
        print('SSL Error:', s)

        for i in range(5, 0, -1):
            print('\rWaiting... ({})'.format(i), end='')
            time.sleep(1)
        print('\rRetrying.' + ' ' * 10)

        # recursively try again
        return get_request(url, parameters)

    if response:
        return response.json()
    else:
        # a falsy response means an error status code (4xx/5xx), most often
        # caused by sending too many requests. Wait and try again
        print('No response, waiting 10 seconds...')
        time.sleep(10)
        print('Retrying.')
        return get_request(url, parameters)
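get_request retries by calling itself with no upper limit, so a persistent error (for example a 404 for a removed app) would retry forever. A bounded, loop-based alternative is sketched below under that assumption; `with_retries` and its parameters are hypothetical names, not part of this project, and `sleep` is injectable so the backoff can be tested without waiting.

```python
import time


def with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds, waiting longer after each failure.

    Hypothetical helper: func is any zero-argument callable that raises on
    failure (e.g. a wrapped requests.get call). The last exception is
    re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up and surface the final error
            # exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)


# Usage with requests (not executed here):
# def fetch():
#     response = requests.get(url, params=parameters, timeout=10)
#     response.raise_for_status()  # turn 4xx/5xx statuses into exceptions
#     return response.json()
# data = with_retries(fetch)
```

Raising on bad status codes and letting the loop retry keeps the waiting logic in one place, and the attempt cap guarantees the download eventually stops on an app that will never succeed.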

# Generate List of App IDs Using SteamSpy
- NOTE: Currently we are not doing this. Instead, we are using the Kaggle Steam dataset (which was also built from SteamSpy data), stored under `../downloads/kaggle_steam_dataset/`. This keeps the focus of the project on the core functionality rather than on gathering the data, since Steam's API is not very well documented. Nevertheless, I have tested all of these functions, and they all work correctly.

In [112]:
# url = "https://steamspy.com/api.php"
# parameters = {"request": "all"}

# request 'all' from steam spy and parse into dataframe
# json_data = get_request(url, parameters=parameters)
# steam_spy_all = pd.DataFrame.from_dict(json_data, orient='index')

# generate sorted app_list from steamspy data
# app_list = steam_spy_all[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# export can be disabled to keep consistency across download sessions
# app_list.to_csv('../downloads/app_list.csv', index=False)

# instead read from stored csv
app_list = pd.read_csv('../downloads/kaggle_steam_dataset/steam.csv')

# display first few rows
app_list.head()

Unnamed: 0,appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
0,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,124534,3339,17612,317,10000000-20000000,7.19
1,20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,3318,633,277,62,5000000-10000000,3.99
2,30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,0,3416,398,187,34,5000000-10000000,3.99
3,40,Deathmatch Classic,2001-06-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,1273,267,258,184,5000000-10000000,3.99
4,50,Half-Life: Opposing Force,1999-11-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,0,5250,288,624,415,5000000-10000000,3.99


In [115]:
print(f"Total number of games:\t {len(app_list)}")

Total number of games:	 27075


# Define Download Logic

Now that we have app_list, we can iterate over it to request each app's data from the servers.
This is where we set up the logic for that, and store the resulting data as a CSV.
Since retrieving the data takes a long time, we cannot do it all in one go.
Instead, we define a function that downloads and processes the requests in batches, appending each batch to an external file and recording the highest index written in another file.

This makes it easy to restart after an interruption, and means we can complete the download over multiple sessions.
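The batching idea can be illustrated without any network calls. The sketch below is a pure-Python equivalent of the `np.arange` + `np.append` step used in process_batches; `batch_bounds` is a hypothetical helper name. Resuming simply means starting from the index stored in the index file.

```python
def batch_bounds(begin, end, batchsize):
    """Return (start, stop) pairs covering [begin, end) in steps of batchsize.

    The last batch is shorter when the range does not divide evenly.
    """
    edges = list(range(begin, end, batchsize)) + [end]
    return list(zip(edges[:-1], edges[1:]))


print(batch_bounds(0, 2500, 1000))     # [(0, 1000), (1000, 2000), (2000, 2500)]
# Resuming after 2000 apps were written in an earlier session:
print(batch_bounds(2000, 2500, 1000))  # [(2000, 2500)]
```

Because each batch is appended to the CSV before the index file is updated, at most one batch has to be redone after an interruption.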

In [None]:
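def get_app_data(start, stop, parser, pause):
    """Return list of app data for rows start to stop-1 of app_list, generated from parser.
    
    parser : function taking (appid, name) and returning one app's data (dict-like)
    pause : seconds to wait after each request
    """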
    app_data = []
    
    # iterate through each row of app_list, confined by start and stop
    for index, row in app_list[start:stop].iterrows():
        print('Current index: {}'.format(index), end='\r')
        
        appid = row['appid']
        name = row['name']

        # retrieve app data for a row, handled by supplied parser, and append to list
        data = parser(appid, name)
        app_data.append(data)

        time.sleep(pause) # prevent overloading api with requests
    
    return app_data

In [None]:
def process_batches(parser, app_list, download_path, data_filename, index_filename,
                    columns, begin=0, end=-1, batchsize=100, pause=1):
    """Process app data in batches, writing directly to file.
    
    parser : custom function to format request
    app_list : dataframe of appid and name
    download_path : path to store data
    data_filename : filename to save app data
    index_filename : filename to store highest index written
    columns : column names for file
    
    Keyword arguments:
    
    begin : starting index (get from index_filename, default 0)
    end : index to finish (defaults to end of app_list)
    batchsize : number of apps to write in each batch (default 100)
    pause : time to wait after each api request (default 1)
    
    returns: None
    """
    print('Starting at index {}:\n'.format(begin))
    
    # by default, process all apps in app_list
    if end == -1:
        end = len(app_list)
    
    # generate array of batch begin and end points
    batches = np.arange(begin, end, batchsize)
    batches = np.append(batches, end)
    
    apps_written = 0
    batch_times = []
    
    for i in range(len(batches) - 1):
        start_time = time.time()
        
        start = batches[i]
        stop = batches[i+1]
        
        app_data = get_app_data(start, stop, parser, pause)
        
        rel_path = os.path.join(download_path, data_filename)
        
        # writing app data to file
        with open(rel_path, 'a', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=columns, extrasaction='ignore')
            
            for j in range(3,0,-1):
                print("\rAbout to write data, don't stop script! ({})".format(j), end='')
                time.sleep(0.5)
            
            writer.writerows(app_data)
            print('\rExported lines {}-{} to {}.'.format(start, stop-1, data_filename), end=' ')
            
        apps_written += len(app_data)
        
        idx_path = os.path.join(download_path, index_filename)
        
        # writing last index to file
        with open(idx_path, 'w') as f:
            index = stop
            print(index, file=f)
            
        # logging time taken
        end_time = time.time()
        time_taken = end_time - start_time
        
        batch_times.append(time_taken)
        mean_time = statistics.mean(batch_times)
        
        est_remaining = (len(batches) - i - 2) * mean_time
        
        remaining_td = dt.timedelta(seconds=round(est_remaining))
        time_td = dt.timedelta(seconds=round(time_taken))
        mean_td = dt.timedelta(seconds=round(mean_time))
        
        print('Batch {} time: {} (avg: {}, remaining: {})'.format(i, time_td, mean_td, remaining_td))
            
    print('\nProcessing batches complete. {} apps written'.format(apps_written))

Next, we need functions to handle and prepare the external files:

**reset_index** is used for testing and demonstration; setting the index in the stored file to 0 restarts the download process.

**get_index** retrieves the index from file (returning 0 if the file does not exist yet).

**prepare_data_file** readies the CSV for storing data. If the index is 0, it creates a blank CSV containing only the header row; otherwise, it leaves the existing CSV alone.
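One fragile spot: the index file is rewritten with `open(..., 'w')`, which truncates it first, so an interruption at the wrong moment can leave it empty, and `int('')` in get_index would then raise a ValueError. A common remedy is to write to a temporary file and rename it into place; the sketch below assumes that approach, and `write_index_atomically` / `read_index` are hypothetical names, not the project's functions.

```python
import os
import tempfile


def write_index_atomically(path, index):
    """Write index to path so readers never see a half-written file.

    The value goes to a temporary file in the same directory first, then
    os.replace swaps it into place (an atomic rename on POSIX systems).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        print(index, file=f)
    os.replace(tmp_path, path)


def read_index(path):
    """Return the stored index, or 0 if the file is missing or empty."""
    try:
        with open(path) as f:
            return int(f.readline())
    except (FileNotFoundError, ValueError):
        return 0
```

Treating an empty file like a missing one restarts from 0, which matches how get_index already handles a missing file.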

In [None]:
def reset_index(download_path, index_filename):
    """Reset index in file to 0."""
    rel_path = os.path.join(download_path, index_filename)
    
    with open(rel_path, 'w') as f:
        print(0, file=f)
        

def get_index(download_path, index_filename):
    """Retrieve index from file, returning 0 if file not found."""
    try:
        rel_path = os.path.join(download_path, index_filename)

        with open(rel_path, 'r') as f:
            index = int(f.readline())
    
    except FileNotFoundError:
        index = 0
        
    return index


def prepare_data_file(download_path, filename, index, columns):
    """Create file and write headers if index is 0."""
    if index == 0:
        rel_path = os.path.join(download_path, filename)

        with open(rel_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=columns)
            writer.writeheader()

# ----------------------------------------------------------------------------------------------------------

# Download Steam Data
- NOTE: Currently only using 2,000 of the 27,075 games in the dataset.

In [None]:
def parse_steam_request(appid, name):
    """Parser to handle data from the Steam Store API (appdetails endpoint).
    
    Returns : the app's 'data' dict (dict-like), or a minimal
    {'name', 'steam_appid'} dict if the request was unsuccessful
    """
    url = "http://store.steampowered.com/api/appdetails/"
    parameters = {"appids": appid}
    
    json_data = get_request(url, parameters=parameters)
    json_app_data = json_data[str(appid)]
    
    if json_app_data['success']:
        data = json_app_data['data']
    else:
        data = {'name': name, 'steam_appid': appid}
        
    return data

In [None]:
# Set file parameters
download_path = '../downloads/kaggle_steam_dataset'
steam_description_data = 'steam_app_data.csv'
steam_index = 'steam_index.txt'

steam_columns = [
    'type', 'name', 'steam_appid', 'required_age', 'is_free', 'controller_support',
    'dlc', 'detailed_description', 'about_the_game', 'short_description', 'fullgame',
    'supported_languages', 'header_image', 'website', 'pc_requirements', 'mac_requirements',
    'linux_requirements', 'legal_notice', 'drm_notice', 'ext_user_account_notice',
    'developers', 'publishers', 'demos', 'price_overview', 'packages', 'package_groups',
    'platforms', 'metacritic', 'reviews', 'categories', 'genres', 'screenshots',
    'movies', 'recommendations', 'achievements', 'release_date', 'support_info',
    'background', 'content_descriptors'
]

##### NOTE: The code below is commented out because we have already acquired 2,000 games to test functionality on. Running this cell would continue adding to that data, which is a long process.

In [None]:
# Overwrites last index for demonstration (would usually store highest index so can continue across sessions)
# reset_index(download_path, steam_index)

# Retrieve last index downloaded from file
# index = get_index(download_path, steam_index)

# # Wipe or create data file and write headers if index is 0
# prepare_data_file(download_path, steam_description_data, index, steam_columns)

# # end and batchsize can be lowered for a shorter demonstration run
# process_batches(
#     parser=parse_steam_request,
#     app_list=app_list,
#     download_path=download_path,
#     data_filename=steam_description_data,
#     index_filename=steam_index,
#     columns=steam_columns,
#     begin=index,
#     end=len(app_list),
#     batchsize=1000
# )

In [None]:
# Inspect downloaded data
steam_game_data = pd.read_csv(os.path.join(download_path, steam_description_data))
steam_game_data.head()

In [None]:
# Inspect size of current database
print(len(steam_game_data))

# ----------------------------------------------------------------------------------------------------------

# Downloading SteamSpy Data
- NOTE: Currently only using 2,000 of the 27,075 games in the dataset.

In [None]:
def parse_steamspy_request(appid, name):
    """Parser to handle SteamSpy API data."""
    url = "https://steamspy.com/api.php"
    parameters = {"request": "appdetails", "appid": appid}
    
    json_data = get_request(url, parameters)
    return json_data

In [None]:
steamspy_data = 'steamspy_data.csv'
steamspy_index = 'steamspy_index.txt'

steamspy_columns = [
    'appid', 'name', 'developer', 'publisher', 'score_rank', 'positive',
    'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks',
    'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount',
    'languages', 'genre', 'ccu', 'tags'
]

##### NOTE: The code below is commented out because we have already acquired 2,000 games to test functionality on. Running this cell would continue adding to that data, which is a long process.

In [None]:
# ONLY UNCOMMENT THE LINE BELOW TO RESTART THE PROCESS OF WRITING ALL THE GAME DATA TO CSV
# reset_index(download_path, steamspy_index)

# index = get_index(download_path, steamspy_index)

# # Wipe data file if index is 0
# prepare_data_file(download_path, steamspy_data, index, steamspy_columns)

# process_batches(
#     parser=parse_steamspy_request,
#     app_list=app_list,
#     download_path=download_path, 
#     data_filename=steamspy_data,
#     index_filename=steamspy_index,
#     columns=steamspy_columns,
#     begin=index,
#     end=len(app_list),
#     batchsize=1000,
#     pause=0.1
# )

In [116]:
# Inspect downloaded steamspy data
all_steamspy_game_data = pd.read_csv(os.path.join(download_path, steamspy_data))
all_steamspy_game_data.head()

Unnamed: 0,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,languages,genre,ccu,tags
0,10,Counter-Strike,Valve,Valve,,176229,4525,0,"10,000,000 .. 20,000,000",9394,128,264,93,999,999,0,"English, French, German, Italian, Spanish - Sp...",Action,15469,"{'Action': 5347, 'FPS': 4763, 'Multiplayer': 3..."
1,20,Team Fortress Classic,Valve,Valve,,4897,829,0,"2,000,000 .. 5,000,000",153,4,17,4,499,499,0,"English, French, German, Italian, Spanish - Sp...",Action,76,"{'Action': 741, 'FPS': 302, 'Multiplayer': 254..."
2,30,Day of Defeat,Valve,Valve,,4679,521,0,"5,000,000 .. 10,000,000",283,0,15,0,499,499,0,"English, French, German, Italian, Spanish - Spain",Action,137,"{'FPS': 780, 'World War II': 245, 'Multiplayer..."
3,40,Deathmatch Classic,Valve,Valve,,1717,376,0,"5,000,000 .. 10,000,000",389,0,9,0,499,499,0,"English, French, German, Italian, Spanish - Sp...",Action,5,"{'Action': 627, 'FPS': 137, 'Classic': 105, 'M..."
4,50,Half-Life: Opposing Force,Gearbox Software,Valve,,11062,567,0,"5,000,000 .. 10,000,000",417,40,204,40,499,499,0,"English, French, German, Korean",Action,114,"{'FPS': 874, 'Action': 316, 'Sci-fi': 242, 'Cl..."


In [117]:
# Inspect steamspy data size
print(len(all_steamspy_game_data))

2000


# ----------------------------------------------------------------------------------------------------------

# Create a Similarity Score Manually
Game tags cannot be meaningfully compared using pretrained word vectors, such as the embeddings available through spaCy or Gensim, because those vectors are learned from general-purpose text and do not take the gaming-related context of a tag into consideration. <br /><br />
For example, with Gensim's and spaCy's word embeddings, the tags 'Singleplayer' and 'Multiplayer' have a high similarity. This makes sense outside of the gaming context, but as Steam tags these two are not seen together that often, and should not have such a high similarity. <br /><br />
Therefore, we will create our own Similarity Scores, computed from our dataset: for each pair of tags, a decimal number between 0 and 1 indicating how often one tag is seen together with the other.<br /><br /><br />
First, we'll have to grab the tags column from our SteamSpy dataset:

In [44]:
# Grab tags column, inspect a random entry
all_games_all_tags = all_steamspy_game_data['tags']

print(all_steamspy_game_data['name'][500], ": ", all_games_all_tags[500])

Puzzle Kingdoms :  {'Casual': 36, 'Puzzle': 24, 'Match 3': 23, 'RPG': 16, 'Fantasy': 11, 'Singleplayer': 6}


<br />Next, we'll make sure to have a list of only UNIQUE game tags, since many tags in the dataset appear more than once.

In [57]:
unique_taglist = []

for game_tags in all_games_all_tags:
    current_game_tags = re.findall(r"'([A-Za-z&\s'\-0-9]*)'", game_tags)
    for current_tag in current_game_tags:
        if current_tag in unique_taglist:
            continue
        unique_taglist.append(current_tag)

In [58]:
# Peek at some unique tags gathered from the current database, and print the full count.
print(f"First 10 unique tags:\n{unique_taglist[:10]}\n")
print(f"Total number of unique tags:\t{len(unique_taglist)}")

First 10 unique tags:
['Action', 'FPS', 'Multiplayer', 'Shooter', 'Classic', 'Team-Based', 'First-Person', 'Competitive', 'Tactical', 'e-sports']

Total number of unique tags:	387
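The regex above only captures tags written in single quotes. Because the tags column holds the Python representation of a dict, a tag containing an apostrophe (for example Steam's `Shoot 'Em Up`) is written with double quotes and would be skipped. A minimal alternative, assuming every cell is a Python-literal dict (or an empty list/missing value), is to parse it with `ast.literal_eval`; `parse_tags` and `unique_tags` are hypothetical helper names and the sample cells are illustrative.

```python
import ast


def parse_tags(tag_string):
    """Turn a tags cell such as "{'Casual': 36, 'Puzzle': 24}" into a list of tag names.

    ast.literal_eval safely evaluates the Python-literal dict, so tags containing
    quotes or other punctuation survive intact. Non-string cells (e.g. NaN) and
    non-dict values (e.g. "[]") yield an empty list.
    """
    if not isinstance(tag_string, str):
        return []
    parsed = ast.literal_eval(tag_string)
    return list(parsed) if isinstance(parsed, dict) else []


def unique_tags(tag_cells):
    """Unique tag names in first-seen order (dict keys keep insertion order)."""
    seen = {}
    for cell in tag_cells:
        for tag in parse_tags(cell):
            seen.setdefault(tag, None)
    return list(seen)


cells = ["{'Action': 5347, 'FPS': 4763}", "{'FPS': 780, \"Shoot 'Em Up\": 12}", "[]"]
print(unique_tags(cells))  # ['Action', 'FPS', "Shoot 'Em Up"]
```

Using a dict as an ordered set also avoids the linear `in unique_taglist` lookup for every tag.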


<br /><br />This function does what its name suggests: it goes through every unique tag, counts the number of games in which it appears, and stores the counts in a dictionary.

In [106]:
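def countAndStoreTagAppearances():
    """Return {tag: number of games whose tag string contains the tag}."""
    tag_appearances = {}
    
    for tag in unique_taglist:
        for game_tags in all_games_all_tags:
            # NOTE: substring test on the raw tag string, so 'Action' also
            # matches a game tagged only 'Action RPG'
            if tag in game_tags:
                tag_appearances[tag] = tag_appearances.get(tag, 0) + 1
                    
    return tag_appearances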
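Because `tag in game_tags` checks for a substring of the raw tags string, a tag is also counted for games carrying a longer tag that contains it ('Platformer' matches '3D Platformer', 'Action' matches 'Action RPG'). calculateSimilarityScores uses the same check, so the two stay consistent, but the counts can exceed the number of games with the exact tag. An exact-match count is sketched below, assuming each game's tags are already parsed into a list (for example by the `re.findall` call used above); `count_tag_appearances` is a hypothetical name and the sample data is illustrative.

```python
from collections import Counter


def count_tag_appearances(games_tags):
    """Count, for each tag, how many games carry exactly that tag.

    games_tags is a list of per-game tag lists; a set per game ensures a
    tag is counted at most once per game.
    """
    counts = Counter()
    for tags in games_tags:
        counts.update(set(tags))
    return counts


games = [["Action", "FPS"], ["Action RPG", "Fantasy"], ["Action", "Puzzle"]]
print(count_tag_appearances(games)["Action"])  # 2 (the substring check would give 3)
```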

<br /><br />With the tag appearances stored, we can create our Similarity Scores. For each tag, we first count how many times it appeared with every other tag, then divide each count by the number of times the tag itself appeared.<br /><br />
E.g.: if 'Action' appeared with 'FPS' 100 times, and 'Action' itself appeared 200 times, the Similarity Score from 'Action' to 'FPS' would be 100 / 200 = 0.5.
Because the denominator is always the first tag's count, the scores are directional: the score from 'FPS' to 'Action' is divided by the number of times 'FPS' appeared instead.
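Written as a formula, where $G_t$ is the set of games tagged $t$, the Similarity Score from tag $a$ to tag $b$ is:

$$
S(a, b) = \frac{\lvert G_a \cap G_b \rvert}{\lvert G_a \rvert}
$$

With the numbers from the example, $S(\text{Action}, \text{FPS}) = 100 / 200 = 0.5$. If, hypothetically, 'FPS' appeared in 125 games, then $S(\text{FPS}, \text{Action}) = 100 / 125 = 0.8$, so in general $S(a, b) \neq S(b, a)$.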

In [107]:
def calculateSimilarityScores(tag_appearances):
    """Return {tag: {other_tag: score}}, where score is the fraction of the
    tag's appearances in which other_tag also appears."""
    tag_similarities = {}

    for tag in unique_taglist:
        for game_tags in all_games_all_tags:
            if tag in game_tags:
                if tag not in tag_similarities:
                    tag_similarities[tag] = {}
                current_tag_group = re.findall(r"'([A-Za-z&\s'\-0-9]*)'", game_tags)
                if tag in current_tag_group:
                    current_tag_group.remove(tag)
                # count one co-occurrence with every other tag of this game
                for current_tag in current_tag_group:
                    if current_tag in tag_similarities[tag]:
                        tag_similarities[tag][current_tag] += 1
                    else:
                        tag_similarities[tag][current_tag] = 1

    # Turn number of co-occurrences into a fraction (divide by total appearances of the tag)
    for tag in tag_similarities:
        for sub_tag in tag_similarities[tag]:
            tag_similarities[tag][sub_tag] /= tag_appearances[tag]
            
    return tag_similarities
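The same directional scores can be computed in a single pass over the games instead of one pass per tag. The sketch below assumes each game's tags are already parsed into a list and uses exact tag matching rather than the substring test above, so its numbers can differ slightly from the notebook's; `similarity_scores` is a hypothetical name and the sample data is illustrative.

```python
from collections import Counter, defaultdict


def similarity_scores(games_tags):
    """Directional co-occurrence scores: scores[a][b] = |games with a and b| / |games with a|.

    games_tags is a list of per-game tag lists; each game contributes at most
    one count per ordered tag pair.
    """
    appearances = Counter()
    co_occurrences = defaultdict(Counter)
    for tags in games_tags:
        unique = set(tags)
        appearances.update(unique)
        for a in unique:
            for b in unique - {a}:
                co_occurrences[a][b] += 1

    # every tag gets an entry, even if it never appears alongside another tag
    return {
        a: {b: n / appearances[a] for b, n in co_occurrences[a].items()}
        for a in appearances
    }


games = [["Action", "FPS"], ["Action", "RPG"], ["FPS", "Action"], ["Puzzle"]]
scores = similarity_scores(games)
print(scores["Action"]["FPS"])  # 0.6666666666666666 (2 of 3 Action games)
print(scores["FPS"]["Action"])  # 1.0 (2 of 2 FPS games)
```

A single pass touches each game once, rather than scanning every game's tag string once per unique tag.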

<br /><br />Since our program aims to do more than just show you similar games, this utility function collects existing information from our dataset (app ID, owner range, price and review statistics) into a cleaner per-game dictionary.

In [104]:
# Store each game's app ID (used to create the link to its Steam page), owner range, price
# (SteamSpy stores cents, formatted here as dollars) and review statistics.
def collectGameInfo():
    
    game_info = {}

    count = 0
    for game in all_steamspy_game_data['name']:
        if not game in game_info:
            game_info[game] = {}
            game_info[game]['appid'] = all_steamspy_game_data['appid'][count]
            game_info[game]['owners'] = re.findall("[0-9,]+", all_steamspy_game_data['owners'][count].replace(",", ""))

            unformatted_price = str(all_steamspy_game_data['price'][count])
            price_len = len(unformatted_price)
            game_info[game]['price'] = unformatted_price[0:price_len - 2] + "." + unformatted_price[price_len-2:price_len]
            game_info[game]['reviews'] = all_steamspy_game_data['positive'][count] + all_steamspy_game_data['negative'][count]
            game_info[game]['good_review_ratio'] = float(all_steamspy_game_data['positive'][count] / game_info[game]['reviews'])

        count += 1
    
    return game_info
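The string handling in collectGameInfo can be expressed as small, testable helpers. This is a sketch with hypothetical names; it assumes owner ranges look like `"10,000,000 .. 20,000,000"` and prices are integer cents, as in the SteamSpy data above. It also guards against games with no reviews, where the ratio would otherwise divide by zero.

```python
import re


def parse_owners(owners):
    """'10,000,000 .. 20,000,000' -> (10000000, 20000000); assumes exactly two numbers."""
    low, high = (int(n) for n in re.findall(r"\d+", owners.replace(",", "")))
    return low, high


def cents_to_dollars(cents):
    """Format an integer price in cents as a dollar string (999 -> '9.99')."""
    return f"{int(cents) / 100:.2f}"


def review_ratio(positive, negative):
    """Share of positive reviews, or 0.0 for a game with no reviews."""
    total = positive + negative
    return positive / total if total else 0.0


print(parse_owners("10,000,000 .. 20,000,000"))  # (10000000, 20000000)
print(cents_to_dollars(999))                     # 9.99
print(review_ratio(0, 0))                        # 0.0
```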

In [108]:
# Call the functions and store their data appropriately.
tag_appearances = countAndStoreTagAppearances()
tag_similarities = calculateSimilarityScores(tag_appearances)
game_info = collectGameInfo()

<br /><br />After calling all of our functions, let's take a peek at the generated data to make sure everything is intact.

In [109]:
# Inspect a tag's Similarity Scores
if 'Platformer' in tag_similarities:
    print("Platformer:")
    for sub_tag in tag_similarities['Platformer']:
        print(sub_tag, " --> ", tag_similarities['Platformer'][sub_tag])

Platformer:
Action  -->  0.8085106382978723
FPS  -->  0.029787234042553193
Multiplayer  -->  0.23829787234042554
Classic  -->  0.1702127659574468
First-Person  -->  0.0425531914893617
Sci-fi  -->  0.18723404255319148
Shooter  -->  0.13191489361702127
Space  -->  0.06382978723404255
Cyberpunk  -->  0.02553191489361702
Memes  -->  0.01702127659574468
Psychological Horror  -->  0.03404255319148936
Conspiracy  -->  0.01276595744680851
3D  -->  0.03404255319148936
Old School  -->  0.06382978723404255
Retro  -->  0.22127659574468084
Cult Classic  -->  0.03404255319148936
Competitive  -->  0.03404255319148936
Sports  -->  0.02553191489361702
Nudity  -->  0.00851063829787234
Puzzle  -->  0.451063829787234
Puzzle-Platformer  -->  0.2553191489361702
3D Platformer  -->  0.15319148936170213
Singleplayer  -->  0.7489361702127659
Comedy  -->  0.18723404255319148
Female Protagonist  -->  0.19148936170212766
Funny  -->  0.225531914893617
Physics  -->  0.14468085106382977
Story Rich  -->  0.17021276595

In [110]:
# Inspect game_info for a single game
print(f"Counter-Strike: {game_info['Counter-Strike']}")

Counter-Strike: {'appid': 10, 'owners': ['10000000', '20000000'], 'price': '9.99', 'reviews': 180754, 'good_review_ratio': 0.9749659758566892}


<br /><br />Now that every tag has a Similarity Score to each tag it appears with, let's use those scores to rate how similar your game is to everything else out there. To do this, however, we'll need some tags that describe your game. This will eventually be user input; for demonstration and testing purposes, we'll use the tags of a game that some friends and I are working on.<br /><br /><br />
This function goes through every game in our dataset and builds up current_game_similarity_score: for each of our game's tags, it adds 1 if the game carries that exact tag, plus the Similarity Score from our tag to each of the game's tags. Games missing any of the must-include tags get a score of 0.

In [90]:
def scoreGames(our_tags, must_include_tags):
    """Score every game in our CSV against our_tags.

    For each tag in our_tags that exists in tag_similarities, a game earns +1
    if it carries that exact tag, plus the Similarity Score from our tag to
    each of the game's tags. Games lacking any of must_include_tags score 0.
    """
    game_similarities = {}

    for count, current_game_tags in enumerate(all_games_all_tags):
        game_name = all_steamspy_game_data['name'][count]
        current_game_tags_tidy = re.findall(r"'([A-Za-z&\s'\-0-9]*)'", current_game_tags)

        # Skip (score 0) any game that is missing one of the required tags
        if must_include_tags and not all(tag in current_game_tags_tidy for tag in must_include_tags):
            game_similarities[game_name] = 0
            continue

        current_game_similarity_score = 0
        for our_tag in our_tags:
            if our_tag in tag_similarities:
                if our_tag in current_game_tags_tidy:
                    current_game_similarity_score += 1
                for current_game_tag in current_game_tags_tidy:
                    if current_game_tag in tag_similarities[our_tag]:
                        current_game_similarity_score += tag_similarities[our_tag][current_game_tag]
        game_similarities[game_name] = current_game_similarity_score

    return game_similarities
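To make the scoring rule easy to check in isolation, here is a hypothetical single-game version (`score_game` is not part of the project) that follows the same rule as scoreGames on already-parsed tag lists. The similarity values in the example are made up for illustration.

```python
def score_game(our_tags, game_tags, tag_similarities, must_include_tags=()):
    """Score one game: +1 per exact tag match, plus the Similarity Score from
    each of our tags to each of the game's tags.

    Our tags missing from tag_similarities are ignored; a game lacking any
    must-include tag scores 0.
    """
    if not all(tag in game_tags for tag in must_include_tags):
        return 0
    score = 0
    for our_tag in our_tags:
        if our_tag not in tag_similarities:
            continue
        if our_tag in game_tags:
            score += 1
        for game_tag in game_tags:
            score += tag_similarities[our_tag].get(game_tag, 0)
    return score


sims = {"Puzzle": {"2D": 0.5, "Platformer": 0.25}, "2D": {"Puzzle": 0.4}}
# Puzzle: 1 + 0.5 (to 2D); 2D: 1 + 0.4 (to Puzzle)
print(round(score_game(["Puzzle", "2D"], ["Puzzle", "2D"], sims), 4))  # 2.9
```

Note that the +1 for an exact match usually dominates the fractional Similarity Scores, so games sharing many of our exact tags rank highest.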

<br /><br />This utility function does all the fancy printing work for us, displaying the X most similar games to ours. The num_games_to_show value will also be user input. The main reason for this variable is so that not every single game is shown in the output (games with very low or no similarity are not really important for us to see). Games with a score of 0 are never shown, so fewer than num_games_to_show games may be printed.
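The selection step at the start of printSimilarGames (take the highest-scoring games with `Counter.most_common`, then drop games scoring 0) can be sketched on its own. `top_similar` is a hypothetical name and the scores are illustrative.

```python
from collections import Counter


def top_similar(game_similarities, n):
    """Return up to n (name, score) pairs with the highest positive scores."""
    return [(name, score)
            for name, score in Counter(game_similarities).most_common(n)
            if score > 0]


print(top_similar({"A": 3.0, "B": 0, "C": 1.5}, 5))  # [('A', 3.0), ('C', 1.5)]
```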

In [94]:
def printSimilarGames(num_games_to_show):
    
    # We may not actually find the exact number of games given to the variable 'num_games_to_show'.
    # To cover our bases, count the number of games with a score higher than zero, and store that instead.
    count = 0
    k = Counter(game_similarities)
    high_similarity = k.most_common(num_games_to_show)
    for game in high_similarity:
        if game[1] > 0:
            count += 1
    num_games_to_show = count

    
    # Printing the X most similar games
    print("\t\t\t----------------------------------------------------\n"\
          f"\t\t\t\t\tTOP {num_games_to_show} SIMILAR GAMES:\n"\
          "\t\t\t----------------------------------------------------\n"\
          "\t\t\t\t\t(Prices are in USD)\n"\
          f"\t\t\t\t Searched a database of {len(all_steamspy_game_data)} games;\n\n"
          "- Based on the tags:\t\t\t", end="")
    print(", ".join(our_tags))

    if must_include_tags:
        if len(must_include_tags) > 1:
            print(f"- Games shown had to include the tags:\t{', '.join(must_include_tags)}\n")
        else:
            print(f"- Games shown had to include the tag:\t{must_include_tags[0]}\n")
    else:
        print("\n")

        
    # DISCLAIMER:
    # Review-based estimates assume each review represents 30-50 owners
    # (i.e. roughly 2-3.3% of owners left a review). This is not the case for every game.
    count = 0
    for game in high_similarity:
        game_name = game[0]
        game_score = game[1]

        if game_score > 0:
            count += 1

            game_app_id = game_info[game_name]['appid']
            game_price = game_info[game_name]['price']
            review_ratio = game_info[game_name]['good_review_ratio']
            
            num_owners_floor = game_info[game_name]['owners'][0]
            num_owners_ceiling = game_info[game_name]['owners'][1]
            
            owners_revenue_floor = int(float(num_owners_floor) * float(game_price))
            owners_revenue_ceiling = int(float(num_owners_ceiling) * float(game_price))
            
            num_reviews = game_info[game_name]['reviews']
            review_revenue_floor = int(float(num_reviews) * 30 * float(game_price))
            review_revenue_ceiling = int(float(num_reviews) * 50 * float(game_price))

            print("({})\t{}\n" \
            "\tScore:\t\t\t\t{:.4f}\n" \
            "\tSteam Page:\t\t\thttps://store.steampowered.com/app/{}/\n" \
            "\tPrice:\t\t\t\t${:,.2f}\n" \
            "\tReviews: \t\t\t{:,.0f}% Positive\n" \
            "\tNumber of Owners:\t\t{:,} - {:,}\n" \
            "\tRevenue Based on Owners:\t${:,} - ${:,}\n" \
            "\tNumber of Reviews:\t\t{:,}\n" \
            "\tRevenue Based on Reviews:\t${:,} - ${:,}\n".format(count, 
                                                                  game_name, 
                                                                  game_score, 
                                                                  game_app_id, 
                                                                  float(game_price), 
                                                                  review_ratio * 100, 
                                                                  int(num_owners_floor), 
                                                                  int(num_owners_ceiling), 
                                                                  owners_revenue_floor, 
                                                                  owners_revenue_ceiling, 
                                                                  num_reviews, 
                                                                  review_revenue_floor, 
                                                                  review_revenue_ceiling))
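The revenue figures above are simple products, shown here as a standalone sketch. `revenue_estimates` is a hypothetical name; like printSimilarGames, it multiplies owners by the current price and assumes 30-50 owners per review, ignoring discounts, regional pricing, refunds and platform fees, so the results are rough gross upper bounds.

```python
def revenue_estimates(price, owners_low, owners_high, num_reviews,
                      review_multipliers=(30, 50)):
    """Return ((owner-based low, high), (review-based low, high)) in dollars.

    Owner-based: owners * price. Review-based: reviews * multiplier * price,
    i.e. assuming each review stands for 30-50 owners.
    """
    by_owners = (int(owners_low * price), int(owners_high * price))
    low_mult, high_mult = review_multipliers
    by_reviews = (int(num_reviews * low_mult * price),
                  int(num_reviews * high_mult * price))
    return by_owners, by_reviews


# Trine Enchanted Edition, using the figures from the output below
print(revenue_estimates(14.99, 2_000_000, 5_000_000, 12522))
# ((29980000, 74950000), (5631143, 9385239))
```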

<br /><br />Now that we've defined those functions, let's test out GameFinder!

In [97]:
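# Tags we will give to our game.
# NOTE: tag names are case-sensitive and must match Steam's spelling exactly; tags not found
# in the dataset are ignored when scoring (the data spells these 'Puzzle-Platformer' and 'Story Rich').
our_tags = ["Puzzle platformer", "Horror", "Story rich", "Dark", "2D", "Platformer", "Puzzle"]

# ONLY compare to games that include this list of tags. If empty, compare to all.
must_include_tags = ["2D"]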

# TODO: Use NLP to compare game descriptions, and consider this to an appropriate extent when scoring a game.
our_description = ""

game_similarities = scoreGames(our_tags, must_include_tags)
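scoreGames silently skips any of our tags that is not a key of tag_similarities. In this demo, 'Puzzle platformer' and 'Story rich' differ from the spellings 'Puzzle-Platformer' and 'Story Rich' that appear in the Platformer output above, so they likely contribute nothing to the scores. A small check before scoring could flag such near-misses; `check_tags` is a hypothetical helper, and its case- and hyphen-insensitive matching is an assumption used only for suggestions.

```python
def check_tags(our_tags, known_tags):
    """Split our_tags into exact matches, suggested spellings, and unknown tags."""
    def normalise(tag):
        return tag.lower().replace("-", " ")

    lookup = {normalise(tag): tag for tag in known_tags}
    known = set(known_tags)
    matched, suggestions, unknown = [], {}, []
    for tag in our_tags:
        if tag in known:
            matched.append(tag)
        elif normalise(tag) in lookup:
            suggestions[tag] = lookup[normalise(tag)]
        else:
            unknown.append(tag)
    return matched, suggestions, unknown


known = ["Puzzle-Platformer", "Horror", "Story Rich", "Dark", "2D", "Platformer", "Puzzle"]
print(check_tags(["Puzzle platformer", "Horror", "Story rich", "Dark"], known))
# (['Horror', 'Dark'], {'Puzzle platformer': 'Puzzle-Platformer', 'Story rich': 'Story Rich'}, [])
```

In the notebook, `unique_taglist` would serve as `known_tags`.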

In [98]:
num_games_to_show = 60

printSimilarGames(num_games_to_show)

			----------------------------------------------------
					TOP 60 SIMILAR GAMES:
			----------------------------------------------------
					(Prices are in USD)
				 Searched a database of 2000 games;

- Based on the tags:			Puzzle platformer, Horror, Story rich, Dark, 2D, Platformer, Puzzle
- Games shown had to include the tag:	2D

(1)	Trine Enchanted Edition
	Score:				32.2963
	Steam Page:			https://store.steampowered.com/app/35700/
	Price:				$14.99
	Reviews: 			96% Positive
	Number of Owners:		2,000,000 - 5,000,000
	Revenue Based on Owners:	$29,980,000 - $74,950,000
	Number of Reviews:		12,522
	Revenue Based on Reviews:	$5,631,143 - $9,385,239

(2)	Trine 2: Complete Story
	Score:				32.1608
	Steam Page:			https://store.steampowered.com/app/35720/
	Price:				$4.99
	Reviews: 			96% Positive
	Number of Owners:		2,000,000 - 5,000,000
	Revenue Based on Owners:	$9,980,000 - $24,950,000
	Number of Reviews:		19,633
	Revenue Based on Reviews:	$2,939,060 - $4,898,433

(3)	Life Goes On: Do

<br /><br />As we can see, the results are quite accurate, even though the current dataset only uses 2,000 of the 27,075 games. Once the full dataset is used, the Similarity Scores for each tag should become more reliable, since they will be based on far more games, which should in turn give even better matches. <br /><br />
Since it would take a while to download this complete dataset, and the core functionality of the program was the main focus, I decided to keep the 2,000-game dataset and continue working on the project. Everything up to this point was done in a very short amount of time, and there is still much room for improvement, many aspects of which I am currently working on.<br /><br />
Main goals going forward include implementing a UI or web page, keeping the database up to date, and adding quality-of-life features such as more filters and wishlist history, to name a couple.

# ----------------------------------------------------------------------------------------------------------