# <u>Similar Game Finder :D</u>

**API REFERENCES**<br />
https://partner.steamgames.com/doc/webapi                          - Valve's steam API<br />
https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI     - API Docs<br />
https://steamapi.xpaw.me/#<br />
https://steamspy.com/api.php                                       - Popularity and sales<br />

**CODE REFERENCES**<br />
https://nik-davis.github.io/posts/2019/steam-data-collection/<br />
https://www.machinelearningplus.com/nlp/gensim-tutorial/#3howtocreateadictionaryfromalistofsentences  - gensim <br />

In [31]:
# standard library imports
import csv
import datetime as dt
import json
import os
import statistics
import time
from collections import Counter

# third-party imports
import numpy as np
import pandas as pd
import requests
from requests.exceptions import SSLError

# For regex
import re

# Make sure to install spacy, along with it's medium or large dataset
import spacy

# Make sure to install gensim
import gensim
from gensim.models import Word2Vec
from gensim import corpora
from gensim.test.utils import common_texts

In [32]:
# customisations - ensure tables show all columns
pd.set_option("max_columns", 100)

# All-Purpose Function to Get API Requests

In [33]:
def get_request(url, parameters=None):
    """Return json-formatted response of a get request using optional parameters.

    Parameters
    ----------
    url : string
    parameters : {'parameter': 'value'}
        parameters to pass as part of get request

    Returns
    -------
    json_data
        json-formatted response (dict-like)
    """
    try:
        response = requests.get(url=url, params=parameters)
    except SSLError as s:
        print('SSL Error:', s)

        for i in range(5, 0, -1):
            print('\rWaiting... ({})'.format(i), end='')
            time.sleep(1)
        print('\rRetrying.' + ' ' * 10)

        # recusively try again
        return get_request(url, parameters)

    if response:
        return response.json()
    else:
        # response is none usually means too many requests. Wait and try again
        print('No response, waiting 10 seconds...')
        time.sleep(10)
        print('Retrying.')
        return get_request(url, parameters)

# Generate List of App IDs Using SteamSpy

In [34]:
# url = "https://steamspy.com/api.php"
# parameters = {"request": "all"}

# request 'all' from steam spy and parse into dataframe
# json_data = get_request(url, parameters=parameters)
# steam_spy_all = pd.DataFrame.from_dict(json_data, orient='index')

# generate sorted app_list from steamspy data
# app_list = steam_spy_all[['appid', 'name']].sort_values('appid').reset_index(drop=True)

# export can be disabled to keep consistency across download sessions
# app_list.to_csv('../downloads/app_list.csv', index=False)

# instead read from stored csv
cwd = os.getcwd()
app_list = pd.read_csv('../downloads/kaggle_steam_dataset/steam.csv')

# display first few rows
app_list.head()

print(len(app_list))

27075


# Define Download Logic

Now that we have app_list, we can iterate over it to request individual app data from the srevers.
This is where we set the logic for that, and store the end data as a csv.
Since it takes a long time to retreive the data, we cannot attempt it all in one go.
We shall define a function to download and process the requests in batches, appending each one to an external file, and keeping track of the highest index in another.

This provides security (easy restart), and means we can complete the download over multiple sessions.

In [35]:
def get_app_data(start, stop, parser, pause):
    """Return list of app data generated from parser.
    
    parser : function to handle request
    """
    app_data = []
    
    # iterate through each row of app_list, confined by start and stop
    for index, row in app_list[start:stop].iterrows():
        print('Current index: {}'.format(index), end='\r')
        
        appid = row['appid']
        name = row['name']

        # retrive app data for a row, handled by supplied parser, and append to list
        data = parser(appid, name)
        app_data.append(data)

        time.sleep(pause) # prevent overloading api with requests
    
    return app_data

In [36]:
def process_batches(parser, app_list, download_path, data_filename, index_filename,
                    columns, begin=0, end=-1, batchsize=100, pause=1):
    """Process app data in batches, writing directly to file.
    
    parser : custom function to format request
    app_list : dataframe of appid and name
    download_path : path to store data
    data_filename : filename to save app data
    index_filename : filename to store highest index written
    columns : column names for file
    
    Keyword arguments:
    
    begin : starting index (get from index_filename, default 0)
    end : index to finish (defaults to end of app_list)
    batchsize : number of apps to write in each batch (default 100)
    pause : time to wait after each api request (defualt 1)
    
    returns: none
    """
    print('Starting at index {}:\n'.format(begin))
    
    # by default, process all apps in app_list
    if end == -1:
        end = len(app_list) + 1
    
    # generate array of batch begin and end points
    batches = np.arange(begin, end, batchsize)
    batches = np.append(batches, end)
    
    apps_written = 0
    batch_times = []
    
    for i in range(len(batches) - 1):
        start_time = time.time()
        
        start = batches[i]
        stop = batches[i+1]
        
        app_data = get_app_data(start, stop, parser, pause)
        
        rel_path = os.path.join(download_path, data_filename)
        
        # writing app data to file
        with open(rel_path, 'a', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=columns, extrasaction='ignore')
            
            for j in range(3,0,-1):
                print("\rAbout to write data, don't stop script! ({})".format(j), end='')
                time.sleep(0.5)
            
            writer.writerows(app_data)
            print('\rExported lines {}-{} to {}.'.format(start, stop-1, data_filename), end=' ')
            
        apps_written += len(app_data)
        
        idx_path = os.path.join(download_path, index_filename)
        
        # writing last index to file
        with open(idx_path, 'w') as f:
            index = stop
            print(index, file=f)
            
        # logging time taken
        end_time = time.time()
        time_taken = end_time - start_time
        
        batch_times.append(time_taken)
        mean_time = statistics.mean(batch_times)
        
        est_remaining = (len(batches) - i - 2) * mean_time
        
        remaining_td = dt.timedelta(seconds=round(est_remaining))
        time_td = dt.timedelta(seconds=round(time_taken))
        mean_td = dt.timedelta(seconds=round(mean_time))
        
        print('Batch {} time: {} (avg: {}, remaining: {})'.format(i, time_td, mean_td, remaining_td))
            
    print('\nProcessing batches complete. {} apps written'.format(apps_written))

Next, we need functions to handle and prepare the external files

**reset_index** is used for testing and demonstration; setting the index in the stored file to 0 will restart the download process

**get_index** retreives the index from file.

**prepare_data_file** readies the CSV for storing data. If index is 0, we need a blank csv. Otherwise, leave CSV alone.

In [37]:
def reset_index(download_path, index_filename):
    """Reset index in file to 0."""
    rel_path = os.path.join(download_path, index_filename)
    
    with open(rel_path, 'w') as f:
        print(0, file=f)
        

def get_index(download_path, index_filename):
    """Retrieve index from file, returning 0 if file not found."""
    try:
        rel_path = os.path.join(download_path, index_filename)

        with open(rel_path, 'r') as f:
            index = int(f.readline())
    
    except FileNotFoundError:
        index = 0
        
    return index


def prepare_data_file(download_path, filename, index, columns):
    """Create file and write headers if index is 0."""
    if index == 0:
        rel_path = os.path.join(download_path, filename)

        with open(rel_path, 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=columns)
            writer.writeheader()

# ----------------------------------------------------------------------------------------------------------

# Download Steam Data
- NOTE: Currently only using 2000 games from the 27000 game database.

In [38]:
def parse_steam_request(appid, name):
    """Unique parser to handle data from Steam Store API.
    
    Returns : json formatted data (dict-like)
    """
    url = "http://store.steampowered.com/api/appdetails/"
    parameters = {"appids": appid}
    
    json_data = get_request(url, parameters=parameters)
    json_app_data = json_data[str(appid)]
    
    if json_app_data['success']:
        data = json_app_data['data']
    else:
        data = {'name': name, 'steam_appid': appid}
        
    return data

In [39]:
# Set file parameters
download_path = '../downloads/kaggle_steam_dataset'
steam_description_data = 'steam_app_data.csv'
steam_index = 'steam_index.txt'

steam_columns = [
    'type', 'name', 'steam_appid', 'required_age', 'is_free', 'controller_support',
    'dlc', 'detailed_description', 'about_the_game', 'short_description', 'fullgame',
    'supported_languages', 'header_image', 'website', 'pc_requirements', 'mac_requirements',
    'linux_requirements', 'legal_notice', 'drm_notice', 'ext_user_account_notice',
    'developers', 'publishers', 'demos', 'price_overview', 'packages', 'package_groups',
    'platforms', 'metacritic', 'reviews', 'categories', 'genres', 'screenshots',
    'movies', 'recommendations', 'achievements', 'release_date', 'support_info',
    'background', 'content_descriptors'
]

##### NOTE: DO NOT RUN THIS CELL UNLESS YOU WANT TO KEEP ITERATING THROUGH GAME LIST AND WRITING TO CSV (Long Process)

In [None]:
# Overwrites last index for demonstration (would usually store highest index so can continue across sessions)
# reset_index(download_path, steam_index)

# Retrieve last index downloaded from file
# index = get_index(download_path, steam_index)

# # Wipe or create data file and write headers if index is 0
# prepare_data_file(download_path, steam_description_data, index, steam_columns)

# # Set end and chunksize for demonstration - remove to run through entire app list
# process_batches(
#     parser=parse_steam_request,
#     app_list=app_list,
#     download_path=download_path,
#     data_filename=steam_description_data,
#     index_filename=steam_index,
#     columns=steam_columns,
#     begin=index,
#     end=len(app_list),
#     batchsize=1000
# )

In [50]:
# inspect downloaded data
steam_game_data = pd.read_csv(download_path + '/' + steam_description_data)
steam_game_data.head()

# print(len(steam_game_data))

Unnamed: 0,type,name,steam_appid,required_age,is_free,controller_support,dlc,detailed_description,about_the_game,short_description,fullgame,supported_languages,header_image,website,pc_requirements,mac_requirements,linux_requirements,legal_notice,drm_notice,ext_user_account_notice,developers,publishers,demos,price_overview,packages,package_groups,platforms,metacritic,reviews,categories,genres,screenshots,movies,recommendations,achievements,release_date,support_info,background,content_descriptors
0,game,Counter-Strike,10,0,False,,,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,,"English<strong>*</strong>, French<strong>*</st...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'JPY', 'initial': 101000, 'final'...","[574941, 7]","[{'name': 'default', 'title': 'Buy Counter-Str...","{'windows': True, 'mac': True, 'linux': True}","{'score': 88, 'url': 'https://www.metacritic.c...",,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 105098},,"{'coming_soon': False, 'date': '1 Nov, 2000'}","{'url': 'http://steamcommunity.com/app/10', 'e...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'Includes intense vio..."
1,game,Team Fortress Classic,20,0,False,,,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...,,"English, French, German, Italian, Spanish - Sp...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'CAD', 'initial': 569, 'final': 5...",[29],"[{'name': 'default', 'title': 'Buy Team Fortre...","{'windows': True, 'mac': True, 'linux': True}",,,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 3972},,"{'coming_soon': False, 'date': '1 Apr, 1999'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'Includes intense vio..."
2,game,Day of Defeat,30,0,False,,,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,,"English, French, German, Italian, Spanish - Spain",https://cdn.akamai.steamstatic.com/steam/apps/...,http://www.dayofdefeat.com/,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'CAD', 'initial': 569, 'final': 5...",[30],"[{'name': 'default', 'title': 'Buy Day of Defe...","{'windows': True, 'mac': True, 'linux': True}","{'score': 79, 'url': 'https://www.metacritic.c...",,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 2879},,"{'coming_soon': False, 'date': '1 May, 2003'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
3,game,Deathmatch Classic,40,0,False,,,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,,"English, French, German, Italian, Spanish - Sp...",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Valve'],['Valve'],,"{'currency': 'CAD', 'initial': 569, 'final': 5...",[31],"[{'name': 'default', 'title': 'Buy Deathmatch ...","{'windows': True, 'mac': True, 'linux': True}",,,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 1350},,"{'coming_soon': False, 'date': '1 Jun, 2001'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
4,game,Half-Life: Opposing Force,50,0,False,,,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,,"English, French, German, Korean",https://cdn.akamai.steamstatic.com/steam/apps/...,,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...",,,,['Gearbox Software'],['Valve'],,"{'currency': 'CAD', 'initial': 569, 'final': 5...",[32],"[{'name': 'default', 'title': 'Buy Half-Life: ...","{'windows': True, 'mac': True, 'linux': True}",,,"[{'id': 2, 'description': 'Single-player'}, {'...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 9185},,"{'coming_soon': False, 'date': '1 Nov, 1999'}","{'url': 'https://help.steampowered.com', 'emai...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"


# ----------------------------------------------------------------------------------------------------------

# Downloading SteamSpy Data
 - NOTE: Currently only using 2000 games from the 27000 game database.

In [42]:
def parse_steamspy_request(appid, name):
    """Parser to handle SteamSpy API data."""
    url = "https://steamspy.com/api.php"
    parameters = {"request": "appdetails", "appid": appid}
    
    json_data = get_request(url, parameters)
    return json_data

In [43]:
steamspy_data = 'steamspy_data.csv'
steamspy_index = 'steamspy_index.txt'

steamspy_columns = [
    'appid', 'name', 'developer', 'publisher', 'score_rank', 'positive',
    'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks',
    'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount',
    'languages', 'genre', 'ccu', 'tags'
]

##### NOTE: DO NOT RUN THIS CELL UNLESS YOU WANT TO KEEP ITERATING THROUGH GAME LIST AND WRITING TO CSV (Long Process)

In [18]:
# ONLY UNCOMMENT THE LINE BELOW TO RESTART THE PROCESS OF WRITING ALL THE GAME DATA TO CSV
# reset_index(download_path, steamspy_index)

# index = get_index(download_path, steamspy_index)

# # Wipe data file if index is 0
# prepare_data_file(download_path, steamspy_data, index, steamspy_columns)

# process_batches(
#     parser=parse_steamspy_request,
#     app_list=app_list,
#     download_path=download_path, 
#     data_filename=steamspy_data,
#     index_filename=steamspy_index,
#     columns=steamspy_columns,
#     begin=index,
#     end=len(app_list),
#     batchsize=1000,
#     pause=0.1
# )

In [49]:
# inspect downloaded steamspy data
all_steamspy_game_data = pd.read_csv(download_path + '/' + steamspy_data)
all_steamspy_game_data.head()

# print(len(all_steamspy_game_data))

Unnamed: 0,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,languages,genre,ccu,tags
0,10,Counter-Strike,Valve,Valve,,176229,4525,0,"10,000,000 .. 20,000,000",9394,128,264,93,999,999,0,"English, French, German, Italian, Spanish - Sp...",Action,15469,"{'Action': 5347, 'FPS': 4763, 'Multiplayer': 3..."
1,20,Team Fortress Classic,Valve,Valve,,4897,829,0,"2,000,000 .. 5,000,000",153,4,17,4,499,499,0,"English, French, German, Italian, Spanish - Sp...",Action,76,"{'Action': 741, 'FPS': 302, 'Multiplayer': 254..."
2,30,Day of Defeat,Valve,Valve,,4679,521,0,"5,000,000 .. 10,000,000",283,0,15,0,499,499,0,"English, French, German, Italian, Spanish - Spain",Action,137,"{'FPS': 780, 'World War II': 245, 'Multiplayer..."
3,40,Deathmatch Classic,Valve,Valve,,1717,376,0,"5,000,000 .. 10,000,000",389,0,9,0,499,499,0,"English, French, German, Italian, Spanish - Sp...",Action,5,"{'Action': 627, 'FPS': 137, 'Classic': 105, 'M..."
4,50,Half-Life: Opposing Force,Gearbox Software,Valve,,11062,567,0,"5,000,000 .. 10,000,000",417,40,204,40,499,499,0,"English, French, German, Korean",Action,114,"{'FPS': 874, 'Action': 316, 'Sci-fi': 242, 'Cl..."


# ----------------------------------------------------------------------------------------------------------

# Get Game Descriptions

 - Not currently in use, but is planned to be used for comparing a given description to other game descriptions using NLP

In [14]:
# Grab game descriptions

# TODO: VECTORIZE

game_description_list = []

for description in steam_game_data['detailed_description']:
    game_description_list.append(description)

# ---------- Debug ------------
# print(game_description_list[0])
# for d in game_description_list:
#     if "sport" in d:
#         print(d)

In [15]:
# Tokenize (split each game's desecription into words)
steam_game_texts = [[text for text in description.split()] for description in game_description_list]

# Create dictionary
steam_game_dictionary = corpora.Dictionary(steam_game_texts)

print(steam_game_dictionary)

steam_corpus = [steam_game_dictionary.doc2bow(doc, allow_update=True) for doc in steam_game_texts]

word_counts = [[(steam_game_dictionary[id], count) for id, count in line] for line in steam_corpus]

Dictionary(72797 unique tokens: ['1', 'Ally', 'Engage', 'Play', 'Rescue']...)


# ----------------------------------------------------------------------------------------------------------

# Create Similarity Vectors Manually

In [45]:
# Grab tags column
tags = all_steamspy_game_data['tags']
print(tags[500])

{'Casual': 36, 'Puzzle': 24, 'Match 3': 23, 'RPG': 16, 'Fantasy': 11, 'Singleplayer': 6}


In [46]:
# Create a list of all UNIQUE game tags

taglist = []

# TODO: VECTORIZE THIS CODE
for tag in tags:
    current_game_tags = re.findall("'([A-Za-z&\s'\-0-9]*)'", tag)
    for current_tag in current_game_tags:
        if current_tag in taglist:
            continue
        taglist.append(current_tag)

print(taglist)

['Action', 'FPS', 'Multiplayer', 'Shooter', 'Classic', 'Team-Based', 'First-Person', 'Competitive', 'Tactical', 'e-sports', 'PvP', 'Old School', 'Military', 'Strategy', 'Survival', 'Score Attack', '1980s', 'Assassin', 'Violent', 'Hero Shooter', 'Class-Based', 'Co-op', 'Fast-Paced', 'Retro', 'Online Co-Op', 'Mod', 'Remake', 'Funny', 'World War II', 'War', 'Historical', 'Singleplayer', 'Difficult', 'World War I', 'Arena Shooter', 'Sci-fi', 'Gore', 'Aliens', 'Adventure', 'Atmospheric', 'Story Rich', 'Silent Protagonist', 'Great Soundtrack', 'Puzzle', 'Moddable', 'Space', 'Cyberpunk', 'Memes', 'Platformer', 'Psychological Horror', 'Conspiracy', '3D', 'Cult Classic', 'Sports', 'Nudity', 'Action-Adventure', 'Open World', 'Dark', 'Simulation', 'Zombies', 'Short', 'Dystopian ', 'Physics', 'Horror', 'Sandbox', 'Realistic', 'Massively Multiplayer', 'Comedy', 'Futuristic', 'Benchmark', 'Free to Play', 'Post-apocalyptic', 'Episodic', 'Cinematic', 'Puzzle-Platformer', '3D Platformer', 'Female Prota

In [162]:
# TODO: VECTORIZE

"""
Here, we are going through our unique taglist, counting the number of times a given tag appears with another tag. 
While doing this, we also count the total number of appearances of a given tag, stored in tag_appearances.
With that information, we then go through tag_similarities (stores a tag and the number of times it appears
with every other tag), dividing the number of times a given tag has appeared with another tag by the total appearances
of that tag.

E.g: If 'Action' appeared with 'FPS' 100 times, and 'Action' itself appeared 200 times, the similarity vector for 
'Action' and 'FPS' would be 0.5.
"""

tag_similarities = {}
tag_appearances = {}
game_info = {}

count = 0
for tag in taglist:
    for game_tags in all_steamspy_game_data['tags']:
        if tag in game_tags:
            if not tag in tag_appearances:
                tag_appearances[tag] = 1
            else:
                tag_appearances[tag] += 1
            if not tag in tag_similarities:
                tag_similarities[tag] = {}
            current_tag_group = re.findall("'([A-Za-z&\s'\-0-9]*)'", game_tags)
            if tag in current_tag_group:
                current_tag_group.remove(tag)
            for current_tag in current_tag_group:
                if current_tag in tag_similarities[tag]:
                    tag_similarities[tag][current_tag] += 1
                else:
                    tag_similarities[tag][current_tag] = 1
                
    count += 1


# Store game ids, number of owners, and price (manually formatted). Will be used to create link to game page.
count = 0
for game in all_steamspy_game_data['name']:
    if not game in game_info:
        game_info[game] = {}
        game_info[game]['appid'] = all_steamspy_game_data['appid'][count]
        game_info[game]['owners'] = re.findall("[0-9,]+", all_steamspy_game_data['owners'][count].replace(",", ""))
        
        unformatted_price = str(all_steamspy_game_data['price'][count])
        price_len = len(unformatted_price)
        game_info[game]['price'] = unformatted_price[0:price_len - 2] + "." + unformatted_price[price_len-2:price_len]
        
        game_info[game]['reviews'] = all_steamspy_game_data['positive'][count] + all_steamspy_game_data['negative'][count]
        
    count += 1

    
print(game_info)
# Turn number of correlations into a fraction (divide by total appearances of the tag)
for tag in tag_similarities:
    for sub_tag in tag_similarities[tag]:
        current_tag_correlations = tag_similarities[tag][sub_tag]
        tag_similarities[tag][sub_tag] = current_tag_correlations / tag_appearances[tag]

        
# Printing all similarity vectors
# for tag in tag_similarities:
#     print(tag, ":")
#     for sub_tag in tag_similarities[tag]:
#         print(sub_tag, " --> ", tag_similarities[tag][sub_tag])
#     print("\n-----------------------------------------------------------------\n")



In [191]:
# Tags we will give to our game
our_tags = ["Puzzle platformer", "Horror", "Story rich", "Dark", "2D", "Platformer", "Puzzle"]

# ONLY compare to games that include this list of tags. If empty, comapre to all.
must_include = ["2D", "Platformer"]

# TODO: Implement weights ?
weights = []

num_games_to_show = 60

game_similarities = {}
current_game_similarity_score = 0
count = 0
continue_comparing = True;

"""
Go through all games in our csv, incrementing current_game_similarity_score by the corresponding similarity vector stored in 
tag_similarities, for each tag in our_tags, to each tag of a game. 
"""
for current_game_tags in all_steamspy_game_data['tags']:
    current_game_tags_tidy = re.findall("'([A-Za-z&\s'\-0-9]*)'", current_game_tags)
    if must_include:
        for must_include_tag in must_include:
            if not must_include_tag in current_game_tags_tidy:
                continue_comparing = False;
                break;
    if continue_comparing == False:
        continue_comparing = True;
        current_game_similarity_score = 0
        game_similarities[all_steamspy_game_data['name'][count]] = current_game_similarity_score
        count += 1
        continue;
    for our_tag in our_tags:
        if our_tag in tag_similarities:
            if our_tag in current_game_tags_tidy:
                current_game_similarity_score += 1
            for current_game_tag in current_game_tags_tidy:
                if current_game_tag in tag_similarities[our_tag]:
                    current_game_similarity_score += tag_similarities[our_tag][current_game_tag]
    game_similarities[all_steamspy_game_data['name'][count]] = current_game_similarity_score
    current_game_similarity_score = 0
    count += 1
    

# Printing the 60 most similar games
print("\t\t\t----------------------------------------------------\n"\
      f"\t\t\t\t\tTOP {num_games_to_show} SIMILAR GAMES:\n"\
      "\t\t\t----------------------------------------------------\n"\
      "\t\t\t\t\t(Prices are in USD)\n"\
      f"\t\t\t\t Searched a database of {len(all_steamspy_game_data)} games;\n\n"
      "- Based on the tags:\t\t\t", end="")
count = 1
for tag in our_tags:
    if count < len(our_tags):
        print(f"{tag}, ", end="")
    else:
        print(f"{tag}")
    count += 1

count = 1
if must_include:
    if len(must_include) > 1:
        print(f"- Games shown had to include the tags:\t", end="")
        for tag in must_include:
            if count < len(must_include):
                print(f"{tag}, ", end="")
            else:
                print(f"{tag}\n")
            count += 1
    else:
        print(f"- Games shown had to include the tag:\t{must_include[0]}\n")
else:
    print("\n")


# review_revenues are ONLY accurate if 3% of owners reviewed the game.
# This is not the case for many games.
k = Counter(game_similarities)
high_similarity = k.most_common(60)
count = 0
for game in high_similarity:
    
    game_name = game[0]
    game_score = game[1]
    if game_score > 0:
        count += 1
        
        game_app_id = game_info[game_name]['appid']
        game_price = game_info[game_name]['price']
        num_owners_floor = game_info[game_name]['owners'][0]
        num_owners_ceiling = game_info[game_name]['owners'][1]
        owners_revenue_floor = int(float(num_owners_floor) * float(game_price))
        owners_revenue_ceiling = int(float(num_owners_ceiling) * float(game_price))
        num_reviews = game_info[game_name]['reviews']
        review_revenue_floor = int(float(num_reviews) * 30 * float(game_price))
        review_revenue_ceiling = int(float(num_reviews) * 50 * float(game_price))
        
        print("({})\t{}\n" \
        "\tScore:\t\t\t\t{:.4f}\n" \
        "\tSteam Page:\t\t\thttps://store.steampowered.com/app/{}/\n" \
        "\tPrice:\t\t\t\t${:,.2f}\n" \
        "\tNumber of Owners:\t\t{:,} - {:,}\n" \
        "\tRevenue Based on Owners:\t${:,} - ${:,}\n" \
        "\tNumber of Reviews:\t\t{:,}\n" \
        "\tRevenue Based on Reviews:\t${:,} - ${:,}\n".format(count, game_name, game_score, game_app_id, float(game_price), int(num_owners_floor), int(num_owners_ceiling), owners_revenue_floor, owners_revenue_ceiling, num_reviews, review_revenue_floor, review_revenue_ceiling))

			----------------------------------------------------
					TOP 60 SIMILAR GAMES:
			----------------------------------------------------
					(Prices are in USD)
				 Searched a database of 2000 games;
- Based on the tags:			Puzzle platformer, Horror, Story rich, Dark, 2D, Platformer, Puzzle
- Games shown had to include the tags:	2D, Platformer

(1)	Trine Enchanted Edition
	Score:				32.2963
	Steam Page:			https://store.steampowered.com/app/35700/
	Price:				$14.99
	Number of Owners:		2,000,000 - 5,000,000
	Revenue Based on Owners:	$29,980,000 - $74,950,000
	Number of Reviews:		12,522
	Revenue Based on Reviews:	$5,631,143 - $9,385,239

(2)	Trine 2: Complete Story
	Score:				32.1608
	Steam Page:			https://store.steampowered.com/app/35720/
	Price:				$4.99
	Number of Owners:		2,000,000 - 5,000,000
	Revenue Based on Owners:	$9,980,000 - $24,950,000
	Number of Reviews:		19,633
	Revenue Based on Reviews:	$2,939,060 - $4,898,433

(3)	Life Goes On: Done to Death
	Score:				31.8766
	Steam Pa

 ### IMPROVEMENTS
 - Add WEIGHTS that user can define for each tag?
 - Be more lenient towards similar tags? (e.g, "2D" and "2D Platformer")
 - Add option to filter date created?
 - Add option to add a description; use NLP to compare similarity?
 - UPDATE DATABASE

# ----------------------------------------------------------------------------------------------------------

# Other Solutions

Empty for now.

# ----------------------------------------------------------------------------------------------------------

# Gensim Word Embeddings
 - Not currently needed, may come in handy in the future

In [20]:
# # Make directory if it does not exist
# path = "./models/"

# if os.path.exists(path) == False:
#     try:
#         os.mkdir(path)
#     except OSError:
#         print ("Creation of the directory %s failed" % path)
#     else:
#         print ("Successfully created the directory %s " % path)
        
# filename = './downloads/GoogleNews-vectors-negative300.bin.gz'
# google_news_model = gensim.models.KeyedVectors.load_word2vec_format(filename, binary=True)
# google_news_model.save("./models/google_news.model")