Author: Joshua Williams

The purpose of this code is to generate CSV files that quantify the appearance of certain attributes within our top 1000 well-received games (data4.csv). It also ensures that these CSV files are easily convertible to Chartify charts for plots_for_top_1000.ipynb.

In [6]:
def display_csv(fname):
    '''
    Function to display a dataframe from a file name (the file name must end with .csv)
    fname: the name of the csv file to be displayed
    '''
    import pandas as pd
    from IPython.display import display

    assert isinstance(fname, str) and fname.endswith('.csv'), 'Input is not a valid name of a csv file'
    
    df = pd.read_csv(fname)
    display(df)
display_csv(fname='data4.csv')

Unnamed: 0.1,Unnamed: 0,title,all_ratio,genres,specs,platform,developer,positive_review,positive_review_rank
0,37316,Trine 2: Complete Story,"95% of the 10,956 user reviews for this game a...","['Action', 'Adventure', 'Indie']","['Single-player', 'Multi-player', 'Co-op', 'Sh...","Linux,Mac,Windows,",FrozenbytePublisher:FrozenbyteRelease Date: Ju...,939.0,1
1,32949,The Legend of Heroes: Trails in the Sky SC,97% of the 963 user reviews for this game are ...,['RPG'],"['Single-player', 'Steam Achievements', 'Steam...","Windows,","Nihon FalcomPublisher:XSEED Games, Marvelous U...",934.0,2
2,29628,Mad Father,96% of the 972 user reviews for this game are ...,"['Adventure', 'Indie']","['Single-player', 'Steam Achievements', 'Steam...","Windows,","senPublisher:AGM PLAYISMRelease Date: Sep 22, ...",933.0,3
3,13710,Cube Escape: Paradox,96% of the 970 user reviews for this game are ...,"['Adventure', 'Indie']","['Single-player', 'Steam Achievements', 'Steam...","Mac,Windows,",Rusty LakePublisher:Rusty LakeFranchise:Rusty ...,931.0,4
4,2323,Higurashi When They Cry Hou - Ch.1 Onikakushi,96% of the 966 user reviews for this game are ...,['Adventure'],"['Single-player', 'Steam Achievements', 'Steam...","Linux,Mac,Windows,",07th ExpansionPublisher:MangaGamerRelease Date...,927.0,5
5,74,Civilization IV: Beyond the Sword,96% of the 964 user reviews for this game are ...,['Strategy'],['Single-player'],"Mac,Windows,",Firaxis GamesPublisher:2KFranchise:Sid Meier's...,925.0,6
6,34756,Hero of the Kingdom II,94% of the 978 user reviews for this game are ...,"['Adventure', 'Casual', 'Indie', 'RPG']","['Single-player', 'Steam Achievements', 'Steam...","Linux,Mac,Windows,",Lonely TroopsPublisher:Lonely TroopsRelease Da...,919.0,7
7,30226,ACE Academy,95% of the 967 user reviews for this game are ...,"['Action', 'Adventure', 'Indie', 'RPG', 'Simul...","['Single-player', 'Steam Cloud']","SteamOS,Linux,Mac,Windows,",PixelFade IncPublisher:PixelFade IncRelease Da...,919.0,7
8,37314,X3: Albion Prelude,92% of the 988 user reviews for this game are ...,"['Action', 'Simulation']","['Single-player', 'Downloadable Content', 'Ste...","Linux,Mac,Windows,","EgosoftPublisher:EgosoftRelease Date: Dec 15, ...",909.0,9
9,36772,Papo & Yo,95% of the 954 user reviews for this game are ...,"['Adventure', 'Indie']","['Single-player', 'Steam Achievements', 'Full ...","Linux,Mac,Windows,",Minority Media Inc.Publisher:Minority MediaRel...,906.0,10


In [2]:
def countPlatform(fname='data4.csv'):
    '''
    Function to count how many games support each platform (e.g. Windows, Mac, Linux, SteamOS)
    
    (fname): A string name of a .csv file
    '''
    import pandas as pd
    assert isinstance(fname, str) and '.csv' in fname, 'fname is not a string name of a .csv file'
    
    df = pd.read_csv(fname)
    
    platforms = df['platform'].tolist()
    platform_dict = {}
    
    for platform_string in platforms:
        # Skip rows with no platform information
        if pd.isnull(platform_string):
            continue
        
        # Strings such as 'Mac,Windows,' end with a comma, so drop the empty pieces left by the split
        platform_list = [p for p in platform_string.split(',') if p]
        
        for platform in platform_list:
            if platform in platform_dict:
                platform_dict[platform] += 1
            else:
                platform_dict[platform] = 1
        
    platform_df = pd.DataFrame(platform_dict, index=["count"])

    return platform_df
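
The counting step in countPlatform can also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the code used to produce the csv files: the helper name count_platforms and the sample strings are made up for illustration, and it assumes missing cells arrive as non-string values such as NaN.

```python
from collections import Counter


def count_platforms(platform_strings):
    """Count platform names in comma-separated strings such as 'Mac,Windows,'."""
    counts = Counter()
    for platform_string in platform_strings:
        # Assumption: missing cells are non-string values (pandas uses float NaN).
        if not isinstance(platform_string, str):
            continue
        # Drop the empty pieces produced by the trailing comma.
        counts.update(p.strip() for p in platform_string.split(',') if p.strip())
    return counts


# Hypothetical sample in the same shape as the 'platform' column.
sample = ['Linux,Mac,Windows,', 'Windows,', 'Mac,Windows,', float('nan')]
platform_counts = count_platforms(sample)
print(platform_counts.most_common())
```

Counter.most_common() returns the platforms already sorted by count, so no separate sort step is needed before building a dataframe from the result.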

In [3]:
def countQuantity(fname='data4.csv', col_name=''):
    '''
    Function to count the number of appearances of each attribute in a specified column
    NOTE: This function is geared toward columns that have list type items
    
    (fname): A string name of a .csv file
    (col_name): string name of a column in the dataframe
    '''
    import pandas as pd
    import ast
    assert isinstance(fname, str) and '.csv' in fname, 'fname is not a string name of a .csv file'
    assert isinstance(col_name, str), 'col_name is not of string type'
    
    df = pd.read_csv(fname)
    
    assert col_name in df, 'Specified column name does not exist in the specified dataframe!'
    
    col_val_list = df[col_name].tolist()
    column_dict = {}
    
    for col_val in col_val_list:
        if pd.isnull(col_val):
            continue
        
        # Converts string of a list to a list
        col_val = ast.literal_eval(col_val)
        
        for val in list(col_val):
            if val in column_dict.keys():
                column_dict[val] += 1
            else:
                column_dict[val] = 1
        
    column_df = pd.DataFrame(column_dict, index=["count"])

    column_df = column_df.sort_values('count', axis=1, ascending=False)
    
    return column_df
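
The core of countQuantity is turning each cell, which holds the string form of a list, back into a real list before counting its items. A minimal standalone sketch of that step is below. The helper name count_list_items and the sample cells are hypothetical, and the sketch uses collections.Counter instead of the hand-written dictionary above.

```python
import ast
from collections import Counter


def count_list_items(cells):
    """Count items across cells that each hold the string form of a list."""
    counts = Counter()
    for cell in cells:
        # Assumption: missing cells are non-string values (None or NaN).
        if not isinstance(cell, str):
            continue
        # ast.literal_eval safely parses "['A', 'B']" into ['A', 'B']
        # without executing arbitrary code.
        counts.update(ast.literal_eval(cell))
    return counts


# Hypothetical sample in the same shape as the 'genres' column.
sample_genres = ["['Action', 'Adventure', 'Indie']", "['RPG']", "['Adventure', 'Indie']", None]
genre_counts = count_list_items(sample_genres)
print(genre_counts.most_common())
```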

In [9]:
def countPublisher(fname='data4.csv'):
    '''
    Function to count the number of appearances of each publisher in our data (parsed from the 'developer' column)
    
    (fname): A string name of a dataframe
    '''
    import pandas as pd
    import re
    assert isinstance(fname, str) and '.csv' in fname, 'fname is not a string name of a .csv file'
    
    df = pd.read_csv(fname)
    
    developers = df['developer'].tolist()
    developer_dict = {}
    
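    # Get the publisher name between these two strings
    start = 'Publisher:'
    end = 'Release Date:'
    
    for developer in developers:
        if pd.isnull(developer):
            continue
        
        start_idx = developer.find(start)
        if start_idx == -1:
            # No publisher is listed for this game
            continue
        end_idx = developer.find(end, start_idx)
        if end_idx == -1:
            # No release date marker, so the publisher runs to the end of the string
            end_idx = len(developer)
        
        developer = developer[start_idx + len(start):end_idx]
        
        if developer in developer_dict:
            developer_dict[developer] += 1
        else:
            developer_dict[developer] = 1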
        
    developer_df = pd.DataFrame(developer_dict, index=["count"])
    developer_df = developer_df.sort_values('count', axis=1, ascending=False)
    
    return developer_df
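
Some rows in data4.csv show a 'Franchise:' field between the publisher and the release date (for example the Rusty Lake and Firaxis Games rows above), so slicing up to 'Release Date:' would keep the franchise text as part of the publisher name. The sketch below shows one possible way to stop at whichever marker comes first. It is an illustration under assumptions, not the method used to produce the counts shown later: the helper name extract_publisher and the sample strings are hypothetical, and the field order Publisher, optional Franchise, Release Date is inferred from the displayed rows.

```python
import re

# Capture the text after 'Publisher:' up to the next known marker or the end of the string.
_PUBLISHER_RE = re.compile(r'Publisher:(.*?)(?=Franchise:|Release Date:|$)', re.DOTALL)


def extract_publisher(developer_field):
    """Return the publisher name from a 'developer' cell, or None if there is none."""
    # Assumption: missing cells are non-string values (None or NaN).
    if not isinstance(developer_field, str):
        return None
    match = _PUBLISHER_RE.search(developer_field)
    if match is None:
        return None
    name = match.group(1).strip()
    return name or None


# Hypothetical cells in the same shape as the 'developer' column.
with_franchise = 'Studio APublisher:Label AFranchise:Series ARelease Date: Jan 1, 2000'
without_franchise = 'Studio BPublisher:Label BRelease Date: Jan 1, 2000'
print(extract_publisher(with_franchise))
print(extract_publisher(without_franchise))
```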

In [10]:
def main():
    '''
    Build the count dataframe for each attribute and display the corresponding saved csv file.
    Author: Joshua Williams
    '''
    import pandas as pd
    
    # Get platform dataframe
    df_platform = countPlatform()
    # Save to csv
    #df_platform.to_csv('platform_count.csv')
    print('Platform Dataframe')
    display_csv('platform_count.csv')
    
    
    # Get specs dataframe
    df_specs = countQuantity(col_name='specs')
    # Save to csv
    #df_specs.to_csv('specs_count.csv')
    print('Specs Dataframe')
    display_csv('specs_count.csv')
    
              
    # Get genres dataframe
    df_genres = countQuantity(col_name='genres')
    # Save to csv
    #df_genres.to_csv('genres_count.csv')            
    print('Genres Dataframe')
    display_csv('genres_count.csv')
    
    
    # Get publishers dataframe
    df_developers = countPublisher()
    # Save to csv
    #df_developers.to_csv('developers_count.csv')
    print('Publisher Dataframe')
    display_csv('developers_count.csv')
    
if __name__ == "__main__":
    main()

Platform Dataframe


Unnamed: 0.1,Unnamed: 0,Linux,Mac,Windows,SteamOS
0,count,336,465,1000,41


Specs Dataframe


Unnamed: 0.1,Unnamed: 0,Single-player,Steam Achievements,Steam Trading Cards,Steam Cloud,Multi-player,Full controller support,Partial Controller Support,Steam Leaderboards,Co-op,...,Windows Mixed Reality,Keyboard / Mouse,Gamepad,Valve Anti-Cheat enabled,Commentary available,SteamVR Collectibles,Includes Source SDK,Steam Turn Notifications,Mods,Mods (require HL2)
0,count,911,680,547,437,286,257,209,190,138,...,18,15,13,11,10,7,5,4,1,1


Genres Dataframe


Unnamed: 0.1,Unnamed: 0,Indie,Action,Adventure,Strategy,RPG,Casual,Simulation,Free to Play,Early Access,...,Sports,Racing,Utilities,Animation & Modeling,Design & Illustration,Video Production,Education,Software Training,Game Development,Audio Production
0,count,619,428,335,228,213,194,189,131,69,...,42,42,7,3,2,1,1,1,1,1


Publisher Dataframe


Unnamed: 0.1,Unnamed: 0,Paradox Interactive,Daedalic Entertainment,Devolver Digital,Ubisoft,SEGA,Electronic Arts,SCS Software,Sekai Project,AGM PLAYISM,...,Abrakam SA,DANKIE,Screeps,The Working Parts,Frogsong Studios AB,Hangover Cat Purrroduction,Titans,Lucky Pause,ZuoBuLaiGame,Pirotexnik
0,count,18,13,12,12,10,8,8,8,7,...,1,1,1,1,1,1,1,1,1,1
