# Steam Game Recommendation System
### Author: Randy Nguyen
#### Class: CIS 678-01

## Project Introduction and Motivation

Recommendation systems were a concept that I had heard about on multiple occasions but had never built. While searching for recommendation system datasets, I found a collection hosted on the University of California San Diego's Computer Science website, https://cseweb.ucsd.edu/~jmcauley/datasets.html, which includes several datasets, one of them about video games. I liked the idea of building a program that I could personally use, so I followed through with this concept. However, the dataset from UCSD's website was not in proper JSON format, so I searched for datasets elsewhere.

In [1]:
import pandas as pd 
import numpy as np
import pprint
pp = pprint.PrettyPrinter(indent = 4)

## The Data
The dataset used in this project was found on Kaggle at https://www.kaggle.com/tamber/steam-video-games. It includes around 12,000 users and 5,400 games, along with the users' interactions with those games. The **Action** column is either purchase or play. The **Hours** column is 1 if the action was purchase, and the number of hours played if the action was play.

This recommendation system will use the purchase data.

In [2]:
df = pd.read_csv("steam-200k.csv", sep = ",")
df.describe()

Unnamed: 0,UserID,Hours
count,200000.0,200000.0
mean,103655900.0,17.874384
std,72080740.0,138.056952
min,5250.0,0.1
25%,47384200.0,1.0
50%,86912010.0,1.0
75%,154230900.0,1.3
max,309903100.0,11754.0


In [3]:
df.head()

Unnamed: 0,UserID,GameName,Action,Hours
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0
1,151603712,The Elder Scrolls V Skyrim,play,273.0
2,151603712,Fallout 4,purchase,1.0
3,151603712,Fallout 4,play,87.0
4,151603712,Spore,purchase,1.0


In [4]:
# Filter dataset to only include purchase data
# https://cmdlinetips.com/2018/02/how-to-subset-pandas-dataframe-based-on-values-of-a-column/
isPurchase = df['Action'] == "purchase"
dfPurchase = df[isPurchase]
dfPurchase.head()

Unnamed: 0,UserID,GameName,Action,Hours
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0
2,151603712,Fallout 4,purchase,1.0
4,151603712,Spore,purchase,1.0
6,151603712,Fallout New Vegas,purchase,1.0
8,151603712,Left 4 Dead 2,purchase,1.0


## The Method
This recommender system uses Jaccard similarity as the metric for how similar two users are. After compiling a list of the X most similar users, the system recommends the Y games most purchased among them that the target user does not already own.
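
As a minimal sketch of the metric, separate from the notebook's own list-based helpers, the Jaccard similarity of two users is the number of games they share divided by the number of distinct games across both libraries: J(A, B) = |A ∩ B| / |A ∪ B|. The function name `jaccard_sets` and the toy libraries are assumptions for illustration. Treating two empty libraries as similarity 0 is also a choice made here, not something the dataset requires.

```python
def jaccard_sets(games_a, games_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of game names."""
    a, b = set(games_a), set(games_b)
    union = a | b
    if not union:
        # Assumption: two empty libraries are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy libraries, not taken from the dataset.
alice = ["Fallout 4", "Spore", "Left 4 Dead 2"]
bob = ["Fallout 4", "Left 4 Dead 2", "Dota 2", "Team Fortress 2"]

print(jaccard_sets(alice, bob))  # 2 shared games out of 5 distinct games -> 0.4
```

A score of 1 means two users own exactly the same games, and a score of 0 means they share none.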

In [5]:
# Create a utility matrix where the users are the rows and the games are the columns.
# Fill in the matrix with 0 where there are blanks.
utilityMatrix = dfPurchase.pivot_table(index='UserID', columns='GameName', values='Hours').fillna(0)

# Use reset_index() to be able to search table by UserID
# https://stackoverflow.com/questions/21646710/pandas-pivot-table-using-index-data-of-dataframe
utilityMatrix = utilityMatrix.reset_index()
utilityMatrix.head()

GameName,UserID,007 Legends,0RBITALIS,1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby),10 Second Ninja,"10,000,000",100% Orange Juice,1000 Amps,12 Labours of Hercules,12 Labours of Hercules II The Cretan Bull,...,rFactor 2,realMyst,realMyst Masterpiece Edition,resident evil 4 / biohazard 4,rymdkapsel,sZone-Online,samurai_jazz,the static speaks my name,theHunter,theHunter Primal
0,5250,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,76767,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,86540,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,103360,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,144736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [6]:
def getGamesList(user):
    # Turn each row into a list of strings that is each user's purchased games
    userGames = []
    for col in user.columns:
        if(user[col].item() == 1):
            userGames.append(col)
    return userGames

def intersection(list1, list2): 
    # https://www.geeksforgeeks.org/python-intersection-two-lists/
    # Finds the common items between two lists
    list3 = [value for value in list1 if value in list2] 
    return list3

def union(list1, list2): 
    # https://www.geeksforgeeks.org/python-union-two-lists/
    # Use OR operator to avoid duplicates
    list3 = list(set(list1) | set(list2)) 
    return list3

def Diff(list1, list2): 
    # Finds the items in list1 that are not in list2
    list3 = (list(set(list1) - set(list2))) 
    return list3


def countX(list1, x): 
    # Counts the number of times an item appears in a list
    # https://www.geeksforgeeks.org/python-count-occurrences-element-list/
    count = 0
    for item in list1: 
        if (item == x): 
            count = count + 1
    return count 

def jaccard(list1, list2):
    # Calculates the Jaccard Score of two lists
    score = len(intersection(list1, list2)) / len(union(list1, list2))
    return score

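def getJaccardScores(list1, list2):
    # Returns a list of Jaccard Scores for list1 against all of list2
    # Note: the loop starts at index 1, so list2[0] is skipped and
    # row k of the returned DataFrame corresponds to list2[k + 1]
    jScores = []
    for user in range(1,len(list2)):
        jScores.append(jaccard(list1, list2[user]))
    return pd.DataFrame(jScores)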

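def getTopGames(list1, numUsers, numGames):
    # Returns the numGames most common games among the numUsers users most similar to list1
    topGames = []
    jValues = getJaccardScores(list1, usersGameList)
    jIndex = jValues.sort_values(0, ascending=False).head(numUsers).index
    for i in range(0,len(jIndex)):
        # Add 1 because getJaccardScores starts at index 1 of usersGameList
        topGames.append(Diff(usersGameList[jIndex[i]+1], list1))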
        
    # https://thispointer.com/python-convert-list-of-lists-or-nested-list-to-flat-list/
    flatList = [ item for elem in topGames for item in elem]
    uniqueGames = list(set(flatList))
    
    indices = []
    for game in uniqueGames:
        indices.append(countX(flatList, game))
    indices = pd.DataFrame(indices)
    
    topGames = []
    for i in indices.sort_values(0, ascending=False).head(numGames).index:
        topGames.append(uniqueGames[i])
    return topGames

In [7]:
# Create list of strings for each user that contains their purchased games
usersGameList = []
for user in utilityMatrix.UserID:
    gl = getGamesList(utilityMatrix[utilityMatrix.UserID == user])
    usersGameList.append(gl)

In [8]:
averageUser= ['Grand Theft Auto V', 
             'Kerbal Space Program', 
             'Duck Game', 
             'Call of Duty World at War', 
             'Star Wars - Battlefront II',
             'The Forest', 
             'Goat Similator']
truckUser = ['Farming Simulator 15',
             'Car Mechanic Simulator 2015',
             'Euro Truck Simulator 2',
             'Euro Truck Simulator',
             'Assetto Corsa',
             'Project CARS',
             'Bus Driver',
             'RACE 07',
             'Test Drive Unlimited 2']
spaceUser = ['Kerbal Space Program',
             'Space Engineers',
             'Elite Dangerous',
             'Besiege',
             'Microsoft Flight',
             'Robocraft',
             'XCOM Enemy Unknown',
             'XCOM Enemy Within']

### Testing
The initial test showed that the average user was recommended mostly Steam's popular games regardless of genre. The truck user, however, received distinctive results, mostly DLC for games already in the profile. Changing the parameters changed the recommendations. With fewer similar users, games closer to the profile's genre were recommended: the space user received better recommendations in this scenario, while the truck user saw little change beyond the order of the recommended games. With more similar users, both the truck user and the space user were recommended Steam's popular games.
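
The effect of the neighbourhood size can be reproduced on a toy example. The sketch below is a compact restatement of the approach using sets and `collections.Counter`, not the notebook's exact code. The function name `recommend`, the user names, and the game names are all made up for illustration.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(profile, libraries, num_users, num_games):
    """Recommend the games most often owned by the num_users most similar users."""
    profile = set(profile)
    # Rank all users by similarity to the profile, most similar first.
    ranked = sorted(libraries, key=lambda u: jaccard(profile, libraries[u]), reverse=True)
    counts = Counter()
    for user in ranked[:num_users]:
        # Only count games the profile does not already contain.
        counts.update(libraries[user] - profile)
    return [game for game, _ in counts.most_common(num_games)]


# Hypothetical data: two niche users and four users who own popular games.
libraries = {
    "trucker_1": {"Truck Sim", "Truck Sim DLC", "Bus Sim"},
    "trucker_2": {"Truck Sim", "Truck Sim DLC"},
    "casual_1": {"Popular Shooter"},
    "casual_2": {"Popular Shooter", "Popular MOBA"},
    "casual_3": {"Popular Shooter", "Popular MOBA"},
    "casual_4": {"Popular Shooter"},
}
profile = ["Truck Sim", "Bus Sim"]

print(recommend(profile, libraries, num_users=2, num_games=1))  # ['Truck Sim DLC']
print(recommend(profile, libraries, num_users=6, num_games=1))  # ['Popular Shooter']
```

With two neighbours, only the niche users contribute, so the genre-specific title wins. With all six, users with zero similarity are also counted, and the most widely owned game takes over.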

In [9]:
pp.pprint(getTopGames(averageUser, 50, 4))
pp.pprint(getTopGames(truckUser, 50, 4))
pp.pprint(getTopGames(spaceUser, 50, 4))

['Dota 2', 'Team Fortress 2', 'Counter-Strike Global Offensive', 'Robocraft']
[   'Euro Truck Simulator 2 - Going East!',
    'RACE 07 - Formula RaceRoom Add-On',
    'Euro Truck Simulator 2 - Ice Cold Paint Jobs Pack',
    'Scania Truck Driving Simulator']
['Team Fortress 2', 'Dota 2', 'Unturned', 'War Thunder']


In [11]:
pp.pprint(getTopGames(truckUser, 10, 4))
pp.pprint(getTopGames(spaceUser, 10, 4))

[   'Euro Truck Simulator 2 - Ice Cold Paint Jobs Pack',
    'Euro Truck Simulator 2 - Going East!',
    'Scania Truck Driving Simulator',
    'RaceRoom Racing Experience ']
[   'Warhammer 40,000 Dawn of War II',
    "Sid Meier's Civilization Beyond Earth - Rising Tide",
    "Sid Meier's Civilization Beyond Earth",
    'War Thunder']


In [12]:
pp.pprint(getTopGames(truckUser, 300, 4))
pp.pprint(getTopGames(spaceUser, 300, 4))

[   'Counter-Strike Global Offensive',
    'Team Fortress 2',
    'Left 4 Dead 2',
    'Unturned']
['Unturned', 'Team Fortress 2', 'Dota 2', 'Heroes & Generals']


### Conclusion

By using curated profiles that had games in a certain genre, the recommendation system was able to recommend other games in the genre when using a smaller set of similar users. As the range of accepted users expanded, the likelihood of being recommended a generally popular game increased. 

In the future, a dataset in a usable format that includes genre data and user reviews would enhance the program. Even so, the results of the current program on this dataset were promising, and the approach has the potential to be useful if the dataset included more recent titles.