# Social Computing/Social Gaming - Summer 2022

# Exercise Sheet 3: Collaborative Filtering with Steam Games

In this exercise, we will build a collaborative filtering recommender system using data we gather from Steam. We will use your friends list to get information about owned games for each ID, and the time each game was played.

Usually, collaborative filtering is based on some sort of rating to determine the similarity between users. However, for games, the enjoyment and a rating do not always match. Additionally, only about 10% of players actually rate the games they play, which would make for a very incomplete dataset. Therefore, the playtime will be used instead of a rating system. This has the added benefit that playtime is usually the most authentic metric of enjoyment, as players are very unlikely to spend much time on a game they don't enjoy.

## Task 3.1: Obtaining the data


**1.** Your first task is to **gather the data** needed to create the recommender system. **Create a data structure** that holds the needed information for each player and game. To do this, **open the URL** with the given `request()` function, **read** the json response and retrieve your games library and playtime. Then **save** the games into a dictionary with `key=name` and `values=playtime`. **Do not add** games with 0 playtime to this dictionary.


**Notes:** 
- You have three different options to solve this exercise. You can either:
    - Use your own Steam profile (strongly recommended)
    - Use the provided default Steam account (in case you do not own a Steam profile)
    - Use the provided .json file (in case you do not have a Steam profile and the default Steam account becomes overcrowded)
- your choice will not affect your grade in any way
- You cannot obtain a list from your profile with the Steam API unless your profile is set to public. 
- Upon executing the code below, you will notice that a lot of profiles "`couldnt decode`". These are private or deleted profiles and it is totally fine to get this message.


**Hints**:
- In case you wish to use your own Steam profile, but are afraid to share your personal [key](https://steamcommunity.com/dev/apikey) [1] and id, please be informed that you can delete them **after** solving the tasks and before submitting your solutions. The outputs will be saved in the Jupyter Notebook.
- To obtain the games a user owns, use this: `games = data['response']['games']`. This returns a list of games, including the playtime (in minutes) which can be retrieved like this: `playtime = game['playtime_forever']`, where game refers to an item from the list of games. 

Execute the following code cell to install the needed library for this exercise.

In [2]:
!pip install mlxtend



In [13]:
# Use this if you want to work with the default IDs
import requests
import urllib
import pandas as pd
import json
from urllib.request import Request, urlopen
from pandas.io.json import json_normalize
from requests.exceptions import HTTPError

# You can replace these values with your own ID and API key
key = "FB779ED85245344586B27465C0F5A7F2"
id = "76561198134752407"
url = "http://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/?key="+key+"&steamids="+id
r = requests.get(url)
data = r.json()

# Get friendslist
# This is just a template. In order to get your personalized list, you need to change the id and key above.
request = Request("http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key="+key+"&steamid="+id+"&relationship=friend")
response = urlopen(request)
elevations = response.read()
data = json.loads(elevations)
friendslist = data['friendslist']
friends = friendslist['friends']

# Get all friends
friendids = []
tempIDs = []
for friend in friends:
    friendids.append(friend['steamid'])
    
print(len(friendids), "ok")

# Get friends of friends
x = 0

while x < len(friendids):
    friendID = friendids[x]
    request = Request("http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key="+key+"&steamid="+friendID+"&relationship=friend")
    try:
        response = urlopen(request)    
    except urllib.error.HTTPError as e:
        print('401')
    elevations = response.read()
    try:
        data = json.loads(elevations)
    except json.JSONDecodeError:
        print("couldn't decode")
    friendslist = data['friendslist']
    friends = friendslist['friends']

    friendidsNew = []
    for friend in friends:
        friendidsNew.append(friend['steamid'])
        
    tempIDs += friendidsNew
    x += 1

friendids += tempIDs
friendids = list(dict.fromkeys(friendids))
friendids = list(set(friendids))
print(len(friendids))


6 ok
401
couldn't decode
335


In [15]:
# Trim the list of IDs to reasonable values:
if len(friendids)>250:
    friendids = friendids[:250]
print(len(friendids))

users_gamedicts = {} # The dictionary containing all information for every ID
gamedict = {} # A dict containing information for one player

# Get owned games of friendslist:
request = Request("http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key="+key+"&steamid="+id+"&include_appinfo=1&format=json")

# TODO:
# Open the URL, read the json response and retrieve your games library and playtime
# Save the games into a dictionary with key=name and values=playtime
# Hint 1: You can obtain the games a user owns with data['response']['games']
# Hint 2: You can retrieve their playtime with game['playtime_forever']
data = json.loads(urlopen(request).read())
res = data['response']
for game in res['games']:
    if game['playtime_forever'] > 0:
        gamedict.update({game['name']:game['playtime_forever']})

# Add the dictionary to the users_gamedict       
users_gamedicts[id] = gamedict

# Do the same for your friends and their friends
for friendID in friendids:
    # TODO:
    request = Request("http://api.steampowered.com/IPlayerService/GetOwnedGames/v0001/?key="+key+"&steamid="+friendID+"&include_appinfo=1&format=json")
    data = json.loads(urlopen(request).read())
    res = data['response']
    gamedict_new = {}
    if 'games' in data.keys():
        for game in data['games']:
            if game['playtime_forever'] > 0:
                gamedict_new.update({game['name']:game['playtime_forever']})
    
        users_gamedicts[friendID] = gamedict

        
print(users_gamedicts[id])

250
{'Left 4 Dead 2': 1135, 'Stronghold Kingdoms': 39, 'PlanetSide 2': 254, 'Warframe': 620, 'War Thunder': 2833, 'Path of Exile': 7945, 'Cry of Fear': 30, 'Counter-Strike Nexon: Studio': 9, 'Trove': 1, 'Unturned': 25, 'NEOTOKYO°': 613, 'Heroes & Generals': 1, 'Counter-Strike: Global Offensive': 23191, 'Zero-K': 1, 'Clicker Heroes': 10799, 'Neverwinter Nights: Enhanced Edition': 127, 'Crusader Kings II': 3609, 'Among Us': 499, 'Mindustry': 3}


In [16]:
gamesofallusers = []

# TODO 1: Convert the gamedict to a list of lists:

for idx in users_gamedicts:
    games = [game for game in users_gamedicts[idx]]
    gamesofallusers.append(games)


# It should look something like this:
'''
[
    [
    'Path of Exile',
    'Europa Universalis IV',
    'Titan Quest Anniversary Edition',
    'Black Desert Online',
    'Crusader Kings II'
    ],
    [
    'Counter-Strike',
    'Day of Defeat',
    'Deathmatch Classic',
    'Ricochet'
    ]
]
''' 
# Each list within this list represents the games of one user
print(gamesofallusers[0])
    
    
# Remove common Steam entries that are not games:
for game in gamesofallusers:
    if 'Dota 2 Test' in game:
        game.remove('Dota 2 Test')
    if 'True Sight' in game:
        game.remove('True Sight')
    if 'True Sight: Episode 1' in game:
        game.remove('True Sight: Episode 1')
    if 'True Sight: Episode 2' in game:
        game.remove('True Sight: Episode 2')
    if 'True Sight: Episode 3' in game:
        game.remove('True Sight: Episode 3')
    if 'True Sight: The Kiev Major Grand Finals' in game:
        game.remove('True Sight: The Kiev Major Grand Finals')
    if 'True Sight: The International 2017' in game:
        game.remove('True Sight: The International 2017')
    if 'True Sight: The International 2018 Finals' in game:
        game.remove('True Sight: The International 2018 Finals')
        

['Left 4 Dead 2', 'Stronghold Kingdoms', 'PlanetSide 2', 'Warframe', 'War Thunder', 'Path of Exile', 'Cry of Fear', 'Counter-Strike Nexon: Studio', 'Trove', 'Unturned', 'NEOTOKYO°', 'Heroes & Generals', 'Counter-Strike: Global Offensive', 'Zero-K', 'Clicker Heroes', 'Neverwinter Nights: Enhanced Edition', 'Crusader Kings II', 'Among Us', 'Mindustry']


In [27]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

te = TransactionEncoder()
# TODO 2: Tinker around with the values
temp = te.fit(gamesofallusers).transform(gamesofallusers)
df = pd.DataFrame(temp, columns=te.columns_)
items = apriori(df, min_support=0.5, use_colnames=True)

print(items.where(items['support'] > 0.0).sort_values(by='support', ascending=False))

        support                                           itemsets
0           1.0                                         (Among Us)
349520      1.0  (Left 4 Dead 2, NEOTOKYO°, Heroes & Generals, ...
349533      1.0  (Left 4 Dead 2, NEOTOKYO°, Heroes & Generals, ...
349532      1.0  (Left 4 Dead 2, NEOTOKYO°, Heroes & Generals, ...
349531      1.0  (Left 4 Dead 2, NEOTOKYO°, Heroes & Generals, ...
...         ...                                                ...
174757      1.0  (Clicker Heroes, Stronghold Kingdoms, Among Us...
174756      1.0  (Clicker Heroes, Stronghold Kingdoms, Among Us...
174755      1.0  (Clicker Heroes, Stronghold Kingdoms, Among Us...
174754      1.0  (Clicker Heroes, Stronghold Kingdoms, Among Us...
524286      1.0  (Left 4 Dead 2, Among Us, Cry of Fear, Counter...

[524287 rows x 2 columns]


In [28]:
from mlxtend.frequent_patterns import association_rules

# TODO 2: Play around with the treshold value
temp = association_rules(items, metric="confidence", min_threshold=0.1)
print("Min Threshold: 0.1")
print(temp)


temp = association_rules(items, metric="confidence", min_threshold=0.5)
print("Min Threshold: 0.5")
print(temp)

**TODO 2: Write your observations here**

- From the association rule output, one can recommend Counter-Strike: Global Offensive if an user plays PAYDAY 2 as the confidence is 1, the highest. Similarly same recommendation can be made if an user plays Garry's Mod as the confidence is second highest.

- Confidence of 1 means that the player who plays game x also plays game y. So, it can be used as a measure to recommend games. It doesn't capture how similar two games are, rather only look at whether y is played by players who play x.

- Highest support is for Counter-Strike: Global Offensive from the frequent_itemlist. Highest confidence is for PAYDAY 2 as andecedent and Counter-Strike: Global Offensive as consequent from the association_rules. One can infer a correlation that the game with the highest similarity is the consequent in the association rule. This makes sense because the highest support means that the game is played by most players in the dataset. So, this game has higher chance of being a consequent or a suitable candidate for recommending as it would appear together with many other games.

- Lift can't be commented in the above scenario as it captures the correlation of andecedents and consequents. The one with highest confidence need not have high Lift as seen from the association rules. This is because confidence doesn't take into account the correlation between x and y. The highest Lift is for PLAYERUNKNOWN'S BATTLEGROUNDS as andecedent and Left 4 Dead 2 as consequent.


## Task 3.3: The Recommender System: Similarity Score


Finally, it is time to build the recommender system. 

**1.** The first thing to do is to **implement a similarity score** that will be used to predict a user's playtime of an unowned game. We implement a similarity score between two users by taking the relative distance between two players. We use the following formula:

$$d(u, v) = \sum_{i~\in~common~games} \frac{|r_{u,i} - r_{v,i}|}{r_{v,i}}$$ 

Where $u$ and $v$ are users and $r_{u,i}$ is the playtime of user $u$ for game $i$. 

You can then return the similarity with  
$$ w_{u,v} = \frac{1}{1 + d(u, v)} $$

**Notes:** 
- If no common games exist return 0.

**a) Implement similarity scores:** Besides the given similarity score, we want to explore how other measurements behave. Hence, we will implement the euclidean distance and cosine similarity. The scores can be selected by setting the respective variable on `True`.

In [None]:
from math import sqrt
    
def calculate_similarity(user1ID, user2ID, given=True, euclidean=False, cosine=False):    
    common_games = []
    user1games = users_gamedicts[user1ID]
    user2games = users_gamedicts.get(user2ID,user1games)
    common_games = list(set(user1games).intersection(user2games))
    differences = []
    
    # TODO: Calculate the similarity score between two friends based on their common games:
    # Needs to be done
    if(euclidean):
        for game in common_games:
            differences.append(abs(user1games[game]-user2games[game]))
        return sqrt(sum(differences))
    elif(cosine):
        for game in common_games:
            differences.append(user1games[game]*user2games[game])
        return sum(differences)
    elif(given):
        d = 0
        for i in common_games:
            d += abs(user1ID[i] - user2ID[i])/user2ID[i]
        if len(common_games) == 0:
            return 0
        return 1/(1+d)


## Task 3.4: Recommender System: Predict ratings

With the similarity score calculated, we can now predict a user's playtime for games they don't own.

**1.** First, we **create a set of all games**, but we **delete** all games that are owned by less than 3 players. The reason is simple: If only 1 or 2 players own a game, it is impossible to derive a meaningful prediction since there is not enough data. 

The predicted playtime for a game works analogous to the predicted rating of a movie/item in a conventional collaborative filtering recommender system:

$$r_{u,i} = \frac{\sum_{v \in N_i(u)} w_{u,v}r_{v,i}}{\sum_{v \in N_i(u)} w_{u,v}}$$

where 
- $r_{u,i}$ is the estimated recommendation of item $i$ for target user $u$. 
- $N_i(u)$ is the set of similar users to target user $u$ for the designated item $i$. 
- $w_{u,v}$ is the similarity score between users $u$ and $v$ (used as a weighting factor).  

**Notes:** 
- In our case, we use playtime as a recommendation measure and the set $N_i(u)$ consists of user $u$ friends list and friends of friends list. In our scenario, we do not need the index $i$ as our friends list does not change between games.
- Keep in mind that we have already taken out the games with a playtime of 0. In this case, they are considered "unowned" and not taken into account in this exercise.

In [None]:
# List of all games that are owned by at least 1 person
allGames = []
for user in gamesofallusers:
    for game in user:
        allGames.append(game)
        
# TODO : Create a list of games owned by at least 3 people
unique_games = list(set(allGames))
games = [game for game in unique_games if allGames.count(game) >=3]
print('Number of unique games played by >3 ', len(games))

# TODO: Find out which games you do not own out of all games because we are only interested in recommendations for games that we do not own
def difference(allGames, yourGames): 
    # TODO:
    return [game for game in allGames if game not in yourGames]


# TODO: Predict ratings based on the formula above for each unowned game
# use 'given', 'euclidean' and 'cosine' to switch between measurements
def predict_ratings(given=True, euclidean=False, cosine=False):
    similarity_scores = {}
    # TODO:
    '''Hint: Iterate over all unowned games and for each game calculate a rating based
        on your friends playtime and similarity score ''' 
    if(euclidean):
        for game in difference(unique_games, games):
            similarity_scores[game] = 0
            for friend in users_gamedicts:
                if game in users_gamedicts[friend]:
                    similarity_scores[game] += 1/calculate_similarity(friend, user, given=given, euclidean=True)
            similarity_scores[game] /= len(users_gamedicts)
    elif(cosine):
        for game in difference(unique_games, games):
            similarity_scores[game] = 0
            for friend in users_gamedicts:
                if game in users_gamedicts[friend]:
                    similarity_scores[game] += 1/calculate_similarity(friend, user, given=given, cosine=True)
            similarity_scores[game] /= len(users_gamedicts)
            
    elif(given):
        not_owned_games = difference(games, users_gamedicts[int(id)])
        rating = {}
        for game in not_owned_games:
            score_nr = 0
            score_dr = 0
            for user in users_gamedicts:
                if int(id) != user and game in users_gamedicts[user].keys():
                    sim = calculate_similarity(users_gamedicts[int(id)],users_gamedicts[user])
                    score_nr += sim* users_gamedicts[user][game]
                    score_dr += sim
            if score_dr != 0:
                rating.update({game:score_nr/score_dr})
    return rating

rating = predict_ratings()
print(rating)

NameError: name 'gamesofallusers' is not defined

## Task 3.5: Recommender System: Discussion

**1.** **Sort** the predicted ratings by estimated playtime (highest first) and **print out** the top 8 predictions for you (or the default user if you are using the default ID). 

**2.** **Discuss** the difference in recommendations between the collaborative filtering approach and the association rule approach. Would you consider one more accurate than the other? Why/why not?

**3.** **Discuss** the differences in the similarity scores.

In [None]:
# TODO: 
rating_sorted = sorted(rating.items(), key = lambda kv:(kv[1], kv[0]), reverse=True)
print(rating_sorted[:5])

NameError: name 'rating' is not defined

In [None]:
# TODO: 

In [None]:
# TODO: 

**TODO: Write your observations here**

## References

[1] https://steamcommunity.com/dev/apikey