# Pulling data from the Steam Web API

Since the tables built so far provide only a little aggregate data per game, it would be useful to extract more information on each game via Steam's Web API. This notebook walks through extracting that data from the web and merging it into our data table, in the hope that it will let us build a recommendation system.

We begin by loading the data from the CSVs to pandas, with appropriate dtypes.

In [1]:
#inline graphing just in case, and libraries to be used
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
from tqdm import tqdm
from bs4 import BeautifulSoup

#dtypes for games stats dataframe, extracted from prior sheets
dtype_dict_games = {'buyer-count': 'int64',
 'player-count': 'int64',
 'accumulated-hours-played': 'float64',
 'player-frac-of-buyer': 'float64',
 'avg-hours-played': 'float64'}

#dtypes for users stats dataframe, extracted from prior sheets
dtype_dict_users = {'purchased-game-count': 'int64',
 'played-game-count': 'int32',
 'played-hours-count': 'float64',
 'purhased-gametitles-list': 'object',
 'played-gametitles-list': 'object',
 'percent-library-played': 'float64',
 'played-hours-avg': 'float64',
 'played-hours-std': 'float64',
 'played-hours-max': 'float64',
 'most-played-game': 'object'}

#load both CSVs into DataFrames
games_stats_df = pd.read_csv('./steam_game_aggregate_data.CSV', index_col=0, dtype=dtype_dict_games)
users_stats_df = pd.read_csv('./steam_user_aggregate_data.CSV', index_col=0, dtype=dtype_dict_users)

According to the documentation for the [storefront API](https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI), which can return store page information for each game, every request requires the game's appid. Our data doesn't include these appids; however, a second API found through [this thread](https://stackoverflow.com/questions/57441606/how-to-get-the-steam-appid-by-appname-in-steam-webapi) lets us retrieve a list that maps each appid to its title.

Since the API returns JSON, it will be easier to first move it into a DataFrame before merging it with our other table.

Since this process takes several minutes, I decided to add the [tqdm](https://github.com/tqdm/tqdm) wrapper to my loop.

In [2]:
#use the requests module to send a GET request
r = requests.get('http://api.steampowered.com/ISteamApps/GetAppList/v2')

#assign the relevant information from the json return to the variable 'app_json'
app_json = r.json()['applist']['apps']

#Create empty DataFrame with two columns, loop through each dict in the list, add both values to DataFrame
appid_df = pd.DataFrame(columns=['appid', 'name'])
for app in tqdm(app_json):
    appid_df = appid_df.append(app, ignore_index=True)

#print head to confirm results
appid_df.head()

100%|██████████| 99421/99421 [11:53<00:00, 139.26it/s]


| | appid | name |
|---|---|---|
| 0 | 216938 | Pieterw test app76 ( 216938 ) |
| 1 | 660010 | test2 |
| 2 | 660130 | test3 |
| 3 | 1304450 | Die Again |
| 4 | 1304520 | Phogbound |
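
The row-by-row append above is what makes this cell take roughly twelve minutes, and `DataFrame.append` was removed in pandas 2.0. Since the response is already a list of dicts, a possible alternative is to hand the whole list to the `DataFrame` constructor. The sketch below uses a small stand-in list (`sample_apps`, with values copied from the output above) instead of the real `app_json`, so it runs without a network call.

```python
import pandas as pd

# Stand-in for r.json()['applist']['apps']; the real list has ~100k entries.
sample_apps = [
    {'appid': 660010, 'name': 'test2'},
    {'appid': 660130, 'name': 'test3'},
    {'appid': 1304450, 'name': 'Die Again'},
]

# Build the DataFrame in one call instead of appending one row at a time.
appid_df = pd.DataFrame(sample_apps, columns=['appid', 'name'])

print(appid_df.head())
```

With the real data this would be `pd.DataFrame(app_json, columns=['appid', 'name'])`, and the tqdm progress bar is no longer needed for this step.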


By looping through each title in our DataFrame, we can search, match, and add the appid from this API. Since searching by string is not guaranteed to find a match, a value of -1 will be stored for missing appids. This unfortunately cuts the size of our game list from over 50000 to about 3000 games.

In [3]:
#add the index as a column to make it more easily accessible
games_stats_df['game-title'] = games_stats_df.index

#define function to search appid_df by title string, returning appid if matched, -1 if unmatched
def getAppIdFromName(name):
    try:
        #take the first appid whose name matches the title exactly
        appid = appid_df[appid_df['name'] == name]['appid'].iloc[0]
    except IndexError:
        #no row matched, so .iloc[0] had nothing to return
        return -1
    else:
        return appid

#applies the function above to the title column to create the appid column in games_stats_df
games_stats_df['appid'] = games_stats_df['game-title'].apply(getAppIdFromName)

#print to confirm results
games_stats_df.head()

| | buyer-count | player-count | accumulated-hours-played | player-frac-of-buyer | avg-hours-played | game-title | appid |
|---|---|---|---|---|---|---|---|
| 007 Legends | 1 | 1 | 0.7 | 1.0 | 0.7 | 007 Legends | -1 |
| 0RBITALIS | 3 | 3 | 1.2 | 1.0 | 0.4 | 0RBITALIS | 278440 |
| 1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby) | 7 | 5 | 20.0 | 0.714286 | 4.0 | 1... 2... 3... KICK IT! (Drop That Beat Like a... | 15540 |
| 10 Second Ninja | 6 | 2 | 5.9 | 0.333333 | 2.95 | 10 Second Ninja | 271670 |
| 10000000 | 1 | 1 | 3.6 | 1.0 | 3.6 | 10000000 | 227580 |
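
The title search above filters the whole appid table once per game. A lighter variant, sketched here as an assumption rather than the approach the notebook uses, is to build a title-to-appid dictionary once and look each title up in it. The helper name `build_appid_lookup` and the duplicate entry in `sample_apps` are made up for illustration; the other values come from the table above.

```python
def build_appid_lookup(apps):
    """Map each title to the first appid listed for it."""
    lookup = {}
    for app in apps:
        # setdefault keeps the first appid when a title appears more than once,
        # mirroring the .iloc[0] choice in getAppIdFromName
        lookup.setdefault(app['name'], app['appid'])
    return lookup

sample_apps = [
    {'appid': 278440, 'name': '0RBITALIS'},
    {'appid': 271670, 'name': '10 Second Ninja'},
    {'appid': 999999, 'name': '10 Second Ninja'},  # hypothetical duplicate title
]

lookup = build_appid_lookup(sample_apps)
titles = ['007 Legends', '0RBITALIS', '10 Second Ninja']

# Unmatched titles fall back to -1, as in the original function.
appids = [lookup.get(title, -1) for title in titles]
print(appids)
```

On the real DataFrame the same lookup could be applied with `games_stats_df['game-title'].map(lookup).fillna(-1).astype('int64')`.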


To understand what to pull from this API, the JSON response will be loaded into a DataFrame to explore. It is also good to cross-reference the description of the data in the documentation linked above. Since the API call requires an appid to test with, the appid for 'PAYDAY 2' was used.

In [5]:
#make GET request for the store page content of PAYDAY 2
app_info = requests.get('https://store.steampowered.com/api/appdetails?appids=218620')

#assign app_data for testing in the next few cells
app_data = app_info.json()['218620']['data']

#load to DataFrame to view (use app_info here; r still holds the app list response)
pd.DataFrame(app_info.json()['218620'])

| | success | data |
|---|---|---|
| about_the_game | True | `<strong><a href="https://store.steampowered.co...` |
| achievements | True | `{'total': 1207, 'highlighted': [{'name': 'Comi...` |
| background | True | `https://steamcdn-a.akamaihd.net/steam/apps/218...` |
| categories | True | `[{'id': 2, 'description': 'Single-player'}, {'...` |
| content_descriptors | True | `{'ids': [], 'notes': None}` |
| controller_support | True | `full` |
| detailed_description | True | `<strong><a href="https://store.steampowered.co...` |
| developers | True | `[OVERKILL - a Starbreeze Studio.]` |
| dlc | True | `[1347750, 1347751, 1351060, 1252200, 1255151, ...` |
| genres | True | `[{'id': '1', 'description': 'Action'}, {'id': ...` |


After looking through the JSON, it seems that the data most useful for our recommendation system is in the description values, such as "about_the_game", "detailed_description", and "short_description". It wouldn't hurt to add the words from "genres" and "categories", too.

Below are functions defined to extract the strings from each of these values.

In [28]:
def get_categories(app_data):
    #return every category description in one string, each preceded by a space
    try:
        categories = app_data['categories']
    except KeyError:
        #some apps have no 'categories' entry
        return ""
    else:
        category_string = ""
        for cat in categories:
            category_string = category_string + " " + cat['description']

        return category_string

get_categories(app_data)

' Single-player Multi-player Co-op Online Co-op Steam Achievements Full controller support Steam Trading Cards In-App Purchases Steam Cloud Remote Play on Phone Remote Play on Tablet Remote Play on TV'

In [29]:
def get_genres(app_data):
    try:
        genres = app_data['genres']
    except KeyError:
        return ""
    else:
        genres_string = ""
        for gen in genres:
            genres_string = genres_string + " " + gen['description']
    
        return genres_string

get_genres(app_data)

' Action RPG'
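
`get_categories` and `get_genres` differ only in the key they read, so the two could be folded into one helper. The sketch below is one way to do that; `join_descriptions` is a name introduced here, and the `sample` dict is a cut-down stand-in for `app_data` (the RPG id is illustrative).

```python
def join_descriptions(app_data, key):
    """Join the 'description' of every entry under app_data[key].

    Each description is preceded by a space, matching the output of
    get_categories and get_genres; a missing key gives an empty string.
    """
    entries = app_data.get(key, [])
    return "".join(" " + entry['description'] for entry in entries)

sample = {
    'genres': [
        {'id': '1', 'description': 'Action'},
        {'id': '3', 'description': 'RPG'},  # id chosen for illustration
    ],
}

genres_string = join_descriptions(sample, 'genres')
categories_string = join_descriptions(sample, 'categories')
print(repr(genres_string), repr(categories_string))
```

Here `genres_string` is `' Action RPG'`, and `categories_string` is empty because the sample has no "categories" key.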

Since all the descriptions are HTML-formatted strings, using the BeautifulSoup library to parse the text saves us the work of stripping the tags ourselves. The functions below parse all three descriptive HTML strings and join them into one long string.

In [9]:
def html_parser(html_string):
    soup = BeautifulSoup(html_string, 'html.parser')
    return soup.get_text()

def get_all_descriptions(app_data):
    return " ".join([html_parser(app_data['detailed_description']),
                        html_parser(app_data['about_the_game']),
                        html_parser(app_data['short_description'])])

get_all_descriptions(app_data)

'Looking for the San Martín Bundle? Click here!PAYDAY 2 is an action-packed, four-player co-op shooter that once again lets gamers don the masks of the original PAYDAY crew - Dallas, Hoxton, Wolf and Chains - as they descend on Washington DC for an epic crime spree. The CRIMENET network offers a huge range of dynamic contracts, and players are free to choose anything from small-time convenience store hits or kidnappings, to big league cyber-crime or emptying out major bank vaults for that epic PAYDAY. While in DC, why not participate in the local community, and run a few political errands?Up to four friends co-operate on the hits, and as the crew progresses the jobs become bigger, better and more rewarding. Along with earning more money and becoming a legendary criminal comes a character customization and crafting system that lets crews build and customize their own guns and gear.Key FeaturesRob Banks, Get Paid – Players must choose their crew carefully, because when the job goes down 
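
In the output above, text from adjacent tags runs together ("Click here!PAYDAY 2", "Key FeaturesRob Banks"), which would fuse words if the string is later tokenized. BeautifulSoup's `get_text` accepts `separator` and `strip` arguments that can keep those pieces apart. The sketch below shows the difference on a short made-up HTML snippet; `html_parser_spaced` is a hypothetical variant of `html_parser`, not a function from the notebook.

```python
from bs4 import BeautifulSoup

def html_parser_spaced(html_string):
    """Like html_parser, but put a space between text from separate tags."""
    soup = BeautifulSoup(html_string, 'html.parser')
    # separator=' ' joins the text fragments with a space;
    # strip=True trims each fragment and drops empty ones
    return soup.get_text(separator=' ', strip=True)

# Made-up snippet imitating the structure of a store description.
sample_html = '<strong>Click here!</strong><br>PAYDAY 2 is a shooter.<h2>Key Features</h2>'

plain = BeautifulSoup(sample_html, 'html.parser').get_text()
spaced = html_parser_spaced(sample_html)
print(plain)
print(spaced)
```

Whether the extra spaces matter depends on how the text is vectorized later, so this is an option to consider, not a required change.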

The final function puts everything together and returns one string containing the categories, genres, and descriptions. The -1 placeholder appids are not handled here; the conditional that deals with them sits in the API-calling function further down.

In [14]:
def collect_descriptions(app_data):
    return get_categories(app_data) + get_genres(app_data) + get_all_descriptions(app_data)

collect_descriptions(app_data)

' Single-player Multi-player Co-op Online Co-op Steam Achievements Full controller support Steam Trading Cards In-App Purchases Steam Cloud Remote Play on Phone Remote Play on Tablet Remote Play on TV Action RPGLooking for the San Martín Bundle? Click here!PAYDAY 2 is an action-packed, four-player co-op shooter that once again lets gamers don the masks of the original PAYDAY crew - Dallas, Hoxton, Wolf and Chains - as they descend on Washington DC for an epic crime spree. The CRIMENET network offers a huge range of dynamic contracts, and players are free to choose anything from small-time convenience store hits or kidnappings, to big league cyber-crime or emptying out major bank vaults for that epic PAYDAY. While in DC, why not participate in the local community, and run a few political errands?Up to four friends co-operate on the hits, and as the crew progresses the jobs become bigger, better and more rewarding. Along with earning more money and becoming a legendary criminal comes a c

For each row in the DataFrame, we have to create the querystring and add it to the URL for the GET request. One more function will be written to complete this.

In [32]:
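def getDescrFromAPI(appid):
    #titles without a matched appid get an empty description
    if appid == -1:
        return ""
    else:
        #build the query string and request the store page data for this appid
        url_string = 'https://store.steampowered.com/api/appdetails?appids=' + str(appid)
        app_json = requests.get(url_string).json()
        try:
            app_data = app_json[str(appid)]['data']
        except (KeyError, TypeError):
            #no 'data' entry (e.g. the request was unsuccessful) or an empty response
            return ""
        else:
            return collect_descriptions(app_data)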

In [33]:
games_stats_df['all-descriptions'] = games_stats_df['appid'].apply(getDescrFromAPI)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
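
The `JSONDecodeError` is raised by `.json()`, which sits outside the `try` block in `getDescrFromAPI`, so it means at least one response body was not valid JSON. A plausible cause, though the traceback alone does not confirm it, is that the store endpoint returned an empty body or an error page after many requests in quick succession. The sketch below is one defensive way to fetch a single appid: it checks the status code, catches the decode error, and waits before retrying. The function name `fetch_app_data`, the `pause` and `retries` values, and the injectable `get` argument are all assumptions made for illustration.

```python
import time

def fetch_app_data(appid, get=None, pause=1.5, retries=3):
    """Return the 'data' dict for one appid, or None if it cannot be retrieved."""
    if get is None:
        import requests  # imported lazily so a stand-in `get` can be used offline
        get = requests.get

    url = 'https://store.steampowered.com/api/appdetails'
    for attempt in range(retries):
        response = get(url, params={'appids': appid}, timeout=10)
        if response.status_code == 200:
            try:
                payload = response.json()
            except ValueError:
                # JSON decode errors are ValueError subclasses
                payload = None
            if payload:
                # a missing appid key or missing 'data' both yield None
                entry = payload.get(str(appid)) or {}
                return entry.get('data')
        # wait a little longer after each failed attempt (pause values are guesses)
        time.sleep(pause * (attempt + 1))
    return None
```

`getDescrFromAPI` could then call `fetch_app_data(appid)` and return an empty string when the result is `None`, so one bad response no longer stops the whole `apply`.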