# Pulling data from the Steam Web API

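Since the engine to be built will be content-based, we need descriptive text for each game, such as its store page description, genres, and categories.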

In [1]:
#enable inline plotting (just in case) and import the libraries used below
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
from tqdm import tqdm
from bs4 import BeautifulSoup

After looking at the documentation for the [storefront API](https://wiki.teamfortress.com/wiki/User:RJackson/StorefrontAPI) (which returns store page information for each game), I found that it requires each game's appid. Our data doesn't include these appids; however, a second API found through [this thread](https://stackoverflow.com/questions/57441606/how-to-get-the-steam-appid-by-appname-in-steam-webapi) returns a list that maps every app's appid to its title.

Since the API returns JSON, it is easier to first load the list into a DataFrame and then merge it with our other table.

Since this process takes several minutes, I decided to wrap my loop with [tqdm](https://github.com/tqdm/tqdm) to show its progress.

In [2]:
#use the requests module to send a GET request for the full app list
r = requests.get('http://api.steampowered.com/ISteamApps/GetAppList/v2')
r.raise_for_status()

#assign the list of {'appid': ..., 'name': ...} dicts from the JSON response to 'app_json'
app_json = r.json()['applist']['apps']

#loop through each dict in the list, collecting one row per app, then build the DataFrame once
#(much faster than appending row by row; DataFrame.append was removed in pandas 2.0)
rows = []
for app in tqdm(app_json):
    rows.append({'appid': app['appid'], 'name': app['name']})
appid_df = pd.DataFrame(rows, columns=['appid', 'name'])

#print head to confirm results
appid_df.head()

100%|██████████| 100000/100000 [13:39<00:00, 121.96it/s]


|   | appid | name |
|---|---|---|
| 0 | 216938 | Pieterw test app76 ( 216938 ) |
| 1 | 660010 | test2 |
| 2 | 660130 | test3 |
| 3 | 1346400 | Battle Engine Aquila |
| 4 | 1346420 | Transpire |
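The merge with our other table happens outside this section, but here is a minimal sketch of how a DataFrame like appid_df could be joined to a games table by title. It runs on made-up rows; games_df and its 'playtime' column are hypothetical stand-ins for our other table, and filling unmatched titles with -1 is an assumption based on the -1 check used later on.

```python
import pandas as pd

# Made-up sample shaped like r.json()['applist']['apps']
app_json = [
    {'appid': 218620, 'name': 'PAYDAY 2'},
    {'appid': 1346400, 'name': 'Battle Engine Aquila'},
    {'appid': 999999, 'name': 'PAYDAY 2'},  # hypothetical duplicate title
]

# A list of dicts becomes one row per dict in a single constructor call
appid_df = pd.DataFrame(app_json, columns=['appid', 'name'])

# Several apps can share a title, so keep only the first appid per name
# to stop the merge from duplicating games
appid_df = appid_df.drop_duplicates(subset='name', keep='first')

# Hypothetical stand-in for "our other table"; 'playtime' is made up
games_df = pd.DataFrame({'name': ['PAYDAY 2', 'Some Unlisted Game'],
                         'playtime': [120, 30]})

# A left merge keeps every game; titles with no match get NaN for appid
merged = games_df.merge(appid_df, on='name', how='left')

# Assumption: unmatched games get a placeholder appid of -1
merged['appid'] = merged['appid'].fillna(-1).astype(int)
print(merged)
```

Keeping the first appid per title is a crude tie-breaker, and exact title matching will miss games whose names differ slightly (punctuation, trademark symbols) between the two tables.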


Now that every app name and its corresponding appid are stored in a DataFrame, information about each app can be queried from the storefront API.

To understand what to pull from this API, the JSON response will be loaded into a DataFrame to explore. It is also good to cross-reference the description of the data in the documentation linked above. Because the API call needs an appid, the appid for PAYDAY 2 (218620) was used as a test case.

In [5]:
#make a GET request for the store page content of PAYDAY 2 (appid 218620)
app_info = requests.get('https://store.steampowered.com/api/appdetails?appids=218620')

#the response is keyed by the appid as a string; 'data' holds the store page fields
payday_json = app_info.json()['218620']

#assign app_data for testing in the next few cells
app_data = payday_json['data']

#load into a DataFrame to view
pd.DataFrame(payday_json)

|   | success | data |
|---|---|---|
| about_the_game | True | `<strong><a href="https://store.steampowered.co...` |
| achievements | True | `{'total': 1207, 'highlighted': [{'name': 'Comi...` |
| background | True | `https://steamcdn-a.akamaihd.net/steam/apps/218...` |
| categories | True | `[{'id': 2, 'description': 'Single-player'}, {'...` |
| content_descriptors | True | `{'ids': [], 'notes': None}` |
| controller_support | True | `full` |
| detailed_description | True | `<strong><a href="https://store.steampowered.co...` |
| developers | True | `[OVERKILL - a Starbreeze Studio.]` |
| dlc | True | `[1347750, 1347751, 1351060, 1252200, 1255151, ...` |
| genres | True | `[{'id': '1', 'description': 'Action'}, {'id': ...` |
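The table shows that each response nests the store page fields under the appid (as a string) next to a success flag. A minimal sketch of unwrapping that structure defensively, checking success before reading data; the payloads are hand-written to mimic the shape above rather than captured from the API, and the helper name unwrap_appdetails is hypothetical.

```python
def unwrap_appdetails(payload, appid):
    """Return the 'data' dict for appid, or None if it is unavailable."""
    # The response is keyed by the appid as a string, e.g. payload['218620'];
    # 'payload or {}' also tolerates a body that decoded to None
    entry = (payload or {}).get(str(appid))
    # A missing entry or success == False means there is no usable data
    if not entry or not entry.get('success'):
        return None
    return entry.get('data')

# Hand-written payloads mimicking the structure seen above
ok_payload = {'218620': {'success': True, 'data': {'name': 'PAYDAY 2'}}}
failed_payload = {'1': {'success': False}}

print(unwrap_appdetails(ok_payload, 218620))  # {'name': 'PAYDAY 2'}
print(unwrap_appdetails(failed_payload, 1))   # None
print(unwrap_appdetails(None, 218620))        # None
```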


After looking through the JSON, the data most useful for our recommendation system seems to be in the description fields ("about_the_game", "detailed_description", and "short_description"), and it wouldn't hurt to add the words from "genres" and "categories", too.

Below are functions that extract the strings from each of these values.

In [6]:
def get_categories(app_data):
    #return the category descriptions as one space-prefixed string, or "" if the key is missing
    try:
        categories = app_data['categories']
    except KeyError:
        return ""
    else:
        category_string = ""
        for cat in categories:
            category_string = category_string + " " + cat['description']

        return category_string

get_categories(app_data)

' Single-player Multi-player Co-op Online Co-op Steam Achievements Full controller support Steam Trading Cards In-App Purchases Steam Cloud Remote Play on Phone Remote Play on Tablet Remote Play on TV'

In [7]:
def get_genres(app_data):
    #return the genre descriptions as one space-prefixed string, or "" if the key is missing
    try:
        genres = app_data['genres']
    except KeyError:
        return ""
    else:
        genres_string = ""
        for gen in genres:
            genres_string = genres_string + " " + gen['description']

        return genres_string

get_genres(app_data)

' Action RPG'
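get_categories and get_genres differ only in the key they read. A sketch of one hypothetical helper, join_descriptions, that covers both while keeping the same leading-space output; the sample dict is hand-written (other fields such as 'id' are omitted), and 'dlc_names' is a made-up key used only to show the missing-key case.

```python
def join_descriptions(app_data, key):
    """Join the 'description' of every entry under app_data[key].

    Returns "" when the key is missing, and puts a space before each
    description to match the output of get_categories and get_genres.
    """
    entries = app_data.get(key, [])
    return "".join(" " + entry['description'] for entry in entries)

# Hand-written sample shaped like the storefront 'data' dict
sample = {
    'genres': [{'description': 'Action'}, {'description': 'RPG'}],
    'categories': [{'description': 'Single-player'}],
}

print(repr(join_descriptions(sample, 'genres')))      # ' Action RPG'
print(repr(join_descriptions(sample, 'categories')))  # ' Single-player'
print(repr(join_descriptions(sample, 'dlc_names')))   # ''
```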

Since all the descriptions are HTML-formatted strings, using the BeautifulSoup library to parse them saves us from stripping the tags ourselves. This last function parses all three descriptive HTML strings and joins them into one long string.

In [8]:
def html_parser(html_string):
    #strip the HTML tags and return only the text
    soup = BeautifulSoup(html_string, 'html.parser')
    return soup.get_text()

def get_all_descriptions(app_data):
    try:
        detailed_description = html_parser(app_data['detailed_description'])
        about_the_game = html_parser(app_data['about_the_game'])
        short_description = html_parser(app_data['short_description'])
    except (KeyError, TypeError):
        #at least one description field is missing or not a string
        return " "
    else:
        return " ".join([detailed_description, about_the_game, short_description])

get_all_descriptions(app_data)

'Looking for the San Martín Bundle? Click here!PAYDAY 2 is an action-packed, four-player co-op shooter that once again lets gamers don the masks of the original PAYDAY crew - Dallas, Hoxton, Wolf and Chains - as they descend on Washington DC for an epic crime spree. The CRIMENET network offers a huge range of dynamic contracts, and players are free to choose anything from small-time convenience store hits or kidnappings, to big league cyber-crime or emptying out major bank vaults for that epic PAYDAY. While in DC, why not participate in the local community, and run a few political errands?Up to four friends co-operate on the hits, and as the crew progresses the jobs become bigger, better and more rewarding. Along with earning more money and becoming a legendary criminal comes a character customization and crafting system that lets crews build and customize their own guns and gear.Key FeaturesRob Banks, Get Paid – Players must choose their crew carefully, because when the job goes down 
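Notice that get_text() glues neighboring tags together ("Click here!PAYDAY 2", "errands?Up to four"), which creates fake words for a text-based recommender. BeautifulSoup's get_text accepts separator and strip arguments that avoid this. A sketch with a hand-written HTML snippet; html_to_text is a hypothetical drop-in alternative to html_parser, not what was run above.

```python
from bs4 import BeautifulSoup

def html_to_text(html_string):
    """Strip tags, putting a single space between adjacent text fragments."""
    soup = BeautifulSoup(html_string, 'html.parser')
    # separator=' ' inserts a space between strings from different tags;
    # strip=True trims whitespace around each string before joining
    return soup.get_text(separator=' ', strip=True)

# Hand-written snippet in the style of the store descriptions
sample = '<strong>Click here!</strong>PAYDAY 2 is a co-op shooter.<br>Up to four friends'
print(html_to_text(sample))
# Click here! PAYDAY 2 is a co-op shooter. Up to four friends
```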

This final function puts everything together, returning one string with the categories, genres, and descriptions. Values of -1 in the appid column are handled by the if-statement in getDescrFromAPI below.

In [27]:
def collect_descriptions(app_data):
    return get_categories(app_data) + get_genres(app_data) + " " + get_all_descriptions(app_data)

collect_descriptions(app_data)

' Single-player Multi-player Co-op Online Co-op Steam Achievements Full controller support Steam Trading Cards In-App Purchases Steam Cloud Remote Play on Phone Remote Play on Tablet Remote Play on TV Action RPG Looking for the San Martín Bundle? Click here!PAYDAY 2 is an action-packed, four-player co-op shooter that once again lets gamers don the masks of the original PAYDAY crew - Dallas, Hoxton, Wolf and Chains - as they descend on Washington DC for an epic crime spree. The CRIMENET network offers a huge range of dynamic contracts, and players are free to choose anything from small-time convenience store hits or kidnappings, to big league cyber-crime or emptying out major bank vaults for that epic PAYDAY. While in DC, why not participate in the local community, and run a few political errands?Up to four friends co-operate on the hits, and as the crew progresses the jobs become bigger, better and more rewarding. Along with earning more money and becoming a legendary criminal comes a 

For each row in the DataFrame, we have to build the query string from the row's appid and append it to the URL for the GET request. One more function handles this.

In [28]:
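def getDescrFromAPI(appid):
    #an appid of -1 is a placeholder, so there is nothing to look up
    if appid == -1:
        return ""
    try:
        #int() keeps a float appid such as 218620.0 from ending up in the URL
        appid = int(appid)
        url_string = 'https://store.steampowered.com/api/appdetails?appids=' + str(appid)
        app_json = requests.get(url_string, timeout=10).json()
        app_data = app_json[str(appid)]['data']
    except (requests.RequestException, ValueError, KeyError, TypeError):
        #request failed, body was not JSON, or the response had no 'data'
        return ""
    else:
        return collect_descriptions(app_data)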
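Since getDescrFromAPI sends one request per row, a transient failure (or throttling by the store, which is an assumption here rather than something shown above) silently becomes an empty string. A minimal sketch of a retry wrapper with exponential backoff; call_with_retries is hypothetical, and it is checked offline with a fake fetch function instead of a live request.

```python
import time

def call_with_retries(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff if it raises.

    fetch is any zero-argument callable; the wait before retry k
    (counting from 0) is base_delay * 2**k. The last error is re-raised.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Offline check: a fake fetch that fails twice, then succeeds
calls = {'n': 0}

def flaky_fetch():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('simulated failure')
    return {'218620': {'success': True}}

waits = []
result = call_with_retries(flaky_fetch, sleep=waits.append)
print(result, waits)  # {'218620': {'success': True}} [1.0, 2.0]
```

To use it with the storefront API, the request could be wrapped as call_with_retries(lambda: requests.get(url_string, timeout=10).json()). Passing sleep in as a parameter keeps the sketch testable without actually waiting.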

In [29]:
games_stats_df['all-descriptions'] = games_stats_df['appid'].apply(getDescrFromAPI)

In [33]:
games_stats_df[games_stats_df['all-descriptions'] != "" ]['all-descriptions']

0RBITALIS                                                      Single-player Steam Achievements Steam Tradin...
1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby)     Single-player Steam Achievements Steam Tradin...
10 Second Ninja                                                Single-player Steam Achievements Steam Tradin...
10,000,000                                                     Single-player Steam Achievements Steam Tradin...
100% Orange Juice                                              Single-player Multi-player PvP Online PvP Co-...
                                                                                    ...                        
Angels Fall First                                              Single-player Multi-player PvP Online PvP LAN...
Angry Birds Space                                              Single-player Steam Achievements Steam Tradin...
Angry Video Game Nerd Adventures                               Single-player Steam Achievements Full con

In [35]:
games_stats_df_webdata = games_stats_df[['appid', 'all-descriptions']]

In [36]:
games_stats_df_webdata.to_csv('./steam_game_aggregate_data_bonus_webdata.CSV')
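When this file is read back later, pandas turns the saved index (the game titles) into an 'Unnamed: 0' column unless index_col=0 is passed, and empty description strings come back as NaN. A sketch of the round trip using an in-memory buffer and a hand-made frame shaped like games_stats_df_webdata (the rows are made up).

```python
import io
import pandas as pd

# Hand-made frame shaped like games_stats_df_webdata: game titles as the index
webdata = pd.DataFrame(
    {'appid': [218620, -1], 'all-descriptions': [' Action RPG PAYDAY 2 ...', '']},
    index=['PAYDAY 2', 'Some Unlisted Game'],
)

# Write to an in-memory buffer instead of a file so the sketch is self-contained
buffer = io.StringIO()
webdata.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the titles as the index instead of an 'Unnamed: 0' column
restored = pd.read_csv(buffer, index_col=0)

# Empty strings are written as empty fields and read back as NaN, so refill them
restored['all-descriptions'] = restored['all-descriptions'].fillna('')
print(restored)
```

For the real file this would be pd.read_csv('./steam_game_aggregate_data_bonus_webdata.CSV', index_col=0), followed by the same fillna('').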