In [21]:
import requests
import json
import pandas as pd

In [25]:
with open("../apikey", "r") as f:
    apikey = f.read().strip()  # drop any trailing newline so it does not end up in the URL

# Steam API

The official Steam API is largely undocumented, so this notebook relies on the community-maintained reference at https://github.com/Revadike/InternalSteamWebAPI/wiki/Get-App-Details

Let's start with the GetAppList request

In [31]:
url = f"https://api.steampowered.com/IStoreService/GetAppList/v1/?key={apikey}"

response = requests.get(url).json()
response['response'].keys()


dict_keys(['apps', 'have_more_results', 'last_appid'])

The response contains three keys: apps (the list of apps), have_more_results (a boolean indicating whether more results are available), and last_appid (the last app id returned, which serves as the starting point for the next request).
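The paging scheme described above can be sketched as a loop that keeps requesting pages until have_more_results is false. This is a minimal sketch, not a confirmed client: `fetch_all_apps` and `steam_fetch_page` are hypothetical helper names, and passing `last_appid` back as a query parameter is an assumption based on the response shape rather than on official documentation.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_apps(fetch_page: Callable[[Optional[int]], Dict], max_pages: int = 100) -> List[Dict]:
    """Collect apps across pages by following last_appid until have_more_results is false."""
    apps: List[Dict] = []
    last_appid: Optional[int] = None
    for _ in range(max_pages):  # hard cap so a misbehaving endpoint cannot loop forever
        page = fetch_page(last_appid)["response"]
        apps.extend(page.get("apps", []))
        if not page.get("have_more_results"):
            break
        last_appid = page["last_appid"]
    return apps


def steam_fetch_page(last_appid: Optional[int], key: str) -> Dict:
    """Hypothetical page fetcher; ASSUMES the endpoint accepts a last_appid query parameter."""
    import requests  # imported here so the paging logic above has no network dependency

    params = {"key": key}
    if last_appid is not None:
        params["last_appid"] = last_appid
    url = "https://api.steampowered.com/IStoreService/GetAppList/v1/"
    return requests.get(url, params=params).json()
```

With the key loaded earlier, a call would look like `fetch_all_apps(lambda last: steam_fetch_page(last, apikey))`. Keeping the fetcher separate from the loop makes the paging logic easy to check against canned responses.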

In [29]:
applist = pd.DataFrame(response['response']['apps']).set_index('appid')
applist.head(10)

appid,name,last_modified,price_change_number
10,Counter-Strike,1666823513,22856550
20,Team Fortress Classic,1579634708,22856550
30,Day of Defeat,1512413490,22856550
40,Deathmatch Classic,1568752159,22856550
50,Half-Life: Opposing Force,1579628243,22856550
60,Ricochet,1599518374,22856550
70,Half-Life,1700269108,23366843
80,Counter-Strike: Condition Zero,1602535977,22856550
130,Half-Life: Blue Shift,1579629868,22856550
220,Half-Life 2,1699003213,22856550


Now let's see what details we can get on individual games

In [40]:
appid = applist.index[0]
url = f"https://store.steampowered.com/api/appdetails?appids={appid}"

response = requests.get(url).json()
print(response[str(appid)].keys())

game_data = response[str(appid)]["data"]

dict_keys(['success', 'data'])
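Since each entry carries a success flag next to data, it seems safer to check the flag before reading data. The helper below is a small sketch: `extract_app_data` is a hypothetical name, and the idea that data may be missing when success is false is an assumption, not something the output above confirms.

```python
from typing import Any, Dict, Optional


def extract_app_data(payload: Dict[str, Any], appid: int) -> Optional[Dict[str, Any]]:
    """Return the 'data' dict for appid, or None when the lookup did not succeed."""
    entry = payload.get(str(appid)) or {}  # the response is keyed by the app id as a string
    if not entry.get("success"):
        return None
    return entry.get("data")
```

In the cell above, `extract_app_data(response, appid)` would stand in for `response[str(appid)]["data"]`.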


In [41]:
game_data.keys()

dict_keys(['type', 'name', 'steam_appid', 'required_age', 'is_free', 'detailed_description', 'about_the_game', 'short_description', 'supported_languages', 'header_image', 'capsule_image', 'capsule_imagev5', 'website', 'pc_requirements', 'mac_requirements', 'linux_requirements', 'developers', 'publishers', 'price_overview', 'packages', 'package_groups', 'platforms', 'metacritic', 'categories', 'genres', 'screenshots', 'recommendations', 'release_date', 'support_info', 'background', 'background_raw', 'content_descriptors', 'ratings'])

# SteamSpy API

SteamSpy is a third-party service that collects stats on Steam games, including the number of owners of each game, which it estimates using a sampling methodology.

Reference:
* https://steamspy.com/about
* https://steamspy.com/api.php

## All games list
We'll start with the all request, which returns 1,000 games per page, sorted by number of owners.

In [58]:
url = "https://steamspy.com/api.php?request=all&page=0"
response = requests.get(url).json()
pd.DataFrame(response).T

,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,ccu
570,570,Dota 2,Valve,Valve,,1843054,410804,0,"200,000,000 .. 500,000,000",34766,1640,5551,830,0,0,0,612102
730,730,Counter-Strike: Global Offensive,Valve,Valve,,7049012,1037081,0,"100,000,000 .. 200,000,000",17061,745,3581,280,0,0,0,1003113
578080,578080,PUBG: BATTLEGROUNDS,"KRAFTON, Inc.","KRAFTON, Inc.",,1385753,983594,0,"50,000,000 .. 100,000,000",24487,477,11428,177,0,0,0,287938
440,440,Team Fortress 2,Valve,Valve,,956242,125664,0,"50,000,000 .. 100,000,000",6334,570,1056,111,0,0,0,81137
1063730,1063730,New World,Amazon Games,Amazon Games,,192854,82249,0,"50,000,000 .. 100,000,000",7477,0,5046,0,3999,3999,0,6835
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32470,32470,STAR WARS Empire at War - Gold Pack,Petroglyph,"LucasArts, Lucasfilm, Disney",,35217,858,0,"1,000,000 .. 2,000,000",2007,15,262,15,1999,1999,0,2007
1129580,1129580,Medieval Dynasty,Render Cube,Toplitz Productions,,35431,3589,0,"1,000,000 .. 2,000,000",4888,10,1146,10,3499,3499,0,3679
654880,654880,Dream Daddy: A Dad Dating Simulator,Game Grumps,Game Grumps,,5313,488,0,"1,000,000 .. 2,000,000",3577,0,5362,0,1499,1499,0,24
577800,577800,NBA 2K18,Visual Concepts,2K,,5477,13366,0,"1,000,000 .. 2,000,000",9105,0,4454,0,0,0,0,72
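The owners column in the output above is a range string rather than a number, so it has to be split before it can be sorted or aggregated. A minimal sketch, assuming every value follows the "low .. high" pattern seen here (`parse_owners` is a hypothetical helper name):

```python
from typing import Tuple


def parse_owners(owners: str) -> Tuple[int, int]:
    """Split an owners range such as '1,000,000 .. 2,000,000' into (low, high) integers."""
    low, high = owners.split("..")
    # Remove surrounding spaces and thousands separators before converting
    return int(low.strip().replace(",", "")), int(high.strip().replace(",", ""))
```

Applied to a DataFrame, something like `df["owners"].map(parse_owners)` would yield one (low, high) tuple per game.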


## Individual game details

In [108]:
url = f"https://steamspy.com/api.php?request=appdetails&appid={appid}"
response = requests.get(url).json()

response.keys()

dict_keys(['appid', 'name', 'developer', 'publisher', 'score_rank', 'positive', 'negative', 'userscore', 'owners', 'average_forever', 'average_2weeks', 'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount', 'ccu', 'languages', 'genre', 'tags'])

The additional fields we get from this app-level API are languages, genre and tags. Because the tags field is a nested dict, we need to flatten it with pd.json_normalize before it fits into a pandas DataFrame.

In [109]:
game_data_ss = pd.json_normalize(response)
game_data_ss

,appid,name,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,...,tags.e-sports,tags.PvP,tags.Old School,tags.Military,tags.Strategy,tags.Survival,tags.Score Attack,tags.1980s,tags.Assassin,tags.Nostalgia
0,10,Counter-Strike,Valve,Valve,,231342,6053,0,"10,000,000 .. 20,000,000",55686,...,1215,907,807,647,627,312,296,276,237,180
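To make the flattening easier to follow, here is the same call on a tiny record. The sample dict is made up for illustration and only mimics the shape of the SteamSpy response; it shows how each nested key becomes its own column with a `tags.` prefix, and one way to pull those columns back out.

```python
import pandas as pd

# Made-up record with the same shape as the SteamSpy response:
# flat fields plus a nested tags dict (values are illustrative, not real data)
sample = {"appid": 10, "name": "Counter-Strike", "tags": {"FPS": 3, "Action": 5}}

# Nested keys are flattened into dotted column names such as 'tags.FPS'
flat = pd.json_normalize(sample)
print(list(flat.columns))

# Select only the tag columns and strip the 'tags.' prefix from their names
tag_counts = flat.filter(like="tags.").rename(columns=lambda c: c.split(".", 1)[1])
print(tag_counts)
```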


Building a dataset with one column per tag for every game would produce hundreds of columns, so let's keep only the top 5 tags per game instead.

In [110]:
get_store_tags_response = requests.get("https://store.steampowered.com/actions/ajaxgetstoretags").json()
len(get_store_tags_response["tags"])

446

In [111]:
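# Keep only the five most-voted tags, one per column (tag1..tag5)
tag_votes = response['tags']
top_tags = sorted(tag_votes, key=tag_votes.get, reverse=True)[:5]
# Building columns from top_tags avoids an IndexError for games with fewer than five tags
tags_df = pd.DataFrame({f"tag{i+1}": tag for i, tag in enumerate(top_tags)}, index=[0])

# Leave out the nested tags dict so the remaining fields fit in a single row
flat_fields = {k: v for k, v in response.items() if k != "tags"}
game_data_ss = pd.DataFrame(flat_fields, index=[0])
pd.concat([game_data_ss, tags_df], axis=1)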
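To extend the top-5 idea to many games, the per-game step can be wrapped in a function that always returns the same columns. This is a sketch under assumptions: `top_tags_row` is a hypothetical helper, ranking is done by the vote counts in the tags dict, and games with fewer than n tags are padded with None.

```python
from typing import Any, Dict, List, Optional


def top_tags_row(details: Dict[str, Any], n: int = 5) -> Dict[str, Any]:
    """Flatten one SteamSpy-style record, replacing the nested tags with tag1..tagN."""
    tag_votes = details.get("tags") or {}
    if not isinstance(tag_votes, dict):  # defensive: treat any non-dict value as "no tags"
        tag_votes = {}
    # Rank tag names by vote count, highest first, and keep the first n
    ranked: List[Optional[str]] = sorted(tag_votes, key=tag_votes.get, reverse=True)[:n]
    ranked += [None] * (n - len(ranked))  # pad so every row has the same columns
    row = {k: v for k, v in details.items() if k != "tags"}
    row.update({f"tag{i + 1}": tag for i, tag in enumerate(ranked)})
    return row
```

Given a list of appdetails responses (called `responses` here purely for illustration), `pd.DataFrame([top_tags_row(r) for r in responses])` would produce one row per game with a fixed set of tag columns.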