  This is an API for Steam Spy. It accepts requests as GET query strings and returns data as JSON.

  Allowed poll rate - 4 requests per second.
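One way to stay under the stated limit of 4 requests per second is to space calls at least 0.25 seconds apart. The sketch below is only an illustration: the `Throttle` class and its parameters are hypothetical names and not part of the Steam Spy API, and the injectable `clock` and `sleep` arguments exist only to make the helper easy to test.

```python
import time


class Throttle:
    """Space calls so that at most `max_per_second` happen per second."""

    def __init__(self, max_per_second=4, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first call

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each `requests.get(...)` would then keep a loop over many app IDs within the limit.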

  ## Examples: ##
   
  * http://steamspy.com/api.php?request=appdetails&appid=730 - returns data for Counter-Strike: Global Offensive
  * http://steamspy.com/api.php?request=top100in2weeks - returns the Top 100 apps by players in the last two weeks
		

  ## Common parameters: ##
 
  * request - code for API request call.
  * appid - Application ID (a number).


  ## Accepted requests: ##
  
  ### appdetails ###

  Returns details for the specific application. Requires *appid* parameter.  

  ### genre ###

  Returns games in this particular genre. Requires *genre* parameter and works like this:
  
  * http://steamspy.com/api.php?request=genre&genre=Early+Access
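The request URLs shown so far can be assembled from their parameters instead of being typed by hand. This is a minimal sketch using only the standard library; `build_url` and `API_BASE` are hypothetical helper names. `urlencode` also encodes the space in `Early Access` as `+`, matching the genre example.

```python
from urllib.parse import urlencode

API_BASE = 'http://steamspy.com/api.php'  # endpoint taken from the examples above


def build_url(request, **params):
    """Return the API URL for `request` plus any extra query parameters."""
    query = {'request': request}
    query.update(params)
    return API_BASE + '?' + urlencode(query)


print(build_url('appdetails', appid=730))
print(build_url('genre', genre='Early Access'))
```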


  ### top100in2weeks ###

  Returns Top 100 games by players in the last two weeks.

  ### top100forever ###

  Returns Top 100 games by players since March 2009.

  ### top100owned ###

  Returns Top 100 games by owners.

  ### all ###

  Returns all games with owners data sorted by owners.


  ## Return format for an app: ##

  * appid - Steam Application ID. If it's 999999, then data for this application is hidden on developer's request, sorry.
  * name - the game's name
  * developer - comma separated list of the developers of the game
  * publisher - comma separated list of the publishers of the game
  * score_rank - score rank of the game based on user reviews
  * owners - owners of this application on Steam. **Beware of free weekends!**
  * owners_variance - variance in owners. The real number of owners lies somewhere in the range owners +/- owners_variance.
  * players_forever - people that have played this game since March 2009.
  * players_forever_variance - variance for total players.
  * players_2weeks - people that have played this game in the last 2 weeks.
  * players_2weeks_variance - variance for the number of players in the last two weeks. 
  * average_forever - average playtime since March 2009. In minutes.
  * average_2weeks - average playtime in the last two weeks. In minutes.
  * median_forever - median playtime since March 2009. In minutes.
  * median_2weeks - median playtime in the last two weeks. In minutes.
  * ccu - peak CCU yesterday.
  * price - US price in cents.
  * tags - the game's tags with their vote counts, as a JSON object keyed by tag name
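The units in this list (minutes, cents, owners +/- variance) are easy to mix up, so a small conversion helper can make a record easier to read. The sketch below is an illustration only: `summarize` and `HIDDEN_APPID` are hypothetical names, the sample values are copied from the Counter-Strike: Global Offensive response shown later in this notebook, and it assumes `price` is always present and numeric.

```python
HIDDEN_APPID = 999999  # appid reported when data is hidden on the developer's request


def summarize(app):
    """Convert one app record into friendlier units."""
    owners = int(app['owners'])
    variance = int(app['owners_variance'])
    return {
        'hidden': int(app['appid']) == HIDDEN_APPID,
        'owners_range': (owners - variance, owners + variance),
        'hours_forever': int(app['average_forever']) / 60,  # minutes -> hours
        'price_usd': int(app['price']) / 100,               # cents -> dollars
    }


sample = {'appid': 730, 'owners': 35540597, 'owners_variance': 173231,
          'average_forever': 17668, 'price': '1499'}
print(summarize(sample))
```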


  ## Questions? ##

  Contact me by e-mail: *sergey at galyonkin dot com*.

  

In [1]:
import requests
import pandas as pd

In [2]:
url = 'http://www.steamspy.com/api.php?request=appdetails&appid=730'

r = requests.get(url)
r.json()

{'appid': 730,
 'average_2weeks': 738,
 'average_forever': 17668,
 'ccu': 506896,
 'developer': 'Valve',
 'median_2weeks': 296,
 'median_forever': 4496,
 'name': 'Counter-Strike: Global Offensive',
 'owners': 35540597,
 'owners_variance': 173231,
 'players_2weeks': 10073660,
 'players_2weeks_variance': 94886,
 'players_forever': 34386047,
 'players_forever_variance': 170620,
 'price': '1499',
 'publisher': 'Valve',
 'score_rank': 74,
 'tags': {'Action': 11771,
  'Co-op': 4894,
  'Competitive': 10524,
  'Difficult': 3525,
  'FPS': 18481,
  'Fast-Paced': 3443,
  'First-Person': 8067,
  'Military': 5103,
  'Moddable': 2757,
  'Multiplayer': 14884,
  'Online Co-Op': 6316,
  'PvP': 6999,
  'Realistic': 3531,
  'Shooter': 13910,
  'Strategy': 4881,
  'Tactical': 9258,
  'Team-Based': 11220,
  'Trading': 3532,
  'War': 4824,
  'e-sports': 7375}}
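Since `tags` arrives as a mapping from tag name to vote count, the most-voted tags can be pulled out with a plain sort. A minimal sketch follows; `top_tags` is a hypothetical helper, and the sample dictionary is a subset of the response above.

```python
def top_tags(tags, n=3):
    """Return the n tags with the most votes as (tag, votes) pairs."""
    return sorted(tags.items(), key=lambda item: item[1], reverse=True)[:n]


tags = {'Action': 11771, 'FPS': 18481, 'Multiplayer': 14884,
        'Shooter': 13910, 'Team-Based': 11220}
print(top_tags(tags))
```

With the live response, `top_tags(r.json()['tags'])` would be the equivalent call.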

# Downloading all data for the current day

In [3]:
url = 'http://steamspy.com/api.php?request=all'

r = requests.get(url)

Load the data and set the name column as the index

In [4]:
df = pd.DataFrame.from_dict(r.json(), orient='index', dtype='int').set_index('name')

Overview of the data structure

In [5]:
df.head(2)

name,appid,developer,publisher,score_rank,owners,owners_variance,players_forever,players_forever_variance,players_2weeks,players_2weeks_variance,average_forever,average_2weeks,median_forever,median_2weeks,ccu,price,tags
Counter-Strike,10,Valve,Valve,97,14014789,111439,9832293,93767,327313,17283,11106,750,410,77,19062,999,"{'Action': 2560, 'FPS': 1930, 'Multiplayer': 1..."
Counter-Strike: Condition Zero,100,Valve,Valve,71,11363575,100637,2046723,43141,12653,3399,237,365,32,25,62,999,"{'Action': 354, 'FPS': 274, 'Shooter': 201, 'M..."


# Developers with the highest average playtime per player
Only developers with at least n games are considered; games with `games` or fewer owners are ignored.

In [24]:
n = 8          # minimum number of games per developer
games = 10000  # a game is only counted if it has more owners than this

df_temp = df[df['owners']>games].groupby('developer').agg({'owners':'sum', 'average_forever':'mean', 'appid':'count'})
df_temp[df_temp['appid']>=n].sort_values(by='average_forever', ascending=False).head(10)

developer,owners,average_forever,appid
Paradox Development Studio,7147418,3685.076923,13
Bethesda Game Studios,33436577,2664.0,8
Valve,410830791,2430.965517,29
Daybreak Game Company,30395737,2409.25,8
SQUARE ENIX,5421320,2271.769231,13
"KOEI TECMO GAMES CO., LTD.",2081102,2211.85,20
Firaxis Games,14360485,1790.636364,11
Obsidian Entertainment,10850283,1775.5,8
Eugen Systems,3169281,1568.625,8
Bohemia Interactive,22295029,1564.266667,15


In [25]:
df[(df['developer']=='Paradox Development Studio') & (df['owners']>games)]['average_2weeks']

name
Crusader Kings II                            697
Crusader Kings Complete                       48
Hearts of Iron 2 Complete                    280
March of the Eagles                            0
Europa Universalis: Rome - Gold Edition        0
Europa Universalis IV                        948
Europa Universalis III Complete              708
Hearts of Iron III                           281
Stellaris                                    905
Hearts of Iron IV                            789
Victoria II                                  348
Victoria I Complete                         1384
Sengoku                                        0
Name: average_2weeks, dtype: int64

In [26]:
df[(df['developer']=='Paradox Development Studio') & (df['owners']>games)]['average_forever']

name
Crusader Kings II                            6371
Crusader Kings Complete                       495
Hearts of Iron 2 Complete                    3141
March of the Eagles                           893
Europa Universalis: Rome - Gold Edition       775
Europa Universalis IV                       12919
Europa Universalis III Complete              2832
Hearts of Iron III                           2495
Stellaris                                    5191
Hearts of Iron IV                            7464
Victoria II                                  3976
Victoria I Complete                          1012
Sengoku                                       342
Name: average_forever, dtype: int64

In [108]:
url = 'https://steamspy.com/country/'

In [109]:
from bs4 import BeautifulSoup
import datetime
import re

In [110]:
html = requests.get(url).text
bs = BeautifulSoup(html, 'lxml')

In [141]:
pattern = re.compile(r'[1-5]\. ')  # rank prefixes "1. " to "5. "
today = datetime.date.today()

data = []
for row in bs.find('tbody').find_all('tr'):
    entries = row.find_all('td')

    country = entries[1].get_text()
    games_per_user = float(entries[2].get_text())
    # Playtime is given as hours:minutes; convert it to minutes
    time = entries[3].get_text().split(':')
    time = 60*int(time[0]) + int(time[1])

    # Split the top-5 lists on their rank prefixes; [1:] drops the empty string before "1. "
    owned_games = re.split(pattern, entries[4].get_text())[1:]
    owned_games = list(zip(range(1, 6), owned_games, ['most owned games']*5))
    favorite_games = re.split(pattern, entries[5].get_text())[1:]
    favorite_games = list(zip(range(1, 6), favorite_games, ['favorite games (2 weeks)']*5))

    # One output row per country, category and rank
    for rank in owned_games+favorite_games:
        data.append([country, games_per_user, time, rank[0], rank[1], rank[2], today])
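The two parsing steps in this loop can be tried on plain strings without fetching the page. This is a sketch under assumptions: the sample strings are made up to resemble what `get_text()` might return for a table cell (the real page text may differ), and `parse_playtime` and `parse_ranked_games` are hypothetical helper names.

```python
import re

RANK_PREFIX = re.compile(r'[1-5]\. ')  # same pattern as in the loop


def parse_playtime(text):
    """Convert an 'hours:minutes' string into minutes."""
    hours, minutes = text.split(':')
    return 60 * int(hours) + int(minutes)


def parse_ranked_games(text):
    """Split '1. A2. B...' into the game names in rank order."""
    # The first element is the empty string before "1. ", so it is dropped.
    return RANK_PREFIX.split(text)[1:]


# Assumed cell contents, for illustration only
sample_time = '26:05'
sample_games = ("1. Team Fortress 22. Counter-Strike: Global Offensive"
                "3. Garry's Mod4. Unturned5. Dota 2")

print(parse_playtime(sample_time))
print(parse_ranked_games(sample_games))
```

Note that a digit at the end of a name, as in "Team Fortress 2", is left alone because the pattern only matches a digit followed by a period and a space.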

In [142]:
fd = pd.DataFrame(data, columns=['country', 'games per user', 'minutes (2 weeks)', 'rank', 'name', 'category', 'today']).set_index('today')
fd.head()

today,country,games per user,minutes (2 weeks),rank,name,category
2017-10-19,United States,41.86,1565,1,Team Fortress 2,most owned games
2017-10-19,United States,41.86,1565,2,Counter-Strike: Global Offensive,most owned games
2017-10-19,United States,41.86,1565,3,Garry's Mod,most owned games
2017-10-19,United States,41.86,1565,4,Unturned,most owned games
2017-10-19,United States,41.86,1565,5,Dota 2,most owned games


In [143]:
fd.shape

(1000, 6)

# Flawed interpretation
Small countries carry the same weight as large ones here, so the number of players plays no role at all!
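One possible way around this would be to weight each country's rank by its number of players. The page data used here does not include player counts, so the sketch below uses invented numbers purely to show how the two averages can diverge; `mean_rank` and `weighted_mean_rank` are hypothetical helpers.

```python
def mean_rank(ranks):
    """Unweighted mean: every country counts the same."""
    return sum(ranks) / len(ranks)


def weighted_mean_rank(ranks, weights):
    """Mean rank where each country counts in proportion to its weight."""
    return sum(r * w for r, w in zip(ranks, weights)) / sum(weights)


# Hypothetical example: one game's rank in three countries, with made-up player counts
ranks = [1, 5, 5]
players = [8_000_000, 1_000_000, 1_000_000]

print(mean_rank(ranks))                    # dominated by the two small countries
print(weighted_mean_rank(ranks, players))  # dominated by the large country
```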

In [150]:
n = 10

fd_temp = fd[fd['category']=='favorite games (2 weeks)'].groupby('name').agg({'rank':'mean', 'name':'count'})
fd_temp[fd_temp['name']>=n].sort_values('rank')

name,rank,name
Counter-Strike: Global Offensive,1.38,100
Dota 2,2.443182,88
PLAYERUNKNOWN'S BATTLEGROUNDS,2.489583,96
Rocket League,3.636364,33
Paladins,4.0,33
Grand Theft Auto V,4.25,32
Counter-Strike,4.3,10
Team Fortress 2,4.461538,26
Warframe,4.619048,21


In [214]:
url = 'http://steamspy.com/sale'

html = requests.get(url).text
bs = BeautifulSoup(html, 'lxml')

In [220]:
pattern_1 = re.compile(r'\(\$(.*?)\)')  # text between "($" and ")"
pattern_2 = re.compile(r'\((.*?)%\)')   # text between "(" and "%)"

data = []
for row in bs.find('tbody').find_all('tr'):
    entries = row.find_all('td')

    rank = int(entries[0].get_text())
    # The app ID is the last path segment of the link
    appid = int(entries[1].find('a')['href'].rsplit('/', 1)[-1])
    name = entries[1].get_text().strip()
    owner_before = int(entries[2]['data-order'])
    owner_before_std = int(entries[2].find('font').get_text()[1:].replace(',', ''))
    owner_after = int(entries[3]['data-order'])
    owner_after_std = int(entries[3].find('font').get_text()[1:].replace(',', ''))
    sales = int(entries[4]['data-order'])
    increase = float(entries[5]['data-order'])
    price = int(entries[6]['data-order'])
    try:
        discount_percentage = int(entries[7]['data-order'])
    except (KeyError, ValueError):
        # Attribute missing or not an integer
        discount_percentage = None
    # Dollar amount in parentheses, converted to cents by dropping the decimal point
    discount_absolut = int(re.search(pattern_1, entries[7].get_text()).group(1).replace('.', ''))
    user_score_1 = int(entries[8]['data-order'])
    try:
        user_score_2 = int(re.search(pattern_2, entries[8].get_text()).group(1))
    except (AttributeError, ValueError):
        # No second score in parentheses
        user_score_2 = None
    data.append([rank, appid, name, owner_before, owner_before_std, owner_after, owner_after_std, sales, increase, price, discount_percentage, discount_absolut, user_score_1, user_score_2])
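The two regular expressions can be checked on sample strings before running them against the page. The cell texts below are assumptions about how the sale table might render its values, not copies of the real page, and `parse_discounted_price` and `parse_second_score` are hypothetical helpers. Unlike the loop above, this sketch returns `None` instead of raising when a pattern does not match.

```python
import re

PRICE_IN_PARENS = re.compile(r'\(\$(.*?)\)')    # same as pattern_1
PERCENT_IN_PARENS = re.compile(r'\((.*?)%\)')   # same as pattern_2


def parse_discounted_price(text):
    """Return the dollar amount inside '($...)' in cents, or None if absent."""
    match = PRICE_IN_PARENS.search(text)
    if match is None:
        return None
    # Dropping the decimal point turns '7.49' into 749 (assumes two decimals)
    return int(match.group(1).replace('.', ''))


def parse_second_score(text):
    """Return the percentage inside '(...%)' as an int, or None if absent."""
    match = PERCENT_IN_PARENS.search(text)
    return int(match.group(1)) if match else None


# Assumed cell contents, for illustration only
print(parse_discounted_price('50% ($7.49)'))
print(parse_second_score('90% (83%)'))
print(parse_second_score('87%'))
```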

In [221]:
dd = pd.DataFrame(data, columns=['rank', 'appid', 'name', 'ower_before', 'owner_before_std', 'owner_after', 'owner_after_std', 'sales', 'increase', 'price', 'discount_percentage', 'dicount_absolut', 'user_score_1', 'user_score_2'])
dd.head()

,rank,appid,name,ower_before,owner_before_std,owner_after,owner_after_std,sales,increase,price,discount_percentage,dicount_absolut,user_score_1,user_score_2
0,1,730,Counter-Strike: Global Offensive,22273413,140867,22891335,144941,617922,2.77,1499,50.0,749,90,83.0
1,2,240,Counter-Strike: Source,14869820,116543,14950456,118698,80636,0.54,1999,0.0,1999,95,88.0
2,3,550,Left 4 Dead 2,14125477,113729,14518139,117053,392662,2.78,1999,80.0,399,96,89.0
3,4,320,Half-Life 2: Deathmatch,13505650,111320,13634900,113601,129250,0.96,499,80.0,99,87,
4,5,4000,Garry's Mod,11501361,103069,11914420,106492,413059,3.59,999,75.0,249,95,


In [222]:
dd.shape

(1357, 14)

In [223]:
dd[dd['name']=='Tomb Raider']

,rank,appid,name,ower_before,owner_before_std,owner_after,owner_after_std,sales,increase,price,discount_percentage,dicount_absolut,user_score_1,user_score_2
31,32,203160,Tomb Raider,4104282,62315,4191120,63951,86838,2.12,1999,75.0,499,95,86.0


In [219]:
bs.find('div', {'class':'panel-title'})

<div class="panel-title">Steam Summer Sale </div>