<a href="https://colab.research.google.com/github/Jcc329/Classifying-Malaria-Infected-Cells-Using-Neural-Networks/blob/main/Raw_data/Accessing_Steam_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data 606 - Data Science Capstone
### Jessica Conroy

Project Stage: Data Acquisition

This notebook requests data from the Steam API and the SteamSpy API.

### Accessing Steam Data Process

The first call to the Steam API retrieves a list of all apps, games and otherwise, that are currently or soon to be available on Steam.

This list is converted into a pandas dataframe and cleaned by removing, based on the app name, as many blank, test, beta, and demo entries as possible. This keeps two kinds of entries out of the final dataset: new games that don't have enough review information, and 'games' created without any associated data (for example, by someone testing how to use the platform).
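The sketch below shows the conversion and name-based cleaning on a tiny, hypothetical response shaped like the `GetAppList` v2 payload (`{'applist': {'apps': [...]}}`). The app names and the short filter lists are illustrative assumptions, not the notebook's full lists. Substrings are passed through `re.escape` so they are matched literally.

```python
import re

import pandas as pd

# Hypothetical response shaped like the GetAppList v2 payload
sample_json = {
    "applist": {
        "apps": [
            {"appid": 10, "name": "Counter-Strike"},
            {"appid": 11, "name": "  "},
            {"appid": 12, "name": "My Game Playtest"},
            {"appid": 13, "name": "Space Game Demo"},
            {"appid": 14, "name": "Test"},
        ]
    }
}

apps = pd.DataFrame(sample_json["applist"]["apps"])
# Normalize names so matching ignores case and surrounding whitespace
apps["name"] = apps["name"].str.strip().str.lower()

# Illustrative filters: exact names to drop, and substrings marking test/demo builds
exact_names = {"", "test"}
substrings = ["playtest", " demo", " test "]
pattern = "|".join(re.escape(s) for s in substrings)

keep = ~apps["name"].isin(exact_names) & ~apps["name"].str.contains(pattern)
apps = apps[keep]
print(apps["appid"].tolist())  # [10]
```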

The cleaned dataframe is then passed to the function defined below. That function shuffles the dataframe with sklearn's `shuffle` and then makes three API calls for each appid in the list, adding that game's data to a dictionary. The first call requests the general Steam store data. The second requests the review summary (score and description) along with the top 20 reviews. The third requests supplementary data from the SteamSpy API.
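A minimal sketch of the shuffle and the per-app merge follows, using made-up dictionaries in place of real API responses. Two assumptions apply. First, it uses pandas' own `DataFrame.sample(frac=1)` as a stand-in for `sklearn.utils.shuffle`; both return every row in random order. Second, the keys in the three dictionaries are hypothetical. The point to notice is that successive `dict.update()` calls let a later source overwrite an earlier one when they share a key.

```python
import pandas as pd

ids = pd.DataFrame({"appid": [10, 20, 30, 40], "name": ["a", "b", "c", "d"]})

# Shuffle all rows; random_state makes the order reproducible
# (pandas' sample(frac=1) used here in place of sklearn.utils.shuffle)
shuffled = ids.sample(frac=1, random_state=0)

# Hypothetical responses from the three sources for one app
store_data = {"type": "game", "name": "Example Game", "is_free": False}
review_data = {"Review Score": 8, "Review Score Description": "Very Positive"}
steamspy_data = {"appid": 10, "owners": "0 .. 20,000", "name": "Example Game (SteamSpy)"}

# Merge in the same order as the notebook: store, then reviews, then SteamSpy
record = dict(store_data)
record.update(review_data)
record.update(steamspy_data)  # the shared 'name' key now holds the SteamSpy value
print(record["name"])  # Example Game (SteamSpy)
```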

The loop runs for 6 hours and then stops. The goal is to collect a large random sample of games to analyze while staying within time limits and API rate limits.
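The time-limited loop can be expressed as a small reusable helper. `collect_with_budget` is a hypothetical name, not part of any library. The helper stops once the budget is spent; the notebook's budget is 360 minutes, which is 21,600 seconds. It skips any app whose fetch raises, so a failed call never stores stale data. The `clock` parameter exists only so the loop can be tested with a fake clock.

```python
import time
from typing import Callable, Dict, Iterable


def collect_with_budget(
    ids: Iterable[int],
    fetch: Callable[[int], dict],
    budget_seconds: float,
    pause_seconds: float = 0.0,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, dict]:
    """Call fetch(appid) for each id until budget_seconds have elapsed.

    Apps whose fetch raises are skipped, so only successful results are stored.
    """
    results: Dict[str, dict] = {}
    start = clock()
    for appid in ids:
        if clock() - start >= budget_seconds:
            break  # time budget used up
        try:
            results[str(appid)] = fetch(appid)
        except Exception:
            pass  # no data or failed request: skip this app
        time.sleep(pause_seconds)  # pause between apps to respect rate limits
    return results


def fake_fetch(appid):
    """Hypothetical fetcher: odd appids behave like apps with no store data."""
    if appid % 2:
        raise KeyError("data")
    return {"appid": appid}


print(collect_with_budget(range(6), fake_fetch, budget_seconds=60))
# {'0': {'appid': 0}, '2': {'appid': 2}, '4': {'appid': 4}}
```

In the real collection, `budget_seconds` would be `6 * 60 * 60` and `pause_seconds` around 1.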

The function then converts the final dictionary into a dataframe and returns that dataframe.
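`pd.DataFrame.from_dict(..., orient='index')` turns a dictionary keyed by appid into one row per app. Each inner dictionary's keys become columns, and any key an app lacks becomes `NaN`. The two records below are hypothetical.

```python
import pandas as pd

# Hypothetical collected records keyed by appid (as strings, like GameDict)
game_dict = {
    "10": {"name": "Game A", "Review Score": 8, "owners": "0 .. 20,000"},
    "20": {"name": "Game B", "Review Score": 5},  # no 'owners' key
}

df = pd.DataFrame.from_dict(game_dict, orient="index")
print(df.shape)  # (2, 3)
```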

### Saving the data

The output data is saved as a CSV file and downloaded to my local machine.
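Saving to CSV has one side effect worth planning for. Nested dictionaries and lists are written as their string form, so they come back as strings when the file is reloaded. A minimal sketch of recovering them uses `ast.literal_eval`, which safely parses Python-literal strings. The `parse_literal` helper is a hypothetical name, and an in-memory buffer stands in for the file on disk.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "name": ["Game A", "Game B"],
    "platforms": [{"windows": True, "mac": False, "linux": False}, None],
})

buffer = io.StringIO()  # stands in for the CSV file on disk
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The dict came back as its string form, e.g. "{'windows': True, ...}"
print(type(reloaded.loc[0, "platforms"]))  # <class 'str'>


def parse_literal(value):
    """Parse a Python-literal string back into an object; leave missing values as-is."""
    return ast.literal_eval(value) if isinstance(value, str) else value


reloaded["platforms"] = reloaded["platforms"].apply(parse_literal)
print(reloaded.loc[0, "platforms"]["windows"])  # True
```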

### Primary Analysis

Basic descriptive statistics are computed.
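Alongside `describe(include='all')`, the share of missing values per column shows which fields are sparsely populated. In this notebook's output, for example, `metacritic` has only 40 non-null values out of 7309 rows. The small frame below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["game", "dlc", "game", "game"],
    "Review Score": [8.0, 0.0, 5.0, None],
    "website": [None, None, "http://example.com", None],
})

# count/unique/top/freq for text columns, mean/std/quantiles for numeric ones
summary = df.describe(include="all")

# Fraction of missing values per column, most sparse first
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)
```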

### Data cleaning

The raw dataset contains several columns whose values are nested dictionaries or lists.
To handle these, I will identify the columns containing the desired data, remove unnecessary columns, and use the apply function to expand each nested column into its own dataframe. Each of these dataframes is then joined back onto the original dataframe.
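A minimal sketch of expanding one dictionary column and joining it back follows. `expand_dict_column` is a hypothetical helper, and the sample `price_overview` values are made up to match the shape seen in the `describe` output. The sketch builds the new columns with the `pd.DataFrame` constructor on the list of dicts. `Series.apply(pd.Series)`, the apply-based approach mentioned above, gives the same result but tends to be slower. Rows without a dictionary become rows of `NaN`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "price_overview": [
        {"currency": "USD", "initial": 999, "final": 499},
        {"currency": "USD", "initial": 99, "final": 99},
        None,  # e.g. a free game with no price data
    ],
})


def expand_dict_column(frame, column):
    """Split a column of dicts into one column per key, prefixed with the column name."""
    # Treat missing entries as empty dicts so every row expands
    as_dicts = frame[column].apply(lambda v: v if isinstance(v, dict) else {})
    expanded = pd.DataFrame(as_dicts.tolist(), index=frame.index).add_prefix(column + "_")
    return frame.drop(columns=column).join(expanded)


flat = expand_dict_column(df, "price_overview")
print(flat.columns.tolist())
```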

#### Text Cleaning

Text data will undergo additional cleaning to prepare it for analysis. This includes converting it to lowercase, removing symbols and punctuation, and generally tidying it.
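Several Steam text fields, such as the descriptions, contain HTML markup, so a cleaning pass probably needs to strip tags as well as punctuation. The function below is a sketch. `clean_text` is a hypothetical name, and the character class keeps only ASCII letters and digits, which assumes English text: accented and non-Latin characters would be removed.

```python
import html
import re


def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags and entities, drop punctuation, collapse whitespace."""
    text = html.unescape(text)                # decode entities, e.g. "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", " ", text)      # replace HTML tags such as <br> with spaces
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep ASCII letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


print(clean_text("<p>Great game!!</p><br>10/10 &amp; would play again"))
# great game 10 10 would play again
```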

This concludes the goals of this notebook. The cleaned dataset will be saved, and the next stage, EDA, will take place in the next notebook in this series.

### Sources

Inspiration came from https://nik-davis.github.io/posts/2019/steam-data-collection/ 


In [1]:
!pip install steamspypi

Collecting steamspypi
  Downloading steamspypi-1.1.1-py3-none-any.whl (11 kB)
Installing collected packages: steamspypi
Successfully installed steamspypi-1.1.1


In [2]:
# standard library imports
import csv
import datetime as dt
import json
import os
import statistics
import time

# third-party imports
import numpy as np
import pandas as pd
import requests
import steamspypi
from sklearn.utils import shuffle

# customisations - ensure tables show all columns
# (the bare "max_columns" key is ambiguous in pandas >= 1.4, so use the full option name)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', None)

In [3]:
import re

# Get all app IDs and names
# URL call found here: https://partner.steamgames.com/doc/webapi/ISteamApps
URL = 'https://api.steampowered.com/ISteamApps/GetAppList/v2/'

response = requests.get(url=URL)
json_data = response.json()
GameIDs = pd.DataFrame.from_dict(json_data['applist']['apps'])

# Normalize names so the filters below ignore case and surrounding whitespace
GameIDs['name'] = GameIDs['name'].str.strip().str.lower()

# Remove empty names and known test apps that match exactly
exact_names = ['', 'pieterw test app76 ( 216938 )', 'test2', 'test3', 'tidewoken public test',
               'now testing: 407', 'test re(quietmansion1 special teaser)', '<h1>test</h1>',
               'test', 'test project', 'steamvr performance test', 'testcontent', 'vrq test']
GameIDs = GameIDs[~GameIDs['name'].isin(exact_names)]

# Remove names containing any test/demo marker (escaped so each is matched literally)
substrings = ['playtest', 'closed testing', 'testapp', ' test ', 'betatest', 'test server',
              'beta test', 'open test', 'dev test', '- test', 'feature test', 'technical test',
              'early access testing', '_test', ' demo', 'public test']
pattern = '|'.join(re.escape(s) for s in substrings)
# na=True treats missing names as matches, so they are dropped as before
GameIDs = GameIDs[~GameIDs['name'].str.contains(pattern, na=True)]


In [4]:
GameIDs.shape

(125752, 2)

In [5]:
# Create function to collect data from the APIs
def CollectSteamData(GameIDDF):
    '''
    input: dataframe containing IDs and names of games
    output: dataframe containing all API data for a random sample of the games,
            collected until a 6-hour time limit is reached
    '''
    # Steam API 1: primary game data
    # https://stackoverflow.com/questions/69512319/steam-api-to-get-game-info
    # Steam API 2: review data
    # https://partner.steamgames.com/doc/store/getreviews
    # SteamSpy API: supplemental usage and cost data
    # https://pypi.org/project/steamspypi/
    # https://steamspy.com/api.php

    # Randomize the order of the data frame so the sample is random
    IDs = shuffle(GameIDDF)
    GameDict = {}
    starttime = time.time()
    for appid in IDs['appid']:
        # Stop once 6 hours (360 minutes) have elapsed
        elapsedtime = (time.time() - starttime) / 60
        if elapsedtime >= 360:
            break
        try:
            # 1) General store data for the app
            gameURL = 'http://store.steampowered.com/api/appdetails?appids=' + str(appid)
            json_data = requests.get(url=gameURL).json()
            GameData = json_data[str(appid)]['data']
            time.sleep(1)  # 1 second pause between API calls to respect rate limits

            # 2) Review summary plus the top 20 reviews
            reviewURL = 'http://store.steampowered.com/appreviews/' + str(appid) + '?json=1'
            json_data = requests.get(url=reviewURL).json()
            ReviewDict = {
                'Review Score': json_data['query_summary']['review_score'],
                'Review Score Description': json_data['query_summary']['review_score_desc'],
                # Separate reviews with newlines so words from adjacent reviews don't merge
                'Top Reviews by Upvotes': '\n'.join(r['review'] for r in json_data['reviews']),
            }

            # 3) Supplemental data from SteamSpy
            data_request = {'request': 'appdetails', 'appid': str(appid)}
            steamspydata = steamspypi.download(data_request)

            # Merge all three dictionaries into a single record for this app
            GameData.update(ReviewDict)
            GameData.update(steamspydata)
            time.sleep(1)  # 1 second pause between API calls to respect rate limits

        except Exception:  # games without associated data, or other failed API calls
            time.sleep(1)
            continue  # skip this app so it never reuses the previous app's data

        # Add this app's data to GameDict (only reached when all calls succeeded)
        GameDict[str(appid)] = GameData

    # Convert to a dataframe with one row per app
    GameDF = pd.DataFrame.from_dict(GameDict, orient='index')

    return GameDF

In [6]:
Sample_Game_Data = CollectSteamData(GameIDs)

In [7]:
from google.colab import files
Sample_Game_Data.to_csv('RawSteamGameData.csv') 
files.download('RawSteamGameData.csv')


In [8]:
Sample_Game_Data.shape

(7309, 62)

In [9]:
Sample_Game_Data.columns

Index(['type', 'name', 'steam_appid', 'required_age', 'is_free',
       'detailed_description', 'about_the_game', 'short_description',
       'supported_languages', 'header_image', 'website', 'pc_requirements',
       'mac_requirements', 'linux_requirements', 'developers', 'publishers',
       'price_overview', 'packages', 'package_groups', 'platforms',
       'categories', 'genres', 'screenshots', 'movies', 'release_date',
       'support_info', 'background', 'content_descriptors', 'Review Score',
       'Review Score Description', 'Top Reviews by Upvotes', 'appid',
       'developer', 'publisher', 'score_rank', 'positive', 'negative',
       'userscore', 'owners', 'average_forever', 'average_2weeks',
       'median_forever', 'median_2weeks', 'price', 'initialprice', 'discount',
       'ccu', 'languages', 'genre', 'tags', 'fullgame', 'reviews',
       'achievements', 'legal_notice', 'dlc', 'controller_support',
       'recommendations', 'ext_user_account_notice', 'demos', 'metacritic'

In [10]:
Sample_Game_Data.describe(include='all')

Unnamed: 0,type,name,steam_appid,required_age,is_free,detailed_description,about_the_game,short_description,supported_languages,header_image,website,pc_requirements,mac_requirements,linux_requirements,developers,publishers,price_overview,packages,package_groups,platforms,categories,genres,screenshots,movies,release_date,support_info,background,content_descriptors,Review Score,Review Score Description,Top Reviews by Upvotes,appid,developer,publisher,score_rank,positive,negative,userscore,owners,average_forever,average_2weeks,median_forever,median_2weeks,price,initialprice,discount,ccu,languages,genre,tags,fullgame,reviews,achievements,legal_notice,dlc,controller_support,recommendations,ext_user_account_notice,demos,metacritic,drm_notice,alternate_appid
count,7309,7308.0,7309.0,7309.0,7309,7309.0,7309.0,7309.0,7043,7309,4022,7309,7309,7309,6632,7309,5283,5366,7309,7309,7003,6507,6652,4338,7309,7309,7309.0,7309,7291.0,7291,7291.0,7291.0,7291.0,7291.0,7291.0,7291.0,7291.0,7291.0,7291,7291.0,7291.0,7291.0,7291.0,6779.0,6779.0,6779.0,7291.0,6779,7290.0,7291,3002,564,1809,2703,533,2205,814,66,391,250,40,1.0
unique,10,6712.0,,12.0,2,6084.0,6084.0,6526.0,1429,6661,2807,5049,1245,796,4472,3731,255,4928,4908,5,1314,685,6115,3978,2501,4306,6116.0,620,,19,3442.0,,4074.0,3364.0,4.0,,,,11,,,,,146.0,66.0,37.0,,1145,622.0,3336,1512,531,1652,2018,504,1,539,55,361,235,15,1.0
top,game,,,0.0,False,,,,English<strong>*</strong><br><strong>*</strong...,https://cdn.akamai.steamstatic.com/steam/apps/...,http://www.fantasygrounds.com,[],[],[],[TigerQiuQiu],[],"{'currency': 'USD', 'initial': 99, 'final': 99...",[130890],[],"{'windows': True, 'mac': False, 'linux': False}","[{'id': 2, 'description': 'Single-player'}]","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...","[{'id': 256786325, 'name': 'Soul of Empress Tr...","{'coming_soon': False, 'date': ''}","{'url': '', 'email': ''}",,"{'ids': [], 'notes': None}",,No user reviews,,,,,,,,,"0 .. 20,000",,,,,0.0,0.0,0.0,,English,,[],"{'appid': '252690', 'name': 'Fantasy Grounds C...","“Spot on acting, intelligent music and an art ...",{'total': 0},© 2015 UBISOFT ENTERTAINMENT. ALL RIGHTS RESER...,"[579510, 881830]",full,{'total': 108},PlayFab (Supports Linking to Steam Account),"[{'appid': 1196960, 'description': ''}]","{'score': 79, 'url': 'https://www.metacritic.c...",Denuvo Anti-tamper<br>5 different PC within a ...,243580.0
freq,4089,7.0,,7038.0,6676,645.0,645.0,132.0,1949,11,119,735,4636,5213,151,1311,707,8,1980,5234,1138,353,5,4,94,806,657.0,6347,,3548,3548.0,,1144.0,1774.0,7276.0,,,,6382,,,,,1350.0,1350.0,6325.0,,3424,1272.0,3150,119,4,37,74,2,2205,11,3,2,3,14,1.0
mean,,,999509.7,,,,,,,,,,,,,,,,,,,,,,,,,,1.545878,,,999742.8,,,,813.404197,109.467151,0.169387,,64.316692,3.386778,57.441092,3.637361,,,,44.250171,,,,,,,,,,,,,,,
std,,,496969.8,,,,,,,,,,,,,,,,,,,,,,,,,,2.829286,,,497020.7,,,,14050.711455,2164.124272,3.767921,,1085.400584,55.440596,1050.501621,59.419759,,,,1134.643738,,,,,,,,,,,,,,,
min,,,70.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,70.0,,,,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,,,,0.0,,,,,,,,,,,,,,,
25%,,,595754.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,595972.0,,,,0.0,0.0,0.0,,0.0,0.0,0.0,0.0,,,,0.0,,,,,,,,,,,,,,,
50%,,,962960.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,962990.0,,,,1.0,0.0,0.0,,0.0,0.0,0.0,0.0,,,,0.0,,,,,,,,,,,,,,,
75%,,,1427740.0,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,1428240.0,,,,20.0,6.0,0.0,,0.0,0.0,0.0,0.0,,,,0.0,,,,,,,,,,,,,,,


In [11]:
#Expand columns containing multiple datapoints
#https://stackoverflow.com/questions/38231591/split-explode-a-column-of-dictionaries-into-separate-columns-with-pandas