### Get the list of all games with their ID numbers and output a file at `/data/game_id.csv`
As of 11/8/2019, there are 345727 games. More information about the API is available at https://rawg.io/apidocs, and its endpoints are documented at https://api.rawg.io/docs/

In [1]:
import json
import requests
from pprint import pprint
import os
import csv
from time import time
import concurrent.futures
import functools
import math

with open("../secret.json", "r") as f:
    API_KEY = json.load(f)["API_KEY"]
# Do not print the key: notebook outputs are easy to commit or share by accident


## Multithreading
This function requests pages of games (40 games per page) and saves each page as a JSON file in `/data/game_id/`. Each file maps game IDs to slugs.

In [12]:
def worker(start_index, urls_per_worker, urls, downloaded_files, headers):
    # Each worker handles one contiguous slice of the url list
    chunk = urls[start_index: start_index + urls_per_worker]
    for url in chunk:
        # The page number is the value of the trailing "page=" parameter
        page_no = url.rsplit("page=", 1)[-1]
        if page_no in downloaded_files:
            continue
        try:
            # Request API
            response = requests.get(url, headers=headers)
            response.raise_for_status()
            json_data = response.json()

            # Get wanted data
            D = {game["id"]: game["slug"] for game in json_data["results"]}

            # Save data
            with open(f"../data/game_id/{int(page_no)}.json", "w", encoding="utf8") as f:
                json.dump(D, f)
        except Exception:
            # Catch Exception rather than everything, so Ctrl+C still stops the worker
            print(f"Error with {url}")

    # Verbose notification (skipped for an empty slice, which would raise IndexError)
    if chunk:
        print(f"Done from {chunk[0]} to {chunk[-1]}")

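To make the slicing concrete, here is a minimal sketch of how a list of urls can be divided into contiguous slices, one per `start_index`. The helper name `chunk_starts` is hypothetical and not part of the notebook. Rounding the chunk size up is an assumption, chosen so that the number of slices never exceeds the number of workers.

```python
import math


def chunk_starts(n_items, max_workers):
    """Return (chunk_size, start indices) covering n_items with at most max_workers slices."""
    if n_items == 0:
        return 0, []
    # Round up so the step is never 0 and no item is left without a slice
    chunk_size = math.ceil(n_items / max_workers)
    return chunk_size, list(range(0, n_items, chunk_size))


# Toy example: 10 placeholder urls shared among 4 workers
urls = [f"page={i}" for i in range(1, 11)]
chunk_size, starts = chunk_starts(len(urls), 4)
slices = [urls[s: s + chunk_size] for s in starts]
print(chunk_size, starts)
for s in slices:
    print(s)
```

Each worker then only needs its own start index and the shared chunk size, which is the calling convention `worker` uses.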

In [42]:
# Create data folder if it does not already exist
os.makedirs('../data/game_id/', exist_ok=True)

# Make the first request to get the total number of pages
page_size = 40
headers = {'User-Agent': 'App Name: Education purpose'}
params = {"key": API_KEY, "page_size": page_size, "page": 1}
response = requests.get("https://api.rawg.io/api/games",
                        headers=headers, params=params)
json_data = response.json()
no_of_pages = math.ceil(json_data["count"] / page_size)

# Skip downloaded files
downloaded_files = {file.split(".", 1)[0]
                    for file in os.listdir("../data/game_id/")}

# Make urls: keep everything up to "page=" and append each page number
base_url = response.url.rsplit("page=", 1)[0]
urls = [f"{base_url}page={i}" for i in range(1, no_of_pages + 1)
        if str(i) not in downloaded_files]

# Set up number of workers; round up so the step is never 0
# and there are at most max_workers slices
max_workers = 64
urls_per_worker = max(1, math.ceil(len(urls) / max_workers))
start_index = range(0, len(urls), urls_per_worker)

print(f"There are {len(urls)} urls, {max_workers} workers. Thus, each worker will request up to {urls_per_worker} urls")


KeyError: 'count'
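The `KeyError: 'count'` above means the response body had no `count` field. One plausible cause is that the API returned an error payload instead of a page of results, although the actual cause of this run's failure is not recorded. A minimal sketch of a more explicit check follows. The helper `page_count` is hypothetical, and the payloads in the example are made up for illustration rather than real API output.

```python
import math


def page_count(payload, page_size=40):
    """Return the number of pages implied by payload["count"].

    Raises ValueError that includes the payload when "count" is missing,
    instead of a bare KeyError.
    """
    if not isinstance(payload, dict) or "count" not in payload:
        raise ValueError(f"Unexpected API response, no 'count' field: {payload!r}")
    return math.ceil(payload["count"] / page_size)


# Illustrative payloads (assumed shapes, not captured from the API)
print(page_count({"count": 345727, "results": []}))
try:
    page_count({"error": "illustrative error payload"})
except ValueError as exc:
    print(exc)
```

Printing the payload in the error message makes it easier to see whether the problem is the key, the request, or the service.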

The following code uses a thread pool to speed up the download. In the recorded run, 32 workers ran at the same time (the setup cell above currently sets `max_workers = 64`), and each worker made its requests independently. For 17272 pages, this reduced the running time from about 4 hours to about 40 minutes.

In [24]:
# Run all workers on all urls
t0 = time()
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    temp = functools.partial(worker,
                             urls_per_worker=urls_per_worker,
                             urls=urls,
                             downloaded_files=downloaded_files,
                             headers=headers,
                             )
    executor.map(temp, start_index)
print(f"Time taken: {time()-t0}")


Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=13087
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14749
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14447
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14449
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14450
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14451
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14452
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14453
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page_size=40&page=14454
Error with https://api.rawg.io/api/games?key=c0011343e46b4572adc0d83b6afd32e5&page

KeyboardInterrupt: 
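The run above logs many failed pages in a row before it was interrupted. One common way to handle transient failures, shown here as a sketch rather than what this notebook does, is to retry each request with exponential backoff. The names `fetch_with_retry`, `fetch`, `max_attempts`, and `base_delay` are hypothetical. The `fetch` callable is injected so the example runs without network access; in the notebook it would wrap `requests.get`.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on any Exception with exponential backoff.

    Returns the result of fetch, or re-raises the last exception once
    max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Stand-in for a real request: fails twice, then succeeds
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return {"results": []}


# Record the delays instead of actually sleeping
delays = []
result = fetch_with_retry(flaky_fetch, "https://example.invalid/?page=1",
                          sleep=delays.append)
print(result, len(calls), delays)
```

Because the setup cell skips pages that already exist in `/data/game_id/`, rerunning the notebook is another way to pick up the pages that failed.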

Load each JSON file in `/data/game_id/` and write its contents to a single CSV file at `/data/game_id.csv`.

In [5]:
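with open("../data/game_id.csv", "w", newline="", encoding="utf8") as f:
    csv_file = csv.writer(f, lineterminator="\n")
    for file in os.listdir("../data/game_id/"):
        try:
            with open(f"../data/game_id/{file}", "r", encoding="utf8") as page_file:
                json_data = json.load(page_file)
        except (OSError, json.JSONDecodeError):
            # Report the unreadable file and skip it, so data left over from
            # the previous iteration is not written a second time
            print(file)
            continue
        # Each entry maps a game id to its slug
        for game_id, game_slug in json_data.items():
            csv_file.writerow([game_id, game_slug])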
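`os.listdir` returns names in arbitrary order, so the rows in the CSV are not guaranteed to follow page order. If ordered output is wanted, the file names can be sorted by page number first. The sketch below assumes every file is named `<page>.json`, as the worker writes them; the helper `sorted_page_files` is hypothetical.

```python
import os
import tempfile


def sorted_page_files(directory):
    """Return the .json file names in directory, sorted by numeric page number."""
    names = [n for n in os.listdir(directory) if n.endswith(".json")]
    # Sort numerically so that 10.json comes after 2.json, not before it
    return sorted(names, key=lambda n: int(n.split(".", 1)[0]))


# Demonstrate on a temporary directory with a few empty page files
with tempfile.TemporaryDirectory() as tmp:
    for page in (10, 2, 1):
        with open(os.path.join(tmp, f"{page}.json"), "w", encoding="utf8") as f:
            f.write("{}")
    ordered = sorted_page_files(tmp)
    print(ordered)
```

A plain `sorted(os.listdir(...))` would order the names as strings, which is why the key converts the stem to an integer.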
