# Querying the ballchasing.com API
ballchasing.com is a data aggregation website for Rocket League that collects and stores data from hundreds of thousands of games uploaded by players.

In [22]:
import pandas as pd
import numpy as np
import requests
import json
import csv
from rich.progress import Progress

In [3]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
# pd.set_option('display.max_colwidth', None)

First I queried the API for replay IDs. Later, I pass each ID to the GET Replay(ID) function from the API documentation to get the individual stats for each game.

For the parameters, I chose ranked standard games: ranked because people take ranked games more seriously than casual ones, and standard because in Rocket League it means the 3v3 game mode, which is the most popular mode and the format used for the world championships. 1v1 and 2v2 also exist but are not the main game modes.

I got a list of 100,000 replays, but due to how long the next query took, I went with only 50,000 games for this initial modeling.

In [9]:
headers = {
    'Authorization': 'V3xqFi9mOZONpvx4SKm6bRrYFxGMCxQ8LECZwMJM'
}


params = {
    'playlist': 'ranked-standard',
    'season': 'f13',
    'count': 200
}

replay_ids = set()
url = 'https://ballchasing.com/api/replays'

while len(replay_ids) < 100000:
    response = requests.get(url, headers=headers, params=params)
    
    if response.status_code == 200:
        data = response.json()
        replay_ids.update([replay['id'] for replay in data['list']])

        next_page_url = data.get('next')
        if next_page_url:
            url = next_page_url
        else:
            print("No more pages available.")
            break
    else:
        print("Failed to fetch data:", response.status_code)
        break

print("Total unique replay IDs collected:", len(replay_ids))

Total unique replay IDs collected: 100000


The two code cells below save the queried replay IDs to a CSV file and load them back for later use.

In [17]:
# save list as csv file

replay_ids_list = list(replay_ids) # convert the set of IDs collected above into a list

with open('C:/Users/nickh/Desktop/BrainStation Course/Capstone Project RL/replay_ids_list.csv', 'w') as file:
    for replay in replay_ids_list:
        file.write(replay + '\n')

In [18]:
# load list 

replay_ids_list = []

with open('C:/Users/nickh/Desktop/BrainStation Course/Capstone Project RL/replay_ids_list.csv', 'r') as file:
    for line in file:
        replay_ids_list.append(line.strip())
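
The query further down works on a slice of this list (the variable name `replay_ids_list35000to50000` suggests positions 35,000 to 50,000). Here is a minimal sketch of one way to split the loaded list into batches so that each long-running query covers only part of it. The helper `split_into_batches` and the example IDs are hypothetical and not part of the original notebook.

```python
def split_into_batches(ids, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


# Hypothetical stand-in for the loaded replay IDs
example_ids = [f"replay-{n}" for n in range(7)]
batches = split_into_batches(example_ids, 3)

for number, batch in enumerate(batches):
    print(f"batch {number}: {len(batch)} IDs")
```

Each batch can then be queried and saved on its own, so a crash only costs the batch that was in progress.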


I am using a new Python library, rich, here to add a progress bar to the query loop so I have a better idea of when these queries will end, as each query took over 5 hours. I ran this multiple times, because errors would terminate the entire script, so I eventually added the block of code at the bottom that writes to a JSON file every 1,000 games. I probably ran this query 5 or 6 times in total because of troubleshooting and errors.

This code gets all the data from the API for each replay ID. Then, in the EDA and Modeling Jupyter Notebook, I load the data, loop through it to get what I want, flatten it, and turn it into a dataframe. I attempted to format the data while querying it, but decided to separate those tasks, as doing everything at once was taking far too long.

In [13]:
from rich import progress as rp

In [21]:
headers = {
    'Authorization': 'V3xqFi9mOZONpvx4SKm6bRrYFxGMCxQ8LECZwMJM'
}

list_of_replay_ids = replay_ids_list35000to50000 
all_replays_data = []

def save_data_to_json(data, filename='C:/Users/nickh/Desktop/BrainStation Course/Capstone Project RL/AllData35000to50000NoProcessing.json'):
    with open(filename, 'a') as file:
        json.dump(data, file)
        file.write('\n')
        
        
with rp.Progress() as progress:
    # rich: initialize a progress bar with one step per replay ID
    completed = progress.add_task('She movinnn', total=len(list_of_replay_ids))
    for index, replay_id in enumerate(list_of_replay_ids):
        replay_url = f'https://ballchasing.com/api/replays/{replay_id}'
        response = requests.get(replay_url, headers=headers)

        if response.status_code == 200: # the API returns status code 200 for a successful query
            all_replays_data.append(response.json())
        else:
            print(f"FAILURE {replay_id}: {response.status_code}")
        progress.advance(completed) # rich advances the bar once per replay, success or failure

        # periodically save to a json file in case of an unexpected error;
        # the check sits outside the status branch so that a failed request on a
        # save boundary (or on the last replay) cannot skip the save
        if (index + 1) % 1000 == 0 or (index + 1) == len(list_of_replay_ids):
            save_data_to_json(all_replays_data)
            print('just wrote to json file')
            all_replays_data = []
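
Since errors kept terminating the script, one option is to retry a failed request a few times before giving up on that replay. The sketch below is not what I ran. The helper `fetch_with_retry`, its parameters, and the wait times are assumptions for illustration. It takes the request function as an argument, so `requests.get` can be passed in; connection errors raised by requests derive from `OSError`, which is what the `except` clause catches.

```python
import time


def fetch_with_retry(get, url, headers, max_attempts=3, wait_seconds=2.0, sleep=time.sleep):
    """Call get(url, headers=headers) until it returns status 200 or attempts run out.

    Returns the parsed JSON on success, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            response = get(url, headers=headers)
            if response.status_code == 200:
                return response.json()
            print(f"Attempt {attempt}: status {response.status_code} for {url}")
        except OSError as error:  # covers connection errors raised by requests
            print(f"Attempt {attempt}: {error}")
        if attempt < max_attempts:
            sleep(wait_seconds * attempt)  # wait a little longer after each failure
    return None
```

Inside the loop, `replay_data = fetch_with_retry(requests.get, replay_url, headers)` would replace the direct `requests.get` call, with a `None` result treated as a failure for that replay ID.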



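
Because `save_data_to_json` opens the file in append mode and writes one JSON list per call, the resulting file holds one list of replays per line rather than a single JSON document. Here is a minimal sketch of reading such a file back into one flat list; the helper name `load_replays` is hypothetical.

```python
import json


def load_replays(filename):
    """Read a file where each line is a JSON list of replay dicts; return one flat list."""
    replays = []
    with open(filename, 'r') as file:
        for line in file:
            line = line.strip()
            if line:  # skip blank lines
                replays.extend(json.loads(line))
    return replays
```

From there, `pd.json_normalize(replays)` is one way to flatten the nested dictionaries into a dataframe, although the exact columns depend on what the API returns for each replay.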