In [2]:
import pandas as pd
import numpy as np
import requests
import matplotlib.pyplot as plt
from pprint import pprint
%matplotlib inline

In [3]:
import json
import time
import tqdm

In [4]:
pd.options.display.max_columns = 100
pd.options.display.max_rows = 10000

Acquiring the data was the biggest challenge posed by this project, because Activision changed the default data privacy setting to private. As a result, other players' in-game data can no longer be accessed unless they manually switch their privacy setting to public on the Activision website. This means that classic APIs such as tracker.gg or RapidAPI are far less practical without a list of public gamer ids, which I have yet to find.

After initial research, Alexandre le Corre's Warzone API, available on RapidAPI, looked like the best option. However, it has two limitations. First, without a list of 'public' gamer ids, the only data available for download is the 'Leaderboard'. Unfortunately, this data is not very exciting: it contains 16 variables such as Prestige, XP, Time Played, Wins, Losses and Killstreaks. Second, because it is a freemium API, I was limited to 500 rows of data a day.

I also came across an API developed by iShot, which was available on Postman. The privacy constraint still applies, but this API exposes a wealth of public data (79 columns), such as Missions Completed, Player Stats and Weapons, as long as one can provide match ids. By chance, while browsing the API's discussion on Discord, I found a pdf file posted by Caedrius listing 1000 match ids for matches in which he played (each match contains data on c.150 players). Ideally I was looking for a list of random ids belonging to a range of players, but this was the best I could find and it provides more than enough data to explore my question.

So, once I had converted the pdf into JSON, I adapted the Postman request so that I could collect the data for all the match ids in one go. That is the exercise carried out in this section.


In [5]:
# matchId is kept as object because some ids exceed the int64 range
df_match = pd.read_json("matchids.json", orient='columns', dtype={"matchId": object})

In [7]:
# Dataframe built from Caedrius's pdf of match ids
df_match.head()

Unnamed: 0,platform,title,timestamp,type,matchId,map
0,battle,mw,2021-01-04 22:18:36,6552125305277136,6562602456312274390,3227376819739457
1,battle,mw,2021-01-04 22:07:30,6552125305277136,13368494592966401607,3227376819739457
2,battle,mw,2021-01-04 15:51:12,3465102547637332,2284449555397668477,3227376819739457
3,battle,mw,2021-01-01 16:13:47,3465102547637332,11831952490139849563,3227376819739457
4,battle,mw,2021-01-01 15:40:38,3465102547637332,807350072882036898,3227376819739457
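
The ids in the `matchId` column are very large integers, which is presumably why the column is read with `dtype` object. A minimal sketch of the issue, using one id from the table above and only the standard library (the helper name `fits_int64` is hypothetical and not part of the notebook):

```python
# Largest value a signed 64-bit integer (pandas' default integer dtype) can hold
INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    """Return True if value can be stored in a signed 64-bit integer."""
    return -2**63 <= value <= INT64_MAX


match_id = 13368494592966401607  # second row of the table above

print(fits_int64(match_id))              # False: too large for int64
print(int(float(match_id)) == match_id)  # False: float64 rounds the id
print(int(str(match_id)) == match_id)    # True: a string keeps every digit
```

Keeping the ids as strings or Python objects avoids both problems, at the cost of a slightly larger dataframe.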


In [8]:
# extracting the match ids into an iterable list
matchIds = df_match['matchId'].tolist()

In [12]:
# Adapted Postman request: download the data for every match id in a single pass

matches = []

for matchId in matchIds:
    url = 'https://www.callofduty.com/api/papi-client/crm/cod/v2/title/mw/platform/battle/fullMatch/wz/{}/it'.format(matchId)
    r_ = requests.get(url=url)
#     time.sleep(random.randint(2, 5))  # optional pause between requests (needs `import random`)
    matches.append(r_)
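
The loop above fires 1000 requests back to back and keeps every response, including failed ones. One possible hardening is sketched below: it pauses between requests, sets a timeout, and records the ids that did not return HTTP 200. This is not the code that produced the data in this notebook. The function name `fetch_matches` and its parameters are hypothetical, and the HTTP getter is passed in as an argument so the sketch can be tried without network access.

```python
import random
import time

# Same endpoint as in the cell above, split across two lines for readability
URL_TEMPLATE = ("https://www.callofduty.com/api/papi-client/crm/cod/v2/"
                "title/mw/platform/battle/fullMatch/wz/{}/it")


def fetch_matches(match_ids, get, min_pause=2.0, max_pause=5.0,
                  timeout=30, sleep=time.sleep):
    """Download each match and return (responses, failed).

    `get` is any callable with the signature get(url, timeout=...) that
    returns an object with a `status_code` attribute, e.g. requests.get.
    """
    responses, failed = [], []
    for match_id in match_ids:
        url = URL_TEMPLATE.format(match_id)
        try:
            response = get(url, timeout=timeout)
        except Exception as exc:  # broad on purpose: `get` is injected
            failed.append((match_id, repr(exc)))
        else:
            if response.status_code == 200:
                responses.append(response)
            else:
                failed.append((match_id, response.status_code))
        # random pause so the server is not hit with a burst of requests
        sleep(random.uniform(min_pause, max_pause))
    return responses, failed
```

With `requests` imported as in the first cell, a call would look like `matches, failed = fetch_matches(matchIds, requests.get)`, and `failed` can then be retried or inspected separately.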

In [9]:
# matches is a plain Python list of responses, so slice it instead of calling .head()
matches[:5]

In [None]:
######DON'T LOAD#####

In [14]:
# each match is then converted into a readable dictionary format
duels = []
for match in matches:
    duel = json.loads(match.text)
    duels.append(duel)
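
Not every response necessarily holds usable match data, so it can help to validate each body while parsing it. The sketch below assumes the response shape shown in the output of this section (`status`, then `data` → `allPlayers`). The helper name `extract_players` and the error payload in the example are made up for illustration.

```python
import json


def extract_players(body: str) -> list:
    """Return the list of player records in a response body, or [] if unusable."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []  # body was not JSON at all (e.g. an HTML error page)
    if not isinstance(payload, dict) or payload.get("status") != "success":
        return []
    data = payload.get("data")
    if not isinstance(data, dict):
        return []
    players = data.get("allPlayers")
    return players if isinstance(players, list) else []


good = '{"status": "success", "data": {"allPlayers": [{"mode": "br_brduos"}]}}'
bad = '{"status": "error", "data": {"message": "not found"}}'  # hypothetical payload

print(len(extract_players(good)))  # 1
print(extract_players(bad))        # []
print(extract_players("<html>"))   # []
```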

In [15]:
duels[1]

{'status': 'success',
 'data': {'allPlayers': [{'utcStartSeconds': 1609796483,
    'utcEndSeconds': 1609798050,
    'map': 'mp_don3',
    'mode': 'br_brduos',
    'matchID': '13368494592966401607',
    'duration': 1567000,
    'playlistName': None,
    'version': 1,
    'gameType': 'wz',
    'playerCount': 151,
    'playerStats': {'kills': 1.0,
     'medalXp': 10.0,
     'objectiveTeamWiped': 1.0,
     'matchXp': 7069.0,
     'scoreXp': 3200.0,
     'wallBangs': 0.0,
     'score': 2900.0,
     'totalXp': 10279.0,
     'headshots': 0.0,
     'assists': 0.0,
     'challengeXp': 0.0,
     'rank': 30.0,
     'scorePerMinute': 242.3398328690808,
     'distanceTraveled': 211795.75,
     'teamSurvivalTime': 656688.0,
     'deaths': 2.0,
     'kdRatio': 0.5,
     'objectiveBrDownEnemyCircle1': 1.0,
     'objectiveBrMissionPickupTablet': 1.0,
     'bonusXp': 0.0,
     'objectiveBrKioskBuy': 1.0,
     'gulagDeaths': 1.0,
     'timePlayed': 718.0,
     'executions': 0.0,
     'gulagKills': 0.0,
 

In [22]:
# all the matches are then concatenated into one single large dataframe
# json_normalize flattens the nested dictionaries into a single dataframe (excluding player.loadout)
all_dfs = []
for i in tqdm.tqdm(range(len(duels))):
    try:
        all_dfs.append(pd.json_normalize(duels[i]['data']['allPlayers']))
    except (KeyError, TypeError):
        # the response carries no 'allPlayers' payload, so the match is skipped
        print(f'{i} is useless')
mega_df = pd.concat(all_dfs)

 50%|█████     | 505/1000 [01:36<00:53,  9.28it/s]

502 is useless


100%|██████████| 1000/1000 [02:55<00:00,  5.69it/s]
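
To make the flattening step concrete, here is a small sketch on made-up records shaped like the `allPlayers` entries. Nested keys such as `playerStats` → `kills` become dotted column names. The `ignore_index=True` argument is an addition that the cell above does not use: without it, the row labels restart at 0 for every match.

```python
import pandas as pd

# Two made-up "matches", each a list of player records
match_a = [{"matchID": "1", "playerStats": {"kills": 1.0, "deaths": 2.0}}]
match_b = [{"matchID": "2", "playerStats": {"kills": 3.0, "deaths": 0.0}},
           {"matchID": "2", "playerStats": {"kills": 0.0, "deaths": 1.0}}]

# One flat dataframe per match, then a single stacked dataframe
frames = [pd.json_normalize(players) for players in (match_a, match_b)]
combined = pd.concat(frames, ignore_index=True)

print(sorted(combined.columns))  # ['matchID', 'playerStats.deaths', 'playerStats.kills']
print(combined.shape)            # (3, 3)
```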


In [269]:
mega_df.to_csv('../../mega_df.csv')

In [39]:
mega_df.shape

(149009, 157)