# Getting match data from the RIOT games API
This notebook goes through the process of aggregating match data from the RIOT games API. Due to rate limiting and the convoluted method of obtaining match data, the process can take quite a while. Where possible, I have saved the intermediate results so that long steps do not need to be repeated. 

## Here is the workflow for obtaining the data:

- [X] Step 1. Obtain summoner IDs by looking up the summoner data from the first 100 pages of each tier and division 
(save summoner_ids to 'summoner_id_file')

- [X] Step 2. Use those summoner IDs to obtain the corresponding PUUIDs
(save puuids in a pickle file 'PUUIDs' )

- [X] Step 3. Use the PUUIDs to query the match history of those summoners, obtaining a list of match IDs
(save match IDs in a pickle file 'match_IDs' )

- [ ] Step 4. Use the match IDs to get the match data
(save match data in a pickle file 'match_data')
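
The four steps map onto four endpoints. As a rough sketch, the URL patterns used later in this notebook can be written as small helpers. The helper names are hypothetical and only illustrate the pattern; the paths are copied from the requests made below, and the API key still has to be appended as a query parameter as in the rest of the notebook.

```python
# Hypothetical helpers: the endpoint paths come from this notebook, the function names do not.

def league_entries_url(platform, tier, division, page=1):
    """Step 1: one page of ranked solo queue entries (each entry contains a summonerId)."""
    return (f"https://{platform}.api.riotgames.com/lol/league/v4/entries/"
            f"RANKED_SOLO_5x5/{tier}/{division}?page={page}")


def summoner_url(platform, summoner_id):
    """Step 2: the summoner record for one summoner id (contains the puuid)."""
    return f"https://{platform}.api.riotgames.com/lol/summoner/v4/summoners/{summoner_id}"


def match_ids_url(routing, puuid, count=100):
    """Step 3: the most recent match IDs for one puuid."""
    return (f"https://{routing}.api.riotgames.com/lol/match/v5/matches/"
            f"by-puuid/{puuid}/ids?start=0&count={count}")


def match_url(routing, match_id):
    """Step 4: the full match data for one match ID."""
    return f"https://{routing}.api.riotgames.com/lol/match/v5/matches/{match_id}"


print(league_entries_url("EUN1", "GOLD", "II", page=3))
print(match_url("europe", "EUN1_2949965728"))
```

Note that steps 1 and 2 use the platform host (here EUN1), while steps 3 and 4 use the regional host (here europe).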



In [1]:
"""
@author: Mark Bugden
August 2022

Part of a ML project in predicting win rates for League of Legends games based on team composition.
Current update available on GitHub: https://github.com/Mark-Bugden
"""

# Import anything necessary
import requests
import pandas as pd
from ratelimit import limits, sleep_and_retry
import pickle
import math


# This gives us a progress bar for longer computations. 
from tqdm.notebook import tqdm
# To use it, just wrap any iterable with tqdm(iterable).
# Eg: 
# for i in tqdm(range(100)):
#     ....




# We need to pick a region. 
region_list = ['BR1', 'EUN1', 'EUW1', 'JP1', 'KR', 'LA1', 'LA2', 'NA1', 'OC1', 'RU', 'TR1']
region = 'EUN1'


# Here are the tiers and divisions
tier_list = ['DIAMOND', 'PLATINUM', 'GOLD', 'SILVER', 'BRONZE', 'IRON']
division_list = ['I', 'II', 'III', 'IV']



# Load the data for the champions
champion_url = 'http://ddragon.leagueoflegends.com/cdn/12.14.1/data/en_US/champion.json'
r = requests.get(champion_url)
json_data = r.json()
champion_data = json_data['data']

#champions = list(champion_data.keys())
#champion_data['Zyra']

### Note: 
The API rates are meant to be 20/1s and 100/120s, but I have found that I get errors when I set the ratelimit to exactly that. I have found that I don't get any errors when I set it at half the rate, which works for now, but doubles the time required to get the data. I should try it again at slightly over half the rate, to see if I get an error. If I do, then I am probably accidentally accessing the API twice per call instead of once.

### Update
Setting it at 5/7s seems to be a good compromise. 

### Update
My new API rate limit is 50/10, so to be safe I will set it at 8/2 (which is 40/10).
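
To compare the settings mentioned in these notes, it helps to convert each one to calls per second. This is only a small arithmetic sketch (the function name is made up), but it shows that the 100/120s limit is the binding one for long runs, and that 8/2s sits below the new 50/10s limit.

```python
def calls_per_second(calls, period_s):
    """Sustained request rate for a limit of `calls` per `period_s` seconds."""
    return calls / period_s


settings = {
    "20 per 1s (stated burst limit)": (20, 1),
    "100 per 120s (stated sustained limit)": (100, 120),
    "5 per 7s (first compromise)": (5, 7),
    "50 per 10s (new limit)": (50, 10),
    "8 per 2s (setting used below)": (8, 2),
}

for label, (calls, period_s) in settings.items():
    print(f"{label}: {calls_per_second(calls, period_s):.2f} calls/s")
```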


In [2]:
# Some useful functions


def unique(l):
    ''' Removes duplicate entries from a list (the original order is not preserved)
    
    Parameters
    ----------
        l: list
            A list, possibly containing duplicates
    
    Returns
    -------
        list
            The unique elements of l
    '''
    return list(set(l))

def flatten(l):
    ''' Flattens a list
    
    Parameters
    ----------
        l:list
            A list to be flattened
    
    Returns
    -------
        list
            The flattened list
    
    '''
    return [item for sublist in l for item in sublist]


def chunks(lst, n):
    ''' Splits a list into n pieces of (almost) equal length
    
    Parameters
    ----------
        lst: list
            A list to be split into chunks
        n: int
            The number of chunks
    
    Returns
    -------
        generator
            A generator yielding the n chunks in order
    '''
    n = max(1, n)
    k, m = divmod(len(lst), n)
    return (lst[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))


# We will need an API key to access the Riot games API. I have one of these, but I don't want it to be publicly available on my GitHub, so I am storing it locally in a text file. 

def getAPI_key():
    ''' Accesses my locally stored API key so that I don't have to include it publicly on GitHub
    
    Returns
    -------
        string:
            My API key for RIOT games

    '''
    # Close the file after reading, and strip any trailing newline so it doesn't end up in the URL
    with open("api_key.txt", "r") as f:
        return f.read().strip()



# Our API calls are rate limited, so we use the ratelimit package to limit how often we call the API.
# If the rate limit is reached, the program will sleep until it can try again.
# With my earlier key (100 calls every 2 minutes, or 20 every second) I used 5 calls per 7s: it should have been 5/6s, but for some reason that gave me Error:429.
# With my current key (50 calls per 10s) we use 8 calls per 2s, i.e. 40 per 10s, to be safe.
# This will be slower for short queries, but won't give us errors on long ones.

@sleep_and_retry
@limits(calls=8, period=2)
def callAPI(url):
    ''' Send and retrieve API requests, rate limited to the RIOT games API rate limit. 
    
    Parameters
    ----------
        url: string
            The URL of the request you are making. 

    Returns
    -------
        list or dict
            The decoded JSON response (a list of dictionaries or a single dictionary, depending on the endpoint). 
    '''
    r = requests.get(url)
    if r.status_code != 200:
        raise Exception('API response: {}'.format(r.status_code))
        
    return r.json()


# If I am getting a 401 error, I probably just need to refresh my API key from the developer website
# If I am getting a 404 error, there is a problem with that particular entry. 



def get_summoner_ids(page=1):
    '''
    Aggregates a list of summoner ids from the given page of every tier and division in tier_list and division_list.
    
    Parameters
    ----------
        page: int
            Which page is queried for the summoner info
    
    Returns
    -------
        list
            A list of summoner ids
    
    '''
    summoners = []

    # For every tier from Iron to Diamond, and for every division from I to IV, send a request to get the given page of summoners for that tier and division.
    for tier in tqdm(tier_list):
        for division in division_list:

            url = 'https://' + region + '.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/' + tier + '/' + division + '?page=' + str(page) + '&api_key='

            # Here json_data is a list. Each item in the list corresponds to one summoner, and is a dict whose key/value pairs contain information about that summoner.
            json_data = callAPI(url + getAPI_key())
            

            for item in json_data:
                summoners.append(item)

    summoners_df = pd.DataFrame(summoners)

    return summoners_df['summonerId'].tolist()





def get_puuids(ids):
    ''' Takes a list of summoner ids and queries the RIOT API for their puuids
    
    Parameters
    ----------
        ids: list
            A list of summoner ids
        
    Returns
    -------
        tuple
            A list of the corresponding puuids, and a DataFrame containing the full summoner data
    '''
    
    summoner_info = []

    for summoner in tqdm(ids):
        url = 'https://' + region + '.api.riotgames.com/lol/summoner/v4/summoners/' + summoner + '?api_key='
        try:
            json_data = callAPI(url + getAPI_key())
            summoner_info.append(json_data)
        except Exception as e:
            print(e)
            
    df_summ = pd.DataFrame(summoner_info)
    
    return df_summ['puuid'].tolist(), df_summ





def get_match_ids(puuids, n = 100):
    ''' Takes a list of puuids and returns the match IDs for the previous n matches. Any duplicate match IDs are removed. 
    
    Parameters
    ----------
        puuids: list
            A list of puuids to query
        n: int 
            The number of matches to get per puuid
        
    Returns
    -------
        list
            A list of match ids
    '''
    
    match_id = []
    
    for puuid in tqdm(puuids):
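        # Pass n through as the number of matches requested per puuid (it was previously hard-coded to 100)
        url = 'https://europe.api.riotgames.com/lol/match/v5/matches/by-puuid/' + puuid + '/ids?start=0&count=' + str(n) + '&api_key='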
        try:
            json_data = callAPI(url + getAPI_key())
            match_id.append(json_data)
        except Exception as e:
            print(e)

    return list(set(flatten(match_id)))






def get_match_data(batch):
    ''' Accesses the match data for a given batch of match ids and returns the data as a list. If a request fails (for example with a 404 error), the error is printed and that entry is skipped
    
    Parameters
    ----------
        batch: list
            A list of match ids 
            
    Returns
    -------
        list
            A list containing the match data for each of the match ids in batch
            
    '''
    data_list = []
    
    for match in tqdm(batch):
        url = 'https://europe.api.riotgames.com/lol/match/v5/matches/'+ match + '?api_key='
        try:
            json_data = callAPI(url + getAPI_key())
            data_list.append(json_data)
        except Exception as e:
            print(e)
    return data_list
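
As written, `callAPI` raises on any non-200 status, so a 429 simply loses that entry. One possible alternative, sketched below, is to wait and retry on a 429. This is a sketch under assumptions rather than what the notebook does: `get_with_retry` is a hypothetical helper, and it assumes the 429 response carries a `Retry-After` header giving a number of seconds (falling back to `default_wait` if it does not).

```python
import time


def get_with_retry(fetch, url, max_retries=3, default_wait=1.0, sleep=time.sleep):
    """Call fetch(url), waiting and retrying when the status code is 429.

    fetch is any callable returning an object with .status_code, .headers
    and .json(), for example requests.get.
    """
    for attempt in range(max_retries + 1):
        response = fetch(url)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429 and attempt < max_retries:
            # Assumption: Retry-After, when present, is a number of seconds.
            wait = float(response.headers.get("Retry-After", default_wait))
            sleep(wait)
            continue
        # Any other status, or a 429 after the last retry, is reported as before.
        raise Exception('API response: {}'.format(response.status_code))


# Possible use inside callAPI (not run here):
# json_data = get_with_retry(requests.get, url + getAPI_key())
```

Passing `fetch` and `sleep` as arguments keeps the helper easy to try out without touching the network.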

# Getting the Summoner IDs

In [3]:
# Getting the summoner ids is as easy as calling the get_summoner_ids function 
# We will get the first, say 100 pages of each division
#summoner_ids = []
#for page in range(1,101):
#    print(page)
#    summoner_ids = summoner_ids + get_summoner_ids(page)

In [4]:
# Make sure we don't have any duplicates
#summoner_ids = unique(summoner_ids)

In [5]:
# Save them to a file because it took so long to get
#with open("Step 1 summoner ids/summoner_id_file", "wb") as fp:   #Pickling
#    pickle.dump(summoner_ids, fp)

In [6]:
# Load the summoner ids
with open("Step 1 summoner ids/summoner_id_file", "rb") as fp:   # Unpickling
    summoner_ids = pickle.load(fp)

In [7]:
len(summoner_ids)

417256

In [8]:
# Since this is such a large number, the next step is going to take a long time. At 4 calls per second, 417256/4 ~ 104300s ~ 29 hours. We will need to split it up and do it in batches. 
Numsplits = 8 
spl = math.floor(len(summoner_ids)/Numsplits)

summoner_id_batches = {}
for i in range(0, Numsplits):
    summoner_id_batches[f"batch{i}"] = summoner_ids[i*spl:(i+1)*spl]
    


In [9]:
print(summoner_id_batches.keys())
print('length of batch0: ', len(summoner_id_batches['batch0']))

dict_keys(['batch0', 'batch1', 'batch2', 'batch3', 'batch4', 'batch5', 'batch6', 'batch7'])
length of batch0:  52157
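
One thing to watch with this kind of split: slicing with `floor(len/Numsplits)` silently drops the last few items whenever the length is not divisible by the number of batches. Here 417256 is divisible by 8, so nothing is lost, but the sketch below shows the difference on a toy list. `floor_split` is a hypothetical name for the slicing used above, and `chunks` repeats the helper defined earlier so that the block runs on its own.

```python
import math


def floor_split(items, n):
    """Split as in the cell above: n slices of floor(len/n) items each."""
    size = math.floor(len(items) / n)
    return [items[i * size:(i + 1) * size] for i in range(n)]


def chunks(lst, n):
    """Same logic as the chunks helper defined earlier in this notebook."""
    n = max(1, n)
    k, m = divmod(len(lst), n)
    return (lst[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))


ids = list(range(10))
print([len(b) for b in floor_split(ids, 4)])  # [2, 2, 2, 2], the last two items are dropped
print([len(b) for b in chunks(ids, 4)])       # [3, 3, 2, 2], nothing is dropped
print(417256 % 8)                             # 0, so the split above loses nothing
```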


# Getting the PUUIDs

In [10]:
# get_puuids takes approximately 3.6 hours on each summoner_id_batches['batchX']

In [11]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch0'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch0", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch0", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)

In [12]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch1'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch1", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch1", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)

In [13]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch2'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch2", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch2", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)

In [14]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch3'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch3", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch3", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)
    
    
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch4'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch4", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch4", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)
    

In [15]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch5'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch5", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch5", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)

In [16]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch6'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch6", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch6", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)

In [17]:
#summoner_puuids, summoner_data = get_puuids(summoner_id_batches['batch7'])

# Save this to pickle file since it took a long time to get
#with open("Step 2 puuids/puuid_file_batch7", "wb") as fp:   #Pickling
#    pickle.dump(summoner_puuids, fp)

#with open("summoner data/summoner_data_file_batch7", "wb") as fp:   #Pickling
#    pickle.dump(summoner_data, fp)

The puuid_file_batchX files are lists saved as pickle files. Each element of the list is a puuid. To unpickle, we use
```
with open(str("Step 2 puuids/puuid_file_batch0"), "rb") as fp:   # Unpickling
    summoner_puuids = pickle.load(fp) 
```


The summoner_data_file_batchX files are pandas DataFrames saved as pickle files (the second value returned by get_puuids). To unpickle them, we use
```  
with open(str("summoner data/summoner_data_file_batch0"), "rb") as fp:   # Unpickling
    summoner_data = pickle.load(fp) 
```
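
Steps 2 to 4 all follow the same pattern: fetch one batch, then pickle the result. A loop along the following lines could stand in for the per-batch cells. It is a sketch, not the code that produced the files: `run_batches` is a hypothetical helper, and it skips any batch whose output file already exists so that an interrupted session can be resumed.

```python
import os
import pickle


def run_batches(batches, fetch, out_dir, prefix):
    """Fetch each batch and pickle the result, skipping batches already on disk.

    batches: dict mapping a batch name (e.g. 'batch0') to a list of ids
    fetch:   callable taking that list and returning the data to save
    Returns the names of the batches that were fetched in this call.
    """
    os.makedirs(out_dir, exist_ok=True)
    fetched = []
    for name, items in batches.items():
        path = os.path.join(out_dir, prefix + name)
        if os.path.exists(path):
            continue  # saved in an earlier session
        result = fetch(items)
        with open(path, "wb") as fp:
            pickle.dump(result, fp)
        fetched.append(name)
    return fetched


# Possible use for Step 2 (not run here); get_puuids returns (puuids, dataframe):
# run_batches(summoner_id_batches, lambda ids: get_puuids(ids)[0], "Step 2 puuids", "puuid_file_")
```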

In [18]:
# Load all the puuid batch files as a dictionary
puuid_batches = {}
for i in range(0, 8):
    with open(str("Step 2 puuids/puuid_file_batch"+str(i)), "rb") as fp:   # Unpickling
        puuid_batches["batch{0}".format(i)] = pickle.load(fp)
        
# Load all the summoner_data batch files and combine them into a big dataframe
summoner_data_batches = {}
for i in range(0, 8):
    with open(str("summoner data/summoner_data_file_batch"+str(i)), "rb") as fp:   # Unpickling
        summoner_data_batches["batch{0}".format(i)] = pickle.load(fp)
summoner_data = pd.concat(summoner_data_batches, ignore_index=True)
        

In [19]:
# All of the summoner information is contained in the dataframe summoner_data
summoner_data.head()

| | id | accountId | puuid | name | profileIconId | revisionDate | summonerLevel |
|---|---|---|---|---|---|---|---|
| 0 | 2TQCTFCJE1PI5WHMG2Qxjh_Vo_Ju8eQ2OarNradfNQ40ZHQ | E9F0b7xP56qjwXgmrPJWZdpie3Oe6NRCQGzvyePU9nayTA | HTUDJux_oQMU6ghOg1LS8TvaQnzDHZdMTLArzr7iEzcBAi... | Atesh | 506 | 1662566903000 | 92 |
| 1 | YNRElnuhhS-iWtIVh0zlD_yIYSFK4bHnuGjjz-BEZrU3onA | FFRvjFILjiYVKsIaTwzQCn7NSsG50qD6RGnE1WGpG83V5Q | Bsa_8nMih4YPYLiJCoL09PnFG_RlBcMX2iPYdcehw3_rGh... | Guzoll | 682 | 1666451932000 | 365 |
| 2 | SxDNYSfSqa_smQo1lIL0MpEppXHWiTBP1jyWIDTCEHAGK5... | KR0caZgtxgM8_Jsi0tq2vPMdroFcFGe1GJtrPpmMDGTYpp... | OVGe8En9cUxBp5JHf9_rSeksKG1ZCIIITokVXtGqZZns4L... | Fatsou | 1298 | 1666218411000 | 293 |
| 3 | 88aTHNaiQyG2gqLkuWwWTp-_aY7Q5oIbKDr4MK9te5Mcqpk | TkmbW74370Y8fwdafrqQf6lxK8fAtURb-jqg8EGkP0AxvAs | tAZdTPY_CkIosKcLBFiOWA_1YsTsDp5yQJYpOlERJR4CvP... | Szefciębson 2115 | 5413 | 1666480443122 | 489 |
| 4 | vp89cmGE32oYZcPJHayg8ijk4u96mShP0PumrMjtOmzhUNY | 150FjKZZGdt1G5RO5wnaCzkEG0R_UeLow1HI1o5sIQeXBg | u3QcsJ56LtPuebqyXFK3nKqyrpBTPxQmUTSxWDGDEoaNBm... | marimitsous7 | 7 | 1652646921000 | 172 |


In [20]:
# The puuids are now stored in a dictionary puuid_batches. The keys are
print(puuid_batches.keys())

dict_keys(['batch0', 'batch1', 'batch2', 'batch3', 'batch4', 'batch5', 'batch6', 'batch7'])


In [21]:
# The first 10 elements of the first batch are
print(puuid_batches["batch0"][:10])

['HTUDJux_oQMU6ghOg1LS8TvaQnzDHZdMTLArzr7iEzcBAiUdKyknzRuQhi06lgGkazYhUkLggSbpnw', 'Bsa_8nMih4YPYLiJCoL09PnFG_RlBcMX2iPYdcehw3_rGhkXt2P0P7TAxP3SGFtIE1HQ_feo9ZD-OA', 'OVGe8En9cUxBp5JHf9_rSeksKG1ZCIIITokVXtGqZZns4L2BmF3vfnSU-AUK1HvlMsZjh5V8p1ewsg', 'tAZdTPY_CkIosKcLBFiOWA_1YsTsDp5yQJYpOlERJR4CvPCZPdBznLXXBeVnrua7G4qLAVrtu3NWhg', 'u3QcsJ56LtPuebqyXFK3nKqyrpBTPxQmUTSxWDGDEoaNBma-l5XtBi4ek6_H9lAobEWqjmndK9OyVQ', 'bGPVaqe_Qz3rdcdlIm0RLMziqtrnw9Kq1heLT5OehAdTKicj65-f3c9XYW7CwLOOByk-Y1yk7lSiUw', 'Vy-EqEEU6ztNQL17IhUzhzf52ua9crMm5va9GCYrrPJJKtfIfGzyyEiF8z7B_2WeqvtwusveOfU46A', '0Jz41qAiK1a0ZD5TR9pajNlo9CJmGIG6vCtqwq64f5o0ne3YUuUEfODkRwFCNXp1REnL-2je_Un7pw', 'bPe2HKorRPMuBsUQIjPu2EDCPWIKq7XXAdLOkAv8g7eeuG6GSxyK-D3Zo_d_1PNdpM1sEG1I4k7Plg', '0kndBNuhuKcvp6N8tHCdHRlQobXkt0wBm1UUXFwzamfYeolwRgy4rMTQr3UOmggfo4HpTfI4q87J6Q']


# Getting the Match IDs

In [22]:
# Now that we have a list of puuids, we can query their match histories. We will also have to do this in batches. 
# Since we will be doing one query per puuid, it makes sense to use the batches we already have.

#match_ids = get_match_ids(puuid_batches['batch0'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch0", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)

In [23]:
##################################################################################
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch1'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch1", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)

In [24]:
##################################################################################
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch2'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch2", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)
    
##################################################################################
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch3'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch3", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)
    
##################################################################################
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch4'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch4", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)
    
##################################################################################
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch5'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch5", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)
    
##################################################################################
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch6'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch6", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)
    
##################################################################################

In [25]:
# Approximately 3 hours


#match_ids = get_match_ids(puuid_batches['batch7'])

# Save this to pickle file since it took a long time to get
#with open("Step 3 match ids/match_id_file_batch7", "wb") as fp:   #Pickling
#    pickle.dump(match_ids, fp)
    
    

In [26]:
# Load all the match_id batch files as a dictionary, making sure we only take unique values

match_id_batches = {}
for i in range(0, 8):
    with open(str("Step 3 match ids/match_id_file_batch"+str(i)), "rb") as fp:   # Unpickling
        match_id_batches["batch{0}".format(i)] = pickle.load(fp)
        match_id_batches['batch{0}'.format(i)] = unique(match_id_batches['batch{0}'.format(i)])

In [27]:
# The match ids are now stored in a dictionary match_id_batches. The keys are
print(match_id_batches.keys())

dict_keys(['batch0', 'batch1', 'batch2', 'batch3', 'batch4', 'batch5', 'batch6', 'batch7'])


In [28]:
# The first ten elements of match_id_batches['batch0'] are:
print(match_id_batches['batch0'][:10])

['EUN1_2949965728', 'EUN1_3198243338', 'EUN1_3231154689', 'EUN1_3205390161', 'EUN1_3196810490', 'EUN1_3206651913', 'EUN1_3164880211', 'EUN1_3228276362', 'EUN1_2873470913', 'EUN1_3207530104']


In [29]:
# How many match ids do we have?
match_id_count = 0
for i in range(0,8):
    match_id_count += len(match_id_batches['batch{0}'.format(i)])
print(f'The number of match ids is {match_id_count/1000000} million')

The number of match ids is 38.653157 million


In [30]:
# Let us suppose that I want to get match data over a period of 1 day. How much match data can I get? The rate limit I have is 8/2s.
print(f'The number of matches for which I can obtain match data is about {60*60*24*(8/2)} per day')

The number of matches for which I can obtain match data is about 345600.0 per day


In [31]:
# This means that I can get a million matches if I collect over three days. That's enough to start with. It is about a fifth of one batch.
# This also means that it would take about 112 days (38653157 / 345600) to get all the match data.
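
The arithmetic behind these estimates, as a small sketch. The function name is made up; the numbers are the match ID count and the 8 calls per 2s rate limit from the cells above, assuming exactly one request per match and continuous running.

```python
def days_to_fetch(n_requests, calls, period_s):
    """Days of continuous running needed for n_requests at `calls` per `period_s` seconds."""
    requests_per_day = 60 * 60 * 24 * calls / period_s
    return n_requests / requests_per_day


print(round(days_to_fetch(38_653_157, 8, 2), 1))  # all match IDs: 111.8 days
print(round(days_to_fetch(1_000_000, 8, 2), 1))   # one million matches: 2.9 days
```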


In [32]:
# We have the match IDs, but now we need the match data. Unfortunately, this is going to take a LONG time to get due to rate limiting, so we will be doing it in mini batches.
# We will begin by splitting each match_id_batches into 100 mini batches.

match_id_minibatches = {}
for i in range(0,8):
    match_id_minibatches['batch{0}'.format(i)] = list(chunks(match_id_batches['batch{0}'.format(i)], 100))


In [33]:
# Great, we now have a big list of match IDs split up into batches and minibatches. 
# Each minibatch should take about 3.5 hours to run. Let's test on one.

#match_data = get_match_data(match_id_minibatches['batch0'][0])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch0", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

In [34]:

#match_data = get_match_data(match_id_minibatches['batch0'][1])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch1", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

In [35]:

#match_data = get_match_data(match_id_minibatches['batch0'][2])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch2", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)
    
#match_data = get_match_data(match_id_minibatches['batch0'][3])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch3", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

In [36]:

#match_data = get_match_data(match_id_minibatches['batch0'][4])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch4", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

In [37]:
  
#match_data = get_match_data(match_id_minibatches['batch0'][5])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch5", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

In [38]:
  
#match_data = get_match_data(match_id_minibatches['batch0'][6])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch6", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

In [39]:
  
#match_data = get_match_data(match_id_minibatches['batch0'][7])

# Save this to pickle file since it took a long time to get
#with open("Step 4 match data/batch0/minibatch7", "wb") as fp:   #Pickling
#    pickle.dump(match_data, fp)

## That's probably enough for now

In [40]:
# This loading is quite slow since the games have a lot of information and there are a lot of games.

ranked_matches = []

for i in tqdm(range(0, 8)):
    with open("Step 4 match data/batch0/minibatch" + str(i), "rb") as fp:   # Unpickling
        match_data = pickle.load(fp)

    batch_ranked_matches = []
    for match in tqdm(match_data):
        info = match['info']
        # Keep only matches with queueId 420 (ranked solo queue) and gameDuration >= 900
        if info['gameDuration'] < 900 or info['queueId'] != 420:
            continue
        # One row per participant
        for participant in info['participants']:
            row_dict = {k: participant[k] for k in ('win', 'championName', 'teamId', 'summonerName')}
            row_dict['team'] = 'Blue' if row_dict['teamId'] == 100 else 'Red'
            row_dict['matchId'] = match['metadata']['matchId']
            row_dict['gameMode'] = info['queueId']  # Note: this column holds the queue ID
            batch_ranked_matches.append(row_dict)

    ranked_matches.extend(batch_ranked_matches)

rankeddf = pd.DataFrame(ranked_matches)
rankeddf = rankeddf.drop(columns=['teamId'])
rankeddf = rankeddf.set_index(['matchId', 'team'])
        
    


In [41]:
len(ranked_matches)

1710440

In [None]:
# We have 1710440 rows, which is approximately 171000 matches (10 players per match) to work with.

In [42]:
rankeddf.head(30)

| matchId | team | win | championName | summonerName | gameMode |
|---|---|---|---|---|---|
| EUN1_3205133592 | Blue | False | TahmKench | mariogrzyb321 | 420 |
| EUN1_3205133592 | Blue | False | Elise | xBakuu | 420 |
| EUN1_3205133592 | Blue | False | Azir | Vecrone | 420 |
| EUN1_3205133592 | Blue | False | Jhin | UNfriendlyEwok | 420 |
| EUN1_3205133592 | Blue | False | Yuumi | metrosexual | 420 |
| EUN1_3205133592 | Red | True | Garen | DEMACI4 | 420 |
| EUN1_3205133592 | Red | True | Kayn | zombieldtv | 420 |
| EUN1_3205133592 | Red | True | Cassiopeia | whimsical 11 19 | 420 |
| EUN1_3205133592 | Red | True | Varus | ΞΞ ZielU ΞΞ | 420 |
| EUN1_3205133592 | Red | True | Xerath | savo2kk | 420 |


In [43]:
rankeddf.info()

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1710440 entries, ('EUN1_3205133592', 'Blue') to ('EUN1_3191594151', 'Red')
Data columns (total 4 columns):
 #   Column        Dtype 
---  ------        ----- 
 0   win           bool  
 1   championName  object
 2   summonerName  object
 3   gameMode      int64 
dtypes: bool(1), int64(1), object(2)
memory usage: 54.2+ MB
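
Before saving, it may be worth checking that every match really contributes ten rows: five per team, with exactly one winning team. The sketch below works on a list of row dictionaries with the keys `matchId`, `team` and `win`, which is the shape of `ranked_matches` above. `check_matches` is a hypothetical helper and the sample match IDs are made up.

```python
from collections import defaultdict


def check_matches(rows):
    """Count distinct matches and list those without 5 players per team and one winning team."""
    teams = defaultdict(lambda: defaultdict(list))
    for row in rows:
        teams[row["matchId"]][row["team"]].append(row["win"])

    bad = []
    for match_id, by_team in teams.items():
        blue, red = by_team["Blue"], by_team["Red"]
        ok = (len(blue) == 5 and len(red) == 5            # five players on each side
              and len(set(blue)) == 1 and len(set(red)) == 1  # teammates share one result
              and blue[0] != red[0])                      # exactly one team won
        if not ok:
            bad.append(match_id)
    return len(teams), bad


# Toy data: one complete match and one incomplete match
sample = []
for team, win in (("Blue", False), ("Red", True)):
    for _ in range(5):
        sample.append({"matchId": "EUN1_1", "team": team, "win": win})
sample.append({"matchId": "EUN1_2", "team": "Blue", "win": True})

n_matches, bad = check_matches(sample)
print(n_matches, bad)  # 2 ['EUN1_2']
```

Calling `check_matches(ranked_matches)` would also give the number of distinct matches directly, rather than inferring it from the row count.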


In [44]:
# Seems like we have a nice pool of ~171000 ranked games (1710440 rows, 10 per game) to begin with. Not bad. 

In [45]:
# Let's save this as a csv which we will import in the next notebook.
rankeddf.to_csv('ranked_matches.csv')