# Data Scraping

In this notebook we will be using the Riot Games API to scrape data.

The Riot Games API has a limit of 100 queries every 2 minutes, so we will be keeping track of how many queries we execute so that we know when we have to wait 2 minutes before executing any more queries.

In [1]:
import requests
import time
import pandas as pd

Here we declare our variables, simply change them depending on the user, region, season you want to use the predictor on. 

In our case we used a summoner named Platinum Rule on the EUW (Europe West) server for season 13. 
To generate an API key you will need a Riot Games account. Once you have an acccount you can get a key here: https://developer.riotgames.com/docs/portal

In [2]:
API_KEY = "RGAPI-c766cad8-81f2-4531-9194-70e2cf8c1bad"
summoner_name = "Platinum%20Rule"
region = "euw1"
season = 13
query_count = 0

# Collecting Player Data

We will now create functions that will be used to get unique IDs for the summoner. This will later be used to scraping game data specific to that summoner. 

In [3]:
def get_puuid(summoner_name, region, API_KEY):
    URL = "https://{}.api.riotgames.com/lol/summoner/v4/summoners/by-name/{}?api_key={}".format(region, summoner_name, API_KEY)
    response = requests.get(URL).json()
    puuid = response.get("puuid")
    
    global query_count
    query_count += 1
    
    return puuid

In [4]:
def get_aid(puuid, region, API_KEY):
    URL = "https://{}.api.riotgames.com/lol/summoner/v4/summoners/by-name/{}?api_key={}".format(region, summoner_name, API_KEY)
    response = requests.get(URL).json()
    aid = response.get("accountId")
        
    global query_count
    query_count += 1
    
    return aid

In [5]:
puuid = get_puuid(summoner_name, region, API_KEY)
aid = get_aid(summoner_name, region, API_KEY)

# Setting up our functions for game data collection

First we will need to collect the game ids of all the games our target player (Platinum Rule in this case) played in our target season (season 13). 

The Riot Games API limits the amount of games it returns per query to 100. So we will be using the total amount of games played by the account to re-execute the query adjusting our begin_index and end_index, that way we can collect data about all the games our player played. By the end we want to have a list of all the game IDs of the games the player played. We Will then use that list to iterate through all the player's games and collect data from each game.

In [6]:
total_games = requests.get("https://{}.api.riotgames.com/lol/match/v4/matchlists/by-account/{}?season={}&endIndex=1&beginIndex=0&api_key={}".format(region, aid, season, API_KEY)).json().get("totalGames")
query_count += 1

In [7]:
def get_gameids(region, aid, season, end_index, begin_index, API_KEY):
    URL = "https://{}.api.riotgames.com/lol/match/v4/matchlists/by-account/{}?season={}&endIndex={}&beginIndex={}&api_key={}".format(region, aid, season, end_index, begin_index, API_KEY)
    response = requests.get(URL).json()
    gameids = []
    for i in response.get("matches"):
        gameids.append(i.get("gameId"))
            
    global query_count
    query_count += 1
    
    return gameids

In [8]:
gameids = []
while total_games > 0:
    end_index = total_games
    if (total_games - 90) > 0:
        begin_index = total_games - 90
        total_games -= 90
    else:
        begin_index = 0
        total_games = 0
        
    gameids += get_gameids(region, aid, season, end_index, begin_index, API_KEY)

Great now that we have a list of all the game ids (gamedids), we can begin to collect more detailed data about what exactly happened each game. We will then use this data to make predictions about game outcomes.

Keep in mind our plan is to create a model that can predict whether a player is likely to win or lose the game before the game actually begins, so the data we collect should be data that's available before the game starts. 

In [9]:
def get_game(region, gid, API_KEY):
    URL = "https://{}.api.riotgames.com/lol/match/v4/matches/{}?api_key={}".format(region, gid, API_KEY)
    response = requests.get(URL).json()
                
    global query_count
    query_count += 1
    
    return response

In League Of Legends each team can ban 5 champions before the game begins, let's collect this data. 

In [10]:
def get_bans(game):
    team100 = game.get("teams")[0]
    team200 = game.get("teams")[1]
    bans = {}
    
    for i in team100.get("bans"):
        bans["team100_ban_{}".format(i.get("pickTurn"))] = i.get("championId")
        
    for i in team200.get("bans"):
        bans["team200_ban_{}".format(i.get("pickTurn"))] = i.get("championId")
        
    return bans

In League Of Legends each player can select something called "Runes" these are certain ability or upgrades that a player selects before starting a game.

In [11]:
def get_participant(game, number):
    participant = game.get("participants")[number]
    participant_stats = participant.get("stats")
    data = {"teamId_participant{}".format(number):participant.get("teamId"), "championId_participant{}".format(number):participant.get("championId"), 
           "spell1Id_participant{}".format(number):participant.get("spell1Id"), "spell2Id_participant{}".format(number):participant.get("spell2Id"),
           "role_participant{}".format(number):participant.get("DUO_SUPPORT"), "lane_participant{}".format(number):participant.get("BOTTOM"), 
           "perk0_participant{}".format(number):participant_stats.get("perk0"), "perk1_participant{}".format(number):participant_stats.get("perk1"), 
           "perk2_participant{}".format(number):participant_stats.get("perk2"), "perk3_participant{}".format(number):participant_stats.get("perk3"), 
           "perk4_participant{}".format(number):participant_stats.get("perk4"), "perkPrimaryStyle_participant{}".format(number):participant_stats.get("perkPrimaryStyle"), 
           "perkSubStyle_participant{}".format(number):participant_stats.get("perkSubStyle"), "statPerk0_participant{}".format(number):participant_stats.get("statPerk0"),
           "statPerk1_participant{}".format(number):participant_stats.get("statPerk1"), "statPerk2_participant{}".format(number):participant_stats.get("statPerk2")}
    return data

Finally, we also want to collect the following:

   - The game type    
   - The season (although this should not be relevant as we only collect data regarding a certain season (season 13 in our case))    
   - The game version (which patch the game was played on)   
   - The map    
   - The game mode

And last but not least whether the game was won or not (this will allow us to train a supervised learning algorithm). 

In [12]:
def get_game_data(game):
    game = get_game(region, game, API_KEY)
    
    #Checking for RESPONSE ERRORS
    if len(game) <= 2:
        print(game)
        get_game_data(game.get("gameId")) 
    else:
        data = {}
        data["gameId"] = game.get("gameId")
        data["gameType"] = game.get("gameType")
        data["seasonId"] = game.get("seasonId")
        data["gameVersion"] = game.get("gameVersion")
        data["mapId"] = game.get("mapId")
        data["gameMode"] = game.get("gameMode")

        data = data | get_bans(game)

        for i in range(10):
            data = data | get_participant(game, i)

        if game.get("teams")[0].get("win") == "Win":
            data["win"] = 1
        else:
            data["win"] = 0
    return data

# Collecting the game data

Now we can begin to use our functions to collect all the data from all the matches our player played.

We will loop through each game individually and add it to a list (data). We also check after each query execution if our query count exceeded a multiple of 100, if so the loop sleeps for 2 minutes (this is to avoid the query threshold mentioned earlier).

In [13]:
data = []

In [14]:
for i in gameids:
    data.append(get_game_data(i))
    if (query_count % 100) == 0:
        time.sleep(121)

In [15]:
len(data)

183

# Glancing at our data and saving it to a CSV

Great now that we have our data we can take a quick look just to make sure everything is fine and save it as a CSV.

We will explore the data we collected in the EDA notebook. 

In [16]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [17]:
df = pd.json_normalize(data)

In [18]:
df.head()

Unnamed: 0,gameId,gameType,seasonId,gameVersion,mapId,gameMode,team100_ban_1,team100_ban_2,team100_ban_3,team100_ban_4,team100_ban_5,team200_ban_6,team200_ban_7,team200_ban_8,team200_ban_9,team200_ban_10,teamId_participant0,championId_participant0,spell1Id_participant0,spell2Id_participant0,role_participant0,lane_participant0,perk0_participant0,perk1_participant0,perk2_participant0,perk3_participant0,perk4_participant0,perkPrimaryStyle_participant0,perkSubStyle_participant0,statPerk0_participant0,statPerk1_participant0,statPerk2_participant0,teamId_participant1,championId_participant1,spell1Id_participant1,spell2Id_participant1,role_participant1,lane_participant1,perk0_participant1,perk1_participant1,perk2_participant1,perk3_participant1,perk4_participant1,perkPrimaryStyle_participant1,perkSubStyle_participant1,statPerk0_participant1,statPerk1_participant1,statPerk2_participant1,teamId_participant2,championId_participant2,spell1Id_participant2,spell2Id_participant2,role_participant2,lane_participant2,perk0_participant2,perk1_participant2,perk2_participant2,perk3_participant2,perk4_participant2,perkPrimaryStyle_participant2,perkSubStyle_participant2,statPerk0_participant2,statPerk1_participant2,statPerk2_participant2,teamId_participant3,championId_participant3,spell1Id_participant3,spell2Id_participant3,role_participant3,lane_participant3,perk0_participant3,perk1_participant3,perk2_participant3,perk3_participant3,perk4_participant3,perkPrimaryStyle_participant3,perkSubStyle_participant3,statPerk0_participant3,statPerk1_participant3,statPerk2_participant3,teamId_participant4,championId_participant4,spell1Id_participant4,spell2Id_participant4,role_participant4,lane_participant4,perk0_participant4,perk1_participant4,perk2_participant4,perk3_participant4,perk4_participant4,perkPrimaryStyle_participant4,perkSubStyle_participant4,statPerk0_participant4,statPerk1_participant4,statPerk2_participant4,teamId_participant5,championId_participant5,spell1Id_participant5,spell2Id_participant5,role_participant5,lane_participant5,perk0_participant5,perk1_participant5,perk2_participant5,perk3_participant5,perk4_participant5,perkPrimaryStyle_participant5,perkSubStyle_participant5,statPerk0_participant5,statPerk1_participant5,statPerk2_participant5,teamId_participant6,championId_participant6,spell1Id_participant6,spell2Id_participant6,role_participant6,lane_participant6,perk0_participant6,perk1_participant6,perk2_participant6,perk3_participant6,perk4_participant6,perkPrimaryStyle_participant6,perkSubStyle_participant6,statPerk0_participant6,statPerk1_participant6,statPerk2_participant6,teamId_participant7,championId_participant7,spell1Id_participant7,spell2Id_participant7,role_participant7,lane_participant7,perk0_participant7,perk1_participant7,perk2_participant7,perk3_participant7,perk4_participant7,perkPrimaryStyle_participant7,perkSubStyle_participant7,statPerk0_participant7,statPerk1_participant7,statPerk2_participant7,teamId_participant8,championId_participant8,spell1Id_participant8,spell2Id_participant8,role_participant8,lane_participant8,perk0_participant8,perk1_participant8,perk2_participant8,perk3_participant8,perk4_participant8,perkPrimaryStyle_participant8,perkSubStyle_participant8,statPerk0_participant8,statPerk1_participant8,statPerk2_participant8,teamId_participant9,championId_participant9,spell1Id_participant9,spell2Id_participant9,role_participant9,lane_participant9,perk0_participant9,perk1_participant9,perk2_participant9,perk3_participant9,perk4_participant9,perkPrimaryStyle_participant9,perkSubStyle_participant9,statPerk0_participant9,statPerk1_participant9,statPerk2_participant9,win
0,5030880353,MATCHED_GAME,13,11.1.352.5559,11,CLASSIC,360.0,131.0,141.0,25.0,84.0,92.0,350.0,120.0,28.0,53.0,100,238,4,14,,,8112,8143,8138,8106,9111,8100,8000,5008,5008,5002,100,122,6,4,,,8010,9111,9104,8299,8275,8000,8200,5005,5008,5002,100,11,4,11,,,9923,8143,8138,8135,9111,8100,8000,5005,5008,5002,100,133,7,4,,,8005,9101,9104,8014,8275,8000,8200,5008,5008,5002,100,875,4,14,,,8230,8275,8234,8236,8444,8200,8400,5005,5008,5003,200,33,11,4,,,8439,8446,8429,8242,9111,8400,8000,5005,5003,5002,200,266,4,12,,,8010,9111,9103,8299,8453,8000,8400,5008,5008,5002,200,18,4,7,,,9923,8139,8138,8135,8009,8100,8000,5005,5003,5003,200,55,14,4,,,8128,8143,8138,8135,9111,8100,8000,5008,5008,5002,200,40,4,14,,,8214,8226,8233,8236,8138,8200,8100,5008,5003,5002,0
1,5030907800,MATCHED_GAME,13,11.1.352.5559,11,CLASSIC,11.0,147.0,420.0,555.0,777.0,350.0,526.0,105.0,28.0,89.0,100,235,4,3,,,8351,8313,8345,8410,9104,8300,8000,5005,5008,5002,100,145,7,4,,,9923,8139,8138,8135,8009,8100,8000,5005,5008,5002,100,6,12,4,,,8005,9111,9105,8299,8304,8000,8300,5008,5008,5002,100,54,4,11,,,8229,8226,8210,8236,8429,8200,8400,5005,5002,5002,100,84,4,14,,,8112,8139,8138,8135,8014,8100,8000,5008,5008,5003,200,236,4,12,,,8005,8009,9104,8014,8139,8000,8100,5005,5008,5002,200,111,14,4,,,8439,8401,8473,8453,8347,8400,8300,5007,5002,5002,200,202,7,4,,,8128,8139,8138,8135,8009,8100,8000,5008,5008,5002,200,55,14,4,,,8128,8143,8138,8135,9111,8100,8000,5008,5008,5002,200,64,4,11,,,8010,9111,9105,8299,8304,8000,8300,5005,5008,5002,0
2,5030866370,MATCHED_GAME,13,11.1.352.5559,11,CLASSIC,89.0,81.0,25.0,18.0,82.0,28.0,350.0,141.0,89.0,-1.0,100,29,14,4,,,9923,8139,8138,8135,8236,8100,8200,5008,5008,5002,100,99,14,4,,,8229,8226,8210,8237,8009,8200,8000,5008,5008,5002,100,236,4,7,,,8005,8009,9103,8014,8139,8000,8100,5005,5008,5002,100,19,4,11,,,8128,8143,8138,8105,8234,8100,8200,5005,5008,5002,100,142,4,14,,,8112,8139,8138,8135,8345,8100,8300,5008,5002,5002,200,420,4,12,,,8010,8009,9105,8299,8139,8000,8100,5008,5008,5002,200,147,14,4,,,8229,8226,8210,8237,8009,8200,8000,5008,5008,5002,200,51,7,4,,,8021,9111,9104,8014,8233,8000,8200,5005,5008,5002,200,55,14,4,,,8128,8143,8138,8135,9111,8100,8000,5008,5008,5002,200,121,4,11,,,8112,8143,8138,8135,8233,8100,8200,5008,5008,5002,0
3,5030912328,MATCHED_GAME,13,11.1.352.5559,11,CLASSIC,28.0,238.0,350.0,98.0,9.0,164.0,54.0,350.0,28.0,875.0,100,143,4,3,,,8229,8226,8210,8237,8126,8200,8100,5008,5008,5002,100,60,4,11,,,8112,8126,8138,8105,8275,8100,8200,5005,5008,5002,100,81,4,7,,,8010,8009,9103,8014,8226,8000,8200,5008,5008,5002,100,17,4,14,,,8214,8226,8234,8237,8135,8200,8100,5005,5008,5002,100,39,14,4,,,8010,9111,9104,8299,8345,8000,8300,5005,5008,5001,200,55,14,4,,,8128,8143,8138,8135,9111,8100,8000,5008,5008,5002,200,25,4,14,,,8229,8226,8210,8237,8347,8200,8300,5007,5008,5002,200,23,4,6,,,8008,9111,9105,8299,8444,8000,8400,5005,5008,5003,200,154,4,11,,,8010,9111,9105,8299,8444,8000,8400,5007,5003,5003,200,145,4,1,,,9923,8143,8138,8106,8210,8100,8200,5005,5008,5002,0
4,5030831079,MATCHED_GAME,13,11.1.352.5559,11,CLASSIC,238.0,777.0,17.0,51.0,157.0,20.0,555.0,266.0,350.0,28.0,100,105,4,14,,,8112,8143,8138,8106,8226,8100,8200,5007,5008,5001,100,98,4,11,,,8005,9111,9105,8014,8143,8000,8100,5005,5008,5002,100,360,7,4,,,8010,9111,9103,8014,8226,8000,8200,5005,5008,5002,100,201,4,3,,,8465,8463,8473,8451,8304,8400,8300,5008,5002,5001,100,82,12,4,,,8010,9111,9105,8299,8401,8000,8400,5005,5008,5002,200,55,4,14,,,8010,9111,9105,8014,8143,8000,8100,5008,5008,5003,200,121,11,4,,,8010,9111,9105,8014,8143,8000,8100,5008,5008,5002,200,114,12,4,,,8010,9111,9104,8014,8473,8000,8400,5005,5008,5002,200,81,4,7,,,8010,9111,9104,8014,8304,8000,8300,5005,5008,5002,200,89,4,3,,,8439,8463,8429,8451,8210,8400,8200,5007,5002,5002,1


In [19]:
df.to_csv("games.csv", index=False)