<h2 align = 'center'> Creating the newdf dataframe </h2>

Once we had collected all the match objects into a folder (4.3 GB of data), the next step was to extract the data we wanted from each match and discard the rest. To do this, we had to decide which in-game variables might be important. After much discussion, analysis of games, and drawing on our own experience playing the game, we settled on the metrics below as possibly significant in determining win/loss. Each was chosen because accomplishing it gives a team an advantage toward the ultimate goal of taking down the Nexus.

| Name | Description (All differences are blue team - red team)    
| :- |-------------:
|Diffcc| Difference in the amount of "crowd control" applied (a mechanic that immobilizes enemies)
|Diffdmg| Difference in total damage dealt to champions by each team
|Diffgold| Difference in total gold earned per team
|Diffkda| Difference in the summed kill-death-assist (KDA) ratios of each team's players
|Diffrange| Difference in number of ranged champions on each team
|Diffspree| Difference in number of times there was a killing streak
|Difftank| Difference in the number of "tank" champions on each team
|Diffdrag| Difference in the number of dragons (a neutral objective) slain
|Diffbaron| Difference in the number of barons (a much more powerful neutral objective) slain
|Difftp| Difference in the number of champions who took the "Teleport" summoner spell
|Fblood| Which team got the first kill of the game
|Fdrag| Which team took down the first dragon of the game
|Finhib| Which team took down the first inhibitor of the game
|Fturret| Which team took down the first turret of the game
|Fbaron| Which team took down the first baron of the game
|Diffcs| Difference in creep score of each team (creep score is the number of AI-controlled minions and neutral monsters killed)
|Win (dependent variable)| Which team won (1 for a blue win, 0 for a red win)
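
To make the sign convention in the table concrete, here is a minimal sketch of how one row could be assembled from per-team totals. The dictionaries and their values are made up for illustration and are not taken from the match data; only the "blue minus red" rule and the 1/0 indicator coding come from the table above.

```python
# Hypothetical per-team totals for a single match (illustrative values only).
blue = {"gold": 61250, "dragons": 3, "first_blood": True}
red = {"gold": 58900, "dragons": 1, "first_blood": False}


def diff(stat):
    """Blue-minus-red difference for one team-level stat."""
    return blue[stat] - red[stat]


row = {
    "diffgold": diff("gold"),            # positive -> blue earned more gold
    "diffdrag": diff("dragons"),         # positive -> blue slew more dragons
    "fblood": int(blue["first_blood"]),  # 1 -> blue got first blood, 0 -> red
}
print(row)
```

A positive difference always favors the blue team and a negative one favors red, which keeps the sign of every feature aligned with the coding of the win variable.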

What we quickly realized after running some analysis on the data was that using only post-game metrics to determine wins and losses was not very helpful for understanding which dynamics of the game influence victory. In other words, much of the interpretation we could draw from post-game data was fairly obvious (we discuss this further in our analysis section). For now, we saved the generated CSV file as "newdf.csv".

Please note that you must obtain your own API key and substitute it for the 'YOUR_API_KEY_HERE' placeholder in the code below.
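
If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below is only a suggestion: the variable name `RIOT_API_KEY` and the helper `load_api_key` are hypothetical names chosen for this example, not part of the original code or of any library.

```python
import os


def load_api_key(env=None, var_name="RIOT_API_KEY", placeholder="YOUR_API_KEY_HERE"):
    """Return the key stored in the environment, or the placeholder if it is unset.

    RIOT_API_KEY is a hypothetical variable name used only in this sketch.
    """
    env = os.environ if env is None else env
    return env.get(var_name, placeholder)


api_key = load_api_key()
if api_key == "YOUR_API_KEY_HERE":
    print("No API key found; set RIOT_API_KEY or edit the placeholder.")
```

The resulting `api_key` string could then be passed to `ra.set_api_key` in place of the literal placeholder.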

In [57]:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")
from cassiopeia import riotapi as ra
from cassiopeia.type.core.common import Queue
import pickle
import os
import time
# use personal API key given by Riot Games
ra.set_api_key('YOUR_API_KEY_HERE')
# all games will be taken from the North American server
ra.set_region('NA')
# the folder in which all of the data is held
path = 'match_data'

In [58]:
# initialize all column values
fblood = []
fdrag = []
finhib = []
fturret = []
fbaron = []
diffdrag = []
diffbaron = []
diffcs = []
diffkda = []
diffcc = []
diffspree = []
diffgold = []
diffdmg = []
difftank = []
diffrange = []
difftp = []
win = []

In [59]:
# list of column names
colnames = ['fblood','fdrag','finhib','fturret','fbaron','diffdrag','diffbaron','diffcs','diffkda','diffcc',
            'diffspree','diffgold','diffdmg','difftank','diffrange', 'difftp', 'win']
print(colnames, len(colnames))

['fblood', 'fdrag', 'finhib', 'fturret', 'fbaron', 'diffdrag', 'diffbaron', 'diffcs', 'diffkda', 'diffcc', 'diffspree', 'diffgold', 'diffdmg', 'difftank', 'diffrange', 'difftp', 'win'] 17


In [60]:
# loops through all 20000 matches in the folder
for filename in os.listdir(path):
    with open(os.path.join(path, filename), 'rb') as f:
        match = pickle.load(f)
    print("Processing match", match.id)

    # for all indicator variables, 1=blue and 0=red
    # for all difference values, positive=blue and negative=red
    
    ## INDICATOR VARIABLES
    # winner
    win.append(match.blue_team.win)
    # first blood
    fblood.append(match.blue_team.first_blood)
    # first dragon
    fdrag.append(match.blue_team.first_dragon)
    # first inhib
    finhib.append(match.blue_team.first_inhibitor)
    # first turret
    fturret.append(match.blue_team.first_turret)
    # first baron
    fbaron.append(match.blue_team.first_baron)

    # difference in number of dragons killed
    diffdrag.append(match.blue_team.dragon_kills - match.red_team.dragon_kills)
    # difference in number of barons killed
    diffbaron.append(match.blue_team.baron_kills - match.red_team.baron_kills)
    
    # team totals of individual player stats (e.g. creep score, kda, gold earned, etc.)
    blueteamchamps = []
    redteamchamps = []
    bcs = 0
    rcs = 0
    bkda = 0
    rkda = 0
    bcc = 0
    rcc = 0
    bspree = 0
    rspree = 0
    bgold = 0
    rgold = 0
    bdmg = 0
    rdmg = 0
    bss = []
    rss = []
    
    # loop through each player in blue team to find total stats for blue
    for participant in match.blue_team.participants:
        # creep score
        bcs = bcs + participant.stats.minion_kills + participant.stats.monster_kills
        # kda
        bkda = bkda + participant.stats.kda
        # crowd control dealt
        bcc = bcc + participant.stats.crowd_control_dealt
        # number of killing sprees
        bspree = bspree + participant.stats.killing_sprees
        # total gold earned
        bgold = bgold + participant.stats.gold_earned
        # total damage dealt to champions
        bdmg = bdmg + participant.stats.damage_dealt_to_champions
        # list of summoner spells to mark Teleport later on in the code
        if participant.summoner_spell_d is not None:
            bss.append(participant.summoner_spell_d.name)
        if participant.summoner_spell_f is not None:
            bss.append(participant.summoner_spell_f.name)
        # champion names
        blueteamchamps.append(participant.champion.name)
    # same totals as above, but for the red team
    for participant in match.red_team.participants:
        rcs = rcs + participant.stats.minion_kills + participant.stats.monster_kills
        rkda = rkda + participant.stats.kda
        rcc = rcc + participant.stats.crowd_control_dealt
        rspree = rspree + participant.stats.killing_sprees
        rgold = rgold + participant.stats.gold_earned
        rdmg = rdmg + participant.stats.damage_dealt_to_champions
        if participant.summoner_spell_d is not None:
            rss.append(participant.summoner_spell_d.name)
        if participant.summoner_spell_f is not None:
            rss.append(participant.summoner_spell_f.name)
        redteamchamps.append(participant.champion.name)
    # difference in team creep score
    diffcs.append(bcs-rcs)
    # difference in team kda
    diffkda.append(bkda-rkda)
    # difference in team crowd control dealt
    diffcc.append(bcc-rcc)
    # difference in team number of killing sprees
    diffspree.append(bspree-rspree)
    # difference in team gold earned
    diffgold.append(bgold-rgold)
    # difference in team damage dealt to champions
    diffdmg.append(bdmg-rdmg)
    # difference in number of teleports
    difftp.append(bss.count('Teleport')-rss.count('Teleport'))
    
    # comparing champion categories on each side: tank/non-tank and ranged/melee are both binary,
    # so we only count tanks and ranged champions; the rest of the team is implicitly "non-tank" or "melee"
    
    # blue team tank
    bluetanks = 0
    for champ in blueteamchamps:
        if ra.get_champion_by_name(champ).info.defense >= 7:
            bluetanks = bluetanks + 1
    # red team tank
    redtanks = 0
    for champ in redteamchamps:
        if ra.get_champion_by_name(champ).info.defense >= 7:
            redtanks = redtanks + 1
    # diff of number of tanks
    difftank.append(bluetanks-redtanks)

    # blue ranged champs (attack range above 200, plus Jayce and Kayle as special cases)
    bluerange = 0
    for champ in blueteamchamps:
        if ra.get_champion_by_name(champ).stats.attack_range > 200 or champ == 'Jayce' or champ == 'Kayle':
            bluerange = bluerange + 1
    # red ranged champs (same rule as for the blue team)
    redrange = 0
    for champ in redteamchamps:
        if ra.get_champion_by_name(champ).stats.attack_range > 200 or champ == 'Jayce' or champ == 'Kayle':
            redrange = redrange + 1
    # diff of number of ranged champions
    diffrange.append(bluerange - redrange)

Processing match 1364966345
Processing match 1365000117
Processing match 1365001477
Processing match 1365005581
Processing match 1365190124
Processing match 1365198605
Processing match 1365203523
Processing match 1365209089
Processing match 1365218072
Processing match 1365285451
Processing match 1365302794
Processing match 1365309714
Processing match 1365312773
Processing match 1365313183
Processing match 1365338247
Processing match 1365394610
Processing match 1365408640
Processing match 1365427812
Processing match 1365460378
Processing match 1365473036
Processing match 1365483205
Processing match 1365494761
Processing match 1365495923
Processing match 1365498486
Processing match 1365758207
Processing match 1365795032
Processing match 1365820876
Processing match 1365839894
Processing match 1365874263
Processing match 1366020180
Processing match 1366077082
Processing match 1366106882
Processing match 1366149750
Processing match 1366175749
Processing match 1366195007
Processing match 136
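
The blue and red loops above repeat the same accumulation with different variable names. As a rough sketch of how that could be factored out, the helper below sums any list of stats for one team. It is not the code we ran: `team_totals` and `fake_player` are hypothetical names, and the participants are stand-in objects built with `SimpleNamespace` so the example runs without the API.

```python
from types import SimpleNamespace


def team_totals(participants, stat_names):
    """Sum each named stat over a team's participants."""
    totals = {name: 0 for name in stat_names}
    for p in participants:
        for name in stat_names:
            totals[name] += getattr(p.stats, name)
    return totals


def fake_player(**stats):
    """Stand-in for a participant object; only mimics the .stats attribute."""
    return SimpleNamespace(stats=SimpleNamespace(**stats))


# Illustrative values only, not real match data.
blue_players = [fake_player(gold_earned=12000, killing_sprees=2),
                fake_player(gold_earned=9500, killing_sprees=0)]
red_players = [fake_player(gold_earned=11000, killing_sprees=1),
               fake_player(gold_earned=9000, killing_sprees=1)]

stats = ["gold_earned", "killing_sprees"]
blue_totals = team_totals(blue_players, stats)
red_totals = team_totals(red_players, stats)

# Blue-minus-red differences, matching the convention used for every diff column.
diffs = {name: blue_totals[name] - red_totals[name] for name in stats}
print(diffs)
```

With a helper like this, adding a new difference feature would only require adding its stat name to the list rather than editing two parallel loops.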

Now, let's create a dataframe with the pulled data by zipping the columns together. 

In [61]:
# create the dataframe
df = pd.DataFrame(dict(zip(colnames,[fblood, fdrag, finhib, fturret, fbaron, diffdrag, diffbaron, diffcs, diffkda, 
                                     diffcc, diffspree, diffgold, diffdmg, difftank, diffrange, difftp, win])))
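
Because `zip` pairs names with lists purely by position, the column lists must be given in the same order as `colnames`, and each list needs one entry per match. The toy example below, with made-up values and only three columns, shows the same construction along with a length check that could catch a partially filled list.

```python
import pandas as pd

# Three toy columns standing in for the seventeen real ones (illustrative values).
colnames = ["fblood", "diffgold", "win"]
fblood = [True, False, True]
diffgold = [2350, -1200, 400]
win = [True, False, False]

columns = [fblood, diffgold, win]

# Every list should contain exactly one entry per processed match.
lengths = {len(col) for col in columns}
assert len(lengths) == 1, "every column list must have one entry per match"

# zip pairs each name with the list at the same position.
df = pd.DataFrame(dict(zip(colnames, columns)))
print(df.shape)
```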

Finally, let's put this dataframe into a csv file for later use.

In [62]:
# export dataframe to csv
df.to_csv('newdf.csv', index=False)
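
As a quick sanity check, the CSV can be read back and the indicator columns converted to the 1/0 coding described in the table. The sketch below assumes the indicators were stored as booleans, which depends on what the match objects return, and it writes to an in-memory buffer with made-up values instead of touching "newdf.csv".

```python
import io

import pandas as pd

# Toy frame with illustrative values; indicator columns assumed to be booleans.
df = pd.DataFrame({
    "fblood": [True, False],
    "diffgold": [2350, -1200],
    "win": [True, False],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

loaded = pd.read_csv(buffer)

# Convert the boolean indicators to 1 (blue) / 0 (red).
indicator_cols = ["fblood", "win"]
loaded = loaded.astype({col: int for col in indicator_cols})
print(loaded)
```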