# Data Notebook: Predicting Boots as First Buy in League of Legends Matches

## Summary

  Goal: Binary prediction of whether a player buys boots as their first purchase after starting items in a League of Legends (LOL) match, using only information available up to the minute of that first purchase.
  
  Size: 112380 rows before balancing, 40 features before representation. 
  
  Gathering and Description: Game timeline data was gathered with Riotwatcher, a Python wrapper for the Riot Games API. The timeline data snapshots each player's status every minute and records most major in-game events. The matches come from a match-ID dataset that I downloaded from Kaggle.



## Introduction
League of Legends (LOL) is a 5-versus-5, player-versus-player game in which each player controls a champion. To win, players must gather gold outside their base, then return to base to buy items that increase their champion's power. Among these items, boots are cheap yet critical: they increase a champion's movement speed. Important as they are, whether to buy boots on the first return to base is highly situational.

The goal of the project is to predict this situational choice and understand what drives it, and ultimately to build a predictor that can run on live games to inform players in game.

Our dataset starts from 12,000 League of Legends match IDs, found at https://www.kaggle.com/datasets/sabrinasummers/league-of-legends-diamond-matches-preseason-12. For each match, I gathered the detailed match data through the League of Legends API (https://developer.riotgames.com/). The process is not easy; my data-gathering code can be found in loldata.ipynb.

Each row in our dataset represents one player in one match. The label is whether the player buys boots on their first return to base. The feature groups are:
- champion characteristics: How much does your champion need boots?
- opponent characteristics: Is buying boots first a good choice against this particular champion?
- early game performance: Before you buy boots, are you having a successful early game or falling behind?
- player's role in the team: Some roles (UTILITY in particular) require more movement, so early boots can be helpful.
- player gold amount: Can you afford boots? Can you afford boots + another item?
- game time: How long the game has been running.
- pre-game champion setup: Decisions that affect champion statistics and playstyle. One setup gives you free boots after several minutes but prevents you from buying them manually.

Each game has 10 players, so 12,000 matches should yield 120,000 rows in total.
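
The expected row count, and the gap to the 112,380 rows reported in the Summary, is simple arithmetic. A quick check (the variable names are illustrative and do not come from the notebook):

```python
# Expected vs. gathered row counts; the figures come from the text above.
n_matches = 12_000          # match IDs in the Kaggle dataset
players_per_match = 10      # 5 versus 5
expected_rows = n_matches * players_per_match

gathered_rows = 112_380     # "rows before balancing" from the Summary
missing_rows = expected_rows - gathered_rows
coverage = gathered_rows / expected_rows

print(expected_rows, missing_rows, f"{coverage:.2%}")
```

The shortfall is consistent with the batch logs further down, where some matches yield fewer than 10 rows.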

## Data Gathering

### API setup

In [2]:
!pip3 install riotwatcher

Collecting riotwatcher
  Downloading riotwatcher-3.2.0-py3-none-any.whl (56 kB)
Installing collected packages: riotwatcher
Successfully installed riotwatcher-3.2.0


In [1]:
from riotwatcher import LolWatcher, ApiError
import pandas as pd
import timeit
import time

# global variables
api_key = 'RGAPI-e28e1022-5df1-44b3-a306-78143ae16e26'
watcher = LolWatcher(api_key)
my_region = 'na1'

### Gathering Dictionary from API to translate IDs to Names

In [46]:
# check league's latest version
latest = watcher.data_dragon.versions_for_region(my_region)['n']['item']
# Static information stored in "list"
static_item_list = watcher.data_dragon.items(latest,'en_US')
static_champion_list = watcher.data_dragon.champions(latest,'en_US')
static_runes_list = watcher.data_dragon.runes_reforged(latest,'en_US')
static_ss_list = watcher.data_dragon.summoner_spells(latest,'en_US')

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

In [49]:
# summoner spell static data to dict for lookup (numeric key -> spell name)
ssd = {}
for k,v in static_ss_list['data'].items():
    ssd[int(v['key'])] = k
pd.Series(ssd).to_csv('ssd.csv') # a dict has no to_csv, so save it via a Series

In [50]:
ssd # Summoner Spell Dictionary

{21: 'SummonerBarrier',
 1: 'SummonerBoost',
 14: 'SummonerDot',
 3: 'SummonerExhaust',
 4: 'SummonerFlash',
 6: 'SummonerHaste',
 7: 'SummonerHeal',
 13: 'SummonerMana',
 30: 'SummonerPoroRecall',
 31: 'SummonerPoroThrow',
 11: 'SummonerSmite',
 39: 'SummonerSnowURFSnowball_Mark',
 32: 'SummonerSnowball',
 12: 'SummonerTeleport',
 54: 'Summoner_UltBookPlaceholder',
 55: 'Summoner_UltBookSmitePlaceholder'}
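
A lookup dictionary like `ssd` is typically used to translate the numeric IDs in the match data into readable names. A minimal sketch, assuming a hypothetical column `spell1Id` on a toy dataframe (the real column names come from match_to_frame, which is not shown here); the three dictionary entries are copied from the output above:

```python
import pandas as pd

# Subset of the summoner spell dictionary printed above.
ssd_sample = {4: 'SummonerFlash', 11: 'SummonerSmite', 12: 'SummonerTeleport'}

# Toy rows; `spell1Id` is a hypothetical column name used for illustration.
players = pd.DataFrame({'participantId': [1, 2, 3], 'spell1Id': [4, 11, 99]})

# Series.map looks each ID up in the dict; IDs missing from it become NaN.
players['spell1Name'] = players['spell1Id'].map(ssd_sample)
print(players)
```

Unknown IDs show up as NaN rather than raising an error, which makes gaps in the dictionary easy to spot.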

In [52]:
runesd = { #Runes Dictionary
5001:'Scaling Health ',
5002:'Armor',
5003:'Magic Resist',
5005:'Attack Speed ',
5007:'CDR',
5008:'Adaptive Force'}
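for tree in static_runes_list: # renamed from `type` to avoid shadowing the built-in
    slots = tree['slots']
    for s in slots:
        runes = s['runes']
        for r in runes:
            runesd[r['id']] = r['key']

pd.Series(runesd).to_csv('runesd.csv') # a dict has no to_csv, so save it via a Series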
runesd

{5001: 'Scaling Health ',
 5002: 'Armor',
 5003: 'Magic Resist',
 5005: 'Attack Speed ',
 5007: 'CDR',
 5008: 'Adaptive Force',
 8112: 'Electrocute',
 8124: 'Predator',
 8128: 'DarkHarvest',
 9923: 'HailOfBlades',
 8126: 'CheapShot',
 8139: 'TasteOfBlood',
 8143: 'SuddenImpact',
 8136: 'ZombieWard',
 8120: 'GhostPoro',
 8138: 'EyeballCollection',
 8135: 'RavenousHunter',
 8134: 'IngeniousHunter',
 8105: 'RelentlessHunter',
 8106: 'UltimateHunter',
 8351: 'GlacialAugment',
 8360: 'UnsealedSpellbook',
 8369: 'FirstStrike',
 8306: 'HextechFlashtraption',
 8304: 'MagicalFootwear',
 8313: 'PerfectTiming',
 8321: 'FuturesMarket',
 8316: 'MinionDematerializer',
 8345: 'BiscuitDelivery',
 8347: 'CosmicInsight',
 8410: 'ApproachVelocity',
 8352: 'TimeWarpTonic',
 8005: 'PressTheAttack',
 8008: 'LethalTempo',
 8021: 'FleetFootwork',
 8010: 'Conqueror',
 9101: 'Overheal',
 9111: 'Triumph',
 8009: 'PresenceOfMind',
 9104: 'LegendAlacrity',
 9105: 'LegendTenacity',
 9103: 'LegendBloodline',
 8014: 

In [10]:
# static list data to dict for looking up
champ_list = []
for k,v in static_champion_list['data'].items():
    champ_list.append([k,v['stats']['movespeed'],v['stats']['attackrange']])
championdf = pd.DataFrame(champ_list) # champion base stats
championdf.columns = ['championName','cmovespeed','cattackrange']
enemydf = pd.DataFrame(champ_list)
enemydf.columns = ['enemyName','emovespeed','eattackrange']
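
The two frames hold the same statistics under different column names, presumably so that they can be joined onto the player rows twice: once for the player's own champion and once for the opponent. A minimal sketch of such a join, assuming the player rows have `championName` and `enemyName` columns; the champion names and stat values below are placeholders, not real game data:

```python
import pandas as pd

# Placeholder stats: names and numbers are made up for illustration.
champ_list = [['ChampA', 330, 550], ['ChampB', 345, 125]]
championdf = pd.DataFrame(champ_list, columns=['championName', 'cmovespeed', 'cattackrange'])
enemydf = pd.DataFrame(champ_list, columns=['enemyName', 'emovespeed', 'eattackrange'])

# Toy player rows: each player with their own champion and their opponent's.
players = pd.DataFrame({'participantId': [1, 2],
                        'championName': ['ChampA', 'ChampB'],
                        'enemyName': ['ChampB', 'ChampA']})

# Left joins keep every player row even if a champion name has no match.
merged = (players
          .merge(championdf, on='championName', how='left')
          .merge(enemydf, on='enemyName', how='left'))
print(merged)
```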

### sending API requests (~6 hours total)

The process is interrupted often, so I separate the 12,000 matches into 12 batches of 1,000 matches each and process one batch at a time.

The batch cells below are repetitive.
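
Because the run is interrupted often, one option is to save each finished batch to disk and skip batches that already have a file when the notebook is restarted. The sketch below is an assumption about how this could be organized, not the code used in this notebook; `run_batches`, the `batch_{i}.csv` file names, and the stand-in `fake_process` function are all hypothetical.

```python
import os
import tempfile
import pandas as pd

def run_batches(batches, process_fn, out_dir):
    """Process each batch once, saving results so a restart can resume."""
    frames = []
    for i, batch in enumerate(batches):
        path = os.path.join(out_dir, 'batch_{}.csv'.format(i))
        if os.path.exists(path):
            # Batch finished in an earlier run: load it instead of re-requesting.
            frames.append(pd.read_csv(path))
            continue
        df = process_fn(batch)
        df.to_csv(path, index=False)
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Stand-in for the real batch function: one row per match ID, no API calls.
calls = []
def fake_process(batch):
    calls.append(len(batch))
    return pd.DataFrame({'match_id': list(batch)})

mids = pd.Series(['M{}'.format(n) for n in range(6)])
batches = [mids[2 * i:2 * i + 2] for i in range(3)]  # fixed-size slices

out_dir = tempfile.mkdtemp()
first = run_batches(batches, fake_process, out_dir)
second = run_batches(batches, fake_process, out_dir)  # resumes from the saved files
print(len(first), len(second), len(calls))
```

The second call makes no new requests because every batch file already exists, so an interrupted run only repeats the batch that was in progress.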

In [3]:
matchesdf = pd.read_csv('match_data.csv') #https://www.kaggle.com/sabrinasummers/league-of-legends-diamond-matches-preseason-12
mids = matchesdf.match_id
mid1 = mids[0:1000]

In [4]:
midl = [mids[1000*i:1000*i + 1000] for i in range(12)]

In [5]:
def process_mids(mids): 
    '''
    Gathers data for a batch of match IDs and concatenates it into one dataframe.
    Uses the helper function match_to_frame(matchID), introduced later.
    '''
    results = []
    mids = mids.tolist()
    for i in range(len(mids)):
        df = match_to_frame(mids[i]) # the heavy lifting
        results.append(df)
        time.sleep(1.2) # 1.2 s per request = at most 50 requests per minute (API limit)
        print('done with {}'.format(i))
        print(df.shape)
    rdf = pd.concat(results)
    return rdf
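
The 1.2-second pause follows from the rate limit: 60 seconds divided by 50 requests. Since a single failed request would otherwise abort a whole batch, a variant could also catch errors per match and record the failed IDs for a later retry. This is a sketch under assumptions, not the notebook's code: `process_mids_safe`, `fetch`, and `flaky_fetch` are hypothetical names, and a generic `Exception` stands in for whatever error the API client raises.

```python
import time
import pandas as pd

REQUESTS_PER_MINUTE = 50                 # limit stated in the comment above
DELAY = 60 / REQUESTS_PER_MINUTE         # seconds to wait between requests

def process_mids_safe(mids, fetch, delay=DELAY, sleep=time.sleep):
    """Fetch one dataframe per match ID, skipping IDs whose request fails."""
    results, failed = [], []
    for mid in mids:
        try:
            results.append(fetch(mid))
        except Exception as err:         # stand-in for the client's error type
            failed.append((mid, repr(err)))
        sleep(delay)                     # throttle regardless of the outcome
    rdf = pd.concat(results, ignore_index=True) if results else pd.DataFrame()
    return rdf, failed

# Stand-in fetcher: fails on one ID, returns a one-row frame otherwise.
def flaky_fetch(mid):
    if mid == 'bad':
        raise ValueError('no data for ' + mid)
    return pd.DataFrame({'match_id': [mid]})

# sleep is replaced by a no-op so the example runs instantly.
rdf, failed = process_mids_safe(['a', 'bad', 'b'], flaky_fetch, sleep=lambda s: None)
print(DELAY, rdf['match_id'].tolist(), [mid for mid, _ in failed])
```

Passing `sleep` as a parameter keeps the throttling testable without waiting for real time to pass.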

In [53]:
df0 = process_mids(midl[0])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(8, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(7, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(11, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)


done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(9, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 40

done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(11, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(9, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(8, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762
(10, 30)
done with 763

In [59]:
df1 = process_mids(midl[1])

done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(8, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(9, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)


done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(9, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(8, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405

done with 719
(10, 30)
done with 720
(8, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(8, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762

In [63]:
df2 = process_mids(midl[2])

done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(9, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)

done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(11, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(9, 30)
done with 380
(8, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(9, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(8, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405
(

done with 719
(10, 30)
done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(9, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(9, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(9, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(11, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762


In [64]:
df3 = process_mids(midl[3])

done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(11, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(9, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(11, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(8, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(8, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(8, 30)
done with 45
(10, 30)
do

done with 363
(8, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(8, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(9, 30)
done with 372
(8, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(8, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405
(10, 30)
done with 406
(1

done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(8, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(8, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(8, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(8, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(8, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(8, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762
(10, 30)
done with 763
(10

In [22]:
df4 = process_mids(midl[4])

done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(9, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(8, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)


done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(8, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(11, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(8, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405

done with 719
(10, 30)
done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(11, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 7

In [15]:
df5 = process_mids(midl[5])

done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(11, 30)
done with 17
(10, 30)
done with 18
(8, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(8, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)


done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(12, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(8, 30)
done with 392
(8, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(11, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405

done with 719
(8, 30)
done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(11, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 76

In [16]:
df6 = process_mids(midl[6])

done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(8, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)

done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(8, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(8, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(8, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405


done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(8, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(8, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762
(10, 30)
done with 763

In [17]:
df7 = process_mids(midl[7])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(8, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(8, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(8, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(8, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(9, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)
don

done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(8, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(9, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(8, 30)
done with 384
(11, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405
(10, 30)
done with 406


done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(8, 30)
done with 752
(11, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762
(10, 30)
done with 76

In [27]:
df8 = process_mids(midl[8])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


done with 0
(10, 30)
done with 1
(8, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(11, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(9, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(8, 30)
done with 28
(8, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)
do

done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(11, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(8, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(9, 30)
done with 393
(10, 30)
done with 394
(8, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(11, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405


done with 719
(10, 30)
done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(9, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(8, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(8, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762


In [29]:
df9 = process_mids(midl[9])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(11, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(8, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)

done with 362
(8, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(8, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(8, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(8, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405
(

done with 719
(10, 30)
done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(8, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(8, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(10, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762

In [30]:
df10 = process_mids(midl[10])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


done with 0
(10, 30)
done with 1
(10, 30)
done with 2
(10, 30)
done with 3
(10, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(8, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(8, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(10, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(8, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(10, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(10, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)
d

done with 362
(10, 30)
done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(10, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(10, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(8, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 40

done with 720
(10, 30)
done with 721
(9, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(10, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(11, 30)
done with 754
(10, 30)
done with 755
(8, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762
(10, 30)
done with 763

In [32]:
df11 = process_mids(midl[11])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


done with 0
(10, 30)
done with 1
(9, 30)
done with 2
(10, 30)
done with 3
(8, 30)
done with 4
(10, 30)
done with 5
(10, 30)
done with 6
(10, 30)
done with 7
(10, 30)
done with 8
(10, 30)
done with 9
(10, 30)
done with 10
(10, 30)
done with 11
(10, 30)
done with 12
(10, 30)
done with 13
(10, 30)
done with 14
(10, 30)
done with 15
(10, 30)
done with 16
(11, 30)
done with 17
(10, 30)
done with 18
(10, 30)
done with 19
(10, 30)
done with 20
(10, 30)
done with 21
(10, 30)
done with 22
(10, 30)
done with 23
(10, 30)
done with 24
(10, 30)
done with 25
(10, 30)
done with 26
(10, 30)
done with 27
(10, 30)
done with 28
(10, 30)
done with 29
(10, 30)
done with 30
(10, 30)
done with 31
(10, 30)
done with 32
(10, 30)
done with 33
(10, 30)
done with 34
(10, 30)
done with 35
(10, 30)
done with 36
(10, 30)
done with 37
(8, 30)
done with 38
(10, 30)
done with 39
(10, 30)
done with 40
(8, 30)
done with 41
(10, 30)
done with 42
(10, 30)
done with 43
(10, 30)
done with 44
(10, 30)
done with 45
(10, 30)
do

done with 363
(10, 30)
done with 364
(10, 30)
done with 365
(10, 30)
done with 366
(10, 30)
done with 367
(10, 30)
done with 368
(10, 30)
done with 369
(10, 30)
done with 370
(10, 30)
done with 371
(10, 30)
done with 372
(10, 30)
done with 373
(10, 30)
done with 374
(10, 30)
done with 375
(10, 30)
done with 376
(10, 30)
done with 377
(10, 30)
done with 378
(10, 30)
done with 379
(10, 30)
done with 380
(10, 30)
done with 381
(10, 30)
done with 382
(10, 30)
done with 383
(10, 30)
done with 384
(10, 30)
done with 385
(10, 30)
done with 386
(10, 30)
done with 387
(10, 30)
done with 388
(10, 30)
done with 389
(10, 30)
done with 390
(10, 30)
done with 391
(10, 30)
done with 392
(9, 30)
done with 393
(10, 30)
done with 394
(10, 30)
done with 395
(10, 30)
done with 396
(10, 30)
done with 397
(9, 30)
done with 398
(10, 30)
done with 399
(10, 30)
done with 400
(10, 30)
done with 401
(10, 30)
done with 402
(10, 30)
done with 403
(10, 30)
done with 404
(10, 30)
done with 405
(10, 30)
done with 406

done with 720
(10, 30)
done with 721
(10, 30)
done with 722
(10, 30)
done with 723
(10, 30)
done with 724
(10, 30)
done with 725
(10, 30)
done with 726
(10, 30)
done with 727
(10, 30)
done with 728
(10, 30)
done with 729
(10, 30)
done with 730
(10, 30)
done with 731
(10, 30)
done with 732
(10, 30)
done with 733
(10, 30)
done with 734
(10, 30)
done with 735
(10, 30)
done with 736
(11, 30)
done with 737
(10, 30)
done with 738
(10, 30)
done with 739
(10, 30)
done with 740
(10, 30)
done with 741
(10, 30)
done with 742
(10, 30)
done with 743
(10, 30)
done with 744
(10, 30)
done with 745
(10, 30)
done with 746
(10, 30)
done with 747
(10, 30)
done with 748
(10, 30)
done with 749
(10, 30)
done with 750
(10, 30)
done with 751
(10, 30)
done with 752
(10, 30)
done with 753
(10, 30)
done with 754
(9, 30)
done with 755
(10, 30)
done with 756
(10, 30)
done with 757
(10, 30)
done with 758
(10, 30)
done with 759
(10, 30)
done with 760
(10, 30)
done with 761
(10, 30)
done with 762
(10, 30)
done with 76

### Save the Dataset

In [20]:
# a look at our gathered dataframe
df5

Unnamed: 0,championId,championName,firstBloodAssist,firstBloodKill,individualPosition,summoner1Id,summoner2Id,participantId,perks,enemyId,...,xp,minionsKilled,jungleMinionsKilled,attackDamage,armor,magicResist,healthMax,healthRegen,powerMax,lifesteal
0,92,Riven,False,False,TOP,12,4,1,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",122,...,1164,10,0,96,46,34,754,19,0,0
1,245,Ekko,False,False,JUNGLE,4,11,2,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",141,...,1346,0,27,64,44,34,927,22,438,0
2,777,Yone,False,False,MIDDLE,14,4,3,"{'statPerks': {'defense': 5003, 'flex': 5008, ...",4,...,908,13,0,76,33,41,755,17,500,0
3,202,Jhin,True,False,BOTTOM,7,4,4,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",81,...,468,8,0,99,32,30,651,8,336,0
4,43,Karma,False,True,UTILITY,14,4,5,"{'statPerks': {'defense': 5003, 'flex': 5002, ...",161,...,257,0,0,51,34,38,544,11,374,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5,887,Gwen,False,False,TOP,12,14,6,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",876,...,1361,20,0,69,54,34,833,31,420,0
6,245,Ekko,False,False,JUNGLE,4,11,7,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",59,...,1584,3,24,64,44,34,777,22,438,0
7,268,Azir,False,False,MIDDLE,4,14,8,"{'statPerks': {'defense': 5001, 'flex': 5008, ...",777,...,1355,21,0,58,25,31,937,17,702,0
8,18,Tristana,False,False,BOTTOM,4,1,9,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",222,...,995,16,0,89,36,30,688,9,297,0


In [None]:
data03 = pd.concat([df0,df1,df2,df3])
data03.to_csv('loldata03.csv')  # Checkpoint, so that if my Python kernel dies, I don't lose them. It happened.

In [None]:
data47 = pd.concat([df4,df5,df6,df7])
data47.to_csv('loldata47.csv') 

In [33]:
data811 = pd.concat([df8,df9,df10,df11])
data811.to_csv('loldata811.csv') 

In [None]:
datafull = pd.concat([data03, data47, data811])
datafull.to_csv('loldatafull.csv')
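
The chunk-and-checkpoint pattern above can be sketched on toy data as follows. This is a minimal sketch, not the notebook's code: `save_checkpoint`, the toy frames and the file name are hypothetical. It also passes `ignore_index=True` and `index=False`, which is an assumption; the cells above write the default index, which comes back as an `Unnamed: 0` column unless `index_col=0` is passed to `pd.read_csv`.

```python
import os
import tempfile

import pandas as pd


def save_checkpoint(frames, path):
    """Concatenate per-chunk frames and write them to one CSV (hypothetical helper)."""
    combined = pd.concat(frames, ignore_index=True)  # fresh 0..n-1 index
    combined.to_csv(path, index=False)               # do not write the index column
    return combined


# Toy stand-ins for df0..df3; the real frames come from process_mids.
chunks = [pd.DataFrame({"matchId": [f"M{i}"] * 2, "label": [True, False]})
          for i in range(4)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.csv")
    saved = save_checkpoint(chunks, path)
    reloaded = pd.read_csv(path)  # round-trips without an extra index column
```

Writing without the index keeps the saved file's columns identical to the in-memory frame, so the reload step needs no special arguments.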

### Parsing Data from Match ID

In [2]:
boots = [1001,3006,3009,3024,3047,3158] # IDs for Boots
CHECK = ['totalGold','level','xp','minionsKilled','jungleMinionsKilled'] # Features to record from a Frame
CHAMPIONCHECK = ['attackDamage','armor','magicResist','healthMax','healthRegen','powerMax','lifesteal'] # Features to record from a Frame's championStats

In [12]:
def match_to_frame(matchid, championdf=championdf, enemydf=enemydf, boots=[1001, 3006, 3009, 3024, 3047, 3158]):
    '''
    Match to dataframe with hard-coded columns.
    '''
    m = watcher.match.by_id('americas',matchid)
    mt = watcher.match.timeline_by_match('americas',matchid)
    
    # match info dataframe
    mdf = pd.DataFrame(m['info']['participants'])[['championId', 'championName', 'firstBloodAssist',
                                                   'firstBloodKill', 'individualPosition', 'summoner1Id', 'summoner2Id', 'participantId',
                                                  'perks']]
    enemyId = mdf.championId[5:10].tolist() + mdf.championId[0:5].tolist()
    mdf['enemyId'] = enemyId
    enemyName = mdf.championName[5:10].tolist(
    ) + mdf.championName[0:5].tolist()
    mdf['enemyName'] = enemyName
    mdf = mdf.merge(championdf, 'inner', on='championName')
    mdf = mdf.merge(enemydf, 'inner', on='enemyName')
    mdf['matchId'] = matchid

    # purchase dataframe
    purchases = []
    for i in range(2, 10): # Frame 2 is the 2:00 snapshot, I think. The first 2 frames are skipped since our "first purchase" doesn't happen before 2:00.
        f = mt['info']['frames'][i]
        events = f['events']
        stats = f['participantFrames']

        for e in events:
            if e['type'] == 'ITEM_PURCHASED':
                buyer = e['participantId']
                e2 = e
                for c in CHECK:
                    e2[c] = stats['{}'.format(buyer)][c]
                for c in CHAMPIONCHECK:
                    e2[c] = stats['{}'.format(buyer)]['championStats'][c]
                purchases.append(e2)
                
    pdf = pd.DataFrame(purchases)

    pdf['minute'] = pdf.timestamp // 60000
    # https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby
    idx = pdf.groupby(['participantId']).minute.transform(min) == pdf.minute
    pdf2 = pdf[idx]
    pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)

    pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)
    pdf3 = pdf2[['participantId','minute','label']+ CHECK + CHAMPIONCHECK].drop_duplicates()

    # Merging Purchase dataframe and match info dataframe
    results = mdf.merge(pdf3, 'inner', on='participantId')
    
    return results
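
The `enemyId` / `enemyName` step in the function pairs each player with the participant five rows away, i.e. the same slot on the other team. Below is a minimal sketch of that pairing on toy data. It assumes, as the function above does, that participants 1-5 and 6-10 are listed in the same role order; `pair_opponents` is a hypothetical helper name.

```python
import pandas as pd


def pair_opponents(mdf):
    """Add the opposing team's champion in the same slot (hypothetical helper)."""
    out = mdf.copy()
    half = len(out) // 2
    names = out["championName"].tolist()
    # Rotate by half the roster: row i is paired with row i + half (mod roster size).
    out["enemyName"] = names[half:] + names[:half]
    return out


toy = pd.DataFrame({
    "participantId": range(1, 11),
    "championName": ["Mordekaiser", "Elise", "Qiyana", "Kaisa", "Soraka",
                     "Urgot", "Taliyah", "Veigar", "Ashe", "Nautilus"],
})
paired = pair_opponents(toy)
```

If the API ever returned participants in a different order, this pairing would silently match the wrong lanes, so checking `individualPosition` on both sides of each pair would be a reasonable sanity test.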

In [51]:
match_to_frame('NA1_4218868152')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['isboots'] = pdf2.itemId.apply(lambda x: x in boots)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pdf2['label'] = pdf2.groupby('participantId').isboots.transform(any)


Unnamed: 0,championId,championName,firstBloodAssist,firstBloodKill,individualPosition,summoner1Id,summoner2Id,participantId,perks,enemyId,...,xp,minionsKilled,jungleMinionsKilled,attackDamage,armor,magicResist,healthMax,healthRegen,powerMax,lifesteal
0,82,Mordekaiser,False,False,TOP,14,4,1,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",6,...,1361,17,0,70,49,34,888,13,266,0
1,60,Elise,True,False,JUNGLE,4,11,2,"{'statPerks': {'defense': 5003, 'flex': 5008, ...",163,...,1039,0,16,59,32,38,732,12,397,0
2,246,Qiyana,False,True,MIDDLE,4,14,3,"{'statPerks': {'defense': 5003, 'flex': 5008, ...",45,...,992,9,0,112,33,41,682,14,393,0
3,145,Kaisa,False,False,BOTTOM,4,3,4,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",22,...,1878,42,0,96,43,31,871,10,852,0
4,16,Soraka,False,False,UTILITY,7,4,5,"{'statPerks': {'defense': 5001, 'flex': 5002, ...",111,...,1160,0,0,56,46,31,899,7,665,0
5,6,Urgot,False,False,TOP,12,4,6,"{'statPerks': {'defense': 5003, 'flex': 5008, ...",82,...,1327,20,0,82,45,42,793,18,441,0
6,163,Taliyah,False,False,JUNGLE,11,4,7,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",60,...,1726,0,32,68,36,31,850,18,517,0
7,45,Veigar,False,False,MIDDLE,12,4,8,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",246,...,1267,21,0,58,37,33,787,15,648,0
8,22,Ashe,False,False,BOTTOM,4,7,9,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",145,...,1127,31,0,88,37,30,698,8,377,1
9,111,Nautilus,False,False,UTILITY,4,14,10,"{'statPerks': {'defense': 5002, 'flex': 5003, ...",16,...,921,5,0,65,50,41,732,23,469,0
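
The `SettingWithCopyWarning` printed above comes from assigning new columns to `pdf2`, which is a boolean-mask slice of `pdf`. One common way to avoid it is to take an explicit `.copy()` of the slice before adding columns. The sketch below reproduces the first-purchase labelling on toy purchase events under that assumption; the toy events and the `label_first_purchase` name are illustrative and not from the notebook.

```python
import pandas as pd

BOOTS = {1001, 3006, 3009, 3024, 3047, 3158}  # boots item IDs from the notebook


def label_first_purchase(pdf, boots=BOOTS):
    """Keep each player's earliest purchase minute and flag whether boots were bought then."""
    pdf = pdf.assign(minute=pdf["timestamp"] // 60000)
    first_minute = pdf.groupby("participantId")["minute"].transform("min")
    first = pdf[pdf["minute"] == first_minute].copy()  # explicit copy: safe to add columns
    first["isboots"] = first["itemId"].isin(boots)
    first["label"] = first.groupby("participantId")["isboots"].transform("any")
    return first[["participantId", "minute", "label"]].drop_duplicates()


# Toy events: player 1 buys boots on the first trip, player 2 only later.
events = pd.DataFrame({
    "participantId": [1, 1, 2, 2],
    "itemId": [1001, 1036, 1036, 1001],
    "timestamp": [185000, 190000, 240000, 420000],
})
labels = label_first_purchase(events)
```

Player 2's boots purchase at minute 7 is ignored because only purchases made in the earliest purchase minute count toward the label.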


In [4]:
# Basically, the function above extracts the info we need from this huge JSON-like object and wrangles it into
# the neater dataframe we see above.

m1 = watcher.match.timeline_by_match('americas','NA1_4218868152')
m1

{'metadata': {'dataVersion': '2',
  'matchId': 'NA1_4218868152',
  'participants': ['jBWxWzzZAPxSS8CpKyP2cT5KwJD2L_bgBgeYJYBVqgnzjTj0uj0IEbEk4eJ7RaclT7c0_pxVUQwbxg',
   'CeYkE5Ez8LtK4WyLQqOSwh5xCdSAsckVfyMqtqiVP_Mmmt4ivEza_jbiUSvK4AWHbgpOzaoKj4ueXA',
   'ZFrP14_AdGdEuj8HZoGkI-PEBO3hZYY09wPwBHW4AHCaS70wCK69bfym80_cJ-x7vHD-ungHqx-s4A',
   'tA8i3uahnetx_HFwvg1WsQ6XNp0Xxf-b0cFcwqG6ZHRJHAOsF9ZGI3ieuhKp9X0L50KKVsaXn-n6pw',
   'rixBKOrRHGO3yo5l2t0jMoODUV8ISZBDkvXLWTfqdejUn5uAmrfVw4vkThlcw_m1qz2TEG6HABh3rQ',
   'VC2iSQMSU30gcgAgA7_VqEVZ443AmlCjqBZ8smzNobuI43-RwD4ZiRhLXJb1aFSCssuP172pOtBJRA',
   'HFx0RkoUmlFkCCTKvcc6cne-9Z93JJKb2C8W26h27U8WLB2e4KK5Lm6LGVzv7I4FWmdcMPdeaCYH-A',
   'Amis3p7XxAjnLx-l-1KBZLNThB7oGypVA4mqidEa0FV5Y8-r-geJpxM8p_hBNPPO-W6UHp_EZegP7Q',
   'epskaCG_2D6JpONAvu-dtjPoHtFcvia5QDuvZKBkrFBgPXHiMe4C6S1nPuMAP4FB9drgcbELENPfZg',
   'XBACzZhcoz13yHM47FExTIsA5HRM0Rfdc1n-_fkAOIbNQ3o573EgOej0lVxnBe0RkK8HXIXl3rnzBg']},
 'info': {'frameInterval': 60000,
  'frames': [{'events': [{'realTi

In [57]:
m1['info']['frames'][1]['participantFrames'] #Player status each frame

{'1': {'championStats': {'abilityHaste': 0,
   'abilityPower': 24,
   'armor': 43,
   'armorPen': 0,
   'armorPenPercent': 0,
   'attackDamage': 61,
   'attackSpeed': 110,
   'bonusArmorPenPercent': 0,
   'bonusMagicPenPercent': 0,
   'ccReduction': 14,
   'cooldownReduction': 0,
   'health': 645,
   'healthMax': 645,
   'healthRegen': 10,
   'lifesteal': 0,
   'magicPen': 0,
   'magicPenPercent': 0,
   'magicResist': 32,
   'movementSpeed': 335,
   'omnivamp': 0,
   'physicalVamp': 0,
   'power': 0,
   'powerMax': 100,
   'powerRegen': 0,
   'spellVamp': 0},
  'currentGold': 0,
  'damageStats': {'magicDamageDone': 0,
   'magicDamageDoneToChampions': 0,
   'magicDamageTaken': 0,
   'physicalDamageDone': 0,
   'physicalDamageDoneToChampions': 0,
   'physicalDamageTaken': 0,
   'totalDamageDone': 0,
   'totalDamageDoneToChampions': 0,
   'totalDamageTaken': 0,
   'trueDamageDone': 0,
   'trueDamageDoneToChampions': 0,
   'trueDamageTaken': 0},
  'goldPerSecond': 0,
  'jungleMinionsKilled
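
Given that structure, pulling the `CHECK` and `CHAMPIONCHECK` fields for one player out of a frame is a pair of dictionary lookups. Below is a minimal sketch on a hand-made frame. The values are made up and `snapshot_for` is a hypothetical helper; note that `participantFrames` is keyed by the participant ID as a string.

```python
CHECK = ["totalGold", "level", "xp", "minionsKilled", "jungleMinionsKilled"]
CHAMPIONCHECK = ["attackDamage", "armor", "magicResist", "healthMax",
                 "healthRegen", "powerMax", "lifesteal"]


def snapshot_for(participant_frames, participant_id):
    """Flatten one player's frame into a single dict of the tracked stats."""
    frame = participant_frames[str(participant_id)]  # keys are strings such as '1'
    row = {name: frame[name] for name in CHECK}
    row.update({name: frame["championStats"][name] for name in CHAMPIONCHECK})
    return row


# Hand-made frame with only the fields used here; real frames carry many more.
toy_frames = {
    "1": {
        "totalGold": 500, "level": 1, "xp": 0,
        "minionsKilled": 0, "jungleMinionsKilled": 0,
        "championStats": {"attackDamage": 61, "armor": 43, "magicResist": 32,
                          "healthMax": 645, "healthRegen": 10,
                          "powerMax": 100, "lifesteal": 0},
    }
}
row = snapshot_for(toy_frames, 1)
```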

## Cleaning and Preprocessing
### Raw Data Overview

In [1]:
import numpy as np
import pandas as pd
data = pd.read_csv('loldatafull.csv',index_col = 0)
data.shape

(119046, 29)

In [2]:
# Balance on Label
data.label.value_counts()

False    81283
True     37763
Name: label, dtype: int64

In [3]:
data.head()

championId,championName,firstBloodAssist,firstBloodKill,individualPosition,summoner1Id,summoner2Id,participantId,perks,enemyId,enemyName,...,xp,minionsKilled,jungleMinionsKilled,attackDamage,armor,magicResist,healthMax,healthRegen,powerMax,lifesteal
85,Kennen,True,False,TOP,12,4,1,"{'statPerks': {'defense': 5003, 'flex': 5008, ...",17,Teemo,...,1572,24,0,56,37,39,951,13,200,0
245,Ekko,False,False,JUNGLE,4,11,2,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",120,Hecarim,...,320,0,4,60,40,32,646,19,330,0
157,Yasuo,False,False,MIDDLE,14,4,3,"{'statPerks': {'defense': 5003, 'flex': 5008, ...",7,Leblanc,...,2655,46,0,77,43,44,913,32,145,0
119,Draven,False,True,BOTTOM,4,7,4,"{'statPerks': {'defense': 5002, 'flex': 5008, ...",498,Xayah,...,536,13,0,85,37,30,749,8,413,0
12,Alistar,False,False,UTILITY,4,14,5,"{'statPerks': {'defense': 5001, 'flex': 5002, ...",350,Yuumi,...,566,3,0,67,52,32,728,22,378,0


List of features:

Col 7 "perks" is the entire mastery setup (Keystone, Primary, Secondary, Steroids).

Col 16 "label" is what we try to predict.

Cols 15 - 28 (other than the label) are live stats from the minute the player made their first purchase.

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 119046 entries, 85 to 43
Data columns (total 29 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   championName         119046 non-null  object
 1   firstBloodAssist     119046 non-null  bool  
 2   firstBloodKill       119046 non-null  bool  
 3   individualPosition   119046 non-null  object
 4   summoner1Id          119046 non-null  int64 
 5   summoner2Id          119046 non-null  int64 
 6   participantId        119046 non-null  int64 
 7   perks                119046 non-null  object
 8   enemyId              119046 non-null  int64 
 9   enemyName            119046 non-null  object
 10  cmovespeed           119046 non-null  int64 
 11  cattackrange         119046 non-null  int64 
 12  emovespeed           119046 non-null  int64 
 13  eattackrange         119046 non-null  int64 
 14  matchId              119046 non-null  object
 15  minute               119046 non-null 

### 'Missing' Data Handling
We should have 120,000 players from 12,000 matches, but we only have 119,046. My data-gathering logs show that some matches returned more or fewer than 10 players. Such matches are of questionable quality, and the boots-buying decisions in them probably can't be explained well by our model. For example, if a teammate fails to connect (so the match has fewer than 10 players), I won't play seriously since I'm almost guaranteed to lose. If the API reports more than 10 players in a match, something is definitely wrong with either the game or the API, and data gathered from that match can't be trusted.

Since we don't want such matches to disturb our model, we drop every row whose match does not have exactly 10 players. Dropping them seems to be the only workable option, since we can't fill in entire missing rows and the 'surviving' rows are possibly polluted.

In [5]:
# Count rows per match once, then keep only matches with exactly 10 players
matchsizes = data.matchId.value_counts()
goodmatch = matchsizes.index[matchsizes == 10]
print(goodmatch.shape)
data = data[data.matchId.isin(goodmatch)]
data.shape

(11238,)


(112380, 29)
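
The same filter can be illustrated on a toy frame. This is only a minimal sketch, not the notebook's code: the `toy` frame and its match IDs are made up, `expected = 3` stands in for the 10 players of a real match, and `groupby(...).transform("size")` is used as an alternative to the `value_counts` lookup.

```python
import pandas as pd

# Hypothetical toy data: match "A" has 3 rows, "B" has 2, "C" has 4.
toy = pd.DataFrame({
    "matchId": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "player": range(9),
})

expected = 3  # stands in for the 10 players of a real match

# Attach each row's match size, then keep rows from matches of the expected size.
sizes = toy.groupby("matchId")["matchId"].transform("size")
kept = toy[sizes == expected]

print(kept.matchId.unique().tolist())
print(kept.shape)
```

On the real data the shapes printed above are consistent with this rule: 11238 matches × 10 players = 112380 rows.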

In [6]:
# Dropping features we don't use
use = data.drop([ 'participantId','enemyId','matchId'],axis = 1)
use.shape

(112380, 26)

### Balancing

In [7]:
# Number of rows labelled True (boots bought first); this is the minority class
ntrue = int(use.label.sum())
print(ntrue)
sampledfalse = use[use.label == False].sample(n = ntrue, random_state = 42)
print(sampledfalse.shape)

35557
(35557, 26)


In [8]:
balanced = pd.concat([use[use.label == True],sampledfalse],axis = 0)
print(balanced.shape)
print(balanced.label.value_counts())

(71114, 26)
True     35557
False    35557
Name: label, dtype: int64
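
The downsampling in the two cells above can also be sketched in one step on toy data. This is an illustration under assumptions, not the notebook's code: the `toy` frame is made up, and `groupby(...).sample` requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical imbalanced toy data: 3 True rows, 7 False rows.
toy = pd.DataFrame({
    "label": [True] * 3 + [False] * 7,
    "value": range(10),
})

# Size of the smaller class decides how many rows to keep per class.
n_minority = int(toy.label.value_counts().min())

# Draw the same number of rows from each class; random_state keeps it reproducible.
balanced_toy = toy.groupby("label").sample(n=n_minority, random_state=42)

print(balanced_toy.label.value_counts().to_dict())
```

Downsampling keeps every minority row and discards majority rows, so the balanced set is twice the size of the minority class.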


### Feature Representation
Some categorical features need to be encoded. I'm using one-hot encoding (ohe) here.

'perks', a pre-match setup, is a multi-level nested JSON-like string that took some cleaning to break down into 9 features (before ohe).

'summoner1Id' and 'summoner2Id' are the two summoner spells that a player chooses to bring, also a pre-game setup. The interesting part is that their order doesn't matter. For example, say there are 3 summoner spells: Flash, Ignite, Ghost. ['Flash','Ignite'] should be represented the same way as ['Ignite','Flash'], since you still have the same 2 spells up your sleeve. My representation respects this order independence.

In [16]:
#What I'm working with
balanced.perks.iloc[0]

"{'statPerks': {'defense': 5003, 'flex': 5008, 'offense': 5008}, 'styles': [{'description': 'primaryStyle', 'selections': [{'perk': 8214, 'var1': 1666, 'var2': 0, 'var3': 0}, {'perk': 8275, 'var1': 7, 'var2': 0, 'var3': 0}, {'perk': 8210, 'var1': 12, 'var2': 0, 'var3': 0}, {'perk': 8237, 'var1': 629, 'var2': 0, 'var3': 0}], 'style': 8200}, {'description': 'subStyle', 'selections': [{'perk': 8139, 'var1': 1126, 'var2': 0, 'var3': 0}, {'perk': 8135, 'var1': 2190, 'var2': 5, 'var3': 0}], 'style': 8100}]}"

In [17]:
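import json
def perk2dict(s):
    '''
    Replaces single quotes with double quotes to meet JSON requirements, then converts the perk string
    into a JSON-style dictionary.

    s: string of perks from the LoL API.
    '''
    # Safe here because the perk strings contain no apostrophes inside values
    s = s.replace("'",'"')
    return json.loads(s)  # avoid naming the result `dict`, which would shadow the builtin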

In [18]:
balanced['perks'] = balanced.perks.apply(perk2dict)

In [19]:
# 3 statperks
stats3s = pd.json_normalize(balanced.perks)
# Other perks nested deeper
styles = pd.json_normalize(stats3s.styles)
primary = pd.json_normalize(pd.json_normalize(styles[0]).selections).applymap(lambda x: x.get('perk'))
sub = pd.json_normalize(pd.json_normalize(styles[1]).selections).applymap(lambda x: x.get('perk'))

In [22]:
# df of perks
mastery = pd.concat([stats3s[stats3s.columns[1:4]],primary,sub],axis = 1)
mastery.columns = mastery.columns[0:3].tolist() + ['keystone','p1','p2','p3','s1','s2']
mastery.index = balanced.index #Lost index during json_normalize, adding it back
mastery.head()

championId,statPerks.defense,statPerks.flex,statPerks.offense,keystone,p1,p2,p3,s1,s2
85,5003,5008,5008,8214,8275,8210,8237,8139,8135
157,5003,5008,5005,8008,9111,9104,8014,8444,8242
17,5001,5008,5005,8214,8226,8233,8236,8139,8135
120,5001,5008,5005,8124,8143,8138,8134,8234,8232
60,5002,5008,5005,8005,9111,9104,8014,8126,8134
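
The flattening above can also be sketched without pandas, which may help when checking a single row by hand. This is a minimal sketch under assumptions: `flatten_perks` is a hypothetical helper (not part of the notebook), it swaps the quote-replacement plus `json.loads` approach for `ast.literal_eval`, and it assumes the primary style always comes before the sub style in `styles`, as it does in the sample string printed earlier.

```python
import ast

# Sample perks string copied from the cell output earlier in this notebook.
raw = (
    "{'statPerks': {'defense': 5003, 'flex': 5008, 'offense': 5008}, "
    "'styles': [{'description': 'primaryStyle', 'selections': ["
    "{'perk': 8214, 'var1': 1666, 'var2': 0, 'var3': 0}, "
    "{'perk': 8275, 'var1': 7, 'var2': 0, 'var3': 0}, "
    "{'perk': 8210, 'var1': 12, 'var2': 0, 'var3': 0}, "
    "{'perk': 8237, 'var1': 629, 'var2': 0, 'var3': 0}], 'style': 8200}, "
    "{'description': 'subStyle', 'selections': ["
    "{'perk': 8139, 'var1': 1126, 'var2': 0, 'var3': 0}, "
    "{'perk': 8135, 'var1': 2190, 'var2': 5, 'var3': 0}], 'style': 8100}]}"
)

def flatten_perks(s):
    """Parse a perks string and return the nine perk IDs as a flat dict."""
    perks = ast.literal_eval(s)      # parses the Python-literal string directly
    primary, sub = perks["styles"]   # assumption: primaryStyle first, subStyle second
    row = dict(perks["statPerks"])   # the 3 stat perks: defense, flex, offense
    primary_names = ["keystone", "p1", "p2", "p3"]
    row.update({n: sel["perk"] for n, sel in zip(primary_names, primary["selections"])})
    row.update({n: sel["perk"] for n, sel in zip(["s1", "s2"], sub["selections"])})
    return row

print(flatten_perks(raw))
```

The nine values agree with the first row (championId 85) of the mastery table above.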


In [23]:
# Converting IDs to Names using dict for better interpretation
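# read_csv's squeeze=True was removed in pandas 2.0; .squeeze('columns') gives the same Series
ssd = pd.read_csv('ssd.csv',index_col=0).squeeze('columns') # From Riot API
ssd = ssd.to_dict() # dict for summoner spells
runesd = pd.read_csv('runesd.csv',index_col=0).squeeze('columns')
runesd = runesd.to_dict() # dict for perks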
mastery = mastery.replace(runesd)
mastery.head()

championId,statPerks.defense,statPerks.flex,statPerks.offense,keystone,p1,p2,p3,s1,s2
85,Magic Resist,Adaptive Force,Adaptive Force,SummonAery,NimbusCloak,Transcendence,Scorch,TasteOfBlood,RavenousHunter
157,Magic Resist,Adaptive Force,Attack Speed,LethalTempo,Triumph,LegendAlacrity,CoupDeGrace,SecondWind,Unflinching
17,Scaling Health,Adaptive Force,Attack Speed,SummonAery,ManaflowBand,AbsoluteFocus,GatheringStorm,TasteOfBlood,RavenousHunter
120,Scaling Health,Adaptive Force,Attack Speed,Predator,SuddenImpact,EyeballCollection,IngeniousHunter,Celerity,Waterwalking
60,Armor,Adaptive Force,Attack Speed,PressTheAttack,Triumph,LegendAlacrity,CoupDeGrace,CheapShot,IngeniousHunter


In [24]:
# perks ready for ohe, adding it to data
balanced = pd.concat([balanced,mastery], axis = 1)
balanced = balanced.drop('perks',axis = 1)


In [25]:
#Preprocessing summoner spells

balanced.summoner1Id = balanced.summoner1Id.replace(ssd)
balanced.summoner2Id = balanced.summoner2Id.replace(ssd)
summids = balanced.summoner1Id.unique().tolist()
print(summids)

['SummonerTeleport', 'SummonerDot', 'SummonerHaste', 'SummonerFlash', 'SummonerHeal', 'SummonerSmite', 'SummonerBoost', 'SummonerExhaust', 'SummonerBarrier']


In [26]:
# Combining two columns into 1
combined = balanced[['summoner1Id','summoner2Id']].values
combined

array([['SummonerTeleport', 'SummonerFlash'],
       ['SummonerDot', 'SummonerFlash'],
       ['SummonerDot', 'SummonerFlash'],
       ...,
       ['SummonerFlash', 'SummonerDot'],
       ['SummonerTeleport', 'SummonerFlash'],
       ['SummonerFlash', 'SummonerHeal']], dtype=object)

In [27]:
# Creating a Boolean Series for each summoner spell
for i in summids:
    result = []
    for c in combined:
        result.append(i in c)
    balanced[i] = result
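
The loop above can also be written in a vectorized form. The following is a minimal sketch on made-up rows, not the notebook's code; the `toy` frame is hypothetical and only the spell names follow the notebook's naming.

```python
import pandas as pd

# Hypothetical toy rows; rows 0 and 1 hold the same two spells in swapped order.
toy = pd.DataFrame({
    "summoner1Id": ["SummonerFlash", "SummonerDot", "SummonerTeleport"],
    "summoner2Id": ["SummonerDot", "SummonerFlash", "SummonerFlash"],
})

# Collect spells from BOTH columns, in sorted order for a stable column layout.
spells = sorted(set(toy.summoner1Id) | set(toy.summoner2Id))

# A spell column is True when either slot holds that spell.
encoded = pd.DataFrame(
    {s: (toy.summoner1Id == s) | (toy.summoner2Id == s) for s in spells},
    index=toy.index,
)

# Swapped order gives an identical encoding.
print(bool((encoded.iloc[0] == encoded.iloc[1]).all()))
```

Collecting the spell names from both columns guards against a spell that only ever appears in the second slot, which `summoner1Id.unique()` alone would miss.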

In [28]:
#Check rightmost columns for newly added summoner spell data. Already ohe-ed.
balanced = balanced.drop(['summoner1Id','summoner2Id'],axis = 1)
balanced.head()

championId,championName,firstBloodAssist,firstBloodKill,individualPosition,enemyName,cmovespeed,cattackrange,emovespeed,eattackrange,minute,...,s2,SummonerTeleport,SummonerDot,SummonerHaste,SummonerFlash,SummonerHeal,SummonerSmite,SummonerBoost,SummonerExhaust,SummonerBarrier
85,Kennen,True,False,TOP,Teemo,335,550,330,500,4,...,RavenousHunter,True,False,False,True,False,False,False,False,False
157,Yasuo,False,False,MIDDLE,Leblanc,345,175,340,525,5,...,Unflinching,False,True,False,True,False,False,False,False,False
17,Teemo,False,False,TOP,Kennen,330,500,335,550,3,...,RavenousHunter,False,True,False,True,False,False,False,False,False
120,Hecarim,False,False,JUNGLE,Ekko,345,175,340,125,3,...,Waterwalking,False,False,True,False,False,True,False,False,False
60,Elise,False,True,JUNGLE,Shaco,330,550,345,125,2,...,IngeniousHunter,False,False,False,True,False,True,False,False,False


In [30]:
# Feature table before ohe
y = balanced.label
X = balanced.drop('label',axis = 1)
X.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 71114 entries, 85 to 202
Data columns (total 40 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   championName         71114 non-null  object
 1   firstBloodAssist     71114 non-null  bool  
 2   firstBloodKill       71114 non-null  bool  
 3   individualPosition   71114 non-null  object
 4   enemyName            71114 non-null  object
 5   cmovespeed           71114 non-null  int64 
 6   cattackrange         71114 non-null  int64 
 7   emovespeed           71114 non-null  int64 
 8   eattackrange         71114 non-null  int64 
 9   minute               71114 non-null  int64 
 10  totalGold            71114 non-null  int64 
 11  level                71114 non-null  int64 
 12  xp                   71114 non-null  int64 
 13  minionsKilled        71114 non-null  int64 
 14  jungleMinionsKilled  71114 non-null  int64 
 15  attackDamage         71114 non-null  int64 
 16  armor

In [31]:
#ohe all categorical features
X = pd.get_dummies(X)
X.shape

(71114, 508)
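
To see why the column count grows so much, here is a minimal sketch of what `pd.get_dummies` does on a made-up frame (the `toy` frame is hypothetical; column names mirror the notebook's). With default arguments, only object-typed columns are expanded into one column per distinct value, while numeric and boolean columns pass through unchanged.

```python
import pandas as pd

# Hypothetical toy frame: 2 categorical columns, 1 numeric, 1 boolean.
toy = pd.DataFrame({
    "championName": ["Kennen", "Yasuo", "Teemo"],
    "individualPosition": ["TOP", "MIDDLE", "TOP"],
    "minute": [4, 5, 3],
    "SummonerFlash": [True, True, True],
})

encoded = pd.get_dummies(toy)

# 3 champion columns + 2 position columns + 2 untouched columns = 7
print(encoded.shape)
print(sorted(encoded.columns))
```

In the real table, columns such as championName and enemyName have many distinct values, which accounts for most of the 508 columns.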

After this, we can do the train-test split and start modelling.