<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Overview" data-toc-modified-id="Overview-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Overview</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Loading-data" data-toc-modified-id="Loading-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Loading data</a></span></li><li><span><a href="#Feature-selection-/-engineering" data-toc-modified-id="Feature-selection-/-engineering-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Feature selection / engineering</a></span></li><li><span><a href="#Processing-features" data-toc-modified-id="Processing-features-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Processing features</a></span></li><li><span><a href="#Saving-data" data-toc-modified-id="Saving-data-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Saving data</a></span></li></ul></div>

# Overview

This notebook creates the features used by LoLwinner from match data collected through the official League of Legends API.

**Table of features** (TBC)

| Feature | Description | Dimension (per match) | Values |
| --- | --- | --- | --- |
|`winners`           |Match winner|scalar|1 if `teams[0]` in the match file (the blue side) lost, i.e. red wins; 0 otherwise|
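|`current_gold`      |Team current gold per frame|`(max_frames, 3)`| |
|`total_gold`        |Team total gold per frame|`(max_frames, 3)`| |
|`xp`                |Team experience (XP) per frame|`(max_frames, 3)`| |
|`max_current_gold`  | |`(max_frames, 3)`| |
|`max_total_gold`    | |`(max_frames, 3)`| |
|`max_xp`            | |`(max_frames, 3)`| |
|`champions`         |Champions (characters) selected by each player|`10`|Champion ID (integer)|
|`kills`             |Number of opponents killed by each team|`(max_frames, 3)`| |
|`buildings`         | |`(max_frames, 3, 5)`| |
|`monsters`          | |`(max_frames, 3, 7)`| |
|`num_frames`        |Number of (minute-level) timeline frames in the match|scalar|integer|
|`match_ids`         |Match ID (`gameId`)|scalar|integer|
|`team_stats`        | |`6`| |
|`duration`          |Match duration (`gameDuration`)|scalar| |
|`player_tiers`      |Highest tier achieved this season by each player (`highestAchievedSeasonTier`)|`10`|0 (`UNRANKED`) to 7 (`CHALLENGER`)|
|`versions`          |Game version (`gameVersion`)|scalar|string|
|`game_types`        |Game type (`gameType`)|scalar|string|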

# Imports

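```python
import glob
import json
from importlib import reload

import numpy as np
np.set_printoptions(suppress=True)  # print floats in fixed-point, not scientific, notation

import warnings
warnings.filterwarnings('ignore')  # silence all warnings in this notebook

# helper functions for feature extraction
from lib import feature_extraction_helper

# re-import the helper so edits to lib/ take effect without restarting the kernel
reload(feature_extraction_helper)
```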


# Loading data

We load the timeline data for *expert*-level ("Challenger") matches: about 7.5K matches out of the roughly 55K downloaded.

Note that every match has two files associated with it:

1. **timeline** file:
    - Contains: minute-level in-game data
    - Further details: [see documentation here](https://developer.riotgames.com/api-methods/#match-v3/GET_getMatchTimeline)
2. **match** file:
    - Contains: metadata and other data about the match (e.g., game mode, game map)
    - Further details: [see documentation here](https://developer.riotgames.com/api-methods/#match-v3/GET_getMatch)
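The extraction loop derives each match filename from its timeline filename with `str.replace('timeline', 'match')`. A minimal sketch of that pairing follows, assuming (as the glob pattern `*timeline*` suggests) that the two files differ only in that substring. `pair_match_files` is a hypothetical helper, not part of `feature_extraction_helper`. Unlike a plain `replace` on the full path, it rewrites only the file's basename, so a directory whose name contains "timeline" is left alone, and it skips timelines whose match file is missing.

```python
import os


def pair_match_files(timeline_filenames, exists=os.path.exists):
    """Return (timeline, match) filename pairs for matches that have both files.

    Hypothetical helper: assumes the match filename is the timeline filename
    with 'timeline' replaced by 'match' in the basename only.
    `exists` is injectable so the pairing can be tested without real files.
    """
    pairs = []
    for timeline_filename in timeline_filenames:
        directory, basename = os.path.split(timeline_filename)
        match_filename = os.path.join(directory, basename.replace('timeline', 'match'))
        if exists(match_filename):  # skip matches without a match file
            pairs.append((timeline_filename, match_filename))
    return pairs


# Usage with the notebook's list of timeline files:
# pairs = pair_match_files(timeline_filenames)
```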

```python
# load the names of all timeline files (sorted, because glob returns them in arbitrary order)
timeline_filenames_format = './dat_challenger_only/*timeline*'
timeline_filenames = sorted(glob.glob(timeline_filenames_format))

print(len(timeline_filenames), 'files')
```

```
7539 files
```


# Feature selection / engineering

Several features are selected / engineered using the **timeline** and **match** files.

The `features` dictionary maps each feature name to a list holding one entry per match. Once converted to arrays, each feature has dimensions

`(number of matches) x (dimension of feature)`
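As a toy illustration of this layout, the sketch below appends one fixed-size entry per match to each list and stacks the lists with `np.array`, which gives the `(number of matches) x (dimension of feature)` shape directly. The champion IDs and durations are made-up values, not real data. Per-frame features have a different number of frames in each match, so they cannot be stacked this way; they are zero-padded first (see Processing features).

```python
import numpy as np

# One list per feature; one entry is appended per match (toy values only).
toy_features = {'champions': [], 'duration': []}

# Two hypothetical matches: 10 champion IDs each, plus a scalar duration.
toy_features['champions'].append([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
toy_features['duration'].append(1800)
toy_features['champions'].append([11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
toy_features['duration'].append(2100)

champions = np.array(toy_features['champions'])  # (number of matches, 10)
duration = np.array(toy_features['duration'])    # (number of matches,)
print(champions.shape, duration.shape)           # (2, 10) (2,)
```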

```python
features = {
    'winners'           : [],
    'current_gold'      : [],
    'total_gold'        : [],
    'xp'                : [],
    'max_current_gold'  : [],
    'max_total_gold'    : [],
    'max_xp'            : [],
    'champions'         : [],
    'kills'             : [],
    'buildings'         : [],
    'monsters'          : [],
    'num_frames'        : [],
    'match_ids'         : [],
    'team_stats'        : [],
    'duration'          : [],
    'player_tiers'      : [],
    'versions'          : [],
    'game_types'        : []
}
```

```python
for i, timeline_filename in enumerate(timeline_filenames):

    if i % 500 == 0:
        print(i)  # progress

    # every timeline file has a corresponding match file
    match_filename = timeline_filename.replace('timeline', 'match')

    with open(timeline_filename, 'r') as timeline_file, \
            open(match_filename, 'r') as match_file:

        match_json = json.load(match_file)

        # skip matches rejected by the helper's validity check
        if not feature_extraction_helper.valid_match(match_json):
            continue

        timeline_json = json.load(timeline_file)

        try:
            (team_current_gold, team_total_gold, team_xp, team_max_current_gold,
             team_max_total_gold, team_max_xp,
             frames) = feature_extraction_helper.get_dat(timeline_json)
            current_tiers = [
                match_json['participants'][p]['highestAchievedSeasonTier']
                for p in range(10)
            ]
        except Exception:
            # skip matches whose timeline or participant data cannot be parsed
            continue

        # 1 if the first team in match_json['teams'] lost, 0 otherwise
        features['winners'].append(
            int(match_json['teams'][0]['win'] == 'Fail'))
        features['current_gold'].append(team_current_gold)
        features['total_gold'].append(team_total_gold)
        features['xp'].append(team_xp)
        features['max_current_gold'].append(team_max_current_gold)
        features['max_total_gold'].append(team_max_total_gold)
        features['max_xp'].append(team_max_xp)
        features['champions'].append(
            feature_extraction_helper.get_champion_ids(match_json, 10))
        features['kills'].append(
            feature_extraction_helper.get_kills_by_match(timeline_json))
        features['buildings'].append(
            feature_extraction_helper.get_buildings_by_match(timeline_json))
        features['monsters'].append(
            feature_extraction_helper.get_monsters_by_match(timeline_json))
        features['num_frames'].append(frames)
        features['match_ids'].append(match_json['gameId'])
        features['team_stats'].append(
            feature_extraction_helper.get_team_stats(match_json))
        features['duration'].append(match_json['gameDuration'])
        features['player_tiers'].append(current_tiers)
        features['versions'].append(match_json['gameVersion'])
        features['game_types'].append(match_json['gameType'])
```

```
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
5500
6000
6500
7000
7500
```


# Processing features

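Matches last different numbers of frames, so we zero-pad every per-frame feature to `max_frames` frames. This gives all matches the same shape, so each feature can be stored as a single tensor.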
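The padding itself is done by `feature_extraction_helper.convert_to_tensor`, whose source is not shown here. A minimal sketch of the same idea, assuming each per-match entry is an array of shape `(num_frames, ...)` whose real frames come first: allocate a zero tensor of shape `(num_matches, max_frames, ...)` and copy each match into its first `num_frames` rows. `pad_to_tensor` is a hypothetical name and does not reproduce the helper's exact signature (for instance the extra `3` argument passed for `buildings` and `monsters`).

```python
import numpy as np


def pad_to_tensor(sequences, max_frames):
    """Zero-pad a list of per-match arrays to shape (num_matches, max_frames, ...).

    Hypothetical sketch: every sequence must share the same trailing shape,
    and no sequence may be longer than max_frames.
    """
    first = np.asarray(sequences[0])
    tensor = np.zeros((len(sequences), max_frames) + first.shape[1:], dtype=float)
    for i, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype=float)
        tensor[i, :len(seq)] = seq  # frames beyond len(seq) stay zero
    return tensor


# Two toy matches with 2 and 4 frames of 3 values each, padded to 5 frames.
a = np.ones((2, 3))
b = np.full((4, 3), 2.0)
padded = pad_to_tensor([a, b], max_frames=5)
print(padded.shape)     # (2, 5, 3)
print(padded[0, :, 0])  # [1. 1. 0. 0. 0.]
```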

```python
num_matches = len(features['winners'])
max_frames = max(features['num_frames'])

print(num_matches, 'valid matches')
print(max_frames, 'max frames')
```

```
7302 valid matches
72 max frames
```


```python
# convert_to_tensor zero-pads the per-frame features to (num_matches, max_frames, ...);
# the remaining features have a fixed size per match and are stacked with np.array
tensor_features = {}

tensor_features['winners'] = np.array(features['winners'])
tensor_features['current_gold'] = feature_extraction_helper.convert_to_tensor(
    features['current_gold'], num_matches, max_frames)
tensor_features['total_gold'] = feature_extraction_helper.convert_to_tensor(
    features['total_gold'], num_matches, max_frames)
tensor_features['xp'] = feature_extraction_helper.convert_to_tensor(
    features['xp'], num_matches, max_frames)
tensor_features['max_current_gold'] = feature_extraction_helper.convert_to_tensor(
    features['max_current_gold'], num_matches, max_frames)
tensor_features['max_total_gold'] = feature_extraction_helper.convert_to_tensor(
    features['max_total_gold'], num_matches, max_frames)
tensor_features['max_xp'] = feature_extraction_helper.convert_to_tensor(
    features['max_xp'], num_matches, max_frames)
tensor_features['champions'] = np.array(features['champions'])
tensor_features['kills'] = feature_extraction_helper.convert_to_tensor(
    features['kills'], num_matches, max_frames)
tensor_features['buildings'] = feature_extraction_helper.convert_to_tensor(
    features['buildings'], num_matches, max_frames, 3)
tensor_features['monsters'] = feature_extraction_helper.convert_to_tensor(
    features['monsters'], num_matches, max_frames, 3)
tensor_features['num_frames'] = np.array(features['num_frames'])
tensor_features['team_stats'] = np.array(features['team_stats'])
tensor_features['duration'] = np.array(features['duration'])
tensor_features['versions'] = np.array(features['versions'])
tensor_features['game_types'] = np.array(features['game_types'])

# encode tier names as integers, from UNRANKED (0) to CHALLENGER (7)
map_player_tiers = {
    'UNRANKED': 0,
    'BRONZE': 1,
    'SILVER': 2,
    'GOLD': 3,
    'PLATINUM': 4,
    'DIAMOND': 5,
    'MASTER': 6,
    'CHALLENGER': 7
}
tensor_features['player_tiers'] = np.vectorize(map_player_tiers.get)(
    np.array(features['player_tiers']))
```
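`np.vectorize(map_player_tiers.get)` returns `None` for any tier string missing from `map_player_tiers`, and `np.vectorize` fixes its output dtype from the first result, so an unexpected tier yields either an error or an `object` array instead of integers. A minimal sketch of a stricter lookup, assuming an unknown tier should map to a sentinel of `-1` (an arbitrary choice, not part of the original encoding); `encode_tiers` and `UNKNOWN_TIER` are hypothetical names.

```python
import numpy as np

map_player_tiers = {
    'UNRANKED': 0, 'BRONZE': 1, 'SILVER': 2, 'GOLD': 3,
    'PLATINUM': 4, 'DIAMOND': 5, 'MASTER': 6, 'CHALLENGER': 7,
}

UNKNOWN_TIER = -1  # assumed sentinel for tiers missing from the mapping


def encode_tiers(tiers):
    """Map an array-like of tier strings to integer codes (unknown -> UNKNOWN_TIER)."""
    # otypes=[int] fixes the output dtype instead of inferring it from the first element
    lookup = np.vectorize(lambda t: map_player_tiers.get(t, UNKNOWN_TIER), otypes=[int])
    return lookup(np.asarray(tiers))


# 'IRON' stands in for any tier string that is not in the mapping.
codes = encode_tiers([['GOLD', 'CHALLENGER'], ['IRON', 'MASTER']])
print(codes.tolist())  # [[3, 7], [-1, 6]]
```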

**Outputting the dimensions of each feature tensor.**

Dimensions are of the form

`(N, M_1, M_2, ...)`

where `N` is the number of matches and `(M_1, M_2, ...)` are the dimensions of the feature.

Scalar features have dimension `(N,)`.
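For the per-frame features, `M_1` is the padded frame axis (`max_frames`), so a zero can mean either a real zero or "past the end of the match". One way to tell them apart, sketched under the assumption that each match's real frames occupy the first `num_frames` positions, is a boolean mask built from `num_frames`. `frame_mask` is a hypothetical helper and the gold values are toy numbers.

```python
import numpy as np


def frame_mask(num_frames, max_frames):
    """Boolean (N, max_frames) mask that is True for real (unpadded) frames.

    Assumes match i's real frames occupy positions 0 .. num_frames[i] - 1.
    """
    num_frames = np.asarray(num_frames)
    return np.arange(max_frames)[None, :] < num_frames[:, None]


# Toy example: three matches with 2, 4 and 3 frames, padded to 4 frames.
mask = frame_mask([2, 4, 3], max_frames=4)
gold = np.arange(12, dtype=float).reshape(3, 4) * mask  # padded entries zeroed

# Per-match mean over real frames only, so padding does not drag the average down.
mean_gold = gold.sum(axis=1) / mask.sum(axis=1)
print(mean_gold.tolist())  # [0.5, 5.5, 9.0]
```

For a tensor with a trailing axis, such as `current_gold` with shape `(N, max_frames, 3)`, the mask can be broadcast as `mask[:, :, None]`.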

```python
for feature, tensor in tensor_features.items():
    print('{:<20} {:<12}'.format(feature, str(tensor.shape)))
```

```
winners              (7302,)
current_gold         (7302, 72, 3)
total_gold           (7302, 72, 3)
xp                   (7302, 72, 3)
max_current_gold     (7302, 72, 3)
max_total_gold       (7302, 72, 3)
max_xp               (7302, 72, 3)
champions            (7302, 10)
kills                (7302, 72, 3)
buildings            (7302, 72, 3, 5)
monsters             (7302, 72, 3, 7)
num_frames           (7302,)
team_stats           (7302, 6)
duration             (7302,)
versions             (7302,)
game_types           (7302,)
player_tiers         (7302, 10)
```


# Saving data

```python
fout = 'lolwinner_data.npz'

np.savez(fout, **tensor_features)
```
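The saved archive can be read back with `np.load`, which for an `.npz` file returns a dict-like `NpzFile` keyed by the names passed to `np.savez`. A minimal round-trip sketch with toy arrays (not the real features; the version strings are made up), written to a temporary directory so it does not touch `lolwinner_data.npz`:

```python
import os
import tempfile

import numpy as np

# Toy stand-ins for a few entries of tensor_features.
toy_features = {
    'winners': np.array([1, 0, 1]),
    'num_frames': np.array([30, 41, 27]),
    'versions': np.array(['8.1.1', '8.1.2', '8.2.1']),  # made-up version strings
}

with tempfile.TemporaryDirectory() as tmp:
    fout = os.path.join(tmp, 'lolwinner_data.npz')
    np.savez(fout, **toy_features)

    # Use the NpzFile as a context manager so the file is closed,
    # and copy each array out while it is still open.
    with np.load(fout) as data:
        loaded = {name: data[name] for name in data.files}

print(sorted(loaded))  # ['num_frames', 'versions', 'winners']
```

Arrays of `object` dtype are stored with pickle and would need `np.load(..., allow_pickle=True)`; string arrays such as `versions` and `game_types` have a fixed-width Unicode dtype and load without it.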