<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Overview" data-toc-modified-id="Overview-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Overview</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Feature-calculation" data-toc-modified-id="Feature-calculation-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Feature calculation</a></span></li><li><span><a href="#Saving-features" data-toc-modified-id="Saving-features-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Saving features</a></span></li></ul></div>

# Overview

In this step of the pipeline, we **engineer the features** used by the predictive model from the data produced by the earlier ingestion, aggregation, and consolidation steps.

The predictive model we are building is a **binary classifier** that predicts `y = winners` (i.e., whether the Blue team wins or loses).

We identify and calculate four features that are deemed to be predictive of the winning team: `total_gold`, `kills`, `buildings`, and `monsters`.

**Engineered features used in model**

| Field | Description | Dimension | Values |
| --- | --- | --- | --- |
|`winners` (y)           |Match winner|scalar|1: Blue wins, 0: Red wins|
|`total_gold` (x)        |Total accumulated gold (total over team)|(`num_frames`,3)|integer|
|`kills` (x)             |Number of opponent players killed by each team|(`num_frames`,3)|integer|
|`buildings` (x)         |Number of buildings destroyed by each team (5 kinds of buildings)|(`num_frames`,3,5)|integer|
|`monsters` (x)          |Number of elite monsters killed by each team (7 kinds of monsters)|(`num_frames`,3,7)|integer|

*Note: the second dimension is 3 for `total_gold`, `kills`, `buildings`, and `monsters` because these fields are calculated for (1) the Blue team, (2) the Red team, and (3) the difference between teams.*
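
To make the shapes in the note above concrete, here is a minimal sketch of how a `(num_frames, 3)` and a `(num_frames, 3, 5)` array could be assembled with NumPy. The toy values and variable names are hypothetical, and the sign of the difference (Blue minus Red) is an assumption; the notebook does not state which direction `feature_calculators` uses.

```python
import numpy as np

# Hypothetical cumulative kill counts per frame for a 4-frame match.
blue_kills = np.array([0, 1, 3, 4])
red_kills = np.array([0, 2, 2, 5])

# Stack Blue, Red, and the difference along a new second axis.
# Assumption: difference = Blue - Red.
kills = np.stack([blue_kills, red_kills, blue_kills - red_kills], axis=1)
print(kills.shape)  # (4, 3)
print(kills[-1])    # [ 4  5 -1]

# The same idea with one extra axis: 5 kinds of buildings per team.
blue_buildings = np.zeros((4, 5), dtype=int)
red_buildings = np.ones((4, 5), dtype=int)
buildings = np.stack(
    [blue_buildings, red_buildings, blue_buildings - red_buildings], axis=1
)
print(buildings.shape)  # (4, 3, 5)
```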

**Other fields**

| Field | Description |
| --- | --- |
|`current_gold`      |Current level of gold (total over team)|
|`xp`                |Total experience (total over team)|
|`max_current_gold`  |Current level of gold (max of single player on team)|
|`max_total_gold`    |Total accumulated gold (max of single player on team)|
|`max_xp`            |Total experience (max of single player on team)|
|`champions`         |Champions (characters) selected by each player|
|`num_frames`        |Number of frames in match; equivalently, number of minutes in match|
|`match_ids`         |Unique 10-digit match identifier|
|`team_stats`        |Identifies which team was first to certain achievements (e.g., first kill)|
|`duration`          |Duration of match|
|`player_tiers`      |Skill level of player|
|`versions`          |Version of game|
|`game_types`        |Type of game (custom, tutorial, matched)|

# Imports

In [113]:
import numpy as np
np.set_printoptions(suppress=True)

import warnings
warnings.filterwarnings('ignore')

from lib import feature_calculators, utils, match_factory
from importlib import reload
reload(feature_calculators);
reload(utils);
reload(match_factory);

LOG_LEVEL = 'Off'

# Feature calculation

**Loading match ids**. We load the unique identifiers of the matches that make up the consolidated dataset.

In [114]:
folder = './dat_challenger_only/'

match_ids = utils.get_match_ids(folder)

utils.log(LOG_LEVEL, str(len(match_ids)) + ' files')

**Loading matches**. We load each match and calculate its features.

In [115]:
matches = []
for i, match_id in enumerate(match_ids):
    # Log progress every 500 matches
    if i % 500 == 0:
        utils.log(LOG_LEVEL, str(i))
    match = match_factory.load(match_id)
    # Keep only matches for which load() returned a truthy value
    if match:
        matches.append(match)

**Reshaping data**. We reshape the data into a format that is easier to use in the analysis step.

In [116]:
features = match_factory.transpose_matches_to_features(matches)
tensor_features = feature_calculators.calculate_tensor_features(features)
for feature in tensor_features:
    print(feature)

winners
current_gold
total_gold
xp
max_current_gold
max_total_gold
max_xp
champions
kills
buildings
monsters
num_frames
team_stats
duration
versions
game_types
player_tiers
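
As a rough illustration of the reshaping idea, the sketch below turns a list of per-match records into a dictionary keyed by feature name. It is not the implementation of `match_factory.transpose_matches_to_features`; the record layout and the helper `transpose_records` are hypothetical and are here only to show the per-match to per-feature transposition.

```python
import numpy as np

# Hypothetical per-match records; the real match objects may look different.
toy_matches = [
    {"winners": 1, "num_frames": 3,
     "kills": np.array([[0, 0, 0], [1, 0, 1], [2, 1, 1]])},
    {"winners": 0, "num_frames": 2,
     "kills": np.array([[0, 1, -1], [0, 3, -3]])},
]

def transpose_records(records):
    """Turn a list of per-match dicts into a dict of per-feature lists."""
    return {key: [record[key] for record in records] for key in records[0]}

toy_features = transpose_records(toy_matches)
print(list(toy_features))        # ['winners', 'num_frames', 'kills']
print(toy_features["winners"])   # [1, 0]
```

After this step, each feature can be handled on its own (for example, converted to an array), regardless of which match it came from.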


# Saving features

In [117]:
fout = 'lolwinner_data.npz'

np.savez(fout, **tensor_features)
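
For reference, a file written with `np.savez(fout, **some_dict)` can be read back with `np.load`, with each dictionary key becoming an array name in the archive. The sketch below demonstrates the round trip on a small hypothetical dictionary written to a temporary directory, so it does not touch `lolwinner_data.npz`.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for tensor_features.
toy = {"winners": np.array([1, 0, 1]), "num_frames": np.array([31, 27, 40])}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_features.npz")
    np.savez(path, **toy)          # one array per keyword argument

    with np.load(path) as data:    # NpzFile; arrays are read on access
        names = sorted(data.files)
        winners = data["winners"]

print(names)    # ['num_frames', 'winners']
print(winners)  # [1 0 1]
```

If any of the saved arrays has `dtype=object` (for example, ragged per-match sequences), `np.load` needs `allow_pickle=True` to read it back; whether that applies to the features saved here is not shown in this notebook.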