#WomboCombo: Optimizing Team Composition in League of Legends
### by Data of Legends (Sean Cha, Jaemin Cheun, Kenny Lei, Andrew Paek) 
###Assigned TF: Léandra King

<img src="http://36646d87786feafc0611-0338bbbce19fc98919c6293def4c5554.r0.cf1.rackcdn.com/images/FiGZ9r3D3E82.878x0.Z-Z96KYq.jpg" width=610 height=347/>

#Table of Contents

* [WomboCombo: Optimizing Team Composition in League of Legends](#WomboCombo: Optimizing Team Composition in League of Legends)
	* [Introduction](#Introduction)
		* [Different Roles](#Different Roles)
        * [Objectives](#Objectives)
	* [Part I: Data Retrieval & Cleaning](#Part-I-Data-Retrieval-&-Cleaning)
		* [League of Legends API](#LoL-API)
        * [Pulling and Saving JSON Files](#JSON-Files)
		* [Pre-processing Data](#pre-processing)
		* [Feature Selection](#feature-selection)
    * [Part II: Exploratory Data Analysis](#EDA)
		* [Making baseline predictions](#baseline-predictions)
			* [1.3 Make the prediction from the `baseline` model](#1.3-Make-the-prediction-from-the-baseline-model)
    * [Part III: Clustering Algorithms](#clustering)
		* [Making baseline predictions](#Making-baseline-predictions)
			* [1.3 Make the prediction from the `baseline` model](#1.3-Make-the-prediction-from-the-baseline-model)        

##Introduction <a id='Introduction'></a>

League of Legends (LoL) is one of the most popular games in the world. It is a fast-paced, competitive online game that blends the speed and intensity of a real-time strategy (RTS) game with role-playing game (RPG) elements. Two teams of 5 powerful champions, each with a unique design and playstyle, battle head-to-head across multiple battlefields and game modes. With an ever-expanding roster of champions, frequent updates and a thriving tournament scene, League of Legends offers endless replayability for players of every skill level.

The main objective of the game is to knock down the enemies' towers using the champions, while the enemy team is trying to do the same. Towers are difficult to destroy with the enemy defending them, so most of the game revolves around killing enemy champions in order to buy the necessary time to bring down enemy towers.

Team composition is extremely important in the game. Each champion out of the roster of 128 (and growing!) carries its own unique abilities, utility, and play style that it can contribute to the team. Simply put, a team of 5 champions that share similar roles in the game would be very ineffective against a well-balanced team. Let's look at some examples.

### Different Roles <a id='Different Roles'></a>
This is LeBlanc. Her primary role in the game is assassin. She has extremely low physical damage (red bar) and low health (green bar). However, she has HUGE ability power (blue bar), allowing her to kill enemies in a split second with a quick burst of spells. She is extremely effective at assassinating enemy champions that roam around the map by themselves. Since her abilities have a very long cool-down (the time before she can cast spells again), she is in deep trouble if she cannot kill the enemy champion the first time around. In a team fight, LeBlanc focuses on killing high-priority enemy champions. This is an old video from a few seasons back, but it really demonstrates what LeBlanc is capable of: https://www.youtube.com/watch?v=lsBJEvwi66k.
<img src=http://i.imgur.com/G5jbBpq.png>

This is Malphite. His primary role is tank. He has high health and defensive statistics, allowing him to last a long time in a team fight. Although he cannot quickly kill enemy champions the way LeBlanc does, he is able to follow high-priority enemy champions around and harass or interrupt them in their attempts to kill ours. He is also a "utility" champion, meaning that he has various "crowd-controlling" abilities that slow down and knock up enemy champions, making it easier for our champions to target them. This is an example of Malphite's tankiness and his specialty in initiating a team fight: https://youtu.be/pE3aLnoB7oA?t=17m53s.
<img src=http://i.imgur.com/POWWt9u.png>

These are only 2 of the 128 champions that players can choose to play. As you can see, LeBlanc and Malphite have distinctly different roles in the game. It would be a disaster for a team to have 5 "tank" champions, since tanky champions, despite their resilience, cannot deal enough damage to kill enemy champions. A team of only assassins will have no luck in winning either; without a reliable tank to take the frontline and disrupt the enemy, the LeBlancs will quickly "evaporate to death."

This is where team composition comes in. A team that has both a Malphite and a LeBlanc will be extremely effective. While Malphite slows down enemies and absorbs the damage, LeBlanc can focus on dealing massive magic damage to the enemy champions. Other notable roles include the fighter (a champion that can deal consistent damage over time while taking moderate damage), the support (a champion specializing in crowd-control abilities that disable enemy champions and in healing that protects our champions), and the marksman (a ranged champion who is extremely vulnerable but can deal a massive amount of damage from a distance).

Certain sets of champions complement each other really well, hence the term "Wombo Combo." To look at some examples of Wombo Combos, please see: https://www.youtube.com/watch?v=TW6bdxnFcjs

###Objectives <a id='Objectives'></a>

We hope to gain insight on optimal team composition by analyzing match data.

First, we use Riot's League of Legends API (https://developer.riotgames.com/docs/rate-limiting) to download the necessary data. Then, we preprocess and clean the raw JSON data into a workable structure.

Next, we conduct Exploratory Data Analysis (EDA) to answer some basic questions regarding our data set. By visualizing and playing around with the data, we gain additional insight that will be helpful in later parts.

Then, we ... (WRITE ABOUT ALGORITHMS HERE)

In [2]:
import os
import json
import requests
import time
import datetime
import types
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

##Part I: Data Retrieval & Cleaning<a id='Part-I-Data-Retrieval-&-Cleaning'></a>

### League of Legends API <a id='LoL-API'></a>

Rate Limiting: The API allows one user to make 10 requests per 10 seconds and 500 requests per 10 minutes.
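
A client-side throttle could keep a script under both limits instead of relying only on 429 responses. The following is a minimal sketch and not part of our pipeline: `SlidingWindowLimiter` and its method names are hypothetical, and the two limits are simply the ones quoted above. The clock and sleep functions are injectable so that the logic can be checked without actually waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter(object):
    """Client-side limiter for several (max_calls, period_seconds) windows."""

    def __init__(self, limits, clock=time.time, sleep=time.sleep):
        # limits: e.g. [(10, 10), (500, 600)] for 10 per 10 s and 500 per 10 min
        self.limits = list(limits)
        self.clock = clock    # injectable so the logic can be tested offline
        self.sleep = sleep
        self.calls = deque()  # timestamps of recorded requests, oldest first

    def wait_time(self):
        """Seconds to wait before the next request fits in every window."""
        now = self.clock()
        longest = max(period for _, period in self.limits)
        # forget requests that are older than the longest window
        while self.calls and now - self.calls[0] >= longest:
            self.calls.popleft()
        wait = 0.0
        for max_calls, period in self.limits:
            recent = [t for t in self.calls if now - t < period]
            if len(recent) >= max_calls:
                # a slot frees up once the max_calls-th newest request expires
                wait = max(wait, recent[-max_calls] + period - now)
        return wait

    def acquire(self):
        """Block until a request is allowed, then record it."""
        wait = self.wait_time()
        if wait > 0:
            self.sleep(wait)
        self.calls.append(self.clock())
```

Calling `limiter.acquire()` immediately before each request would be one way to use it. The server can still answer 429 (for example when several machines share one key), so a handler for that status remains necessary.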

For documentation, see https://developer.riotgames.com/api/methods.

There are different "game modes" and "queue types" in League of Legends. We only want to look at the most popular type, which is the "classic" 5v5 match on a map called Summoner's Rift. We also want to look only at "Ranked" games, which affect the players' scores and rankings within the gaming ecosystem. In Ranked games, people are motivated to put in their best effort to advance to higher tiers. Simply put, classic ranked game data reflects the champion selections and play styles that players themselves find optimal. We want to capture this conventional, near-optimal play style and find patterns in it.

However...

API Limitations: The API does not let us specify which game mode or queue type to pull, and there is no distinguishable pattern in the matchIDs either. We therefore pick a starting point (an arbitrary matchID), increment the ID by 1, pull the data, check whether it satisfies our requirements, and only then decide whether to store it.

This process takes a long time. We ran the scripts that selectively pull data this way for days and nights on multiple machines.

In [3]:
# API KEY HERE
KEY = 'b00c992b-6fc1-4795-abcc-7db85c1b72fa' # API Key associated with Sean's LoL account (ch920425)

We define a function that makes a request for a match record by its matchID.

In [4]:
# MATCH-v2.2
REGION_ENDPOINT = "https://{0}.api.pvp.net/api/lol/{0}/"

# FUNCTION TO PULL MATCH DATA
def get_match(REGION, matchId, includeTimeline):
    """
    Retrieve match by match ID.
    """
    return requests.get(
        (REGION_ENDPOINT + "v2.2/match/{1}?"
         "api_key={2}&includeTimeline={3}").
        format(REGION, matchId, KEY, includeTimeline))

We establish constants that allow us to check whether we are within the API's rate limit. Also, our focus is on North American players.

We define several functions that allow us to filter out the data we want.

In [5]:
# Constants
REGION = 'na'                    # North American Server Data
STATUS_OK = 200                  # Shows that we have successfully pulled data
STATUS_RATE_LIMIT_EXCEEDED = 429 # Tells us that we have to wait 10 seconds before requesting again

# Only interested in 5v5 RANKED games
valid_rank_games = ['RANKED_SOLO_5x5',     # Ranked Solo 5v5 games
                    'RANKED_PREMADE_5x5',  # Ranked Premade 5v5 games
                    'RANKED_TEAM_5x5']     # Ranked Team 5v5 games

# Functions that check status of the data pulls
def isValid(match):
    return match.status_code == STATUS_OK

# We only look at 'MATCHED_GAME' since 'CUSTOM_GAME' is where people try experimental champions and play styles
def isClassicMatch(match):
    j = match.json()
    return j['mapId'] == 11 and j['matchMode'] == "CLASSIC" and j['matchType'] == 'MATCHED_GAME'

# Checking if ranked game
def isRankMatch(match):
    j = match.json()
    return j['queueType'] in valid_rank_games

# Check whether we went over the API Rate Limit
def rateLimitExceeded(match):
    return int(match.status_code) == STATUS_RATE_LIMIT_EXCEEDED
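
The filters above can be exercised without touching the API by passing in a stand-in response object. The sketch below is illustrative only: `FakeResponse` and `should_store` are hypothetical names, the two sample payloads are hand-written rather than real API output, and the combined check assumes the same fields and values that the functions above test.

```python
STATUS_OK = 200
VALID_RANK_GAMES = ('RANKED_SOLO_5x5', 'RANKED_PREMADE_5x5', 'RANKED_TEAM_5x5')


class FakeResponse(object):
    """Hypothetical stand-in for a response: a status code plus a JSON body."""

    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def should_store(match):
    """True only for a successful pull of a classic, ranked, matched game on map 11."""
    if match.status_code != STATUS_OK:
        return False  # never parse the body of a failed request
    j = match.json()  # parse once and reuse the result for every check
    return (j.get('mapId') == 11
            and j.get('matchMode') == 'CLASSIC'
            and j.get('matchType') == 'MATCHED_GAME'
            and j.get('queueType') in VALID_RANK_GAMES)


# hand-written sample payloads (illustrative, not real API output)
ranked = {'mapId': 11, 'matchMode': 'CLASSIC',
          'matchType': 'MATCHED_GAME', 'queueType': 'RANKED_SOLO_5x5'}
unranked = dict(ranked, queueType='NORMAL_5x5_BLIND')

print(should_store(FakeResponse(200, ranked)))    # accepted
print(should_store(FakeResponse(200, unranked)))  # rejected: not a ranked queue
print(should_store(FakeResponse(404)))            # rejected: nothing stored for this ID
```

Checking the status code first matters because a 404 or 429 response does not carry a match payload, so the field lookups would fail on it.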

### Pulling and Saving JSON Files <a id='JSON-Files'></a>
As mentioned above, matchIDs do not follow a clear pattern. For the most part they increase sequentially, and games of similar modes and types tend to be clustered together. Elsewhere there are long runs of unused IDs (a 404 status, meaning no game is stored for that matchID).

We start from a list of matchIDs taken from our friends' recent match history. A loop increments from the first item in the list and keeps every match that meets our criteria. If a request returns status 429 (rate limit exceeded), the script waits 10 seconds before requesting data again.

If the script sleeps 30 times without saving a single match in between (about 5 minutes of failing to pull data), we assume we are in a "no man's land" of matchIDs where no games are stored. We then move on to the next starting ID in `match_id_list` and restart the process.

Each match is saved as its own JSON file in the `dump/` sub-directory, and each file is named after the matchID.

Please do not run the script below since it has the potential to replace some of our existing data.

In [11]:
# We modified the code from https://github.com/LionTurtle/TeamCompML/

# List of starting MatchID's from recent gaming history of high-ranked friends
match_id_list = [2029016554, 2028998508, 2028697992, 2028410883, 2028610390]

# index for the while-loop
i = 0
# counting the number of data retrieved
n = 0
# counting how many times we sleep in case we hit a dead-end
sleep_counter = 0

while i < 100000:
    # break out of the loop when there are no more match_id's left
    if not match_id_list:
        print "ran out of starting match ID's. ending the loop"
        break

    # incrementing loop index
    i += 1
    current_match_id = match_id_list[0] + i
    m = get_match(REGION, current_match_id, True)

    if not rateLimitExceeded(m):
        if isValid(m) and isClassicMatch(m) and isRankMatch(m):
            j = m.json()  # parse the response once and reuse it below
            # the with-block closes the file even if the write fails
            with open("dump/" + str(current_match_id) + ".json", 'w') as f:
                f.write(m.text.encode('utf-8'))
            n += 1
            print 'n: %d, ts: %s, gameID: %d, date: %s, season: %s, tier: %s, loop_index: %d' % (
                n,
                datetime.datetime.now().strftime('%m-%d %H:%M:%S'),
                current_match_id,
                time.strftime('%m-%d', time.localtime(j['matchCreation'] / 1000)),
                j['season'],
                j['participants'][1]['highestAchievedSeasonTier'],
                i)
            # reset sleep_counter once we start getting good data again
            sleep_counter = 0
    else:
        sleep_counter += 1
        print "API rate limit reached - waiting 10 seconds, sleep count: %d, loop_index: %d" % (sleep_counter, i)
        time.sleep(10)
        # step back so the rate-limited match ID is requested again instead of skipped
        i -= 1

        # if we have to sleep 30 times, we will assume that we are in no man's land and move onto the next id on list
        if sleep_counter >= 30:
            del match_id_list[0]
            sleep_counter = 0
            i = 0
            # guard against an empty list: the loop ends on the next iteration
            if match_id_list:
                print "Seems like we hit a dead-end. Starting from the next match id on the list. new_starting_point: %d" % match_id_list[0]
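
The control flow of the crawl can also be checked offline by separating it from the network call. The following is a rough sketch under stated assumptions, not the script we ran: `crawl`, `fetch` and `wanted` are hypothetical names, the offset limit applies per starting ID rather than to the whole run, and a rate-limited ID is retried after the pause.

```python
from collections import namedtuple

# minimal stand-in for a response; only the status code matters here
Response = namedtuple('Response', ['status_code'])


def crawl(start_ids, fetch, wanted, max_offset=100000, max_sleeps=30,
          sleep=lambda seconds: None):
    """Yield (match_id, response) for every response accepted by `wanted`.

    fetch(match_id) must return an object with a `status_code` attribute.
    A starting ID is abandoned after `max_sleeps` rate-limit pauses with
    no accepted match in between.
    """
    for start in start_ids:
        sleeps = 0
        offset = 1
        while offset <= max_offset and sleeps < max_sleeps:
            match_id = start + offset
            response = fetch(match_id)
            if response.status_code == 429:
                sleeps += 1
                sleep(10)   # back off, then retry the same ID
                continue
            if wanted(response):
                sleeps = 0  # good data again, so reset the dead-end counter
                yield match_id, response
            offset += 1
```

In real use `fetch` would wrap `get_match`, `wanted` would combine the filter functions, and `sleep` would be `time.sleep`. Writing the loop as a generator keeps the saving of files separate from the decision about which ID to request next.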

### Pre-processing Data <a id='pre-processing'></a>

Let's take a look at what a JSON file looks like. It contains many metrics that we won't use. We will focus on the in-game stats broken down by participant (player), which give us a quantitative overview of each person's play style and performance.

In [6]:
test_response = get_match('na', 2026610783, True).json()
print test_response

{u'queueType': u'RANKED_TEAM_5x5', u'matchVersion': u'5.23.0.239', u'platformId': u'NA1', u'season': u'PRESEASON2016', u'region': u'NA', u'matchId': 2026610783, u'mapId': 11, u'matchCreation': 1449099999881, u'teams': [{u'firstDragon': False, u'bans': [{u'pickTurn': 1, u'championId': 203}, {u'pickTurn': 3, u'championId': 41}, {u'pickTurn': 5, u'championId': 420}], u'firstInhibitor': False, u'baronKills': 0, u'firstRiftHerald': False, u'winner': True, u'firstBaron': False, u'riftHeraldKills': 1, u'firstBlood': True, u'teamId': 100, u'firstTower': False, u'vilemawKills': 0, u'inhibitorKills': 0, u'towerKills': 6, u'dominionVictoryScore': 0, u'dragonKills': 1}, {u'firstDragon': True, u'bans': [{u'pickTurn': 2, u'championId': 117}, {u'pickTurn': 4, u'championId': 16}, {u'pickTurn': 6, u'championId': 223}], u'firstInhibitor': False, u'baronKills': 0, u'firstRiftHerald': True, u'winner': False, u'firstBaron': False, u'riftHeraldKills': 1, u'firstBlood': False, u'teamId': 200, u'firstTower': 

Before we turn the JSON files into nicely organized csv files, we write a function that converts a championID to a champion name. Our raw data denotes each champion by its championID, which is just an arbitrary integer.

In [7]:
########## Champion ID to Champion Name converter ##########

# GET the static data on champion keys and names
static_champ_data = requests.get("https://global.api.pvp.net/api/lol/static-data/na/v1.2/champion?locale=en_US&api_key=%s"%KEY).json() 
champs = static_champ_data['data']
champ_names = static_champ_data['data'].keys()

# initializing empty lists
list_champs = []
list_champ_ids = []

# filling in the lists
for name in champ_names:
    this_champ = champs[name]
    list_champs.append(this_champ['key'])
    list_champ_ids.append(this_champ['id'])

# champion ID and name dictionary
champ_dict = dict(zip(list_champ_ids, list_champs))

# Wukong's key in the static data is 'MonkeyKing'; use his display name instead
champ_dict[62] = 'Wukong'

# function that converts a champion ID to champion name
def champion_name(champion_id):
    # fall back to 'no_name' for IDs missing from the static data
    return champ_dict.get(champion_id, 'no_name')

Now that we have a bunch of JSON files, let's put them into a usable form. We will merge the necessary data from the JSON files into one large csv file for easy access. Each game has 10 players (and thus 10 champions), so it contributes 10 rows of data. Each game and its participants are identifiable by the match ID.

This part takes a few minutes to run depending on how much data we have on hand.

In [19]:
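# Helper functions to parse data into a csv file
# (the first argument is an open file object, not a file name)
def insertComma(fileObj):
    fileObj.write(' , ')

# Writes one value followed by the separator, so every row ends with a trailing ' , '
def writetoFile(fileObj, data):
    fileObj.write(str(data))
    insertComma(fileObj)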

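### Feature Selection <a id='feature-selection'></a>

For each participant we keep the following fields, which become the columns of the csv file: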

match_ID  
participant  
champ  
win  
kills  
assists  
deaths  
gold_earned  
p_damage_to_champs  
m_damage_to_champs  
max_multi_kill  
max_crit  
damage_taken  
no_killing_spree  
creep_score  
jungle_killed  
cc_dealt  
wards_placed  
total_heal  
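
One way to keep the column names and the API fields in sync is to hold the mapping in a single table. The sketch below is only an illustration: `STAT_COLUMNS`, `HEADER` and `stat_row` are hypothetical names, while the pairs themselves follow the order in which the parsing code in this section writes the fields of each participant's `stats` dictionary.

```python
# (csv column, key in the participant's `stats` dictionary), in file order
STAT_COLUMNS = [
    ('kills', 'kills'),
    ('assists', 'assists'),
    ('deaths', 'deaths'),
    ('gold_earned', 'goldEarned'),
    ('p_damage_to_champs', 'physicalDamageDealtToChampions'),
    ('m_damage_to_champs', 'magicDamageDealtToChampions'),
    ('max_multi_kill', 'largestMultiKill'),
    ('max_crit', 'largestCriticalStrike'),
    ('damage_taken', 'totalDamageTaken'),
    ('no_killing_spree', 'killingSprees'),
    ('creep_score', 'minionsKilled'),
    ('jungle_killed', 'neutralMinionsKilled'),
    ('cc_dealt', 'totalTimeCrowdControlDealt'),
    ('wards_placed', 'wardsPlaced'),
    ('total_heal', 'totalHeal'),
]

# the four identifying columns come first, then the stats
HEADER = ['match_ID', 'participant', 'champ', 'win'] + [col for col, _ in STAT_COLUMNS]


def stat_row(stats):
    """Return the stat values of one participant in column order."""
    return [stats[key] for _, key in STAT_COLUMNS]
```

With such a table, adding or dropping a feature means editing one line, and the header can never drift out of order relative to the values.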

In [21]:
DUMP_DIR = 'mini_dump/'
counter = 0
trainingData = open('unnormarlized_data.csv','w+')
smallConst = 1e-15   #Don't divide by zero
trainingData.write('match_ID, participant, champ, win, kills, assists, deaths, gold_earned, p_damage_to_champs, m_damage_to_champs, max_multi_kill, max_crit, damage_taken, no_killing_spree, creep_score, jungle_killed, cc_dealt, wards_placed, total_heal\n')        

In [22]:
for f in os.listdir(DUMP_DIR):
    if (counter % 1000) == 0:
        print "parsed %d files..." % counter

    if f.endswith('.json'):
        # the with-block closes the file even if parsing fails
        with open(DUMP_DIR + f) as json_data:
            data = json.load(json_data)

        if len(data['participants']) == 10:  # verify valid game with all players
            counter += 1

            for i in range(10):
                # 0 if the participant is on the first listed team, 1 otherwise
                teamID = 0 if data['participants'][i]['teamId'] == data['teams'][0]['teamId'] else 1
                stats = data['participants'][i]['stats']

                # General Info -- not used for clustering
                writetoFile(trainingData, data['matchId'])
                writetoFile(trainingData, data['participantIdentities'][i]['participantId'])
                # champion_name returns 'no_name' instead of raising on an unknown ID
                writetoFile(trainingData, champion_name(data['participants'][i]['championId']))

                # indicator for winner
                writetoFile(trainingData, int(data['teams'][teamID]['winner']))

                # KDA
                writetoFile(trainingData, stats['kills'])
                writetoFile(trainingData, stats['assists'])
                writetoFile(trainingData, stats['deaths'])

                # gold and damage dealt
                writetoFile(trainingData, stats['goldEarned'])
                writetoFile(trainingData, stats['physicalDamageDealtToChampions'])
                writetoFile(trainingData, stats['magicDamageDealtToChampions'])
                writetoFile(trainingData, stats['largestMultiKill'])
                writetoFile(trainingData, stats['largestCriticalStrike'])

                # durability and killing sprees
                writetoFile(trainingData, stats['totalDamageTaken'])
                writetoFile(trainingData, stats['killingSprees'])

                # farming and crowd control
                writetoFile(trainingData, stats['minionsKilled'])
                writetoFile(trainingData, stats['neutralMinionsKilled'])
                writetoFile(trainingData, stats['totalTimeCrowdControlDealt'])

                # vision and healing
                writetoFile(trainingData, stats['wardsPlaced'])
                writetoFile(trainingData, stats['totalHeal'])

                trainingData.write('\n')

parsed 0 files...
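
Because every value is written followed by `' , '`, each row carries padding spaces and one empty trailing field. The following is a small sketch of reading such a file back and checking that each match contributes the expected number of rows. `read_rows` and `incomplete_matches` are hypothetical helpers, and the sample lines are made up for illustration, with a shortened header.

```python
import csv
from collections import Counter


def read_rows(lines):
    """Yield one dict per participant row, without padding or the trailing field."""
    reader = csv.reader(lines)
    header = [name.strip() for name in next(reader)]
    for fields in reader:
        fields = [value.strip() for value in fields]
        if fields and fields[-1] == '':
            fields = fields[:-1]  # empty field left by the trailing ' , '
        if fields:
            yield dict(zip(header, fields))


def incomplete_matches(rows, expected=10):
    """Return the match IDs that do not have exactly `expected` rows."""
    counts = Counter(row['match_ID'] for row in rows)
    return sorted(match for match, count in counts.items() if count != expected)


# made-up sample in the same layout as the file written above
sample = [
    'match_ID, participant, champ, win',
    '7 , 1 , Leblanc , 1 , ',
    '7 , 2 , Malphite , 1 , ',
]
rows = list(read_rows(sample))
print(rows[0]['champ'])          # value with the padding stripped
print(incomplete_matches(rows))  # match 7 has only 2 of the expected 10 rows
```

For the real file, an open file handle can be passed in place of the list of lines, since `csv.reader` accepts any iterable of strings.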
