# Smart Recommendation System for LoL Players


## Introduction

<img src="LoL.jpg">


The League of Legends (LoL) is a multiplayer online battle arena video game. Since its first release in 2009, the game has gained great popularity among players around the globe. In September 2016, it was estimated that there were over 100 million active players each month. From a larger scope, online gaming industry is consistently on the rise. Massively multiplayer(MMO) gaming generated revenue of roughly 19.9 billion U.S. dollars in 2016 [(1)](## References:). In this project, we obtains dynamic data including players and matches as well as static data from the official Riot API [(2)](## References:) through Cassiopeia, a framework dedicated to the Riot API [(3)](## References:). We then proceed onto further manipulation and analysis of the data and eventually building different functionalities like champion recommendation, win prediction and match scoring system for LoL players.

## Table of Content

[0. Project Preparation](#)

[1. Design and Implementation of Database](#Design-and-Loading-of Database)

[2. Data Crawling](#Data-Crawling)

[3. Basic Data Pre-process](#Basic-Data-Pre-process)

[4. Manipulation of Data](#Manipulation-of-Data)

[5. Prediction of Outcomes](#Prediction-of-Outcomes)

[6. Champion Relationship and Combination](#Champion-Relationship-and-Combination)

[7. Player Scoring System](#Player-Scoring-System)

[8. Champion Clustering and Recommendation System](#Champion-Recommendation-System)

[9. Conclusion](#Conclusion)

## 0. Project Preparation(environment installation)

## 1. Design and Implementation of  Database

The official Roit API has already provided a great many of different game data, and Cassiopeia helps to wrap everything up and have a great way to deal with rate limit. Everything seems good with these two ready. But before we beign to crawl real-time user data, it'll be of great help to design our database first to store all the crawled data.  The purpose of this project is to analyze game-level behavior, so instead of retrieving all the massive and diverse data type, we mainly focuses on game-related data including champions, summoners, items, every match's data including match attributes and statistics, participant data and participant statistics, team statistics and match frame events.

Considering the enormous data amount and limited time and space, we decide to focus on Season 8's 5v5 classic rank solo data with tier in bronze, silver, gold, platinum and diamond, which is the most common type of matches the players play. And the game behavior in this range is also the biggest concern for most users. Also, to make things easier we use Sqlite as our database system. We design our database scheme as follows:

<img src="schema.png">

## 2. Data Crawling

With database settled down, we can start to build our crawler. This crawler is based on Cassiopeia which wrap the official Roit API efficiently and works quite well with the rate limit problems. Detailed usage of this package can be seen at [Cassiopeia API](#). The idea behind this crawler is simple: To begin with we'll first collect some player's name uniformly distributed from tier bronze to diamond as our seed players. This is done manually by either asking friends for player name or looking them up at [op.gg](#). 

<img src="op.png">

We collected 50 players, each of the tier (bronze, silver, gold, platinum and diamond) contains 10 to make sure the data we get is balanced (We added some extra seed players from our friends since we found some player name doesn't exist possibly due to their modifying names in this season). Then we put the seed players in initial queue. We'll loop over the queue, for every player we'll first store it in database and mark it as crawled, then we'll get all this player's match history in Season's 5v5 solo rank, for every match, we'll first store all the players in this match in database, and mark them as uncrawled if they don't appear in database before. Then we'll store every participant's statistics, each team's statistics, and the match statistics as well, and after looping over this player's match, we'll move on to the next player in the queue. After we loop through the current queue, we'll use a SQL sentence to query all the players who haven't been crawled yet, and add them to the queue and begi next loop. The crawler will stop when either all the players in this season have been crawled or we've already obtained the total amount of data we want. Detailed code snippets can be refered as follows. The code below has clear structure and proper comments, and should be easy to understand.

In [9]:
import sqlite3
import pandas as pd
import math
import time
import random
import traceback
import datetime
import json
import numpy as np
from collections import Counter

import cassiopeia as cass
from cassiopeia import Summoner, Match, Champions, Champion
from cassiopeia.data import Season, Queue, Tier

In [None]:
DIAMOND = 10001
PLATINUM = 10001
GOLD = 10001
SLIVER = 10001
BRONZE = 10001
match_error = [0]


def is_resume(conn):
    try:
        result = pd.read_sql('SELECT * FROM Champion', conn).empty
    except:
        #traceback.print_exc()
        print('Cannot resume!')
        return
    return not result

def resume_dicts(conn):
    champions=list(pd.read_sql("SELECT name from Champion", conn)['name'])
    champion2idx={}
    for i,c in enumerate(champions):
        champion2idx[c]=i

    items=list(pd.read_sql("SELECT name from item", conn)['name'])
    item2idx = {}
    for i, item in enumerate(items):
        item2idx[item] = i

    spells=list(pd.read_sql("SELECT name from summoner_spell", conn)['name'])
    spell2idx = {}
    for i, spell in enumerate(spells):
        spell2idx[spell] = i
    return champion2idx,item2idx,spell2idx

def initializeSeed(filename):
    seeds = []
    raw_data = pd.read_csv(filename)
    for idx, data in raw_data.iterrows():
        for sommoner in data:
            seeds.append(sommoner)
    return seeds


def getChampionsItemsAndSpells(conn):
    c = conn.cursor()
    # Get champions, in case the id of champions change because of version change,
    # we sort it first
    champions = Champions(region="NA")
    champion_list = []
    for cham in champions:
        champion_list.append(cham.name)
    champion_list.sort()
    champion2idx = {}
    for i, cham in enumerate(champion_list):
        champion2idx[cham] = i
        c.execute("INSERT INTO Champion VALUES( ?,? )", (i, cham))
    # Get items
    items = cass.get_items(region="NA")
    item_list = []
    for item in items:
        item_list.append(item.name)
    item_list.sort()
    item2idx = {}
    for i, item in enumerate(item_list):
        item2idx[item] = i
        c.execute("INSERT INTO item VALUES(?,?)", (i, item))
    # Get Sommoner Spells
    sspells = cass.get_summoner_spells(region="NA")
    spell_list = []
    for spell in sspells:
        spell_list.append(spell.name)
    spell_list.sort()
    spell2idx = {}
    for i, spell in enumerate(spell_list):
        spell2idx[spell] = i
        c.execute("INSERT INTO summoner_spell VALUES( ?,? )", (i, spell))
    conn.commit()
    return champion2idx, item2idx, spell2idx


def is_summoner_duplicate(summoner, conn):
    try:
        result = pd.read_sql('SELECT * FROM Summoner WHERE id={}'.format(summoner.id), conn).empty
    except:
        #traceback.print_exc()
        print('error during duplicating summoner!')
        return True

    return not result


def insertSommoner(sommoner, conn, counts):
    c = conn.cursor()
    try:
        league = sommoner.leagues
        # get rank in this season
        if league == None or len(league) == 0 or league.fives == None or len(league.fives) == 0:
            rank_this_season = Tier.unranked.value.lower()
        else:
            rank_this_season = league.fives[0].tier.value.lower()
        # get rank in last season
        rank_last_season = sommoner.rank_last_season.value.lower()
        c.execute("INSERT INTO SUMMONER VALUES(?,?,?,?,?,?,?)",
                  (
                      sommoner.id, sommoner.name, sommoner.region.value, sommoner.level, rank_this_season,
                      rank_last_season,
                      0))
        counts[rank_this_season] += 1
        conn.commit()
    except:
        #traceback.print_exc()
        print('Insert Sommner {} failed!'.format(sommoner.name))


def insertMatch(match, conn, champion2idx, item2idx, spell2idx):
    c = conn.cursor()
    try:
        # Insert the match data first
        c.execute("INSERT INTO MATCH VALUES(?,?,?,?,?,?,?,?,?)", (
            match.id, match.duration.total_seconds(), match.version, match.season.value, match.region.value,
            match.queue.value, match.creation.timestamp, int(match.is_remake), 'unknown'))
        # insert teams info
        insertTeams(conn, match)
        # insert team ban info
        insertTeamBan(match, conn, champion2idx)
        # Then insert participants
        participants = match.participants
        for p in participants:
            # insert every participant with its stats
            insertParticipant(p, conn, champion2idx, item2idx, spell2idx, match)
            # insert participant timeline
            insertParticipantTimeline(p, conn, match)

        # After inserting participant, calculate the match's rank by selecting the most common one
        # among 10 participants
        try:
            ranks = list(pd.read_sql(
                "select rank_this_season from Summoner where Summoner.id in (select summoner_id from Participants where match_id={})".format(
                    match.id), conn)['rank_this_season'])
            match_rank=Counter(ranks)
            match_rank=match_rank.most_common(1)[0][0]
            print('match {} has average tier of {}.'.format(match.id,match_rank))
            c.execute("UPDATE match SET tier = ? where id=?",(match_rank,match.id))
        except:
            #traceback.print_exc()
            print('error when updating match {} tier!'.format(match.id))

        if match.timeline == None or match.timeline.frames == None:
            print('This match{} does no have events!'.format(match.id))
            return
        else:
            for frame in match.timeline.frames:
                events = frame.events
                # insert kill champion and kill monster event
                insertEvent(events, conn, match)
        conn.commit()
    except:
        #traceback.print_exc()
        print('Insert match {} failed!'.format(match.id))


def insertEvent(events, conn, match):
    c = conn.cursor()
    try:
        for event in events:
            if event == None:
                continue
            elif event.type == 'CHAMPION_KILL' and event.victim_id != None and event.killer_id != None:
                c.execute(
                    'INSERT INTO kill_champion_event (match_id,victim_id,killer_id, happen_time) VALUES (?,?,?,?)',
                    (match.id, event.victim_id, event.killer_id, event.timestamp))
            elif event.type == 'ELITE_MONSTER_KILL' and event.monster_type != None and event.killer_id != None:
                c.execute('INSERT INTO kill_monster_event (match_id,timestamp,killer_id,monster_type) VALUES (?,?,?,?)',
                          (match.id, event.timestamp, event.killer_id, event.monster_type))
        conn.commit()
    except:
        #traceback.print_exc()
        print('Event error in match{} !'.format(match.id))

    pass


def is_match_duplicate(match, conn):
    try:
        result = pd.read_sql('SELECT * FROM MATCH WHERE id={}'.format(match.id), conn).empty
    except:
        #traceback.print_exc()
        print('error during duplicating macth!')
        match_error[0] += 1
        return True
    return not result


"""
Blue team id=1
read tema id=2
"""


def insertTeams(conn, match):
    c = conn.cursor()
    try:
        # Insert the blue team
        blueteam = match.blue_team
        c.execute("INSERT INTO Team VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?)", (
            1, match.id, 'blue', int(blueteam.win), blueteam.dragon_kills, blueteam.baron_kills,
            blueteam.inhibitor_kills, blueteam.tower_kills, blueteam.first_blood, blueteam.first_dragon,
            blueteam.first_baron, blueteam.first_tower, blueteam.first_rift_herald))
        # Insert red team
        redteam = match.red_team
        c.execute("INSERT INTO Team VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?)", (
            2, match.id, 'red', int(redteam.win), redteam.dragon_kills, redteam.baron_kills,
            redteam.inhibitor_kills, redteam.tower_kills, redteam.first_blood, redteam.first_dragon,
            redteam.first_baron, redteam.first_tower, redteam.first_rift_herald))
        conn.commit()
    except:
        #traceback.print_exc()
        print('Insert match {} Team failed!'.format(match.id))


def insertTeamBan(match, conn, champion2idx):
    c = conn.cursor()
    try:
        # blue team ban
        blueteam = match.blue_team
        blue_bans = blueteam.bans
        for bb in blue_bans:
            if bb != None:
                champion = Champion(id=bb.id)
                c.execute("INSERT INTO team_ban (team_id,match_id,ban_champion) VALUES (?,?,?)",
                          (1, match.id, champion2idx[champion.name]))
        # red team ban
        redteam = match.red_team
        red_bans = redteam.bans
        for rb in red_bans:
            if rb != None:
                champion = Champion(id=rb.id)
                c.execute("INSERT INTO team_ban (team_id,match_id,ban_champion) VALUES (?,?,?)",
                          (2, match.id, champion2idx[champion.name]))
        conn.commit()
    except:
        #traceback.print_exc()
        print('Insert Match {} Team ban failed!'.format(match.id))


def insertParticipantTimeline(participant, conn, match):
    c = conn.cursor()
    try:
        timeline = participant.timeline
        creeps_per_min_deltas = json.dumps(timeline.creeps_per_min_deltas)
        cs_diff_per_min_deltas = json.dumps(timeline.cs_diff_per_min_deltas)
        damage_taken_diff_per_min_deltas = json.dumps(timeline.damage_taken_diff_per_min_deltas)
        gold_per_min_deltas = json.dumps(timeline.gold_per_min_deltas)
        damage_taken_per_min_deltas = json.dumps(timeline.damage_taken_per_min_deltas)
        xp_diff_per_min_deltas = json.dumps(timeline.xp_diff_per_min_deltas)
        xp_per_min_deltas = json.dumps(timeline.xp_per_min_deltas)
        c.execute("INSERT INTO participant_timeline VALUES (?,?,?,?,?,?,?,?,?,?)", (
            participant.id, participant.summoner.id, match.id, creeps_per_min_deltas, cs_diff_per_min_deltas,
            damage_taken_diff_per_min_deltas, gold_per_min_deltas, damage_taken_per_min_deltas, xp_diff_per_min_deltas,
            xp_per_min_deltas))
        conn.commit()
    except:
        #traceback.print_exc()
        print('Insert Participant {} {}timeline info failed!'.format(participant.summoner.name, match.id))


def insertParticipant(participant, conn, champion2idx, counts, spell2idx, match):
    c = conn.cursor()
    try:
        summoner = participant.summoner
        # insert this summoer if it's not included before
        if not is_summoner_duplicate(summoner, conn):
            insertSommoner(summoner, conn, counts)
        stats = participant.stats
        pid = participant.id
        summoner_id = summoner.id
        match_id = match.id
        champion_id = champion2idx[participant.champion.name]
        side = participant.side.name
        win = participant.team.win
        role = participant.role
        if role != None and type(role) != str:
            role = role.value
        try:
            lane = participant.lane.value
        except:
            lane=None
        sspell1 = spell2idx[participant.summoner_spell_d.name]
        sspell2 = spell2idx[participant.summoner_spell_f.name]
        level = stats.level
        items = []
        for it in stats.items:
            if it != None:
                items.append(it.name)
        items = ",".join(items)
        kills = stats.kills
        deaths = stats.deaths
        assist = stats.assists
        kda = stats.kda
        turret_kills = stats.turret_kills
        first_tower_kill = int(stats.first_tower_kill)
        damage_dealt_to_turrets = stats.damage_dealt_to_turrets
        first_blood_kill = int(stats.first_blood_kill)
        double_kills = stats.double_kills
        triple_kills = stats.triple_kills
        quadra_kills = stats.quadra_kills
        penta_kills = stats.penta_kills
        killing_sprees = stats.killing_sprees
        inhibitor_kills = stats.inhibitor_kills
        gold_earned = stats.gold_earned
        gold_spent = stats.gold_spent
        largest_killing_spree = stats.largest_killing_spree
        largest_critical_strike = stats.largest_critical_strike
        largest_multi_kill = stats.largest_multi_kill
        longest_time_spent_living = stats.longest_time_spent_living
        magic_damage_dealt_to_champions = stats.magic_damage_dealt_to_champions
        magical_damage_taken = stats.magical_damage_taken
        neutral_minions_killed = stats.neutral_minions_killed
        neutral_minions_killed_enemy_jungle = stats.neutral_minions_killed_enemy_jungle
        physical_damage_dealt_to_champions = stats.physical_damage_dealt_to_champions
        physical_damage_taken = stats.physical_damage_taken
        sight_wards_bought_in_game = stats.sight_wards_bought_in_game
        total_damage_dealt_to_champions = stats.total_damage_dealt_to_champions
        total_damage_taken = stats.total_damage_taken
        total_heal = stats.total_heal
        total_minions_killed = stats.total_minions_killed
        true_damage_dealt_to_champions = stats.true_damage_dealt_to_champions
        true_damage_taken = stats.true_damage_taken
        vision_wards_bought_in_game = stats.vision_wards_bought_in_game
        wards_killed = stats.wards_killed
        wards_placed = stats.wards_placed
        time_CCing_others = stats.time_CCing_others

        c.execute(
            "INSERT INTO Participants VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
            (
                pid, summoner_id, match_id, champion_id, side, win, role, lane, sspell1, sspell2, level, items, kills,
                deaths,
                assist,
                kda, turret_kills, first_tower_kill, damage_dealt_to_turrets, first_blood_kill, double_kills,
                triple_kills,
                quadra_kills, penta_kills, killing_sprees, inhibitor_kills, gold_earned, gold_spent,
                largest_killing_spree, largest_critical_strike, largest_multi_kill, longest_time_spent_living,
                magic_damage_dealt_to_champions, magical_damage_taken, neutral_minions_killed,
                neutral_minions_killed_enemy_jungle, physical_damage_dealt_to_champions, physical_damage_taken,
                sight_wards_bought_in_game, total_damage_dealt_to_champions, total_damage_taken, total_heal,
                total_minions_killed, true_damage_dealt_to_champions, true_damage_taken, vision_wards_bought_in_game,
                wards_killed, wards_placed, time_CCing_others))
        conn.commit()
    except:
        #traceback.print_exc()
        print('Insert Participant {} , {}, {} failed!'.format(participant.id, participant.champion.name,
                                                              participant.role))


def enough(counts):
    result = True
    for i in counts.values():
        result = result and i > 9999
    return result


def main():
    cass.set_riot_api_key("RGAPI-29e6ff93-af85-449e-b884-47fface545a8")
    cass.set_default_region("NA")
    conn = sqlite3.connect('lol.db')
    counts = {}
    counts[Tier.diamond.value.lower()] = 0
    counts[Tier.platinum.value.lower()] = 0
    counts[Tier.gold.value.lower()] = 0
    counts[Tier.silver.value.lower()] = 0
    counts[Tier.bronze.value.lower()] = 0
    counts[Tier.unranked.value.lower()] = 0
    invalid_summon = 0
    total_sommoner = 0
    match_total_num = 0
    match_repeat_num = 0
    match_invalid_num = 0
    match_valid_num = 0

    is_seed=True
    """
    1. Initialize seedfiles
    2. Begin crawling
    """
    if is_resume(conn):
        print('Crawler resume. Count Restart.')
        unpulled_summoners = list(pd.read_sql("SELECT name from Summoner where is_crawler=0", conn)['name'])
        champion2idx, item2idx, spell2idx=resume_dicts(conn)
    else:
        unpulled_summoners = initializeSeed('seed.csv')
        # get champion, item, spells maps
        champion2idx, item2idx, spell2idx = getChampionsItemsAndSpells(conn)
    while len(unpulled_summoners) > 0:
        is_enough = False
        random.shuffle(unpulled_summoners)
        for summoner in unpulled_summoners:
            current_summoner = Summoner(name=summoner)
            try:
                # we only need S8 5v5 solo rank data
                allmatches = current_summoner.match_history(seasons={Season.season_8}, queues={Queue.ranked_solo_fives})
                if allmatches == None or len(allmatches) == 0:
                    print('The summoner {} has no matches in {}! Continue.'.format(current_summoner, Season.season_8))
                    invalid_summon += 1
                    continue
            except:
                #traceback.print_exc()
                print('The summoner {} not exist!'.format(current_summoner))
                invalid_summon += 1
                continue
            print('Begin crawl Summoner {}, he has {} matches in S8.'.format(current_summoner.name, len(allmatches)))
            # insert the current summoner into database if this is first loop
            if is_seed and not is_summoner_duplicate(current_summoner,conn):
                insertSommoner(current_summoner, conn, counts)
            # begin to visit all matches
            for match in allmatches:
                match_total_num += 1
                # None match, invalid, just skip
                if match == None:
                    print('None match!')
                    match_invalid_num += 1
                    continue
                # Duplicate match, skip
                elif is_match_duplicate(match, conn):
                    match_repeat_num += 1
                    print('match duplicate!')
                    continue
                # This is what we want
                else:
                    match_valid_num += 1
                    # insert match
                    insertMatch(match, conn, champion2idx, counts, spell2idx)
                    # whether we have got enough data
            # update summoner to be already crawled
            c = conn.cursor()
            c.execute("UPDATE Summoner SET is_crawler=1 WHERE id={}".format(current_summoner.id))
            conn.commit()
            if enough(counts):
                is_enough = True
                break
        if is_enough:
            break
        is_seed=False
        unpulled_summoners = list(pd.read_sql("SELECT name from Summoner where is_crawler=0", conn)['name'])

    print('Finish crawling!')
    print(
        'We have crawled {} summoners in total,{} diamond, {} platinum, {} gold, {} silver, {} bronze, {} unranked.'.format(
            sum(counts.values()), counts['diamond'], counts['platinum'], counts['gold'], counts['silver'],
            counts['bronze'], counts['unranked']))
    print('We have crawled {} matches in total, {} error match, {} duplicate match, {} normal match'.format(
        match_total_num, match_invalid_num + match_error[0], match_repeat_num - match_error[0], match_valid_num))
    conn.close()



Let's try to run it:

In [None]:
main()

Crawler resume. Count Restart.
Making call: https://na1.api.riotgames.com/lol/summoner/v3/summoners/by-name/QwertyLynx
Making call: https://na1.api.riotgames.com/lol/match/v3/matchlists/by-account/204796261?beginIndex=0&endIndex=100&season=11&queue=420
Begin crawl Summoner QwertyLynx, he has 72 matches in S8.
Making call: https://ddragon.leagueoflegends.com/realms/na.json
Making call: https://na1.api.riotgames.com/lol/match/v3/matches/2773768097
Making call: https://ddragon.leagueoflegends.com/cdn/8.9.1/data/en_US/championFull.json
Making call: https://ddragon.leagueoflegends.com/cdn/8.8.1/data/en_US/championFull.json
Making call: https://ddragon.leagueoflegends.com/cdn/8.8.1/data/en_US/summoner.json
Making call: https://ddragon.leagueoflegends.com/cdn/8.8.1/data/en_US/item.json
Making call: https://na1.api.riotgames.com/lol/league/v3/positions/by-summoner/53406823
Making call: https://na1.api.riotgames.com/lol/league/v3/leagues/c8820a70-ff33-11e7-9727-c81f66cf135e
Making call: https:/

Since the crawler will run for a really long time, we'd like to just show you what it is like. And you should find a lol.db in the current directory with some data newly crawled. Till May 10th, 2018, we've already crawled xxx data. We expect to collect more in the future.

## Basic Data Pre-process

After data crawling, we have got plenty of raw data. The next step is to pre-process attributes and transfer them into features that are useful to us. First, in the basic data preprocess, we only collect the average attributes of each champion in differnet position, such as win rate, ban rate, kda and etc, based on data from all participants and matches. 

<img src="champ.png">

In [10]:
# Get data from sqlite database
def get_data(conn, table_name):
    res = []
    query = 'SELECT * FROM ' + table_name
    c = conn.cursor()
    data = c.execute(query)
    for d in data:
        res.append(d)
    return res

# Get match data
def get_match_data(match, participants):
    # kda, gold, kill
    match_data = []
    for m in match:
        match_id = m[0]
        curr_match = {'red':{'total':{'kills':0, 'kda': 0, 'income':0}, 'by_lane':{}}, 'blue':{'total':{'kills':0, 'kda': 0, 'income':0}, 'by_lane':{}}}
        
        win = ''
        for p in participants:
            if p[2] == match_id:
                curr_match[p[4]]['total']['kills'] += p[12]
                curr_match[p[4]]['total']['kda'] += p[15]
                curr_match[p[4]]['total']['income'] += p[26]
                curr_match[p[4]]['total']['income'] -= p[27]
                
                if p[7] not in curr_match[p[4]]['by_lane'].keys():
                    curr_match[p[4]]['by_lane'][p[7]] = {}
                    curr_match[p[4]]['by_lane'][p[7]]['kills'] = 0
                    curr_match[p[4]]['by_lane'][p[7]]['kda'] = 0
                    curr_match[p[4]]['by_lane'][p[7]]['income'] = 0
                    curr_match[p[4]]['by_lane'][p[7]]['income'] = 0
                    
                curr_match[p[4]]['by_lane'][p[7]]['kills'] += p[12]
                curr_match[p[4]]['by_lane'][p[7]]['kda'] += p[15]
                curr_match[p[4]]['by_lane'][p[7]]['income'] += p[26]
                curr_match[p[4]]['by_lane'][p[7]]['income'] -= p[27]
                
                if p[5] == 1:
                    win = p[4]
                    
        curr_match['win_side'] = win
        match_data.append(curr_match)
    
    return match_data


# get a participant's position
def get_position(p):
    if p[7] == 'BOT_LANE' or None:
        if p[6] == 'DUO':
            return None
        else:
            return p[6]
    else:
        return p[7]

# Get the average ban rate for a champion. No difference bewteen positions.
def ban_rate(champion_id, team_ban):
    champions = []
    for ban in team_ban:
        champions.append(ban[-1])
    rate = champions.count(champion_id)/len(team_ban)
    return rate

# Get all useful average values of atrributes for a champion
def champion_data(champion_id, participants):
    chosen_rate = {}
    win_rate = {}
    kill = {}
    death = {}
    assist = {}
    physical_damage_to = {}
    physical_damage_taken = {}
    magic_damage_to = {}
    magic_damage_taken = {}
    true_damage_to = {}
    true_damage_taken = {}
    gold_earned = {}
    gold_spent = {}
    tower_kill = {}
    minions_kill = {}
    minions_kill_enemy = {}
    first_blood = {}
    total_heal = {}
    time_CCing = {}
    sight_ward = {}
    vision_ward = {}
    wards_killed = {}
    wards_placed = {}
    largest_killing_spree = {}
    largest_critical_strike = {}
    largest_multi_kill = {}
    longest_living_time = {}
    res = {}
    
    for p in participants:
        if p[3] == champion_id:
            pos = get_position(p)
            if pos is None:
                continue
            else:
                if pos in chosen_rate:
                    chosen_rate[pos]+=1
                    win_rate[pos].append(p[5])
                    kill[pos].append(p[12])
                    death[pos].append(p[13])
                    assist[pos].append(p[14])
                    physical_damage_to[pos].append(p[36])
                    physical_damage_taken[pos].append(p[37])
                    magic_damage_to[pos].append(p[32])
                    magic_damage_taken[pos].append(p[33])
                    true_damage_to[pos].append(p[-6])
                    true_damage_taken[pos].append(p[-5])
                    gold_earned[pos].append(p[26])
                    gold_spent[pos].append(p[27])
                    tower_kill[pos].append(p[16])
                    minions_kill[pos].append(p[34])
                    minions_kill_enemy[pos].append(p[35])
                    first_blood[pos].append(p[19])
                    total_heal[pos].append(p[-8])
                    time_CCing[pos].append(p[-1])
                    sight_ward[pos].append(p[-11])
                    vision_ward[pos].append(p[-4])
                    wards_killed[pos].append(p[-3])
                    wards_placed[pos].append(p[-2])
                    largest_killing_spree[pos].append(p[28])
                    largest_critical_strike[pos].append(p[29])
                    largest_multi_kill[pos].append(p[30])
                    longest_living_time[pos].append(p[31])
                    
                    
                else:
                    chosen_rate[pos] = 1
                    win_rate[pos] = []
                    kill[pos] = []
                    death[pos] = []
                    assist[pos] = []
                    physical_damage_to[pos] = []
                    physical_damage_taken[pos] =[]
                    magic_damage_to[pos] = []
                    magic_damage_taken[pos] = []
                    true_damage_to[pos] = []
                    true_damage_taken[pos] = []
                    gold_earned[pos] = []
                    gold_spent[pos] = []
                    tower_kill[pos] = []
                    minions_kill[pos] = []
                    minions_kill_enemy[pos] = []
                    first_blood[pos] = []
                    total_heal[pos] = []
                    time_CCing[pos] = []
                    sight_ward[pos] = []
                    vision_ward[pos] = []
                    wards_killed[pos] = []
                    wards_placed[pos] = []
                    largest_killing_spree[pos] = []
                    largest_critical_strike[pos] = []
                    largest_multi_kill[pos] = []
                    longest_living_time[pos] = []

                    chosen_rate[pos]+=1
                    win_rate[pos].append(p[5])
                    kill[pos].append(p[12])
                    death[pos].append(p[13])
                    assist[pos].append(p[14])
                    physical_damage_to[pos].append(p[36])
                    physical_damage_taken[pos].append(p[37])
                    magic_damage_to[pos].append(p[32])
                    magic_damage_taken[pos].append(p[33])
                    true_damage_to[pos].append(p[-6])
                    true_damage_taken[pos].append(p[-5])
                    gold_earned[pos].append(p[26])
                    gold_spent[pos].append(p[27])
                    tower_kill[pos].append(p[16])
                    minions_kill[pos].append(p[34])
                    minions_kill_enemy[pos].append(p[35])
                    first_blood[pos].append(p[19])
                    total_heal[pos].append(p[-8])
                    time_CCing[pos].append(p[-1])
                    sight_ward[pos].append(p[-11])
                    vision_ward[pos].append(p[-4])
                    wards_killed[pos].append(p[-3])
                    wards_placed[pos].append(p[-2])
                    largest_killing_spree[pos].append(p[28])
                    largest_critical_strike[pos].append(p[29])
                    largest_multi_kill[pos].append(p[30])
                    longest_living_time[pos].append(p[31])            
                    
    for r in chosen_rate:
        rate = chosen_rate[r]/len(participants)
        res[r] = []
        res[r].append({'chosen_rate': rate})
    for r in win_rate:
        rate = sum(win_rate[r])/len(win_rate[r])
        res[r].append({'win_rate': rate})
    for k in kill:
        res[k].append({'kills': sum(kill[k])/len(kill[k])})
    for d in death:
        res[d].append({'deaths': sum(death[d])/len(death[d])})
    for a in assist:
        res[a].append({'assists': sum(assist[a])/len(assist[a])})
    for d in physical_damage_to:
        res[d].append({'physical_damage_to': sum(physical_damage_to[d])/len(physical_damage_to[d])})
    for d in physical_damage_taken:
        res[d].append({'physical_damage_taken': sum(physical_damage_taken[d])/len(physical_damage_taken[d])})
    
    for d in magic_damage_to:
        res[d].append({'magic_damage_to': sum(magic_damage_to[d])/len(magic_damage_to[d])})
    for d in magic_damage_taken:
        res[d].append({'magic_damage_taken': sum(magic_damage_taken[d])/len(magic_damage_taken[d])})
    for d in true_damage_to:
        res[d].append({'true_damage_to': sum(true_damage_to[d])/len(true_damage_to[d])})
    for d in true_damage_taken:
        res[d].append({'true_damage_taken': sum(true_damage_taken[d])/len(true_damage_taken[d])})
    for g in gold_earned:
        res[g].append({'gold_earned': sum(gold_earned[g])/len(gold_earned[g])})
    for g in gold_spent:
        res[g].append({'gold_spent': sum(gold_spent[g])/len(gold_spent[g])})
    for t in tower_kill:
        res[t].append({'tower_kill': sum(tower_kill[t])/len(tower_kill[t])})
    for t in minions_kill:
        res[t].append({'minions_kill': sum(minions_kill[t])/len(minions_kill[t])})
    for t in minions_kill_enemy:
        res[t].append({'minions_kill_enemy': sum(minions_kill_enemy[t])/len(minions_kill_enemy[t])})
    for t in first_blood:
        res[t].append({'first_blood': sum(first_blood[t])/len(first_blood[t])})
    
    for t in total_heal:
        res[t].append({'total_heal': sum(total_heal[t])/len(total_heal[t])})
    for t in time_CCing:
        res[t].append({'time_CCing': sum(time_CCing[t])/len(time_CCing[t])})
    for t in sight_ward:
        res[t].append({'sight_ward': sum(sight_ward[t])/len(sight_ward[t])})
    for t in vision_ward:
        res[t].append({'vision_ward': sum(vision_ward[t])/len(vision_ward[t])})
    for t in wards_killed:
        res[t].append({'wards_killed': sum(wards_killed[t])/len(wards_killed[t])})
    for t in wards_placed:
        res[t].append({'wards_placed': sum(wards_placed[t])/len(wards_placed[t])})
    for t in largest_killing_spree:
        res[t].append({'largest_killing_spree': sum(largest_killing_spree[t])/len(largest_killing_spree[t])})
    for t in largest_critical_strike:
        res[t].append({'largest_critical_strike': sum(largest_critical_strike[t])/len(largest_critical_strike[t])})
    for t in largest_multi_kill:
        res[t].append({'largest_multi_kill': sum(largest_multi_kill[t])/len(largest_multi_kill[t])})
    for t in longest_living_time:
        res[t].append({'longest_living_time': sum(longest_living_time[t])/len(longest_living_time[t])})
    
    return res

For each champion, we considered their positions, and only collect participants' data with position to analyze. So the an exmaple output is shown as the following (For the first champion, Aatrox):

In [14]:
conn = sqlite3.connect('lol.db')
participants = get_data(conn, 'Participants')
champions = get_data(conn, 'Champion')
mean_data = champion_data(0, participants)
match = get_data(conn, 'Match')
js = json.dumps(mean_data, sort_keys=True, indent=4, separators=(',', ':'))
print(js)

{
    "JUNGLE":[
        {
            "chosen_rate":0.0010184006480731397
        },
        {
            "win_rate":0.5813953488372093
        },
        {
            "kills":6.488372093023256
        },
        {
            "deaths":6.558139534883721
        },
        {
            "assists":6.5813953488372094
        },
        {
            "physical_damage_to":13182.372093023256
        },
        {
            "physical_damage_taken":22329.86046511628
        },
        {
            "magic_damage_to":2260.8372093023254
        },
        {
            "magic_damage_taken":10444.651162790698
        },
        {
            "true_damage_to":1654.2558139534883
        },
        {
            "true_damage_taken":1200.4651162790697
        },
        {
            "gold_earned":12196.976744186046
        },
        {
            "gold_spent":10914.60465116279
        },
        {
            "tower_kill":0.9534883720930233
        },
        {
            "minions_kill":88.651

We can see clearly that this champion has only been used for three positions. Among them, MID_LANE is rarely used; in TOP_LANE and JUNGLE positions this champion performs much better. 

## Manipulation of Data

In [16]:
all_champion_data = []
for c in champions:
    all_champion_data.append(champion_data(c[0], participants))
    
def predict_lane():
    ret = []
    for i, champion_data in enumerate(all_champion_data):
        best_rate = 0.0
        best_lane = ""
        for lane in champion_data.keys():
            win_rate = champion_data[lane][1]['win_rate']
            if win_rate > best_rate:
                best_rate = win_rate
                best_lane = lane
        ret.append(lane)
    return ret

lane = predict_lane()
rand = np.random.randint(0,len(champions),size=5)
for i in rand:
    print("{} in {} has the best chance of winning the game.".format(champions[i][1], lane[i]))

Sivir in TOP_LANE has the best chance of winning the game.
Draven in MID_LANE has the best chance of winning the game.
Zilean in SOLO has the best chance of winning the game.
Jax in DUO_SUPPORT has the best chance of winning the game.
Tryndamere in DUO_SUPPORT has the best chance of winning the game.


## Prediction of Outcomes
With the data curated from the previous section, we build a binary classification model to predict the outcome of the match using support vector machine (SVM).

In [64]:
def predict_result(match_id, match_data, verbose=False):
    X = []
    y = []
    for i, match in enumerate(match_data):
        X.append([])
        # Avoid division by zero
        X[i].append(match['red']['total']['kda'] / (match['blue']['total']['kda'] + 1e-6))
        X[i].append(match['red']['total']['income'] / (match['blue']['total']['income'] + 1e-6))
        X[i].append(match['red']['total']['kills'] / (match['blue']['total']['kills'] + 1e-6))

        if match['win_side'] == 'red':
            y.append(0)
        else:
            y.append(1)
   
    scaler = MinMaxScaler()
    clf = svm.SVC(C=1e10, kernel='linear')
    X = np.array(X)
    y = np.array(y)
    scaler.fit(X)
    X_train = scaler.transform(X)
    clf.fit(X_train, y)
    if verbose:
        ret = clf.predict(X_train)
        tot = len(ret)
        hit = 0
        for pred, true in zip(ret, y):
            if pred == true:
                hit += 1
        print("train_accuracy:{:2.2f}%".format(hit * 100 / tot))
    
    ret = clf.predict(X_train[match_id].reshape([1, -1]))
    return 'red' if ret == 0 else 'blue', clf

ret, clf = predict_result(0, match_data, verbose=True)

print("Our prediction for match {} is WIN: {} and the true outcome is WIN: {}".format(0, ret, match_data[0]['win_side']))

train_accuracy:93.95%
Our prediction for match 0 is WIN: red and the true outcome is WIN: red


We can further infer which lane has the most impact on the outcome of the match using the model trained.

In [65]:
def which_lane(clf, match_data, verbose=False):
    
    lane_dict = {}
    for match in match_data:
        for lane in match['red']['by_lane'].keys():
            if lane not in match['blue']['by_lane'].keys():
                continue
            if lane not in lane_dict.keys():
                lane_dict[lane] = []
            
            # Avoid division by zero
            kda = match['red']['by_lane'][lane]['kda'] / (match['blue']['by_lane'][lane]['kda'] + 1e-6)
            income = match['red']['by_lane'][lane]['income'] / (match['blue']['by_lane'][lane]['income'] + 1e-6)
            kill = match['red']['by_lane'][lane]['kills'] / (match['blue']['by_lane'][lane]['kills'] + 1e-6)
            win = 0 if match['win_side'] == 'red' else 1
            lane_dict[lane].append([kda, income, kill, win])
    lanelist = [lane for lane in lane_dict.keys()]
    best_acc = 0.0
    the_lane = ''
    for lane, lane_data in lane_dict.items():
        if lane == None:
            continue
        lane_data = np.array(lane_data)
        X = lane_data[:,:3]
        y = lane_data[:,-1]
        scaler = MinMaxScaler()
        scaler.fit(X)
        X_train = scaler.transform(X)
        ret = clf.predict(X_train)
        tot = len(ret)
        hit = 0
        for pred, true in zip(ret, y):
            if pred == true:
                hit += 1
        acc = hit/tot
        if verbose:
            print("{} : {:2.2f}%".format(lane, acc * 100))
        if acc > best_acc:
            best_acc = acc
            the_lane = lane
    return the_lane

# Print our inference
lane = which_lane(clf, match_data)
print("{} has the most impact on the outcome of the match.".format(lane))

TOP_LANE has the most impact on the outcome of the match.


## Champion Relationship and Combination

Now there are 140 champions in this game, and they all have different specialities and attributes. Therefore, the relationship bewteen them is very complicated. Some of them restrain each other, while some are cooperate within a team. Except the basic participants and match data, we have also collected killing champion events, and this can be used to analyze the restraint bewteen champions. 

In [4]:
# Get killing events from the database
kill_event = get_data(conn, 'kill_champion_event')

# Have a look about the data structure
print(kill_event[0])

(1, '2723373045', 4, 6, 234079)


In order, the data is 'id‘, 'match_id', 'victim_id', 'killer_id', 'happen_time'. But the 'victim_id' and 'killer_id' are ids within a match, which range from 0-10. Therefore, we need to transfer these ids into the champions names by calling the following 'kill_pairs' function. 

In [5]:
# Return two dicts, champion_name & frequency killed by this champion, and champion_name & frequency has ever killed by this champion
# The champion inputted restrain the champion with highest frequency in kills most
# The champion inputted is restrained most by the champion with highest frequency in be_killed_by 

def kill_pairs(champion_id, kill_event, participants, champions):
    kills = {}
    be_killed_by = {}
    matches = []
    ids_inmatch = []
    for p in participants:
        if p[3] == champion_id:
            matches.append(p[2])
            ids_inmatch.append(p[0])

    killer_id_inmatch = {}
    killed_id_inmatch = {}
    for i in range(len(matches)):
        for k in kill_event:
            if int(k[1]) == matches[i] and k[2] == ids_inmatch[i]:
                if matches[i] in killer_id_inmatch:
                    killer_id_inmatch[matches[i]].append(k[3])
                else:
                    killer_id_inmatch[matches[i]] = []
                    killer_id_inmatch[matches[i]].append(k[3])
            if int(k[1]) == matches[i] and k[3] == ids_inmatch[i]:
                if matches[i] in killed_id_inmatch:
                    killed_id_inmatch[matches[i]].append(k[2])
                else:
                    killed_id_inmatch[matches[i]] = []
                    killed_id_inmatch[matches[i]].append(k[2])
    
    for each in killer_id_inmatch:
        for k in killer_id_inmatch[each]:
            for p in participants:
                if p[2] == each:
                    if p[0] == k:
                        champion_name = champions[p[3]][1]
                        if champion_name in be_killed_by:
                            be_killed_by[champion_name] += 1
                        else:
                            be_killed_by[champion_name] = 1

    for each in killed_id_inmatch:
        for k in killed_id_inmatch[each]:
            for p in participants:
                if p[2] == each:
                    if p[0] == k:
                        champion_name = champions[p[3]][1]
                        if champion_name in kills:
                            kills[champion_name] += 1
                        else:
                            kills[champion_name] = 1
    
    
    return kills, be_killed_by

Example of the second champion, Ahri:

In [6]:
# Kills and be_killed_by are champions and frequency pairs
kills, be_killed_by = kill_pairs(1, kill_event, participants, champions)
print('kills: ', kills)
print('be_killed_by: ', be_killed_by)

kills:  {'Taric': 22, 'Camille': 22, 'Akali': 18, 'Miss Fortune': 29, 'Karma': 25, 'Udyr': 23, 'Janna': 61, 'Graves': 24, 'Jinx': 39, 'Jax': 51, 'Twisted Fate': 20, 'Nami': 45, 'Aatrox': 7, 'Caitlyn': 55, 'Vi': 18, 'Maokai': 21, 'Brand': 44, 'Lee Sin': 38, 'Gangplank': 17, 'Warwick': 64, 'Talon': 19, 'Skarner': 10, 'Braum': 21, 'LeBlanc': 28, 'Katarina': 15, 'Nasus': 12, 'Rengar': 19, 'Xerath': 39, 'Kindred': 9, 'Ziggs': 20, 'Blitzcrank': 29, 'Poppy': 5, 'Alistar': 23, 'Viktor': 14, 'Annie': 20, 'Teemo': 22, 'Ornn': 9, 'Thresh': 40, 'Zac': 16, 'Renekton': 21, 'Zed': 37, 'Xayah': 25, 'Pantheon': 9, "Kha'Zix": 46, 'Draven': 21, 'Kassadin': 44, 'Azir': 16, 'Vladimir': 32, 'Olaf': 16, 'Yasuo': 56, 'Lulu': 31, 'Jayce': 6, 'Tristana': 40, 'Lux': 38, 'Sivir': 14, "Vel'Koz": 9, 'Soraka': 15, 'Darius': 20, 'Kayn': 35, 'Quinn': 9, 'Vayne': 28, 'Fizz': 25, 'Morgana': 48, 'Wukong': 21, 'Ezreal': 68, 'Twitch': 25, 'Sion': 19, 'Varus': 44, 'Cassiopeia': 10, 'Anivia': 9, 'Rumble': 6, 'Galio': 18, 'Ry

In [7]:
# Get the champion with highest frequency
restrain_most = max(kills.items(), key=lambda x: x[1])
be_restrained_most = max(be_killed_by.items(), key=lambda x: x[1])
print(restrain_most, be_restrained_most)

('Ezreal', 68) ('Warwick', 81)


This information is quite useful to summoners. Who is going to use Arhi can ban Warwick before a match, in order to increase the probability to win.

Except the restraint bewteen champions, which champions can be combined better within a team is also an important question to summoners. If a summoner wants to use Arhi for 'TOP_LANE', he may want to know which champions should he choose as teammates? So we analyze the champion performace in over 8000 teams, in order to find the teams combination with top three performance.

In [64]:
# Get all teammates this champion has ever grouped with, and this champion's performance of those matches
# Team_mates & match_performance are corresponding, match_id is the first element of match_performance
# Different positions have different evalutation of performance. Here consider minions number for JUNGLE, wards number
# for SUPPORT
def get_teammate_and_performance(champion_id, position, participants, champions):
    data = []
    team_mates = []
    
    for p in participants:
        if p[3] == champion_id:
            if p[6] == position or p[7] == position:
                match = []
                match.extend([p[2], p[4], p[5], p[15], p[39], p[48]])
                if position == 'DUO_SUPPORT':
                    match.extend([(p[38]+p[45]+p[46]+p[47]), 0])
                elif position == 'JUNGLE':
                    match.extend([0, (p[34]+p[35])])
                else:
                    match.extend([0, 0])
                data.append(match)

    for d in data:
        team = []
        for p in participants:
            if p[2] == d[0] and p[4] == d[1]:
                if p[3] != champion_id:
                    pos = get_position(p)
                    team.append([pos,p[3]])
        
        team_mates.append(team)
    match_performance = [([d[0]]+d[2:]) for d in data]
    return team_mates, match_performance

# Normalize all data into range 0-5
def normalize(data):
    norms = []
    maxi = max(data)
    mini = min(data)
    unit = (maxi - mini)/5
    for d in data:
        l = (d-mini)/unit
        norms.append(l)
    return norms

# Get top three teams with highest normalized performance 
def top_three(l, teammates):
    m = {}
    for a in l:
        if a in m:
            m[a].append(l.index(a))
        else:
            m[a] = [l.index(a)]
    l = sorted(l, reverse=True)
    index = l[0:3]
    res = []
    for i in [m[i] for i in index]:
        for j in i:
            res.append(teammates[j])
            
    return res
    
    
# Get the best performance group by the normalized total performance of this champion
def best_performance_group(data, position, teammates):
    wins = [d[1] for d in data]
    kda = [d[2] for d in data]
    total_damage_to = [d[3] for d in data]
    time_CCing = [d[4] for d in data]
    wards = [d[5] for d in data]
    minions = [d[6] for d in data] 
    norms = []
    average = []
    
    if position == 'DUO_SUPPORT':
        
        # Normalize all attributes
        norms.extend([normalize(wins), normalize(kda), normalize(total_damage_to),
                       normalize(time_CCing),normalize(wards)])
        
        # Distributed the normalized attributes to the champion in each match
        for i in range(len(wins)):
            average.append((norms[0][i]+norms[1][i]+norms[2][i]+norms[3][i]+norms[4][i])/5)
        
        # Get the top three with highest performance
        return top_three(average, teammates)
        
    
    elif position == 'JUNGLE':
        norms.extend([normalize(wins), normalize(kda), normalize(total_damage_to),
                       normalize(time_CCing), normalize(minions)])
        for i in range(len(wins)):
            average.append((norms[0][i]+norms[1][i]+norms[2][i]+norms[3][i]+norms[4][i])/5)
        return top_three(average, teammates)
    
    else:
        norms.extend([normalize(wins), normalize(kda), normalize(total_damage_to),
                       normalize(time_CCing)])
        for i in range(len(wins)):
            average.append((norms[0][i]+norms[1][i]+norms[2][i]+norms[3][i])/4)
        return top_three(average, teammates)


So let us try this method, for the second champion, Arhi, in the 'MID_LANE' position again:

In [65]:
teammates, performance_data = get_teammate_and_performance(1, 'MID_LANE', participants, champions)
best_group = best_performance_group(performance_data, 'MID_LANE', teammates)
print('No.1: ', best_group[0], 'No.2: ', best_group[1], 'No.3: ', best_group[2])

No.1:  [['DUO_SUPPORT', 3], ['JUNGLE', 54], ['DUO_CARRY', 119], ['TOP_LANE', 58]] No.2:  [['DUO_SUPPORT', 65], ['TOP_LANE', 91], ['DUO_CARRY', 46], ['JUNGLE', 61]] No.3:  [['TOP_LANE', 110], ['DUO_CARRY', 71], ['DUO_SUPPORT', 103], ['JUNGLE', 61]]


## Player Scoring System

We have also developed a scoring system for summoners. After each match, our system will consider the performance of summoners, and compare with the average data of the champions they use and the position they take. Different positions have different weights on different attributes.

In [17]:
def calculate_score(data_set, position, participant_id):
    scores = []
    sums = []
    jungle = []
    top = []
    mid = []
    adc = []
    support = []
    duo=[]
    
    # In order 0-25 in each element of data_set:
    # 0.win_rate
    # 1.kills
    # 2.deaths
    # 3.assists
    # 4. physical_damage_to
    # 5. physical_damage_taken
    # 6. magic_damage_to
    # 7. magic_damage_taken
    # 8. true_damage_to
    # 9. true_damage_taken
    # 10. gold_earned
    # 11. gold_spent
    # 12. tower_kill
    # 13. minions_kill
    # 14. minions_kill_enemy
    # 15. first_blood
    # 16. total_heal
    # 17. time_CCing
    # 18. sight_ward
    # 19. vision_ward
    # 20. wards_killed
    # 21. wards_placed
    # 22. largest_killing_spree
    # 23. largest_critical_strike
    # 24. largest_multi_kill
    # 25. longest_living_time
    # gold_earned(d[10])is not considered in calcuating score，only gold_spent(d[11]) is enough

    for i in range(len(data_set)):
        d = data_set[i]
        try:
            # JUNGLE focuses more on 'minions_kill' and 'minions_kill_enemy'
            if position[i] == 'JUNGLE':
                s = 30*d[0]+28*d[1]-20*d[2]+25*d[3]+28*(d[4]+d[6]+d[8])+ 15*(d[5]+d[7]+d[9])
                +5*d[10]+d[11]+10*d[12]+ 15*d[13]+40*d[14]+6*d[15]+2*d[16]+20*d[17]
                +15*(d[18]+d[19]+d[20]+d[21])+10*d[22]+15*d[23]+5*d[24]+10*d[25]
                sums.append(s)
                jungle.append(s)

            # TOP_LANE focuses more on total damage and kills
            elif position[i] == 'TOP_LANE':
                s = 30*d[0]+30*d[1]-23*d[2]+15*d[3] + 30*(d[4]+d[6]+d[8]) + 17*(d[5]+d[7]+d[9])
                +6*d[10]+d[11] + 10*d[12]+10*d[13]+8*d[14] +6*d[15]+2*d[16]+20*d[17]
                +5*(d[18]+d[19]+d[20]+d[21])+15*d[22]+20*d[23]+5*d[24]+10*d[25]
                sums.append(s)
                top.append(s)

            # MID_LANE focuses more on kills, damage, and time_CCing
            elif position[i] == 'MID_LANE':
                s = 30*d[0]+30*d[1]-23*d[2]+20*d[3] + 30*(d[4]+d[6]+d[8])+ 5*(d[5]+d[7]+d[9])
                +6*d[10]+d[11] + 10*d[12]+10*d[13]+8*d[14] +6*d[15]+2*d[16]+30*d[17]
                +10*(d[18]+d[19]+d[20]+d[21])+15*d[22]+20*d[23]+5*d[24]+15*d[25]
                sums.append(s)
                mid.append(s)

            # DUO_CARRY is relatively balanced
            elif position[i] == 'DUO_CARRY':
                s = 30*d[0]+30*d[1]-23*d[2]+10*d[3]+30*(d[4]+d[6]+d[8])-2*(d[5]+d[7]+d[9])
                +6*d[10]+d[11] + 10*d[12]+10*d[13]+6*d[14] +6*d[15]+d[16]+20*d[17]
                +3*(d[18]+d[19]+d[20]+d[21])+15*d[22]+20*d[23]+5*d[24]+20*d[25]
                sums.append(s)
                adc.append(s)

            # DUO_SUPPORT focuses much more on the wards placed and killed
            elif position[i] == 'DUO_SUPPORT':
                s = 30*d[0]+20*d[1]-15*d[2]+30*d[3]+10*(d[4]+d[6]+d[8])+ 20*(d[5]+d[7]+d[9])
                +4*d[10]+d[11] + 10*d[12]+5*d[13]+4*d[14] +3*d[15]+15*d[16]+30*d[17]
                +30*(d[18]+d[19]+d[20]+d[21])+5*d[22]+5*d[23]+3*d[24]+5*d[25]
                sums.append(s)
                support.append(s)
            else:
                sums.append('unknown')
        except TypeError:
            sums.append('unknown')
            
    # Then we get the max and min value of each position dataset, in order to give summoners a normalized score
    jungle_max = max(jungle)
    jungle_min = min(jungle)
    top_max = max(top)
    top_min = min(top)
    mid_max = max(mid)
    mid_min = min(mid)
    adc_max = max(adc)
    adc_min = min(adc)
    support_max = max(support)
    support_min = min(support)
    
    # Normalize the summoner score within the range of 1-99, based on his/her position
    # The min score is 1, because we believe every try from summoners worths a positive score
    # The max score is 99, because we believe there is no perfect performance. Summoners can improve themselves,
    # even get the highest scores.  
    participant_index = participant_id - 1
    if sums[participant_index] != 'unknown':

        if position[participant_index] == 'JUNGLE':
            unit = 98/(jungle_max-jungle_min)
            scores = ((sums[i]-jungle_min)*unit + 1)
        elif position[i] == 'TOP_LANE':
            unit = 98/(top_max-top_min)
            scores = ((sums[i]-top_min)*unit + 1)
        elif position[i] == 'MID_LANE':
            unit = 98/(mid_max-mid_min)
            scores = ((sums[i]-mid_min)*unit + 1)
        elif position[i] == 'DUO_CARRY':
            unit = 98/(adc_max-adc_min)
            scores = ((sums[i]-adc_min)*unit + 1)
        elif position[i] == 'DUO_SUPPORT':
            unit = 98/(support_max-support_min)
            scores = ((sums[i]-support_min)*unit + 1)
        else:
            scores = 0
    else:
        scores = 0
    return scores    

# A help method to normalize difference bewteen summoners data and average data, into range of 0-1.
def normalization(data):
    for i in range(len(data[0])):
        original = []
        norms = []
        for r in data:
            if r != 0:
                original.append(r[i])
            else:
                original.append(0)
        maxi = max(original)
        mini = min(original)

        for o in original:
            if float(maxi-mini) != 0:
                norms.append((o-mini)/float(maxi-mini))
            else:
                norms.append(0)

        for j in range(len(data)):
            if data[j] != 0:
                data[j][i] = norms[j]
    return data

Let us try to calculate the first summoner's score according to our rules and the current database. 

In [18]:
# Calculate all participants scores
all_champion_data = []

for c in champions:
    all_champion_data.append(champion_data(c[0], participants))
difference = []
positions = []

# Calculate all participants performance difference with champions average data
for each in participants:
    champion_datas = all_champion_data[each[3]]
    if each[6] in champion_datas:
        champion_datas = champion_datas[each[6]]
        positions.append(each[6])
    elif each[7] in champion_datas:
        champion_datas = champion_datas[each[7]]
        positions.append(each[7])
    else:
        difference.append(0)
        positions.append(each[6])
        continue
    parti_data = [each[5], each[12], each[13], each[14], each[-9], each[-8], each[-13], each[-12], each[-6], each[-5],
                 each[26], each[27], each[16], each[35], each[-10], each[19], each[-8], each[-1], each[-11], each[-4]
                 , each[-3], each[-2], each[28], each[29], each[30], each[31]]
    champion_datas = champion_datas[1:]

    rate = []
    for i in range(len(champion_datas)):
        rate.append(parti_data[i] - list(champion_datas[i].values())[0]) 
    difference.append(rate)

print(difference[0])


[0.5368421052631579, 10.284210526315789, 0.5789473684210522, -0.4526315789473685, 18132.389473684212, -5837.768421052631, 27092.8, 18800.673684210524, 1259.4421052631578, -1197.9052631578948, 6625.473684210527, 6476.968421052632, 3.2736842105263158, -1.7894736842105265, 36096.58947368421, -0.18947368421052632, 4556.557894736842, 17.821052631578947, 0.0, -0.7263157894736842, -2.389473684210526, -4.621052631578948, 2.336842105263158, 173.31578947368416, -0.4736842105263157, 14.684210526315837]


It is clear that the gap bewteen elements is very large. So we need to normalize them into 0-1 range.

In [68]:
# Normalization
difference = normalization(difference)

In [69]:
# Calculate the first summoner's score according to our rules and the current database
summoner_score = calculate_score(difference, positions, 1)
print(summoner_score)

17.949152037695352


We will optimize our scoring system in the future with more data. 

## 8. Champion Clustering and Recommendation System

### 8.1 Champion Clustering

Different champions have their different properties and will fall into different groups for players to choose from for different positions. So we'd like to first explore the relationship between champions. This is a unsupervised clustering task, and we use Agglomerative algorithm to get better results. Agglomerative clustering is a kind of hierarchical clustering in a bottom-up way. It starts with every data point as a single cluster, at every step, two most similar clusters will be merged into a new cluster, and this step is repeated until we reach the number of cluster we want or all data points come to a one huge cluster. This clutering should be better than Kmeans++ which might suffer from random starting points. The rough flow diagram is as follows:

<img src="agg.png">

We have 27 features for every data points including chosen_rate, wwin_rate, kills, deaths, assists, physical_damage_to, physical_damage_taken, magic_damage_to, magic_damage_taken, true_damage_to, true_damage_taken, gold_earned, gold_spent, tower_kill, minions_kill, minions_kill_enemy, first_blood, total_heal, time_CCing, sight_ward, vision_ward, wards_killed, wards_placed, largest_killing_spree, largest_critical_strike, largest_multi_kill, longest_living_time. The evaluation metrics for distance between different data points is Euclidean distance. And we expect to cluster all the 140 champions into 6 clusters. The clustering code is as follows:

In [19]:
from sklearn.cluster import AgglomerativeClustering

def clusterChampions(cluster_num,conn,store_type=0):
    # 1. Get all champions with their match statistic and forms the features that're fed into the cluster
    champion = get_data(conn, 'Champion')
    matches=get_data(conn, 'Participants')
    all_champion_datas = []
    for c in champion:
        all_champion_datas.append(champion_data(c[0], matches))
    # from all_champion_data integrate a champion's all data into a single average one:
    all_features=[]
    fea_num=len(all_champion_datas[0]['JUNGLE'])
    j=0
    for cham_data in all_champion_datas:
        current_cham_feature=np.zeros(fea_num)
        for cur_pos in cham_data:
            current_pos_data=cham_data[cur_pos]
            for i,fea in enumerate(current_pos_data):
                current_cham_feature[i]+=list(fea.values())[0]
        lanes=len(cham_data)
        current_cham_feature=list(current_cham_feature/lanes)
        all_features.append(current_cham_feature)
        j+=1
    # 2. Begin Agglomerative
    agg=AgglomerativeClustering(cluster_num,linkage='complete')
    agg.fit(all_features)
    # 3. fetch cluters with data
    clusters={}
    for i, label in enumerate(agg.labels_):
        if label not in clusters:
            clusters[int(label)]=[champion[i][store_type]]
        else:
            clusters[int(label)].append(champion[i][store_type])
    return clusters

Let's try it:

In [20]:
conn = sqlite3.connect('lol.db')
clusters=clusterChampions(6,conn,1)
count=0
for c in clusters:
    print('cluster',c,clusters[c])
    count+=len(clusters[c])
print('clustered champion num',count)
print('cluster number:', len(clusters))


cluster 5 ['Aatrox', 'Illaoi', 'Nocturne', "Rek'Sai", 'Rengar', 'Vi', 'Xin Zhao']
cluster 2 ['Ahri', 'Alistar', 'Annie', 'Azir', 'Bard', 'Brand', 'Braum', 'Corki', 'Diana', 'Ezreal', 'Fizz', 'Heimerdinger', 'Ivern', 'Janna', 'Karma', 'Kassadin', 'Kennen', "Kog'Maw", 'LeBlanc', 'Lissandra', 'Lulu', 'Lux', 'Malphite', 'Malzahar', 'Nami', 'Nautilus', 'Orianna', 'Ornn', 'Rakan', 'Ryze', 'Shaco', 'Sona', 'Teemo', 'Thresh', 'Twisted Fate', 'Varus', 'Veigar', "Vel'Koz", 'Xerath', 'Zoe', 'Zyra']
cluster 0 ['Akali', 'Amumu', 'Anivia', 'Aurelion Sol', 'Cassiopeia', "Cho'Gath", 'Ekko', 'Elise', 'Evelynn', 'Fiddlesticks', 'Galio', 'Gragas', 'Katarina', 'Kayle', 'Maokai', 'Mordekaiser', 'Morgana', 'Nidalee', 'Nunu', 'Rammus', 'Shyvana', 'Singed', 'Soraka', 'Swain', 'Tahm Kench', 'Taric', 'Vladimir', 'Zac', 'Zilean']
cluster 1 ['Ashe', 'Caitlyn', 'Camille', 'Darius', 'Draven', 'Fiora', 'Gangplank', 'Garen', 'Gnar', 'Graves', 'Irelia', 'Jarvan IV', 'Jax', 'Jayce', 'Jhin', 'Jinx', "Kai'Sa", 'Kalista',

The results above seems reasonable. Champions that usually go in the bottom line as carriers are clustered together, such as Ashe, Caitlyn and Ezreal; Support champions are in a same cluster like Bard, LuLu and Nami; Champions that take the role of assasin like Diana, Evelynn and Kassadin are grouped together; So do Tank champions, which contains Cho'Gath, Dr. Mundo and Nunu, and Fighter champions like Aatrox, Darius and Fiora. This result is overall satisfying compared with the players' common choice, but some small mistakes also exist like.. We expect to get more accurate result with more data and a decrease of input features which we'll try in the future.

### 8.2 Champion Recommendataion System

One biggest concern that most players (summoners) have is when they don't want to play the same champions all the time and want to try more new ones but are worried to fail the match, which champions should they choose considering their own habits and preference? Combined with the clustering results and all the data we have, we decide to make a personalized champion recommendataion system to make personalized recommendataion for every player considering his own preference.  

In order to personalize the result, for a given player, we'll first retrieve all his match history, and obtain the times he uses each champion at different roles, as well as that champion's cluster. Then we sort the results at each role according to the use times of each champions (we don't care other data like the kda or whether the player win this match or not, because the important thing is what he want to play not what he need to play :) ). Then we'll get all the champions with their clusters and sort them in the order of winning rate for each role. And we'll begin to recommend champions at each role for that player: for each role, if this player hasn't tried this role before, then we'll just recommend the champions with top winning rate; if he has played this role, then we'll recommend those from our sorted champion-winning_rate results that:  
(1). Don't appear in the player's top 10 use time history at this role  
(2). Recommend those whose cluster is the same as the player's most commonly used cluster first, if the count is not enough, then we'll recommend the second popular cluster and so on.  
(3). We'll filter the champions at one role whose chosen_rate is really low(<0.0005) to ensure the result is reliable.  
We'll keep outputing recommendataion until we've reached the total recommendation number we want for each positions. 

The code snippet is as follows:

Some helpful functions:

In [21]:
"""
Input:
cluster: the trained total cluster
champion: the champion that we'd like to assign cluster
Return:
int: The cluster id of this champion
"""
def getChampionCluster(cluster,champion):
    for cid in cluster:
        if champion in cluster[cid]:
            return cid
    raise BaseException("Champion {} not found!".format(champion))


"""
Get champion list on one position ranked by winning rate and corresponding cluster type 
"""
def getChampionDataForLane(conn,pos,cluster):
    champions = get_data(conn, 'Champion')
    # 1. get this summoner's match histories, only need to know the times he use each champion, the cluster and lane of that champion
    participants = get_data(conn, 'Participants')
    champion_win=[]
    cham_cluster=[]
    for c in champions:
        cur_data=champion_data(c[0], participants)
        if pos not in cur_data:
            champion_win.append(0)
        else:
            target=cur_data[pos]
            winrate=float(target[1]['win_rate'])
            choserate=float(target[0]['chosen_rate'])
            if choserate>=0.0005:
                champion_win.append(winrate)
            else:
                champion_win.append(0)
    cham_ids=np.argsort(np.array(champion_win))[::-1]
    for id in cham_ids:
        cham_cluster.append(getChampionCluster(cluster,id))
    return cham_ids,cham_cluster


# print recommend result
def printRecommendResult(res):
    result=[]
    for pos in res:
        heros=res[pos]
        num=len(heros)
        output="Top " + str(num) + "recommended champions in "+pos+":\t"+", ".join(heros)
        result.append(output)
    return '\n'.join(result)


The recommendation system:

In [22]:
"""
Given a summoner, give him personalized recommended heros for different positions (each top n)
Return: a dict with position as key champios as values 
"""
def RecommendChampionsForUser(summoner,recommend_num,conn):

    champions=get_data(conn, 'Champion')
    # 1. get this summoner's match histories, only need to know the times he use each champion, the cluster and lane of that champion
    participants=get_data(conn, 'Participants')
    all_champion_datas = []
    for c in champions:
        all_champion_datas.append(champion_data(c[0], participants))

    clusters=clusterChampions(6,conn)
    summoner_matches={} # key is lane, values are {champion: used_time}
    # get count of all heros in each lane first
    for p in participants:
        if p[1]==summoner:
            cur_pos=get_position(p)
            if cur_pos!=None:
                champion=int(p[3])
                if cur_pos not in summoner_matches:
                    summoner_matches[cur_pos]=np.zeros(len(champions))
                    summoner_matches[cur_pos][champion]+=1
                else:
                    summoner_matches[cur_pos][champion]+=1
    pos_top10_used={}
    # take the top 10 most commonly used heros for each lanes
    for pos in summoner_matches:
        chams=np.argsort(summoner_matches[pos])[::-1][:10]
        #print('original champion order',summoner_matches[pos])
        #print('top 10',chams)
        cluster=np.zeros(6)
        for cham_id in chams:
            if summoner_matches[pos][cham_id]!=0:
                cluster[getChampionCluster(clusters,cham_id)]+=1
        #print('original cluster num',cluster)
        cluster=np.argsort(cluster)[::-1]
        #print('ordered cluster',cluster)
        pos_top10_used[pos]=(chams,cluster)
    res={'JUNGLE':[],'TOP_LANE':[],'MID_LANE':[],'DUO_CARRY':[],'DUO_SUPPORT':[]}
    """
    recommend
    """
    for pos in res:
        pos_cham_by_winrate, pos_cham_cluster = getChampionDataForLane(conn, pos, clusters)
        if pos not in pos_top10_used:
            for i,hero in enumerate(pos_cham_by_winrate):
                if i==recommend_num:
                    break
                res[pos].append(champions[hero][1])
        else:
            topheros,topcluster=pos_top10_used[pos]
            topused=set(topheros)
            #  1. Got all champions ordered by win rate in that lane
            count=0
            # in case one cluster num not enough
            for prefered_cluster in topcluster:
                found = False
                for idx, champ in enumerate(pos_cham_by_winrate):
                    if count==recommend_num:
                        found=True
                        break
                    # don't recommend commonly used heros
                    if champ in topused:
                        continue
                    if pos_cham_cluster[idx]==prefered_cluster:
                        res[pos].append(champions[champ][1])
                        count+=1
                if found:
                    break
    return res

Let's try to run it:

In [23]:
conn = sqlite3.connect('lol.db')
print(printRecommendResult(RecommendChampionsForUser(82119724,5,conn)))

Top 5recommended champions in JUNGLE:	Yasuo, Dr. Mundo, Tryndamere, Sion, Aatrox
Top 5recommended champions in TOP_LANE:	Mordekaiser, Ekko, Singed, Yasuo, Sion
Top 5recommended champions in MID_LANE:	Riven, Karthus, Jayce, Sion, Kayle
Top 5recommended champions in DUO_CARRY:	Kai'Sa, Sivir, Ashe, Tristana, Ezreal
Top 5recommended champions in DUO_SUPPORT:	Malphite, Teemo, Nautilus, Fiddlesticks, Vel'Koz


```python
```

This result is reasonable, but we expect to get more reliable result if we can get more data in the future!

## Conclusion

## References:
[1]https://www.statista.com/topics/1551/online-gaming/

[2]https://developer.riotgames.com/

[3]https://cassiopeia.readthedocs.io/en/latest/