This article presents the new way to gather data from official matches, along with some player-oriented analysis.

You can find the notebook with all the code and the dataset on GitHub: https://github.com/HextechLab/Worlds2020

## Gathering LoLEsports data

Following the shutdown of some important APIs on the lolesports website, which let us get the hashes needed to gather data from official matches, the solution is to rely on Leaguepedia, which has everything we need.

Here is an example of how to do it in Python.

We will use the [leaguepedia-parser](https://pypi.org/project/leaguepedia-parser/) package to gather what we need from Leaguepedia.

### Find the tournament

In [1]:
import leaguepedia_parser as lp

lp.get_regions()

['',
 'Africa',
 'Asia',
 'Brazil',
 'China',
 'CIS',
 'Europe',
 'International',
 'Japan',
 'Korea',
 'LAN',
 'LAS',
 'Latin America',
 'LMS',
 'MENA',
 'North America',
 'Oceania',
 'PCS',
 'SEA',
 'Turkey',
 'Unknown',
 'Vietnam',
 'Wildcard']

Pick your favorite region to get the list of its tournaments:

In [2]:
tournaments = lp.get_tournaments('Europe', year=2020)
[t["name"] for t in tournaments]

['LEC 2020 Spring',
 'LEC 2020 Spring Playoffs',
 'LEC 2020 Summer',
 'LEC 2020 Summer Playoffs',
 'EU Face-Off 2020']

We will add a custom method to leaguepedia_parser to get only the information we need:

In [3]:
import types

def get_games_hashes(self, tournament_name=None, **kwargs):
    """
    Returns the server, gameId and hash of each game played in a tournament.

    :param tournament_name
                Name of the tournament, which can be gotten from get_tournaments().
    :return:
                A list of game dictionaries.
    """
    games = self._cargoquery(tables='ScoreboardGames',
                             fields='Tournament = tournament, '
                                    'MatchHistory = match_history_url, ',
                             where="ScoreboardGames.Tournament='{}'".format(tournament_name),
                             order_by="ScoreboardGames.DateTime_UTC",
                             **kwargs)
    data = [
        {
            "tournament":game["tournament"],
            "server":game["match_history_url"].split("/")[5],
            "gameId":game["match_history_url"].split("/")[6].split("?gameHash=")[0],
            "hash":game["match_history_url"].split("/")[6].split("?gameHash=")[1],
        }
        for game in games
    ]
    return data

lp.get_games_hashes = types.MethodType(get_games_hashes, lp)
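
The index-based `split("/")` calls above depend on the exact shape of the match history URL. As a more defensive sketch, the same three values can be extracted with a regular expression. Note that `parse_match_history_url` is a hypothetical helper, not part of leaguepedia_parser, and the example URL (including its host) is an assumption reconstructed from the indexes used in `get_games_hashes` and the sample values shown in this article.

```python
import re

# Assumed URL shape: ".../<server>/<gameId>?gameHash=<hash>", optionally
# followed by extra query parameters. This pattern is an assumption based on
# the indexes used in get_games_hashes, not a documented format.
MATCH_HISTORY_PATTERN = re.compile(
    r"/(?P<server>[^/?#]+)/(?P<gameId>\d+)\?gameHash=(?P<hash>[0-9a-fA-F]+)"
)


def parse_match_history_url(url):
    """Return server, gameId and hash from a match history URL, or None."""
    match = MATCH_HISTORY_PATTERN.search(url)
    if match is None:
        return None
    return match.groupdict()


# Hypothetical URL built from the sample values shown in this article.
example_url = (
    "https://matchhistory.euw.leagueoflegends.com/en/#match-details/"
    "ESPORTSTMNT04/1230688?gameHash=25cb7e1966cbcdb5&tab=overview"
)
print(parse_match_history_url(example_url))
```

Returning `None` for URLs that do not match makes it possible to skip games with a missing or unusual match history link instead of failing with an `IndexError`.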

### Getting the hashes

Getting the hashes for LEC 2020 Summer:

In [4]:
games = lp.get_games_hashes(tournaments[2]['name'])  # 'LEC 2020 Summer' in the list above
games[:3]

[{'tournament': 'LEC 2020 Summer',
  'server': 'ESPORTSTMNT04',
  'gameId': '1230688',
  'hash': '25cb7e1966cbcdb5'},
 {'tournament': 'LEC 2020 Summer',
  'server': 'ESPORTSTMNT04',
  'gameId': '1220706',
  'hash': 'c3f45e5bb2a65c80'},
 {'tournament': 'LEC 2020 Summer',
  'server': 'ESPORTSTMNT04',
  'gameId': '1220728',
  'hash': '4bfd5c00f9292be3'}]

Requesting the match data for all those games:

In [7]:
import requests

base_match_history_stats_url = "https://acs.leagueoflegends.com/v1/stats/game/{}/{}?gameHash={}"
base_match_history_stats_timeline_url = "https://acs.leagueoflegends.com/v1/stats/game/{}/{}/timeline?gameHash={}"

all_games_data = []

for g in games:
    url = base_match_history_stats_url.format(g["server"],g["gameId"],g["hash"])
    timeline_url = base_match_history_stats_timeline_url.format(g["server"],g["gameId"],g["hash"])
    
    game_data = requests.get(url).json()
    game_data["timeline"] = requests.get(timeline_url).json()
    
    all_games_data.append(game_data)

If you get rate limited (HTTP 429 errors), follow this guide: https://www.hextechdocs.dev/lol/esportsapi/13.esports-match-data#in-need-of-cookies
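
Separately from the cookie-based approach described in that guide, occasional 429 responses can sometimes be absorbed by waiting and retrying. Here is a minimal sketch of a retry with exponential backoff. `get_json_with_backoff` and its parameters are hypothetical names introduced for illustration, and the delays are arbitrary assumptions, not documented limits of the endpoint.

```python
import time


def get_json_with_backoff(get, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call get(url) and retry with exponential backoff on HTTP 429.

    `get` is any callable returning an object with `status_code` and `json()`,
    for example `requests.get`.
    """
    for attempt in range(max_retries + 1):
        response = get(url)
        if response.status_code != 429:
            return response.json()
        if attempt < max_retries:
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        "Still rate limited after {} retries: {}".format(max_retries, url)
    )
```

In the download loop, `requests.get(url).json()` would then become `get_json_with_backoff(requests.get, url)`. Passing `get` and `sleep` as arguments keeps the helper easy to try out without hitting the network.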

## Analysis

Once the data is here, let's crunch some numbers. First of all, you'll need to select a few specific pieces of information out of the whole match data. These functions will help with that:

In [9]:
# Team stats
# Get each team's total damage dealt to champions
def get_team_damages_to_champions(g):
    t_dam = {100:0,200:0}
    for p in g["participants"]:
        t_dam[p["teamId"]] += p["stats"]["totalDamageDealtToChampions"]
    return t_dam

# Get the total kills of the team, as this is only available player by player
def get_team_kills(g):
    t_kills = {100:0,200:0}
    for p in g["participants"]:
        t_kills[p["teamId"]] += p["stats"]["kills"]
    return t_kills

# Participant stats
# Get the Gold Advantage of a player at the 15th minute
def ga_at_15(g, pId):
    g_at_15 = [p["totalGold"] for p in g["timeline"]["frames"][15]["participantFrames"].values() if p["participantId"] == pId][0]
    opp_pId = (pId+5)%10 if not pId == 5 else 10
    opp_g_at_15 = [p["totalGold"] for p in g["timeline"]["frames"][15]["participantFrames"].values() if p["participantId"] == opp_pId][0]
    return g_at_15 - opp_g_at_15

# Get the CS Difference of a player at the 15th minute
def cs_diff_at_15(g, pId):
    cs_at_15 = [p["minionsKilled"]+p["jungleMinionsKilled"] for p in g["timeline"]["frames"][15]["participantFrames"].values() if p["participantId"] == pId][0]
    opp_pId = (pId+5)%10 if not pId == 5 else 10
    opp_cs_at_15 = [p["minionsKilled"]+p["jungleMinionsKilled"] for p in g["timeline"]["frames"][15]["participantFrames"].values() if p["participantId"] == opp_pId][0]
    return cs_at_15 - opp_cs_at_15
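
Both helpers find the lane opponent with `(pId+5)%10 if not pId == 5 else 10`. As a small sketch, the special case for player 5 can be avoided by shifting to a 0-based index first. `lane_opponent` is a hypothetical name, and it relies on the same assumption as the helpers above: participants 1 to 5 and 6 to 10 are listed in the same role order.

```python
def lane_opponent(pId):
    """Return the participantId of the player in the same role on the other team."""
    # Shift to 0-9, move five slots around the ten players, shift back to 1-10.
    return (pId - 1 + 5) % 10 + 1


# Check that it agrees with the expression used in ga_at_15 and cs_diff_at_15.
for pId in range(1, 11):
    original = (pId + 5) % 10 if not pId == 5 else 10
    assert lane_opponent(pId) == original

print({pId: lane_opponent(pId) for pId in range(1, 11)})
# {1: 6, 2: 7, 3: 8, 4: 9, 5: 10, 6: 1, 7: 2, 8: 3, 9: 4, 10: 5}
```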

Load the ddragon module to translate each championId into the champion's name. Note: you need nest_asyncio only if you run this in a Jupyter Notebook.

In [10]:
import nest_asyncio
nest_asyncio.apply()

import static_data
dd = static_data.ddragon()


Gather stats from each player in each game.

In [11]:
players_stats = []

for g in all_games_data:
    
    # Get the teams stats
    teams_damages_to_champions = get_team_damages_to_champions(g)
    teams_kills = get_team_kills(g)
    
    
    # Translate participantId to the player's name
    pId_to_name = {}
    for p in g["participantIdentities"]:
        pId_to_name[p["participantId"]] = p["player"]["summonerName"]
    
    for p in g["participants"]:
        players_stats.append({
            "name":pId_to_name[p["participantId"]],
            "champion":dd.getChampion(p["championId"]).name,
            "damage_share":p["stats"]["totalDamageDealtToChampions"] / teams_damages_to_champions[p["teamId"]],
            "kill_participation":(p["stats"]["kills"] + p["stats"]["assists"]) / teams_kills[p["teamId"]] if teams_kills[p["teamId"]] > 0 else 0,
            "cs_diff_at_15":cs_diff_at_15(g, p["participantId"]),
            "ga_at_15":ga_at_15(g, p["participantId"]),
            "kills":p["stats"]["kills"],
            "deaths":p["stats"]["deaths"],
            "assists":p["stats"]["assists"],
            "vision_score":p["stats"]["visionScore"],
            "damage_per_gold":p["stats"]["totalDamageDealtToChampions"] / p["stats"]["goldEarned"],
            "win":p["stats"]["win"]
        })
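
To make the two ratio metrics concrete, here is a worked example on a toy game with two players per team. All numbers are made up for illustration, and `team_totals` is a hypothetical helper that plays the role of `get_team_damages_to_champions` and `get_team_kills`.

```python
from collections import defaultdict

# Toy game with two players per team; every number here is made up.
toy_game = {
    "participants": [
        {"participantId": 1, "teamId": 100,
         "stats": {"kills": 3, "assists": 1, "totalDamageDealtToChampions": 6000}},
        {"participantId": 2, "teamId": 100,
         "stats": {"kills": 1, "assists": 1, "totalDamageDealtToChampions": 2000}},
        {"participantId": 3, "teamId": 200,
         "stats": {"kills": 0, "assists": 0, "totalDamageDealtToChampions": 1000}},
        {"participantId": 4, "teamId": 200,
         "stats": {"kills": 0, "assists": 0, "totalDamageDealtToChampions": 3000}},
    ]
}


def team_totals(game, stat):
    """Sum one stat per team."""
    totals = defaultdict(int)
    for p in game["participants"]:
        totals[p["teamId"]] += p["stats"][stat]
    return totals


team_damage = team_totals(toy_game, "totalDamageDealtToChampions")
team_kills = team_totals(toy_game, "kills")

rows = []
for p in toy_game["participants"]:
    stats, team = p["stats"], p["teamId"]
    rows.append({
        "participantId": p["participantId"],
        # Share of the team's damage dealt by this player.
        "damage_share": stats["totalDamageDealtToChampions"] / team_damage[team],
        # Guard against a team with zero kills, as in the loop above.
        "kill_participation": (stats["kills"] + stats["assists"]) / team_kills[team]
        if team_kills[team] > 0 else 0,
    })

for row in rows:
    print(row)
```

Player 1 dealt 6000 of the team's 8000 damage, so the damage share is 0.75, and took part in all 4 team kills, so the kill participation is 1.0. Team 200 has no kills, so the guard returns 0 instead of dividing by zero.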

Put the stats into a pandas DataFrame.

In [12]:
import pandas as pd
df = pd.DataFrame(players_stats)
df.head()

| | name | champion | damage_share | kill_participation | cs_diff_at_15 | ga_at_15 | kills | deaths | assists | vision_score | damage_per_gold | win |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | G2 Wunder | Mordekaiser | 0.207249 | 0.454545 | 8 | 955 | 6 | 1 | 4 | 32 | 0.87716 | True |
| 1 | G2 Jankos | Rek'Sai | 0.103866 | 0.318182 | 10 | -67 | 1 | 2 | 6 | 64 | 0.540224 | True |
| 2 | G2 Caps | Zoe | 0.184098 | 0.636364 | -30 | 37 | 5 | 1 | 9 | 34 | 0.844762 | True |
| 3 | G2 Perkz | Varus | 0.421698 | 0.818182 | 37 | 1680 | 9 | 0 | 9 | 26 | 1.516857 | True |
| 4 | G2 Mikyx | Tahm Kench | 0.083089 | 0.545455 | 8 | 650 | 1 | 1 | 11 | 80 | 0.566123 | True |


Group by player to get the mean of each stat and the number of games played.

In [13]:
df_results = pd.concat([df.groupby("name").mean(numeric_only=True), df.groupby("name").agg(Games=("kills","count"))], axis=1)
df_results

| name | damage_share | kill_participation | cs_diff_at_15 | ga_at_15 | kills | deaths | assists | vision_score | damage_per_gold | win | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FNC Bwipo | 0.223127 | 0.593547 | -9.722222 | -194.111111 | 2.722222 | 3.833333 | 5.166667 | 32.777778 | 1.13696 | 0.5 | 18 |
| FNC Hylissang | 0.083543 | 0.59657 | -5.888889 | 140.444444 | 0.833333 | 4.555556 | 7.055556 | 84.555556 | 0.667025 | 0.5 | 18 |
| FNC Nemesis | 0.247389 | 0.562223 | -3.166667 | -190.666667 | 2.722222 | 2.388889 | 4.611111 | 32.777778 | 1.105412 | 0.5 | 18 |
| FNC Rekkles | 0.258534 | 0.775209 | 5.777778 | 72.444444 | 3.055556 | 1.333333 | 6.333333 | 36.944444 | 1.107452 | 0.5 | 18 |
| FNC Selfmade | 0.187407 | 0.710503 | 11.444444 | 201.055556 | 3.0 | 2.444444 | 5.5 | 44.611111 | 0.961258 | 0.5 | 18 |
| G2 Caps | 0.293738 | 0.700959 | 2.333333 | 384.222222 | 4.777778 | 2.111111 | 5.277778 | 31.5 | 1.289828 | 0.611111 | 18 |
| G2 Jankos | 0.135096 | 0.701737 | 0.722222 | 4.722222 | 2.166667 | 2.666667 | 6.833333 | 50.777778 | 0.795349 | 0.611111 | 18 |
| G2 Mikyx | 0.085227 | 0.574378 | 8.388889 | 217.055556 | 0.777778 | 2.944444 | 7.388889 | 90.888889 | 0.647563 | 0.611111 | 18 |
| G2 P1noy | 0.147956 | 0.560606 | -67.5 | -1493.5 | 3.5 | 2.5 | 5.5 | 43.5 | 1.070342 | 0.5 | 2 |
| G2 Perkz | 0.279655 | 0.557734 | -3.4375 | -260.1875 | 3.875 | 2.5625 | 4.8125 | 37.0 | 1.195827 | 0.625 | 16 |
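
As an alternative sketch, the per-player means and the game count can be computed in a single named aggregation instead of concatenating two `groupby` results. Only explicitly listed columns are aggregated, so non-numeric columns such as `champion` are never averaged. The toy DataFrame below uses made-up numbers and only a few of the columns from `df`.

```python
import pandas as pd

# Toy data with made-up numbers, using some of the same column names as df.
toy_df = pd.DataFrame({
    "name": ["A", "A", "B"],
    "champion": ["Zoe", "Azir", "Varus"],
    "kills": [4, 2, 6],
    "vision_score": [30, 40, 20],
    "win": [True, False, True],
})

# One output column per keyword: new_name=(source_column, aggregation).
toy_results = toy_df.groupby("name").agg(
    kills=("kills", "mean"),
    vision_score=("vision_score", "mean"),
    win=("win", "mean"),
    Games=("kills", "count"),
)
print(toy_results)
```

The trade-off is that every metric has to be listed by hand, whereas `mean()` on the whole group picks up new numeric columns automatically.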


Then find out who the best support is, here by sorting on vision score:

In [15]:
df_results.sort_values("vision_score", ascending=False)[:5]

| name | damage_share | kill_participation | cs_diff_at_15 | ga_at_15 | kills | deaths | assists | vision_score | damage_per_gold | win | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S04 Nukes | 0.062412 | 0.79798 | 5.0 | 199.0 | 0.666667 | 2.666667 | 3.0 | 106.666667 | 0.403134 | 0.0 | 3 |
| VIT Labrov | 0.073434 | 0.792618 | 0.611111 | -145.833333 | 0.388889 | 2.611111 | 8.222222 | 103.111111 | 0.552762 | 0.388889 | 18 |
| MSF denyk | 0.079974 | 0.631178 | 2.0 | -363.0 | 0.8 | 4.6 | 6.6 | 98.8 | 0.739018 | 0.4 | 5 |
| OG Jactroll | 0.068711 | 0.688606 | -5.555556 | -293.555556 | 0.222222 | 3.777778 | 5.444444 | 97.222222 | 0.60841 | 0.222222 | 9 |
| SK LIMIT | 0.089615 | 0.711859 | -8.055556 | -138.944444 | 0.888889 | 2.5 | 7.833333 | 95.555556 | 0.69994 | 0.5 | 18 |


You can also filter out players who didn't play many games:

In [16]:
df_results[df_results["Games"] > 9]

| name | damage_share | kill_participation | cs_diff_at_15 | ga_at_15 | kills | deaths | assists | vision_score | damage_per_gold | win | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FNC Bwipo | 0.223127 | 0.593547 | -9.722222 | -194.111111 | 2.722222 | 3.833333 | 5.166667 | 32.777778 | 1.13696 | 0.5 | 18 |
| FNC Hylissang | 0.083543 | 0.59657 | -5.888889 | 140.444444 | 0.833333 | 4.555556 | 7.055556 | 84.555556 | 0.667025 | 0.5 | 18 |
| FNC Nemesis | 0.247389 | 0.562223 | -3.166667 | -190.666667 | 2.722222 | 2.388889 | 4.611111 | 32.777778 | 1.105412 | 0.5 | 18 |
| FNC Rekkles | 0.258534 | 0.775209 | 5.777778 | 72.444444 | 3.055556 | 1.333333 | 6.333333 | 36.944444 | 1.107452 | 0.5 | 18 |
| FNC Selfmade | 0.187407 | 0.710503 | 11.444444 | 201.055556 | 3.0 | 2.444444 | 5.5 | 44.611111 | 0.961258 | 0.5 | 18 |
| G2 Caps | 0.293738 | 0.700959 | 2.333333 | 384.222222 | 4.777778 | 2.111111 | 5.277778 | 31.5 | 1.289828 | 0.611111 | 18 |
| G2 Jankos | 0.135096 | 0.701737 | 0.722222 | 4.722222 | 2.166667 | 2.666667 | 6.833333 | 50.777778 | 0.795349 | 0.611111 | 18 |
| G2 Mikyx | 0.085227 | 0.574378 | 8.388889 | 217.055556 | 0.777778 | 2.944444 | 7.388889 | 90.888889 | 0.647563 | 0.611111 | 18 |
| G2 Perkz | 0.279655 | 0.557734 | -3.4375 | -260.1875 | 3.875 | 2.5625 | 4.8125 | 37.0 | 1.195827 | 0.625 | 16 |
| G2 Wunder | 0.220917 | 0.571255 | 12.277778 | 404.777778 | 2.555556 | 2.722222 | 5.055556 | 31.277778 | 1.105366 | 0.611111 | 18 |


You can also target a specific player and get stats on the champions he played:

In [17]:
df[df["name"] == "G2 Caps"].groupby("champion").agg(
    played=("name","count"), 
    kill_participation=("kill_participation","mean"), 
    cs_diff_at_15=("cs_diff_at_15","mean"), 
    ga_at_15=("ga_at_15","mean"), 
    winrate=("win","mean")
)

| champion | played | kill_participation | cs_diff_at_15 | ga_at_15 | winrate |
|---|---|---|---|---|---|
| Azir | 1 | 0.833333 | 13.0 | 955.0 | 0.0 |
| Cassiopeia | 1 | 0.714286 | 37.0 | 128.0 | 0.0 |
| Ekko | 1 | 0.75 | -2.0 | 38.0 | 1.0 |
| Galio | 1 | 1.0 | -18.0 | -1312.0 | 0.0 |
| Kassadin | 1 | 0.2 | -4.0 | -102.0 | 0.0 |
| Kog'Maw | 2 | 0.787879 | -3.5 | 1119.5 | 0.5 |
| LeBlanc | 1 | 0.56 | -19.0 | -163.0 | 1.0 |
| Orianna | 1 | 0.777778 | 6.0 | -307.0 | 0.0 |
| Syndra | 2 | 0.663534 | 4.5 | 428.5 | 1.0 |
| Twisted Fate | 2 | 0.760417 | 16.0 | 855.5 | 1.0 |


Now that you have the data and the keys to analyze esports data, make your own stats with more metrics, extend the games analyzed to Playoffs, and even to more regions like LCK and LPL. For other metrics, I advise taking a look at this sheet made by an LEC analyst, which explains the metrics and how to compute them: https://docs.google.com/spreadsheets/d/1hBzgxIPpoBqinOoXCnPvpqUDz4Ja9Lmfm7tqgU3kHRw/edit#gid=1424953934

You may also want to look at last year's article: https://hextechlab.com/2019/10/25/worlds2019/

Its first part, on gathering the data, is outdated, but the analysis on champions still works and is compatible with the data gathered earlier in this article.