This notebook gathers data about the games played during Worlds 2018.

To access this data, we rely on the lolesports.com website and some undocumented endpoints it uses. The data from these endpoints is quite messy, and it can be tricky to extract what we want.

# 1 - Get the matches

We start our data collection by finding all the matches of the tournament.

The first endpoint we will use returns data about tournaments. More precisely, leagueId 9 corresponds to Worlds, with data going back to 2015.

In [1]:
import requests, json

urlTournament="http://api.lolesports.com/api/v1/scheduleItems?leagueId=9"
r  = requests.get(urlTournament)
rawTournamentData = json.loads(r.text)

In [2]:
rawTournamentData.keys()

dict_keys(['teams', 'players', 'highlanderTournaments', 'scheduleItems', 'highlanderRecords'])

The response from this endpoint has multiple fields. The one that interests us is "highlanderTournaments", which contains a list of all instances of the tournament: here, the four Worlds championships since 2015.

In [3]:
rawTournamentData["highlanderTournaments"][3].keys()

dict_keys(['platformIds', 'league', 'roles', 'queues', 'id', 'leagueId', 'liveMatches', 'startDate', 'rosters', 'published', 'leagueReference', 'endDate', 'gameIds', 'title', 'brackets', 'description', 'breakpoints'])

The "id" field of a tournament will be useful later, so we save it.

In [4]:
tournamentId = rawTournamentData['highlanderTournaments'][3]["id"]
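
Hard-coding index 3 relies on the order of the list returned by the endpoint. A more defensive lookup could look like the sketch below. It assumes that the "title" field contains the year of the tournament, which has not been checked against the real response; `find_tournament` is a hypothetical helper and the sample entries are invented for illustration.

```python
def find_tournament(tournaments, year):
    """Return the first tournament whose title mentions the given year.

    Assumption: the "title" field embeds the year (e.g. "worlds_2018").
    """
    for tournament in tournaments:
        if str(year) in tournament.get("title", ""):
            return tournament
    raise LookupError("no tournament found for year {}".format(year))


# Invented sample mimicking rawTournamentData["highlanderTournaments"]
sample_tournaments = [
    {"id": "aaaa", "title": "worlds_2017"},
    {"id": "bbbb", "title": "worlds_2018"},
]

sample_tournament_id = find_tournament(sample_tournaments, 2018)["id"]
```

If the assumption about "title" does not hold, the same pattern could be applied to another field such as "startDate".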

Among a lot of data, the "brackets" field is the one we want. The tournament follows this structure:
 - A tournament is a list of brackets, usually more or less independent. A bracket can be a group from the group phase (each group has its own bracket) or a whole phase like the elimination phase (the tree with final, semi-finals...)
 - A bracket is a list of matches
 - A match is a confrontation between two teams, and handles the "Best Of" format (BO2, BO3, BO5...). So a match is a list of games.
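
To make this hierarchy concrete, here is a minimal sketch that walks a toy tournament. The nesting (dictionaries keyed by ID at each level) follows what the endpoint returns, but the IDs and values are invented, `iter_games` is a hypothetical helper, and the idea that an unplayed game simply lacks a "gameId" is an assumption.

```python
def iter_games(tournament):
    """Yield (bracket_id, match_id, game_uuid, game) for every game."""
    for bracket_id, bracket in tournament["brackets"].items():
        for match_id, match in bracket["matches"].items():
            for game_uuid, game in match["games"].items():
                yield bracket_id, match_id, game_uuid, game


# Invented toy tournament: one bracket holding one best-of-3 match
toy_tournament = {
    "brackets": {
        "bracket-1": {
            "matches": {
                "match-1": {
                    "games": {
                        "game-a": {"gameId": "1001", "gameRealm": "TRLH2"},
                        "game-b": {"gameId": "1002", "gameRealm": "TRLH2"},
                        "game-c": {},  # assumed shape of a game that was not played
                    }
                }
            }
        }
    }
}

# Keep only the games that carry a match history ID
played = [uuid for _, _, uuid, game in iter_games(toy_tournament) if "gameId" in game]
```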

From here, we can parse all games of all matches of all brackets. We will need three pieces of information: the gameId for the match history, the game ID used by lolesports, which we will call gameUuid, and the matchId.

In [5]:
#Get the brackets for Worlds 2018
brackets = rawTournamentData['highlanderTournaments'][3]["brackets"]

matches = []
games = {}

#Get the list of matches and each of their games
for bracketId in brackets:
    bracket = brackets[bracketId]
    
    for matchId in bracket["matches"]:
        match = bracket["matches"][matchId]
        
        for gameUuid in match['games']:
            
            game = match['games'][gameUuid]
            
            if 'gameId' in game:
                #Record each match only once, even when it has several games
                if matchId not in matches:
                    matches.append(matchId)
                games[gameUuid] = {"matchHistoryId":game['gameId'], "realm":game["gameRealm"]}

# 2 - Get the hashes

Now that we have some gameIds, we could use one of them:

In [6]:
games[next(iter(games.keys()))]

{'matchHistoryId': '2002860116', 'realm': 'TRLH2'}

to go to the match history website and get everything we want by following this link:

In [7]:
baseMatchHistoryUrl = "https://matchhistory.na.leagueoflegends.com/en/#match-details/{}/{}"
baseMatchHistoryUrl.format(games[next(iter(games.keys()))]["realm"],games[next(iter(games.keys()))]["matchHistoryId"])

'https://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH2/2002860116'

However, the match history pages are protected by a hash.

We can obtain the hashes by using a second lolesports endpoint that serves match details, which needs the tournamentId we saved earlier.

In [8]:
#Use the first match as an example
matchId = matches[0]
baseMatchUrl = "http://api.lolesports.com/api/v2/highlanderMatchDetails?tournamentId={}&matchId={}"
r  = requests.get(baseMatchUrl.format(tournamentId,matchId))
matchData = json.loads(r.text)

This gives some data about the teams, the links to videos of the games and, more importantly for us, the game hashes. We can now link the games of the match to their respective hashes.

In [9]:
for i in matchData["gameIdMappings"]:
    games[i["id"]]["hash"] = i["gameHash"]

Now we repeat the process for every match.

In [10]:
for matchId in matches:
    r  = requests.get(baseMatchUrl.format(tournamentId,matchId))
    matchData = json.loads(r.text)
    for i in matchData["gameIdMappings"]:
        games[i["id"]]["hash"] = i["gameHash"]
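
The loop above raises a KeyError if a mapping refers to a game that was not collected in step 1, for example a game without a gameId. Whether this happens with the real data has not been verified here. The sketch below shows one way to guard against it; `attach_hashes` is a hypothetical helper and the sample data is invented.

```python
def attach_hashes(games, match_data):
    """Copy each gameHash from a match-details response into `games`.

    Returns the IDs present in the response but absent from `games`.
    """
    unknown = []
    for mapping in match_data.get("gameIdMappings", []):
        game = games.get(mapping["id"])
        if game is None:
            unknown.append(mapping["id"])
        else:
            game["hash"] = mapping["gameHash"]
    return unknown


# Invented sample data shaped like the structures used above
sample_games = {"game-a": {"matchHistoryId": "1001", "realm": "TRLH2"}}
sample_match_data = {
    "gameIdMappings": [
        {"id": "game-a", "gameHash": "abc123"},
        {"id": "game-z", "gameHash": "def456"},
    ]
}

unknown_ids = attach_hashes(sample_games, sample_match_data)
```

Returning the unknown IDs instead of silently dropping them makes it easy to inspect which games were skipped.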

And we can now access the match history page:

In [11]:
baseMatchHistoryUrl = "https://matchhistory.na.leagueoflegends.com/en/#match-details/{}/{}?gameHash={}"
baseMatchHistoryUrl.format(games[next(iter(games.keys()))]["realm"],games[next(iter(games.keys()))]["matchHistoryId"],games[next(iter(games.keys()))]["hash"])

'https://matchhistory.na.leagueoflegends.com/en/#match-details/TRLH2/2002860116?gameHash=b2247ea535b6dc2f'

# 3 - Get the games data

We now have access to the game data. As with the Riot Games API, game data is split into two endpoints: one for the general data and one for the timeline data.

In [12]:
baseMatchHistoryStatsUrl = "https://acs.leagueoflegends.com/v1/stats/game/{}/{}?gameHash={}"
baseMatchHistoryStatsUrl.format(games[next(iter(games.keys()))]["realm"],games[next(iter(games.keys()))]["matchHistoryId"],games[next(iter(games.keys()))]["hash"])

'https://acs.leagueoflegends.com/v1/stats/game/TRLH2/2002860116?gameHash=b2247ea535b6dc2f'

In [13]:
baseMatchHistoryStatsUrl.format(games[next(iter(games.keys()))]["realm"],games[next(iter(games.keys()))]["matchHistoryId"]+"/timeline",games[next(iter(games.keys()))]["hash"])

'https://acs.leagueoflegends.com/v1/stats/game/TRLH2/2002860116/timeline?gameHash=b2247ea535b6dc2f'
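
The two URLs above differ only by the "/timeline" path segment. As a sketch, both can be built from a single template with named placeholders instead of appending the suffix to the match history ID. `BASE_STATS_URL` and `stats_urls` are hypothetical names introduced for illustration; the example values are the ones shown in the outputs above.

```python
BASE_STATS_URL = (
    "https://acs.leagueoflegends.com/v1/stats/game/"
    "{realm}/{game_id}{suffix}?gameHash={game_hash}"
)


def stats_urls(game):
    """Return (general_url, timeline_url) for one entry of `games`."""
    def build(suffix):
        return BASE_STATS_URL.format(
            realm=game["realm"],
            game_id=game["matchHistoryId"],
            suffix=suffix,
            game_hash=game["hash"],
        )
    return build(""), build("/timeline")


# Values taken from the example game shown above
example_game = {
    "matchHistoryId": "2002860116",
    "realm": "TRLH2",
    "hash": "b2247ea535b6dc2f",
}
general_url, timeline_url = stats_urls(example_game)
```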