# League of Legends Data Analysis Tutorial

*Andrew Sarama, Elisabeth Jang*  
*May 15, 2022*  
*University of Maryland, College Park*  
*Department of Computer Science and Mathematics*  

## Brief Introduction

League of Legends (LoL) is a team-based 5v5 competitive esport. Data from previously played matches is saved to the LoL database, but access to that database must be manually approved by the Riot Games development team. Luckily, authenticated API users have downloaded relevant data from thousands of games and uploaded the files for public access. This tutorial introduces the analysis of LoL match data in a way that works regardless of the reader's authentication level.


### Important Game Mechanics

*Feel free to skip if you are already familiar with basic LoL gameplay*


#### Primary Game Objectives
A LoL match is played on a map (also known as "The Rift") by 2 teams, *red* and *blue*. The goal of the game is to destroy the opposing team's *nexus*, located at the center of the opposing team's *base*. The nexus is protected by 4 layers of *towers*, which are non-regenerating defense turrets that prevent players from simply walking into the enemy base. All 4 layers of towers must be penetrated before the nexus becomes attackable.

#### Champions
Before the start of any match, all 10 players choose *champions* from a pool of ~150 playable characters. Champions can attack opposing champions and towers. Champions that die in combat will respawn in their base. Note that the only way to win is to destroy the opposing nexus--killing the enemy champions simply prevents them from protecting their towers and nexus. Each player gets to pick and ban exactly 1 champion.

#### Secondary Game Objectives
There are 3 secondary objectives on The Rift: killing the *Dragon*, killing the *Rift Herald*, and killing the *Baron*. All of these objectives spawn at set times and locations on The Rift, and the team that completes an objective gains bonuses that make destroying the opposing team's base easier.

## Data Analysis Opportunities

Using the data that is stored for LoL matches and some data analysis techniques, we can answer some of the following interesting questions (and many, many more):

1. Is the team that gets the "first blood" (first champion kill in the match) more likely to win? How good a predictor of the game's outcome is first blood?
2. How does completing secondary objectives impact a team's chances of winning? Is killing the dragon a waste of players' time, or is it a predictor of success?
3. What is the relationship between a champion's win rate, pick rate, and ban rate?
4. Is focusing more on secondary objectives than primary objectives a viable strategy?

## Data Collection and Parsing

Without a private Riot Games development key, finding LoL match data with very specific parameters is challenging. Luckily, many public datasets exist for analyzing a large number of general matches. This tutorial will use a free-to-download dataset containing ~50,000 matches played on the European server. See [1] under *Resources* for a download link.

### Importing Necessary Libraries

The data will be stored in our Python code using the *pandas* library. The first step will be to import the library itself:

In [1]:
import pandas as pd

The data from this specific source has 4 files: games.csv (contains the game data) and 3 .json files for converting champion/spell IDs to their names. Let's first get the data from the games into a pandas DataFrame and observe what other data parsing needs to take place.

In [11]:
games = pd.read_csv("/home/jovyan/notebooks/Data/games.csv")

# there are a very large number of data columns, so let's look at the names of the first 11
display_cols = 11
for col in games.columns[:display_cols]:
    display(col)

'gameId'

'creationTime'

'gameDuration'

'seasonId'

'winner'

'firstBlood'

'firstTower'

'firstInhibitor'

'firstBaron'

'firstDragon'

'firstRiftHerald'

These columns hold useful information about the events that took place in each match. According to the description of the dataset at link [1], the 2 teams are given the numerical identifiers 1 and 2. Each 'first' column (for example, 'firstBlood' or 'firstBaron') records which team completed that objective first. Because not every objective has to be completed to win a game, the identifier 0 means that neither team completed it.
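To make the encoding concrete, here is a minimal sketch of how the team identifiers could be decoded and used. The rows in `toy` are made up for illustration and are not taken from games.csv, and the names `toy`, `TEAM_LABELS`, and `first_blood_win_rate` are hypothetical helpers, not part of the dataset.

```python
import pandas as pd

# Toy rows for illustration only; these values are NOT from the real games.csv
toy = pd.DataFrame({
    "winner":     [1, 2, 1, 2, 1],
    "firstBlood": [1, 2, 2, 0, 1],
    "firstBaron": [1, 0, 0, 2, 0],
})

# Hypothetical readable labels for the identifiers 0, 1 and 2
TEAM_LABELS = {0: "neither", 1: "team 1", 2: "team 2"}

# Translate every 'first...' column from identifiers to labels
first_cols = [c for c in toy.columns if c.startswith("first")]
labeled = toy[first_cols].apply(lambda col: col.map(TEAM_LABELS))

# Share of games in which the first-blood team also won,
# leaving out games where neither team got first blood (identifier 0)
had_first_blood = toy[toy["firstBlood"] != 0]
first_blood_win_rate = (had_first_blood["firstBlood"] == had_first_blood["winner"]).mean()
```

Filtering out the 0 rows before comparing matters: a 0 can never equal the winner's identifier, so keeping those rows would count them as losses for the first-blood team.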

The .csv file contains columns that are unnecessary for this tutorial and needlessly clutter the data. Here we will drop the columns that will not be used:

In [12]:
games.drop(columns=['gameId', 'creationTime', 'seasonId'], inplace=True)
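One caveat with dropping columns in place: running the same notebook cell a second time raises a `KeyError`, because the columns are already gone. The sketch below shows one possible alternative on a toy DataFrame. The values and the names `toy`, `unused`, and `trimmed` are made up for illustration.

```python
import pandas as pd

# Toy frame reusing some of the real column names; the values are made up
toy = pd.DataFrame({
    "gameId": [101, 102],
    "creationTime": [1500000000000, 1500000500000],
    "seasonId": [9, 9],
    "gameDuration": [1800, 2100],
    "winner": [1, 2],
})

unused = ["gameId", "creationTime", "seasonId"]

# Without inplace=True, drop() returns a new DataFrame and leaves `toy` untouched
trimmed = toy.drop(columns=unused)

# errors="ignore" skips columns that are already missing instead of raising a KeyError
trimmed_again = trimmed.drop(columns=unused, errors="ignore")
```

Either approach works for this tutorial. Returning a new DataFrame keeps the original available for later inspection, at the cost of some extra memory.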

### Using JSON to Clean Up

The data

## Resources

[1] https://www.kaggle.com/datasets/datasnaek/league-of-legends?resource=download