# Summoner Insights: Exploring League of Legends Through Data

**Name(s)**: Alan Xia

**Website Link**: (your website link)

In [2]:
import pandas as pd
import numpy as np
from pathlib import Path


import plotly.express as px
pd.options.plotting.backend = 'plotly'

from dsc80_utils import * 

## Step 1: Introduction

### **Introduction**

League of Legends (LoL) is one of the world's most-watched esports, drawing millions of viewers across the globe every year, and it has remained one of the most popular games since its release in 2009. Each match is a 5-on-5 battle on Summoner's Rift, a three-lane map on which each team tries to destroy the opposing Nexus to win. At the start of every game, players draft from a growing pool of over 150 unique champions. League of Legends is deeply rooted in strategy and teamwork: even before the game begins, each team has to balance its champion picks across 5 roles (top lane, jungle, mid lane, bot lane, and support) so that the picks synergize with each other and counter the enemy team. Once the game starts, players farm minions and monsters to earn gold, which they spend on items that increase their champion's power. Securing neutral objectives on the map, such as dragons, Baron Nashor, and Rift Herald, can swing the momentum of the game heavily in one team's favor.

League of Legends' competitive scene spans multiple regions, including LCK in Korea, LPL in China, LEC in Europe, LCS in North America, and many more. The top teams from these leagues meet each year at the World Championship, one of the most-watched esports events in the world. Each professional match generates a wealth of statistics, including kills, gold earned, objective control, vision score, and more, which coaches and analysts use to shape in-game tactics. By analyzing match-level statistics, we can attempt to identify which in-game factors are most strongly associated with victory. Understanding these factors can help teams refine their approach to the game, from champion drafting to map control, and ultimately gain a competitive edge through data-driven insights.

In this project, we analyze a season's worth of professional match data from Oracle's Elixir, which contains a total of 150,588 observations. Although the dataset includes many in-game statistics, we narrow our focus to a subset of the available columns. Here is a brief description of the columns relevant to this analysis:

- `gameid`: Unique identifier for each match, used to distinguish rows belonging to different games.
- `league`: The regional league or tournament in which the match was played (e.g. LCK, LPL, LEC, LCS, etc.).
- `position`: Role played by the player (`top`, `jng`, `mid`, `bot`, or `sup`), or `team` for the row that aggregates a team's statistics.
- `kills`: Number of enemy champions eliminated during the match.
- `assists`: Number of kills the player (or team) contributed to without landing the final blow.
- `deaths`: Number of times the player (or team) was eliminated by enemy champions.
- `cspm`: Average creep score (minions and neutral monsters killed) per minute.
- `monsterkills`: Total neutral monsters slain (jungle camps as well as epic monsters such as dragons and Baron Nashor).
- `minionkills`: Total minions killed over the course of the game.
- `damagetochampions`: Total damage dealt to enemy champions during the match.
- `firstbaron`: Binary indicator (`1`/`0`) for whether the team secured the first Baron Nashor kill in the match.
- `dragons`: Total number of dragon kills secured during the match.
- `result`: Match outcome for the team (`1` = win; `0` = loss).

The primary objective of this project will be to predict match outcome (`result`) using the in-game statistics listed above. Specifically, we will examine which of the following factors are the strongest predictors of a team's chance to win:
- individual performance (`kills`, `deaths`, `assists`, `cspm`, `damagetochampions`)
- objective control (`monsterkills`, `minionkills`, `firstbaron`, `dragons`)

To accomplish this, we will:
1. Clean and explore these columns to understand their distributions and pairwise relationships with `result`.
2. Train a classification model using these features to predict `result`.
3. Assess the importance of each feature to identify which in-game factors are the best predictors of victory.

By the end of this analysis, we'll be able to determine how strongly each metric correlates with winning, as well as which metrics carry the most weight when predicting match outcomes in the professional League of Legends scene.
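
As a rough illustration of what "correlates with winning" could mean in practice, the sketch below computes the Pearson correlation between each numeric feature and `result` (with a binary target this is the point-biserial correlation). The rows are invented purely for illustration and are not taken from Oracle's Elixir, and `corr_with_result` is a hypothetical helper name rather than part of this project's code.

```python
import pandas as pd

# Toy team-level rows, invented for illustration only (not real match data).
toy = pd.DataFrame({
    'kills':   [9, 19, 3, 15, 7, 21],
    'deaths':  [19, 9, 15, 3, 21, 7],
    'dragons': [1, 3, 1, 2, 1, 4],
    'result':  [0, 1, 0, 1, 0, 1],
})

def corr_with_result(df, target='result'):
    """Pearson correlation of every other numeric column with the binary target."""
    features = df.drop(columns=[target]).select_dtypes('number')
    return features.corrwith(df[target]).sort_values(ascending=False)

corrs = corr_with_result(toy)
print(corrs)
```

On the real data, the same call could be applied to the cleaned team-level DataFrame, keeping in mind that a correlation describes association only and does not by itself show that a statistic causes wins.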


## Step 2: Data Cleaning and Exploratory Data Analysis

### **Data Cleaning**
To begin, we filter the dataset to keep only the relevant columns, along with `gamelength` (the match duration). Since we model at the team level, we keep only the rows where `position == "team"`, which hold the aggregated team summaries, and drop the individual player rows. This leaves two rows per match, one for each team.

In [20]:
df = pd.read_csv('data/data.csv', low_memory=False)
# Keep only the columns relevant to this analysis (plus gamelength)
df = df[['gameid', 'league', 'position', 'kills', 'assists', 'deaths', 'damagetochampions', 'cspm', 'monsterkills', 'minionkills', 'firstbaron', 'dragons', 'gamelength', 'result']]
# Keep only the aggregated team-level rows (one per team per game)
df = df[df['position'] == 'team'].reset_index(drop=True)
df

| | gameid | league | position | kills | ... | firstbaron | dragons | gamelength | result |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ESPORTSTMNT01_2690210 | LCKC | team | 9 | ... | 0.0 | 1.0 | 1713 | 0 |
| 1 | ESPORTSTMNT01_2690210 | LCKC | team | 19 | ... | 0.0 | 3.0 | 1713 | 1 |
| 2 | ESPORTSTMNT01_2690219 | LCKC | team | 3 | ... | 0.0 | 1.0 | 2114 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25095 | 9687-9687_game_4 | DCup | team | 7 | ... | NaN | 1.0 | 2111 | 0 |
| 25096 | 9687-9687_game_5 | DCup | team | 8 | ... | NaN | 0.0 | 1778 | 0 |
| 25097 | 9687-9687_game_5 | DCup | team | 21 | ... | NaN | 4.0 | 1778 | 1 |
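
One way to sanity-check the team-level table is to confirm that every `gameid` contributes exactly two rows, one win and one loss. The sketch below shows the idea on a handful of invented rows; the game IDs are made up, and `check_team_rows` is a hypothetical helper rather than part of this project's code.

```python
import pandas as pd

# Invented rows mimicking the team-level layout (not real match data).
# Game 'g3' is deliberately incomplete so the check has something to flag.
toy = pd.DataFrame({
    'gameid':   ['g1', 'g1', 'g2', 'g2', 'g3'],
    'position': ['team'] * 5,
    'result':   [0, 1, 1, 0, 1],
})

def check_team_rows(df):
    """Return the gameids that do not have exactly two rows with exactly one win."""
    per_game = df.groupby('gameid')['result'].agg(['size', 'sum'])
    bad = per_game[(per_game['size'] != 2) | (per_game['sum'] != 1)]
    return bad.index.tolist()

bad_games = check_team_rows(toy)
print(bad_games)
```

An empty list from `check_team_rows(df)` on the real data would suggest that the `position == 'team'` filter left the expected two-rows-per-match structure intact.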


## Step 3: Assessment of Missingness

In [4]:
# TODO

## Step 4: Hypothesis Testing

In [5]:
# TODO

## Step 5: Framing a Prediction Problem

In [6]:
# TODO

## Step 6: Baseline Model

In [7]:
# TODO

## Step 7: Final Model

In [8]:
# TODO

## Step 8: Fairness Analysis

In [9]:
# TODO