# League of Legends: Does the Ban Phase Matter?

**Name(s)**: Landon Marchetti and Lyn Mansfield

**Website Link**: (your website link)

In [9]:
import pandas as pd
import numpy as np
from pathlib import Path
import os
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'iframe'

pd.options.plotting.backend = 'plotly'
from dsc80_utils import * # Feel free to uncomment and use this.
from data_cleaner import *

## Step 1: Introduction

Provide an introduction to your dataset, and clearly state the one question your project is centered around. Why should readers of your website care about the dataset and your question specifically? Report the number of rows in the dataset, the names of the columns that are relevant to your question, and descriptions of those relevant columns.

Which champions are the most meta-defining? One measure: the difference in win rate between games where a champion is banned and games where it is not.

Which characters are the most banned? Which characters are the best?

Which character archetype's performance matters the most? Which archetype gets banned the most?

Central Question: "How do character bans and archetypes influence competitive performance and define the meta in League of Legends?"

Paragraph 1 (General intro): League of Legends is one of the most famous multiplayer online battle arena (MOBA) games, developed by Riot Games and released on October 27, 2009. With close to 120 million monthly active users and 40 million daily users worldwide, it has become one of the most dominant games in the MOBA genre. We'll be working with a dataset compiled by Oracle’s Elixir, which contains match data from professional League of Legends esports games played in 2022. It is filled with detailed statistics from real esports games, giving us insight into how players perform, how teams operate, and what affects the outcome of matches.



Paragraph 2 (question intro): One of the main aspects of professional League of Legends play is the initial ban phase, in which both teams select champions to remove from the game's pool, preventing either side from picking them. The teams alternate bans, and each team bans five champions, for a total of 10 bans per match. Teams use the ban phase to keep the enemy team from picking strong champions or counter-picks.

The central question we will be looking at further is: How do character bans and archetypes influence competitive performance and define the meta in League of Legends? This question aims to explore how the types of champions that are banned or picked—especially based on their archetypes—affect individual and team performance in professional League of Legends matches.


Paragraph 3 (column intro):
The Oracle’s Elixir dataset provides an extensive number of columns and 60180 rows. Each match contributes a group of 12 rows: 10 player rows (5 per team) followed by 2 rows of team-level stats, one per team. The columns most relevant to our central question are the following:

- gameid: A code that uniquely identifies each match played.
- damagemitigatedperminute: The average amount of damage per minute that a player or team reduced or blocked through defensive abilities, shields, or other mechanisms.
- playerid: A code that uniquely identifies each player who has participated in one or more matches.
- teamid: A code that uniquely identifies each team that has participated in one or more matches.
- champion: A champion is a playable character that a player controls during the game. There are 170 champions with unique abilities, stats, and roles.
- ban1: The first champion the team banned from the match.
- ban2: The second champion the team banned from the match.
- ban3: The third champion the team banned from the match.
- ban4: The fourth champion the team banned from the match.
- ban5: The fifth champion the team banned from the match.
- result: Either Won or Lost; indicates whether the individual player or team won the match.
- kills: The number of enemy champions the player, or the team collectively, defeated in the match.
- deaths: The number of times the player's champion, or the team's champions collectively, were defeated by the enemy team in the match.
- assists: The number of times the player, or the team collectively, contributed to defeating an enemy champion without getting the kill themselves.
- dpm: The average damage per minute that a player, or the team collectively, dealt to enemy champions.
- earnedgold: The amount of gold a champion earned, for example by killing or assisting in killing enemy champions. Champions that have killed many enemies without dying are worth more gold.
- position: The archetype (role) of the champion the player has chosen: Top, Jungle, Mid, Bot (ADC), or Support.
- damageshare: The percentage of the team's total damage to enemy champions that a champion dealt, within a team fight or over the course of a game.
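
As a rough illustration of how the ban1 through ban5 columns could be used to find the most banned champions, the sketch below reshapes them into one row per ban and counts bans per champion. The small DataFrame is made-up toy data, not rows from the Oracle's Elixir file, and the names `toy_teams`, `BAN_COLS`, and `count_bans` are hypothetical. It assumes one row per team per game; if the ban columns are also populated on player rows in the real data, those rows would need to be filtered out first so each ban is counted once.

```python
import pandas as pd

BAN_COLS = ["ban1", "ban2", "ban3", "ban4", "ban5"]

# Made-up team-level rows for illustration only (not real match data).
toy_teams = pd.DataFrame(
    {
        "gameid": ["g1", "g1", "g2", "g2"],
        "ban1": ["Zeri", "Yuumi", "Zeri", "Lucian"],
        "ban2": ["Lucian", "Kalista", "Yuumi", "Kalista"],
        "ban3": ["Ahri", "Sylas", "Ahri", "Gnar"],
        "ban4": ["Gwen", "Vi", "Sejuani", "Jax"],
        "ban5": ["Nami", "Azir", "Karma", "Leona"],
    }
)


def count_bans(team_rows: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count how many times each champion was banned."""
    # Wide -> long: one row per (game, ban slot) pair.
    long = team_rows.melt(
        id_vars=["gameid"],
        value_vars=BAN_COLS,
        var_name="ban_slot",
        value_name="banned_champion",
    )
    # Skipped bans show up as missing values and are dropped.
    return long["banned_champion"].dropna().value_counts()


ban_counts = count_bans(toy_teams)
# Share of games in which each champion was banned.
ban_rate = ban_counts / toy_teams["gameid"].nunique()
print(ban_rate.head())
```

Dividing by the number of distinct games gives a ban rate per champion, which is easier to compare across leagues or patches than raw counts.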


## Step 2: Data Cleaning and Exploratory Data Analysis

In [10]:
dataset_path = Path("LOL.csv")
# /data/2025_LoL_clean_stats.csv
df = pd.read_csv(dataset_path, dtype={2: str}) 
df = clean_lol_data(df)[0]

## Bivariate Analysis

## Step 3: Assessment of Missingness

In [14]:
df['missing'] = df['teamid'].isna()

actual_missing = df[df['missing'] == True]['league'].value_counts(normalize=True)
actual_present = df[df['missing'] == False]['league'].value_counts(normalize=True)


combined = pd.concat([actual_missing, actual_present], axis=1).fillna(0)
combined.columns = ['missing', 'present']

observed = 0.5 * np.sum(np.abs(combined['missing'] - combined['present']))

n_permutations = 1000
tvd_stats = []

for _ in range(n_permutations):
    # Shuffle the missingness labels to simulate the null hypothesis that
    # missingness of teamid is unrelated to league.
    shuffled = np.random.permutation(df['missing'])

    # Index with the shuffled boolean mask instead of copying df each time.
    dist_missing = df.loc[shuffled, 'league'].value_counts(normalize=True)
    dist_present = df.loc[~shuffled, 'league'].value_counts(normalize=True)

    # Align both distributions; leagues absent from one group get 0.
    comb = pd.concat([dist_missing, dist_present], axis=1).fillna(0)
    comb.columns = ['missing', 'present']
    stat = 0.5 * np.sum(np.abs(comb['missing'] - comb['present']))
    tvd_stats.append(stat)


p_val = np.mean(np.array(tvd_stats) >= observed)
p_val

np.float64(0.0)
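
The permutation test above can be wrapped in a small reusable helper so the same logic applies to any categorical column. The sketch below is one possible way to do that, run on made-up toy data rather than the real dataset. The function names `tvd` and `permutation_test_tvd` are hypothetical, and the fixed `seed` is an added assumption for reproducibility that the original cell does not use.

```python
import numpy as np
import pandas as pd


def tvd(labels, is_missing) -> float:
    """Total variation distance between the label distributions of the
    missing group and the present group."""
    labels = np.asarray(labels)
    is_missing = np.asarray(is_missing, dtype=bool)
    p = pd.Series(labels[is_missing]).value_counts(normalize=True)
    q = pd.Series(labels[~is_missing]).value_counts(normalize=True)
    # Align on the union of categories; a category absent from a group gets 0.
    p, q = p.align(q, fill_value=0)
    return 0.5 * float(np.abs(p - q).sum())


def permutation_test_tvd(labels, is_missing, n_permutations=1000, seed=0):
    """Return (observed TVD, p-value) under shuffled missingness labels."""
    rng = np.random.default_rng(seed)  # seed is an assumption, for repeatability
    is_missing = np.asarray(is_missing, dtype=bool)
    observed = tvd(labels, is_missing)
    simulated = np.array(
        [tvd(labels, rng.permutation(is_missing)) for _ in range(n_permutations)]
    )
    # Fraction of shuffles at least as extreme as the observed statistic.
    return observed, float(np.mean(simulated >= observed))


# Made-up toy data: teamid is missing only in one league.
toy = pd.DataFrame(
    {
        "league": ["LPL"] * 50 + ["LCK"] * 50,
        "teamid": [None] * 40 + ["t"] * 60,
    }
)
observed_toy, p_toy = permutation_test_tvd(
    toy["league"], toy["teamid"].isna(), n_permutations=500
)
print(observed_toy, p_toy)
```

On the real data, a call such as `permutation_test_tvd(df['league'], df['teamid'].isna())` would play the role of the cell above, with the p-value varying slightly from run to run unless a seed is fixed.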

In [17]:
combined.plot(kind='barh', title='League by Missingness of teamid', barmode='group')

## Missingness Dependency: Not Dependent

In [15]:
df['missing'] = df['teamid'].isna()

actual_missing_2 = df[df['missing'] == True]['side'].value_counts(normalize=True)
actual_present_2 = df[df['missing'] == False]['side'].value_counts(normalize=True)


combined_2 = pd.concat([actual_missing_2, actual_present_2], axis=1).fillna(0)
combined_2.columns = ['missing', 'present']

observed_2 = 0.5 * np.sum(np.abs(combined_2['missing'] - combined_2['present']))

n_permutations = 1000
tvd_stats_2 = []

for _ in range(n_permutations):
    # Shuffle the missingness labels to simulate the null hypothesis that
    # missingness of teamid is unrelated to side.
    shuffled_2 = np.random.permutation(df['missing'])

    # Index with the shuffled boolean mask instead of copying df each time.
    dist_missing_2 = df.loc[shuffled_2, 'side'].value_counts(normalize=True)
    dist_present_2 = df.loc[~shuffled_2, 'side'].value_counts(normalize=True)

    # Align both distributions; sides absent from one group get 0.
    comb_2 = pd.concat([dist_missing_2, dist_present_2], axis=1).fillna(0)
    comb_2.columns = ['missing', 'present']
    stat_2 = 0.5 * np.sum(np.abs(comb_2['missing'] - comb_2['present']))
    tvd_stats_2.append(stat_2)


p_val_2 = np.mean(np.array(tvd_stats_2) >= observed_2)
p_val_2

np.float64(0.13)

In [18]:
combined_2.plot(kind='barh', title='Side by Missingness of teamid', barmode='group')

## Step 4: Hypothesis Testing

In [None]:
# TODO

## Step 5: Framing a Prediction Problem

In [None]:
# will start tomorrow

## Step 6: Baseline Model

In [None]:
# TODO

## Step 7: Final Model

In [None]:
# TODO

## Step 8: Fairness Analysis

In [None]:
# TODO