# League of Legends: Exploring the Relationship Between Kills and Win Rate

## Introduction to League of Legends

Since its release in 2009 by Riot Games, League of Legends has become one of the most popular multiplayer online battle arena (MOBA) games. Two teams of five players compete to destroy the opposing team's Nexus, the main structure in each base. Each player controls a Champion with its own distinct abilities and playstyle, which adds depth and strategic complexity to the game. League of Legends is more than just a game: it has also been a driving force behind the rise of competitive esports.

Esports has revolutionized the world of gaming, transforming it into a realm of professional competition with massive prize pools and devoted fan bases. League of Legends has been at the forefront of this esports revolution, attracting millions of viewers worldwide and offering astonishing prize pools for its tournaments. The League of Legends World Championship, for example, boasted a staggering prize pool of $2,225,000 USD in 2022, solidifying the game's position as a cornerstone of the esports industry.

## Goal of the Project

Our project's goal is to discover meaningful relationships that can help us predict match outcomes in League of Legends. Specifically, we aim to answer the question: "What is the connection between a player's average number of kills per game and their corresponding win rate?"
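One simple way to quantify such a connection is a Pearson correlation coefficient. The snippet below is a minimal sketch on invented toy values, not results from the dataset; only the column names `kills` and `win_rate` come from the data, and the choice of Pearson correlation is an assumption about how the question might be approached.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the real dataset.
toy = pd.DataFrame(
    {
        "kills": [1.0, 2.0, 3.0, 4.0, 5.0],
        "win_rate": [20.0, 40.0, 50.0, 70.0, 80.0],
    }
)

# Pearson correlation between average kills per game and win rate.
# A value near +1 means the two rise together; near 0 means no linear link.
r = toy["kills"].corr(toy["win_rate"])
print(f"Pearson r = {r:.3f}")
```

A correlation only describes a linear association; it does not show that more kills cause more wins.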

## Data Preparation

The data we will be using for this project was compiled by Pedro Cesar on Kaggle and scraped from https://lol.fandom.com/. It contains historical match data, historical player data, and historical champion data from the 2011 Worlds to the 2022 Worlds. For this project we will be using the historical player data.

In [8]:
df = pd.read_csv(r"E:\ModelDiversity\data\players_stats.csv")  # raw string keeps the backslashes literal

In [10]:
df.shape

(1283, 21)

## Variables Dictionary

We have 21 different variables in this dataset which are defined below:

`season` - the championship season; 1 = 2011, the first Worlds
`event` - Main stage event or play-in
`team` - team name
`player` - player name
`games_played` - number of games played in the tournament season
`wins` - number of wins throughout the tournament season
`loses` - number of losses throughout the tournament season
`win_rate` - share of games won out of all games played throughout the tournament
`kills` - average number of kills per game throughout the tournament
`deaths` - average number of deaths per game throughout the tournament
`assists` - average number of assists per game throughout the tournament
`kill_death_assist_ratio` - KDA: (kills + assists) divided by deaths
`creep_score` - average number of minions killed per game throughout the tournament
`cs/min` - average creep score per minute per game throughout the tournament
`gold` - average gold earned per game throughout the tournament
`gold/min` - average gold earned per minute per game throughout the tournament
`damage` - average damage done per game throughout the tournament
`damage/min` - average damage done per minute per game throughout the tournament
`kill_participation` - average (kills + assists) / total team kills per game throughout the tournament
`kill_share` - average kills / total team kills per game throughout the tournament
`gold_share` - average gold / total team gold per game throughout the tournament
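To make the ratio definitions above concrete, here is a small worked example. The player and team numbers are hypothetical, and whether the dataset stores these ratios as fractions or as percentages is not confirmed here; the sketch uses fractions.

```python
import pandas as pd

# Hypothetical per-game averages for one player; not taken from the dataset.
player = pd.Series({"kills": 3.0, "deaths": 2.0, "assists": 6.0})
team_kills = 12.0  # hypothetical average team kills per game

# KDA: (kills + assists) / deaths. A player with 0 deaths would need special handling.
kda = (player["kills"] + player["assists"]) / player["deaths"]

# Kill participation: share of team kills the player was involved in.
kill_participation = (player["kills"] + player["assists"]) / team_kills

# Kill share: share of team kills the player scored personally.
kill_share = player["kills"] / team_kills

print(kda, kill_participation, kill_share)
```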

## Packages

This project will utilize the following packages for data analysis, visualization, and model creation:

1. pandas - data wrangling
2. numpy - filling in missing data
3. seaborn - data visualization

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns

## Missing Data and Outliers

We can use pandas to get a summary of the non-null counts and data types of each column using the `.info()` method.

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1283 entries, 0 to 1282
Data columns (total 21 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   season                   1283 non-null   int64  
 1   event                    1283 non-null   object 
 2   team                     1283 non-null   object 
 3   player                   1283 non-null   object 
 4   games_played             1283 non-null   int64  
 5   wins                     1283 non-null   int64  
 6   loses                    1283 non-null   int64  
 7   win_rate                 1283 non-null   float64
 8   kills                    1283 non-null   float64
 9   deaths                   1283 non-null   float64
 10  assists                  1283 non-null   float64
 11  kill_death_assist_ratio  1283 non-null   float64
 12  creep_score              1283 non-null   float64
 13  cs/min                   1283 non-null   float64
 14  gold                    

We can get the percentage of missing values in each column using `isnull()` and `sum()`.

In [15]:
(df.isnull().sum() / df.shape[0]).sort_values(ascending=False) * 100

damage/min                 68.12159
damage                     68.12159
gold_share                 13.40608
kill_share                 13.40608
kill_participation         13.40608
kill_death_assist_ratio     0.00000
gold/min                    0.00000
gold                        0.00000
cs/min                      0.00000
creep_score                 0.00000
season                      0.00000
event                       0.00000
deaths                      0.00000
kills                       0.00000
win_rate                    0.00000
loses                       0.00000
wins                        0.00000
games_played                0.00000
player                      0.00000
team                        0.00000
assists                     0.00000
dtype: float64
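The same percentages can also be obtained with `isna().mean()`, since the mean of a boolean column is the fraction of `True` values. The sketch below uses a toy frame; the column names mirror the dataset, but the values are invented.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration; the values are NOT from the real dataset.
toy = pd.DataFrame(
    {
        "kills": [2.0, 3.5, 1.0, 4.0],
        "damage": [np.nan, np.nan, np.nan, 15000.0],
        "kill_share": [np.nan, 0.2, 0.1, 0.3],
    }
)

# isna() yields booleans; their column-wise mean is the fraction missing.
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```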

In this dataset, `kill_participation`, `kill_share`, and `gold_share` were not recorded until the 4th Worlds tournament in 2014, so 13.4% of the values in these three columns are missing. We can calculate each of these parameters using numpy.
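One possible way to fill these gaps is sketched below with a pandas `groupby` rather than plain numpy. It rests on a strong assumption: that team kills per game can be approximated by summing the per-game kill averages of the players listed for a team in a given season and event. That approximation breaks down when a roster includes substitutes who played only some games. The toy roster is invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy roster for one team in one tournament; values are invented.
toy = pd.DataFrame(
    {
        "season": [1] * 5,
        "event": ["Main"] * 5,
        "team": ["Team A"] * 5,
        "player": ["top", "jungle", "mid", "adc", "support"],
        "kills": [2.0, 3.0, 4.0, 5.0, 1.0],
        "assists": [5.0, 8.0, 6.0, 5.0, 10.0],
        "kill_share": [np.nan] * 5,
        "kill_participation": [np.nan] * 5,
    }
)

# ASSUMPTION: team kills per game ~ sum of the listed players' averages.
keys = ["season", "event", "team"]
team_kills = toy.groupby(keys)["kills"].transform("sum")

# Fill only the rows where the value is missing; existing values are kept.
toy["kill_share"] = toy["kill_share"].fillna(toy["kills"] / team_kills)
toy["kill_participation"] = toy["kill_participation"].fillna(
    (toy["kills"] + toy["assists"]) / team_kills
)
print(toy[["player", "kill_share", "kill_participation"]])
```

Because `fillna` touches only missing entries, values already recorded from 2014 onward would stay as they are. `gold_share` could be approached the same way with the `gold` column.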

`damage` and `damage/min` are a different story. These two variables were not tracked until the 10th Worlds tournament in 2020, so 68.1% of their values are missing from our current dataset. We may be able to fill them in from other data sources.

> Note: This is not necessary for our current research question, but it will be once we look at other models such as multiple linear regression.
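A quick way to check when a column started being tracked is to compute its missing rate per year. The sketch below uses invented toy rows and relies on the variable dictionary's convention that season 1 is 2011, so the year is `2010 + season`; the `year` column is a helper introduced here, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration; the values are NOT from the real dataset.
toy = pd.DataFrame(
    {
        "season": [9, 9, 10, 10],
        "damage": [np.nan, np.nan, 14000.0, 18000.0],
    }
)

# Season 1 is 2011, so the calendar year is 2010 + season.
toy["year"] = toy["season"] + 2010

# Percentage of missing damage values in each year.
missing_by_year = toy["damage"].isna().groupby(toy["year"]).mean().mul(100)
print(missing_by_year)
```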
