# Identifying Win or Loss of League of Legends Games at the 10 Minute Mark

League of Legends is an extremely competitive game and currently dominates the esports scene, bringing in 1.2 million peak viewers, more than double the 516 thousand peak viewers of the second most viewed esports game, Minecraft. "High-elo" games are those in which the best players compete to further increase their rank. As competition intensifies toward the top of the ranked ladder, games are won or lost at a much faster pace: games at the highest rank, Challenger, average 4 minutes shorter than games at the lowest rank. One could therefore posit that the opening moves of high-elo games matter more to the outcome than those of lower-ranked games. This analysis aims to answer how these teams build a large lead so early by fitting a highly interpretable model that classifies the win or loss of a game from the game stats at the 10 minute mark, which is 19 minutes before the average game ends at the highest elo.

### Data Preprocessing

In [46]:
# Imports
import opendatasets as od
import pandas as pd
import numpy as np

In [4]:
# Download dataset
od.download("https://www.kaggle.com/datasets/bobbyscience/league-of-legends-diamond-ranked-games-10-min")

Downloading league-of-legends-diamond-ranked-games-10-min.zip to .\league-of-legends-diamond-ranked-games-10-min


100%|███████████████████████████████████████████████████████████████████████████████| 539k/539k [00:00<00:00, 8.92MB/s]

In [25]:
# Read csv file into pandas dataframe
data = pd.read_csv('league-of-legends-diamond-ranked-games-10-min/high_diamond_ranked_10min.csv')

In [26]:
# Examine the head of the data file
data.head(10)

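| | gameId | blueWins | blueWardsPlaced | blueWardsDestroyed | blueFirstBlood | blueKills | blueDeaths | blueAssists | blueEliteMonsters | blueDragons | ... | redTowersDestroyed | redTotalGold | redAvgLevel | redTotalExperience | redTotalMinionsKilled | redTotalJungleMinionsKilled | redGoldDiff | redExperienceDiff | redCSPerMin | redGoldPerMin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4519157822 | 0 | 28 | 2 | 1 | 9 | 6 | 11 | 0 | 0 | ... | 0 | 16567 | 6.8 | 17047 | 197 | 55 | -643 | 8 | 19.7 | 1656.7 |
| 1 | 4523371949 | 0 | 12 | 1 | 0 | 5 | 5 | 5 | 0 | 0 | ... | 1 | 17620 | 6.8 | 17438 | 240 | 52 | 2908 | 1173 | 24.0 | 1762.0 |
| 2 | 4521474530 | 0 | 15 | 0 | 0 | 7 | 11 | 4 | 1 | 1 | ... | 0 | 17285 | 6.8 | 17254 | 203 | 28 | 1172 | 1033 | 20.3 | 1728.5 |
| 3 | 4524384067 | 0 | 43 | 1 | 0 | 4 | 5 | 5 | 1 | 0 | ... | 0 | 16478 | 7.0 | 17961 | 235 | 47 | 1321 | 7 | 23.5 | 1647.8 |
| 4 | 4436033771 | 0 | 75 | 4 | 0 | 6 | 6 | 6 | 0 | 0 | ... | 0 | 17404 | 7.0 | 18313 | 225 | 67 | 1004 | -230 | 22.5 | 1740.4 |
| 5 | 4475365709 | 1 | 18 | 0 | 0 | 5 | 3 | 6 | 1 | 1 | ... | 0 | 15201 | 7.0 | 18060 | 221 | 59 | -698 | -101 | 22.1 | 1520.1 |
| 6 | 4493010632 | 1 | 18 | 3 | 1 | 7 | 6 | 7 | 1 | 1 | ... | 0 | 14463 | 6.4 | 15404 | 164 | 35 | -2411 | -1563 | 16.4 | 1446.3 |
| 7 | 4496759358 | 0 | 16 | 2 | 0 | 5 | 13 | 3 | 0 | 0 | ... | 0 | 17920 | 6.6 | 16938 | 157 | 54 | 2615 | 800 | 15.7 | 1792.0 |
| 8 | 4443048030 | 0 | 16 | 3 | 0 | 7 | 7 | 8 | 0 | 0 | ... | 0 | 18380 | 7.2 | 19298 | 240 | 53 | 1979 | 771 | 24.0 | 1838.0 |
| 9 | 4509433346 | 1 | 13 | 1 | 1 | 4 | 5 | 5 | 1 | 1 | ... | 0 | 16605 | 6.8 | 18379 | 247 | 43 | 1548 | 1574 | 24.7 | 1660.5 |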


In [27]:
# Make sure all rows are unique games by their id
data.loc[:,'gameId'].unique().size == data.shape[0]

True

In [22]:
# View all columns
print(data.columns)

Index(['gameId', 'blueWins', 'blueWardsPlaced', 'blueWardsDestroyed',
       'blueFirstBlood', 'blueKills', 'blueDeaths', 'blueAssists',
       'blueEliteMonsters', 'blueDragons', 'blueHeralds',
       'blueTowersDestroyed', 'blueTotalGold', 'blueAvgLevel',
       'blueTotalExperience', 'blueTotalMinionsKilled',
       'blueTotalJungleMinionsKilled', 'blueGoldDiff', 'blueExperienceDiff',
       'blueCSPerMin', 'blueGoldPerMin', 'redWardsPlaced', 'redWardsDestroyed',
       'redFirstBlood', 'redKills', 'redDeaths', 'redAssists',
       'redEliteMonsters', 'redDragons', 'redHeralds', 'redTowersDestroyed',
       'redTotalGold', 'redAvgLevel', 'redTotalExperience',
       'redTotalMinionsKilled', 'redTotalJungleMinionsKilled', 'redGoldDiff',
       'redExperienceDiff', 'redCSPerMin', 'redGoldPerMin'],
      dtype='object')


Here I remove some columns when splitting into features and target:
1. **redGoldDiff** because it is -1 * 'blueGoldDiff'
2. **redGoldPerMin** because it is 'redTotalGold' / 10
3. **blueGoldPerMin** because it is 'blueTotalGold' / 10
4. **blueAvgLevel** because it is derived from the same experience that 'blueTotalExperience' measures
5. **redAvgLevel** because it is derived from the same experience that 'redTotalExperience' measures
6. **gameId** because it is a unique identifier for each game and carries no predictive information
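
The exact identities claimed in the list (the gold difference and gold-per-minute columns) can be checked before anything is dropped. The following is a minimal sketch, not part of the original notebook. It assumes a small toy frame whose red-side values are taken from the first three preview rows and whose blue totals are hypothetical, back-computed so that the identities hold. The helper name `redundancy_report` is also hypothetical. On the real data one would pass `data` instead of `toy`.

```python
import numpy as np
import pandas as pd

# Toy frame: red values come from the first three preview rows; the blue
# totals are hypothetical and were back-computed for illustration only.
toy = pd.DataFrame({
    "blueTotalGold": [17210, 14712, 16113],
    "redTotalGold": [16567, 17620, 17285],
    "blueGoldDiff": [643, -2908, -1172],
    "redGoldDiff": [-643, 2908, 1172],
    "blueGoldPerMin": [1721.0, 1471.2, 1611.3],
    "redGoldPerMin": [1656.7, 1762.0, 1728.5],
})


def redundancy_report(df):
    """Map each claimed identity to True if it holds on every row of df."""
    return {
        "redGoldDiff == -blueGoldDiff":
            bool((df["redGoldDiff"] == -df["blueGoldDiff"]).all()),
        "redGoldPerMin == redTotalGold / 10":
            bool(np.allclose(df["redGoldPerMin"], df["redTotalGold"] / 10)),
        "blueGoldPerMin == blueTotalGold / 10":
            bool(np.allclose(df["blueGoldPerMin"], df["blueTotalGold"] / 10)),
    }


report = redundancy_report(toy)
print(report)
```

The per-minute comparisons use `np.allclose` rather than `==` because the divided values are floats. The average-level columns are left out of the check since they are not a simple ratio of another column.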

In [35]:
# Split data into features and target
drop_cols = ['redGoldDiff', 'redGoldPerMin', 'blueGoldPerMin',
             'blueAvgLevel', 'redAvgLevel', 'gameId', 'blueWins']
feature_cols = [x for x in data.columns if x not in drop_cols]
X = data.loc[:, feature_cols]
y = data.loc[:, 'blueWins']

In [44]:
# Check that all data types are numerical
obj_cols = [x for x in X.dtypes if x == object]
print(len(obj_cols))
print(y.dtypes == object)

0
False
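
Comparing each dtype against `object` catches string columns stored as objects, but it would not flag other non-numeric dtypes such as datetimes or categoricals. A slightly stricter check is sketched below using `pandas.api.types.is_numeric_dtype`. The helper name `non_numeric_columns` and the toy frame (including its `note` column) are hypothetical and only serve to illustrate the idea. On the real data one would call it on `X`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(df):
    """Return the names of the columns in df whose dtype is not numeric."""
    return [col for col in df.columns if not is_numeric_dtype(df[col])]


# Hypothetical toy frame: one numeric column and one string column.
toy = pd.DataFrame({"blueKills": [9, 5, 7], "note": ["a", "b", "c"]})
print(non_numeric_columns(toy))
```

An empty list from this helper means every column can be passed to a numeric model without further encoding.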


In [52]:
# Check how many target classes there are (should be just two: 1 for a blue win, 0 for a red win)
y.nunique()

2

In [54]:
# Check for balance of the classes
y.value_counts()

blueWins
0    4949
1    4930
Name: count, dtype: int64
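
To make the balance concrete, the counts can be turned into class proportions and a majority-class baseline, which is the accuracy a model would reach by always predicting the more common class. The sketch below is illustrative and not part of the original notebook. It hard-codes the two counts printed above instead of reading them from `y`, and the variable names are assumptions.

```python
# Class counts copied from the value_counts() output (0 = red win, 1 = blue win).
counts = {0: 4949, 1: 4930}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a classifier that always predicts the most common class.
majority_baseline = max(proportions.values())

print(total)
print({label: round(p, 4) for label, p in proportions.items()})
print(round(majority_baseline, 4))
```

With both classes this close to one half, plain accuracy is a reasonable metric and no resampling should be needed. Any model worth keeping has to beat the baseline by a clear margin.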