# **Predicting Match Outcomes in League of Legends Based on Team Performance and Player Statistics**

## **Introduction**

_League of Legends_ is a popular multiplayer online battle arena (MOBA) game developed and published by Riot Games, played by well over 30 million players every day on average (see References). As a result, there is a wealth of game data available, both for gaining insight into trends seen in League of Legends matches and for building a model that predicts the outcome of a game from match and player statistics.

In this analysis, I will begin by cleaning and pre-processing a League of Legends match dataset, after which I will perform an exploratory data analysis on the data, visualizing some key trends. Finally, I will use a CatBoost Classifier to predict match outcomes using the data available.

## **Data Description**

_License:_ **_CC BY 4.0 (Creative Commons Attribution 4.0 International)_**. \
This license allows sharing, modification, and use of the dataset as long as proper attribution is given.

The data that will be used is the `League of Legends Match Dataset (2025)` by user Jacob Krasuski from Kaggle.com: https://www.kaggle.com/datasets/jakubkrasuski/league-of-legends-match-dataset-2025

The dataset consists of 40,412 rows and 94 columns. Each row represents one player in one match, and each column describes an aspect of the match or of that player's performance in it. All of the data was taken from matches on the Europe Nordic and East server.

## **Data Cleaning**

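This section removes unnecessary columns, corrects column data types, renames `MonkeyKing` to `Wukong` and `UTILITY` to `SUPPORT`, splits the data into Summoner's Rift, ARAM, and Swiftplay subsets, and handles missing values.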

In [44]:
# importing relevant libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [45]:
# loading in the dataset

matches = pd.read_csv("league_data.csv", dtype={17: 'str'})

In [46]:
matches.head(5)

| | game_id | game_start_utc | game_duration | game_mode | game_type | game_version | map_id | platform_id | queue_id | participant_id | ... | final_magicPen | final_magicPenPercent | final_magicResist | final_movementSpeed | final_omnivamp | final_physicalVamp | final_power | final_powerMax | final_powerRegen | final_spellVamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3727443000.0 | 2025-01-15 14:56:00 | 1714.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 5.0 | ... | 0.0 | 0.0 | 48.0 | 385.0 | 0.0 | 0.0 | 799.0 | 1134.0 | 147.0 | 0.0 |
| 1 | 3726377000.0 | 2025-01-13 10:50:00 | 1300.0 | CLASSIC | MATCHED_GAME | 15.1.648.3927 | 11.0 | EUN1 | 420.0 | 5.0 | ... | 0.0 | 0.0 | 38.0 | 390.0 | 0.0 | 0.0 | 970.0 | 970.0 | 105.0 | 0.0 |
| 2 | 3729644000.0 | 2025-01-19 18:15:00 | 2019.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 2.0 | ... | 0.0 | 0.0 | 121.0 | 431.0 | 0.0 | 0.0 | 10000.0 | 10000.0 | 0.0 | 0.0 |
| 3 | 3729916000.0 | 2025-01-20 01:27:00 | 1625.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 8.0 | ... | 12.0 | 0.0 | 47.0 | 380.0 | 0.0 | 0.0 | 1122.0 | 1596.0 | 37.0 | 0.0 |
| 4 | 3729902000.0 | 2025-01-20 00:40:00 | 1542.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 10.0 | ... | 0.0 | 0.0 | 40.0 | 534.0 | 0.0 | 0.0 | 1025.0 | 1025.0 | 109.0 | 0.0 |


#### **1) Removing Unnecessary Columns**

In [47]:
filter_cols = ['game_id', 'game_start_utc', 'game_duration', 'game_mode', 'game_version', 'summoner_id',
               'summoner_level', 'champion_name', 'team_id', 'individual_position', 'kills', 'deaths', 
               'assists', 'gold_earned', 'gold_spent', 'total_damage_dealt', 'physical_damage_dealt_to_champions',
               'magic_damage_dealt_to_champions', 'true_damage_dealt_to_champions', 'damage_dealt_to_objectives', 
               'damage_dealt_to_turrets', 'physical_damage_taken', 'magic_damage_taken', 'true_damage_taken', 'time_ccing_others',
               'vision_score', 'wards_placed', 'wards_killed', 'solo_tier', 'solo_rank', 'solo_lp', 'solo_wins', 'solo_losses',
               'champion_mastery_level', 'champion_mastery_points', 'final_abilityPower', 'final_armor', 'final_attackDamage',
               'final_attackSpeed', 'final_healthMax', 'final_magicResist', 'final_movementSpeed', 'win']
matches_filtered = matches[filter_cols]
matches_filtered.head()

| | game_id | game_start_utc | game_duration | game_mode | game_version | summoner_id | summoner_level | champion_name | team_id | individual_position | ... | champion_mastery_level | champion_mastery_points | final_abilityPower | final_armor | final_attackDamage | final_attackSpeed | final_healthMax | final_magicResist | final_movementSpeed | win |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3727443000.0 | 2025-01-15 14:56:00 | 1714.0 | CLASSIC | 15.1.649.4112 | 8w6pbOajcaSi7ASzTLsuCmg8jCpAVUJN3uxW2FUzUTE3x6g | 2065.0 | Nami | 100.0 | UTILITY | ... | 77.0 | 889610.0 | 155.0 | 97.0 | 91.0 | 131.0 | 2138.0 | 48.0 | 385.0 | True |
| 1 | 3726377000.0 | 2025-01-13 10:50:00 | 1300.0 | CLASSIC | 15.1.648.3927 | 8w6pbOajcaSi7ASzTLsuCmg8jCpAVUJN3uxW2FUzUTE3x6g | 2064.0 | Lulu | 100.0 | UTILITY | ... | 69.0 | 852655.0 | 117.0 | 59.0 | 64.0 | 115.0 | 1680.0 | 38.0 | 390.0 | False |
| 2 | 3729644000.0 | 2025-01-19 18:15:00 | 2019.0 | CLASSIC | 15.1.649.4112 | CeGCyYCjlzI8yy-yLBJ20FOiO239D1m6M4F3XK6Y52RsgqI | 1851.0 | Viego | 100.0 | JUNGLE | ... | 148.0 | 1756910.0 | 0.0 | 125.0 | 275.0 | 204.0 | 2672.0 | 121.0 | 431.0 | False |
| 3 | 3729916000.0 | 2025-01-20 01:27:00 | 1625.0 | CLASSIC | 15.1.649.4112 | gL-HcTa7QMJYprHrrqums9w7wVj5P7vYq9uITGlIRMaOv2Y | 1796.0 | Malzahar | 200.0 | MIDDLE | ... | 6.0 | 35230.0 | 233.0 | 105.0 | 94.0 | 126.0 | 2369.0 | 47.0 | 380.0 | False |
| 4 | 3729902000.0 | 2025-01-20 00:40:00 | 1542.0 | CLASSIC | 15.1.649.4112 | gL-HcTa7QMJYprHrrqums9w7wVj5P7vYq9uITGlIRMaOv2Y | 1796.0 | Lulu | 200.0 | UTILITY | ... | 56.0 | 642297.0 | 66.0 | 63.0 | 67.0 | 117.0 | 1807.0 | 40.0 | 534.0 | False |
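
With a column list this long, a typo in one name only surfaces as a `KeyError` at selection time. The sketch below is one possible way to make the selection a little more defensive. The helper name `select_columns` and the toy frame are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

def select_columns(df, wanted):
    """Return a copy of df restricted to `wanted`, listing any unknown names."""
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # .copy() makes the result an independent frame, so later assignments
    # to it are unambiguous
    return df[wanted].copy()

# toy frame standing in for `matches` (hypothetical values)
toy = pd.DataFrame({"game_id": [1, 2], "kills": [3, 4], "final_power": [799, 970]})
subset = select_columns(toy, ["game_id", "kills"])
```

In the notebook this would be called as `select_columns(matches, filter_cols)`.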


#### **2) Changing Column Data Types**

Our next step is to ensure that each column has a suitable data type before we modify its values.

In [41]:
matches_filtered.dtypes

game_id                               float64
game_start_utc                         object
game_duration                         float64
game_mode                              object
game_version                           object
summoner_id                            object
summoner_level                        float64
champion_name                          object
team_id                               float64
individual_position                    object
kills                                 float64
deaths                                float64
assists                               float64
gold_earned                           float64
gold_spent                            float64
total_damage_dealt                    float64
physical_damage_dealt_to_champions    float64
magic_damage_dealt_to_champions       float64
true_damage_dealt_to_champions        float64
damage_dealt_to_objectives            float64
damage_dealt_to_turrets               float64
physical_damage_taken             

Most of the columns have a suitable data type; the only column that needs to change is `game_start_utc`, which should use a datetime type rather than the generic `object` type.

In [42]:
# work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
matches_filtered = matches_filtered.copy()
matches_filtered['game_start_utc'] = pd.to_datetime(matches_filtered['game_start_utc'], format='%Y-%m-%d %H:%M:%S', errors='coerce')


In [43]:
matches_filtered.dtypes

game_id                                      float64
game_start_utc                        datetime64[ns]
game_duration                                float64
game_mode                                     object
game_version                                  object
summoner_id                                   object
summoner_level                               float64
champion_name                                 object
team_id                                      float64
individual_position                           object
kills                                        float64
deaths                                       float64
assists                                      float64
gold_earned                                  float64
gold_spent                                   float64
total_damage_dealt                           float64
physical_damage_dealt_to_champions           float64
magic_damage_dealt_to_champions              float64
true_damage_dealt_to_champions               f
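
Because the conversion uses `errors='coerce'`, any timestamp that does not match the format silently becomes `NaT`. A minimal sketch of how the number of such failures could be counted is shown below. The sample strings are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# toy input: one valid timestamp, one malformed string, one missing value
raw = pd.Series(["2025-01-15 14:56:00", "not a date", None])
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# values that were present in the input but could not be parsed
newly_missing = parsed.isna() & raw.notna()
n_failed = int(newly_missing.sum())
```

If `n_failed` is greater than zero on the real column, the affected rows would later be removed by `dropna()`, so it may be worth inspecting them first.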

#### **3) Changing Naming Semantics**

The data uses some internal names from the Riot API: the champion Wukong appears as `MonkeyKing`, and the support role appears as `UTILITY`. Both are renamed here to the names players actually use.

In [48]:
champions = matches.groupby("champion_name").size().reset_index(name="count")
champions.sort_values(by = "count")

| | champion_name | count |
|---|---|---|
| 67 | Kled | 40 |
| 48 | Ivern | 49 |
| 111 | Rumble | 67 |
| 104 | Rammus | 71 |
| 95 | Olaf | 72 |
| ... | ... | ... |
| 53 | Jhin | 633 |
| 76 | Lux | 663 |
| 54 | Jinx | 678 |
| 82 | MissFortune | 713 |


In [49]:
matches_filtered.loc[:,'champion_name'] = matches_filtered['champion_name'].replace('MonkeyKing', 'Wukong')

In [50]:
matches_filtered['individual_position'].unique()

array(['UTILITY', 'JUNGLE', 'MIDDLE', 'Invalid', 'BOTTOM', 'TOP', nan],
      dtype=object)

In [51]:
matches_filtered.loc[:,'individual_position'] = matches_filtered['individual_position'].replace('UTILITY', 'SUPPORT')
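
If more internal names turn up later, the two renames could be kept in small lookup tables rather than in separate `replace` calls. The following is only a sketch: the dictionary names and the toy frame are assumptions, while the two mappings themselves are the ones applied above.

```python
import pandas as pd

# hypothetical lookup tables; extend them if other internal names appear
CHAMPION_ALIASES = {"MonkeyKing": "Wukong"}
POSITION_ALIASES = {"UTILITY": "SUPPORT"}

# toy frame standing in for `matches_filtered`
toy = pd.DataFrame({
    "champion_name": ["MonkeyKing", "Lulu", "Viego"],
    "individual_position": ["TOP", "UTILITY", "JUNGLE"],
})

# Series.replace with a dict swaps only the listed values and leaves the rest as-is
renamed = toy.assign(
    champion_name=toy["champion_name"].replace(CHAMPION_ALIASES),
    individual_position=toy["individual_position"].replace(POSITION_ALIASES),
)
```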

#### **4) Splitting Dataset by Game Mode**

In [52]:
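# .copy() gives each subset its own data, so later assignments to it are unambiguous
sr = matches_filtered[matches_filtered['game_mode'] == 'CLASSIC'].copy()
aram = matches_filtered[matches_filtered['game_mode'] == 'ARAM'].copy()
swiftplay = matches_filtered[matches_filtered['game_mode'] == 'SWIFTPLAY'].copy()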
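
An alternative to three separate boolean filters is to split the frame once with `groupby`. This is a sketch on a toy frame with made-up values, not the notebook's own approach.

```python
import pandas as pd

# toy frame standing in for `matches_filtered`
toy = pd.DataFrame({
    "game_mode": ["CLASSIC", "ARAM", "CLASSIC", "SWIFTPLAY"],
    "kills": [5, 12, 2, 7],
})

# one independent DataFrame per game mode, keyed by the mode's name
by_mode = {mode: frame.copy() for mode, frame in toy.groupby("game_mode")}
row_counts = {mode: len(frame) for mode, frame in by_mode.items()}
```

Like the equality filters above, `groupby` leaves out rows whose `game_mode` is missing, so the subset sizes need not add up to the size of the full frame.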

#### **5) Dealing with Null Values**

We will now check each of the three datasets for null values. On a first manual pass through the data, I saw many null values in the Solo rank columns, but I expect these to be concentrated in the ARAM and Swiftplay rows.

In [53]:
swiftplay.isna().sum()

game_id                                 0
game_start_utc                          0
game_duration                           0
game_mode                               0
game_version                            0
summoner_id                             0
summoner_level                          0
champion_name                           0
team_id                                 0
individual_position                     0
kills                                   0
deaths                                  0
assists                                 0
gold_earned                             0
gold_spent                              0
total_damage_dealt                      0
physical_damage_dealt_to_champions      0
magic_damage_dealt_to_champions         0
true_damage_dealt_to_champions          0
damage_dealt_to_objectives              0
damage_dealt_to_turrets                 0
physical_damage_taken                   0
magic_damage_taken                      0
true_damage_taken                 

In [54]:
swiftplay.shape

(1280, 43)

Based on the above output, most of the rows in this subset do not have any Solo rank details. This is most likely because Swiftplay is predominantly played by beginners who have not yet unlocked the ranked Summoner's Rift mode and are therefore unranked. Some ranked players do play Swiftplay as well, so for part of these rows the rank data may be genuinely missing rather than nonexistent. Given that most of the rows do not have Solo rank data, and that there are still many other features to determine match outcome from, it may be best not to consider these columns for this game mode. Likewise, since ranked play does not exist in the ARAM game mode, these columns will also be removed from the ARAM subset if it shows the same pattern.

As for the other columns, very few rows have missing values, so removing those rows is unlikely to affect the results significantly.
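
The comparison of missing rank data across game modes can also be made in a single table by computing the share of missing values per mode. The sketch below uses a toy frame with made-up values and only two of the five Solo rank columns, for brevity.

```python
import numpy as np
import pandas as pd

SOLO_COLS = ["solo_tier", "solo_lp"]  # subset of the Solo rank columns

# toy frame standing in for `matches_filtered`
toy = pd.DataFrame({
    "game_mode": ["CLASSIC", "CLASSIC", "ARAM", "ARAM"],
    "solo_tier": ["GOLD", np.nan, np.nan, np.nan],
    "solo_lp": [40.0, np.nan, np.nan, 12.0],
})

# fraction of missing values per Solo rank column within each game mode
missing_share = toy[SOLO_COLS].isna().groupby(toy["game_mode"]).mean()
```

A value close to 1 for a game mode supports dropping the column there, while a small value suggests imputing instead.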

In [55]:
aram.isna().sum()

game_id                                  0
game_start_utc                           0
game_duration                            0
game_mode                                0
game_version                             0
summoner_id                              0
summoner_level                           0
champion_name                            0
team_id                                  0
individual_position                      0
kills                                    0
deaths                                   0
assists                                  0
gold_earned                              0
gold_spent                               0
total_damage_dealt                       0
physical_damage_dealt_to_champions       0
magic_damage_dealt_to_champions          0
true_damage_dealt_to_champions           0
damage_dealt_to_objectives               0
damage_dealt_to_turrets                  0
physical_damage_taken                    0
magic_damage_taken                       0
true_damage

In [56]:
aram.shape

(9730, 43)

As expected, most of the Solo rank details are missing for this subset as well, so these columns are removed from both subsets, along with any rows that have missing values in other columns.

In [57]:
swiftplay = swiftplay.drop(columns = ['solo_tier', 'solo_rank', 'solo_lp', 'solo_wins', 'solo_losses'])
aram = aram.drop(columns = ['solo_tier', 'solo_rank', 'solo_lp', 'solo_wins', 'solo_losses'])

In [58]:
swiftplay = swiftplay.dropna()
aram = aram.dropna()

The classic Summoner's Rift mode does have a ranked queue, and rank is a major factor in how players are matched together since it is an indication of skill. If some rows have null values in the Solo rank columns, they will have to be treated rather than dropped in order to preserve these important features.

In [59]:
sr.isna().sum()

game_id                                  0
game_start_utc                           0
game_duration                            0
game_mode                                0
game_version                             0
summoner_id                              0
summoner_level                           0
champion_name                            0
team_id                                  0
individual_position                      0
kills                                    0
deaths                                   0
assists                                  0
gold_earned                              0
gold_spent                               0
total_damage_dealt                       0
physical_damage_dealt_to_champions       0
magic_damage_dealt_to_champions          0
true_damage_dealt_to_champions           0
damage_dealt_to_objectives               0
damage_dealt_to_turrets                  0
physical_damage_taken                    0
magic_damage_taken                       0
true_damage

In [60]:
sr.shape

(29400, 43)

Unlike the other two game modes, most of the rows have data in the Solo rank columns, which is promising. To preserve the rows with missing rank data, we will treat those players as unranked, a plausible assumption since not all players play the ranked version of the classic game mode. The missing values are filled as follows: `solo_tier` takes the value "Unranked"; `solo_rank` takes the Roman numeral "V", an otherwise unused symbol placed below "IV", the lowest existing rank ("I" is the highest); and `solo_lp`, `solo_wins`, and `solo_losses` are set to 0. Rows with missing values in any other column are dropped, since there are only a negligible number of them.

In [61]:
sr.loc[:,'solo_tier'] = sr['solo_tier'].fillna('Unranked')
sr.loc[:, 'solo_rank'] = sr['solo_rank'].fillna('V')
sr.loc[:, 'solo_lp'] = sr['solo_lp'].fillna(0)
sr.loc[:, 'solo_wins'] = sr['solo_wins'].fillna(0)
sr.loc[:, 'solo_losses'] = sr['solo_losses'].fillna(0)

In [62]:
sr = sr.dropna()
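
The five fills and the final drop can also be written as one `fillna` call with a dictionary of per-column defaults. This is a sketch on a toy frame: the name `UNRANKED_DEFAULTS` and the sample values are assumptions, while the defaults themselves are the ones described above.

```python
import numpy as np
import pandas as pd

# per-column defaults for players treated as unranked
UNRANKED_DEFAULTS = {
    "solo_tier": "Unranked",
    "solo_rank": "V",
    "solo_lp": 0,
    "solo_wins": 0,
    "solo_losses": 0,
}

# toy frame standing in for `sr`; the second row has no rank data and no kills value
toy = pd.DataFrame({
    "solo_tier": ["GOLD", np.nan],
    "solo_rank": ["II", np.nan],
    "solo_lp": [40.0, np.nan],
    "solo_wins": [10.0, np.nan],
    "solo_losses": [8.0, np.nan],
    "kills": [3.0, np.nan],
})

# fillna with a dict only touches the listed columns
imputed = toy.fillna(value=UNRANKED_DEFAULTS)
# anything still missing sits outside the Solo rank columns and is dropped
cleaned = imputed.dropna()
```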

In [63]:
sr.dtypes

game_id                               float64
game_start_utc                         object
game_duration                         float64
game_mode                              object
game_version                           object
summoner_id                            object
summoner_level                        float64
champion_name                          object
team_id                               float64
individual_position                    object
kills                                 float64
deaths                                float64
assists                               float64
gold_earned                           float64
gold_spent                            float64
total_damage_dealt                    float64
physical_damage_dealt_to_champions    float64
magic_damage_dealt_to_champions       float64
true_damage_dealt_to_champions        float64
damage_dealt_to_objectives            float64
damage_dealt_to_turrets               float64
physical_damage_taken             
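
Note that `game_start_utc` is listed as `object` in the output above, which suggests that the datetime conversion from step 2 was not in effect when `sr` was created (the cell numbers hint that `matches_filtered` was rebuilt after the conversion had been run). A minimal sketch of a guard that could be applied to each subset at the end of cleaning is given below. The helper name `ensure_datetime` is an assumption, and the format string is the one used in step 2.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

def ensure_datetime(df, column, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of df whose `column` is a datetime column, converting if needed."""
    out = df.copy()
    if not is_datetime64_any_dtype(out[column]):
        # unparseable values become NaT instead of raising
        out[column] = pd.to_datetime(out[column], format=fmt, errors="coerce")
    return out

# toy frame standing in for `sr`
toy = pd.DataFrame({"game_start_utc": ["2025-01-15 14:56:00", "2025-01-13 10:50:00"]})
fixed = ensure_datetime(toy, "game_start_utc")
```

In the notebook this would be called as `sr = ensure_datetime(sr, 'game_start_utc')`, and likewise for `aram` and `swiftplay`.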

## **References**

https://activeplayer.io/league-of-legends/ (player count statement)