# **Predicting Match Outcomes in League of Legends Based on Team Performance and Player Statistics**

## **Introduction**

_League of Legends_ is a popular multiplayer online battle arena (MOBA) game developed and published by Riot Games, played by well over 30 million players every day on average. This player base generates a wealth of game data, which can be used both to gain insights into trends seen in League of Legends matches and to build a model that predicts the outcome of a game from match and player statistics.

In this analysis, I will first clean and pre-process a League of Legends match dataset, then perform an exploratory data analysis that visualizes some key trends. Finally, I will use a CatBoost classifier to predict match outcomes from the available data.

## **Data Description**

_License:_ **_CC BY 4.0 (Creative Commons Attribution 4.0 International)_**. \
This license allows sharing, modification, and use of the dataset as long as proper attribution is given.

The data used is the `League of Legends Match Dataset (2025)`, published on Kaggle by user Jakub Krasuski: https://www.kaggle.com/datasets/jakubkrasuski/league-of-legends-match-dataset-2025

The dataset consists of 40,412 rows and 94 columns. Each row represents a unique player in a match, and each column describes an aspect of the match or of that player's performance in it. All of the data comes from matches on the Europe Nordic & East (EUNE) server.
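The "one row per player per match" structure can be checked directly. The following is a minimal sketch on a made-up toy frame rather than the real file; only the column names `game_id` and `participant_id` are taken from the dataset, and the values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the real dataset; the values are invented,
# only the column names come from the actual data.
toy = pd.DataFrame({
    "game_id": [1, 1, 1, 2, 2],
    "participant_id": [1, 2, 3, 1, 2],
})

# If each row is a unique player in a match, no
# (game_id, participant_id) pair should appear twice.
duplicate_rows = int(toy.duplicated(subset=["game_id", "participant_id"]).sum())

# Number of player rows recorded for each match.
players_per_match = toy.groupby("game_id").size()

print(duplicate_rows)
print(players_per_match.to_dict())
```

Running the same two checks on the full `matches` frame would show whether any player rows are duplicated and how many players are recorded per match.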

## **Data Cleaning**

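The cleaning consists of the following steps: removing unnecessary columns, renaming the champion `MonkeyKing` to `Wukong`, renaming the `UTILITY` position to `SUPPORT`, removing rows with missing (NaN) values, and splitting the data by game mode (ARAM, Summoner's Rift, and Swiftplay).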
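The cleaning steps above could be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the notebook's actual implementation: the helper `clean_matches` and its `drop_cols` default are hypothetical, the choice to drop NaN only in `champion_name` and `individual_position` is a guess at what "remove nan" covers, and the `ARAM` and `SWIFTPLAY` values of `game_mode` are assumed (only `CLASSIC` appears in the data preview). The toy rows are invented.

```python
import numpy as np
import pandas as pd


def clean_matches(df, drop_cols=("map_id", "platform_id")):
    """Hypothetical helper: returns a dict mapping game_mode -> cleaned DataFrame."""
    # 1. Remove columns not needed for the analysis (drop_cols is an assumption).
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])

    # 2. Replace the internal names with the names players actually use.
    out = out.assign(
        champion_name=out["champion_name"].replace({"MonkeyKing": "Wukong"}),
        individual_position=out["individual_position"].replace({"UTILITY": "SUPPORT"}),
    )

    # 3. Drop rows with missing values in the key columns (assumed scope).
    out = out.dropna(subset=["champion_name", "individual_position"])

    # 4. Split into one frame per game mode.
    return {mode: grp.reset_index(drop=True) for mode, grp in out.groupby("game_mode")}


# Invented toy rows; the game_mode values other than CLASSIC are assumptions.
toy = pd.DataFrame({
    "game_mode": ["CLASSIC", "CLASSIC", "ARAM", "SWIFTPLAY", "CLASSIC"],
    "map_id": [11, 11, 12, 11, 11],
    "platform_id": ["EUN1"] * 5,
    "champion_name": ["MonkeyKing", "Lux", "Jinx", "Jhin", "Kled"],
    "individual_position": ["JUNGLE", "UTILITY", "Invalid", "BOTTOM", np.nan],
})

by_mode = clean_matches(toy)
print(sorted(by_mode))
print(by_mode["CLASSIC"])
```

Returning a dictionary keyed by game mode keeps the three subsets together while still allowing each to be analysed separately. Note that the sketch leaves the `Invalid` position untouched, since the steps above do not say how it should be handled.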

In [3]:
# importing relevant libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [6]:
# loading in the dataset

matches = pd.read_csv("league_data.csv", dtype={17: 'str'})

In [7]:
matches.head(5)

|  | game_id | game_start_utc | game_duration | game_mode | game_type | game_version | map_id | platform_id | queue_id | participant_id | ... | final_magicPen | final_magicPenPercent | final_magicResist | final_movementSpeed | final_omnivamp | final_physicalVamp | final_power | final_powerMax | final_powerRegen | final_spellVamp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3727443000.0 | 2025-01-15 14:56:00 | 1714.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 5.0 | ... | 0.0 | 0.0 | 48.0 | 385.0 | 0.0 | 0.0 | 799.0 | 1134.0 | 147.0 | 0.0 |
| 1 | 3726377000.0 | 2025-01-13 10:50:00 | 1300.0 | CLASSIC | MATCHED_GAME | 15.1.648.3927 | 11.0 | EUN1 | 420.0 | 5.0 | ... | 0.0 | 0.0 | 38.0 | 390.0 | 0.0 | 0.0 | 970.0 | 970.0 | 105.0 | 0.0 |
| 2 | 3729644000.0 | 2025-01-19 18:15:00 | 2019.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 2.0 | ... | 0.0 | 0.0 | 121.0 | 431.0 | 0.0 | 0.0 | 10000.0 | 10000.0 | 0.0 | 0.0 |
| 3 | 3729916000.0 | 2025-01-20 01:27:00 | 1625.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 8.0 | ... | 12.0 | 0.0 | 47.0 | 380.0 | 0.0 | 0.0 | 1122.0 | 1596.0 | 37.0 | 0.0 |
| 4 | 3729902000.0 | 2025-01-20 00:40:00 | 1542.0 | CLASSIC | MATCHED_GAME | 15.1.649.4112 | 11.0 | EUN1 | 420.0 | 10.0 | ... | 0.0 | 0.0 | 40.0 | 534.0 | 0.0 | 0.0 | 1025.0 | 1025.0 | 109.0 | 0.0 |


In [22]:
# counting the number of distinct champions in the dataset

len(matches['champion_name'].unique())

170

In [28]:
# listing the distinct values of the position column

matches['individual_position'].unique()

array(['UTILITY', 'JUNGLE', 'MIDDLE', 'Invalid', 'BOTTOM', 'TOP', nan],
      dtype=object)

In [25]:
# counting the rows per champion, sorted from least to most played

champions = matches.groupby("champion_name").size().reset_index(name="count")
champions.sort_values(by="count")

|  | champion_name | count |
| --- | --- | --- |
| 67 | Kled | 40 |
| 48 | Ivern | 49 |
| 111 | Rumble | 67 |
| 104 | Rammus | 71 |
| 95 | Olaf | 72 |
| ... | ... | ... |
| 53 | Jhin | 633 |
| 76 | Lux | 663 |
| 54 | Jinx | 678 |
| 82 | MissFortune | 713 |


## **References**

https://activeplayer.io/league-of-legends/ (player count statement)