# League of Legends Victory Prediction

### Andrew Italo, Conner Brown                          
CPSC 310 - Spring 2019

* * *

## Introduction

Currently the most played video game in the world, League of Legends belongs to a class of games called MOBAs (Multiplayer Online Battle Arenas). Two teams of five players meet on a map divided into 3 distinct lanes to fight, capture objectives, kill AI-controlled minions and other players, and ultimately destroy the other team's base. League of Legends is a complex game in which many factors can swing a match in a team's favor, and many of them are hard to express as objective numbers and statistics. However, by looking at the factors that can be expressed in numbers, it is possible to get a fairly accurate picture of which team is doing better at a given time.

This project draws on a dataset containing game information for 7,620 professional matches of League of Legends. Each match contains information such as which teams played against each other, which champions were assigned to each lane, which team won, and a minute-by-minute breakdown of various objectives such as kills, towers destroyed, dragon and baron captures, and the gold differential at each minute.

Using this data, we will attempt to build a classifier that can accurately predict which team will win a match. We will be utilizing TDIDT decision tree generation, combined with a post-pruning algorithm to avoid overfitting. In order to test the comparative accuracy of our pruned decision tree, we will test it against both a non-pruned TDIDT tree and a random forest ensemble classifier to see if pruning the tree had any impact on accuracy, as well as a kNN and a Naive Bayes classifier to determine if other classifiers are more suited to this data than decision trees.

----

## Dataset

The dataset we chose includes virtually every aspect of a given match that can be expressed as quantifiable data. Many of these attributes, such as which professional teams faced off or which champion each player selected, could have a very large impact on which side wins, but are hard to include in a classification algorithm. For example, there are dozens of teams, with new teams being formed each year as some older teams disband. Each team occasionally changes its roster, so even the eligible players can change entirely over the years. A player's individual skill with a given champion is definitely a huge factor, but player/champion pairings create too many combinations for a classifier to consider accurately, and would leave a huge number of unseen instances that could not necessarily be classified.

Even just knowing which 10 champions are present in a given match is a good indication of relative team strength, but the current roster is sitting at over 150 different champions, many of which were released during the time period that our dataset spans. On top of that, Riot Games, the company behind League of Legends, releases frequent balance patches, so the strength of each champion changes drastically over time. A team composition that was almost guaranteed to win in 2016 might be horribly unoptimized in 2018. In addition, using team composition as an attribute would mean that every future champion release constitutes unseen data. It would therefore be incredibly difficult and time-consuming to create a classifier that uses these attributes, even though they are some of the most important factors in the relative starting position of one team over another.

As such, we have pruned our dataset down to only those attributes that meet several rules:

* **Attribute must be easy to represent in objective, quantifiable measurements**
    * This eliminates factors such as how well a team groups up and communicates, or their overall strategy for winning the match
* **Attribute must be unlikely to change drastically over time due to changes in the game's design or changes to the competitive scene**
    * This eliminates attributes such as champion selection or team roster
* **Attribute must intuitively contribute to a team's win or loss in some meaningful manner**
    * This eliminates attributes such as what year the game was played, or what league it was played in
* **Attribute must not create too many combinations to create a feasible classifier**
    * This eliminates attributes such as the location of a given kill on the map, or the individual kill count of each team member
    * This rule does not exclude attributes that can be discretized, such as gold difference
* **Attribute must not be a repeat or different representation of a previously included attribute**
    * This eliminates attributes such as blue team's total gold and red team's total gold, since we have one attribute that measures the difference in gold between the two teams
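
The discretization mentioned in the rules above could look something like the following. This is only a minimal sketch: the function name `discretize_gold_diff`, the bin width of 1,000 gold, and the cap of 5 bins are illustrative assumptions, not the cutoffs used in this project.

```python
def discretize_gold_diff(gold_diff, bin_width=1000, max_bins=5):
    """Map a raw gold difference to a small signed bin index.

    bin_width and max_bins are illustrative assumptions, not the
    project's actual cutoffs.
    """
    # Bin the magnitude so positive (blue ahead) and negative (red ahead)
    # leads are treated symmetrically, and cap very large leads.
    magnitude = min(abs(gold_diff) // bin_width, max_bins)
    return magnitude if gold_diff >= 0 else -magnitude
```

Binning this way keeps the sign convention of golddiff (positive means blue is ahead) while reducing thousands of possible values to a handful of categories a decision tree can split on.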

Here is the final list of attributes included in our data, along with a description of each:

| Attribute | Description |
| --------- | ----------- |
| **gamelength** | The duration in minutes of a given match |
| **bResult** | The winner of a given match <br> 0 = red team won <br> 1 = blue team won |
| **golddiff** | The difference in gold between the two teams <br> positive values indicate blue team is ahead <br> negative values indicate red team is ahead <br> Gold is earned by killing AI-controlled minions and other players, by taking objectives, and at a very slow passive rate |
| **bKills / rKills** | The total number of times each team has killed a player on the opposing team <br> Players respawn after being killed, although the death timer increases based on the current time in the game |
| **bTowers / rTowers** | The total number of towers destroyed by each team <br> Towers are stationary structures that damage enemy players when they get too close <br> Towers must be destroyed to push in towards the enemy base <br> In total, each team starts with 11 symmetrically placed towers along the lanes on their side of the map <br> Towers do not respawn when destroyed |
| **bInhibs / rInhibs** | The total number of inhibitors destroyed by each team <br> Inhibitors are stationary structures that enhance your AI-controlled minions when destroyed <br> At least one Inhibitor must be destroyed in order to damage the other team's base <br> In total, each team starts with 3 symmetrically placed Inhibitors within their base <br> Inhibitors do respawn several minutes after being destroyed |
| **bDragons / rDragons** | The total number of Dragons killed by each team <br> The Dragon is a neutral map objective that provides a permanent teamwide buff to the team that kills it <br> There is one Dragon present on the map at a given time, always in the same spot <br> The Dragon does respawn several minutes after being killed |
| **bBarons / rBarons** | The total number of times each team has killed Baron Nashor <br> Baron is an incredibly powerful neutral objective that requires an entire team to fight <br> Once killed, it grants your team a very powerful stat boost that disappears after a certain time or if you are killed <br> There is one Baron present on the map at a given time, always in the same spot <br> Baron does respawn several minutes after being killed |
| **bHeralds / rHeralds** | Indicates if / when a given team killed the Rift Herald <br> The Herald is a neutral objective that allows the team that kills it to then summon it to their side and help destroy towers quickly <br> There is only one Herald on the map, always in the same spot, which disappears at a certain time in the game <br> The Herald does not respawn after being killed or disappearing, and only one will ever exist in a given game |

Of these attributes, gamelength and bResult are static values that are only available after a game has ended. Golddiff and kills are the two easiest measures of how well a team is doing at a given point and how strong it is relative to the other team. Towers and Inhibitors must be destroyed to make progress towards the other team's base, and so indicate how well a team is pushing towards ending the game, regardless of its strength relative to the other team. Dragons, Barons, and Rift Heralds are all strategic elements that grant situational power to one side. Although they can have a huge impact on the game, they are not required to win, and their impact depends on how well a team leverages the advantage gained from capturing them and on how well the other team is able to respond. For example, if a team kills Baron Nashor and gains the corresponding buff but does not use it to gain a better position on the map or kill the other team, or if the other team responds quickly and kills them before they can use it, then there is very little advantage gained from taking the objective.

Golddiff was originally represented as an array with values for each minute in the game, and all other attributes besides the game length and the game's winner have a timestamp for each time they were captured. As such, we have divided each game into several instances, with each instance representing one minute in the game. Therefore, a game that lasts 30 minutes will be divided into 30 separate instances, with each instance showing the difference in gold at the given minute, as well as the number of kills earned, structures destroyed, and objectives secured by each team up to that point in the game. 
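
The per-minute split described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual preprocessing code: the function name `expand_game`, the dictionary layout of `event_times`, the use of timestamps in fractional minutes, and the convention that the first golddiff entry corresponds to minute 1 are all hypothetical.

```python
from bisect import bisect_right


def expand_game(gold_diff_by_minute, event_times, b_result):
    """Turn one game into one instance per minute.

    gold_diff_by_minute: list of gold differences, one entry per minute.
    event_times: dict mapping an attribute name (e.g. "bKills") to a list
        of timestamps in minutes (assumed layout, for illustration only).
    b_result: 1 if blue team won, 0 if red team won.
    """
    # Sort each timestamp list once so cumulative counts are a binary search.
    sorted_times = {name: sorted(times) for name, times in event_times.items()}
    instances = []
    for minute, gold_diff in enumerate(gold_diff_by_minute, start=1):
        row = {"minute": minute, "golddiff": gold_diff}
        for name, times in sorted_times.items():
            # Number of events that happened at or before this minute.
            row[name] = bisect_right(times, minute)
        # Every instance from the same game carries the same class label.
        row["bResult"] = b_result
        instances.append(row)
    return instances
```

Note that every instance produced from one game shares that game's final result, so instances from the same match are not independent of one another.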

----

## Summary Statistics

In total, our dataset consists of 7620 instances, with each instance describing one professional League of Legends match. Once split by minute, our data size increases to 221,358 instances, each instance matching one minute of one match. When analyzing the data, there are several interesting features that are worth noting.

##### Game Length
Analyzing the distribution of game length is important if we are going to classify our dataset by the minute. We can quickly get some overall stats to describe the data.

| Statistic | Minutes | 
| --------- | ------- |
| Shortest Game | 17 |
| Longest Game | 95 |
| Average Length | 37.013 |
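
Statistics like the ones in this table can be computed in a few lines. The sketch below assumes the game lengths are already available as a plain list of minutes; the function name `length_summary` is hypothetical.

```python
def length_summary(game_lengths):
    """Return the shortest, longest, and average game length in minutes."""
    return {
        "shortest": min(game_lengths),
        "longest": max(game_lengths),
        # Rounded to three decimals to match the precision used in the table.
        "average": round(sum(game_lengths) / len(game_lengths), 3),
    }
```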

It is also useful to be able to visualize this distribution, as in ***Figure 1.***

<img src="images/length.png">
<center><b><i>Figure 1</i></b></center>

This histogram plots, for each minute *x*, the number of games that are at least *x* minutes long. As we can see, the graph falls off significantly after about 25-30 minutes, which matches the accepted "standard" length of a game of League of Legends.
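
The counts behind a plot of this kind can be sketched as below, assuming the game lengths are whole minutes stored in a list. The function name `games_at_least` is hypothetical, and this is not necessarily how Figure 1 was generated.

```python
def games_at_least(game_lengths, max_minute=None):
    """Count, for each minute x, how many games last at least x minutes.

    Returns a list whose entry at index x - 1 is the count for minute x.
    """
    if max_minute is None:
        max_minute = max(game_lengths)
    return [
        sum(1 for length in game_lengths if length >= x)
        for x in range(1, max_minute + 1)
    ]
```

When lengths are whole minutes, the sum of these counts equals the total number of per-minute instances, since a game of length *n* contributes one instance to each of the minutes 1 through *n*.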

It is important to note how the game's mechanics influence the conclusions we can draw from this distribution. In a game of League, all players start on theoretically "even" ground, with the same level and amount of gold. Kills and objectives taken early set a player up for the later parts of the game, but they also leave the opponent more time to catch up, especially since it is tough to earn a significant lead early in the game. In addition, the time a player takes to respawn after being killed increases from about 5 seconds at the start of the game to about 45-50 seconds by its end, so early kills are also easier to recover from. As a result, it will be difficult to classify with any significant accuracy in the first minutes of the game.

This creates an interesting dichotomy in the late game. If a team, or even one player, does very well in the early and mid game, they can build an advantage in power that is hard to overcome. This can lead to "run-away" games, where it is obvious which team is going to win and it is only a matter of waiting out the last 10 minutes it takes them to do so. However, if the teams are evenly matched, the fact that deaths become more and more significant as the game goes on means that a single bad fight can swing the game in a matter of minutes. This effect becomes more prominent the longer a game lasts, so we expect games that look incredibly even up until the last few minutes to also be common in the data. This poses a challenge to classification, as it is difficult to predict which team will win the game-defining fight without many statistical advantages to analyze. The same thing can happen even when the teams are not even: a team with a huge lead can make a few small mistakes and suddenly have its opponent close the power gap, putting the two sides even again.

For these reasons, we expect our classifiers to perform best near the end of a game, and to be less accurate on unusually long games.


##### Side Advantage