# League of Legends Win Chance Prediction
### ML model to predict the outcome of a League of Legends match based on champion selection

## Introduction
League of Legends, often abbreviated as LoL, is a popular online multiplayer video game. It's a competitive 5 versus 5 team-based game in which players control unique champions with special abilities and work together to defeat the opposing team. The main objective is to destroy the enemy team's Nexus, a structure in their base, while defending your own. It combines elements of strategy, teamwork, and individual skill and is known for its strategic depth and fast-paced action. League of Legends is played by millions of players worldwide and has a thriving esports scene with professional leagues and tournaments.

In the competitive environment of League of Legends, players are always looking for ways to improve their chances of winning. Since it's a strategy game, one key element affecting a team's success is the mix of champions they pick. Our aim is to create a model that helps players make better decisions about champion selection and team composition by predicting the likelihood of each team winning based on their chosen champions. It also lets the most dedicated players dodge an unfavorable matchup before the game begins when their predicted chance of winning is low.

Before a game begins, the only information available about the match is the champions each team has picked, so we use only this information to train our model.
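
To make "only the champions picked" concrete, here is one possible way to turn the ten picks into a model input. This is a sketch of a common encoding (a multi-hot vector with one slot per champion and side), not necessarily the encoding used later in this notebook. The function name `encode_picks` and the four-champion roster are hypothetical.

```python
def encode_picks(blue_picks, red_picks, champions):
    """Return a multi-hot vector: first half marks blue picks, second half red picks."""
    index = {name: i for i, name in enumerate(champions)}
    vector = [0] * (2 * len(champions))
    for name in blue_picks:
        vector[index[name]] = 1
    for name in red_picks:
        vector[len(champions) + index[name]] = 1
    return vector


# Hypothetical four-champion roster; the real game has well over 100 champions.
CHAMPIONS = ["Ahri", "Garen", "Lux", "Zed"]
features = encode_picks(["Ahri", "Garen"], ["Lux", "Zed"], CHAMPIONS)
print(features)  # [1, 1, 0, 0, 0, 0, 1, 1]
```

Keeping separate slots for each side lets a model learn that the same champion can matter differently on blue and on red.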



## Dataset
There are several datasets available online that contain information about the outcome of the game, champions selected, player stats and much more. There is also the official Riot Games API available, which could be used to gather data from the latest version of the game.

For the purpose of this concept, we will be using a dataset from Kaggle. This gives us easy access to a lot of training data, without being limited by the API. While this means that the data is not up to date, it is still a good starting point for our model and useful for evaluating the concept.

The dataset [League of Legends- 1 day's worth of solo queue KR](https://www.kaggle.com/datasets/junhachoi/all-ranked-solo-games-on-kr-server-24-hours/) contains information about all ranked matches on the League of Legends Korean server over the course of one day (GMT 2022/07/02 00:00:00 to 2022/07/03 00:00:00). In total, this amounts to over 250,000 matches. The advantage of this dataset over others is that it is very large and one of the most recent available. The data is also from a single day, which means that the game version is the same for all matches. This is important because the game is constantly being updated and the balance of champions changes with every patch, so data from older patches is less useful for training our model.


In [None]:
# load data used for training
# convert data to useable format
import tensorflow as tf
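
The conversion step could look roughly like the sketch below. It assumes the CSV has one row per player with the `gameNo`, `teamId`, `championName` and `win` columns, that `teamId` 100 is the blue side and 200 the red side (the Riot API convention), and that `win` is stored as the text `True`/`False`. The inline sample data and the helper name `rows_to_matches` are hypothetical, and the standard library `csv` module is used only to keep the example dependency-free.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for the Kaggle CSV (hypothetical values, two players
# per team instead of five to keep the sample short).
SAMPLE_CSV = """gameNo,teamId,championName,win
1,100,Ahri,True
1,100,Garen,True
1,200,Zed,False
1,200,Lux,False
2,100,Zed,False
2,100,Lux,False
2,200,Ahri,True
2,200,Jinx,True
"""

BLUE = "100"  # assumed Riot team id for the blue side (red is 200)


def rows_to_matches(csv_file):
    """Group per-player rows into one record per match."""
    matches = defaultdict(lambda: {"blue": [], "red": [], "blue_win": None})
    for row in csv.DictReader(csv_file):
        match = matches[row["gameNo"]]
        side = "blue" if row["teamId"] == BLUE else "red"
        match[side].append(row["championName"])
        if row["teamId"] == BLUE:
            # Assumes the win flag is stored as the text "True"/"False".
            match["blue_win"] = row["win"].strip().lower() == "true"
    return dict(matches)


matches = rows_to_matches(io.StringIO(SAMPLE_CSV))
print(matches["1"])
```

Each record then holds exactly what is known at champion select (the two lists of picks) plus the label to predict (`blue_win`).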

### Data cleaning
A possible cleaning step is to remove matches that were very unbalanced or ended early, since in those games the players were probably the biggest factor in the outcome. We could also remove matches in which a player left the game, since that is not a normal situation and would skew the data.

Downside: this could negatively affect early-game champions, since they are more likely to end the game early and their matches would be removed more often.

Datapoints available for each player:
- no,gameNo,playerNo,CreationTime,KoreanTime,participantId,teamId,summonerName,gameEndedInEarlySurrender,gameEndedInSurrender,teamEarlySurrendered,win,teamPosition,kills,deaths,assists,objectivesStolen,visionScore,puuid,summonerId,baronKills,bountyLevel,champLevel,championName,damageDealtToBuildings,damageDealtToObjectives,detectorWardsPlaced,doubleKills,dragonKills,firstBloodAssist,firstBloodKill,firstTowerAssist,firstTowerKill,goldEarned,inhibitorKills,inhibitorTakedowns,inhibitorsLost,killingSprees,largestKillingSpree,largestMultiKill,longestTimeSpentLiving,neutralMinionsKilled,objectivesStolenAssists,pentaKills,quadraKills,timeCCingOthers,timePlayed,totalDamageDealt,totalDamageDealtToChampions,totalDamageTaken,totalHeal,totalHealsOnTeammates,totalMinionsKilled,totalTimeCCDealt,totalTimeSpentDead,totalUnitsHealed,tripleKills,unrealKills
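
The cleaning idea above could be sketched as follows, using two of the listed columns. It assumes that `gameEndedInEarlySurrender` is stored as the text `True`/`False` and that `timePlayed` is measured in seconds. The helper `clean_rows`, its `min_seconds` parameter, the 1200-second cutoff and the sample rows are all hypothetical. A suitable cutoff would have to be chosen by looking at the data, keeping the early-game-champion downside in mind.

```python
import csv
import io

# Hypothetical sample: one row per team instead of one per player, for brevity.
SAMPLE_CSV = """gameNo,teamId,gameEndedInEarlySurrender,timePlayed
1,100,False,1800
1,200,False,1800
2,100,True,210
2,200,True,210
3,100,False,950
3,200,False,950
"""


def clean_rows(rows, min_seconds=0):
    """Drop every row of a match that ended in early surrender or was too short."""
    rows = list(rows)
    dropped = {
        row["gameNo"]
        for row in rows
        if row["gameEndedInEarlySurrender"].strip().lower() == "true"
        or int(row["timePlayed"]) < min_seconds
    }
    return [row for row in rows if row["gameNo"] not in dropped]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = clean_rows(rows)
kept_strict = clean_rows(rows, min_seconds=1200)
print(sorted({r["gameNo"] for r in kept}))         # ['1', '3']
print(sorted({r["gameNo"] for r in kept_strict}))  # ['1']
```

Filtering by match rather than by row keeps both teams of a game together, so no match ends up with only one side in the training data.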


## Data Analysis
Before we can start training our model, we need to do some data analysis to get a better understanding of the data. This will help us decide which features to use and how to process them. It can also help us with evaluating the performance of our models later on.

### Overall Win Rate
The first thing we want to look at is the overall win rate of the blue side. Since the game is not symmetrical, we can't assume that this win rate is 50%. In fact, during most patches, the blue side (bottom left) has a slightly higher win rate than the red side. This is often attributed to several factors, such as the camera angle, the position of the minimap, and the position of the HUD. The blue side also has a slight advantage in champion select, since it gets to pick first.

This overall win rate gives us a baseline for our model: always predicting a blue-side win would already be correct as often as the blue side actually wins. If our model is not able to beat this baseline, then it is not very useful. The overall win rate is calculated by dividing the number of blue-side wins by the total number of matches.

In [None]:
# calculate overall win rate here
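
A minimal sketch of this calculation, assuming one row per player with `gameNo`, `teamId` and `win` columns, `teamId` 100 for the blue side, and `win` stored as the text `True`/`False`. The sample rows and the helper name `blue_win_rate` are hypothetical.

```python
import csv
import io

# Hypothetical sample: one row per team instead of one per player, for brevity.
SAMPLE_CSV = """gameNo,teamId,win
1,100,True
1,200,False
2,100,False
2,200,True
3,100,True
3,200,False
4,100,True
4,200,False
"""


def blue_win_rate(rows):
    """Return blue-side wins divided by the number of matches."""
    blue_result = {}
    for row in rows:
        if row["teamId"] == "100":  # assumed blue-side team id
            # Keyed by match, so five blue player rows still count once.
            blue_result[row["gameNo"]] = row["win"].strip().lower() == "true"
    return sum(blue_result.values()) / len(blue_result)


rate = blue_win_rate(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Accuracy of always predicting the side that wins more often.
baseline_accuracy = max(rate, 1 - rate)
print(f"blue win rate: {rate:.2f}, baseline accuracy: {baseline_accuracy:.2f}")
```

On the sample above the blue side wins 3 of 4 matches, so both values are 0.75. On the real dataset the rate is expected to be much closer to 0.5.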

### Champion Win Rate
Next, we want to look at the win rate of each champion. This gives us an idea of how strong each champion is and how likely they are to win. We can also see which champions are the most and least popular. With this, we can evaluate our model and check whether it predicts the outcome of the game better than simply favoring the team with the most popular champions. As a sanity check, we can also match champions with high win rates against champions with low win rates and see whether our models predict the expected outcome.
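
The per-champion statistics could be computed along these lines. The sketch assumes one row per player with `championName` and `win` columns, with `win` stored as the text `True`/`False`. The sample rows and the helper name `champion_stats` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: one player per team instead of five, for brevity.
SAMPLE_CSV = """gameNo,championName,win
1,Ahri,True
1,Zed,False
2,Ahri,True
2,Lux,False
3,Ahri,False
3,Zed,True
"""


def champion_stats(rows):
    """Return games played (popularity) and win rate for each champion."""
    games = Counter()
    wins = Counter()
    for row in rows:
        name = row["championName"]
        games[name] += 1
        if row["win"].strip().lower() == "true":
            wins[name] += 1
    return {
        name: {"games": games[name], "win_rate": wins[name] / games[name]}
        for name in games
    }


stats = champion_stats(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# List champions from most to least picked.
for name, s in sorted(stats.items(), key=lambda kv: kv[1]["games"], reverse=True):
    print(f"{name}: {s['games']} games, win rate {s['win_rate']:.2f}")
```

Win rates of rarely picked champions rest on few games and are therefore noisy, so it helps to read the win rate together with the game count.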