# League of Legends Win Chance Prediction
### ML model to predict the outcome of a League of Legends match based on champion selection

## Introduction
League of Legends, often abbreviated as LoL, is a popular online multiplayer video game. It is a competitive 5 versus 5 team-based game in which players control unique champions with special abilities and work together to defeat the opposing team. The main objective is to destroy the enemy team's Nexus, a structure in their base, while defending your own. The game combines strategy, teamwork, and individual skill, and is known for its strategic depth and fast-paced action. League of Legends is played by millions of players worldwide and has a thriving esports scene with professional leagues and tournaments.

In the competitive environment of League of Legends, players are always looking for ways to improve their chances of winning. Since it is a strategy game, one key factor in a team's success is the combination of champions it picks. Our aim is to build a model that predicts each team's likelihood of winning from the chosen champions, helping players make better decisions about champion selection and team composition. It also lets the most dedicated players dodge an unfavorable matchup (leave champion select before the game begins) when their predicted chance of winning looks poor.

Before the game begins, the only information available about a match is which champions were picked, so this is the only information we use to train our model.



## Dataset
There are several datasets available online that contain information about game outcomes, selected champions, player stats and much more. The official Riot Games API could also be used to gather data from the latest version of the game.

For this proof of concept, we use a dataset from Kaggle. This gives us easy access to a large amount of training data without being limited by the API's rate limits. While the data is not up to date, it is a good starting point for our model and sufficient for evaluating the concept.

The dataset [League of Legends- 1 day's worth of solo queue KR](https://www.kaggle.com/datasets/junhachoi/all-ranked-solo-games-on-kr-server-24-hours/) contains information about all ranked matches played on the League of Legends Korean server over the course of one day (GMT 2022/07/02 00:00:00 to 2022/07/03 00:00:00), over 250,000 matches in total. Its advantages over other datasets are that it is very large and one of the most recent ones available. Since all data comes from a single day, the game version is the same for all matches. This matters because the game is constantly updated and champion balance changes with every patch, so data from older patches is less useful for training our model.


In [5]:
# load data used for training
import data.kr_24h.convert as convert

# load the data, including stats we don't need
games_dict = convert.load_raw_csv(file_path="data/kr_24h/kr_soloq_24h/sat df.csv")
games = list(games_dict.values())

# split into train, val, test
train, val, test = convert.split_iterable(games, weights=(90, 5, 5))
print("train: ", len(train))
print("val: ", len(val))
print("test: ", len(test))
print()

# convert each match into a list of 10 champions and a 1/0 for win/loss of blue team
# two copies of train data, one with some matches filtered out
train, train_filtered = convert.convert_data(train, filter_matches=False), convert.convert_data(train, filter_matches=True)
val = convert.convert_data(val, filter_matches=False)
test = convert.convert_data(test, filter_matches=False)

Loading game data...


2589340it [00:32, 79353.71it/s]


Number of games: 258934
train:  233040
val:  12947
test:  12947

Converting games...


100%|██████████| 233040/233040 [00:05<00:00, 45391.28it/s]


Removed 0 games from dataset
Number of games: 233040
Shuffling data...
Length of game data: 233040
Converting games...


100%|██████████| 233040/233040 [00:05<00:00, 45678.96it/s]


Removed 19024 games from dataset
Number of games: 214016
Shuffling data...
Length of game data: 214016
Converting games...


100%|██████████| 12947/12947 [00:00<00:00, 54292.47it/s]


Removed 0 games from dataset
Number of games: 12947
Shuffling data...
Length of game data: 12947
Converting games...


100%|██████████| 12947/12947 [00:00<00:00, 51231.99it/s]

Removed 0 games from dataset
Number of games: 12947
Shuffling data...
Length of game data: 12947
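The 90/5/5 split performed by `convert.split_iterable` can be sketched as follows. This is a hypothetical `split_by_weights` helper, not the actual implementation in `data/kr_24h/convert.py`; it assumes a seeded shuffle followed by cut points at the floor of each cumulative share, which happens to reproduce the 233040/12947/12947 sizes printed above for 258934 games.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_weights(items: Sequence[T], weights: Sequence[int], seed: int = 0) -> List[List[T]]:
    """Shuffle items and cut them into consecutive chunks proportional to weights.

    Hypothetical stand-in for convert.split_iterable; the real helper may differ.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    total = sum(weights)
    n = len(shuffled)
    splits, start, cumulative = [], 0, 0
    for weight in weights:
        cumulative += weight
        end = n * cumulative // total  # floor of the cumulative share; the last end is always n
        splits.append(shuffled[start:end])
        start = end
    return splits


if __name__ == "__main__":
    tr, va, te = split_by_weights(range(258934), (90, 5, 5))
    print(len(tr), len(va), len(te))  # 233040 12947 12947
```

Splitting on cumulative cut points (rather than rounding each share separately) guarantees that every match lands in exactly one split.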





### Data cleaning
Can we perform some data cleaning here? One option is to remove matches that were very one-sided or ended early, since in those games the players, rather than the champions, were probably the biggest factor in the outcome. We could also remove matches where players left the game, since that is not a normal situation and would skew the data.
The downside is that this could penalize early-game champions: they are more likely to end games early, so their matches would be removed more often.
Data points available for each player (columns of the raw CSV):
- no,gameNo,playerNo,CreationTime,KoreanTime,participantId,teamId,summonerName,gameEndedInEarlySurrender,gameEndedInSurrender,teamEarlySurrendered,win,teamPosition,kills,deaths,assists,objectivesStolen,visionScore,puuid,summonerId,baronKills,bountyLevel,champLevel,championName,damageDealtToBuildings,damageDealtToObjectives,detectorWardsPlaced,doubleKills,dragonKills,firstBloodAssist,firstBloodKill,firstTowerAssist,firstTowerKill,goldEarned,inhibitorKills,inhibitorTakedowns,inhibitorsLost,killingSprees,largestKillingSpree,largestMultiKill,longestTimeSpentLiving,neutralMinionsKilled,objectivesStolenAssists,pentaKills,quadraKills,timeCCingOthers,timePlayed,totalDamageDealt,totalDamageDealtToChampions,totalDamageTaken,totalHeal,totalHealsOnTeammates,totalMinionsKilled,totalTimeCCDealt,totalTimeSpentDead,totalUnitsHealed,tripleKills,unrealKills
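A minimal sketch of one such cleaning rule, assuming the raw CSV is read with `csv.DictReader` into one dict per player row using the column names listed above. The helper `rows_to_matches`, the `teamId` value `"100"` for the blue side, and the `"True"`/`"False"` text encoding of booleans are assumptions; the filtering actually done by `convert.convert_data(..., filter_matches=True)` is not shown in this notebook and may use different criteria.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BLUE_TEAM_ID = "100"  # assumption: Riot's convention of teamId 100 = blue, 200 = red


def _is_true(value) -> bool:
    # assumption: booleans are stored as text such as "True"/"False" (or 1/0)
    return str(value).strip().lower() in ("true", "1")


def rows_to_matches(rows: Iterable[Dict[str, str]],
                    drop_early_surrender: bool = True) -> List[Tuple[List[str], int]]:
    """Group per-player rows by gameNo into (10 champions, blue_win) pairs."""
    games: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        games[row["gameNo"]].append(row)

    matches = []
    for players in games.values():
        if len(players) != 10:
            continue  # incomplete game, skip it
        if drop_early_surrender and any(_is_true(p["gameEndedInEarlySurrender"]) for p in players):
            continue  # hypothetical cleaning rule: drop games that ended in an early surrender
        blue = [p for p in players if str(p["teamId"]) == BLUE_TEAM_ID]
        red = [p for p in players if str(p["teamId"]) != BLUE_TEAM_ID]
        if len(blue) != 5:
            continue  # malformed team assignment
        champions = [p["championName"] for p in blue + red]  # blue first, then red
        blue_win = int(_is_true(blue[0]["win"]))
        matches.append((champions, blue_win))
    return matches
```

Toggling `drop_early_surrender` makes it easy to compare models trained with and without the rule, which is also a way to check the early-game-champion concern raised above.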


## Data Analysis
Before we can start training our model, we need to do some data analysis to get a better understanding of the data. This will help us decide which features to use and how to process them. It can also help us with evaluating the performance of our models later on.

### Overall Win Rate
The first thing we look at is the overall win rate of the blue side. Since the game is not symmetrical, we can't assume that the win rate is 50%. In fact, during most patches, the blue side (bottom left) has a slightly higher win rate than the red side. This is usually attributed to factors such as the camera angle and the positions of the minimap and the HUD. The blue side also has a slight advantage in champion select, since it picks first.

This overall win rate gives us a baseline for our model. If our model is not able to beat this baseline, then it is not very useful. The overall win rate is calculated by dividing the number of wins by the total number of matches.
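To check that the blue-side advantage is not just noise, the observed win rate can be compared with 50% using a one-sample z-test on a proportion (a normal approximation to the binomial), $z = (\hat{p} - 0.5) / \sqrt{0.5 \cdot 0.5 / n}$. This is an illustrative sketch rather than part of the original analysis; the win count 120580 is derived from the training-set average printed further down (0.5174 over 233040 matches).

```python
import math


def blue_side_z_score(blue_wins: int, games: int, null_rate: float = 0.5) -> float:
    """z statistic of the observed blue win rate against a null win rate (normal approximation)."""
    observed = blue_wins / games
    standard_error = math.sqrt(null_rate * (1 - null_rate) / games)
    return (observed - null_rate) / standard_error


# roughly the training-set figures: about 0.5174 of 233040 matches won by blue
print(round(blue_side_z_score(120580, 233040), 1))
```

With this many matches, even a 1.7 percentage point edge gives a z-score far above the usual threshold of about 2, so the baseline is meaningfully different from a coin flip.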

In [None]:
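# overall blue side win rate: the last column of each converted match is 1 if blue won, 0 otherwise
import numpy as np
blue_win_rate = np.mean(train[:, -1].astype(float))
print("overall blue side win rate: ", blue_win_rate)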

### Champion Win Rate
Next, we look at the win rate of each champion. This gives us an idea of how strong each champion is, and also shows which champions are the most and least popular. With this, we can check whether our model predicts the outcome of the game better than a simple heuristic such as favoring the team with the more popular champions. By pitting champions with high win rates against champions with low win rates, we can also check whether our models predict the expected outcome.
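A minimal sketch of the per-champion statistics, assuming each converted match is a pair of 10 champions (names or indices) with the first five on blue and the last five on red, plus a 1/0 blue-win label. The champion order inside the real converted arrays is not documented here, so `champion_win_rates` and the blue-first ordering are assumptions. Champions with few games will have noisy win rates, so the pick count is returned alongside.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def champion_win_rates(matches: Iterable[Tuple[List[Hashable], int]]) -> Dict[Hashable, Tuple[float, int]]:
    """Return {champion: (win_rate, games_played)} from (10 champions, blue_win) pairs.

    Assumes the first five champions belong to the blue team and the last five to red.
    """
    wins: Counter = Counter()
    games: Counter = Counter()
    for champions, blue_win in matches:
        for slot, champion in enumerate(champions):
            on_blue = slot < 5
            games[champion] += 1
            if on_blue == bool(blue_win):  # this champion's team won
                wins[champion] += 1
    return {champ: (wins[champ] / games[champ], games[champ]) for champ in games}
```

Sorting the result by `games_played` gives the popularity ranking, and sorting by `win_rate` (ideally restricted to champions with enough games) gives the strength ranking.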

In [6]:
# split into x and y
train_x, train_y = train[:, :-1], train[:, -1]
train_filtered_x, train_filtered_y = train_filtered[:, :-1], train_filtered[:, -1]
val_x, val_y = val[:, :-1], val[:, -1]
test_x, test_y = test[:, :-1], test[:, -1]

# convert y to float and to correct shape
def conv_y(y):
    y = y.astype(float)
    y = y.reshape(-1, 1)
    return y

train_y, train_filtered_y = conv_y(train_y), conv_y(train_filtered_y)
val_y, test_y = conv_y(val_y), conv_y(test_y)


# convert champion ids to indices and then one-hot encode
from champion_dicts import ChampionConverter

# see champion_dicts.py for more info
# we have to convert the champion ids from the data into indices, since the ids are not contiguous
# (some ids are 500+, but there are less than 170 champions)
champ_converter = ChampionConverter()

def conv_x(x):
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = champ_converter.get_champion_index_from_id(x[i, j])
    return x

train_x, train_filtered_x = conv_x(train_x), conv_x(train_filtered_x)
val_x, test_x = conv_x(val_x), conv_x(test_x)


import numpy as np

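# one-hot encode the champions, used by simple models
CHAMP_NUM = 170  # upper bound on the number of champions (there are slightly fewer), leaving room for new champions without changing the model
def one_hot_encode(x):
    # x holds 1-based champion indices with shape (n_matches, 10); the result has shape (n_matches, 10, CHAMP_NUM)
    # float32 halves memory compared to the float64 default (the train set alone is ~233k x 10 x 170 values)
    idx = x.astype(int) - 1
    one_hot = np.zeros((x.shape[0], x.shape[1], CHAMP_NUM), dtype=np.float32)
    rows = np.arange(x.shape[0])[:, None]  # broadcasts against idx, shape (n_matches, 1)
    cols = np.arange(x.shape[1])[None, :]  # shape (1, 10)
    one_hot[rows, cols, idx] = 1  # vectorized replacement for the nested Python loops
    return one_hot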

train_x_1hot = one_hot_encode(train_x)
train_filtered_x_1hot = one_hot_encode(train_filtered_x)
val_x_1hot = one_hot_encode(val_x)
test_x_1hot = one_hot_encode(test_x)
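The id-to-index conversion and the one-hot layout can be illustrated without NumPy. `build_index_map` and `one_hot_match` below are hypothetical helpers, not the real `ChampionConverter` from `champion_dicts.py`, whose mapping may differ; they assume 1-based indices, matching the `- 1` offset used in `one_hot_encode`.

```python
from typing import Dict, Iterable, List

CHAMP_NUM = 170  # same upper bound as in the notebook


def build_index_map(champion_ids: Iterable[int]) -> Dict[int, int]:
    """Map sparse champion ids (some above 500) to contiguous 1-based indices."""
    return {cid: i + 1 for i, cid in enumerate(sorted(set(champion_ids)))}


def one_hot_match(indices: List[int], num_champions: int = CHAMP_NUM) -> List[List[int]]:
    """One row of length num_champions per player slot, with a 1 at position index - 1."""
    rows = []
    for index in indices:
        if not 1 <= index <= num_champions:
            raise ValueError(f"champion index {index} outside 1..{num_champions}")
        row = [0] * num_champions
        row[index - 1] = 1
        rows.append(row)
    return rows
```

The range check is the main design point: in the NumPy version, an index of 0 would become -1 and silently set the last column instead of raising an error.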




In [8]:
# calculate average win chance, we should at least beat this :)
avg_win_chance = np.average(train_y)
print("average blue side win chance: ", avg_win_chance)

average blue side win chance:  0.5174219018194302


In [11]:
import tensorflow as tf



class TrivialModel(tf.keras.Model):
    """A trivial model that always predicts the average win chance"""
    def __init__(self):
        super(TrivialModel, self).__init__()
        self.prediction = float(avg_win_chance)

    def call(self, inputs):
        # return a tensor of shape (batch_size, 1), matching the sigmoid output of the other models
        batch_size = tf.shape(inputs)[0]
        return tf.fill([batch_size, 1], tf.constant(self.prediction, dtype=tf.float32))


# baseline model, just some dense layers
class BaselineModel(tf.keras.Model):
    def __init__(self):
        super(BaselineModel, self).__init__()
        # input_shape is not used by a subclassed model (shapes are inferred on the first call), so it is omitted
        self.dense1 = tf.keras.layers.Dense(32, activation='relu')
        self.dense2 = tf.keras.layers.Dense(256, activation='relu')
        self.dense4 = tf.keras.layers.Dense(128, activation='relu')
        self.dense5 = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        x = tf.reshape(inputs, (-1, 10, CHAMP_NUM))
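        # the same dense layer is applied to every player slot (Dense acts on the last axis)
        x = self.dense1(x)
        # shape = (-1, 10, 32)
        # flatten the 10 per-player outputs into one vector per match
        x = tf.reshape(x, (-1, 32*10))
        # 3 dense layers, the last one outputs the blue win probability with shape (-1, 1)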
        x = self.dense2(x)
        x = self.dense4(x)
        return self.dense5(x)
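Because `dense1` receives a `(batch, 10, CHAMP_NUM)` tensor, Keras applies the same kernel to each of the 10 player slots, so the first layer's weights are shared across players. The plain-Python arithmetic below counts trainable parameters by hand to show what this sharing saves; it assumes the layer sizes in `BaselineModel` above, and `base_model.summary()` should report the same total once the model has been built.

```python
def dense_params(inputs: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


PLAYERS, CHAMP_NUM = 10, 170

shared = dense_params(CHAMP_NUM, 32)       # dense1, reused for all 10 player slots
head = (dense_params(32 * PLAYERS, 256)    # dense2 on the flattened (10 * 32) vector
        + dense_params(256, 128)           # dense4
        + dense_params(128, 1))            # dense5, sigmoid output
print("shared per-player layer:", shared)  # 5472
print("total:", shared + head)             # 120673

# for comparison: an unshared first layer producing the same 320 outputs from the flat 1700-dim input
print("unshared first layer:", dense_params(CHAMP_NUM * PLAYERS, 32 * PLAYERS))  # 544320
```

Sharing the first layer means a champion gets the same 32-dimensional representation in every slot, while the flattening step still lets later layers distinguish blue slots from red ones.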

In [12]:
# train baseline model

base_model = BaselineModel()
base_model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

base_model.fit(train_x_1hot, train_y, epochs=10, batch_size=32, validation_data=(val_x_1hot, val_y))

# evaluate baseline model
print("baseline model:")
base_model.evaluate(test_x_1hot, test_y)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
baseline model:


[0.7859109044075012, 0.511469841003418]

In [13]:
# same architecture, trained on the filtered data (note: 5 epochs instead of 10)
base_model_filtered = BaselineModel()
base_model_filtered.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

base_model_filtered.fit(train_filtered_x_1hot, train_filtered_y, epochs=5, batch_size=32, validation_data=(val_x_1hot, val_y))

# evaluate the filtered model on the same, unfiltered test set
print("baseline model filtered:")
base_model_filtered.evaluate(test_x_1hot, test_y)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
baseline model filtered:


[0.6937665939331055, 0.5292345881462097]
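Both evaluations return `[loss, accuracy]`, which are easier to read next to a constant predictor. Always predicting a probability close to the average blue win chance gives a binary cross-entropy of about ln 2 ≈ 0.693 and an accuracy equal to the blue win rate of the evaluated set. Assuming the test set's blue win rate is close to the 0.517 seen in training, the unfiltered model (loss 0.786, accuracy 0.511) does worse than always predicting a blue win, which suggests overfitting over its 10 epochs, while the filtered model (loss 0.694, accuracy 0.529) is only slightly better; note that the two runs also differ in epoch count. The sketch below computes these reference numbers for any label list; `constant_baseline` is an illustrative helper playing the role `TrivialModel` is meant to play, and `test_y` would be passed in flattened to plain 0/1 values.

```python
import math
from typing import Sequence, Tuple


def constant_baseline(labels: Sequence[int], p: float) -> Tuple[float, float]:
    """Binary cross-entropy and accuracy of always predicting probability p for a blue win."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    n = len(labels)
    positives = sum(labels)
    bce = -(positives * math.log(p) + (n - positives) * math.log(1 - p)) / n
    # Keras-style 0.5 threshold: a constant p predicts the same class for every match
    predicted = 1 if p > 0.5 else 0
    accuracy = sum(1 for y in labels if y == predicted) / n
    return bce, accuracy
```

For example, `constant_baseline([int(y) for y in test_y.ravel()], avg_win_chance)` would give the loss and accuracy that any useful model has to beat on the test set.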