# League of Legends Game Predictor

This project predicts which team wins a professional game of League of Legends from the first fifteen minutes of the game. League of Legends is a multiplayer online battle arena (MOBA) in which two teams of five players play against each other. Each player chooses one of 139 champions. The map is split into three lanes named for their location: top, mid, and bot. The area between the lanes is the jungle. Gold is acquired in several ways throughout the game and is used to buy items, which usually help with dealing damage to other players. To win the game, a team must destroy the nexus in the other team's base. To reach the nexus, all turrets and inhibitors in one lane must be destroyed. The jungle contains objectives, such as dragons, Baron, and the Rift Herald, that a team can take to boost its stats for either a permanent or a temporary advantage. Several approaches were tried to see whether there was an optimal way to process the data.

In [1]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import keras
from keras.models import Model, Sequential
from keras.layers import Dense, Activation
import keras.backend as K
from keras import optimizers
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn import linear_model
from sklearn import svm
K.clear_session()

Using TensorFlow backend.


This function was made to grab data from a csv file. The function converts a string of numbers into an array of float data points. We did this because when we imported the csv data into a dataframe our cells were strings.

In [2]:
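def grabData(strArray):
    resultArray = []
    stringValue = ""
    for i in strArray:
        # Skip brackets and spaces; a comma ends the current number
        if i != '[' and i != ']' and i != ' ':
            if i != ',':
                stringValue+=i
            else:
                resultArray.append(float(stringValue))
                stringValue = ""
    # The last number has no trailing comma, so append it here
    if stringValue != "":
        resultArray.append(float(stringValue))
    return resultArray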
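
As a point of comparison, the same parsing can be sketched with the standard library. This is only a sketch: `parse_number_list` is a hypothetical helper, and it assumes every cell is a flat, Python-style list literal such as `"[1.0, 2.5, 3]"`, which may not hold for every column in the csv.

```python
import ast

def parse_number_list(cell):
    """Parse a string such as "[1.0, 2.5, 3]" into a list of floats.

    Hypothetical helper; assumes the cell is a flat list literal.
    """
    values = ast.literal_eval(cell)   # safely evaluates the literal, no arbitrary code
    return [float(v) for v in values]

print(parse_number_list("[1.0, 2.5, 3]"))   # [1.0, 2.5, 3.0]
print(parse_number_list("[]"))              # []
```

Unlike character-by-character parsing, this also handles a single-element list and the final element without special cases.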

This function grabs the first fifteen minutes of data from each array. If an array has fewer than 15 values, it is padded with zeros. This is needed because it is not uncommon for a feature to have no values at 15 minutes. For example, in many games no dragon has been taken by 15 minutes because the game is very evenly matched.

In [3]:
def grabFirstFifteen(grabData, dataArray):
    FirstFifteen = np.zeros(15)
    for i in dataArray:
        Temp = grabData(i)[0:15]
        if len(Temp) == 0:
            Temp = np.zeros(15)
        elif len(Temp) < 15:
            Temp = np.pad(Temp, (0, 15-len(Temp)), 'constant', constant_values = 0)
        FirstFifteen = np.vstack((FirstFifteen, Temp))
    FirstFifteen = np.delete(FirstFifteen, 0, 0)
    return FirstFifteen
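
The truncate-and-pad step can also be sketched by filling a preallocated array, which avoids stacking one row at a time. The name `first_n_padded` and its arguments are made up for this illustration, and the rows are assumed to be already-parsed lists of numbers.

```python
import numpy as np

def first_n_padded(rows, n=15):
    """Keep the first n values of each row and pad short rows with zeros.

    Hypothetical helper; `rows` is a list of already-parsed number lists.
    """
    out = np.zeros((len(rows), n))
    for i, row in enumerate(rows):
        head = row[:n]                # truncate long rows
        out[i, :len(head)] = head     # short or empty rows keep trailing zeros
    return out

example = first_n_padded([[1, 2, 3], [], list(range(20))], n=5)
print(example)
```

With `n=5`, the first row becomes `[1, 2, 3, 0, 0]`, the empty row is all zeros, and the long row is cut to `[0, 1, 2, 3, 4]`.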

This function returns the number of objectives taken before 15 minutes when given an array of times. It takes the array generated by the previous function and counts how many objectives were taken. The data from the csv is represented as time stamps of when each objective was taken, so we replace the time stamps with 1s and 0s. We then sum each row to get the number of objectives taken in each game.

In [4]:
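def countArray(array):
    padded= grabFirstFifteen(grabData, array)
    # Padding zeros mean "no objective"; move them past the 15-minute cutoff
    np.place(padded, padded == 0, [50])
    # Time stamps at or before 15 minutes count as 1, later ones as 0
    np.place(padded, padded <= 15, [1])
    np.place(padded, padded > 15, [0])
    # Sum each row to get the number of objectives per game
    num = np.sum(padded, axis = 1)
    return num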
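
To make the counting concrete, here is a small worked example on made-up time stamps. It uses a boolean mask instead of `np.place`, which should give the same counts under the assumption that real time stamps are positive and zeros are padding. The name `count_before` is hypothetical.

```python
import numpy as np

def count_before(padded, cutoff=15):
    """Count the nonzero time stamps at or before `cutoff` in each row."""
    taken_in_time = (padded > 0) & (padded <= cutoff)
    return taken_in_time.sum(axis=1)

# Made-up padded time stamps (in minutes) for three games
padded = np.array([
    [5.2, 12.0, 20.1, 0.0],   # two objectives before 15 minutes
    [0.0, 0.0, 0.0, 0.0],     # no objectives at all
    [14.9, 16.3, 0.0, 0.0],   # one objective before 15 minutes
])
print(count_before(padded))   # [2 0 1]
```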

This function was used to find the win rate of a champion depending on the lane that they were played in. Out of the 139 champions playable, there are a handful of champions that are considered "meta" and are frequently played. From these frequently played champions we can get their win rates and use that as a feature to measure trends in champion selection.

In [5]:
def champWinRate(laneChamps, result):
    winRate = 0
    uniqueChampNames = np.unique(laneChamps, return_index=False)
    numWins = np.zeros(len(uniqueChampNames))
    timesPlayed = np.zeros(len(uniqueChampNames))
    ChampWinRate = []
    for i in range(0, len(uniqueChampNames)):
        for j in range(0, len(laneChamps)):
            if uniqueChampNames[i] == laneChamps[j]:
                timesPlayed[i]+=1
                if result[j] == 1:
                    numWins[i]+=1
    winRate = numWins / timesPlayed
    for i in range(0, len(laneChamps)):
        for j in range(0, len(uniqueChampNames)):
            if uniqueChampNames[j] == laneChamps[i]:
                ChampWinRate = np.append(ChampWinRate, winRate[j])
    return ChampWinRate
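
The nested loops above compare every game against every champion name. The same per-game win rate can be sketched with `np.unique` and `np.bincount`. This is an illustrative alternative, not the code used for the results in this notebook, and `champ_win_rate_vectorized` is a made-up name.

```python
import numpy as np

def champ_win_rate_vectorized(lane_champs, result):
    """Return, for each game, the win rate of the champion played in that game."""
    lane_champs = np.asarray(lane_champs)
    result = np.asarray(result, dtype=float)
    # inverse[k] is the index of game k's champion in the sorted unique names
    _, inverse = np.unique(lane_champs, return_inverse=True)
    inverse = inverse.ravel()
    times_played = np.bincount(inverse)
    num_wins = np.bincount(inverse, weights=result)
    win_rate = num_wins / times_played
    return win_rate[inverse]

# Made-up example: Ahri wins 2 of 3 games, Zed wins 1 of 2
champs = np.array(["Ahri", "Zed", "Ahri", "Zed", "Ahri"])
wins = np.array([1, 0, 0, 1, 1])
print(champ_win_rate_vectorized(champs, wins))
```

In this example every Ahri game gets 2/3 and every Zed game gets 0.5.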

The following code parses the data to extract what was needed. The features wanted were the times at which turrets, inhibitors, dragons, and Rift Heralds were taken. The game also has an objective called Baron, but it does not spawn until after the first fifteen minutes, so it is not taken into account. The champions played in each game were used to find each champion's win rate.

Any game in which Azir was played in the mid lane was omitted because of his passive skill, which temporarily resurrects a destroyed turret. The resurrected turret does not grant any significant amount of gold and serves only as a temporary defense, but it is counted as a destroyed turret in our data set.

In [6]:
LoL = pd.read_csv('_LeagueofLegends.csv')
LoL = LoL[LoL.redMiddleChamp != 'Azir']
LoL = LoL[LoL.blueMiddleChamp != 'Azir']
goldDiff15 = grabFirstFifteen(grabData, np.array(LoL['golddiff'])) #blueteam - redteam

NumBlueHeralds=countArray(np.array(LoL['bHeralds']))
NumRedHeralds=countArray(np.array(LoL['rHeralds']))
NumRedDragons=countArray(np.array(LoL['rDragons']))
NumBlueDragons=countArray(np.array(LoL['bDragons']))
NumRedInhibs=countArray(np.array(LoL['rInhibs']))
NumBlueInhibs=countArray(np.array(LoL['bInhibs']))
NumRedTowers =countArray(np.array(LoL['rTowers']))
NumBlueTowers =countArray(np.array(LoL['bTowers']))
blueResult = np.array(LoL['bResult'])
goldBlueMid = countArray(np.array(LoL['goldblueMiddle']))
goldBlueTop = countArray(np.array(LoL['goldblueTop']))
goldBlueJung = countArray(np.array(LoL['goldblueJungle']))
goldBlueADC = countArray(np.array(LoL['goldblueADC']))
goldBlueSupp = countArray(np.array(LoL['goldblueSupport']))

blueJungChamps = np.array(LoL['blueJungleChamp'])
blueMidChamps = np.array(LoL['blueMiddleChamp'])
blueADCChamps = np.array(LoL['blueADCChamp'])
blueSupChamps = np.array(LoL['blueSupportChamp'])
blueTopChamps = np.array(LoL['blueTopChamp'])

redJungChamps = np.array(LoL['redJungleChamp'])
redMidChamps = np.array(LoL['redMiddleChamp'])
redADCChamps = np.array(LoL['redADCChamp'])
redSupChamps = np.array(LoL['redSupportChamp'])
redTopChamps = np.array(LoL['redTopChamp'])

blueJungWinRate = champWinRate(blueJungChamps, blueResult)
redJungWinRate = champWinRate(redJungChamps, blueResult)
blueMidWinRate = champWinRate(blueMidChamps, blueResult)
redMidWinRate = champWinRate(redMidChamps, blueResult)
blueAdcWinRate = champWinRate(blueADCChamps, blueResult)
redAdcWinRate = champWinRate(redADCChamps, blueResult)
blueSupWinRate = champWinRate(blueSupChamps, blueResult)
redSupWinRate = champWinRate(redSupChamps, blueResult)
blueTopWinRate = champWinRate(blueTopChamps, blueResult)
redTopWinRate = champWinRate(redTopChamps, blueResult)

The data was separated into training and testing sets using a random permutation of the games. The gold difference over the first 15 minutes is split and scaled on its own, so it is not part of the feature matrix built here.

In [7]:
ntr = 3128
nts = 100

permIndex = np.random.permutation(ntr+nts)
goldDiffTr = preprocessing.scale(goldDiff15[permIndex[0:ntr]])
goldDiffTs = preprocessing.scale(goldDiff15[permIndex[ntr:]])

data=np.column_stack([NumBlueTowers, NumBlueInhibs, NumBlueDragons, NumBlueHeralds,
    NumRedTowers, NumRedInhibs, NumRedDragons, NumRedHeralds,blueMidWinRate,
    blueJungWinRate, blueAdcWinRate, blueSupWinRate, blueTopWinRate,
    redMidWinRate, redJungWinRate, redAdcWinRate, redSupWinRate, redTopWinRate])
datatr = data[permIndex[0:ntr]]
datats = data[permIndex[ntr:]]
ytr = blueResult[permIndex[0:ntr]]
yts = blueResult[permIndex[ntr:]]
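
The cell above scales the training and testing sets independently. A common variation, shown here only as a sketch and not as the method used for the results in this notebook, is to compute the mean and standard deviation on the training rows and reuse them for the test rows. The function name and arguments are hypothetical.

```python
import numpy as np

def split_and_scale(features, labels, n_train, seed=0):
    """Shuffle, split, and standardize using training-set statistics only."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(features))
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    mean = features[train_idx].mean(axis=0)
    std = features[train_idx].std(axis=0)
    std[std == 0] = 1.0   # avoid dividing by zero for constant columns

    x_train = (features[train_idx] - mean) / std
    x_test = (features[test_idx] - mean) / std
    return x_train, x_test, labels[train_idx], labels[test_idx]

# Made-up data: 20 games, 2 features
features = np.arange(40, dtype=float).reshape(20, 2)
labels = np.arange(20) % 2
x_tr, x_ts, y_tr, y_ts = split_and_scale(features, labels, n_train=15)
print(x_tr.shape, x_ts.shape)   # (15, 2) (5, 2)
```

Passing a fixed `seed` also makes the split reproducible from run to run.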

# Neural Network Into Logistic Regression

The first approach to this project was to put the gold difference of the first 15 minutes into a neural network with a binary classifier. The output of this was then used together with all of the other features in a logistic regression. This seemed logical because sometimes the shape of the gold curve can be more important than the gold values themselves.

In [8]:
numfit = 20
nin = goldDiffTr.shape[1]  # dimension of input data
nh = 200 # number of hidden units
nout = 1    # number of outputs
model = Sequential()
model.add(Dense(nh, input_shape=(nin,), activation = 'sigmoid', name='hidden'))
model.add(Dense(nout, activation='hard_sigmoid', name='output'))

opt = optimizers.Adam(lr = 0.001)
model.compile(optimizer=opt,loss='binary_crossentropy',metrics=['accuracy'])
for i in range(numfit):
    model.fit(goldDiffTr, ytr, epochs=10, batch_size=100, validation_data=(goldDiffTs,yts), verbose = 0)
yhattr = model.predict(goldDiffTr)
yhatts = model.predict(goldDiffTs)

In [9]:
datatr = np.column_stack([datatr, yhattr])
datats = np.column_stack([datats, yhatts])
datatr = preprocessing.scale(datatr)
datats = preprocessing.scale(datats)
regr = linear_model.LogisticRegression()
regr.fit(datatr,ytr)
yhat2=regr.predict(datats)
count=0
for i in range(0, len(yts)):
    if yhat2[i]==yts[i]:
        count+=1
acc = np.sum(count)/yts.shape[0]
print("Accuracy of LogReg: ", acc*100)

Accuracy of LogReg:  83.0
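
The counting loop used for the accuracy is equivalent to taking the mean of an element-wise comparison, which is the form used in the SVM cells. A small check on made-up labels:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])   # made-up labels for illustration
y_pred = np.array([1, 0, 0, 1, 0])   # made-up predictions

# Loop form, as in the logistic regression cell
count = 0
for i in range(len(y_true)):
    if y_pred[i] == y_true[i]:
        count += 1
acc_loop = count / y_true.shape[0]

# Vectorized form: fraction of positions where prediction equals label
acc_mean = np.mean(y_pred == y_true)

print(acc_loop, acc_mean)
```

Both forms give 0.8 here, since four of the five predictions match.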


# Neural Network into SVM

Another approach was to take the output from the neural network into an SVM instead of a logistic regression.

In [10]:
svc = svm.SVC(probability=False,  kernel="rbf", C=2.8, gamma=.0073,verbose=10)
svc.fit(datatr, ytr)
yhatsvm = svc.predict(datats)
svmacc = np.mean(yhatsvm == yts)
print('SVM Acc: {0:f}'.format(svmacc*100))

[LibSVM]SVM Acc: 84.000000


# Only Neural Network

We also tried putting all of the parameters into one neural network. The gold difference over the first 15 minutes was used as a direct input, together with the other features.

In [11]:
data=np.column_stack([NumBlueTowers, NumBlueInhibs, NumBlueDragons, NumBlueHeralds,
    NumRedTowers, NumRedInhibs, NumRedDragons, NumRedHeralds,
    blueMidWinRate, blueJungWinRate, blueAdcWinRate, blueSupWinRate, blueTopWinRate,
    redMidWinRate, redJungWinRate, redAdcWinRate, redSupWinRate, redTopWinRate,
    goldDiff15])

datascaled = preprocessing.scale(data)
Xtr = datascaled[permIndex[0:ntr]]
Xts = datascaled[permIndex[ntr:]]
ytr = blueResult[permIndex[0:ntr]]
yts = blueResult[permIndex[ntr:]]
numfit = 20
nin = Xtr.shape[1]  # dimension of input data
nh = 200 # number of hidden units
nout = 1    # number of outputs
model = Sequential()
model.add(Dense(nh, input_shape=(nin,), activation = 'sigmoid', name='hidden'))
model.add(Dense(nout, activation='hard_sigmoid', name='output'))

opt = optimizers.Adam(lr = 0.001)
model.compile(optimizer=opt,loss='binary_crossentropy',metrics=['accuracy'])
for i in range(numfit):
    model.fit(Xtr, ytr, epochs=10, batch_size=100, validation_data=(Xts,yts), verbose = 0)
yhat = model.predict(Xts)

np.place(yhat, yhat >= 0.5, [1])
np.place(yhat, yhat <0.5, [0])
yhat = np.ravel(yhat)
accuracyNN = np.mean(yhat == yts)
print("Accuracy Neural Net: ", accuracyNN*100)

Accuracy Neural Net:  84.0


# Only SVM

Another approach was to put all of the features directly into an SVM classifier.

In [12]:
svc = svm.SVC(probability=False,  kernel="rbf", C=2.8, gamma=.0073,verbose=10)
svc.fit(Xtr, ytr)
yhatsvm = svc.predict(Xts)
svmacc = np.mean(yhatsvm == yts)
print('SVM Acc: {0:f}'.format(svmacc*100))

[LibSVM]SVM Acc: 82.000000


# Only Logistic Regression

The last approach taken was to put all of the data into a logistic regression.

In [13]:
regr = linear_model.LogisticRegression()
regr.fit(Xtr,ytr)
yhat2=regr.predict(Xts)
count=0
for i in range(0, len(yts)):
    if yhat2[i]==yts[i]:
        count+=1
acc = np.sum(count)/yts.shape[0]
print("Accuracy of LogReg: ", acc*100)

Accuracy of LogReg:  82.0


# Conclusion

Each classification method produces very similar accuracies. The lowest accuracy we observed was around 75%, and the highest was 85%. This is to be expected, since there are many games where the team with an early lead is not able to convert its advantage into a victory. A team with an early disadvantage can also perform better later in the game because of late-game advantages, better teamwork, better shotcalling, team composition, and so on. For these reasons, an accuracy of 75-85% is reasonable: humans are playing the game, not robots that would make perfect decisions throughout. Of all the approaches taken in this project, none seemed to be clearly better than the others.
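
One rough way to see why the small differences between methods are hard to interpret is that, with only 100 test games, the accuracy estimate itself is noisy. The sketch below uses the binomial standard error, sqrt(p(1-p)/n), which assumes the test games are independent. The function name is made up for this illustration.

```python
import math

def accuracy_standard_error(accuracy, n_test):
    """Binomial standard error of an accuracy measured on n_test samples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_test)

# Accuracies printed above, each measured on 100 test games
results = {
    "NN into LogReg": 0.83,
    "NN into SVM": 0.84,
    "Only NN": 0.84,
    "Only SVM": 0.82,
    "Only LogReg": 0.82,
}
for name, acc in results.items():
    se = accuracy_standard_error(acc, 100)
    print("{0}: {1:.1f}% +/- {2:.1f}".format(name, acc * 100, se * 100))
```

At 83% accuracy on 100 games the standard error is just under 4 percentage points, which is larger than the 82-84% spread between the methods.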