<a href="https://colab.research.google.com/github/ObiAU/LoL-TensorFlow-Projects/blob/main/LoLWinPredictor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Winrate based on Champion bans

### General Plan/Thoughts:

* Will need to assign numerical values to the champions, or download a data set from Kaggle.

* Will be a binary classification model (win or lose).

* Will compile with binary cross-entropy loss and an accuracy metric rather than MAE/MSE. I want to measure how accurately the outcome is predicted, to see if I can use it in realistic settings.

* Open issue: how to train it. Is it necessary to take into account the user who bans the champion?

* Is it necessary/feasible to take into account the combination of 5 champion bans? That may be too complicated.

* It can be any given user.

* I will take ~50,000 ranked League games as the general dataset. I will not model combinations of champion bans, but rather which champions were banned and how that affected the win or loss. This should yield ~500,000 data units (10 bans per game) while taking champion multi-banning into account. (Maybe subset this.)

* x values will be which champions were banned.

* y values will be the game outcome.

* x and y dimensions will not be equal. I need to figure out how to sort that out.

* Say I test the model with an input of 'yasuo': it then needs to find all games where Yasuo was banned and predict whether the player would win or lose.
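
One possible way to reconcile the x and y shapes noted above is to give every game a single fixed-length feature row and a single label. The sketch below is an illustration rather than the approach used later in this notebook: it encodes each game's bans as a multi-hot vector over a champion vocabulary. The helper name `multi_hot` and the toy ban lists are assumptions made for the example.

```python
import numpy as np

def multi_hot(ban_lists, champion_ids):
    """Encode each game's banned champion IDs as a fixed-length 0/1 vector.

    Hypothetical helper: one column per champion ID in `champion_ids`.
    """
    col = {cid: i for i, cid in enumerate(champion_ids)}
    out = np.zeros((len(ban_lists), len(champion_ids)), dtype=np.float32)
    for row, bans in enumerate(ban_lists):
        for cid in bans:
            if cid in col:  # IDs outside the vocabulary are ignored
                out[row, col[cid]] = 1.0
    return out

# Toy data: two games with five bans each (illustrative champion IDs)
bans = [[92, 40, 69, 119, 141], [51, 122, 17, 498, 19]]
vocab = sorted({cid for game in bans for cid in game})

X = multi_hot(bans, vocab)      # one row per game
y = np.array([1, 1]) - 1        # 'winner' is coded 1/2; shift to 0/1

print(X.shape, y.shape)
```

With this layout x has shape `(games, champions)` and y has shape `(games,)`, so only the first dimension has to match. A champion banned more than once in a game still produces a single 1 in that row.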

## Training

* This is the crux of the difficulty (post-preprocessing).

* Rather than thinking of x and y in terms of dimensions, the layers should be able to sort that out for me.

* The input layer will take an input shape argument to handle this.

* The output layer will contain 1 neuron.
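
To make the single-neuron output concrete, here is a minimal NumPy sketch of what a sigmoid output and a binary cross-entropy loss compute. It follows the textbook definition; the clipping constant `eps` is an assumption, and the exact numerics inside Keras may differ. Note that the targets must be 0/1 for this loss to be meaningful.

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores (logits) into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; eps is an assumed clipping value."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# Toy example: three games, one raw score per game from the single output neuron
logits = np.array([2.0, -1.0, 0.0])
y_true = np.array([1.0, 0.0, 1.0])   # targets must be 0/1

probs = sigmoid(logits)
loss = binary_cross_entropy(y_true, probs)
accuracy = float(np.mean((probs >= 0.5) == (y_true == 1)))

print(probs, loss, accuracy)
```

A model that always outputs 0.5 scores a loss of ln 2 (about 0.693), which is a useful reference point when reading training logs.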

In [1]:
# Import required packages
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import RMSprop
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import seaborn as sns

In [2]:
# Upload new file(s) to Colab, and define a helper to generalise the
# data extraction process

# Data extractor
def load_data(file):
  """Read a CSV file into a pandas DataFrame."""
  return pd.read_csv(file)

In [3]:
# Upload files(games.csv, champion_info2, champion_info, summoner_spell_info)
from google.colab import files
uploaded = files.upload()

Saving games.csv to games.csv


In [4]:
games = load_data('games.csv')
games.head()

Unnamed: 0,gameId,creationTime,gameDuration,seasonId,winner,firstBlood,firstTower,firstInhibitor,firstBaron,firstDragon,...,t2_towerKills,t2_inhibitorKills,t2_baronKills,t2_dragonKills,t2_riftHeraldKills,t2_ban1,t2_ban2,t2_ban3,t2_ban4,t2_ban5
0,3326086514,1504279457970,1949,9,1,2,1,1,1,1,...,5,0,0,1,1,114,67,43,16,51
1,3229566029,1497848803862,1851,9,1,1,1,1,0,1,...,2,0,0,0,0,11,67,238,51,420
2,3327363504,1504360103310,1493,9,1,2,1,1,1,2,...,2,0,0,1,0,157,238,121,57,28
3,3326856598,1504348503996,1758,9,1,1,1,1,1,1,...,0,0,0,0,0,164,18,141,40,51
4,3330080762,1504554410899,2094,9,1,2,1,1,1,1,...,3,0,0,1,0,86,11,201,122,18


In [None]:
# # use train_test_split to randomly split the data set into training and test data
# Gtrain, Gtest = train_test_split(games, test_size=0.2, random_state=5)
# Gtrain.head(), Gtest.head()

In [5]:
games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51490 entries, 0 to 51489
Data columns (total 61 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   gameId              51490 non-null  int64
 1   creationTime        51490 non-null  int64
 2   gameDuration        51490 non-null  int64
 3   seasonId            51490 non-null  int64
 4   winner              51490 non-null  int64
 5   firstBlood          51490 non-null  int64
 6   firstTower          51490 non-null  int64
 7   firstInhibitor      51490 non-null  int64
 8   firstBaron          51490 non-null  int64
 9   firstDragon         51490 non-null  int64
 10  firstRiftHerald     51490 non-null  int64
 11  t1_champ1id         51490 non-null  int64
 12  t1_champ1_sum1      51490 non-null  int64
 13  t1_champ1_sum2      51490 non-null  int64
 14  t1_champ2id         51490 non-null  int64
 15  t1_champ2_sum1      51490 non-null  int64
 16  t1_champ2_sum2      51490 non-null  int64

In [6]:
# Check if the data set contains missing values
games.isnull().sum()

gameId          0
creationTime    0
gameDuration    0
seasonId        0
winner          0
               ..
t2_ban1         0
t2_ban2         0
t2_ban3         0
t2_ban4         0
t2_ban5         0
Length: 61, dtype: int64

In [7]:
# Check for NA values (isna() is an alias of isnull(), so this repeats the check above)
games.isna().sum()

gameId          0
creationTime    0
gameDuration    0
seasonId        0
winner          0
               ..
t2_ban1         0
t2_ban2         0
t2_ban3         0
t2_ban4         0
t2_ban5         0
Length: 61, dtype: int64

In [8]:
games['winner'].value_counts() # roughly a 50/50 split between team 1 and team 2 wins

1    26077
2    25413
Name: winner, dtype: int64
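
Because the classes are nearly balanced, a useful sanity check is the accuracy of always predicting the more frequent winner. The short sketch below computes that majority-class baseline from the counts printed above. The variable names are illustrative.

```python
# Counts copied from the value_counts() output above
counts = {1: 26077, 2: 25413}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(majority_class, round(baseline_accuracy, 4))
```

Any model trained on the ban data needs to beat this figure (about 50.6%) on held-out games to show that bans carry predictive signal.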

In [9]:
# Use groupby to count how often each champion was team 2's first ban, per season ID
games.groupby(['t2_ban1'])['seasonId'].value_counts().sort_values(ascending = False)

t2_ban1  seasonId
157      9           3307
238      9           2544
31       9           2538
40       9           2308
122      9           2274
                     ... 
106      9             13
13       9             10
14       9             10
72       9              8
77       9              6
Name: seasonId, Length: 139, dtype: int64
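
The count above covers a single ban slot. As a hedged sketch of how the same idea could be extended, the function below tallies all five team 1 ban columns and reports how often team 1 won when it banned each champion. `team1_ban_stats` and the toy frame are hypothetical; only the column names (`winner`, `t1_ban1` to `t1_ban5`) come from `games.csv`.

```python
import pandas as pd

T1_BANS = [f"t1_ban{i}" for i in range(1, 6)]

def team1_ban_stats(games):
    """Per champion: games in which team 1 banned it, and team 1's win rate."""
    df = games[["winner"] + T1_BANS].copy()
    df["game"] = range(len(df))  # explicit game key so bans can be de-duplicated
    long = (
        df.melt(id_vars=["game", "winner"], value_vars=T1_BANS,
                value_name="champion_id")
          # a champion banned twice in one game counts once for that game
          .drop_duplicates(subset=["game", "champion_id"])
          .assign(team1_won=lambda d: (d["winner"] == 1).astype(int))
    )
    return (
        long.groupby("champion_id")["team1_won"]
            .agg(games_banned="count", team1_winrate="mean")
            .sort_values("games_banned", ascending=False)
    )

# Toy frame with the same column names as games.csv (values are made up)
toy = pd.DataFrame({
    "winner":  [1, 2, 1],
    "t1_ban1": [157, 157, 238],
    "t1_ban2": [238, 67, 67],
    "t1_ban3": [31, 31, 11],
    "t1_ban4": [40, 18, 40],
    "t1_ban5": [122, 18, 51],
})

stats = team1_ban_stats(toy)
print(stats)
```

On the real data, champions with few bans will have noisy win rates, so it may be worth filtering on `games_banned` before comparing them.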

In [10]:
# For our purpose we only need the t1 and t2 ban columns,
# alongside the binary win/loss label ('winner')
T1_BAN_COLS = ['t1_ban1', 't1_ban2', 't1_ban3', 't1_ban4', 't1_ban5']
T2_BAN_COLS = ['t2_ban1', 't2_ban2', 't2_ban3', 't2_ban4', 't2_ban5']

iban = games[['winner'] + T1_BAN_COLS + T2_BAN_COLS].copy()


# Create placeholder columns in the dataframe to store the ban lists
iban['Team_1_bans'] = 0
iban['Team_2_bans'] = 0

def List_Gen_Tool(df):
    """Collect each team's five ban columns into one list per game."""
    df.reset_index(drop=True, inplace=True)
    # Select the ban columns by name rather than by position, and build
    # all rows at once instead of looping over the index
    df['Team_1_bans'] = df[T1_BAN_COLS].values.tolist()
    df['Team_2_bans'] = df[T2_BAN_COLS].values.tolist()
    return df

In [11]:
iban.head()

Unnamed: 0,winner,t1_ban1,t1_ban2,t1_ban3,t1_ban4,t1_ban5,t2_ban1,t2_ban2,t2_ban3,t2_ban4,t2_ban5,Team_1_bans,Team_2_bans
0,1,92,40,69,119,141,114,67,43,16,51,0,0
1,1,51,122,17,498,19,11,67,238,51,420,0,0
2,1,117,40,29,16,53,157,238,121,57,28,0,0
3,1,238,67,516,114,31,164,18,141,40,51,0,0
4,1,90,64,412,25,31,86,11,201,122,18,0,0


In [12]:
iBan = List_Gen_Tool(iban)
iBan.head()

Unnamed: 0,winner,t1_ban1,t1_ban2,t1_ban3,t1_ban4,t1_ban5,t2_ban1,t2_ban2,t2_ban3,t2_ban4,t2_ban5,Team_1_bans,Team_2_bans
0,1,92,40,69,119,141,114,67,43,16,51,"[92, 40, 69, 119, 141]","[114, 67, 43, 16, 51]"
1,1,51,122,17,498,19,11,67,238,51,420,"[51, 122, 17, 498, 19]","[11, 67, 238, 51, 420]"
2,1,117,40,29,16,53,157,238,121,57,28,"[117, 40, 29, 16, 53]","[157, 238, 121, 57, 28]"
3,1,238,67,516,114,31,164,18,141,40,51,"[238, 67, 516, 114, 31]","[164, 18, 141, 40, 51]"
4,1,90,64,412,25,31,86,11,201,122,18,"[90, 64, 412, 25, 31]","[86, 11, 201, 122, 18]"


In [13]:
iBan['winner'].value_counts()

1    26077
2    25413
Name: winner, dtype: int64

In [14]:
# Now drop the unnecessary columns and keep only the label and the two list columns
iGames = iBan[['winner', 'Team_1_bans', 'Team_2_bans']].copy()
iGames.head()

Unnamed: 0,winner,Team_1_bans,Team_2_bans
0,1,"[92, 40, 69, 119, 141]","[114, 67, 43, 16, 51]"
1,1,"[51, 122, 17, 498, 19]","[11, 67, 238, 51, 420]"
2,1,"[117, 40, 29, 16, 53]","[157, 238, 121, 57, 28]"
3,1,"[238, 67, 516, 114, 31]","[164, 18, 141, 40, 51]"
4,1,"[90, 64, 412, 25, 31]","[86, 11, 201, 122, 18]"


In [18]:
# Keep only team 1's bans; having both team 1 and team 2 bans would not make sense here
iGames = iGames.drop(['Team_2_bans'], axis=1)

In [19]:
# check row count
len(iGames.index) # DataFrame contains 51490 rows; split into train and validation sets below

51490

In [None]:
# team1_bans = iGames['Team_1_bans'].apply(pd.Series)
# team2_bans = iGames['Team_2_bans'].apply(pd.Series)
# team1_bans = team1_bans.rename(columns=lambda x: 'Ban{}'.format(x+1))
# team2_bans = team2_bans.rename(columns=lambda x: 'Ban{}'.format(x+1))

In [None]:
# champion_ids = sorted(set(team1_bans.values.flatten()) | set(team2_bans.values.flatten()))
# champion_df = pd.DataFrame(0, index=iGames.index, columns=champion_ids)


In [None]:
# for ban_col in team1_bans.columns:
#     col_champions = team1_bans[ban_col].unique()
#     for champ_id in col_champions:
#         champion_df[champ_id] |= (team1_bans[ban_col] == champ_id)
# for ban_col in team2_bans.columns:
#     col_champions = team2_bans[ban_col].unique()
#     for champ_id in col_champions:
#         champion_df[champ_id] |= (team2_bans[ban_col] == champ_id)


In [21]:
# new_iGames = pd.concat([iGames.drop(['Team_1_bans', 'Team_2_bans'], axis=1), champion_df], axis=1)
# new_iGames

In [29]:
# Split data into training and validation sets
train_data = iGames.sample(frac=0.8, random_state=42)
val_data = iGames.drop(train_data.index)

# Create train and validation features
train_features = np.array(train_data['Team_1_bans'].tolist())
val_features = np.array(val_data['Team_1_bans'].tolist())

# Create training and validation labels
train_labels = np.array(train_data['winner'].tolist())
val_labels = np.array(val_data['winner'].tolist())

# # Convert data to TensorFlow tensors
# train_dataset = tf.data.Dataset.from_tensor_slices((train_data.iloc[:,3:], train_data.iloc[:,0]))
# test_dataset = tf.data.Dataset.from_tensor_slices((test_data.iloc[:,3:], test_data.iloc[:,0]))


In [30]:
train_features, train_labels 

(array([[ 27,  84,  31,  51,  28],
        [122, 238,  12, 516,  29],
        [  6, 238, 157,  75, 141],
        ...,
        [141,  75,  18,  25,  23],
        [ 81,  64, 141,   7, 157],
        [ 29,  80, 154,  19, 105]]), array([1, 1, 1, ..., 1, 2, 2]))

In [31]:
len(train_labels), len(train_features), len(val_labels), len(val_features)

(41192, 41192, 10298, 10298)

In [33]:
# Build simple LoLWinPredictor model
LoLWinPredictor = tf.keras.Sequential([
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile model
LoLWinPredictor.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

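# Train model
# 'winner' is coded 1/2, but binary cross-entropy with a sigmoid output
# expects 0/1 targets, so shift the labels (0 = team 1 won, 1 = team 2 won)
LoLhistory = LoLWinPredictor.fit(train_features, train_labels - 1,
                    epochs=15, batch_size = 32,
                    validation_data=(val_features, val_labels - 1))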
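
Once the model is trained, its sigmoid outputs still have to be turned into class predictions before accuracy can be read off. The sketch below shows one way to do that with a 0.5 threshold and a 2x2 confusion matrix in plain NumPy. The function `evaluate_binary` and the toy probabilities are assumptions standing in for real model output, and labels are assumed to be remapped to 0/1 (0 = team 1 won, 1 = team 2 won).

```python
import numpy as np

def evaluate_binary(y_true, probs, threshold=0.5):
    """Threshold probabilities and return (confusion matrix, accuracy).

    Matrix layout is [[TN, FP], [FN, TP]]: rows are true classes,
    columns are predicted classes.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = (np.asarray(probs).ravel() >= threshold).astype(int)
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    cm = np.array([[tn, fp], [fn, tp]])
    accuracy = (tp + tn) / len(y_true)
    return cm, accuracy

# Toy stand-ins: raw 'winner' labels and made-up predicted probabilities
winner = np.array([1, 2, 2, 1, 2])
y_true = winner - 1                                      # 0 = team 1 won, 1 = team 2 won
probs = np.array([[0.2], [0.7], [0.4], [0.6], [0.9]])    # shape (n, 1), like a 1-neuron output

cm, acc = evaluate_binary(y_true, probs)
print(cm)
print(acc)
```

The `confusion_matrix` and `accuracy_score` functions imported at the top of this notebook could be used in place of the hand-rolled counts.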
