First you will have to install some packages.

In [None]:
!pip install -q scikit-learn

In [None]:
!pip install -q "tensorflow>=2" tfds-nightly matplotlib

In [None]:
!pip install -q kaggle

The following line is required only if you are running this notebook on Google Colab.

In [None]:
%tensorflow_version 2.x

In this notebook, I will be using a dataset from Kaggle.  
To load the dataset without first downloading it from Kaggle and then uploading it here, you can get a key from Kaggle's API, which comes in a file named ```kaggle.json```.
Once you have this file, run the following cell and upload it when prompted.

In [None]:
# Upload the Kaggle API key file (files.upload() is available only on Google Colab)
from google.colab import files
uploaded = files.upload()

After uploading ```kaggle.json```, the next two cells make sure that the file is in the right directory and that only you have permission to read it.<br>
The cell after them downloads the dataset we will be working with.

In [None]:
!mkdir -p ~/.kaggle
!cp /content/kaggle.json ~/.kaggle/kaggle.json

In [8]:
!chmod 600 ~/.kaggle/kaggle.json
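
If you want to confirm that the permission change took effect, a minimal sketch like the one below can help. It assumes a POSIX system such as Colab, the helper name `is_private` is hypothetical, and it runs on a throwaway file rather than on the real key.

```python
import os
import stat
import tempfile

def is_private(path):
    """Return True if the file mode is exactly 600 (owner read/write only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600

# Demonstrate on a temporary file instead of the real kaggle.json.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.chmod(demo_path, 0o600)
private_after_chmod = is_private(demo_path)

os.chmod(demo_path, 0o644)  # readable by everyone
private_when_world_readable = is_private(demo_path)

os.remove(demo_path)
print(private_after_chmod, private_when_world_readable)
```

On the real key you would call `is_private(os.path.expanduser('~/.kaggle/kaggle.json'))`.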

In [None]:
!kaggle datasets download -d gyejr95/league-of-legends-challenger-ranked-games2020

A quick check: if everything went smoothly, running the following command should list these files:<br>
```kaggle.json```, ```league-of-legends-challenger-ranked-games2020.zip```.

In [None]:
!ls

Here are the imports we will use in this notebook.
Note that the line ```%matplotlib inline``` is required only if you are running from a notebook.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf
from zipfile import ZipFile
import seaborn as sns
from google.colab import files
%matplotlib inline
sns.set_style('darkgrid')

Since there are actually 3 datasets zipped together in the dataset we have downloaded, here we open each file and create a dictionary which will be of the following format: ```{file_name: fileObject}```.<br>
Try ```print(dfs)``` if the format is not clear to you.

In [None]:
zip_file = ZipFile('league-of-legends-challenger-ranked-games2020.zip')

dfs = {text_file.filename: zip_file.open(text_file.filename)
       for text_file in zip_file.infolist()
       if text_file.filename.endswith('.csv')} 
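
To see the ```{file_name: fileObject}``` format without the Kaggle download, here is a small sketch that builds a zip archive in memory and applies the same comprehension. The file names and contents are made up for illustration.

```python
import io
from zipfile import ZipFile

# Build a small zip in memory so the example does not need the real dataset.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("a.csv", "x,y\n1,2\n")
    archive.writestr("b.csv", "x,y\n3,4\n")
    archive.writestr("notes.txt", "not a csv")

# Same pattern as above: keep only the .csv members, keyed by file name.
demo_zip = ZipFile(buffer)
demo_files = {info.filename: demo_zip.open(info.filename)
              for info in demo_zip.infolist()
              if info.filename.endswith(".csv")}

print(sorted(demo_files))                    # ['a.csv', 'b.csv']
first_line = demo_files["a.csv"].readline()  # file objects return bytes
print(first_line)                            # b'x,y\n'
```

Each value is an open, file-like object, which is why it can be passed straight to ```pd.read_csv```.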

In this notebook I will be using only one of the files.<br>
The following lines load that file into a dataframe and make two deep copies of it, giving three dataframes: one for training, one for graphs, and one for testing.

In [36]:
# Load the data into a pandas dataframe and copy it for graphing and testing.
# Note: the test dataframe is a copy of the training data, not a held-out set.
df_train_challenger = pd.read_csv(dfs['Challenger_Ranked_Games.csv'])
df_for_graphs = df_train_challenger.copy(deep=True)
df_test_challenger = df_train_challenger.copy(deep=True)
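
Because the test dataframe above is a copy of the training data, the accuracy reported later is measured on games the model has already seen. If you prefer a held-out test set, one possible approach (not the one used in this notebook) is a random split with pandas. The sketch below uses a made-up ten-row dataframe and an assumed 80/20 ratio.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe: 10 games, two columns.
games = pd.DataFrame({"blueWins": [0, 1] * 5, "blueTowerKills": range(10)})

train_part = games.sample(frac=0.8, random_state=0)  # 80% of the rows, reproducible
test_part = games.drop(train_part.index)             # the remaining 20%

print(len(train_part), len(test_part))  # 8 2
```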

Here you can take a glimpse at what the dataset actually looks like.

In [None]:
df_for_graphs.head()

Now, from the dataframes that are not used for graphing I will pop the columns we are trying to predict.

In [None]:
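# Keep the label column (1 if the blue team won) for training and evaluation.
challenger_train = df_train_challenger[["blueWins"]]
challenger_test = df_test_challenger[["blueWins"]]

# Remove both outcome columns from the training features.
df_train_challenger.pop("blueWins")
df_train_challenger.pop("redWins")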

The 'gameId' column is not relevant for the goal of this notebook, since I will not be connecting to Riot's API to get more info about each game, so I will pop this column as well.

In [None]:
df_train_challenger.pop("gameId")
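
As a reminder of what ```pop``` does, here is a tiny sketch on a made-up dataframe: the column is removed from the dataframe in place and returned as a Series.

```python
import pandas as pd

demo = pd.DataFrame({"gameId": [1, 2], "blueWins": [1, 0], "blueKills": [20, 11]})
labels = demo.pop("blueWins")  # removes the column in place and returns it

print(list(demo.columns))  # ['gameId', 'blueKills']
print(labels.tolist())     # [1, 0]
```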

Here you can see some summary statistics about our dataset.

In [None]:
df_train_challenger.describe()

The following graphs should help visualize the dataset, show whether there are any imbalances, and hint at which columns predict a win best.

In [None]:
df_train_challenger.gameDuraton.hist(bins=50)

In [None]:
df_train_challenger.blueFirstBlood.value_counts().plot(kind='barh')

In [None]:
df_train_challenger.blueFirstDragon.value_counts().plot(kind='barh')

In [None]:
df_train_challenger.blueWardPlaced.hist(bins=50)

In [None]:
pd.concat([df_train_challenger, challenger_train], axis=1).groupby('blueFirstTower').blueWins.mean().plot(kind='barh').set_xlabel('% Blue Won')

In [None]:
pd.concat([df_train_challenger, challenger_train], axis=1).groupby('blueFirstBaron').blueWins.mean().plot(kind='barh').set_xlabel('% Blue Won')

Each column gets a correlation coefficient between -1 and 1 that measures how strongly it is linearly related to a win.<br>
Add ```print(blue_corr)``` to see the value for each column (```red_corr``` works as well).

In [None]:
blue_corr = df_for_graphs.corr()['blueWins'][:].sort_values(axis=0, ascending=False) 
red_corr = df_for_graphs.corr()['redWins'][:].sort_values(axis=0, ascending=False) 
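
To make the -1 to 1 range concrete, here is a small sketch on a made-up dataframe. Since ```redWins``` is always the opposite of ```blueWins``` in this toy data, their correlation is -1, while a column that tends to be high when blue wins gets a value close to 1.

```python
import pandas as pd

# Made-up values, chosen only to illustrate the sign and range of the scores.
toy = pd.DataFrame({
    "blueWins":       [1, 0, 1, 0, 1, 0],
    "redWins":        [0, 1, 0, 1, 0, 1],
    "blueTowerKills": [9, 2, 8, 3, 11, 1],
})

# Same pattern as above: correlate every column with blueWins, highest first.
toy_corr = toy.corr()["blueWins"].sort_values(ascending=False)
print(toy_corr)
```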

Here is a heatmap of all the columns whose correlation with a blue win has an absolute value above 0.3.

In [None]:
corr_cols = [prop for prop, corr in blue_corr.items() if abs(corr) > 0.3 and prop != 'blueWins' and prop != 'redWins']
plt.figure(figsize=(26,26))
sns.set(font_scale = 1)
sns.heatmap(df_train_challenger[corr_cols].corr(), annot=True, linewidths=.5, linecolor='black', cmap="BuPu")

And another heatmap for the columns whose correlation has an absolute value above 0.5.

In [None]:
corr_cols_2 = [prop for prop, corr in blue_corr.items() if abs(corr) > 0.5 and prop != 'blueWins' and prop != 'redWins']
plt.figure(figsize=(12,12))
sns.set(font_scale = 1)
sns.heatmap(df_train_challenger[corr_cols_2].corr(), annot=True, linewidths=.5, linecolor='black', cmap="BuPu")
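
The column selection used for both heatmaps can be tried on a small Series. The scores below are made up. Note that taking the absolute value keeps strongly negative columns as well as strongly positive ones.

```python
import pandas as pd

# Hypothetical correlation scores, for illustration only.
scores = pd.Series({"blueWins": 1.0, "blueTowerKills": 0.8, "blueWardPlaced": 0.1,
                    "redTowerKills": -0.7, "redWins": -1.0})

targets = {"blueWins", "redWins"}
strong = [name for name, value in scores.items()
          if abs(value) > 0.5 and name not in targets]

print(strong)  # ['blueTowerKills', 'redTowerKills']
```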

Now we will create the model and train it.

In [54]:
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():  # inner function, this will be returned
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000)  # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of batch_size and repeat for the number of epochs
    return ds  # return the dataset, which yields one batch at a time
  return input_function  # return a function object for use

train_input_fn = make_input_fn(df_train_challenger, challenger_train)  # make_input_fn returns a function that the estimator calls to get a tf.data.Dataset
eval_input_fn = make_input_fn(df_test_challenger, challenger_test, num_epochs=1, shuffle=False)
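
To get a feel for how much data the input functions produce, the sketch below counts the batches that ```batch(batch_size).repeat(num_epochs)``` yields. The helper ```total_batches``` and the row count of 1000 are assumptions for illustration, not the real dataset size.

```python
import math

def total_batches(n_rows, batch_size=32, num_epochs=10):
    """Number of batches from ds.batch(batch_size).repeat(num_epochs)."""
    # The last batch of each epoch may be smaller, so round up.
    return math.ceil(n_rows / batch_size) * num_epochs

print(total_batches(1000))                # 320 batches with the training defaults
print(total_batches(1000, num_epochs=1))  # 32 batches for a single evaluation pass
```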

In [51]:
feature_columns = []
for feature_name in corr_cols_2:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

In [None]:
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
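
Conceptually, a linear classifier for two classes computes a weighted sum of the features and passes it through a sigmoid to get a probability. The sketch below shows that idea in plain Python. The function name and the weights are hypothetical and are not taken from the trained model.

```python
import math

def predict_blue_win(features, weights, bias):
    """Logistic-regression style score: sigmoid(w . x + b)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, chosen only for illustration (not learned from the data).
weights = {"blueTowerKills": 0.5, "blueInhibitorKills": 1.0}

p_even = predict_blue_win({"blueTowerKills": 0, "blueInhibitorKills": 0}, weights, bias=0.0)
p_ahead = predict_blue_win({"blueTowerKills": 8, "blueInhibitorKills": 2}, weights, bias=0.0)

print(p_even)         # 0.5: with a zero score the model is undecided
print(p_ahead > 0.5)  # True: a positive score leans towards a blue win
```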

In [73]:
linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics/stats by evaluating on the test data

clear_output()
print('This model predicts LoL wins with ' + str(result['accuracy']*100) + '% accuracy')  # the result variable is simply a dict of stats about our model

This model predicts LoL wins with 90.33229351043701% accuracy


WORK IN PROGRESS: THIS PART IS NOT COMPLETED

In [None]:
# Build a DNN with 2 hidden layers with 30 and 10 hidden nodes each.
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 2 classes (blue loses or blue wins).
    n_classes=2)

In [None]:
# The classifier must be trained before it can make predictions.
classifier.train(input_fn=train_input_fn)

def input_fn(features, batch_size=256):
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

# The classifier expects a value for every feature column it was built with.
features = corr_cols_2
predict = {}

print("Please type numeric values as prompted.")
for feature in features:
    val = input(feature + ": ")
    predict[feature] = [float(val)]  # store each value inside the loop

predictions = classifier.predict(input_fn=lambda: input_fn(predict))
for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%)'.format(
        class_id, 100 * probability))
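
To clarify how ```class_ids``` and ```probabilities``` relate in each prediction, here is a plain Python sketch with a made-up probability list: the predicted class is the index of the largest probability.

```python
# Hypothetical output for one game: [P(blue loses), P(blue wins)].
probabilities = [0.12, 0.88]

# The predicted class is the position of the highest probability.
class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
probability = probabilities[class_id]

print('Prediction is "{}" ({:.1f}%)'.format(class_id, 100 * probability))
# Prediction is "1" (88.0%)
```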