In [1]:
import riotwatcher
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import json

%matplotlib inline

## Retrieving User Data from Riot API Using Riotwatcher

We can use the Python package riotwatcher (https://github.com/pseudonym117/Riot-Watcher) to utilize Riot's API (https://developer.riotgames.com/)and get relevant data of a player and their matches.

We create an instance of LolWatcher, which is an interface between Python and Riot's API, where we can call various methods to look at data from Riot's API. To do this we need an api_key from Riot games. You can do this by going on the Riot API website provided in the cell above. The API key is what let's riot know who is accessing their data. API keys should not be shared with anyone. If you are going to use riotwatcher code (or some version of this code) for anything, make sure you edit out your API key.

In [2]:
watcher = riotwatcher.LolWatcher(api_key = 'RGAPI-48517173-a31d-40b7-9425-eec44901d018')

Let's use riotwatcher to get identifying information on Doublelift

## Weird Caveats to summoners

I tried grabbing data for Doublelift and found that the data i grabbed did not match up with that of OP.gg or U.gg.

The last match that they have is from 3 days ago. I believe that they are not able to grab data as frequently as I expected, at least for today for whatever reason. I have no idea why this is the case.

Knowing this, we will only compare our data acquired to sites like OP.gg and U.gg. This will be done outside this notebook and I encourage you to cross-validate your data from these sites.

In [3]:
#We will grab username of the player of interest
summoner_name = 'puddingpiler'

#We can use the LolWatcher instance to get some data on the user's identifying information
watcher.summoner.by_name(region = 'na1', summoner_name = summoner_name)

{'id': '1-lKe0sKHoomm1cGKTMGm3usHkmg0g-WD420VYOlLW_MS1I',
 'accountId': 'TLGnXMAPXufbdp4-QHxRNZH4YxgyOh3bvVyr0KEs8ao2-Ac',
 'puuid': 'IhFJhHpKEN_vNez9P-N98_45C83HsUxDdyTAWrJYaoDdCt4SN0TPxeQ96rar0hT9njemIRezJ2gQrQ',
 'name': 'puddingpiler',
 'profileIconId': 4052,
 'revisionDate': 1647387185000,
 'summonerLevel': 413}

## Data Retrieval

We will now use riotwatcher to retrieve data from Riot Games. The data we will be retrieving here is data from the past 100 matches of the player Doublelift. Riotwatcher can only download 100 games per one call of watcher.match.matchlist_by_puuid, so that is the amount of games that we will analyze.

In [32]:
#Getting the puuid, a string identifying a player
my_puuid = watcher.summoner.by_name('na1', summoner_name)['puuid']

#How many matches do you want the data from (max is 100 for LolWatcher)
count = 10


#Grab list of {count} last match id's
match_list = watcher.match.matchlist_by_puuid(region = 'americas', puuid = my_puuid, count = count)


#We will append various data from the matches onto these empty lists
match_id = []
kills = []
deaths = []
assists = []
gold_total = []
queue_id = []
win = []


#Loop over all these matches 
for match in match_list:
    
    #We want to append the match identifier string
    match_id.append(match)
    
        
    #Grabs dictionary of match info from Riot API
    match_json = watcher.match.by_id('americas', match)
    
    
    #Grabs the part of the dictionary that contains non-metadata
    match_details = watcher.match.by_id('americas', match)['info']
    
    
    #Saving the match details to a json file for future use
    with open("Riot_API_Folder/{}_match_details_{}.json".format(summoner_name, match), "w") as outfile:
        json.dump(match_json, outfile)

    
    #Identify the player of interest in each match
    #Since there are 10 players we need a way to identify the data of the player of interest
    #I do this by looping through the players' puuid and picking the one that matches the
    #puuid of the player of interest
    for player in match_details['participants']:

        if player['puuid'] == my_puuid:
            
            #These are the metrics that we are interested in
            #There are many metrics provided by Riot's API, and in future projects
            #I will be more comprehensive with the features I choose
            kills.append(player['kills'])
            deaths.append(player['deaths'])
            assists.append(player['assists'])
            gold_total.append(player['goldEarned'])
            queue_id.append(match_details['queueId'])
            win.append(player['win'])

        else: pass


## Importing Data From JSON Files

# NOTE:

## Only run next cell of code if you have downloaded relevant JSON files from cell above

I have downloaded the json files from Riot's API from the relevant code above so we can use it later. This cell above takes a little while to run and saving data in its completeness ensures I can go back and do any kind of analysis swiftly.

In the code cell above we created lists of relevant player data. However, we may want to do some extra analysis on this data and may not want to rerun this cell above. This code cell below walks us through how to create the same lists as the code cell above. Skip this cell if you are creating the lists from the cell above.

In [51]:
import os

#Getting the filenames in the folder containing match json files
#NOTE: if there are any other files in here OR there are duplicate files, you need to do some cleaning 
#in that directory
filenames = [name for name in os.listdir('Riot_API_Folder/')]


#Getting the number of files in the folder containing match json files. 
file_counts = len(filenames)



match_id = []
kills = []
deaths = []
assists = []
gold_total = []
queue_id = []
win = []


for i in range(file_counts):
    
    #with open('Riot_API_Folder/{}'.format(filenames[i])) as match_json:
    #    match_json = json.load(match_json)
        
    match_id.append(filenames[i])
    
    for player in match_json['info']['participants']:
        
        if player['puuid'] == my_puuid:
            
            kills.append(player['kills'])
            deaths.append(player['deaths'])
            assists.append(player['assists'])
            gold_total.append(player['goldEarned'])
            queue_id.append(match_details['queueId'])
            win.append(player['win'])
            
    
            
            #Used for testing if you get the same data from other sources (Like OP.gg or U.gg)
            '''print(match_json['info']['participants'][0]['kills'],
                  match_json['info']['participants'][0]['deaths'],
                  match_json['info']['participants'][0]['assists'])'''
            
            
        else: pass

## Cleaning Data

Riot's API identifies a game type such as Ranked Solo/Duo, ARAM, Draft through identifying numbers called queue_id.The relevant ones for the data I'll be using are:
- 400 -> Draft Pick
- 420 -> Ranked Solo/Duo
- 450 -> ARAM
- 900 -> ARURF

However, Riot has an exhaustive list of all available queue ID's found here (https://static.developer.riotgames.com/docs/lol/queues.json). You can write a few lines of code that gets the queue type from the queue id that utilizes this json file, if you wish.

In [52]:
#We will create a new list that replaces these nubmers with string of match type
match_type = []

for queue in queue_id:
    
    if queue == 420:
        match_type.append('Ranked')
    
    elif queue == 450:
        match_type.append('ARAM')
        
    elif queue == 400:
        match_type.append('Draft')
        
    elif queue == 900:
        match_type.append('ARURF')
        
    else:
        match_type.append(queue)

## Creating the Pandas dataframe

In [53]:
#Joining together all data acquired through LolWatcher
df = pd.DataFrame({'match_id': match_id, 
                   'kills': kills, 
                   'deaths': deaths, 
                   'assists': assists, 
                   'gold_earned': gold_total,
                   'match_type': match_type,
                   'win': win
                  })

In [54]:
df.head()

Unnamed: 0,match_id,kills,deaths,assists,gold_earned,match_type,win
0,.ipynb_checkpoints,5,7,17,13868,Draft,False
1,puddingpiler_match_details_NA1_4245705629.json,5,7,17,13868,Draft,False
2,puddingpiler_match_details_NA1_4245731071.json,5,7,17,13868,Draft,False
3,puddingpiler_match_details_NA1_4246871711.json,5,7,17,13868,Draft,False
4,puddingpiler_match_details_NA1_4246887442.json,5,7,17,13868,Draft,False


## More Cleaning

In [37]:
#Replacing the win column of booleans with 1 or 0 instead of the True/False values given
#1 indacates a win and 0 indicates loss
df['win'] = df['win'].apply(lambda x: 1 if x == True else 0)

## Separating Game Modes

We want to separate all the different game modes from each other (ARAM, Ranked, etc.) so that we may do data analysis on data from one game type at a time.

In [38]:
#Getting a count of what kind and how many matches there are
df['match_type'].value_counts()

Draft    11
Name: match_type, dtype: int64

In [39]:
#Creating dataframes of gametypes
#Dropping the match_type columns since it will be redundant with the name of the dataframe

ARAM = df[df['match_type'] == 'ARAM'].drop(['match_type'], axis = 1)
Draft = df[df['match_type'] == 'Draft'].drop(['match_type'], axis = 1)
ARURF = df[df['match_type'] == 'ARURF'].drop(['match_type'], axis = 1)
Ranked = df[df['match_type'] == 'Ranked'].drop(['match_type'], axis = 1)


#Checking if data separated correctly
print (len(ARAM), len(Draft), len(ARURF), len(Ranked))

0 11 0 0


## Checking Out Dataframe of Ranked games

In [40]:
Ranked.head()

Unnamed: 0,match_id,kills,deaths,assists,gold_earned,win


## Predicting Wins using a Logistic Regression Model

We now want to answer the questions: 
- Can we predict if someone will win based on user metrics, such as kills, deaths, etc.?
- What independent variables are good predictors for a win or lose?

We can start answering these questions by using a logistic regression model to predict whether or not a player wins or loses where the dependent variables are those given in the dataframe created earlier.

If we find that these independent variables are not good predictors, then we can always go back and insert new types of variables from match data into our logistic regression model as an extension of this project.

## Building a Logistic Regression Model

To build any machine learning model, we must divide our avaiable data into three datasets:
- training dataset : Used to train the model to predict outcomes
- testing dataset : Data used to test the model created through the training dataset. Can tweak until we have reasonable predictions
- validation dataset : Final testing dataset for the model. If the model does not predict well enough, then we start over with our model. Either toss the model, or review if you made enough reasonable changes to your model in the first place in the testing phase.

We will not be validating this dataset as I believe there is not enough data with 100 points.

In [41]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

### Training the Model

# DELETE

In [46]:
X

Unnamed: 0,kills,deaths,assists,gold_earned


In [47]:
X = ARAM[['kills', 'deaths', 'assists', 'gold_earned']]
y = ARAM['win']

In [48]:
X

Unnamed: 0,kills,deaths,assists,gold_earned


In [50]:
y

Series([], Name: win, dtype: int64)

In [44]:



#We will split our dataframe to train and test our model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

ValueError: With n_samples=0, test_size=0.3 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

In [26]:
#X will be the independent variables for our model given by kda and gold earned
#y will be the independent variable, which is categorical and indacates a win or loss
X = Ranked[['kills', 'deaths', 'assists', 'gold_earned']]
y = Ranked['win']

#We will split our dataframe to train and test our model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

ValueError: With n_samples=0, test_size=0.3 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

In [28]:
#Creating an Instance of the Logistic regression model form scikit-learn
LogModel = LogisticRegression()

In [31]:
X_test

Unnamed: 0,kills,deaths,assists,gold_earned
0,11,10,14,15482
1,11,10,14,15482


In [30]:
#Fitting the training data onto our logistic regression model
LogModel.fit(X_train, y_train)

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1

### Testing the Model

In [None]:
#Predicting from our test data whether we win or lose a game
y_predict = LogModel.predict(X_test)

## Analyzing Model Predictions

In [None]:
We can analyze the model predictions a few different ways. Here we will be 

In [None]:
sns.countplot(x = x = range(len(y_predict)), y = np.abs(y_predict - y_test))

In [None]:
#1 indicates an incorrect prediction of win or lose, 
#0 indicates a correct prediction of win or lose
sns.scatterplot(x = range(len(y_predict)), y = np.abs(y_predict - y_test))
plt.ylim(-.1,1.1)

In [None]:
from sklearn.metrics import confusion_matrix

#Confusion matrix giving:
#    True Positive    False Positive
#    False Negative   True Negative
confusion_matrix(y_test, y_predict)