# League of Legends Win Predictor

**Author: Kyle Weesner**
[GitHub](https://github.com/KyleWeesner) / [LinkedIn](https://www.linkedin.com/in/kyleweesner/) / [email](mailto:weesnerkew@yahoo.com)
 
![img](https://wallpaperaccess.com/full/217097.jpg)


## Overview

This project outlines the process of collecting data with Riot's API and using it to train models that predict whether a team will win a game, based on the state of the game at the 15-minute mark. The win predictor and its app are meant to be a time-efficient rank-climbing helper tool for more serious gamers. I'm pitching this model to Swift Gaming Company for use in the Blitz App for gamers, where it would be used side by side as a win predictor tool.

## Business Problem

Playing ranked games of League of Legends can take a considerable amount of time: an average game lasts 30 to 40 minutes. At the 15-minute mark, a team has the option to surrender. My model predicts whether your team will win at the 15-minute mark. Knowing whether you are likely to win the match can save time for the losing team, which can surrender instead of playing out the remainder of the game. This app is meant for serious gamers who want to fit in as many games as possible in the hope of maximizing the number of wins per gaming session.

## Data
Data was acquired through the [Riot Developer Portal](https://developer.riotgames.com/). Getting an API key requires a Riot account and agreeing to Riot's general policies. Functions for data collection are located in `data_gathering_functions.py`, and the process for collecting my dataset is documented in [Web_scraping_for_Final_Dataset](https://github.com/KyleWeesner/Catpstone-exploration/blob/main/Untitled_Folder/Web_scraping_for_Final_Dataset.ipynb).

Features that went into the model (each counted separately for the blue and red teams) were:
- Kills
- Vision Score
- Assists
- CS (creep score)
- Levels
- Number of Dragons Slain
- Types of Dragons Slain (Earth, Fire, Air, Hextech, Water)
- Number of Rift Heralds Slain
- Number of Turrets Destroyed
- Number of Inhibitors Destroyed
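Most of these features are recorded separately for the blue team (`teamId` 100) and the red team (`teamId` 200). To illustrate how such per-team counts can be tallied, here is a minimal sketch that counts elite-monster kills, assuming events shaped like the timeline's elite-monster kill events with the `killerTeamId`, `monsterType`, and `monsterSubType` fields that the data-collection loop filters on. The function name and the sample events are hypothetical.

```python
from collections import Counter


def count_monsters(events):
    """Tally elite-monster kills per team.

    Each event is assumed to be a dict with 'killerTeamId' (100 = blue,
    200 = red), 'monsterType' and, for dragons, 'monsterSubType'.
    Returns {'blue': Counter, 'red': Counter} keyed by monster label.
    """
    team_names = {100: "blue", 200: "red"}
    counts = {"blue": Counter(), "red": Counter()}
    for event in events:
        team = team_names.get(event.get("killerTeamId"))
        if team is None:
            continue  # ignore events without a recognised team
        counts[team][event["monsterType"]] += 1
        subtype = event.get("monsterSubType")
        if subtype:
            counts[team][subtype] += 1  # e.g. FIRE_DRAGON
    return counts


# Hypothetical events, for illustration only
events = [
    {"killerTeamId": 100, "monsterType": "DRAGON", "monsterSubType": "FIRE_DRAGON"},
    {"killerTeamId": 200, "monsterType": "RIFTHERALD"},
    {"killerTeamId": 100, "monsterType": "DRAGON", "monsterSubType": "WATER_DRAGON"},
]
counts = count_monsters(events)
print(counts["blue"]["DRAGON"], counts["red"]["RIFTHERALD"])  # 2 1
```

A `Counter` returns 0 for labels that never occurred, so every blue/red column can be filled even when a team took no objectives.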

## Tools
The predictive models were made with a combination of the data and the tools listed below:
- Pandas for data frame manipulation and data analysis.  
- Sklearn for statistical modeling and machine learning.
- Streamlit for app building.  

## Navigation

Follow these notebooks in order for the workflow.

1. [Web scraping/ Data Collection](https://github.com/KyleWeesner/Catpstone-exploration/blob/main/Web_scraping_for_Final_Dataset.ipynb)

2. [Model Building](https://github.com/KyleWeesner/Catpstone-exploration/blob/main/Modeling.ipynb)

# Web Scraping for Data

### Importing Relevant Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import defaultdict
import string

# Python file containing functions I developed, located at ./data_gathering_functions.py
from data_gathering_functions import getAssists,getBuildingDestroyed,getcs, \
    getELITE_MONSTER_KILL,getKills,getlevel,getVisionScore,getwin 

# Libraries for web scraping and API requests
import requests
from riotwatcher import LolWatcher, ApiError
import json
import time

Loading my API key, which is stored in a JSON file on my local computer.

I used Riot's development API key, a temporary key that has to be regenerated every 24 hours: https://developer.riotgames.com/

In [19]:
def get_keys(path):
    with open(path) as f:
        return json.load(f)

# Load the secrets file and take its first value, which is the Riot API key
api_keys = list((get_keys('/Users/weesn/Documents/Flatiron/secrets.json')).values())[0]
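Taking the first value of the secrets file works as long as the file holds only the Riot key. A slightly more defensive sketch, assuming a hypothetical key name `riot_api_key` inside the JSON file and a hypothetical `RIOT_API_KEY` environment variable as a fallback (neither name comes from this project):

```python
import json
import os


def load_riot_key(path, key_name="riot_api_key", env_var="RIOT_API_KEY"):
    """Return the Riot API key from a JSON secrets file, else from an env var.

    key_name and env_var are hypothetical names chosen for this sketch.
    """
    if os.path.exists(path):
        with open(path) as f:
            secrets = json.load(f)
        if key_name in secrets:
            return secrets[key_name]
    key = os.environ.get(env_var)
    if key is None:
        raise KeyError(f"No {key_name!r} in {path} and {env_var} is not set")
    return key
```

Keeping the key outside the notebook means a freshly regenerated development key only has to be pasted into one place.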

Setting up LolWatcher, following the RiotWatcher documentation: https://riot-watcher.readthedocs.io/en/latest/

From what I can tell, `lol_watcher` manages the request rate limit itself. I am using a development API key, which is limited to 100 requests every 2 minutes (and 20 requests per second).
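RiotWatcher handles this limit for us, but the idea behind it is simple. Below is a minimal sliding-window limiter sketch (not RiotWatcher's actual implementation; the class name is hypothetical) that blocks until a new request fits within `max_calls` per `period` seconds. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any rolling window of period seconds."""

    def __init__(self, max_calls=100, period=120.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest call falls out of the window, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
```

With the development key, `SlidingWindowLimiter(max_calls=100, period=120)` would be called with `.wait()` before each request. Riot also enforces a per-second limit, so a real client would stack a second limiter for it.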

In [20]:
lol_watcher = LolWatcher(f'{api_keys}')
my_region = 'na1'

me = lol_watcher.summoner.by_name(my_region, 'KyleWeez')
print(me)

# all objects are returned (by default) as a dict
# lets see if i got diamond yet (i probably didnt)
my_ranked_stats = lol_watcher.league.by_summoner(my_region, me['id'])
print(my_ranked_stats)

# First we get the latest version of the game from data dragon
versions = lol_watcher.data_dragon.versions_for_region(my_region)
champions_version = versions['n']['champion']

# Lets get some champions
current_champ_list = lol_watcher.data_dragon.champions(champions_version)
print(current_champ_list)

# For Riot's API, the 404 status code indicates that the requested data wasn't found and
# should be expected to occur in normal operation, as in the case of an
# invalid summoner name, match ID, etc.
#
# The 429 status code indicates that the user has sent too many requests
# in a given amount of time ("rate limiting").

try:
    response = lol_watcher.summoner.by_name(my_region, 'this_is_probably_not_anyones_summoner_name')
except ApiError as err:
    if err.response.status_code == 429:
        print('We should retry in {} seconds.'.format(err.response.headers['Retry-After']))
        print('this retry-after is handled by default by the RiotWatcher library')
        print('future requests wait until the retry-after time passes')
    elif err.response.status_code == 404:
        print('Summoner with that ridiculous name not found.')
    else:
        raise

{'id': 'sxAViHUpUFO1U0yayOWQHuWFBfvQ4iYtUa3dMcxag3UK3eQ', 'accountId': 'TkRmi54Sa-lOuySeCliJQ8BTBRtvAY5gBHkkGJeL9gSs3MQ', 'puuid': 'f9jL5C1IrczsbmdVUxcv8ZcrkWqlpy_6YRP3VgvVNhUIM64Ml-Bd1PFcNM0_nACpWkumdLuj6Ax-gA', 'name': 'KyleWeez', 'profileIconId': 1388, 'revisionDate': 1653286274000, 'summonerLevel': 375}
[{'leagueId': '371f2307-b230-4972-be37-9647e7a277cf', 'queueType': 'RANKED_FLEX_SR', 'tier': 'GOLD', 'rank': 'IV', 'summonerId': 'sxAViHUpUFO1U0yayOWQHuWFBfvQ4iYtUa3dMcxag3UK3eQ', 'summonerName': 'KyleWeez', 'leaguePoints': 13, 'wins': 27, 'losses': 18, 'veteran': False, 'inactive': False, 'freshBlood': False, 'hotStreak': False}, {'leagueId': '0d0cb7a5-3699-4ffb-b246-2a37ceba3e52', 'queueType': 'RANKED_SOLO_5x5', 'tier': 'PLATINUM', 'rank': 'III', 'summonerId': 'sxAViHUpUFO1U0yayOWQHuWFBfvQ4iYtUa3dMcxag3UK3eQ', 'summonerName': 'KyleWeez', 'leaguePoints': 19, 'wins': 45, 'losses': 32, 'veteran': False, 'inactive': False, 'freshBlood': False, 'hotStreak': False}]
Summoner with that ridiculous name not found.

# Gathering list of High Elo players (Summoner Names)

Process to get match data:
1. Get the summoner names of high-elo players (Diamond, Master, Grandmaster, and Challenger).
2. Get their IDs, specifically their PUUIDs (a PUUID is required to request match data).
3. Use the PUUIDs to gather match data with my functions and add it to a DataFrame.

Variables that the `lol_watcher` wrapper needs to gather summoner names:

In [21]:
player_region= 'na1'
queue_type= 'RANKED_SOLO_5x5'

### Challenger Summoner Names

In [5]:
# The request returns the ladder as a dict (parsed from JSON)
challenger_ladder= lol_watcher.league.challenger_by_queue(region= player_region, 
                                                          queue=queue_type) 
# Converting to DataFrames to locate each player's summonerName and return the names as a list
challenger = pd.DataFrame(challenger_ladder)
challenger_players= pd.DataFrame.from_dict(challenger_ladder['entries']) 
challenger_info_df = pd.merge(challenger,challenger_players, left_index=True, right_index=True)
challenger_sum_names = challenger_info_df.summonerName.tolist()

### Grandmaster Summoner Names

Same process as Challenger Summoner Names

In [8]:
grandmaster_ladder= lol_watcher.league.grandmaster_by_queue(region= player_region, 
                                                          queue=queue_type)
grandmaster = pd.DataFrame(grandmaster_ladder)
grandmaster_players= pd.DataFrame.from_dict(grandmaster_ladder['entries'])
grandmaster_info_df = pd.merge(grandmaster,grandmaster_players, left_index=True, right_index=True)
grandmaster_sum_names = grandmaster_info_df.summonerName.tolist()


### Master Summoner Names

Same process as Challenger Summoner Names

In [7]:
masters_ladder= lol_watcher.league.masters_by_queue(region= player_region, 
                                                          queue=queue_type)
masters = pd.DataFrame(masters_ladder)
masters_players= pd.DataFrame.from_dict(masters_ladder['entries'])
masters_info_df = pd.merge(masters,masters_players, left_index=True, right_index=True)
masters_sum_names = masters_info_df.summonerName.tolist()
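Since the Challenger, Grandmaster, and Master cells differ only in the method called, the repetition could be folded into one helper. A sketch, assuming each ladder is the dict returned by `league.challenger_by_queue`, `league.grandmaster_by_queue`, or `league.masters_by_queue`, with an `entries` list whose items carry a `summonerName` field (as the cells above rely on). The helper name is hypothetical, and it reads names straight from `entries` instead of merging two DataFrames.

```python
def ladder_summoner_names(ladder):
    """Return the summonerName of every entry in a league-list response."""
    return [entry["summonerName"] for entry in ladder.get("entries", [])]


# Usage with the responses gathered above (not executed here):
# apex_names = [
#     name
#     for ladder in (challenger_ladder, grandmaster_ladder, masters_ladder)
#     for name in ladder_summoner_names(ladder)
# ]

# A tiny made-up response, for illustration only
fake_ladder = {"tier": "CHALLENGER", "entries": [{"summonerName": "A"}, {"summonerName": "B"}]}
print(ladder_summoner_names(fake_ladder))  # ['A', 'B']
```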

### Diamond Summoner Names

Diamond summoner names cannot be gathered the same way as the other three ranks. Diamond is split into four divisions (I to IV), so we make one request per division and then combine the results. Each request below asks for page 2 of that division's entries.

In [8]:
tiers = ['I','II','III','IV']

In [9]:
# Diamond I entries contain an extra column, miniSeries (promotion series, also known as promos), so we drop it before combining
D1 = pd.DataFrame((requests.get(f'https://na1.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/DIAMOND/{tiers[0]}?page=2&api_key={api_keys}')).json()).drop(columns='miniSeries')
D2 = pd.DataFrame((requests.get(f'https://na1.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/DIAMOND/{tiers[1]}?page=2&api_key={api_keys}')).json()) 
D3 = pd.DataFrame((requests.get(f'https://na1.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/DIAMOND/{tiers[2]}?page=2&api_key={api_keys}')).json()) 
D4 = pd.DataFrame((requests.get(f'https://na1.api.riotgames.com/lol/league/v4/entries/RANKED_SOLO_5x5/DIAMOND/{tiers[3]}?page=2&api_key={api_keys}')).json()) 

In [10]:
Diamond = [D1,D2,D3,D4] # collecting the four Diamond divisions in one list
Diamond_elo = pd.concat(Diamond).reset_index() # concatenating the division DataFrames into one DataFrame
d_sum_names = list(Diamond_elo['summonerName']) # grabbing the summoner names from the DataFrame
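The four Diamond requests above each fetch only page 2 and put the key in the URL. A sketch of a paged version, assuming the league-v4 entries endpoint returns an empty list once `page` runs past the last page, and sending the key in Riot's `X-Riot-Token` header instead of the query string. The function names are hypothetical, and `fetch_json` is injected so the paging logic can be tried without network access.

```python
BASE = "https://na1.api.riotgames.com/lol/league/v4/entries"


def division_url(queue, tier, division, page):
    """Build the league-v4 entries URL for one division and page."""
    return f"{BASE}/{queue}/{tier}/{division}?page={page}"


def fetch_division(queue, tier, division, fetch_json, max_pages=50):
    """Collect entries page by page until an empty page (or max_pages as a safety cap)."""
    entries = []
    for page in range(1, max_pages + 1):
        batch = fetch_json(division_url(queue, tier, division, page))
        if not batch:
            break  # assumed: an empty list means we are past the last page
        entries.extend(batch)
    return entries


def make_fetch_json(api_key):
    """Return a fetcher that sends the key in the X-Riot-Token header."""
    import requests  # imported here so the paging logic works without it

    def fetch_json(url):
        response = requests.get(url, headers={"X-Riot-Token": api_key}, timeout=10)
        response.raise_for_status()
        return response.json()

    return fetch_json


# Usage (not executed here):
# fetch = make_fetch_json(api_keys)
# diamond = [e for d in tiers for e in fetch_division("RANKED_SOLO_5x5", "DIAMOND", d, fetch)]
```

Keeping the key in a header also keeps it out of URLs that may end up in logs or saved notebook output.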

### Combined List of Summoner Names

The combined list contains only players ranked Diamond and above.

In [11]:
sum_list = challenger_sum_names + grandmaster_sum_names + masters_sum_names + d_sum_names

# Getting MatchId for `sum_list`

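A new variable is needed for the match endpoints (match lists, match data, and timelines), which use a regional routing value (`americas`) rather than the platform (`na1`).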

In [22]:
player_routing= 'americas' 

I found that summoner names containing special characters return errors. I still need to figure out how to encode special characters in the HTTP requests, or else skip those names.

- Works: `'KyleWeez'`
- Doesn't work: `'JueJue ÒwÓ'`, `'Ðèéþÿąń'`

There is no need to collect every name; I only need enough names to gather match data for the model's DataFrame. To deal with this, I added a try/except that skips any name that returns an error.
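If the errors come from names being sent without URL encoding (an assumption; RiotWatcher's own handling of names was not checked here), percent-encoding the name as UTF-8 before building the URL is the usual fix. A sketch using the standard library's `urllib.parse.quote`, with the summoner-v4 by-name path as it existed when this notebook was written; the helper name is hypothetical.

```python
from urllib.parse import quote


def summoner_by_name_url(platform, summoner_name):
    """Build a summoner-v4 by-name URL with the name percent-encoded as UTF-8."""
    encoded = quote(summoner_name, safe="")  # encode spaces and every reserved character
    return f"https://{platform}.api.riotgames.com/lol/summoner/v4/summoners/by-name/{encoded}"


print(summoner_by_name_url("na1", "JueJue ÒwÓ"))
# https://na1.api.riotgames.com/lol/summoner/v4/summoners/by-name/JueJue%20%C3%92w%C3%93
```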

This for loop generates a list of match IDs for the most recent ranked solo/duo games (queue 420, up to 100 per player) of each player in the list.

In [None]:
len(sum_list)

In [None]:
match_history = []

for player in sum_list:
    try:
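        # The API returns at most 100 match IDs per request, so count=100 is the maximum
        summoner= lol_watcher.summoner.by_name(player_region, player)
        match_history.append(lol_watcher.match.matchlist_by_puuid(region= player_routing, puuid= summoner['puuid'],
                                                            queue= 420,  # 420 = ranked solo/duo
                                                            start=0, count= 100))
    except Exception:
        # Skip names that return an error (e.g. names with special characters)
        pass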

In [49]:
match_history

[['NA1_4317007551',
  'NA1_4316951932',
  'NA1_4316926145',
  'NA1_4316847374',
  'NA1_4316841307',
  'NA1_4316776120',
  'NA1_4316779938',
  'NA1_4314588391',
  'NA1_4314548148',
  'NA1_4314471246',
  'NA1_4314303678',
  'NA1_4314279949',
  'NA1_4314256037',
  'NA1_4314158983',
  'NA1_4314164254',
  'NA1_4314139729',
  'NA1_4314103394',
  'NA1_4314077715',
  'NA1_4313852880',
  'NA1_4313818456',
  'NA1_4313796047',
  'NA1_4313703706',
  'NA1_4313609357',
  'NA1_4313602905',
  'NA1_4313492140',
  'NA1_4313474870',
  'NA1_4313470818',
  'NA1_4313371131',
  'NA1_4313317373',
  'NA1_4313353528',
  'NA1_4313340539',
  'NA1_4313203215',
  'NA1_4312363493',
  'NA1_4311460800',
  'NA1_4311364072',
  'NA1_4311253647',
  'NA1_4311196873',
  'NA1_4311108317',
  'NA1_4311074614',
  'NA1_4311051257',
  'NA1_4311035251',
  'NA1_4311032363',
  'NA1_4310557408',
  'NA1_4310497543',
  'NA1_4310402488',
  'NA1_4310333429',
  'NA1_4309826355',
  'NA1_4309862245',
  'NA1_4309496239',
  'NA1_4309439318',


In [15]:
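# match_history is a list of lists; chain.from_iterable flattens it into a single list
# (sum(match_history, []) also works, but it copies the growing list on every addition)
from itertools import chain
all_matchIDs = list(chain.from_iterable(match_history))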

Many of these players appear in the same matches, and we only care about unique matches, so we make a list of unique match IDs.

In [16]:
len(all_matchIDs)

510870

In [23]:
unique_matchIDs =list(set(all_matchIDs)) 

In [24]:
len(unique_matchIDs)

223246
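`list(set(...))` drops duplicates but also scrambles the order, and because Python randomizes string hashing per process, that order can differ from one session to the next. Since the collection loop later processes the IDs in list order and is stopped early, an order-preserving alternative may be preferable. A sketch using `dict.fromkeys` (dicts keep insertion order in Python 3.7+); the helper name and IDs are made up.

```python
def unique_in_order(items):
    """Drop duplicates while keeping each item's first position."""
    return list(dict.fromkeys(items))


ids = ["NA1_3", "NA1_1", "NA1_3", "NA1_2", "NA1_1"]  # made-up match IDs
print(unique_in_order(ids))  # ['NA1_3', 'NA1_1', 'NA1_2']
```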

I saved the match ID list because it is fairly large; this way I won't have to rerun all of the requests above to get the match IDs again whenever the notebook is closed. The code was commented out after saving.

In [28]:
# with open('matchId.txt', 'w') as f:
#     f.write(json.dumps(unique_matchIDs))
# #Now read the file back into a Python list object.
# with open('matchId.txt', 'r') as f:
#     unique_matchIDs = json.loads(f.read())

In [23]:
with open('matchId.txt', 'r') as f:
    unique_matchIDs = json.loads(f.read())
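The comment-out/uncomment routine above could be replaced by a small cache helper that loads the file when it exists and otherwise builds and saves the list. A minimal sketch; `load_or_build` and its `build` callback are hypothetical names, with `build` standing in for the request loop that produced `unique_matchIDs`.

```python
import json
import os


def load_or_build(path, build):
    """Return JSON data cached at path, calling build() and saving the result if absent."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = build()
    with open(path, "w") as f:
        json.dump(data, f)
    return data


# Usage (not executed here):
# unique_matchIDs = load_or_build('matchId.txt', lambda: list(set(all_matchIDs)))
```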

In [24]:
unique_matchIDs

['NA1_4307254820',
 'NA1_4145811121',
 'NA1_4111806763',
 'NA1_4267117332',
 'NA1_4291432747',
 'NA1_4273148025',
 'NA1_4284177004',
 'NA1_4291306286',
 'NA1_4252556662',
 'NA1_4311266313',
 'NA1_4291330863',
 'NA1_4262948903',
 'NA1_4235017145',
 'NA1_4309931607',
 'NA1_4301521490',
 'NA1_4293638341',
 'NA1_4092962267',
 'NA1_4285647593',
 'NA1_4252341265',
 'NA1_4310532221',
 'NA1_4284353206',
 'NA1_4235992507',
 'NA1_4295252380',
 'NA1_4309660798',
 'NA1_4258215849',
 'NA1_4300900680',
 'NA1_4254231745',
 'NA1_4286198538',
 'NA1_4285801874',
 'NA1_4308952695',
 'NA1_4070343273',
 'NA1_4286643264',
 'NA1_4296651524',
 'NA1_4314709226',
 'NA1_4213822518',
 'NA1_4238511392',
 'NA1_4275020150',
 'NA1_4223609829',
 'NA1_4303727351',
 'NA1_4250310476',
 'NA1_4302161639',
 'NA1_4276506803',
 'NA1_4307912076',
 'NA1_4295904881',
 'NA1_4168019734',
 'NA1_4232536469',
 'NA1_4250403731',
 'NA1_4301788780',
 'NA1_4179003084',
 'NA1_4284732960',
 'NA1_4196009889',
 'NA1_4314026169',
 'NA1_422142

# Creating Dataframe

This for loop collects the required match data for each game using the functions developed in `Developing_functions.ipynb` (that notebook may be a little messy, with little markdown). The loop can go through all 223,246 games, but I stopped it at ~23,000 due to time constraints. Because each game is appended to a list defined outside the loop, the loop can be stopped early without losing what it has already collected.
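The 15-minute filter in the loop relies on the timeline's frames being snapshots taken once per minute, starting at minute 0. A sketch of a timestamp-based check that then reads the minute-15 snapshot, assuming the `info`/`frames`/`participantFrames`/`timestamp` layout used in the loop and a final frame at the moment the game ends. The helper names, the `totalGold` field in the fake data, and the fake timelines are illustrative only.

```python
MINUTE_MS = 60_000  # assumed one-minute frame interval, in milliseconds


def game_length_ms(timeline):
    """Approximate game length as the timestamp of the last frame."""
    return timeline["info"]["frames"][-1]["timestamp"]


def participant_frames_at(timeline, minute=15):
    """Return the participantFrames snapshot for the given minute.

    Assumes frame i is the snapshot taken at minute i (frame 0 = minute 0).
    """
    if game_length_ms(timeline) < minute * MINUTE_MS:
        raise ValueError(f"game ended before minute {minute}")
    return timeline["info"]["frames"][minute]["participantFrames"]


def fake_timeline(frame_minutes, end_minute):
    """Build a made-up timeline: one frame per listed minute plus an end-of-game frame."""
    frames = [{"timestamp": m * MINUTE_MS, "participantFrames": {"1": {"totalGold": m}}}
              for m in frame_minutes]
    frames.append({"timestamp": int(end_minute * MINUTE_MS), "participantFrames": {}})
    return {"info": {"frameInterval": MINUTE_MS, "frames": frames}}


long_game = fake_timeline(range(17), end_minute=16.5)   # 18 frames
short_game = fake_timeline(range(15), end_minute=14.5)  # 16 frames, ends before 15:00
print(participant_frames_at(long_game)["1"]["totalGold"])  # 15
```

Note that `short_game` has 16 frames, so a pure frame count would accept it if real timelines do end with an extra end-of-game frame; checking the timestamp avoids relying on that detail.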

I wrapped each game in a try/except with a timer. My functions did not cause problems, but Riot's servers occasionally go down for maintenance. With the timed except, the loop keeps running until it completes or the kernel is interrupted; a game that hits a request error is skipped, not retried.
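The collection loop in the next cell sleeps a fixed 60 seconds and then moves on, so any game that hit an error is skipped. A sketch of an alternative that retries the same request, honoring the `Retry-After` header when present (RiotWatcher's `ApiError` exposes the response as `err.response`, as in the error-handling cell earlier) and otherwise backing off exponentially. `with_retries` is a hypothetical helper, and which status codes count as retryable is an assumption.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}  # assumed set of transient status codes


def with_retries(call, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Run call(), retrying transient HTTP errors with backoff.

    Works with any exception carrying a requests-style `.response`
    (with status_code and headers), such as riotwatcher's ApiError.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:
            response = getattr(err, "response", None)
            status = getattr(response, "status_code", None)
            if status not in RETRYABLE or attempt == max_attempts - 1:
                raise  # not transient, or out of attempts
            retry_after = response.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
            sleep(delay)


# Usage inside the loop (not executed here):
# timeline = with_retries(lambda: lol_watcher.match.timeline_by_match(region=player_routing, match_id=game))
```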

In [39]:
# Each game is appended to this list as a dict; each dict becomes one row when turned into a DataFrame
gameInformationList = [] 
# Games skipped because of request errors (e.g. during a maintenance break), kept for diagnosing
# problems with requests and for re-collecting games that were missed
games_break = [] 
# Games where one of my feature functions raised an error, kept for diagnosing the functions
function_breaks = []

for game in unique_matchIDs[:]:
    try:
        # The features come from two different endpoints, so each game needs two requests: the timeline and the match data
        timeline = lol_watcher.match.timeline_by_match(region= player_routing, match_id= game) 
        matchdata = lol_watcher.match.by_id(region= player_routing, match_id= game)    

        # Keep only games with at least 16 one-minute frames (minute 0 through minute 15),
        # i.e. games that lasted roughly 15 minutes or more
        if len(timeline['info']['frames']) >= 16:

            # If a feature function fails, the loop keeps running; the except appends the matchId to function_breaks for later diagnosis
            try: 
                #information on elite monsters and buildings (objectives) are gathered.  
                monster = getELITE_MONSTER_KILL(timeline)
                blue_info = monster[monster['killerTeamId'] == 100]
                red_info = monster[monster['killerTeamId'] == 200]

                buildings = getBuildingDestroyed(timeline)
                blue_buildings = buildings[buildings.teamId == 100]
                red_buildings = buildings[buildings.teamId == 200]

                #creating the "columns" for the dataframe.
                gameInformation = {} 
                gameInformation['matchId'] = game
                
                # For each of these functions, index 0 is the blue team and index 1 is the red team
                gameInformation['blue_team_kills'] = getKills(timeline)[0] 
                gameInformation['red_team_kills'] = getKills(timeline)[1]

                gameInformation['blue_team_visionScore'] = getVisionScore(timeline)[0]
                gameInformation['red_team_visionScore'] = getVisionScore(timeline)[1]

                gameInformation['blue_team_assists'] = getAssists(timeline)[0]
                gameInformation['red_team_assists'] = getAssists(timeline)[1]

                gameInformation['blue_team_cs'] = getcs(timeline)[0]
                gameInformation['red_team_cs'] = getcs(timeline)[1]

                gameInformation['blue_team_level'] = getlevel(timeline)[0]
                gameInformation['red_team_level'] = getlevel(timeline)[1]

                gameInformation['blue_dragons_slained'] = (blue_info.monsterType == 'DRAGON').sum()
                gameInformation['red_dragons_slained'] = (red_info.monsterType == 'DRAGON').sum()

                gameInformation['blue_rift_heralds_slained'] = (blue_info.monsterType == 'RIFTHERALD').sum()
                gameInformation['red_rifts_heralds_slained'] = (red_info.monsterType == 'RIFTHERALD').sum()
                gameInformation['blue_AIR_DRAGON'] = (blue_info.monsterSubType == 'AIR_DRAGON').sum()
                gameInformation['red_AIR_DRAGON'] = (red_info.monsterSubType == 'AIR_DRAGON').sum()
                gameInformation['blue_EARTH_DRAGON'] = (blue_info.monsterSubType == 'EARTH_DRAGON').sum()
                gameInformation['red_EARTH_DRAGON'] = (red_info.monsterSubType == 'EARTH_DRAGON').sum()
                gameInformation['blue_FIRE_DRAGON'] = (blue_info.monsterSubType == 'FIRE_DRAGON').sum()
                gameInformation['red_FIRE_DRAGON'] = (red_info.monsterSubType == 'FIRE_DRAGON').sum()
                gameInformation['blue_HEXTECH_DRAGON'] = (blue_info.monsterSubType == 'HEXTECH_DRAGON').sum()
                gameInformation['red_HEXTECH_DRAGON'] = (red_info.monsterSubType == 'HEXTECH_DRAGON').sum()
                gameInformation['blue_WATER_DRAGON'] = (blue_info.monsterSubType == 'WATER_DRAGON').sum()
                gameInformation['red_WATER_DRAGON'] = (red_info.monsterSubType == 'WATER_DRAGON').sum()

                gameInformation['blue_inhibitors_destroyed'] = len(blue_buildings[blue_buildings.buildingType == 'INHIBITOR_BUILDING'])
                gameInformation['red_inhibitors_destroyed'] = len(red_buildings[red_buildings.buildingType == 'INHIBITOR_BUILDING'])

                gameInformation['blue_towers_destroyed'] = len(blue_buildings[blue_buildings.buildingType == 'TOWER_BUILDING'])
                gameInformation['red_towers_destroyed'] = len(red_buildings[red_buildings.buildingType == 'TOWER_BUILDING'])

                gameInformation['win'] = getwin(matchdata)
                gameInformationList.append(gameInformation)
            except Exception:
                function_breaks.append(game)
        # Skip games shorter than 15 minutes; the win predictor works at the 15-minute mark
        else:
            pass
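    except Exception:
        # Most likely a 503 during Riot server maintenance: record the game, wait, and move on (the game is not retried)
        games_break.append(game)
        print('Request error, starting sleep. Most likely a 503.')
        time.sleep(60)
        print(f'Error was at game index {unique_matchIDs.index(game)}; continuing with the next matchId')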

503 error starting sleep
resuming at 10920
503 error starting sleep
resuming at 11122
503 error starting sleep
resuming at 11706
503 error starting sleep
resuming at 13889
503 error starting sleep
resuming at 14304
503 error starting sleep
resuming at 16853
503 error starting sleep
resuming at 16879
503 error starting sleep
resuming at 18213
503 error starting sleep
resuming at 20238
503 error starting sleep
resuming at 21143
503 error starting sleep
resuming at 23080
503 error starting sleep
resuming at 23131
503 error starting sleep
resuming at 23182
503 error starting sleep


KeyboardInterrupt: 

Stopped at ~23k games/rows for the dataframe.

Saved the collected data to a CSV file.

In [None]:
df = pd.DataFrame(gameInformationList)
df.to_csv('../data/full.csv')

Continue to the [Modeling](https://github.com/KyleWeesner/Catpstone-exploration/blob/main/Modeling.ipynb) notebook