# 1. Data Extraction

## Table of Contents:
- [Imports](#Imports)
- [Ratelimit](#Ratelimit)
- [Summoner Names](#Summoner-Names)
- [PUUID](#PUUID)
- [Game ID](#Game-ID)
- [Match Information](#Match-Information)
- [Data Extraction](#Data-Extraction)


### Imports

In [None]:
import pandas as pd
import numpy as np
import requests
import json
# pandas.io.json.json_normalize is deprecated since pandas 1.0; import it from the top level
from pandas import json_normalize
from ratelimit import limits, sleep_and_retry

### Ratelimit

Using the code snippet below, we set up a rate limiter that spaces out our API requests so that they never exceed the API's request threshold.  RIOT API has a limit of 120 requests per 2 minutes, so we set our ratelimit threshold at 90 calls per 2 minutes to avoid being timed out while making requests continuously.

In [None]:
# https://pypi.org/project/ratelimit/

two_minutes = 120

@sleep_and_retry
@limits(calls=90, period=two_minutes)
def call_api(url):
    response = requests.get(url)

    if response.status_code != 200:
        raise Exception('API response: {}'.format(response.status_code))
    return response
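
To make the behaviour of the limiter concrete, here is a minimal standard-library sketch of a fixed-window limiter. It is a simplified stand-in for illustration, not the `ratelimit` package's implementation; the class name `FixedWindowLimiter` and its `clock` and `sleep` parameters are our own assumptions.

```python
import time


class FixedWindowLimiter:
    """Allow at most `calls` calls per `period` seconds, sleeping when the budget is spent."""

    def __init__(self, calls, period, clock=time.monotonic, sleep=time.sleep):
        self.calls = calls
        self.period = period
        self.clock = clock  # injectable so the behaviour can be checked without waiting
        self.sleep = sleep
        self.window_start = clock()
        self.used = 0

    def acquire(self):
        """Wait (via sleep) until a call is allowed, then count it."""
        now = self.clock()
        if now - self.window_start >= self.period:
            # The window has expired: start a fresh one.
            self.window_start = now
            self.used = 0
        if self.used >= self.calls:
            # Budget exhausted: wait out the rest of the window, then reset.
            self.sleep(self.period - (now - self.window_start))
            self.window_start = self.clock()
            self.used = 0
        self.used += 1
```

With `FixedWindowLimiter(90, 120)`, the 91st call inside a two-minute window would sleep until the window ends, which is the effect we rely on from `@sleep_and_retry` combined with `@limits`.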


### Summoner Names

We first build every combination of rank tier and division and substitute each one into the tft/league/v1/entries/ API query to obtain the in-game IDs (summoner names) of the players in the North America ranking system.  Only the first page of results is requested for each combination.

In [None]:
# Global Variables for API 
KEY = 'RGAPI-676264e0-fb2d-45c8-8db6-612cd065d286'
API_BASE = 'https://na1.api.riotgames.com/'
API_BASE_REGION = 'https://americas.api.riotgames.com/'
 
# Setting up combinations of player's Rank Tier to query into RIOT API
TIERS = ['IRON' , 'BRONZE' , 'SILVER', 'GOLD', 'PLATINUM', 'DIAMOND']
DIVISIONS = ['I' , 'II', 'III', 'IV']
TIERS_LIST = []
for tier in TIERS:
  for division in DIVISIONS:
    TIER_DIVISION = tier + '/' + division
    TIERS_LIST.append(TIER_DIVISION)

TIER_URLS = []
for tiers_divis in TIERS_LIST: 
  url = API_BASE + 'tft/league/v1/entries/' + tiers_divis + '?page=1&api_key=' + KEY
  TIER_URLS.append(url)
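
The same list of URLs can be produced more compactly with `itertools.product`. The sketch below is an alternative under our own assumptions: the helper name `build_tier_urls` and the placeholder key are hypothetical, and the key is passed in as an argument rather than hardcoded.

```python
from itertools import product
from urllib.parse import urlencode

API_BASE = 'https://na1.api.riotgames.com/'
TIERS = ['IRON', 'BRONZE', 'SILVER', 'GOLD', 'PLATINUM', 'DIAMOND']
DIVISIONS = ['I', 'II', 'III', 'IV']


def build_tier_urls(api_key, page=1):
    """Return one entries URL per tier/division pair, in tier-major order."""
    urls = []
    for tier, division in product(TIERS, DIVISIONS):
        query = urlencode({'page': page, 'api_key': api_key})
        urls.append(f'{API_BASE}tft/league/v1/entries/{tier}/{division}?{query}')
    return urls


urls = build_tier_urls('YOUR-API-KEY')  # placeholder, not a real key
print(len(urls))  # 6 tiers x 4 divisions = 24
```

Passing the key as a parameter also makes it easy to read it from an environment variable instead of keeping it in the notebook.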

In [None]:
TIER_URLS

In [None]:
# Get list of each summoner in each tier
tier_frames = []

for url in TIER_URLS:
  REQUEST_TIERS = call_api(url)
  tier_data = json.loads(REQUEST_TIERS.text)
  tier_df = json_normalize(tier_data)
  tier_frames.append(tier_df)
divisions = pd.concat(tier_frames, ignore_index=True, sort=True)
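
The loop above reads only `page=1` for each tier and division. If more players were needed, the pages could be walked in order. The sketch below assumes that the endpoint returns an empty list once the page number runs past the last page; `fetch_page` is a hypothetical callable standing in for the `call_api` and `json.loads` step.

```python
def fetch_all_pages(fetch_page, max_pages=100):
    """Collect entries page by page until an empty page is returned.

    `fetch_page(page)` is a hypothetical callable that returns the decoded
    JSON list for one page number.
    """
    entries = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # assumed signal that there are no more pages
        entries.extend(batch)
    return entries


# Made-up pages to show the control flow without touching the network.
fake_pages = {1: [{'summonerName': 'a'}], 2: [{'summonerName': 'b'}]}
all_entries = fetch_all_pages(lambda page: fake_pages.get(page, []))
print(len(all_entries))  # 2
```

The `max_pages` cap is a safety net so that an unexpected response cannot turn the loop into an endless stream of requests.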

In [None]:
divisions['division_tier'] = divisions['tier'] + '_' + divisions['rank'] 
divisions

In [None]:
divisions

In [None]:
# Get every summoner name from the divisions data frame
summoner_names = divisions.loc[ :, 'summonerName' ]

In [None]:
summoner_names

### PUUID

RIOT API assigns each player a distinct puuid, which can be used to look up match data.  We iterate over the list of summoner names created previously and pass each one to the tft/summoner/v1/summoners query to obtain the corresponding puuid.

In [None]:
summoner_frames = []

for summoner in summoner_names:
  summoner_url = API_BASE + 'tft/summoner/v1/summoners/by-name/' + summoner + '?api_key=' + KEY
  REQUEST_NAME = call_api(summoner_url)
  summoner_data = json.loads(REQUEST_NAME.text)
  summoner_df = json_normalize(summoner_data)
  summoner_frames.append(summoner_df)
join_summoners = pd.concat(summoner_frames, ignore_index=True, sort=True)
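
Because `call_api` raises an exception on any non-200 response, a single failed lookup stops the whole loop. Summoner names may also contain spaces or non-ASCII characters. The sketch below shows one way to handle both; `summoner_url`, `lookup_all`, and the `fetch` callable are hypothetical helpers of our own, not part of the RIOT API or of `requests`.

```python
from urllib.parse import quote

API_BASE = 'https://na1.api.riotgames.com/'


def summoner_url(name, api_key):
    """Build the by-name lookup URL with the summoner name percent-encoded."""
    # safe='' also encodes '/', which would otherwise be read as a path separator.
    encoded = quote(name, safe='')
    return f'{API_BASE}tft/summoner/v1/summoners/by-name/{encoded}?api_key={api_key}'


def lookup_all(names, fetch):
    """Call `fetch(name)` for each name, collecting failures instead of aborting.

    `fetch` is a hypothetical callable standing in for the call_api + json.loads step.
    """
    found, failed = [], []
    for name in names:
        try:
            found.append(fetch(name))
        except Exception as exc:  # call_api raises a bare Exception on non-200 responses
            failed.append((name, str(exc)))
    return found, failed
```

The `failed` list records which names were skipped and why, so they can be retried or inspected after the run.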

In [None]:
join_summoners = join_summoners.merge(divisions[['summonerName','division_tier']],left_on='name', right_on = 'summonerName')

In [None]:
join_summoners.head()

In [None]:
# Get puuid from summoners data frame
summoner_puuid = join_summoners.loc[ : , 'puuid' ]

In [None]:
summoner_puuid

### Game ID

Using the list of puuid values, we query tft/match/v1/matches/by-puuid/ to obtain the most recent match game ID for each player.  We set match_count to 1 so that we obtain a new list of match IDs made up of the most recent match of every player across all rank tiers and divisions.  Although a single query can return up to 20 matches, we had to set match_count to 1 manually because the Riot server keeps entries for players with as few as 1 game.  Requesting more matches for players who had ONLY 1 recent match resulted in a flood of null values, so a count of 1 was our workaround for this limitation of the API, and it ensured that we had data for every player being looked up.

In [None]:
game_id_frames = []
match_count = '1'

for puuid in summoner_puuid:
  # API_BASE_REGION already ends with '/', so the path must not start with one
  match_url = API_BASE_REGION + 'tft/match/v1/matches/by-puuid/' + puuid + '/ids?count=' + match_count + '&api_key=' + KEY
  REQUEST_MATCH = call_api(match_url)
  match_data = json.loads(REQUEST_MATCH.text)
  match_df = pd.Series(match_data)
  game_id_frames.append(match_df)
game_id_data = pd.concat(game_id_frames, ignore_index=True, sort=True)
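
Since every match has 8 players, two players from our list can share the same most recent match, in which case the same game ID appears more than once. A minimal sketch of removing such repeats while keeping the original order is shown below; the helper name and the IDs are made up for illustration.

```python
def unique_in_order(match_ids):
    """Drop repeated match IDs while keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(match_ids))


ids = ['NA1_1', 'NA1_2', 'NA1_1', 'NA1_3', 'NA1_2']  # made-up IDs
print(unique_in_order(ids))  # ['NA1_1', 'NA1_2', 'NA1_3']
```

Removing repeats before the next step would also save one API request per duplicated game ID.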

In [None]:
# game_id_data is already a Series of match IDs; select all of them
game_id_list = game_id_data.loc[:]

In [None]:
game_id_list

### Match Information

As our final step, we query tft/match/v1/matches/ with each game ID to obtain the match details.  The match details provided by RIOT API contain, for all 8 players, the team composition and champions in play at the moment of their elimination.  For example, if a player is eliminated early on, the pieces that made up their board when they were knocked out are recorded.  For the player who achieves first place, the final board pieces at the moment of victory are recorded.

Since the information we are interested in is stored as a list of dictionaries inside a dataframe column, we need to convert the data so that it is ready for our modeling process.  The dictionaries for team compositions and champion units include their names and tiers, so we combine each name and tier into a name_tier format to be analyzed further through our models.  Team composition tiers range from 0 to 2, where 0 means that no tier bonus is active although the player has some units that can form the team composition, and 2 means that the player has obtained all the necessary units to achieve the best team composition bonus.  Champion unit tiers range from 1 to 3, where three tier 1 units combine into a single tier 2 unit and three tier 2 units combine into a single tier 3 unit.
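
The combining rule for champion units implies that a unit of tier t is worth 3^(t-1) tier 1 copies. A small sketch of that arithmetic follows; the function name `base_copies` is our own.

```python
def base_copies(tier):
    """Number of tier 1 copies consumed to build one unit of the given tier (1 to 3)."""
    if tier not in (1, 2, 3):
        raise ValueError('unit tiers range from 1 to 3')
    return 3 ** (tier - 1)


for tier in (1, 2, 3):
    print(tier, base_copies(tier))  # prints "1 1", "2 3", "3 9"
```

A tier 3 unit therefore represents nine copies of the same champion, which is why the tier is worth keeping alongside the name.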

In [None]:
match_info_frames = []

for game_id in game_id_list:
  result_url = API_BASE_REGION + 'tft/match/v1/matches/' + game_id + '?api_key=' + KEY
  REQUEST_RESULT = call_api(result_url)
  result_data = json.loads(REQUEST_RESULT.text)
  result_df = json_normalize(result_data)
  match_info_frames.append(result_df)
match_info_data = pd.concat(match_info_frames, ignore_index=True, sort=True)

In [None]:
match_info_data

In [None]:
match_info_data.columns

In [None]:
pd.options.display.max_seq_items = 2000

In [None]:
match_info_data['info.participants'][2]

In [None]:
participants_df = []
for num in range(len(match_info_data['info.participants'])):
    data = match_info_data['info.participants'][num]
    data_df = pd.DataFrame.from_dict(data)
    participants_df.append(data_df)

# concatenate once, after the loop, instead of rebuilding game_df on every iteration
game_df = pd.concat(participants_df, ignore_index=True, sort=True)

game_df

In [None]:
range(len(game_df['units']))

In [None]:
traits_data = game_df['traits']
traits_df = pd.DataFrame.from_dict(traits_data)
traits_df['traits'][0]

In [None]:
units_data = game_df['units']
units_df = pd.DataFrame.from_dict(units_data)
units_df['units'][1]

In [None]:
# join the unit name with tier value in a list separated by comma
game_df['units_name'] = [','.join([unit['name']+'_'+str(unit['tier']) for unit in units]) for units in game_df['units']]

# create dummy columns from the list of 'unit_tier' values per player
units_df = game_df['units_name'].str.get_dummies(sep = ',')

In [None]:
# join the trait name with set tier value in a list separated by comma
game_df['trait_name'] = [','.join([trait['name']+'_'+str(trait['tier_current']) for trait in traits]) for traits in game_df['traits']]

# create dummy columns from the list of 'trait_tier' values per player
traits_df = game_df['trait_name'].str.get_dummies(sep = ',')

In [None]:
# join the dummy columns with original game_df
game_df = game_df.join(units_df)
game_df = game_df.join(traits_df)
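
To show what the name_tier tokens and dummy columns look like, here is a pure-Python sketch on made-up data. It mirrors the idea of the list comprehension and `str.get_dummies` used above but does not use pandas; the helper names and the sample champions are assumptions for illustration.

```python
def tier_tokens(items, tier_key):
    """Turn [{'name': 'X', tier_key: 2}, ...] into ['X_2', ...]."""
    return [f"{item['name']}_{item[tier_key]}" for item in items]


def one_hot(rows):
    """Build 0/1 indicator rows over the sorted set of all tokens."""
    columns = sorted({token for row in rows for token in row})
    table = [[1 if col in row else 0 for col in columns] for row in rows]
    return columns, table


# Made-up participants, shaped like the 'units' lists in game_df.
players = [
    [{'name': 'Ahri', 'tier': 2}, {'name': 'Zed', 'tier': 1}],
    [{'name': 'Ahri', 'tier': 1}],
]
rows = [tier_tokens(units, 'tier') for units in players]
columns, table = one_hot(rows)
print(columns)  # ['Ahri_1', 'Ahri_2', 'Zed_1']
print(table)    # [[0, 1, 1], [1, 0, 0]]
```

Each distinct name and tier pair becomes its own column, so the same champion at tier 1 and at tier 2 is treated as two separate features.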


In [None]:
game_df = game_df.merge(join_summoners[['puuid','summonerName','division_tier']],left_on='puuid', right_on ='puuid')

### Data Extraction 

Because RIOT API returns empty server responses for some missing data, we had to split our query process into 5 separate pieces, which are combined in the next pre-processing section.  To demonstrate the methodology, all of the steps above should be run as written, while the code below should be edited so that the data is saved separately from this project's data source.

In [None]:
# game_df has previously been saved as 1k_data.csv through 5k_data.csv
game_df.to_csv('./datasets/6k_data.csv')
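
The notebook does not record exactly how the work was divided into 5 pieces. One possible approach, sketched below under that caveat, is to split the list of summoner names into contiguous batches and run the pipeline once per batch, saving each result to its own file. The helper `split_into_batches` and the sample names are hypothetical.

```python
def split_into_batches(items, n_batches):
    """Split a list into n_batches contiguous pieces whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item each.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


names = [f'player{i}' for i in range(12)]  # made-up names
batches = split_into_batches(names, 5)
for i, batch in enumerate(batches, start=1):
    print(f'./datasets/{i}k_data.csv', len(batch))
```

Keeping one output file per batch means that an empty server response only costs a rerun of that batch rather than of the whole extraction.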