# Capstone Project - League of Legends Champion Recommender

> Author: Ryan Yong

**Summary:**   
- Develop a Recommender System for recommending champions to users based on their account mastery points.
- Training data: Account & Champion Data

There are a total of 7 notebooks for this project:  
 1. `01a_data_scrape.ipynb`   
 2. `01b_wiki_scrape_fail.ipynb`   
 3. `02_champion_dataset_EDA.ipynb`
 4. `03_account_dataset_EDA.ipynb`
 5. `04_intial_recommender_system.ipynb`
 6. `05_final_hybrid_system.ipynb`
 7. `06_implementation.ipynb`

---
**This Notebook**
- Scrapes the account data and champion mastery scores
- Creates the `account_mastery_dataset.csv` and `accounts_dataset.csv`

In [None]:
import requests
import pandas as pd
import time
from tqdm import tqdm
from riotwatcher import RiotWatcher, ApiError, LolWatcher


## 1. API_KEY

Replace `'YOUR_API_KEY'` with your actual Riot Games Developer API key.

To obtain a key, create an account at the [Riot Developer Portal](https://developer.riotgames.com) and generate a temporary API key. Copy it into the variable labelled [API_KEY](#API_KEY) to run the scrape.

In [None]:
# Replace 'YOUR_API_KEY' with your actual Riot Games API key
API_KEY = 'YOUR_API_KEY'
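
Hard-coding the key in a shared notebook makes it easy to commit by accident. A minimal sketch of an alternative, assuming the key has been exported in an environment variable called `RIOT_API_KEY` (a hypothetical name chosen for this example, not something the Riot API requires):

```python
import os


def load_api_key(env_var="RIOT_API_KEY", default="YOUR_API_KEY"):
    """Return the API key from the environment, or the placeholder if unset.

    `RIOT_API_KEY` is a hypothetical variable name chosen for this example.
    """
    return os.environ.get(env_var, default)


API_KEY = load_api_key()
```

If the variable is not set, `API_KEY` falls back to the same placeholder string used in the cell above, so the rest of the notebook behaves as before.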

## 2. Test use for RiotWatcher and LolWatcher

RiotWatcher and LolWatcher are Python wrappers for the Riot Games API. For documentation, see [here](https://riot-watcher.readthedocs.io).

Note that `account_region` and `summoner_region` take different values (for example `asia` versus `SG2`). The difference comes from the legacy design of the older LoL APIs compared to the newer Riot APIs. See the images below for the available options.

Account Region Options:
![account_region](../images/account_region_options.png)


Summoner Region Options:
![summoner_region](../images/summoner_region_options.png)


## Current Use Case:

For this scrape, the account region was set to 'asia' and the summoner region to 'SG2' so that the dataset is localised to the Singapore region. Future work should scrape multiple regions, if not all, to account for the variance between regions.

The code block below is the initial test using the Riot ID format (game name#tag). It confirms that, in the final product, a user can enter their game name and tag and have their account mastery scores retrieved as input for the model.

In [None]:
# Initialize RiotWatcher with your API key
riot_watcher = RiotWatcher(API_KEY)
lol_watcher = LolWatcher(API_KEY)

# List of summoner names in the format "game name#tag name"
summoner_names = [
    'Atrophy#Fiend',  # Example Riot ID; my personal account serves as the example here.
    
]

# Region to search for summoners
account_region = 'asia'  # Replace with desired account region shown in image above
summoner_region = 'SG2'  # Replace with desired summoner region shown in image above


def get_puuid(game_name, tag_name, account_region):
    try:
        account = riot_watcher.account.by_riot_id(account_region, game_name, tag_name)
        return account['puuid']
    except ApiError as e:
        print(f"Error occurred while fetching PUUID for {game_name}#{tag_name}: {e.response.text}")
        return None

def get_champion_masteries(puuid, summoner_region):
    if puuid:
        try:
            champion_masteries = lol_watcher.champion_mastery.by_puuid(summoner_region, puuid)
            return champion_masteries
        except ApiError as e:
            print(f"Error occurred while fetching champion masteries for {puuid}: {e.response.text}")
    return []

# Dictionary to store champion masteries for each summoner
mastery_data = {}

# Scrape champion masteries for each summoner
for summoner_name in summoner_names:
    game_name, tag_name = summoner_name.split('#')
    puuid = get_puuid(game_name, tag_name, account_region)
    champion_masteries = get_champion_masteries(puuid, summoner_region)
    mastery_data[summoner_name] = {mastery['championId']:mastery['championPoints'] for mastery in champion_masteries}

# Create DataFrame from mastery_data
df = pd.DataFrame.from_dict(mastery_data, orient='index')

# Fill NaN values with 0
df.fillna(0, inplace=True)

# Display the DataFrame
print(df)


                       51        22        64        92        145     81   \
Atrophy#Fiend     683542.0  356429.0  325494.0  259217.0  242040.0  240035   
Aradia#NaCl        67287.0   52300.0   12385.0   39951.0    2022.0  134905   
Commet#OG1          4730.0   22185.0   34275.0   15025.0   83919.0  119005   
Asura#8186          1877.0    4582.0       0.0     192.0     111.0   29949   
Clairvoyant#8721  108140.0   64302.0     807.0       0.0    4650.0   67079   
Dorainen#FKKK      79988.0   51084.0   15200.0   60754.0   80795.0   65855   
Froggers#0002       6010.0    3469.0    4634.0       0.0     166.0    7754   
Ruthfu#SLK69        7544.0       0.0       0.0       0.0       0.0    2910   
23parent#SG2        7916.0   24616.0   21981.0   47071.0   41497.0   62728   
Vlone#8882        110329.0   58526.0  109597.0   83693.0  100167.0  141193   
TheFrog#0005           0.0     206.0       0.0       0.0       0.0     189   

                       50        21      67      412  ...     2

As shown above, given a user's game name#tag, the scraper returns a table of the account's mastery points for every champion. However, the columns are currently ordered by descending mastery score from left to right, based only on the first account.
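
The test loop splits each Riot ID with `split('#')`, which fails with an unpacking error if the input has no `#` or more than one. A small sketch of a stricter parser follows; `parse_riot_id` is a hypothetical helper and not part of riot-watcher:

```python
def parse_riot_id(riot_id):
    """Split 'gameName#tagLine' into (game_name, tag_name).

    Raises ValueError with a readable message when either part is missing
    or when the input contains more than one '#'.
    """
    game_name, sep, tag_name = riot_id.strip().partition("#")
    if not sep or not game_name or not tag_name or "#" in tag_name:
        raise ValueError(f"Expected 'gameName#tagLine', got {riot_id!r}")
    return game_name, tag_name


print(parse_riot_id("Atrophy#Fiend"))  # ('Atrophy', 'Fiend')
```

In the final product this kind of check would let malformed user input be rejected before any API call is made.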

In [None]:
df.head()

Unnamed: 0,51,22,64,92,145,81,50,21,67,412,...,240,910,234,902,200,233,897,887,876,950
Atrophy#Fiend,683542.0,356429.0,325494.0,259217.0,242040.0,240035,233260.0,208561.0,201080,196786,...,425.0,133.0,105.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Aradia#NaCl,67287.0,52300.0,12385.0,39951.0,2022.0,134905,11261.0,38910.0,117580,514606,...,1440.0,772.0,0.0,8674.0,0.0,0.0,0.0,0.0,0.0,0.0
Commet#OG1,4730.0,22185.0,34275.0,15025.0,83919.0,119005,78842.0,119.0,69478,89070,...,6153.0,786.0,104909.0,0.0,166142.0,31111.0,207.0,0.0,0.0,0.0
Asura#8186,1877.0,4582.0,0.0,192.0,111.0,29949,0.0,756.0,1475,838,...,0.0,27179.0,111.0,0.0,0.0,0.0,132.0,141.0,0.0,0.0
Clairvoyant#8721,108140.0,64302.0,807.0,0.0,4650.0,67079,15409.0,93328.0,13881,5841,...,0.0,1015.0,1497.0,10618.0,0.0,0.0,0.0,188.0,20610.0,0.0


The results above show that the scraper works. However, two things must be addressed.

First, the column titles are the champions' numerical IDs rather than their names.

To address this, champion data will be obtained from the official Riot library known as [Data Dragon](https://developer.riotgames.com/docs/lol). This allows each ID to be mapped to its champion name so that the columns can be renamed.

Second, the current scraping iteration requires the user's Riot ID and tag (e.g. Atrophy#Fiend). While this suits the final implementation, it is impractical to collect Riot IDs at the scale needed for a training dataset, so the code will be altered slightly to operate on the Player Universally Unique IDentifier (PUUID) instead.

In [None]:
# Data Dragon URL for champion data
data_dragon_url = 'https://ddragon.leagueoflegends.com/cdn/14.8.1/data/en_US/champion.json'

# Function to fetch champion data from Data Dragon and create a mapping
def create_champion_id_name_mapping():
    try:
        response = requests.get(data_dragon_url, timeout=10)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error page
        data = response.json()
        champion_data = data['data']
        champion_id_name_mapping = {int(champion_data[champion]['key']): champion_data[champion]['name'] for champion in champion_data}
        return champion_id_name_mapping
    except Exception as e:
        print(f"Error occurred while fetching champion data from Data Dragon: {e}")
        return None

# Create a dictionary mapping champion IDs to their names
champion_id_name_mapping = create_champion_id_name_mapping()

# Display the mapping
print(champion_id_name_mapping)


{266: 'Aatrox', 103: 'Ahri', 84: 'Akali', 166: 'Akshan', 12: 'Alistar', 32: 'Amumu', 34: 'Anivia', 1: 'Annie', 523: 'Aphelios', 22: 'Ashe', 136: 'Aurelion Sol', 268: 'Azir', 432: 'Bard', 200: "Bel'Veth", 53: 'Blitzcrank', 63: 'Brand', 201: 'Braum', 233: 'Briar', 51: 'Caitlyn', 164: 'Camille', 69: 'Cassiopeia', 31: "Cho'Gath", 42: 'Corki', 122: 'Darius', 131: 'Diana', 119: 'Draven', 36: 'Dr. Mundo', 245: 'Ekko', 60: 'Elise', 28: 'Evelynn', 81: 'Ezreal', 9: 'Fiddlesticks', 114: 'Fiora', 105: 'Fizz', 3: 'Galio', 41: 'Gangplank', 86: 'Garen', 150: 'Gnar', 79: 'Gragas', 104: 'Graves', 887: 'Gwen', 120: 'Hecarim', 74: 'Heimerdinger', 910: 'Hwei', 420: 'Illaoi', 39: 'Irelia', 427: 'Ivern', 40: 'Janna', 59: 'Jarvan IV', 24: 'Jax', 126: 'Jayce', 202: 'Jhin', 222: 'Jinx', 145: "Kai'Sa", 429: 'Kalista', 43: 'Karma', 30: 'Karthus', 38: 'Kassadin', 55: 'Katarina', 10: 'Kayle', 141: 'Kayn', 85: 'Kennen', 121: "Kha'Zix", 203: 'Kindred', 240: 'Kled', 96: "Kog'Maw", 897: "K'Sante", 7: 'LeBlanc', 64: 'L

As shown above, the dictionary maps every champion's numerical ID to its name. It is used to rename the columns in the code block below.
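
The mapping step can be tried offline. The sketch below uses a small hand-written payload that mimics the shape of `champion.json` (it is not real API output), and also lists any IDs that have no name, since `DataFrame.rename` leaves unmapped columns unchanged without warning. `build_id_name_mapping` and `find_unmapped` are hypothetical helper names:

```python
def build_id_name_mapping(payload):
    """Map integer champion IDs to display names from a champion.json-shaped dict."""
    return {int(info["key"]): info["name"] for info in payload["data"].values()}


def find_unmapped(champion_ids, mapping):
    """Return the IDs that would be left as numbers after renaming."""
    return [champion_id for champion_id in champion_ids if champion_id not in mapping]


# Hand-written sample mimicking the shape of champion.json (illustrative only).
sample_payload = {
    "data": {
        "Aatrox": {"key": "266", "name": "Aatrox"},
        "Kaisa": {"key": "145", "name": "Kai'Sa"},
    }
}

mapping = build_id_name_mapping(sample_payload)
print(mapping)  # {266: 'Aatrox', 145: "Kai'Sa"}
print(find_unmapped([266, 145, 9999], mapping))  # [9999]
```

A non-empty result from `find_unmapped` would suggest the Data Dragon version in the URL is older than the newest champion in the scraped data.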

In [None]:
# Rename DataFrame columns using the champion ID name mapping
df.rename(columns=champion_id_name_mapping, inplace=True)

print(df)

                   Caitlyn      Ashe   Lee Sin     Riven    Kai'Sa  Ezreal  \
Atrophy#Fiend     683542.0  356429.0  325494.0  259217.0  242040.0  240035   
Aradia#NaCl        67287.0   52300.0   12385.0   39951.0    2022.0  134905   
Commet#OG1          4730.0   22185.0   34275.0   15025.0   83919.0  119005   
Asura#8186          1877.0    4582.0       0.0     192.0     111.0   29949   
Clairvoyant#8721  108140.0   64302.0     807.0       0.0    4650.0   67079   
Dorainen#FKKK      79988.0   51084.0   15200.0   60754.0   80795.0   65855   
Froggers#0002       6010.0    3469.0    4634.0       0.0     166.0    7754   
Ruthfu#SLK69        7544.0       0.0       0.0       0.0       0.0    2910   
23parent#SG2        7916.0   24616.0   21981.0   47071.0   41497.0   62728   
Vlone#8882        110329.0   58526.0  109597.0   83693.0  100167.0  141193   
TheFrog#0005           0.0     206.0       0.0       0.0       0.0     189   

                     Swain  Miss Fortune   Vayne  Thresh  ...  

In [None]:
df.head()

Unnamed: 0,Caitlyn,Ashe,Lee Sin,Riven,Kai'Sa,Ezreal,Swain,Miss Fortune,Vayne,Thresh,...,Kled,Hwei,Viego,Milio,Bel'Veth,Briar,K'Sante,Gwen,Lillia,Naafiri
Atrophy#Fiend,683542.0,356429.0,325494.0,259217.0,242040.0,240035,233260.0,208561.0,201080,196786,...,425.0,133.0,105.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Aradia#NaCl,67287.0,52300.0,12385.0,39951.0,2022.0,134905,11261.0,38910.0,117580,514606,...,1440.0,772.0,0.0,8674.0,0.0,0.0,0.0,0.0,0.0,0.0
Commet#OG1,4730.0,22185.0,34275.0,15025.0,83919.0,119005,78842.0,119.0,69478,89070,...,6153.0,786.0,104909.0,0.0,166142.0,31111.0,207.0,0.0,0.0,0.0
Asura#8186,1877.0,4582.0,0.0,192.0,111.0,29949,0.0,756.0,1475,838,...,0.0,27179.0,111.0,0.0,0.0,0.0,132.0,141.0,0.0,0.0
Clairvoyant#8721,108140.0,64302.0,807.0,0.0,4650.0,67079,15409.0,93328.0,13881,5841,...,0.0,1015.0,1497.0,10618.0,0.0,0.0,0.0,188.0,20610.0,0.0


The example dataset is now properly labeled, with every column carrying a champion's name rather than a numerical key. However, the columns are still ordered by the first account's mastery score in descending order. Sorting the columns alphabetically is left for a later step.
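
For reference, one way the alphabetical ordering could be done is with `sort_index(axis=1)`. This is a sketch on a toy frame whose values are copied from the first two rows above, not the notebook's actual later step:

```python
import pandas as pd

# Toy frame standing in for the mastery table.
toy = pd.DataFrame(
    {
        "Caitlyn": [683542.0, 67287.0],
        "Ashe": [356429.0, 52300.0],
        "Lee Sin": [325494.0, 12385.0],
    },
    index=["Atrophy#Fiend", "Aradia#NaCl"],
)

# Reorder the columns alphabetically; rows and values are unchanged.
toy_sorted = toy.sort_index(axis=1)
print(list(toy_sorted.columns))  # ['Ashe', 'Caitlyn', 'Lee Sin']
```

A fixed column order makes the layout independent of whichever account happens to be scraped first.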

## Account Scraping

To get a large enough sample of user accounts, the code block below scrapes each user's PUUID and rank from the server's ranked leaderboards. This yields a large dataset without needing each user's Riot ID and tag.

In [None]:
# Initialize LolWatcher with your API key
lol_watcher = LolWatcher(API_KEY)

# List of ranks to search for accounts
ranks = ['CHALLENGER', 'GRANDMASTER', 'MASTER', 'DIAMOND', 'PLATINUM', 'GOLD', 'SILVER', 'BRONZE', 'IRON']
high_elo_ranks = ['CHALLENGER', 'GRANDMASTER', 'MASTER']
# Region to search for summoners
region = 'sg2'  # Replace with desired region
divisions = ['IV', 'III','II','I']

# Create lists to store account data
account_data = []

def get_account_data_by_high_elo_rank(rank, region):
    try:
        if rank == 'CHALLENGER':
            summoners = lol_watcher.league.challenger_by_queue(region, 'RANKED_SOLO_5x5')
        elif rank == 'GRANDMASTER':
            summoners = lol_watcher.league.grandmaster_by_queue(region, 'RANKED_SOLO_5x5')
        elif rank == 'MASTER':
            summoners = lol_watcher.league.masters_by_queue(region, 'RANKED_SOLO_5x5')
        
        # Extract account PUUIDs from summoner data and append to account_data list
        for entry in summoners['entries']:
            account_data.append({'PUUID': entry['summonerId'], 'Rank': rank, 'Division': None})

    except ApiError as e:
        print(f"Error occurred while fetching summoners for rank {rank}: {e.response.text}")

def get_account_data_by_rank(rank, region, division):
    try:
        summoners = lol_watcher.league.entries(region, 'RANKED_SOLO_5x5', tier=rank, division=division)
        # Extract account PUUIDs from summoner data
        account_puuids = [entry['summonerId'] for entry in summoners]
        # Append account data to the list
        for puuid in account_puuids:
            account_data.append({'PUUID': puuid, 'Rank': rank, 'Division': division})
    except ApiError as e:
        print(f"Error occurred while fetching summoners for rank {rank}: {e.response.text}")

# Scrape account data for each rank
for rank in ranks:
    if rank in high_elo_ranks:
        get_account_data_by_high_elo_rank(rank, region)
    else:
        for division in divisions:
            get_account_data_by_rank(rank, region, division)

# Create a DataFrame from the account data
account_df = pd.DataFrame(account_data)

# Display the DataFrame
print(account_df)


                                                  PUUID        Rank Division
0     jRkg4rNEumgpbzf2MorJc9MrcpJdhoZwGzXa8GAu6foEZp...  CHALLENGER     None
1     qmgs4nrhQGgqVHSONiEz8x8U3uz50kpLkZ67fehwscFzSv...  CHALLENGER     None
2     5fOj0o2GvDMoSTMNKri7WL_whZVGEfXJM57VRF9_h8vpsa...  CHALLENGER     None
3     TJEdr98RlfVxySYsa7yq8aTxzR5CIaOr6MdYCMiSWwQUbT...  CHALLENGER     None
4     raJXEZ0gCMgk3rBEEq08q0Hiq0RQt-PUI0qHlf_nYEql7B...  CHALLENGER     None
...                                                 ...         ...      ...
5387  J7tXQfLCY8ktrqJhIF60czLDnhi6gEqmUHlyz0_mnrz5jM...        IRON        I
5388  H4V7V-N1PXf66dJEFUm0o8KVrBi4eF5vpdSVKss2eNlR7p...        IRON        I
5389  VCJS4qI2Voj3M6YTIku18QRxU1tzntr5ROICt7SUly9BfG...        IRON        I
5390  AUBjvtsrOeWUHCcwtOTGcyx_wa3nC3sswpF8lbKCyqTOvb...        IRON        I
5391  gPG38ckRuq04yV0SQF10ylpCqwlYy_8isQq4wF9J5-YeZW...        IRON        I

[5392 rows x 3 columns]


Below is a breakdown of the accounts scraped by rank. Note that the higher ranks (MASTER, GRANDMASTER and CHALLENGER, collectively known as high-elo) yield fewer accounts, because few players reach them and League of Legends caps how many can hold each one: Challenger is given to the top 50 players, Grandmaster to the next 100, and Master to those who have climbed past Diamond I and are within the next 500. For that reason, these ranks have a more limited dataset.

In [None]:
account_df['Rank'].value_counts()

Rank
DIAMOND        820
PLATINUM       820
GOLD           820
SILVER         820
BRONZE         820
IRON           820
MASTER         322
GRANDMASTER    100
CHALLENGER      50
Name: count, dtype: int64
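
Every tier from DIAMOND down to IRON has exactly 820 accounts, i.e. 205 per division, which suggests that only the first page of results was fetched for each division. A sketch of paging until an empty page comes back is shown below. `fetch_page` is a hypothetical callable; in the notebook it would presumably wrap `lol_watcher.league.entries(...)` with a `page` argument, but that keyword is an assumption to verify against the riot-watcher documentation:

```python
def fetch_all_pages(fetch_page, max_pages=50):
    """Collect entries from page 1 upward until a page comes back empty.

    `fetch_page` is a hypothetical callable: page number -> list of entries.
    `max_pages` is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    entries = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        entries.extend(batch)
    return entries


# Stand-in for the API: two pages of fake entries, then nothing.
fake_pages = {
    1: [{"summonerId": "a"}, {"summonerId": "b"}],
    2: [{"summonerId": "c"}],
}
all_entries = fetch_all_pages(lambda page: fake_pages.get(page, []))
print(len(all_entries))  # 3
```

Passing the fetch function in as an argument keeps the paging logic testable without an API key.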

Upon further assessment, it was discovered that the ID obtained was not in fact the account's PUUID. Due to legacy aspects of the Riot API, it was instead the summoner ID, an older identifier used specifically on the League of Legends client side. The column is renamed accordingly for future reference, and each user's PUUID is then obtained from their summoner ID.

In [None]:
# Change column name from 'PUUID' to 'summoner_id'
account_df.rename(columns={'PUUID': 'summoner_id'}, inplace=True)

In [None]:
account_df

Unnamed: 0,summoner_id,Rank,Division
0,jRkg4rNEumgpbzf2MorJc9MrcpJdhoZwGzXa8GAu6foEZp...,CHALLENGER,
1,qmgs4nrhQGgqVHSONiEz8x8U3uz50kpLkZ67fehwscFzSv...,CHALLENGER,
2,5fOj0o2GvDMoSTMNKri7WL_whZVGEfXJM57VRF9_h8vpsa...,CHALLENGER,
3,TJEdr98RlfVxySYsa7yq8aTxzR5CIaOr6MdYCMiSWwQUbT...,CHALLENGER,
4,raJXEZ0gCMgk3rBEEq08q0Hiq0RQt-PUI0qHlf_nYEql7B...,CHALLENGER,
...,...,...,...
5387,J7tXQfLCY8ktrqJhIF60czLDnhi6gEqmUHlyz0_mnrz5jM...,IRON,I
5388,H4V7V-N1PXf66dJEFUm0o8KVrBi4eF5vpdSVKss2eNlR7p...,IRON,I
5389,VCJS4qI2Voj3M6YTIku18QRxU1tzntr5ROICt7SUly9BfG...,IRON,I
5390,AUBjvtsrOeWUHCcwtOTGcyx_wa3nC3sswpF8lbKCyqTOvb...,IRON,I


Below is the code used to obtain the PUUID from `summoner_id`. Because the API can time out (for example, after too many requests), artificial delays were added to the code. The DataFrame was also split into 54 chunks, tracked with a progress bar, so that they could be scraped separately, for a total run time of 1 hour 47 minutes.

In [None]:
lol_watcher = LolWatcher(API_KEY)

# Function to obtain PUUID from Summoner ID
def get_puuid_from_summoner_id(summoner_id, region):
    try:
        summoner_info = lol_watcher.summoner.by_id(region, summoner_id)
        return summoner_info['puuid']
    except ApiError as e:
        print(f"Error occurred while fetching PUUID for Summoner ID {summoner_id}: {e.response.text}")
        return None

# Function to process DataFrame in chunks and add PUUIDs
def process_df_chunks(df, chunk_size, region):
    puuid_list = []
    num_chunks = -(-len(df) // chunk_size)  # ceiling division; avoids an empty trailing chunk
    for chunk_id in tqdm(range(num_chunks), desc="Processing Chunks"):
        start_idx = chunk_id * chunk_size
        end_idx = min((chunk_id + 1) * chunk_size, len(df))
        chunk_df = df.iloc[start_idx:end_idx]
        puuid_list.extend(chunk_df['summoner_id'].apply(get_puuid_from_summoner_id, args=(region,)).tolist())
        time.sleep(10)
    return puuid_list

# Split DataFrame into chunks and process each chunk
chunk_size = 100  # Adjust this based on your needs
region = 'sg2'  # Replace with your desired region
puuid_list = process_df_chunks(account_df, chunk_size, region)

# Add PUUIDs to DataFrame
account_df['puuid'] = puuid_list

# Display the modified DataFrame
print(account_df.head())


Processing Chunks:   0%|          | 0/54 [00:00<?, ?it/s]

Processing Chunks: 100%|██████████| 54/54 [1:47:10<00:00, 119.08s/it]

                                         summoner_id        Rank Division  \
0  jRkg4rNEumgpbzf2MorJc9MrcpJdhoZwGzXa8GAu6foEZp...  CHALLENGER     None   
1  qmgs4nrhQGgqVHSONiEz8x8U3uz50kpLkZ67fehwscFzSv...  CHALLENGER     None   
2  5fOj0o2GvDMoSTMNKri7WL_whZVGEfXJM57VRF9_h8vpsa...  CHALLENGER     None   
3  TJEdr98RlfVxySYsa7yq8aTxzR5CIaOr6MdYCMiSWwQUbT...  CHALLENGER     None   
4  raJXEZ0gCMgk3rBEEq08q0Hiq0RQt-PUI0qHlf_nYEql7B...  CHALLENGER     None   

                                               puuid  
0  1bIcsAlZYBFasZ_7i8BEj3J7cB_gTYeXE78_TWeMdF8vJx...  
1  DGJ4Ei1usxJ8wZCRs56FBh3ISL7LIn12Auhn5PdKyWeq8n...  
2  Sub_TWuD8aVUM4YNiyQCW5cbBq6GcXI-bGuXqI7OQOcU5S...  
3  QVu3nczFWCsl18IOoExbe1AzbaMl2wIyKsB1u1S2Ysg_ed...  
4  TF7CMsUCs_z7jpZb7I6wHMAOoyM_A9GEmoJL0uA8FEeqPP...  
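
The fixed `time.sleep(10)` between chunks is one way to stay under the rate limit. An alternative sketch retries a single call with exponential backoff. `call_with_retry` and its parameters are hypothetical helpers for illustration; in real use the `except` clause would presumably be narrowed to `ApiError`:

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on failure wait base_delay * 2**attempt and try again.

    Returns None once all attempts have failed, mirroring the notebook's
    convention of returning None on error.
    """
    for attempt in range(retries):
        try:
            return func(*args)
        except Exception as exc:  # assumption: narrow this to ApiError in real use
            if attempt == retries - 1:
                print(f"Giving up after {retries} attempts: {exc}")
                return None
            sleep(base_delay * 2 ** attempt)


# Demo with a fake lookup that fails twice before succeeding.
calls = {"count": 0}


def flaky_lookup(summoner_id):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return f"puuid-for-{summoner_id}"


waits = []  # record the delays instead of actually sleeping
result = call_with_retry(flaky_lookup, "abc", sleep=waits.append)
print(result, waits)  # puuid-for-abc [1.0, 2.0]
```

The `sleep` parameter is injected so the demo runs instantly; leaving it at its default uses `time.sleep` as in the cell above.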





A final check confirms that the PUUID was obtained from `summoner_id`:

In [None]:
account_df.head()

Unnamed: 0,summoner_id,Rank,Division,puuid
0,jRkg4rNEumgpbzf2MorJc9MrcpJdhoZwGzXa8GAu6foEZp...,CHALLENGER,,1bIcsAlZYBFasZ_7i8BEj3J7cB_gTYeXE78_TWeMdF8vJx...
1,qmgs4nrhQGgqVHSONiEz8x8U3uz50kpLkZ67fehwscFzSv...,CHALLENGER,,DGJ4Ei1usxJ8wZCRs56FBh3ISL7LIn12Auhn5PdKyWeq8n...
2,5fOj0o2GvDMoSTMNKri7WL_whZVGEfXJM57VRF9_h8vpsa...,CHALLENGER,,Sub_TWuD8aVUM4YNiyQCW5cbBq6GcXI-bGuXqI7OQOcU5S...
3,TJEdr98RlfVxySYsa7yq8aTxzR5CIaOr6MdYCMiSWwQUbT...,CHALLENGER,,QVu3nczFWCsl18IOoExbe1AzbaMl2wIyKsB1u1S2Ysg_ed...
4,raJXEZ0gCMgk3rBEEq08q0Hiq0RQt-PUI0qHlf_nYEql7B...,CHALLENGER,,TF7CMsUCs_z7jpZb7I6wHMAOoyM_A9GEmoJL0uA8FEeqPP...


Below is the code used to scrape the champion masteries of each account by PUUID.

In [None]:
def get_champion_masteries(puuid, summoner_region):
    if puuid:
        try:
            champion_masteries = lol_watcher.champion_mastery.by_puuid(summoner_region, puuid)
            return champion_masteries
        except ApiError as e:
            print(f"Error occurred while fetching champion masteries for {puuid}: {e.response.text}")
    return []

# Dictionary to store champion masteries for each summoner
mastery_data = {}

# Scrape champion masteries for each summoner
for puuid in account_df['puuid']:
    champion_masteries = get_champion_masteries(puuid, summoner_region)
    mastery_data[puuid] = {mastery['championId']:mastery['championPoints'] for mastery in champion_masteries}

# Create DataFrame from mastery_data
df = pd.DataFrame.from_dict(mastery_data, orient='index')

# Fill NaN values with 0
df.fillna(0, inplace=True)

In [None]:
df.head()

Unnamed: 0,154,60,517,82,122,555,101,89,266,104,...,420,83,897,200,223,888,902,526,950,233
1bIcsAlZYBFasZ_7i8BEj3J7cB_gTYeXE78_TWeMdF8vJx5JAsP5Q7-PvCW_9MtGpOSmMVpHzBvY_Q,105881.0,51478.0,24524.0,22955.0,7909.0,7237.0,5113.0,4855.0,3253.0,2629.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
QVu3nczFWCsl18IOoExbe1AzbaMl2wIyKsB1u1S2Ysg_edBwkQFocceuX3hzmkrMPAROi7n2BvG5Vg,44888.0,19696.0,55465.0,7602.0,58001.0,82628.0,60256.0,28987.0,78536.0,84864.0,...,847.0,566.0,181.0,165.0,128.0,0.0,0.0,0.0,0.0,0.0
TF7CMsUCs_z7jpZb7I6wHMAOoyM_A9GEmoJL0uA8FEeqPPfZRMAnaewBV4pmfUgYIn2Bbp_bGjWb9g,2412.0,22517.0,55270.0,354.0,22225.0,26426.0,18173.0,2182.0,58213.0,90408.0,...,1259.0,0.0,10678.0,0.0,0.0,618.0,146.0,0.0,0.0,0.0
Fa8kC2YI6rCqb9yqXzrZPfaRNCKMsf42vOD6Wg-uR1P5oDJTKCN3bn8NT85iIkB_lWJJSHG7c3Dldw,1160.0,9356.0,12700.0,346.0,3224.0,205.0,773.0,4462.0,1218.0,58448.0,...,134.0,0.0,1417.0,0.0,147.0,0.0,0.0,134.0,0.0,0.0
JpY9M9-bA7FPXfkf4lJzqlvh1yDb-MwEpUnHK4jfKoE62L8uCg2i8r0_LvajlG_Pp7KXNqlSuFTopA,95920.0,26672.0,18963.0,31357.0,166917.0,21252.0,20522.0,5761.0,430765.0,15432.0,...,61141.0,3266.0,251967.0,0.0,18439.0,0.0,972.0,0.0,9831.0,0.0


In [None]:
df.shape

(5392, 167)

Next, the champion names are mapped onto the columns.

In [None]:
# Rename DataFrame columns using the champion ID name mapping
df.rename(columns=champion_id_name_mapping, inplace=True)

print(df)

                                                         Zac    Elise  \
1bIcsAlZYBFasZ_7i8BEj3J7cB_gTYeXE78_TWeMdF8vJx5...  105881.0  51478.0   
QVu3nczFWCsl18IOoExbe1AzbaMl2wIyKsB1u1S2Ysg_edB...   44888.0  19696.0   
TF7CMsUCs_z7jpZb7I6wHMAOoyM_A9GEmoJL0uA8FEeqPPf...    2412.0  22517.0   
Fa8kC2YI6rCqb9yqXzrZPfaRNCKMsf42vOD6Wg-uR1P5oDJ...    1160.0   9356.0   
JpY9M9-bA7FPXfkf4lJzqlvh1yDb-MwEpUnHK4jfKoE62L8...   95920.0  26672.0   
...                                                      ...      ...   
3w-0XMl5M7Ov5oxAQ1KsLWinkfIHe9Jhcr6DQvppks7Qf4R...       0.0      0.0   
pc8oXLbyqVUsg6k4C9HfrzjQBlwAMuSAQ5Px1VrsWP6PsZ4...       0.0      0.0   
mhoIsHZhg4UzBcCKK_CAFohyb1lnx08WGb5YQ5-W5D0WhVX...       0.0      0.0   
G8bKuhkxXmPJgPnXwBjC1qoOGsLXvxwitvz-D2KzgA53Xh9...       0.0      0.0   
YEWACkdmZHXBJZYqVa3eomYG2bXVGpsH4HxCPFYi3octaj7...       0.0      0.0   

                                                      Sylas  Mordekaiser  \
1bIcsAlZYBFasZ_7i8BEj3J7cB_gTYeXE78_TWeMdF8vJx5

In [None]:
df.head()

Unnamed: 0,Zac,Elise,Sylas,Mordekaiser,Darius,Pyke,Xerath,Leona,Aatrox,Graves,...,Illaoi,Yorick,K'Sante,Bel'Veth,Tahm Kench,Renata Glasc,Milio,Rell,Naafiri,Briar
1bIcsAlZYBFasZ_7i8BEj3J7cB_gTYeXE78_TWeMdF8vJx5JAsP5Q7-PvCW_9MtGpOSmMVpHzBvY_Q,105881.0,51478.0,24524.0,22955.0,7909.0,7237.0,5113.0,4855.0,3253.0,2629.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
QVu3nczFWCsl18IOoExbe1AzbaMl2wIyKsB1u1S2Ysg_edBwkQFocceuX3hzmkrMPAROi7n2BvG5Vg,44888.0,19696.0,55465.0,7602.0,58001.0,82628.0,60256.0,28987.0,78536.0,84864.0,...,847.0,566.0,181.0,165.0,128.0,0.0,0.0,0.0,0.0,0.0
TF7CMsUCs_z7jpZb7I6wHMAOoyM_A9GEmoJL0uA8FEeqPPfZRMAnaewBV4pmfUgYIn2Bbp_bGjWb9g,2412.0,22517.0,55270.0,354.0,22225.0,26426.0,18173.0,2182.0,58213.0,90408.0,...,1259.0,0.0,10678.0,0.0,0.0,618.0,146.0,0.0,0.0,0.0
Fa8kC2YI6rCqb9yqXzrZPfaRNCKMsf42vOD6Wg-uR1P5oDJTKCN3bn8NT85iIkB_lWJJSHG7c3Dldw,1160.0,9356.0,12700.0,346.0,3224.0,205.0,773.0,4462.0,1218.0,58448.0,...,134.0,0.0,1417.0,0.0,147.0,0.0,0.0,134.0,0.0,0.0
JpY9M9-bA7FPXfkf4lJzqlvh1yDb-MwEpUnHK4jfKoE62L8uCg2i8r0_LvajlG_Pp7KXNqlSuFTopA,95920.0,26672.0,18963.0,31357.0,166917.0,21252.0,20522.0,5761.0,430765.0,15432.0,...,61141.0,3266.0,251967.0,0.0,18439.0,0.0,972.0,0.0,9831.0,0.0


The final dataset is complete. For the sake of anonymity, the users' PUUIDs, which form the DataFrame index, are left out of the mastery dataset by writing it with `index=False`.

In [None]:
df.to_csv(r"../data/account_mastery_dataset.csv", index = False)

In [None]:
account_df.to_csv(r"../data/accounts_dataset.csv", index = False)
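
As a quick check on the mastery file, the sketch below writes a toy frame to an in-memory buffer with the same `index=False` argument and confirms that the index label does not appear in the output. The PUUID string is a made-up placeholder and the values are copied from the first row shown earlier:

```python
import io

import pandas as pd

# Toy frame indexed by a placeholder PUUID, mirroring the mastery table's layout.
toy = pd.DataFrame({"Zac": [105881.0], "Elise": [51478.0]}, index=["example-puuid"])

buffer = io.StringIO()
toy.to_csv(buffer, index=False)  # same argument as above, written to memory instead of disk
csv_text = buffer.getvalue()

print(csv_text.splitlines()[0])  # Zac,Elise
print("example-puuid" in csv_text)  # False
```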