## LD2L data project

#### Introduction

To pull interesting data from ld2l.gg, I have created a basic set of pulls that gather match data and parse it into a dataframe. 
This dataframe is exported to a csv that can be used with any BI software or read programmatically with dataframe packages for exploration.

In [17]:
#import 

import pandas as pd
import numpy as np
import requests
import json
import bs4
import time
import os

# You will need your own API key from opendota 
# and a config.py file that sets the variable api_key to your key
from config import api_key
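
A `config.py` along these lines would satisfy the import above. This is only a sketch: reading the key from an environment variable named `OPENDOTA_API_KEY` is an assumption made here to keep the key out of version control, and assigning the key string directly works just as well.

```python
# config.py (hypothetical layout; the notebook only needs the name `api_key`)
import os

# OPENDOTA_API_KEY is an assumed variable name, not an OpenDota convention
api_key = os.environ.get("OPENDOTA_API_KEY", "")
```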

In [18]:
# display full dataframe
pd.set_option('display.max_columns', None)

#### Pulling season info

Basic match data is found on ld2l.gg/seasons/##/matches. My outline here will use season 37 for prototyping. 

Using BeautifulSoup (BS) (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) the ld2l matches page is parsed to get the ld2l match IDs. 
Note that the ld2l match ID is not the same as the Dota match ID.

A cache file is created, unless it already exists, to avoid re-parsing saved info.

Note that seasons on the website do not correspond to the ticket or season that would be listed in the Dota/OpenDota API.
Working from the website also reduces the chance of adding matches that ticket holders play for scrims or other reasons that aren't official games.
One limitation is that this method can't pull unticketed data, even if it is entered completely on the ld2l website.
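
The link filter and numeric sort used in the next cell can be tried on a few hand-written hrefs without requesting the site. The hrefs below are made up for illustration and assume match links end in a numeric ID; the helper name `pick_match_links` is hypothetical.

```python
def pick_match_links(hrefs, base="https://ld2l.gg"):
    """Keep hrefs that mention 'match' but not 'season', sorted by trailing numeric ID."""
    links = [base + h for h in hrefs if "match" in h and "season" not in h]
    # sort numerically; a plain string sort would put .../1000 before .../999
    links.sort(key=lambda url: int(url.split("/")[-1]))
    return links

# made-up hrefs, not real ld2l.gg data
sample = ["/seasons/37/matches", "/matches/1000", "/profile/5", "/matches/999"]
print(pick_match_links(sample))
```

The season index link is dropped because it contains `season`, and the profile link is dropped because it never mentions `match`.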



In [19]:
#set ld2l season webpage

#prototype will be for season 37, the function will be for any season via input
season = 37

url = f'https://ld2l.gg/seasons/{season}/matches'


soup = bs4.BeautifulSoup(requests.get(url).text, 'html.parser')
matches = []
# This is a directory string to help manage the data of different seasons
save_dir = f'match_data_{season}'

for a in soup.find_all('a', href=True):
    # Finds ld2l match links and appends them to a list
    if 'match' in a['href'] and 'season' not in a['href']:
        matches.append('https://ld2l.gg' + a['href'])

#sort matches by ID
matches.sort(key=lambda x: int(x.split('/')[-1]))

# create a folder to store match data if it doesn't exist
os.makedirs(save_dir, exist_ok=True)

# matches text file location to variable
matches_file = f'{save_dir}/matches_{season}.txt'

# create an empty matches text file to store match links if it doesn't exist
# (opening in append mode works on any OS, unlike the shell touch command)
if not os.path.exists(matches_file):
    open(matches_file, 'a').close()

# create an empty file to save week number if it doesn't exist
if not os.path.exists(f'{save_dir}/week.txt'):
    open(f'{save_dir}/week.txt', 'a').close()

print(f'Found {len(matches)} matches in season {season} \nfile location: {matches_file}')


Found 97 matches in season 37 
file location: match_data_37/matches_37.txt


#### Converting to OpenDota links

After gathering the match links, each match page is opened via BS. From each page the match ID is extracted from the 
OpenDota link, and a correctly formatted OpenDota API link is added to a list.

This section checks the matches text file created in the last section so that match links already saved are not written again.
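
One detail worth noting when checking the cache: comparing whole lines is safer than a substring test against the file's text, because a shorter link can be a prefix of a longer one. A minimal sketch, using made-up links and a temporary file in place of the real matches file (the helper name `append_new` is hypothetical):

```python
import os
import tempfile

def append_new(path, links):
    """Append links not already stored in `path` (one per line); return those added."""
    with open(path) as f:
        saved = set(f.read().splitlines())
    new = [link for link in links if link not in saved]
    with open(path, "a") as f:
        for link in new:
            f.write(link + "\n")
    return new

# made-up links for illustration
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "matches.txt")
    with open(cache, "w") as f:
        f.write("https://ld2l.gg/matches/123\n")
    added = append_new(cache, ["https://ld2l.gg/matches/12", "https://ld2l.gg/matches/123"])
    print(added)
```

Here `.../12` is written even though it is a prefix of the saved `.../123`, while the exact duplicate is skipped.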

In [20]:
# below code writes match links to file if they are not already in the file

with open(matches_file) as f:
    data = set(f.read().splitlines())

with open(matches_file, 'a') as f:
    for match in matches:
        #check if match is already in file to prevent re-scraping and angry butterygreg
        #(whole lines are compared so a link ending in 12 is not mistaken for one ending in 123)
        if match not in data:
            #write match to file
            f.write(match + '\n')


In [21]:
# add openDota link for each match

#read matches.txt into list
with open(matches_file) as f:
    data = f.read().split('\n')
    #remove empty string from list
    if data[-1] == '':
        data.pop()

# parse each match link via bs to get openDota link

opendota = []


for match in data:
    soup = bs4.BeautifulSoup(requests.get(match).text, 'html.parser')
    for a in soup.find_all('a', href=True):
        if 'opendota' in a['href'] and 'matches/0' not in a['href']:
            match_link = a['href']
            opendota.append(match_link)
            break


# convert opendota links to API links

opendota_api = []

for match in opendota:
    match_id = match.split('/')[-1]
    api_link = f'https://api.opendota.com/api/matches/{match_id}?api_key={api_key}'
    opendota_api.append(api_link)
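
The ID extraction and link building can also be written with `urllib.parse`, which tolerates a trailing slash on the OpenDota link. This is a sketch only: the helper name `to_api_link` and the placeholder key are made up, and the match ID is taken from the file names listed further down.

```python
from urllib.parse import urlencode, urlparse

def to_api_link(opendota_url, key):
    """Turn an OpenDota match page URL into a matches API URL."""
    # the last non-empty path segment is the match ID
    match_id = urlparse(opendota_url).path.rstrip("/").split("/")[-1]
    query = urlencode({"api_key": key})
    return f"https://api.opendota.com/api/matches/{match_id}?{query}"

print(to_api_link("https://www.opendota.com/matches/6982865709/", "PLACEHOLDER"))
```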

#### Pulling OpenDota Jsons

In [22]:
# hold list of file names
file_names = []

for files in os.listdir(save_dir):
    if files.endswith('.json'):
        file_names.append(files)

for match in opendota_api:

    # get match id
    match_id = match.split('/')[-1].split('?')[0]

    file_path = f'{save_dir}/match_{match_id}.json'

    #check if file already exists
    if not os.path.isfile(file_path):
        # get json of match and save to json file
        match_json = requests.get(match).json()
        with open(file_path, 'w') as f:
            json.dump(match_json, f)
            file_names.append(f'match_{match_id}.json')
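
`time` is imported at the top but not used in the cells shown; one plausible use is a short pause after each download. The sketch below shows the download-once pattern with such a pause. The helper name, the one-second default, and the stand-in fetch function are all assumptions, and the fetch is injected so the example runs without network access.

```python
import json
import os
import tempfile
import time

def save_if_missing(path, fetch, pause=1.0):
    """Write fetch() as JSON to `path` unless it exists; return True if fetched."""
    if os.path.isfile(path):
        return False
    payload = fetch()
    with open(path, "w") as f:
        json.dump(payload, f)
    time.sleep(pause)  # assumed delay; adjust to the API's actual rate limits
    return True

def fake_fetch():
    # stand-in for requests.get(url).json()
    return {"match_id": 1}

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "match_1.json")
    first = save_if_missing(target, fake_fetch, pause=0)
    second = save_if_missing(target, fake_fetch, pause=0)
    print(first, second)
```

The second call returns `False` because the file written by the first call is found on disk.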

In [23]:
file_names

['match_6982865709.json',
 'match_6982866447.json',
 'match_6982867622.json',
 'match_6982868616.json',
 'match_6982877034.json',
 'match_6982884342.json',
 'match_6982900159.json',
 'match_6982904807.json',
 'match_6982907814.json',
 'match_6982931651.json',
 'match_6982934817.json',
 'match_6982940646.json',
 'match_6993380342.json',
 'match_6993383257.json',
 'match_6993383916.json',
 'match_6993385502.json',
 'match_6993387493.json',
 'match_6993392642.json',
 'match_6993429451.json',
 'match_6993431150.json',
 'match_6993431932.json',
 'match_6993433983.json',
 'match_6993438994.json',
 'match_6993447694.json',
 'match_7004140570.json',
 'match_7004149439.json',
 'match_7004155202.json',
 'match_7004155925.json',
 'match_7004159214.json',
 'match_7004160100.json',
 'match_7004196123.json',
 'match_7004199027.json',
 'match_7004203500.json',
 'match_7004203614.json',
 'match_7004206771.json',
 'match_7004210195.json',
 'match_7025177090.json',
 'match_7025177613.json',
 'match_7025

#### DataFrame formatting and basic cleaning

Below, a blank dataframe is created with the selected features from the players section of the saved json files. 
As with earlier sections, if a cached match_data csv exists, new rows are concatenated onto it instead of building it from scratch, saving time and resources.

#### If a match has not been parsed on OpenDota (by requesting a parse), the code below will not function properly
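
A quick way to spot unparsed matches before building the dataframe is to check that every player record carries the replay-derived fields. The sketch below assumes that an unparsed match shows up as missing or null values for fields such as `damage_taken` and `obs_placed`; the helper name and the sample records are made up.

```python
def missing_fields(players, required):
    """Return the sorted names in `required` that are absent or None for any player."""
    missing = set()
    for player in players:
        for name in required:
            if player.get(name) is None:
                missing.add(name)
    return sorted(missing)

required = ["damage_taken", "obs_placed", "stuns"]
# made-up player records, trimmed to a few keys
parsed = [{"damage_taken": {"npc_dota_hero_axe": 120}, "obs_placed": 2, "stuns": 1.5}]
unparsed = [{"kills": 3, "damage_taken": None}]
print(missing_fields(parsed, required))
print(missing_fields(unparsed, required))
```

An empty list means the record has everything in `required`; anything else names the columns that would break the later steps.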

In [24]:
# create an empty dataframe to hold all match data if it doesn't exist

# csv file location to variable

csv_file = f'{save_dir}/match_data_{season}.csv'

if not os.path.exists(csv_file):
    match_data = pd.DataFrame(columns=['match_id', 'date', 'account_id', 'personaname', 'teamID', 'rank_tier', 'kills', 'assists',
       'deaths', 'kills_per_min', 'kda', 'denies', 'gold', 'gold_per_min', 'gold_spent', 'hero_damage', 'damage_taken',
       'hero_healing', 'hero_id', 'item_0', 'item_1', 'item_2', 'item_3',
       'item_4', 'item_5', 'item_neutral', 'last_hits', 'level',
       'net_worth', 'tower_damage', 'xp_per_min', 'radiant_win',
       'duration', 'patch', 'isRadiant', 'win', 'lose',
       'total_gold', 'total_xp', 'obs_placed', 'sen_placed', 'rune_pickups', 'camps_stacked', 'stuns', 'creeps_stacked',
       'firstblood_claimed', 'pings', 'teamfight_participation', 'roshans_killed'])
    match_data.to_csv(csv_file, index=False)
else:
    match_data = pd.read_csv(csv_file, index_col=None)
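
The caching idea can be seen on a small scale: rows whose `match_id` is already in the cached frame are dropped before concatenating. This is a sketch with illustrative values, not the notebook's exact logic (which checks one match at a time inside a loop).

```python
import pandas as pd

# cached rows (as if read from the csv) and freshly parsed rows; kills are illustrative
cached = pd.DataFrame({"match_id": [6982865709, 6982865709], "kills": [2, 8]})
fresh = pd.DataFrame({"match_id": [6982865709, 6982866447], "kills": [2, 5]})

# keep only rows from matches that are not in the cache yet
new_rows = fresh[~fresh["match_id"].isin(cached["match_id"])]
combined = pd.concat([cached, new_rows], ignore_index=True)
print(combined["match_id"].tolist())
```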

In [25]:
# heroes are stored as IDs instead of names. A new api call is needed to get the hero names. This will be stored in a dataframe

#check if hero names csv file exists
if not os.path.exists('hero_names.csv'):
    heroes = requests.get(f'https://api.opendota.com/api/heroes?api_key={api_key}').json()
    heroes_df = pd.DataFrame(heroes)
    heroes_df.to_csv('hero_names.csv', index=False)
else:
    heroes_df = pd.read_csv('hero_names.csv')

heroes_df.head()

Unnamed: 0,id,name,localized_name,primary_attr,attack_type,roles,legs
0,1,npc_dota_hero_antimage,Anti-Mage,agi,Melee,"['Carry', 'Escape', 'Nuker']",2
1,2,npc_dota_hero_axe,Axe,str,Melee,"['Initiator', 'Durable', 'Disabler', 'Jungler'...",2
2,3,npc_dota_hero_bane,Bane,int,Ranged,"['Support', 'Disabler', 'Nuker', 'Durable']",4
3,4,npc_dota_hero_bloodseeker,Bloodseeker,agi,Melee,"['Carry', 'Disabler', 'Jungler', 'Nuker', 'Ini...",2
4,5,npc_dota_hero_crystal_maiden,Crystal Maiden,int,Ranged,"['Support', 'Disabler', 'Nuker', 'Jungler']",2
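
As a small illustration of how this table is used later, `Series.map` with an ID-indexed lookup turns hero IDs into names. The hero rows are taken from the output above; the player frame and the ID 999 are made up to show what happens to an ID that has no entry.

```python
import pandas as pd

# first rows of the hero table shown above
heroes = pd.DataFrame({"id": [1, 2, 3], "localized_name": ["Anti-Mage", "Axe", "Bane"]})
players = pd.DataFrame({"hero_id": [3, 1, 999]})  # 999 is a made-up ID with no entry

lookup = heroes.set_index("id")["localized_name"]
players["hero"] = players["hero_id"].map(lookup)
print(players)
```

IDs without an entry come back as NaN, so a stale `hero_names.csv` would leave gaps that the `fillna(0)` in the loop further down turns into 0.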


In [27]:
for file in file_names:

    #uncomment to see which files are loaded
    # print(f'{save_dir}/{file}')
    
    # read the match json file as a dictionary
    with open(f'{save_dir}/{file}') as f:
        data = json.load(f)

    # get match id
    match_id = data['match_id']

    # if match id is already in matches_df, skip
    if match_id in match_data['match_id'].values:
        pass
    else:

        # get the team IDs for each side
        # (these lookups raise a KeyError if the json has no team IDs)

        rad_team_id = data['radiant_team_id']
        dire_team_id = data['dire_team_id']
        
        # read player from data into a dataframe

        df = pd.DataFrame(data['players'])

        #convert start_time from unix time to datetime
        df['start_time'] = pd.to_datetime(df['start_time'], unit='s')
        df['date'] = df['start_time'].dt.date

        #drop start_time
        df.drop('start_time', axis=1, inplace=True)

        # if isRadiant is true, set teamID to radiant team ID, else set to dire team ID

        df['teamID'] = df['isRadiant'].apply(lambda x: rad_team_id if x else dire_team_id)

        # damage_taken is a nested dictionary. We want the sum of the values in the dictionary. if damage_taken is empty, set to 0
        df['damage_taken'] = df['damage_taken'].apply(lambda x: sum(x.values()) if x else 0)


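        # select and order the columns to match match_data
        new_order = match_data.columns.tolist()

        # copy so the hero_id assignment below does not act on a slice of df
        df = df[new_order].copy()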

        #replace hero_id with hero name from heroes_df
        df['hero_id'] = df['hero_id'].map(heroes_df.set_index('id')['localized_name'])

        # append to main df via concat

        match_data = pd.concat([match_data, df], axis=0)

        # replace NaN with 0
        match_data.fillna(0, inplace=True)

        # save to csv every loop
        match_data.to_csv(csv_file, index=False)

match_data_37/match_6982865709.json
match_data_37/match_6982866447.json
match_data_37/match_6982867622.json
match_data_37/match_6982868616.json
match_data_37/match_6982877034.json
match_data_37/match_6982884342.json
match_data_37/match_6982900159.json
match_data_37/match_6982904807.json
match_data_37/match_6982907814.json
match_data_37/match_6982931651.json
match_data_37/match_6982934817.json
match_data_37/match_6982940646.json
match_data_37/match_6993380342.json
match_data_37/match_6993383257.json
match_data_37/match_6993383916.json
match_data_37/match_6993385502.json
match_data_37/match_6993387493.json
match_data_37/match_6993392642.json
match_data_37/match_6993429451.json
match_data_37/match_6993431150.json
match_data_37/match_6993431932.json
match_data_37/match_6993433983.json
match_data_37/match_6993438994.json
match_data_37/match_6993447694.json
match_data_37/match_7004140570.json
match_data_37/match_7004149439.json
match_data_37/match_7004155202.json
match_data_37/match_70041559
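
The per-player transformations in the loop can be tried on a tiny hand-made input. The records, team IDs, and damage sources below are made up, and the `isinstance` check is a slightly more defensive variant of the truthiness test used in the loop.

```python
import pandas as pd

# two made-up player records, trimmed to the keys used below
players = [
    {"isRadiant": True, "start_time": 1674345600,
     "damage_taken": {"npc_dota_hero_axe": 300, "npc_dota_creep": 50}},
    {"isRadiant": False, "start_time": 1674345600, "damage_taken": None},
]
rad_team_id, dire_team_id = 111, 222  # placeholder team IDs

df = pd.DataFrame(players)
# unix seconds to a calendar date
df["date"] = pd.to_datetime(df["start_time"], unit="s").dt.date
# pick the team ID by side
df["teamID"] = df["isRadiant"].map({True: rad_team_id, False: dire_team_id})
# sum the per-source damage dictionary; anything that is not a dict counts as 0
df["damage_taken"] = df["damage_taken"].apply(
    lambda d: sum(d.values()) if isinstance(d, dict) else 0
)
print(df[["date", "teamID", "damage_taken"]])
```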

#### Preview

Below is a preview of the dataframe.

In [28]:
match_data.head()

Unnamed: 0,match_id,date,account_id,personaname,teamID,rank_tier,kills,assists,deaths,kills_per_min,kda,denies,gold,gold_per_min,gold_spent,hero_damage,damage_taken,hero_healing,hero_id,item_0,item_1,item_2,item_3,item_4,item_5,item_neutral,last_hits,level,net_worth,tower_damage,xp_per_min,radiant_win,duration,patch,isRadiant,win,lose,total_gold,total_xp,obs_placed,sen_placed,rune_pickups,camps_stacked,stuns,creeps_stacked,firstblood_claimed,pings,teamfight_participation,roshans_killed
0,6982865709,2023-01-22,147665746,Cheeseburger,8975614,35,2,7,6,0.064865,1,11,466,408,11640,16940,18072,0,Outworld Destroyer,1,63,116,23,534,0,289,149,16,11966,1884,460,True,1850,51,True,1,0,12580,14183,2,0,5,0,69.7229,0,0,0.0,0.529412,1
1,6982865709,2023-01-22,163983635,Turtles,8975614,35,2,11,4,0.064865,2,6,482,408,12850,10692,14406,2268,Beastmaster,11,34,635,131,29,0,358,142,17,12532,6645,543,True,1850,51,True,1,0,12580,16742,0,0,1,0,16.250732,0,0,0.0,0.764706,0
2,6982865709,2023-01-22,123736773,Space Cowboy,8975614,35,1,4,9,0.032432,0,1,1322,207,5905,12385,13466,0,Skywrath Mage,206,77,77,77,29,38,357,19,12,6137,257,301,True,1850,51,True,1,0,6382,9280,0,1,3,2,2.006836,5,0,2.0,0.294118,0
3,6982865709,2023-01-22,85398356,GRaff,8975614,35,4,8,5,0.12973,2,3,1029,260,6945,8454,14722,6634,Treant Protector,244,39,180,1,36,0,356,27,15,5379,0,408,True,1850,51,True,1,0,8016,12580,13,20,4,0,37.857666,0,0,6.0,0.705882,0
4,6982865709,2023-01-22,433544241,DaviruzZ,8975614,53,8,6,0,0.259459,14,5,2090,520,14555,23986,19680,0,Phantom Lancer,174,36,131,63,147,24,947,196,19,16190,2753,642,True,1850,51,True,1,0,16033,19795,0,0,2,1,0.0,5,0,17.0,0.823529,0
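
As a sanity check on the preview, two of the derived columns can be recomputed from the raw ones. The formulas here are assumptions inferred from these five rows only, not taken from OpenDota documentation: `kills_per_min` as kills divided by duration in minutes, and `kda` as the floor of (kills + assists) / (deaths + 1).

```python
import math

duration = 1850  # seconds, from the preview rows
# (personaname, kills, assists, deaths, kills_per_min, kda) copied from the preview
rows = [
    ("Cheeseburger", 2, 7, 6, 0.064865, 1),
    ("Turtles", 2, 11, 4, 0.064865, 2),
    ("Space Cowboy", 1, 4, 9, 0.032432, 0),
    ("GRaff", 4, 8, 5, 0.12973, 2),
    ("DaviruzZ", 8, 6, 0, 0.259459, 14),
]

mismatches = []
for name, kills, assists, deaths, kpm, kda in rows:
    # assumed formulas, checked against the stored values
    kpm_ok = round(kills / (duration / 60), 6) == kpm
    kda_ok = math.floor((kills + assists) / (deaths + 1)) == kda
    if not (kpm_ok and kda_ok):
        mismatches.append(name)

print(mismatches)
```

An empty list means all five rows agree with the assumed formulas.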
