# Sourcing and converting SC2 replays

StarCraft II replay files are a dime a dozen.

In this notebook we source some of these files and convert them to a tractable format.

Our priorities are:

    - the 420+ pro replays of the most recent SC2 world championship.
    
    - the 7200 pro replays available at http://lotv.spawningtool.com/
    - the 16,000+ grandmaster and master replays readily available at www.gamereplays.org
    
    - the 25,000+ mixed-skill replays at http://lotv.spawningtool.com/
    - the 65,000+ mixed-skill replays at www.gamereplays.org

It is also worth noting that Blizzard (in partnership with Google DeepMind) recently released 35,000 anonymized replay files for A.I. research, and that they intend for this dataset to grow to 500,000 by the end of the year. However, the process of anonymizing these files has made them incompatible with our parser. If time allows we will seek to remedy this; then again, 100k+ replays may well be enough.

### Sourcing the 420+ pro replays of the most recent SC2 world championship.

This one is not difficult; a download link is readily available:

http://www.mediafire.com/file/4er2bk8k5d65bb4/IEM+XI+-+World+Championship+-+StarCraft+II+Replays.rar

### Converting these 420+ pro replays to dictionaries

In [1]:
import sc2reader
import pickle

from Scripts.replay_to_dict import replay_to_dict

In [2]:
path_to_games = './../../../sc2games'

In [3]:
iem_replays = [replay_to_dict(replay)
               for replay in sc2reader.load_replays(
                   path_to_games + '/IEM XI - World Championship - StarCraft II Replays/',
                   load_level=3)]
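The contents of `Scripts.replay_to_dict` are not shown in this notebook. As a hypothetical illustration of one way such a conversion could look, the sketch below flattens a few attributes that sc2reader replays expose (`map_name`, `type`, `date`, `players`, and each player's `name`, `play_race` and `result`) into plain dicts and lists. The name `replay_to_dict_sketch` and the chosen keys are assumptions, not the author's implementation.

```python
def replay_to_dict_sketch(replay):
    """Flatten a few sc2reader Replay attributes into plain Python objects.

    Hypothetical stand-in for Scripts.replay_to_dict (not the real helper).
    """
    return {
        'map': replay.map_name,
        'type': replay.type,   # e.g. '1v1'
        'date': replay.date,
        # One small dict per player: name, race actually played, and match result
        'players': [
            {'name': p.name, 'race': p.play_race, 'result': p.result}
            for p in replay.players
        ],
    }
```

Because the output contains only built-in types, it can be pickled without keeping sc2reader objects around.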

In [4]:
with open(path_to_games + '/PickledGames/iem_replays.p','wb') as iem_file:
    pickle.dump(iem_replays, iem_file)
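The pickled list can be restored in a later session instead of re-parsing every replay. A minimal sketch, assuming the file was written by the cell above; `load_pickled_replays` is a hypothetical helper name.

```python
import pickle

def load_pickled_replays(path):
    """Load a list of replay dicts previously written with pickle.dump."""
    # Only unpickle files you produced yourself: pickle can run arbitrary code on load.
    with open(path, 'rb') as f:
        return pickle.load(f)
```

Usage: `iem_replays = load_pickled_replays(path_to_games + '/PickledGames/iem_replays.p')`.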

### Sourcing the 7200 pro replays available at http://lotv.spawningtool.com/

This is easily done. Using XPath, we discovered that lotv.spawningtool.com lets you download a zip file of 25 replays by visiting a URL of the form:

    http://lotv.spawningtool.com/zip/? + <details>
    
With some further tinkering we discovered the following parameters of interest:

    pro_only=on
    tag=<120 to 219> (corresponds to the labeled build)
    p=<page number>
    
http://lotv.spawningtool.com/zip/?pro_only=
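The query string can be assembled with `urllib.parse.urlencode` rather than by string concatenation. This is a sketch assuming the zip endpoint accepts the `pro_only`, `tag` and `p` parameters listed above; `build_zip_url` and `BASE_ZIP_URL` are hypothetical names.

```python
from urllib.parse import urlencode

BASE_ZIP_URL = 'http://lotv.spawningtool.com/zip/'

def build_zip_url(page, pro_only=True, tag=None):
    """Build the URL of one zipped page of replays."""
    params = {}
    if pro_only:
        params['pro_only'] = 'on'
    if tag is not None:
        params['tag'] = tag  # build label id, 120 to 219 per the notes above
    params['p'] = page       # page number of the result listing
    # urlencode keeps the dict's insertion order and percent-encodes values
    return BASE_ZIP_URL + '?' + urlencode(params)
```

For example, `build_zip_url(80)` gives `http://lotv.spawningtool.com/zip/?pro_only=on&p=80`, the first URL used below.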


In [16]:
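import urllib.request  # `import urllib` alone does not guarantee the `request` submodule is loaded

# Request one zipped page of pro replays (the response body is not read here)
urllib.request.urlopen('http://lotv.spawningtool.com/zip/?pro_only=on&p=80')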

<http.client.HTTPResponse at 0x10e6c9f60>

In [17]:
urllib.request.urlopen('http://lotv.spawningtool.com/replays/?p=&pro_only=on&tag=193&query=&after_time=&before_time=&after_played_on=&before_played_on=&patch=&order_by=')

<http.client.HTTPResponse at 0x109b4d0b8>

In [11]:
# Print the zip URLs for result pages 80 to 100; the download call is left commented out
b_url = 'http://lotv.spawningtool.com/zip/?pro_only=on&p='
for i in range(80, 101):
    url = b_url + str(i)
    # requests.get(url)
    print('_' + url)

_http://lotv.spawningtool.com/zip/?pro_only=on&p=80
_http://lotv.spawningtool.com/zip/?pro_only=on&p=81
_http://lotv.spawningtool.com/zip/?pro_only=on&p=82
_http://lotv.spawningtool.com/zip/?pro_only=on&p=83
_http://lotv.spawningtool.com/zip/?pro_only=on&p=84
_http://lotv.spawningtool.com/zip/?pro_only=on&p=85
_http://lotv.spawningtool.com/zip/?pro_only=on&p=86
_http://lotv.spawningtool.com/zip/?pro_only=on&p=87
_http://lotv.spawningtool.com/zip/?pro_only=on&p=88
_http://lotv.spawningtool.com/zip/?pro_only=on&p=89
_http://lotv.spawningtool.com/zip/?pro_only=on&p=90
_http://lotv.spawningtool.com/zip/?pro_only=on&p=91
_http://lotv.spawningtool.com/zip/?pro_only=on&p=92
_http://lotv.spawningtool.com/zip/?pro_only=on&p=93
_http://lotv.spawningtool.com/zip/?pro_only=on&p=94
_http://lotv.spawningtool.com/zip/?pro_only=on&p=95
_http://lotv.spawningtool.com/zip/?pro_only=on&p=96
_http://lotv.spawningtool.com/zip/?pro_only=on&p=97
_http://lotv.spawningtool.com/zip/?pro_only=on&p=98
_http://lotv
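To actually fetch these pages, each response can be unpacked in memory with the standard-library `zipfile` module, so no temporary archive is written to disk. A sketch assuming each zip contains files with the `.SC2Replay` extension; `ZIP_URL`, `extract_replays` and `download_zip_page` are hypothetical names.

```python
import io
import os
import urllib.request
import zipfile

ZIP_URL = 'http://lotv.spawningtool.com/zip/?pro_only=on&p='

def extract_replays(zip_bytes, dest_dir):
    """Extract the .SC2Replay members of an in-memory zip archive into dest_dir.

    Returns the paths of the extracted files.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything that is not a replay file
            if name.lower().endswith('.sc2replay'):
                paths.append(archive.extract(name, dest_dir))
    return paths

def download_zip_page(page, dest_dir):
    """Fetch one page of zipped replays and unpack it into dest_dir."""
    with urllib.request.urlopen(ZIP_URL + str(page)) as response:
        return extract_replays(response.read(), dest_dir)
```

A loop such as `for page in range(80, 101): download_zip_page(page, path_to_games + '/spawningtool_pro')` would replace the commented-out `requests.get` call; the folder name is likewise hypothetical, and a short `time.sleep` between requests is kinder to the server.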

In [None]:
import requests
from lxml import html
import pandas as pd

### Sourcing the 16,000+ grandmaster and master replays at www.gamereplays.org

https://www.gamereplays.org/starcraft2/replays.php?game=33&show=master_replays&sort_by=latest_upload&search=&st=0

In [3]:
base_url = 'https://www.gamereplays.org/starcraft2/replays.php?game=33&show=master_replays&sort_by=latest_upload&search=&st='

# `st` is the listing offset; pages are stepped 30 replays at a time
indexes = list(range(0, 16400, 30))

def getHTML(index=0):
    url = base_url + str(index)
    # .text decodes the body; str() on the raw bytes would return their repr ("b'...'")
    return requests.get(url).text

In [4]:
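import os

cache_path = '../../../../Desktop/replayhtmls.text'

# Scrape every listing page once (547 requests) and cache the HTML; skip if already cached
if not os.path.exists(cache_path):
    with open(cache_path, 'a', encoding='utf-8') as myfile:
        for i in indexes:
            myfile.write(getHTML(index=i))

with open(cache_path, 'r', encoding='utf-8') as myfile:
    m_html = myfile.read()
tree = html.fromstring(m_html)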

In [8]:
games = tree.xpath('//a[@class = "replay_index_button index_download"]/@href')
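If lxml is unavailable, or to avoid parsing one large file of concatenated pages, the same links can be collected page by page with the standard-library `html.parser`. This swaps in a different technique from the XPath query above; it applies the same test (an `<a>` whose `class` is exactly `replay_index_button index_download`). `DownloadLinkParser` and `extract_download_links` are hypothetical names.

```python
from html.parser import HTMLParser

class DownloadLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class attribute matches exactly."""

    TARGET_CLASS = 'replay_index_button index_download'

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('class') == self.TARGET_CLASS and attrs.get('href'):
            self.links.append(attrs['href'])

def extract_download_links(page_html):
    """Return the download hrefs found in one listing page."""
    parser = DownloadLinkParser()
    parser.feed(page_html)
    parser.close()
    return parser.links
```

Usage: `games = [href for i in indexes for href in extract_download_links(getHTML(index=i))]`.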

In [11]:
series = pd.Series(games)

In [19]:
series.to_clipboard()
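`to_clipboard` depends on a system clipboard and is hard to reproduce. A sketch of an alternative: resolve possibly-relative hrefs against the listing page with `urllib.parse.urljoin`, drop duplicates, and write one URL per line to a text file that a download manager (e.g. `wget -i`) can consume. Whether gamereplays.org hrefs are relative is an assumption; `to_absolute_links`, `save_links` and `PAGE_URL` are hypothetical names.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://www.gamereplays.org/starcraft2/replays.php'

def to_absolute_links(hrefs, page_url=PAGE_URL):
    """Resolve hrefs against page_url and drop duplicates, keeping the original order."""
    seen = set()
    links = []
    for href in hrefs:
        url = urljoin(page_url, href)  # absolute hrefs are returned unchanged
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links

def save_links(links, path):
    """Write one URL per line."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(links) + '\n')
```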


In [3]:
import sc2reader

In [15]:
%%time
# `replays` is assumed to be a sc2reader.load_replays(...) generator created in an earlier cell
games = list(replays)

CPU times: user 23.1 s, sys: 1.09 s, total: 24.2 s
Wall time: 25.5 s


In [25]:
# Report the type of every replay that is not a 1v1 match
for game in games:
    if game.type != '1v1':
        print(game.type)
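Rather than printing each non-1v1 type, the loaded replays can be tallied by type and filtered in one pass. A small sketch relying only on the `type` attribute used above; `count_game_types` and `only_1v1` are hypothetical helper names.

```python
from collections import Counter

def count_game_types(replays):
    """Tally replays by their sc2reader `type` string (e.g. '1v1', '2v2')."""
    return Counter(replay.type for replay in replays)

def only_1v1(replays):
    """Keep only the 1v1 games."""
    return [replay for replay in replays if replay.type == '1v1']
```

Usage: `count_game_types(games)` shows at a glance how many team or FFA games would be dropped by `only_1v1(games)`.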

In [11]:
2302

2302