# Sourcing and converting SC2 replays

StarCraft II replay files are a dime a dozen.

In this notebook we dedicate ourselves to sourcing some of these files, and converting them to a tractable format.

Our priorities are:

    - the 420+ pro replays of the most recent SC2 world championship.
    
    - the 7200 pro replays available at http://lotv.spawningtool.com/
    - the 16,000+ grand-master and master replays readily available at www.gamereplays.org
    
    - the 25,000+ mixed-skill replays at http://lotv.spawningtool.com/
    - the 65,000+ mixed-skill replays at www.gamereplays.org

It is also worth noting that Blizzard (in partnership with Google DeepMind) recently released 35,000 anonymized replay files for the purposes of A.I. research, and that they intend for this dataset to grow to 500,000 by the end of the year. However, their process of anonymizing these files has made them incompatible with our parser. If time allows we will seek to remedy this but, then again, 100k+ replays may well be enough.

### Sourcing the 420+ pro replays of the most recent SC2 world championship.

This one is not difficult; a download link is readily available:

http://www.mediafire.com/file/4er2bk8k5d65bb4/IEM+XI+-+World+Championship+-+StarCraft+II+Replays.rar

### Converting these 420+ pro replays to dictionaries:

In [1]:
import sc2reader
import pickle

from Scripts.replay_to_dict import replay_to_dict

In [2]:
path_to_games = './../../../sc2games'

In [None]:
iem_replays = [replay_to_dict(replay) 
               for replay in sc2reader.load_replays(
                   path_to_games+'/IEM XI - World Championship - StarCraft II Replays/',
               load_level = 3)]

In [4]:
# with open(path_to_games + '/PickledGames/iem_replays.p','wb') as iem_file:
#     pickle.dump(iem_replays, iem_file)
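
Reading the pickle back is the mirror image of the commented-out dump above. The following is a minimal sketch, assuming each element of `iem_replays` is a plain, picklable dictionary. The helper names and the sample record are illustrative only, since the structure returned by `replay_to_dict` is not shown in this notebook.

```python
import os
import pickle
import tempfile

def save_replays(replays, path):
    """Pickle a list of replay dictionaries to `path`."""
    with open(path, 'wb') as handle:
        pickle.dump(replays, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_replays_from_pickle(path):
    """Load a previously pickled list of replay dictionaries."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)

# Illustrative stand-in for the output of replay_to_dict (hypothetical keys).
sample_replays = [{'map_name': "Bel'Shir Vestige LE", 'players': 2}]

# Round trip through a temporary file so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, 'iem_replays.p')
    save_replays(sample_replays, pickle_path)
    restored = load_replays_from_pickle(pickle_path)
```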

### Sourcing the 7200 pro replays available at http://lotv.spawningtool.com/

This is easily done. Using XPath we discovered that lotv.spawningtool.com allows the download of a zip file of 25 replays by visiting a URL of the form:

    http://lotv.spawningtool.com/zip/? + <details>
    
With some further tinkering we discovered the following settings of interest:

    pro_only=on
    tag= <120 to 219> (relates to the labeled build)
    
http://lotv.spawningtool.com/zip/?pro_only=
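
The settings above can be assembled into request URLs programmatically. This is a rough sketch rather than the exact procedure used here: `zip_url` is a hypothetical helper, the `p` page parameter is the one that appears in the download cells further down this notebook, and whether `tag` and `p` may be combined in one request is an untested assumption.

```python
from urllib.parse import urlencode

ZIP_ENDPOINT = 'http://lotv.spawningtool.com/zip/'

def zip_url(page=None, tag=None, pro_only=True):
    """Build a zip-download URL from the query settings noted above."""
    params = {}
    if pro_only:
        params['pro_only'] = 'on'
    if tag is not None:
        params['tag'] = tag      # labeled build, observed range 120 to 219
    if page is not None:
        params['p'] = page       # page of results, 25 replays per zip
    return ZIP_ENDPOINT + '?' + urlencode(params)

# One URL per page, and one URL per build tag.
page_urls = [zip_url(page=p) for p in range(80, 83)]
tag_urls = [zip_url(tag=t) for t in range(120, 220)]
```

Building the query string with `urlencode` avoids hand-concatenating `&` and `=` and keeps the parameter names in one place.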


### Sourcing the 16k+ pro replays available at https://www.gamereplays.org

This was slightly more involved. Here replay files may be downloaded one at a time by visiting a url of the form 

    https://www.gamereplays.org/starcraft2/replays...
    
where the game's id is a unique identifier within gamereplays.org.

No clear index existed for these IDs, but it was easy enough to:

- obtain the raw HTML of the various result pages using requests.
- parse the HTML for the IDs using the lxml library and XPath expressions.
- collate the IDs into a CSV file to serve as an index.
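
The three steps above can be sketched offline as follows. To keep the example free of third-party dependencies it swaps lxml for the standard library's `html.parser`, so it is not the code that produced the index. The class name is the one targeted by the XPath query later in this notebook, while `sample_html` and the helper names are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser

DOWNLOAD_CLASS = 'replay_index_button index_download'

class DownloadLinkParser(HTMLParser):
    """Collect the href of every <a> tag carrying the download-button class."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == 'a' and attributes.get('class') == DOWNLOAD_CLASS:
            href = attributes.get('href')
            if href:
                self.links.append(href)

def extract_links(page_html):
    """Return all download links found in one results page."""
    parser = DownloadLinkParser()
    parser.feed(page_html)
    return parser.links

def links_to_csv(links):
    """Serialise links as 'row number, link' rows without a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row_number, link in enumerate(links):
        writer.writerow([row_number, link])
    return buffer.getvalue()

# Made-up fragment standing in for a downloaded results page.
sample_html = (
    '<a class="replay_index_button index_download" '
    'href="replays.php?game=33&amp;show=download&amp;id=1">Download</a>'
    '<a class="other" href="ignored.html">Other</a>'
)
links = extract_links(sample_html)
index_csv = links_to_csv(links)
```

The two-column, header-less layout matches how the index file is read back below (`header = None`, column `1`).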

In [23]:
import pandas as pd
game_links = pd.read_csv('./Resources/links_to_games_in_GameReplay.csv', header = None)[1]
game_links.tail()

16280    https://www.gamereplays.org/starcraft2/replays...
16281    https://www.gamereplays.org/starcraft2/replays...
16282    https://www.gamereplays.org/starcraft2/replays...
16283    https://www.gamereplays.org/starcraft2/replays...
16284    https://www.gamereplays.org/starcraft2/replays...
Name: 1, dtype: object

In [266]:
game_links[0]

'https://www.gamereplays.org/starcraft2/replays.php?s=9cb99b76d182221d04913285879f47f1&game=33&show=download&&id=304724'

At this point it is just a matter of iterating through the 16,285 links (indices 0 to 16284) using requests.

We introduce a deliberate delay of at least one second between calls to avoid the wrath of their I.T. staff.

In [183]:
file_names = [str(a)+'_'+b.split('id=')[-1] for a,b in game_links.items()]

In [273]:
file_names[113]

'113_283410'
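
Splitting on `'id='` works for these links, but it would break if another parameter ever ended in `id=`. A slightly more defensive alternative, offered as a sketch with a hypothetical helper name, reads the `id` query parameter explicitly:

```python
from urllib.parse import urlsplit, parse_qs

def replay_file_name(position, link):
    """Return '<position>_<id>' using the link's `id` query parameter."""
    query = parse_qs(urlsplit(link).query)
    return '{}_{}'.format(position, query['id'][0])

example_link = ('https://www.gamereplays.org/starcraft2/replays.php'
                '?s=5dee6d45fae7e603d4c4e483bbef7fc9&game=33&show=download&&id=283410')
example_name = replay_file_name(113, example_link)
```

`parse_qs` ignores the empty field produced by the doubled `&&` in these links, so no special handling is needed for it.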

In [275]:
sc2reader.load_replay('./../../../sc2games/GameReplayOrg/113_283410.SC2Replay').map_name

"Bel'Shir Vestige LE"

In [268]:
sc2reader.load_replay('./../../../../Downloads/Terraformacja_ER_9__[Star2.org].sc2replay').objects

{0: None [0],
 1: VespeneGeyser [1],
 262145: VespeneGeyser [40001],
 524289: VespeneGeyser [80001],
 786433: VespeneGeyser [C0001],
 1048577: LabMineralField [100001],
 1310721: LabMineralField [140001],
 1572865: LabMineralField [180001],
 1835009: CollapsibleRockTower [1C0001],
 2097153: LabMineralField [200001],
 2359297: LabMineralField [240001],
 2621441: LabMineralField [280001],
 2883585: LabMineralField [2C0001],
 3145729: LabMineralField [300001],
 3407873: MineralField [340001],
 3670017: MineralField [380001],
 3932161: CollapsibleRockTower [3C0001],
 4194305: MineralField [400001],
 4456449: MineralField [440001],
 4718593: MineralField [480001],
 4980737: MineralField [4C0001],
 5242881: MineralField [500001],
 5505025: MineralField [540001],
 5767169: MineralField [580001],
 6029313: SpacePlatformGeyser [5C0001],
 6291457: SpacePlatformGeyser [600001],
 6553601: VespeneGeyser [640001],
 6815745: VespeneGeyser [680001],
 7077889: MineralField [6C0001],
 7340033: LabMinera

In [269]:
game_links.iloc[113]

'https://www.gamereplays.org/starcraft2/replays.php?s=5dee6d45fae7e603d4c4e483bbef7fc9&game=33&show=download&&id=283410'

In [146]:
# 0    - 599  done
# 600  - 1199 done
# 1200 - 2400 trying...

In [244]:
import time
import requests

for i in range(100, 300):
    id_of_game = game_links.iloc[i].split('id=')[1]
    with open('./../../../sc2games/GameReplayOrg/'+str(i)+'_'+id_of_game+'.SC2Replay', 'wb') as destination:
        r = requests.get(game_links.iloc[i], allow_redirects=True)
        time.sleep(1)  # pause between requests to go easy on the server
        destination.write(r.content)
        time.sleep(1)
        if i%100 == 0: print(i, end=';')  # progress marker every 100 files
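
Because the download runs in batches, with progress tracked by hand, a resumable variant may be convenient. The sketch below skips files that already exist and takes the fetching function as an argument so it can be exercised without network access. `download_missing` is a hypothetical helper, and the check on the leading `b'MPQ'` bytes rests on the assumption that a genuine replay is an MPQ archive beginning with that signature.

```python
import os
import tempfile
import time

def download_missing(links, dest_dir, fetch, delay=1.0, sleep=time.sleep):
    """Download every link whose target file is not already on disk.

    `fetch` is any callable mapping a URL to the response bytes, e.g.
    lambda url: requests.get(url, allow_redirects=True).content
    """
    written = []
    for position, link in enumerate(links):
        game_id = link.split('id=')[-1]
        path = os.path.join(dest_dir, '{}_{}.SC2Replay'.format(position, game_id))
        if os.path.exists(path):
            continue  # already downloaded in an earlier run
        content = fetch(link)
        sleep(delay)  # pause after every request, successful or not
        # Assumption: a genuine replay starts with the bytes b'MPQ'.
        if not content.startswith(b'MPQ'):
            continue  # probably an error page; leave it for a later retry
        with open(path, 'wb') as destination:
            destination.write(content)
        written.append(path)
    return written

# Offline demonstration with a fake fetcher (no network access needed).
fake_responses = {
    'https://example.invalid/replays.php?id=1': b'MPQ\x1b fake replay bytes',
    'https://example.invalid/replays.php?id=2': b'<html>error</html>',
}
with tempfile.TemporaryDirectory() as tmp_dir:
    first_run = download_missing(list(fake_responses), tmp_dir,
                                 fake_responses.get, sleep=lambda s: None)
    second_run = download_missing(list(fake_responses), tmp_dir,
                                  fake_responses.get, sleep=lambda s: None)
    saved_files = sorted(os.listdir(tmp_dir))
```

Re-running the function only retries links that have no file yet, which removes the need to track finished index ranges manually.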

In [259]:
game_links.iloc[113]

'https://www.gamereplays.org/starcraft2/replays.php?s=5dee6d45fae7e603d4c4e483bbef7fc9&game=33&show=download&&id=283410'

In [261]:
test = sc2reader.load_replay('./../../../../Downloads/Nuke_build__[Star2.org].sc2replay', load_level = 3)
test.objects

{}

In [None]:
from sklearn.linear_model import SGDClassifier, SGDRegressor

In [242]:
print(game_links.iloc[16284])

https://www.gamereplays.org/starcraft2/replays.php?s=a3422a1fbde5510259061b26426e651c&game=33&show=download&&id=180457


In [228]:
rr = sc2reader.load_replay('./../../../../Downloads/1_ling_into_hydra_vs_3GExpo_on_BS__[Star2.org].sc2replay')

In [230]:
rr.objects

{0: WarpGate [0],
 786433: DestructibleRock6x6 [C0001],
 786434: Egg [C0002],
 786435: Egg [C0003],
 786436: Hydralisk [C0004],
 786437: Pylon [C0005],
 786438: Egg [C0006],
 786439: Hydralisk [C0007],
 1310721: MineralField [140001],
 1572865: MineralField [180001],
 1835009: MineralField [1C0001],
 3145729: MineralField [300001],
 3932161: MineralField [3C0001],
 4194305: MineralField [400001],
 4456449: MineralField [440001],
 5242881: MineralField [500001],
 7077889: MineralField [6C0001],
 7340033: MineralField [700001],
 7602177: MineralField [740001],
 8650753: MineralField [840001],
 9175041: MineralField [8C0001],
 9437185: MineralField [900001],
 9699329: MineralField [940001],
 9961473: MineralField [980001],
 10223617: MineralField [9C0001],
 10485761: MineralField [A00001],
 10747905: MineralField [A40001],
 11272193: MineralField [AC0001],
 11796481: MineralField [B40001],
 12320769: MineralField [BC0001],
 12582913: MineralField [C00001],
 17563649: VespeneGeyser [10C000

In [271]:
i_need_redoing = []

In [272]:

for i,filepath in enumerate(file_names[100:600]):
    i += 100
    replay = sc2reader.load_replay('./../../../sc2games/GameReplayOrg/'+filepath+'.SC2Replay', load_level = 3)
    if len(replay.objects.keys()) == 0:
        print(i,replay.filename.split("/")[-1].split('.')[0])
        i_need_redoing.append(i)

113 113_283410
129 129_282712
133 133_282331
134 134_282327
135 135_282325
136 136_282277
137 137_282276
138 138_282261
139 139_282251
140 140_282223
141 141_282199
142 142_282190
143 143_282161
144 144_282147
145 145_282120
146 146_282100
147 147_282062
148 148_282019
149 149_281983
150 150_281923
151 151_281922
152 152_281921
153 153_281919
154 154_281918
155 155_281722
156 156_281720
157 157_281676
158 158_281674
159 159_281667
160 160_281627
161 161_281533
162 162_281532
163 163_281531
164 164_281530
165 165_281529
166 166_281528
167 167_281527
168 168_281526
169 169_281525
170 170_281524
171 171_281523
172 172_281522
173 173_281521
174 174_281520
175 175_281519
176 176_281518
177 177_281516
178 178_281515
179 179_281514
180 180_281512
181 181_281511
182 182_281510
183 183_281509
184 184_281508
185 185_281507
186 186_281506
187 187_281505
188 188_281504
189 189_281458
190 190_281404
191 191_281385
192 192_281349
193 193_281298
194 194_281250
195 195_281238
196 196_281237
197 197_28

In [235]:
i_need_redoing

[113,
 129,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 151,
 152,
 153,
 154,
 155,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185,
 186,
 187,
 188,
 189,
 190,
 191,
 192,
 193,
 194,
 195,
 196,
 197,
 198,
 199,
 200,
 201,
 202,
 203,
 204,
 205,
 206,
 207,
 208,
 209,
 210,
 211,
 212,
 213,
 214,
 215,
 216,
 217,
 218,
 219,
 220,
 221,
 222,
 223,
 224,
 225,
 226,
 227,
 228,
 229,
 230,
 231,
 232,
 233,
 234,
 235,
 236,
 237,
 238,
 239,
 240,
 241,
 242,
 243,
 244,
 245,
 246,
 247,
 248,
 249,
 250,
 251,
 252,
 253,
 254,
 255,
 256,
 257,
 258,
 259,
 260,
 261,
 262,
 263,
 264,
 265,
 266,
 267,
 268,
 269,
 270,
 271,
 272,
 273,
 274,
 275,
 276,
 277,
 278,
 279,
 280,
 281,
 282,
 283,
 284,
 285,
 286,
 287,
 288,
 289,
 290,
 291,
 292,
 293,
 294,
 295,
 296,
 297
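
A list like the one above is easier to scan once consecutive indices are collapsed into ranges. A small helper for that, with a hypothetical name and shown on a short prefix of the list, could look like this:

```python
def collapse_runs(indices):
    """Collapse integers into sorted, inclusive (start, end) runs."""
    runs = []
    for value in sorted(set(indices)):
        if runs and value == runs[-1][1] + 1:
            # Extend the current run by one.
            runs[-1] = (runs[-1][0], value)
        else:
            # Start a new run at this value.
            runs.append((value, value))
    return runs

sample = [113, 129, 133, 134, 135, 136]
runs = collapse_runs(sample)
summary = ', '.join(str(a) if a == b else '{}-{}'.format(a, b) for a, b in runs)
```

Each run can then be fed back to the download loop as a `range(start, end + 1)`.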

In [221]:
test = sc2reader.load_replay('./../../../../Downloads/Nuke_build__[Star2.org].sc2replay')
test.objects

{38797313: OrbitalCommand [2500001],
 36962305: Drone [2340001],
 37224449: Drone [2380001],
 37486593: Drone [23C0001],
 37748737: Drone [2400001],
 38010881: Drone [2440001],
 38273025: Drone [2480001],
 3407873: MineralField [340001],
 5242881: MineralField [500001],
 4456449: MineralField [440001],
 35913729: Lair [2240001],
 36175873: Egg [2280001],
 36438017: Egg [22C0001],
 36700161: Egg [2300001],
 40108033: SCV [2640001],
 5767169: MineralField [580001],
 38535169: Overlord [24C0001],
 40370177: SCV [2680001],
 2621441: MineralField [280001],
 2883585: MineralField [2C0001],
 40632321: SCV [26C0001],
 40894465: Egg [2700001],
 4718593: MineralField [480001],
 2097153: MineralField [200001],
 1835009: MineralField [1C0001],
 39059457: SCV [2540001],
 39321601: SCV [2580001],
 39583745: SCV [25C0001],
 39845889: SCV [2600001],
 41418753: SCV [2780001],
 41680897: Drone [27C0001],
 36175874: Egg [2280002],
 41943041: SCV [2800001],
 36700162: Egg [2300002],
 2359297: MineralField

In [None]:
# r = requests.get(game_links.iloc[i])
# z = zipfile.ZipFile(io.BytesIO(r.content))
# z.extractall()
# time.sleep(2)

In [16]:
import requests, zipfile, io
r = requests.get('http://lotv.spawningtool.com/zip/?pro_only=on&p=80')
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()

<http.client.HTTPResponse at 0x10e6c9f60>

'304724'
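
Note that `z.extractall()` with no arguments unpacks everything into the current working directory. A variant that extracts only replay files into a chosen folder is sketched below. `extract_replays` is a hypothetical helper, and the archive is built in memory purely to keep the example offline.

```python
import io
import os
import tempfile
import zipfile

def extract_replays(zip_bytes, dest_dir):
    """Extract only the .SC2Replay members of an in-memory zip archive."""
    extracted = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if member.lower().endswith('.sc2replay'):
                archive.extract(member, dest_dir)
                extracted.append(member)
    return extracted

# Build a tiny archive in memory to stand in for a downloaded response body.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('game_one.SC2Replay', b'MPQ placeholder')
    archive.writestr('readme.txt', b'not a replay')

with tempfile.TemporaryDirectory() as tmp_dir:
    extracted = extract_replays(buffer.getvalue(), tmp_dir)
    on_disk = sorted(os.listdir(tmp_dir))
```

With a real response, `zip_bytes` would be `r.content` from the `requests.get` call above.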

In [32]:
with open('wabadabadubdub.SC2Replay', 'wb') as mf:
    mf

In [34]:
import sc2reader
sc2reader.load_replay('./wabadabadubdub.SC2Replay', load_level = 3).objects

{1: VespeneGeyser [1],
 262145: VespeneGeyser [40001],
 524289: VespeneGeyser [80001],
 786433: VespeneGeyser [C0001],
 1048577: LabMineralField [100001],
 1310721: LabMineralField [140001],
 1572865: LabMineralField [180001],
 1835009: CollapsibleRockTower [1C0001],
 2097153: LabMineralField [200001],
 2359297: LabMineralField [240001],
 2621441: LabMineralField [280001],
 2883585: LabMineralField [2C0001],
 3145729: LabMineralField [300001],
 3407873: MineralField [340001],
 3670017: MineralField [380001],
 3932161: CollapsibleRockTower [3C0001],
 4194305: MineralField [400001],
 4456449: MineralField [440001],
 4718593: MineralField [480001],
 4980737: MineralField [4C0001],
 5242881: MineralField [500001],
 5505025: MineralField [540001],
 5767169: MineralField [580001],
 6029313: SpacePlatformGeyser [5C0001],
 6291457: SpacePlatformGeyser [600001],
 6553601: VespeneGeyser [640001],
 6815745: VespeneGeyser [680001],
 7077889: MineralField [6C0001],
 7340033: LabMineralField [700001

In [11]:
b_url = 'http://lotv.spawningtool.com/zip/?pro_only=on&p='
for i in range(80,101):
    url = b_url + str(i)
    #requests.get(url)
    print('_'+url)

_http://lotv.spawningtool.com/zip/?pro_only=on&p=80
_http://lotv.spawningtool.com/zip/?pro_only=on&p=81
_http://lotv.spawningtool.com/zip/?pro_only=on&p=82
_http://lotv.spawningtool.com/zip/?pro_only=on&p=83
_http://lotv.spawningtool.com/zip/?pro_only=on&p=84
_http://lotv.spawningtool.com/zip/?pro_only=on&p=85
_http://lotv.spawningtool.com/zip/?pro_only=on&p=86
_http://lotv.spawningtool.com/zip/?pro_only=on&p=87
_http://lotv.spawningtool.com/zip/?pro_only=on&p=88
_http://lotv.spawningtool.com/zip/?pro_only=on&p=89
_http://lotv.spawningtool.com/zip/?pro_only=on&p=90
_http://lotv.spawningtool.com/zip/?pro_only=on&p=91
_http://lotv.spawningtool.com/zip/?pro_only=on&p=92
_http://lotv.spawningtool.com/zip/?pro_only=on&p=93
_http://lotv.spawningtool.com/zip/?pro_only=on&p=94
_http://lotv.spawningtool.com/zip/?pro_only=on&p=95
_http://lotv.spawningtool.com/zip/?pro_only=on&p=96
_http://lotv.spawningtool.com/zip/?pro_only=on&p=97
_http://lotv.spawningtool.com/zip/?pro_only=on&p=98
_http://lotv

In [None]:
import requests
from lxml import html
import pandas as pd

# Example results page:
# https://www.gamereplays.org/starcraft2/replays.php?game=33&show=master_replays&sort_by=latest_upload&search=&st=0

In [3]:
base_url = 'https://www.gamereplays.org/starcraft2/replays.php?game=33&show=master_replays&sort_by=latest_upload&search=&st='

indexes = list(range(0,16400,30))

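def getHTML(index = 0):
    # Fetch one results page; `index` is the offset appended as the `st` parameter.
    url = base_url + str(index)
    # Return the decoded text: str() on the raw bytes would yield a "b'...'" literal.
    return requests.get(url).text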

In [4]:
"""for i in indexes:
    with open('../../../../Desktop/replayhtmls.text', "a") as myfile:
            myfile.write(getHTML(index=i))"""

"""with open('../../../../Desktop/replayhtmls.text', "r") as myfile:
        m_html = myfile.read()
        tree = html.fromstring(m_html)"""

In [8]:
games = tree.xpath('//a[@class = "replay_index_button index_download"]/@href')

In [11]:
series = pd.Series(games)

In [19]:
series.to_clipboard()
