Starcraft II replay converter

Downloads StarCraft II replays from websites and turns them into datasets for ML or analysis.

Setup • Configuration • Usage • Table schemes

About the Project

This repository is dedicated to gathering and organizing datasets for machine-learning-based StarCraft II bots. The project has two aims: first, to provide a tool for collecting replay data that can be used in supervised training; second, to create datasets suitable for training value functions in reinforcement learning algorithms.

Available functionality:

Collect replays from two websites
Preprocess replays into a human-readable form
Transform the data and load it into the database

Limitations to consider:

The only supported game mode is 1v1.
Only game versions 5.0.0 to 5.0.11 are supported.

Prerequisites

Python 3.9 or earlier (the latest version of the sc2replay library supports Python only up to 3.9).
Access to a configured PostgreSQL database.
Packages listed in requirements.txt.
Optional: Jupyter Notebook.

Setup

Set up a PostgreSQL server (you can use this guide for Linux or this guide for Windows).

Create a new database (using psql):

create database sc2replays;
\c sc2replays

Clone the repository by running

git clone https://github.com/dvarkless/sc2_replay_converter.git

Create a python virtual environment:

cd sc2_replay_converter
python -m venv venv

If you are using Linux or Mac:

source ./venv/bin/activate

If you are using Windows (PowerShell):

./venv/Scripts/activate.ps1

Install packages:

pip install -r requirements.txt

Download the submodule:

git submodule update --init --recursive

Configuration

Configuration files are located in the ./configs directory.

Database access:

File ./configs/secrets.yml

db_host: localhost # Database host address
db_name: sc2replays # Database name
db_user: dvarkless # Username that can interact with the DB
db_password: password # Password for this user; set to `None` if the user has no password
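The secrets file maps directly onto a PostgreSQL connection. The sketch below assumes the file has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); `build_dsn` is a hypothetical helper, not part of this repository. Note that YAML parses an unquoted `None` as the string "None", not as null, so the sketch treats both as "no password".

```python
from urllib.parse import quote


def build_dsn(secrets: dict) -> str:
    """Build a libpq connection URI from the keys used in ./configs/secrets.yml.

    Hypothetical helper: `secrets` is assumed to be the dict produced by a
    YAML loader, e.g. {"db_host": "localhost", "db_name": "sc2replays", ...}.
    """
    user = quote(str(secrets["db_user"]), safe="")
    password = secrets.get("db_password")
    # Bare `None` in YAML is loaded as the string "None", so accept both forms
    if password is None or password == "None":
        credentials = user
    else:
        # Percent-encode so characters like '@' or ':' do not break the URI
        credentials = f"{user}:{quote(str(password), safe='')}"
    return f"postgresql://{credentials}@{secrets['db_host']}/{secrets['db_name']}"
```

The resulting URI (for the values above, postgresql://dvarkless:password@localhost/sc2replays) is accepted by libpq-based clients such as psql.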

File ./configs/downloader_config.yml

The only setting you are likely to need to change here is the user agent:

headers:
  user_agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
  # Chrome from Windows device
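The user agent is sent as an HTTP header with every request the downloader makes. As an illustration only, the sketch below uses the standard library's `urllib.request`; the repository's downloader may use a different HTTP client, and `make_request` is a hypothetical name.

```python
import urllib.request

# Value copied from headers.user_agent in ./configs/downloader_config.yml
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
)


def make_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying the configured user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = make_request("https://example.com/replays")
# urllib stores header names via str.capitalize(), so the key is "User-agent"
ua = req.get_header("User-agent")
```

Passing `req` to `urllib.request.urlopen` would then perform the actual download.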

To add another site, add it to the config and write a corresponding method in the ReplayDownloader class (def name_yield: ...).
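One way to structure a per-site method is as a generator of replay URLs that a shared download routine consumes. The class below is a hypothetical stand-in, not ReplayDownloader itself: the `<site>_yield` naming, the `getattr` dispatch, and the `base_url`/`pages` config keys are all assumptions, so check replay_downloader.py for the real conventions.

```python
from typing import Iterator


class SketchDownloader:
    """Hypothetical stand-in for ReplayDownloader showing a per-site generator pattern."""

    def __init__(self, sites: dict):
        # Assumed layout: one config section per site name
        self.sites = sites

    def example_yield(self) -> Iterator[str]:
        # A real method would scrape the site's listing pages; here replay
        # URLs are generated from an assumed base URL and page count.
        cfg = self.sites["example"]
        for page in range(1, cfg["pages"] + 1):
            yield f"{cfg['base_url']}/replay/{page}"

    def start_download(self, site: str) -> list:
        # Dispatch by naming convention: site "example" -> example_yield()
        source = getattr(self, f"{site}_yield")
        # A real downloader would fetch each URL; the sketch just collects them
        return list(source())
```

With this pattern, supporting a new site means adding its config section and one new `*_yield` generator; the download loop itself stays unchanged.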

Usage

Example code is provided in the download_and_process.ipynb notebook.

Collect replays:

from replay_downloader import ReplayDownloader

REPLAY_DIR = "../replays"
DOWNLOADER_CONFIG = "./configs/downloader_config.yml"

downloader = ReplayDownloader(REPLAY_DIR, DOWNLOADER_CONFIG, max_count=500, jupyter=True)
downloader.start_download("sc2rep")
# downloader.start_download("spawningtool")

Preprocess files

from replay_process import ReplayProcess, ReplayFilter
from datetime import datetime

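REPLAY_DIR = "../replays"
SECRETS = "./configs/secrets.yml"
DATABASE_CONFIG = "./configs/database_config.yml" # Assumed file name: point this at the database config in ./configs
GAME_INFO_FILE = "./starcraft2_replay_parse/game_info.csv"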

processor = ReplayProcess(
    SECRETS,
    DATABASE_CONFIG,
    GAME_INFO_FILE,
    jupyter=True
)

# Setup filter
replay_filter = ReplayFilter()
replay_filter.is_1v1 = True # Select only 1v1 games
replay_filter.game_len = [1920, 38400] # Game length in ticks (16 per second): 2 to 40 minutes
replay_filter.time_played = datetime(2021, 1, 1) # Earliest allowed game

# Process replays (this should take a while)
processor.process_replays(REPLAY_DIR, filt=replay_filter)
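The game_len bounds are consistent with 16 ticks per game second (1920 = 2 * 60 * 16 and 38400 = 40 * 60 * 16). The sketch below uses that inferred rate and re-expresses the three filter settings as a plain predicate. `ReplayMeta` and `passes_filter` are hypothetical, and treating both length bounds as inclusive is an assumption; ReplayFilter's actual checks live in replay_process.py.

```python
from dataclasses import dataclass
from datetime import datetime

TICKS_PER_SECOND = 16  # Inferred from the filter values above, not read from the code


def minutes_to_ticks(minutes: float) -> int:
    """Convert game minutes to ticks at the assumed 16 ticks per second."""
    return int(minutes * 60 * TICKS_PER_SECOND)


@dataclass
class ReplayMeta:
    # Hypothetical summary of one parsed replay
    players: int
    length_ticks: int
    played_at: datetime


def passes_filter(meta: ReplayMeta,
                  game_len=(1920, 38400),
                  earliest=datetime(2021, 1, 1)) -> bool:
    """Mirror is_1v1, game_len and time_played from the example filter."""
    is_1v1 = meta.players == 2
    in_range = game_len[0] <= meta.length_ticks <= game_len[1]  # inclusive (assumed)
    recent = meta.played_at >= earliest
    return is_1v1 and in_range and recent
```

For instance, `minutes_to_ticks(10)` gives the tick count of a 10-minute game, which can be used to tighten the `game_len` window.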

Create dataset tables

from itertools import product
from pipeline import PipelineComposer

MINS_PER_SAMPLE = 4 # On average, take one sample every 4 minutes
PRED_STEP = 1 # Take the paired (second) sample 1 minute after the first one
MIN_LEAGUE = 3 # Minimum league is Gold

r_pairs = product("ZTP", repeat=2) # ((Z, Z), (Z, T), ...)
matchups = ["v".join((r1, r2)) for r1, r2 in r_pairs] # ['ZvZ', 'ZvT', ...]
composer = PipelineComposer("ZvZ", tick_step=32)

# Create pipelines for each table type
for matchup in matchups:
    composer.change_matchup(matchup)
    comp_pipeline = composer.get_compositon(MINS_PER_SAMPLE, PRED_STEP, MIN_LEAGUE)
    comp_pipeline.run()
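The sampling parameters can be read as follows: candidate ticks lie on a grid of tick_step ticks, each one is kept with a probability chosen so that about one sample falls in every MINS_PER_SAMPLE minutes, and each kept tick is paired with a target tick PRED_STEP minutes later. The function below is a hypothetical sketch of that interpretation, assuming 16 ticks per second; it is not PipelineComposer's implementation.

```python
import random

TICKS_PER_MINUTE = 60 * 16  # Assumed 16 ticks per second


def sample_pairs(game_len: int, mins_per_sample: float = 4, pred_step: float = 1,
                 tick_step: int = 32, seed: int = 0) -> list:
    """Return (tick, target_tick) pairs for one game (hypothetical sketch)."""
    rng = random.Random(seed)
    pred_ticks = int(pred_step * TICKS_PER_MINUTE)
    # Grid ticks that still leave room for a target inside the game
    candidates = range(0, game_len - pred_ticks + 1, tick_step)
    # Keep probability so that one grid tick per mins_per_sample minutes survives on average
    prob = tick_step / (mins_per_sample * TICKS_PER_MINUTE)
    return [(t, t + pred_ticks) for t in candidates if rng.random() < prob]
```

With tick_step=32 the grid spacing is 2 seconds, and a PRED_STEP of 1 minute corresponds to 960 ticks.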

Table schemes:

Table schemes are defined in ./queries/create_*.sql.

Dataset tables are created dynamically.
Primary key: (tick, game_id). Foreign key: game_id references game_info.
Their structure:

*_comp tables:

[NOTE] These tables are used to train the agent to choose which unit to build next, based on army composition and scouting info.

player_unit: INTEGER,
...
player_building: INTEGER,
...
player_minerals_available: INTEGER, 
player_vespene_available: INTEGER, 
enemy_unit: INTEGER,
...
out_unit: NUMERIC(4, 3) # precision 0.001; player's units 1 minute after the current tick
...

*_winprob tables:

[NOTE] These tables are used to train agents to predict the game outcome based on the available information.

game_id: INTEGER,
tick: INTEGER,
player_unit: INTEGER,
...
player_building: INTEGER,
...
player_upgrade: INTEGER,
...
player_minerals_available: INTEGER, 
player_vespene_available: INTEGER, 
enemy_unit: INTEGER,
...
enemy_building: INTEGER,
...
out_winprob: NUMERIC(4, 3) # precision 0.001; outcome probability for the game ending within 1 minute (1 = player's win, 0 = player's defeat)

*_enemycomp tables:

[NOTE] These tables are used to train agents to predict the enemy's army composition based on scouted buildings.

game_id: INTEGER,
tick: INTEGER,
enemy_building: INTEGER,
...
out_unit: NUMERIC(4, 3) # precision 0.001; enemy units 1 minute after the current tick
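All three table types share a naming convention: game_id and tick identify a row, player_* and enemy_* columns are inputs, and out_* columns are training targets. The helper below splits a column list along those lines before training. It is a hypothetical sketch; the column names in the test are placeholders, and the real per-unit names come from ./queries/create_*.sql (or PostgreSQL's information_schema.columns view).

```python
KEY_COLUMNS = ("game_id", "tick")  # Row identifiers, not model inputs


def split_columns(columns: list) -> tuple:
    """Split a dataset table's column names into (features, targets) by prefix."""
    features = [c for c in columns if c not in KEY_COLUMNS and not c.startswith("out_")]
    targets = [c for c in columns if c.startswith("out_")]
    return features, targets
```

Keeping the split name-based means the same code works for every matchup, even though each matchup has a different set of unit, building and upgrade columns.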

matchups:

The first letter of a matchup is the player's race.
The last letter is the enemy's race.
For example, 'ZvT' means player = 'Zerg', enemy = 'Terran'.
The matchup determines the table's unit, building and upgrade columns. The columns are listed in ./starcraft2_replay_parse/data/game_info.csv.

[NOTE] Mirror matchups count twice: the player and the enemy swap places.
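The mirror-matchup note can be made concrete by counting, per game, which matchup tables receive a perspective. The sketch assumes each 1v1 game is recorded once from each player's side with that player's race first, so a ZvT game feeds both ZvT and TvZ while a ZvZ game feeds ZvZ twice; `rows_per_table` is a hypothetical helper.

```python
from collections import Counter


def rows_per_table(games: list) -> Counter:
    """Count how many per-game perspectives each matchup table receives.

    Each game is a (race_a, race_b) pair and is viewed once from each side,
    with the viewing player's race first in the matchup name (assumption).
    """
    counts = Counter()
    for race_a, race_b in games:
        counts[f"{race_a}v{race_b}"] += 1  # player A's point of view
        counts[f"{race_b}v{race_a}"] += 1  # player B's point of view
    return counts
```

For example, one ZvT game and one ZvZ game yield one entry each for ZvT and TvZ and two for ZvZ.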

License

Distributed under the MIT License. See LICENSE.txt for more information.
