GitHub - eqy/artolearn: win/loss data and artosalt mining from artosis vods via vision and learning

artolearn

artolearn is a collection of scripts for (1) parsing Artosis VODs to extract csvs of game data (e.g., opponent rank, mmr, map, and game outcome), and (2) building and tuning machine learning models for online prediction of game outcomes (gamba gamba MODS OPEN CASINO).

Data Extraction

Game data is extracted via OpenCV, where still frames of games are matched against reference game scenes to determine the current stream VOD state (e.g., match-found (where MMRs are displayed), in-game, post-game (where outcomes are displayed), and so on. Once frames are matched, additional OCR is performed if necessary to extract information such as the current map, the MMRs of players, and so on. The "heavy lifting" is done by simple nearest-neighbor matching to reference images for scenes and a few other scenarios (e.g., determining the race selected by each player in the match screen). Additional heavy lifting is done by pytesseract for OCR, with some intermediate steps in OpenCV for thresholding and other operations to make the text more OCR-friendly.

The data extraction CLI is provided by video2csv.py. Extracted data csvs are in data/. Note that for copyright reasons, the reference scene images are omitted from this repository.

Modeling and Prediction

As the extracted data contains a mixture of numerical (player MMRs) and categorical features (map, races chosen by players, ...) the current modeling uses xgboost with optuna for hyperparameter tuning.

Prediction experiments are still at a very early and highly experimental stage, with many changes pending due to some tricky gotchas associated with a time-dependent classification task. For example, it may be useful to include time-dependent features such as the current stream uptime and win/loss streaks within a stream, but this introduces potential leaks of data labels. Similarly, it may be useful to limit the time horizon of included datapoints to account for MMR inflation or metagame shifts (e.g., by pruning very old game results from the dataset), but treating the time horizon as a hyperparameter may increase overfitting.

The tuning CLI is provided by csv2model.py. The live inference CLI is provided by infer.py.

Additional Fun Summary Statistics From Collected Data

Auto-updated 2022-03-07 21:27:44 (639 games)

per-race+rank winrates

rank	vs. p	vs. r	vs. t	vs. z	overall
unranked	51.9%	33.3%	55.6%	40.9%	48.6%
a	48.1%	66.7%	48.6%	57.8%	52.1%
b	86.3%	100.0%	79.4%	84.6%	85.5%
c	100.0%	100.0%	100.0%	80.0%	92.3%
d	100.0%	nan%	nan%	nan%	100.0%
s	33.3%	0.0%	26.1%	17.4%	26.5%
overall	55.3%	79.3%	53.3%	57.1%	56.5%

baseline accuracy from rank alone: 62.9%

baseline accuracy from race alone: 56.5%

race+rank baseline accuracy: 64.5%

baseline accuracy from always picking higher mmr player:66.1%

map/matchup winrates

map	vs. p	vs. r	vs. t	vs. z
eclipse	47.2%	75.0%	55.8%	64.0%
largo	67.3%	100.0%	46.4%	63.8%
polypoid	58.8%	66.7%	61.5%	57.1%
goodnight	50.0%	100.0%	42.3%	33.3%
revolver	nan%	nan%	0.0%	100.0%

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
data		data
scene_reference		scene_reference
.gitignore		.gitignore
README.md		README.md
README_base.md		README_base.md
TODOS.md		TODOS.md
constants.py		constants.py
csv2model.py		csv2model.py
data.py		data.py
infer.py		infer.py
train.py		train.py
train.sh		train.sh
video.py		video.py
video2csv.py		video2csv.py

eqy/artolearn

Folders and files

Latest commit

History

Repository files navigation

artolearn

Data Extraction

Modeling and Prediction

Additional Fun Summary Statistics From Collected Data

per-race+rank winrates

map/matchup winrates

About

Resources

Stars

Watchers

Forks

Languages