
Welcome to the MegaBot wiki!

What is MegaBot?

MegaBot is a proof-of-concept StarCraft bot built to test strategy selection. For MegaBot, a strategy is the behavior of another bot, incorporated and used to play a match. So far, MegaBot can choose among the 2015 versions of Skynet, Xelnaga and NUSBot.

Strategy selection in StarCraft

At first, we tested this concept in a general setting, by modeling the process of strategy selection in StarCraft as a normal-form game in which the actions are the available bots. From the results of previous AI competitions, we noticed that some bots interact in a rock-paper-scissors fashion, so we decided to perform a game-theoretic study of this interaction.

With the strategy selection process modeled as a normal-form game, we calculated the Nash equilibrium and obtained a strategy selection method with game-theoretic guarantees: it wins 50% of matches in expectation, regardless of what the opponent does. We went further and implemented other strategy selection mechanisms, including safe opponent exploitation, which can achieve a higher win rate than the Nash equilibrium while keeping potential losses theoretically bounded.
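As an illustration of the equilibrium computation (a sketch, not the code used for the paper), the mixed-strategy Nash equilibrium of a zero-sum normal-form game can be obtained with a small linear program. The payoff matrix below is a made-up, perfectly cyclic rock-paper-scissors matchup, for which the equilibrium is the uniform mix and the expected win rate is exactly 50%.

```python
# Illustrative sketch (not MegaBot's or the paper's actual code):
# Nash equilibrium of a rock-paper-scissors-like bot matchup,
# modeled as a zero-sum normal-form game and solved by linear programming.
import numpy as np
from scipy.optimize import linprog

# Hypothetical payoff matrix for the row player: entry [i][j] is the
# expected result (+1 win, -1 loss) of row strategy i vs column strategy j.
# Rows/columns could be Skynet, Xelnaga, NUSBot (values are made up).
A = np.array([
    [ 0.0, -1.0,  1.0],
    [ 1.0,  0.0, -1.0],
    [-1.0,  1.0,  0.0],
])
n = A.shape[0]

# Decision variables: mixed strategy x (n entries) plus the game value v.
# Maximize v  <=>  minimize -v.
c = np.concatenate([np.zeros(n), [-1.0]])
# For every opponent column j: v - sum_i A[i][j] * x_i <= 0.
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
# Probabilities must sum to 1.
A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
b_eq = [1.0]
bounds = [(0, 1)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:n], res.x[n]
print("equilibrium mix:", x)   # ~[1/3, 1/3, 1/3]
print("game value:", v)        # ~0, i.e. 50% wins in expectation
```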

All this work resulted in a paper [1].

In the paper, however, we did not address three issues:

a) The player must know the possible strategies of its opponent and how good they are. In a general setting (e.g., an AI tournament) the opponent has a set of strategies that is possibly unknown to the player.

b) We ignore context information (map, opponent, game state, etc.), so we do not exploit the fact that some strategies might perform better than others in a given context. In our paper, we select a strategy at match start, ignoring all context information, and never switch.

c) Our strategies (StarCraft bots) evolve with experience, which makes the normal-form game's payoff matrix non-stationary. In the paper we disabled bot evolution, but in a general setting this cannot be done.

Moreover, for the paper, experiments were performed with previously recorded match results: players selected their strategies (bots) and, instead of executing a match, we queried the result from the records. More information about the paper's experiment engine can be found here.

An actual StarCraft player bot

To address the issues pointed out in the previous section, implement an actual StarCraft-playing bot, and participate in AI competitions, we proceeded as follows:

For (a), we adopted epsilon-greedy as our strategy selection mechanism (in the paper we called it alpha-greedy). Epsilon-greedy consists of choosing a random strategy with probability epsilon and the best known strategy with probability (1 - epsilon). Nash equilibrium does not make sense when the opponent's set of strategies is unknown, whereas epsilon-greedy can discover a good strategy even in this case (although a clever opponent can exploit epsilon-greedy and make it perform badly).
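A minimal sketch of the selection rule (illustrative Python, not MegaBot's actual code; the strategy names and values are made up):

```python
import random

def choose_strategy(values, epsilon=0.1):
    """Epsilon-greedy selection over a dict {strategy_name: estimated value}.

    With probability epsilon, explore a random strategy; otherwise
    exploit the strategy with the highest current estimate.
    """
    if random.random() < epsilon:
        return random.choice(list(values))
    return max(values, key=values.get)

# Example: estimated values against some opponent (made-up numbers).
values = {"Skynet": 0.4, "Xelnaga": -0.2, "NUSBot": 0.1}
print(choose_strategy(values, epsilon=0.1))
```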

For (b), we started using very basic context information, namely the opponent's identity. Thus, we keep track of how our strategies perform against each specific opponent.

For (c), we update the value of a strategy against a given opponent with the rule:

V[s, o] <- (1-alpha) * V[s, o] + alpha * result

where V[s, o] is the value of strategy 's' against opponent 'o', alpha (0 <= alpha <= 1) is the learning rate and result is -1 for a defeat and +1 for a victory. This is the well-known Q-learning update rule (used here in a stateless, single-step setting) and it is suitable because it weighs recent results more than old ones and copes with noisy samples. Thus, if a previously victorious strategy starts losing against a specific opponent, we are able to detect this quickly and switch to another strategy.
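A minimal sketch of this update, with values kept per (strategy, opponent) pair (illustrative Python, not MegaBot's actual code):

```python
def update_value(V, strategy, opponent, result, alpha=0.2):
    """Recency-weighted update of V[(strategy, opponent)].

    result is +1 for a victory and -1 for a defeat; alpha in [0, 1]
    controls how quickly old results are forgotten.
    """
    key = (strategy, opponent)
    V[key] = (1 - alpha) * V.get(key, 0.0) + alpha * result
    return V[key]

# Example: a previously strong strategy starts losing to "OpponentX"
# (names and numbers are made up).
V = {("Skynet", "OpponentX"): 0.8}
for _ in range(5):
    update_value(V, "Skynet", "OpponentX", result=-1, alpha=0.2)
print(V[("Skynet", "OpponentX")])  # drops below 0 within a few defeats
```

A larger alpha forgets old results faster, at the cost of reacting more strongly to individual noisy outcomes.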

References

[1] Tavares, Azpúrua, Santos, Chaimowicz. Rock, Paper, StarCraft: Strategy Selection in Real-Time Strategy Games. In: AIIDE 2016.