# Fighting Game AI

## Introduction
Do you know about Pokémon? This turn-based fighting game is similar, but we do not send out monsters to duke it out. Instead, the battle is between humans. Since both players can take the same actions, there is some game theory involved, and we can use Monte Carlo Tree Search (MCTS) to approximate the Nash equilibrium of the game.

Players start with 100 health. On each turn, a player chooses one of four actions:
1. <b>Attack</b> <i>{DAMAGE SKILL}</i> (opponent is damaged for 8-12 health)
2. <b>Heal</b> <i>{HEAL SKILL}</i> (self is healed for 7-10 health)
3. <b>Power Up</b> <i>{BUFF SKILL}</i> (effectiveness of damage skills increases by 25%, to a maximum of 100%)
4. <b>Superspeed</b> <i>{BUFF SKILL}</i> (chance of dodging a damage skill or sudden death effect increases by 5%, to a maximum of 100%)
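
To get a feel for how Power Up and Superspeed change the value of an Attack, here is a minimal sketch that computes the average damage of a single Attack. It assumes the 8-12 roll is uniform over the integers, that the scaled damage is truncated to an integer, and that a dodge negates the whole hit. `expected_attack_damage` is a hypothetical helper for illustration, not part of the game code.

```python
def expected_attack_damage(damage_multiplier=1.0, opponent_dodge_rate=0.0):
    """Average health removed by one Attack (hypothetical helper)."""
    rolls = range(8, 13)  # base damage 8-12, assumed uniform
    # Assumption: scaled damage is truncated to an integer before it is applied.
    mean_hit = sum(int(roll * damage_multiplier) for roll in rolls) / len(rolls)
    # A dodged attack deals no damage at all.
    return (1 - opponent_dodge_rate) * mean_hit


print(expected_attack_damage())            # no buffs on either side
print(expected_attack_damage(1.25))        # attacker used Power Up once
print(expected_attack_damage(1.25, 0.05))  # ...and the opponent used Superspeed once
```

Because of the truncation, one Power Up raises the average hit by a little less than the nominal 25%.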

Sudden death begins on the 11th turn: at the end of their turn, each player is hit for 2 * (number_of_turns_passed - 10) health. Players are able to dodge the effects of sudden death as well. After turn 50, if neither player has fainted, the game is a draw.
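
The sudden death formula can be sketched as a small function. The name `sudden_death_damage` and its keyword parameters are assumptions made for illustration; only the formula itself comes from the rule above.

```python
def sudden_death_damage(turn_number, start_turn=11, damage_per_turn=2):
    """Health lost to sudden death at the end of a player's turn, if not dodged."""
    # No damage before start_turn; afterwards it grows by damage_per_turn each turn.
    return damage_per_turn * max(turn_number - (start_turn - 1), 0)


for turn in (10, 11, 20, 50):
    print(turn, sudden_death_damage(turn))
```

The damage grows linearly with the turn number, so it soon exceeds the 7-10 health a single Heal can restore, which is what pushes long games towards a finish.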

For buff skills, we will use "Power Up" as an example to show how effects can be stacked. When the player has used "Power Up" twice, the effectiveness of damage skills is increased by 50%.
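
As a rough illustration of stacking, the sketch below applies a buff repeatedly and clamps it at its maximum. `stacked_buff` and its parameters are hypothetical names; the step sizes and caps are the ones listed for Power Up and Superspeed.

```python
def stacked_buff(uses, step, start, cap):
    """Value of a buff after it has been used `uses` times (hypothetical helper)."""
    value = start
    for _ in range(uses):
        # Each use adds `step`, but the total never exceeds `cap`.
        value = min(value + step, cap)
    return value


# Damage multiplier: starts at 1.0 (+0%), +0.25 per Power Up, capped at 2.0 (+100%).
print(stacked_buff(2, step=0.25, start=1.0, cap=2.0))
print(stacked_buff(6, step=0.25, start=1.0, cap=2.0))

# Dodge rate: starts at 0.0, +0.05 per Superspeed, capped at 1.0.
print(stacked_buff(3, step=0.05, start=0.0, cap=1.0))
```

Under these numbers, a fifth Power Up has no further effect, whereas Superspeed keeps adding to the dodge rate for up to twenty uses.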

You can edit the variables in the code below to see how the sequence of actions that leads to the best result changes.

In [1]:
!pip install mcts_simple

In [2]:
import numpy as np
import random

from gym import Env
from gym.spaces import Discrete, Box

class Skill:
    def __init__(self, player):
        self.name = None
        self.type = None
        self.description = None
        self.player = player
    
    def use(self, opponent):
        pass
    
    def damage(self, opponent, damage):
        if random.random() < opponent.dodge_rate:
            if self.player.output:
                print(f"Player {opponent.name} dodged the incoming attack.")
        else:
            original_health = opponent.health
            opponent.health = max(opponent.health - int(damage * self.player.damage_multiplier), 0)
            if self.player.output:
                print(f"Player {opponent.name} is hit for {original_health - opponent.health} damage by the skill {self.name}.")
    
    def heal(self, damage):
        original_health = self.player.health
        self.player.health = min(self.player.health + damage, 100)
        if self.player.output:
            print(f"Player {self.player.name} is healed for {self.player.health - original_health} damage using the skill {self.name}.")
    
    def increase_damage(self, multiplier):
        original_damage_multiplier = self.player.damage_multiplier
        self.player.damage_multiplier = max(min(self.player.damage_multiplier + multiplier, 2), 0)
        if self.player.output:
            print(f"Player {self.player.name}'s damage has increased by {self.player.damage_multiplier - original_damage_multiplier:.0%} using the skill {self.name}.")
        
    def increase_dodge(self, dodge_rate):
        original_dodge_rate = self.player.dodge_rate
        self.player.dodge_rate = max(min(self.player.dodge_rate + dodge_rate, 1), 0)
        if self.player.output:
            print(f"Player {self.player.name}'s dodge rate has increased by {self.player.dodge_rate - original_dodge_rate:.0%} using the skill {self.name}.")
        
class Attack(Skill):
    def __init__(self, player):
        self.name = "Attack"
        self.type = "Damage"
        self.description = "Opponent is hit for 8-12 health."
        self.player = player
    
    def use(self, opponent):
        self.damage(opponent, random.randint(8, 12))
        
class Heal(Skill):
    def __init__(self, player):
        self.name = "Heal"
        self.type = "Heal"
        self.description = "Player is healed for 7-10 health."
        self.player = player
    
    def use(self, opponent):
        self.heal(random.randint(7, 10))
        
class PowerUp(Skill):
    def __init__(self, player):
        self.name = "Power Up"
        self.type = "Buff"
        self.description = "Effectiveness of damage skills increases by 25%. Damage multiplier can be increased to a maximum of 100%."
        self.player = player
    
    def use(self, opponent):
        self.increase_damage(0.25)
        
class Superspeed(Skill):
    def __init__(self, player):
        self.name = "Superspeed"
        self.type = "Buff"
        self.description = "Chance of dodging damage skills or effect from sudden death increases by 5%. Dodge rate can be increased to a maximum of 100%."
        self.player = player
    
    def use(self, opponent):
        self.increase_dodge(0.05)

class Player:
    def __init__(self, number, player_name, output = False):
        self.number = number
        self.name = player_name
        self.skills = [Attack(self), Heal(self), PowerUp(self), Superspeed(self)]
        self.health = 100
        self.damage_multiplier = 1
        self.dodge_rate = 0
        self.output = output
        
    def has_fainted(self):
        return self.health <= 0
    
class FightingGameEnv(Env): # Multi agent environment
    def __init__(self, output = False):
        # Output
        self.output = output

        # Players
        self.player_1 = Player(1, "1", self.output)
        self.player_2 = Player(2, "2", self.output)
        self.agents = [self.player_1, self.player_2]
        self.agent_mapping = dict(zip(self.agents, list(range(len(self.agents)))))
        self.agent_selection = self.agents[0]

        # Actions
        self.action_spaces = {agent: Discrete(4) for agent in self.agents}
        
        # Observations {}: set of int, []: range of float
        # <<<player: {1, 2}, turn_number: {1, ..., 50}, sudden_death: {0, 1}, sudden_death_damage: {0, ..., 80},
        # player_1_health: {0, ..., 100}, player_2_health: {0, ..., 100}, player_1_damage_multiplier: [1, 2], 
        # player_2_damage_multiplier: [1, 2], player_1_dodge_rate: [0, 1], player_2_dodge_rate: [0, 1]>>>
        self.observation_spaces = {agent: Box(np.array([1., 1., 0., 0., 0., 0., 1., 1., 0., 0.], dtype = np.float64), np.array([1., 50., 1., 80., 100., 100., 2., 2., 1., 1.], dtype = np.float64), dtype = np.float64) for agent in self.agents}

        # Parameters
        self.turn_number = 1
        self.episode_length = 50
        self.sudden_death_damage = 2
        
        # Things to return
        self.observations = {agent: self.get_state() for agent in self.agents}
        self.actions = {agent: None for agent in self.agents}
        self.rewards = {agent: 0 for agent in self.agents} # DO NOT RETURN THIS
        self.cumulative_rewards = {agent: 0 for agent in self.agents}
        self.dones = {agent: False for agent in self.agents}
        self.infos = {agent: {} for agent in self.agents}

    def render(self, mode = "human"):
        # Output
        if self.output:
            print("Turn:", self.turn_number)
            print(f"Player {self.agents[0].name} health: {self.agents[0].health}")
            print(f"Player {self.agents[0].name} damage increase: {self.agents[0].damage_multiplier - 1:.0%}")
            print(f"Player {self.agents[0].name} dodge rate: {self.agents[0].dodge_rate:.0%}")
            print(f"Player {self.agents[1].name} health: {self.agents[1].health}")
            print(f"Player {self.agents[1].name} damage increase: {self.agents[1].damage_multiplier - 1:.0%}")
            print(f"Player {self.agents[1].name} dodge rate: {self.agents[1].dodge_rate:.0%}")

    def get_state(self):
        return np.array([self.agent_mapping[self.agent_selection] + 1,
                         self.turn_number,
                         1 if self.turn_number > 10 else 0,
                         self.sudden_death_damage * max(self.turn_number - 10, 0),
                         self.player_1.health,
                         self.player_2.health,
                         self.player_1.damage_multiplier,
                         self.player_2.damage_multiplier,
                         self.player_1.dodge_rate,
                         self.player_2.dodge_rate],
                         dtype = np.float64)

    def reset(self):
        self.__init__(self.output) # reset classes made

    def step(self, action):
        # Agent's action
        self.actions[self.agent_selection] = action

        # Reset rewards
        for agent in self.rewards:
            self.rewards[agent] = 0 # clear every agent's per-step reward, not just the current agent's
        self.cumulative_rewards[self.agent_selection] = 0

        # Track previous health
        self_health = self.agent_selection.health
        opponent_health = self.agents[self.agent_mapping[self.agent_selection] ^ 1].health

        # Agent takes action
        self.agent_selection.skills[action].use(self.agents[self.agent_mapping[self.agent_selection] ^ 1])
        self.infos[self.agent_selection]["action"] = action # logging purposes

        # Sudden death
        if not self.agents[self.agent_mapping[self.agent_selection] ^ 1].has_fainted(): # if opponent has not fainted
            sudden_death_damage = self.sudden_death_damage * max(self.turn_number - 10, 0)
            if sudden_death_damage:
                if random.random() < self.agent_selection.dodge_rate:
                    if self.output:
                        print(f"Player {self.agent_selection.name} dodged the sudden death effect.")
                else:
                    temp_health = self.agent_selection.health
                    self.agent_selection.health = max(self.agent_selection.health - sudden_death_damage, 0)
                    if self.output:
                        print(f"Player {self.agent_selection.name} has been hit for {temp_health - self.agent_selection.health} health by sudden death!")
        
        # Calculate rewards
        if self.agent_selection.has_fainted():
            self.rewards[self.agent_selection] -= 1
            self.rewards[self.agents[self.agent_mapping[self.agent_selection] ^ 1]] += 1
        elif self.agents[self.agent_mapping[self.agent_selection] ^ 1].has_fainted():
            self.rewards[self.agent_selection] += 1
            self.rewards[self.agents[self.agent_mapping[self.agent_selection] ^ 1]] -= 1

        # Determine episode completion
        if self.agent_mapping[self.agent_selection] == 0: # PLAYER 1
            self.dones = {agent: self.player_1.has_fainted() or self.player_2.has_fainted() for agent in self.agents}
        elif self.agent_mapping[self.agent_selection] == 1: # PLAYER 2
            self.turn_number += 1
            self.dones = {agent: self.turn_number >= self.episode_length or self.player_1.has_fainted() or self.player_2.has_fainted() for agent in self.agents} # check for turn number only applies at end of second player's turn

        # Selects the next agent
        self.agent_selection = self.agents[self.agent_mapping[self.agent_selection] ^ 1]
        
        # Next agent's observation
        self.observations[self.agent_selection] = self.get_state()
            
        # Update rewards
        for agent, reward in self.rewards.items():
            self.cumulative_rewards[agent] += reward
            
        # Output next line
        if self.output:
            print()

In [3]:
from mcts_simple import Game
from copy import deepcopy

class FightingGame(Game):
    def __init__(self, output = False):
        self.env = FightingGameEnv(output)
        self.prev_env = None
        
    def render(self):
        self.env.render()
        
    def get_state(self):
        return tuple(self.env.get_state())
        
    def number_of_players(self):
        return len(self.env.agents)
    
    def current_player(self):
        return self.env.agent_selection.name
    
    def possible_actions(self):
        return [str(i) for i in range(4)]
    
    def take_action(self, action):
        if action not in self.possible_actions():
            raise RuntimeError("Action taken is invalid.")
        action = int(action)
        self.prev_env = deepcopy(self.env)
        self.env.step(action)
        
    def delete_last_action(self):
        if self.prev_env is None:
            raise RuntimeError("No last action to delete.")
        if self.env.output:
            raise RuntimeError("Output to terminal should be disabled using output = False when deleting last action.")
        self.env = self.prev_env
        self.prev_env = None
        
    def has_outcome(self):
        return True in self.env.dones.values()
    
    def winner(self):
        if not self.has_outcome():
            raise RuntimeError("winner() cannot be called when outcome is undefined.")
        if self.env.player_2.has_fainted() or self.env.player_1.health > self.env.player_2.health:
            return self.env.player_1.name
        elif self.env.player_1.has_fainted() or self.env.player_2.health > self.env.player_1.health:
            return self.env.player_2.name
        else:
            return None

In [4]:
### This example shows how open-loop MCTS deals with uncertainty ###
from mcts_simple import OpenLoopMCTS, OpenLoopUCT

# Export trained MCTS
print("Export trained MCTS")
mcts = OpenLoopMCTS(FightingGame(output = False))
mcts.run(iterations = 50000)
mcts._export("FightingGame_MCTS.json")
print()

# Import trained MCTS
print("Import trained MCTS")
mcts = OpenLoopMCTS(FightingGame(output = True))
mcts._import("FightingGame_MCTS.json")
mcts.self_play(activation = "best")
print()

# Export trained UCT
print("Export trained UCT")
uct = OpenLoopUCT(FightingGame(output = False))
uct.run(iterations = 100000)
uct._export("FightingGame_UCT.json")
print()

# Import trained UCT
print("Import trained UCT")
uct = OpenLoopUCT(FightingGame(output = True))
uct._import("FightingGame_UCT.json")
uct.self_play(activation = "best")
print()

# Play with UCT agent
print("Play with UCT agent")
uct = OpenLoopUCT(FightingGame(output = True))
uct._import("FightingGame_UCT.json")
uct.play_with_human(activation = "linear")
print()

Export trained MCTS



Import trained MCTS
Turn: 1
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 0%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 1's dodge rate has increased by 5% using the skill Superspeed.

Turn: 1
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 2 is healed for 0 damage using the skill Heal.

Turn: 2
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 2 is hit for 12 damage by the skill Attack.

Turn: 2
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 88
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 2's damage has increased by 25% using the skill Power Up.

Turn: 3
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 88
Player 2 


Import trained UCT
Turn: 1
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 0%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 1's dodge rate has increased by 5% using the skill Superspeed.

Turn: 1
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 2's damage has increased by 25% using the skill Power Up.

Turn: 2
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 100
Player 2 damage increase: 25%
Player 2 dodge rate: 0%
Player 1's dodge rate has increased by 5% using the skill Superspeed.

Turn: 2
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 10%
Player 2 health: 100
Player 2 damage increase: 25%
Player 2 dodge rate: 0%
Player 2's dodge rate has increased by 5% using the skill Superspeed.

Turn: 3
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge 

Turn: 1
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 0%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Input user action: 3
Player 1's dodge rate has increased by 5% using the skill Superspeed.

Turn: 1
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 100
Player 2 damage increase: 0%
Player 2 dodge rate: 0%
Player 2's damage has increased by 25% using the skill Power Up.

Turn: 2
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 5%
Player 2 health: 100
Player 2 damage increase: 25%
Player 2 dodge rate: 0%
Input user action: 3
Player 1's dodge rate has increased by 5% using the skill Superspeed.

Turn: 2
Player 1 health: 100
Player 1 damage increase: 0%
Player 1 dodge rate: 10%
Player 2 health: 100
Player 2 damage increase: 25%
Player 2 dodge rate: 0%
Player 2's damage has increased by 25% using the skill Power Up.

Turn: 3
Player 1 health: 100
Player 1 damage increase: 0%

Input user action: 1
Player 1 is healed for 10 damage using the skill Heal.
Player 1 has been hit for 16 health by sudden death!

Turn: 18
Player 1 health: 48
Player 1 damage increase: 50%
Player 1 dodge rate: 40%
Player 2 health: 60
Player 2 damage increase: 100%
Player 2 dodge rate: 25%
Player 2 is healed for 7 damage using the skill Heal.
Player 2 has been hit for 16 health by sudden death!

Turn: 19
Player 1 health: 48
Player 1 damage increase: 50%
Player 1 dodge rate: 40%
Player 2 health: 51
Player 2 damage increase: 100%
Player 2 dodge rate: 25%
Input user action: 1
Player 1 is healed for 8 damage using the skill Heal.
Player 1 dodged the sudden death effect.

Turn: 19
Player 1 health: 56
Player 1 damage increase: 50%
Player 1 dodge rate: 40%
Player 2 health: 51
Player 2 damage increase: 100%
Player 2 dodge rate: 25%
Player 2's damage has increased by 0% using the skill Power Up.
Player 2 has been hit for 18 health by sudden death!

Turn: 20
Player 1 health: 56
Player 1 damage in