###### Università degli Studi di Milano, Data Science and Economics Master Degree

# Duels

## A fantasy game for reinforcement learning

### Alfio Ferrara, Luigi Foscari

In **Duels** an autonomous agent fights an unlimited number of duels against other agents to score victory points. The game can be played in different versions depending on the reinforcement learning problem being addressed. For example, it can be played by a single agent learning against fictitious opponents in a Markov decision process (MDP), it can be limited to a predefined number of duels (finite horizon), or it can involve autonomous agents competing against each other while learning their own game strategies (multi-agent reinforcement learning, MARL).

The common rules of the game are described in the following section, followed by specific rules for the other settings.

## Base Game

A game of Duels is a sequence of fights, each of which may end with a **victory**, a **retreat**, or the **death** of the hero (the Agent). In case of death, the game ends immediately. In case of a retreat, the hero loses victory points (VP) but can immediately engage in a new duel against a weaker opponent. In case of a victory, the hero gains VP and immediately engages in a new duel against a stronger opponent.

#### The duel

A single duel is composed of a sequence of rounds. In each round, each duelist **performs an action**. Each **action can either succeed or fail**. If it succeeds, it has **an effect on the opponent in terms of hit points** (HP), and the **effect depends on the action chosen by the opponent**. **If it fails, there is no effect**. In the **base game version** an action always succeeds, and the resulting HP change depends on the actions chosen by the two duelists, as specified in the following table.

In [4]:
import gymnasium as gym
import gymbase.environments
import pandas as pd
import numpy as np

env = gym.make("Duels-v0", starting_hp=20, opponent_distr=None)
observation, info = env.reset()

moves = env.unwrapped.ACTION_TO_MOVES
action_outcome = pd.DataFrame(env.unwrapped.EFFECTIVENESS_TABLE, index=moves, columns=moves)
action_outcome

| | melee | ranged | spell | retreat | heal |
|---|---|---|---|---|---|
| melee | -4 | -6 | -2 | 0 | 0 |
| ranged | -2 | -4 | -4 | 0 | 0 |
| spell | -6 | -2 | -4 | 0 | 0 |
| retreat | -2 | -6 | -4 | 0 | 0 |
| heal | -2 | -2 | -2 | 1 | 1 |
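To make the table easier to read, the sketch below hardcodes its values and resolves a single round. It assumes that each row is a duelist's own action, each column is the other duelist's action, and each entry is the HP change applied to the row duelist. This convention is not stated by the environment: it is inferred from the sample fight reported later in this notebook, where ranged against melee costs the ranged duelist 2 HP and the melee duelist 6 HP. `hp_change` and `resolve_round` are hypothetical helper names, not part of `gymbase`.

```python
# Values copied from the effectiveness table above.
MOVES = ["melee", "ranged", "spell", "retreat", "heal"]
EFFECTIVENESS = [
    [-4, -6, -2, 0, 0],  # melee
    [-2, -4, -4, 0, 0],  # ranged
    [-6, -2, -4, 0, 0],  # spell
    [-2, -6, -4, 0, 0],  # retreat
    [-2, -2, -2, 1, 1],  # heal
]


def hp_change(own_move, other_move):
    """HP change for the duelist playing own_move (assumed: row = own action)."""
    return EFFECTIVENESS[MOVES.index(own_move)][MOVES.index(other_move)]


def resolve_round(agent_hp, opponent_hp, agent_move, opponent_move):
    """Apply one round to both duelists and return their new HP."""
    new_agent_hp = agent_hp + hp_change(agent_move, opponent_move)
    new_opponent_hp = opponent_hp + hp_change(opponent_move, agent_move)
    return new_agent_hp, new_opponent_hp


# Third round of the sample fight: agent uses ranged, opponent uses melee.
print(resolve_round(16, 14, "ranged", "melee"))  # (14, 8)
```

Under this reading, the rounds of the sample fight are reproduced exactly, which is the only evidence for the assumed row and column roles.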


#### Episodic interaction
An episode of `Duels` ends when one of the duelists dies or retreats. In each step, the `BasicDuels` environment chooses the opponent's action according to a probability distribution over the actions. The distribution used by the environment created above can be inspected as follows.

In [5]:
pd.Series(env.unwrapped._opponent_distr, index=moves)

melee      0.233832
ranged     0.229199
spell      0.176864
retreat    0.099440
heal       0.260665
dtype: float64
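As an illustration of how such a distribution drives the opponent, the following sketch draws opponent actions with the Python standard library, using the probabilities printed above as weights. It is not the environment's own sampling code, which is not shown here, and `sample_opponent_action` is a hypothetical helper introduced only for this example.

```python
import random
from collections import Counter

MOVES = ["melee", "ranged", "spell", "retreat", "heal"]
# Probabilities copied from the output above (they sum to 1).
OPPONENT_DISTR = [0.233832, 0.229199, 0.176864, 0.099440, 0.260665]


def sample_opponent_action(rng):
    """Draw one opponent action according to OPPONENT_DISTR."""
    return rng.choices(MOVES, weights=OPPONENT_DISTR, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_draws = 10_000
counts = Counter(sample_opponent_action(rng) for _ in range(n_draws))

# Empirical frequencies should be close to the target probabilities.
frequencies = {move: counts[move] / n_draws for move in MOVES}
print(frequencies)
```

With enough draws, the empirical frequencies approach the listed probabilities, so an agent facing this opponent should expect `heal` and `melee` far more often than `retreat`.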

Let's fight!

In [6]:
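# Unnormalized opponent preferences, in the same order as `moves`
# (melee, ranged, spell, retreat, heal), normalized to sum to 1.
opponent_preferences = np.array([100, 60, 10, 1, 20])
opponent_dist = opponent_preferences / opponent_preferences.sum()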

env = gym.make("Duels-v0", starting_hp=20, opponent_distr=opponent_dist)
observation, info = env.reset()

print(f"Agent starts with {observation['agent']} hit points")
print(f"Opponent starts with {observation['opponent']} hit points\n")

end_episode = False
while not end_episode:
    # The agent plays a uniformly random action; the opponent's action is drawn by the environment.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Agent uses {info['agent']} and opponent uses {info['opponent']}")
    print(f"Agent now has {observation['agent']} HP and opponent has {observation['opponent']} HP\n")

    # Report how the episode ended, if it did.
    if truncated:
        print("They decided that today was not a good day to fight")
    elif terminated:
        if observation['agent'] <= 0 and observation['opponent'] <= 0:
            print("The hero died facing the evil threat")
        elif reward > 0:
            print("The hero vanquished evil")
        elif reward < 0:
            print("The evil prevailed")

    end_episode = terminated or truncated
env.close()

Agent starts with 20 hit points
Opponent starts with 20 hit points

Agent uses spell and opponent uses spell
Agent now has 16 HP and opponent has 16 HP

Agent uses melee and opponent uses heal
Agent now has 16 HP and opponent has 14 HP

Agent uses ranged and opponent uses melee
Agent now has 14 HP and opponent has 8 HP

Agent uses melee and opponent uses melee
Agent now has 10 HP and opponent has 4 HP

Agent uses retreat and opponent uses melee
Agent now has 8 HP and opponent has 4 HP

They decided that today was not a good day to fight
