###### Università degli Studi di Milano, Data Science and Economics Master Degree

# Duels

## A fantasy game for reinforcement learning

### Alfio Ferrara, Luigi Foscari

In **Duels**, an autonomous agent fights an unlimited number of duels against other agents to score victory points. The game can be played in different versions depending on the reinforcement learning problem being addressed. For example, it can be played by a single agent learning against fictitious opponents in an MDP, it can be limited to a predefined number of duels (finite horizon), or it can involve autonomous agents competing against each other while learning their own game strategies (MARL).

The rules common to all versions of the game are described in the following section, followed by the specific rules for the other settings.

## Base Game

A game of Duels is a sequence of fights, each of which may end with a **victory**, a **retreat**, or the **death** of the hero (the Agent). In case of death, the game ends immediately. In case of a retreat, the hero loses victory points (VP) but can immediately engage in a new duel against a weaker opponent. In case of a victory, the hero gains victory points (VP) and immediately engages in a new duel against a stronger opponent.
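The progression described above can be sketched as a small bookkeeping loop. This is only an illustration, not the environment's implementation: the VP amounts (`VP_WIN`, `VP_RETREAT`), the integer opponent `level`, and the function `play_game` are hypothetical names and values, since the text does not specify how many points are gained or lost or how opponent strength is measured.

```python
# Hypothetical values: the rules above do not state the actual VP amounts.
VP_WIN = 1
VP_RETREAT = -1


def play_game(duel_outcomes):
    """Fold a sequence of duel outcomes into (victory_points, opponent_level, alive).

    Each outcome is one of "victory", "retreat", or "death".
    The opponent level is a hypothetical stand-in for opponent strength.
    """
    vp, level = 0, 1
    for outcome in duel_outcomes:
        if outcome == "death":
            # Death ends the game immediately; later duels are never fought.
            return vp, level, False
        if outcome == "victory":
            vp += VP_WIN
            level += 1  # the next opponent is stronger
        elif outcome == "retreat":
            vp += VP_RETREAT
            level = max(1, level - 1)  # the next opponent is weaker
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
    return vp, level, True


print(play_game(["victory", "victory", "retreat", "death", "victory"]))
```

Note that the last `"victory"` in the example is ignored, because the game stops at the first death.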

#### The duel

A single duel consists of a sequence of rounds. In each round, each duelist **performs an action**. Each **action can either succeed or fail**. If it succeeds, it has **an effect on the opponent in terms of hit points** (HP), and the **effect depends on the action chosen by the opponent**. **If it fails, there is no effect**. In the **base game version**, every action succeeds, and the HP lost by the opponent depends on the actions chosen by the two players, as specified in the following table.

In [None]:
import gymnasium as gym
import gymbase.environments
import pandas as pd

env = gym.make("Duels-v0", starting_hp=20, opponent_distr=None)
observation, info = env.reset()
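
# gym.make wraps the environment (e.g. in OrderEnforcing), so custom attributes
# such as the effectiveness table must be read through env.unwrapped
env.unwrapped.EFFECTIVENESS_TABLE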
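To make the round mechanics concrete, here is a minimal sketch of how a single round could be resolved from a lookup table keyed by the two chosen actions. Everything in it is an assumption made for illustration: the action names (`"strike"`, `"guard"`), the damage values in `EFFECT`, the `success_prob` parameter, and the function `resolve_round` are placeholders and do not reproduce the environment's actual `EFFECTIVENESS_TABLE`.

```python
import random

# Placeholder table, NOT the game's real one:
# (acting duelist's action, other duelist's action) -> HP lost by the other duelist.
EFFECT = {
    ("strike", "strike"): 2,
    ("strike", "guard"): 0,
    ("guard", "strike"): 1,
    ("guard", "guard"): 0,
}


def resolve_round(agent_hp, opponent_hp, agent_action, opponent_action,
                  success_prob=1.0, rng=random):
    """Return the new (agent_hp, opponent_hp) after one round.

    With success_prob=1.0 every action succeeds, as in the base game.
    A failed action has no effect on the other duelist.
    """
    # The agent's action, evaluated against the opponent's choice.
    if rng.random() < success_prob:
        opponent_hp -= EFFECT[(agent_action, opponent_action)]
    # The opponent's action, evaluated against the agent's choice.
    if rng.random() < success_prob:
        agent_hp -= EFFECT[(opponent_action, agent_action)]
    return agent_hp, opponent_hp


print(resolve_round(20, 20, "strike", "guard"))
```

Both actions are resolved within the same round, so a duel can end with both duelists at or below zero HP.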

In [None]:
env = gym.make("Duels-v0", starting_hp=20, opponent_distr=None)
observation, info = env.reset()

print(f"Agent starts with {observation['agent']} hit points")
print(f"Opponent starts with {observation['opponent']} hit points\n")

end_episode = False
while not end_episode:
    # Random policy: sample any action from the action space
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    print(f"Agent uses {info['agent']} and opponent uses {info['opponent']}")
    print(f"Agent now has {observation['agent']} HP and opponent has {observation['opponent']} HP\n")

    if truncated:
        # The episode was cut short before the duel reached a conclusion
        print("They decided that today was not a good day to fight")
    elif terminated:
        if observation['agent'] <= 0 and observation['opponent'] <= 0:
            # Both duelists ran out of hit points in the same round
            print("The hero died facing the evil threat")
        elif reward > 0:
            print("The hero vanquished evil")
        elif reward < 0:
            print("The evil prevailed")

    end_episode = terminated or truncated
env.close()