# 🪄 Multi-Agent Generative Formalization: Tutorial

## 📑 About

Multi-Agent Generative Formalization (**MAGiF**, formerly GAMA) is a Python- and Prolog-based simulator that enables users to create, simulate, and analyze strategic interactions using autoformalizing agents. The project supports game-theoretic experiments and includes tools for validating autoformalized Prolog programs. Currently, it supports 2×2 simultaneous-move games, but its modular architecture allows for extensions to other types of games.

## 1. 🔧 Setup

To set up the tutorial, run the cells below.

In [None]:
import sys, os
from pathlib import Path

if 'google.colab' in sys.modules:
    from IPython.display import Markdown, display
    display(Markdown("""
> ⚠️ **This notebook is read-only.**
>
> To save your own editable version, go to:
>
> `File → Save a copy in Drive`
>
> Then work from that copy so your changes are saved!
    """))
else:
    print("Local environment detected — no need to save a copy.")

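The next cell sets up the environment and your OpenAI API key. In Colab, you will be prompted for the key; if you do not have one, leave the field blank. When running locally, the key is read from the `OPENAI_API_KEY` environment variable, which must be set. Features that require API calls are marked with ⚠️.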

In [None]:
def running_in_colab():
    return 'google.colab' in sys.modules

if running_in_colab():
    print("📦 Running in Colab...")

    # 1. Install system dependencies
    !sudo apt-get update -qq
    !sudo apt-get install -y swi-prolog

    # 2. Clone repo and change into it
    project_dir = Path("/content/GAMA")
    if not os.path.exists(project_dir):
        !git clone --depth 1 https://github.com/dicelab-rhul/GAMA.git /content/GAMA
    %cd /content/GAMA/tutorial

    # 3. Install Python dependencies (editable mode)
    !pip install -e .

    # 4. Make the repository root importable
    sys.path.append(os.path.abspath(".."))

    # 5. Ask user for OpenAI key
    import getpass
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OpenAI API key:")
else:
    print("💻 Running locally.")

    project_root = Path().resolve().parent
    sys.path.insert(0, str(project_root))

    if "OPENAI_API_KEY" not in os.environ:
        raise EnvironmentError("Please set your OPENAI_API_KEY in the environment.")

Load the required modules and configuration data from a config file.

In [None]:
import configparser
import itertools
import logging
from magif.agent.agent import Agent
from magif.environment.agent_pool import AgentPool
from magif.environment.environment import Environment
from magif.utils.data_object import DataObject
from magif.utils.utils import AgentStatus, Mode, read_file, normalize_path, generate_agent_name
from magif.utils.setup_logger import logger
from magif.utils.validator import Validator

In [None]:
logging.debug('Tutorial')
config = configparser.ConfigParser()
config.read(normalize_path("tutorial/CONFIG/tutorial_config.ini"))
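
For orientation, the sketch below shows how `configparser` resolves the `config.get("Paths", ...)` calls used throughout this tutorial. The section name `Paths` and the keys `AGENT_JSON` and `OUT_DIR` come from the cells in this notebook; the path values are made-up placeholders, not the contents of the real `tutorial_config.ini`.

```python
import configparser

# Hypothetical config contents; the real tutorial_config.ini may differ.
SAMPLE_INI = """
[Paths]
AGENT_JSON = path/to/agent.json
OUT_DIR = path/to/logs
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(SAMPLE_INI)

# Option names are case-insensitive by default, so both lookups succeed.
agent_json_value = sample_config.get("Paths", "AGENT_JSON")
out_dir_value = sample_config.get("Paths", "out_dir")
print(agent_json_value, out_dir_value)

# ConfigParser.read() silently skips files it cannot find, so a missing
# section is often the first sign of a wrong config path.
print(sample_config.has_section("Paths"))
```

If a later `config.get` call raises `NoSectionError`, checking `config.has_section("Paths")` is a quick way to confirm whether the config file was actually found.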

## 2. 🤖 The Agent Model

An agent comprises an autoformalization module utilizing an LLM, a mind containing the formal representation of a game and a strategy, and a memory that stores the history of moves.

<p align="center">
  <img src="DATA/assets/agent_model.png" width="400">
</p>

Both the game and the strategy can be either defined using pre-existing rules or autoformalized from an interaction scenario.

### 🚀 Let's interact with our first agent!

We load a predefined Prisoner's Dilemma (PD) agent:

In [None]:
agent_json = normalize_path(config.get("Paths", "AGENT_JSON")) # get the path to the agent from a json file
agent = Agent(agent_json=agent_json, autoformalization_on=False) # create an agent by providing a path to a json file

Let's inspect the agent by calling the `describe` method:

In [None]:
agent.describe()

We can print the agent’s formal game representation:

In [None]:
agent.print_game()

Print only the payoffs in the game:

In [None]:
agent.print_game(payoffs_only=True)

Print the agent's strategy representation:

In [None]:
agent.print_strategy()

---
An agent acts (performs an action), observes the opponent’s move, and updates its state. These steps are handled by the `mind`. Let’s have the agent act first:

In [None]:
agent.mind.act()

The agent selected `silent`, which is its default action, performed in the absence of any opponent actions in memory. Now it's our turn to act:

In [None]:
our_move = 'confess'
agent.mind.observe(our_move)

After observing the opponent's move, the agent updates its state using the `think` method:

In [None]:
agent.mind.think()

**Note:** In this example, the agent acted first, followed by our action. However, in the standard Prisoner's Dilemma (PD), both players are assumed to act simultaneously and independently. This is how agents take actions in the tournament implementation (see below).

Let's see what action our agent is going to take now:

In [None]:
agent.mind.act()

Following the tit-for-tat strategy, the agent responded by replicating our previous action and chose to confess. Let’s now stay silent and observe the resulting payoff:

In [None]:
agent.mind.observe('silent')
agent.mind.think()
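
The behaviour observed in this walkthrough (default move first, then mirroring) can be sketched in plain Python. The class `ToyTitForTat` below is a hypothetical stand-in written for illustration; it is not how MAGiF implements the agent's mind, which reasons over Prolog rules.

```python
class ToyTitForTat:
    """Toy stand-in for the act/observe cycle; not the MAGiF implementation."""

    def __init__(self, default_move):
        self.default_move = default_move
        self.opponent_history = []  # memory of observed opponent moves

    def act(self):
        # With an empty memory, fall back to the default move;
        # otherwise mirror the opponent's most recent move.
        if not self.opponent_history:
            return self.default_move
        return self.opponent_history[-1]

    def observe(self, opponent_move):
        # Store the opponent's move so the next act() can react to it.
        self.opponent_history.append(opponent_move)


toy = ToyTitForTat(default_move="silent")
first = toy.act()        # no history yet
toy.observe("confess")
second = toy.act()       # mirrors "confess"
toy.observe("silent")
third = toy.act()        # mirrors "silent"
print(first, second, third)
```

The sequence of moves follows the same pattern as the interaction above: the default move, then a copy of whatever the opponent did last.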

Finally, let's release the solver thread and remove the agent.

In [None]:
agent.release_solver()
del agent

### 🚀 Let's autoformalize! (⚠️ this section requires an API key)

The core idea behind the framework is to streamline the development of multi-agent simulations by autoformalizing natural language interaction descriptions into formal representations that can be used by an agent's mind to reason about the interaction. Let's test this idea in practice. For this exercise, we will use a real-life scenario that can be modeled as a Prisoner's Dilemma (PD).

In [None]:
game_description = "Two employees are working on a joint project and must decide whether to share all their innovative ideas or keep some to themselves for credit. If both share openly, the project flourishes and they achieve great results, earning joint recognition. If one shares while the other withholds, the sharer contributes more but feels exploited, while the withholder benefits more and gains more recognition. If neither shares openly, the project suffers, and they both receive mediocre evaluations."
print(game_description)

---
First, we will load the prompt template we'll use to formalize this game:

In [None]:
prompt_template_path = normalize_path(config.get("Paths", "GAME_TEMPLATE_PATH"))
prompt_template = read_file(prompt_template_path)
print(prompt_template)

---
The template uses one-shot prompting: it includes the natural language description of the Prisoner's Dilemma (PD) and its Prolog representation as a worked example. Now, we will add our previously defined interaction scenario to the prompt:

In [None]:
prompt = prompt_template.format(game_description=game_description)
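
As a small illustration of what `str.format` does here, the sketch below fills a made-up template. The template text is hypothetical (the real one is read from `GAME_TEMPLATE_PATH`); the only placeholder name taken from this tutorial is `game_description`.

```python
# Hypothetical template text; the real template is read from GAME_TEMPLATE_PATH.
toy_template = (
    "Translate the game below into Prolog.\n"
    "Literal braces such as {{a, b}} must be doubled to survive formatting.\n"
    "Game: {game_description}\n"
)

# The named placeholder is replaced; doubled braces collapse to single ones.
toy_prompt = toy_template.format(game_description="Two players choose a or b.")
print(toy_prompt)
```

The doubled braces matter whenever a template contains literal `{` or `}` characters, since `str.format` would otherwise treat them as placeholders.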

We will also need a feedback prompt template in case there are syntactic errors in the generated code:

In [None]:
feedback_template_path = normalize_path(config.get("Paths", "FEEDBACK_TEMPLATE_PATH"))
feedback_prompt = read_file(feedback_template_path)
print(feedback_prompt)

---
Now we can create a `DataObject` that specifies the game representation should be autoformalized from a natural language description using the provided prompt.

In [None]:
game_data = DataObject(
    nl_description=game_description,
    instruction_prompt=prompt,
    feedback_prompt=feedback_prompt,
    mode=Mode.AUTOFORMALIZATION,
)

The agent’s strategy will be loaded from a predefined file:

In [None]:
tit_for_tat_path = normalize_path(config.get("Paths", "STRATEGY_PATH"))
strategy_data = DataObject(rules_path=tit_for_tat_path, mode=Mode.RULES_PATH)

Now we create an agent by providing the game data (a natural language description to autoformalize), the strategy data (predefined rules for tit-for-tat), and the maximum number of attempts to generate syntactically correct code.

In [None]:
agent = Agent(game_data, strategy_data, max_attempts=5)
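
Conceptually, `max_attempts` bounds a generate-check-retry loop in which syntax errors are fed back to the model. The sketch below shows one way such a loop could look. Everything in it is an assumption made for illustration: `formalize_with_retries`, `fake_generate`, and `fake_check_syntax` are hypothetical names, the LLM is replaced by canned replies, and the syntax check is a crude stand-in for a real Prolog check. It does not reproduce MAGiF's internals.

```python
def formalize_with_retries(generate, check_syntax, prompt, feedback_template, max_attempts):
    """Return (code, attempts_used), or (None, max_attempts) if every attempt fails."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        candidate = generate(current_prompt)
        error = check_syntax(candidate)
        if error is None:
            return candidate, attempt
        # Feed the error back so the next attempt can try to repair it.
        current_prompt = feedback_template.format(code=candidate, error=error)
    return None, max_attempts


# Stand-ins for the LLM and the syntax check (both hypothetical).
canned_replies = iter(["move(silent", "move(silent)."])

def fake_generate(_prompt):
    return next(canned_replies)

def fake_check_syntax(candidate):
    if candidate.count("(") != candidate.count(")"):
        return "unbalanced parentheses"
    if not candidate.rstrip().endswith("."):
        return "clause does not end with a period"
    return None


final_code, attempts_used = formalize_with_retries(
    fake_generate,
    fake_check_syntax,
    prompt="Formalize the game.",
    feedback_template="Fix this code:\n{code}\nError: {error}",
    max_attempts=5,
)
print(final_code, attempts_used)
```

In this toy run the first reply fails the check and the second passes, so the loop stops after two of the five allowed attempts.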

Let's check the agent's status:

In [None]:
agent.status

If it's syntactically correct, we can try interacting with it:

In [None]:
if agent.status == AgentStatus.CORRECT:
    moves = agent.game.game_moves
    
    agent.mind.act()
    agent.mind.observe(moves[0])
    agent.mind.think()

We can also inspect the payoff matrix and compare it with the original description:

In [None]:
if agent.status == AgentStatus.CORRECT:
    print(game_description, end="\n\n")
    agent.print_game(payoffs_only=True)

---
You may notice that manually comparing the interaction description to the generated payoff matrix is not the most effective way to evaluate the results of autoformalization. This issue is addressed in the next section.

## 3. 🏆 The Tournament

Upon creation, the agent is validated for syntactic correctness. This, however, does not guarantee runtime correctness—for example, parts of the code necessary for selecting an action may be missing. Semantic correctness—that is, whether the formal representation of the game corresponds to its natural language description—is also undetermined. The simulation allows for runtime validation, as well as semantic validation if the target payoffs for the given game are known:

<p align="center">
  <img src="DATA/assets/gama.png" width="600">
</p>

### 🚀 Let's play with autoformalized agents! (⚠️ this section requires an API key)

We will autoformalize and validate an interaction scenario corresponding to the Battle of the Sexes.

In [None]:
bos_description = "Two roommates are deciding whether to watch a comedy or an action movie. One prefers comedies, while the other loves action films. However, they both value watching a movie together more than watching alone. If they agree on a comedy, the comedy enthusiast gets 2 points, and the action lover gets 1 point. If they agree on an action movie, the action lover gets 2 points, and the comedy enthusiast gets 1 point. If they choose different genres, neither watches a movie, and they both score 0 points."
print(bos_description)

---
Let's set the parameters for the agent and the tournament.

In [None]:
# Parameters of the tournament
logdir = normalize_path(config.get("Paths", "OUT_DIR")) # output logs directory
num_rounds = 4 # number of tournament rounds
agent_num = 5 # number of agents that will try to autoformalize the scenario
target_payoff = 3 # target total payoff for an agent after 4 rounds of tit-for-tat vs anti-tit-for-tat

# Parameter of an agent
max_attempts = 5

Now we create an `AgentPool` object and populate it with agents that have autoformalized the interaction description.

In [None]:
agent_pool = AgentPool() # agent pool object to manage agents participating in the tournament

for i in range(agent_num):
    prompt = prompt_template.format(game_description=bos_description)
    
    game_data = DataObject(
        nl_description=bos_description,
        instruction_prompt=prompt,
        feedback_prompt=feedback_prompt,
        mode=Mode.AUTOFORMALIZATION,
    )
    strategy_data = DataObject(rules_path=tit_for_tat_path, mode=Mode.RULES_PATH)
    
    agent = Agent(game_data, strategy_data, max_attempts=max_attempts)
    agent_pool.add_agent(agent)    

To validate autoformalization, agents that generated syntactically valid code will play a tournament against their clones using the anti-tit-for-tat strategy. This setup ensures that all combinations of actions in the game are tested.

In [None]:
valid_agents_num = len(agent_pool.valid_agents) # determine the number of syntactically valid agents

# Add a clone of each valid agent that uses the anti-tit-for-tat strategy
for i in range(valid_agents_num):
    agent = agent_pool.valid_agents[i]
    clone = agent.clone(agent_json)

    agent_pool.add_agent(clone)
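
To see why this pairing tests every combination of actions, and where the target payoff of 3 comes from, the plain-Python sketch below simulates four rounds of tit-for-tat against anti-tit-for-tat. The payoffs are those stated in the Battle of the Sexes description, with player 1 taken to be the comedy enthusiast; the assumption that both players open with the same default move, and the helper names `opposite` and `play`, are made for illustration.

```python
MOVES = ("comedy", "action")

# Payoffs (player 1, player 2) from the scenario description,
# with player 1 as the comedy enthusiast.
PAYOFFS = {
    ("comedy", "comedy"): (2, 1),
    ("action", "action"): (1, 2),
    ("comedy", "action"): (0, 0),
    ("action", "comedy"): (0, 0),
}

def opposite(move):
    return MOVES[1] if move == MOVES[0] else MOVES[0]

def play(num_rounds, default_move):
    history, totals = [], [0, 0]
    for _ in range(num_rounds):
        if not history:
            m1 = m2 = default_move      # assumption: both open with the default move
        else:
            prev1, prev2 = history[-1]
            m1 = prev2                  # tit-for-tat mirrors the opponent
            m2 = opposite(prev1)        # anti-tit-for-tat does the opposite
        history.append((m1, m2))
        p1, p2 = PAYOFFS[(m1, m2)]
        totals[0] += p1
        totals[1] += p2
    return history, totals

history, totals = play(num_rounds=4, default_move="comedy")
print(history)
print(totals)
```

Over four rounds the pair visits each of the four move combinations exactly once, so each player collects 2 + 1 + 0 + 0 = 3 points, which matches `target_payoff`.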

Finally, we need to define a matchmaker function that determines the matching protocol for the tournament.

In [None]:
match_maker = lambda agents: [(agents[i], agents[i+valid_agents_num]) for i in range(valid_agents_num)] #each valid agent is paired with its clone
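
The effect of this matchmaker is easiest to see on a toy list. The sketch below assumes that the list passed to the matchmaker holds the valid originals first, followed by their clones in the same order; the string names and `toy_` variables are placeholders standing in for agent objects.

```python
toy_valid_num = 3

# Originals come first, followed by their clones in the same order (assumed).
toy_agents = ["agent_0", "agent_1", "agent_2", "clone_0", "clone_1", "clone_2"]

# Same indexing scheme as the tutorial's matchmaker: item i meets item i + n.
toy_match_maker = lambda agents: [
    (agents[i], agents[i + toy_valid_num]) for i in range(toy_valid_num)
]

toy_pairs = toy_match_maker(toy_agents)
print(toy_pairs)
```

Each original is matched only with its own clone, so every pair plays on the same autoformalized game rules.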

We create an `Environment` instance to manage the tournament.

In [None]:
tournament = Environment(
    agent_pool=agent_pool,
    num_rounds=num_rounds,
    match_maker=match_maker,
    target_payoffs=valid_agents_num * [target_payoff],
)

We run the tournament and retrieve the winners.

In [None]:
tournament.play_tournament()

In [None]:
agent_pool.truncate_pool(valid_agents_num)
tournament.get_winners()

Now we log the results to validate the semantic correctness of the code.

In [None]:
tournament.log_tournament(experiment_dir=logdir, tournament_name="bs_test_tournament")

To validate semantic correctness, we use a custom `Validator` class.

In [None]:
agents_directory = logdir # directory with json files of agents created during the experiment
matrices_filepath = normalize_path("tutorial/DATA/matrices.json") # target payoff matrices
validators_dir = normalize_path("DATA/EVAL") # validation of payoff matrix structure if target payoffs are not available

validator = Validator(agents_directory, matrices_filepath, validators_dir)

Let's run the validation and see the results.

In [None]:
df = validator.validate_all()
df
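
The idea behind validating against known target payoffs can be illustrated with a plain dictionary comparison. This is only a sketch of the concept, not how `Validator` is implemented: the function name `payoff_mismatches` and the dictionary layout are assumptions made for illustration, and the target payoffs are those stated in the Battle of the Sexes description.

```python
def payoff_mismatches(generated, target):
    """Map each move pair whose payoffs differ to (generated, expected)."""
    return {
        moves: (generated.get(moves), expected)
        for moves, expected in target.items()
        if generated.get(moves) != expected
    }


target_matrix = {
    ("comedy", "comedy"): (2, 1),
    ("action", "action"): (1, 2),
    ("comedy", "action"): (0, 0),
    ("action", "comedy"): (0, 0),
}

# A hypothetical autoformalized matrix with one swapped entry.
generated_matrix = dict(target_matrix)
generated_matrix[("action", "action")] = (2, 1)

mismatches = payoff_mismatches(generated_matrix, target_matrix)
print(mismatches)
```

An empty result would mean the generated matrix agrees with the target on every move pair; here the single swapped entry is reported.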

At the end, we clean up the `AgentPool` object.

In [None]:
agent_pool.clean_agents()

### 🚀 Let's play strategies!

Once we have validated the semantic correctness of the generated game representation, we can compare which strategy is, on average, the most successful for the game. We will use a valid agent generated previously for the above Battle-of-the-Sexes-like scenario.

In [None]:
agent_json_path = normalize_path(config.get("Paths", "STRATEGY_AGENT_JSON"))

We will utilize the following strategies:

| **Strategy**           | **Description**                                                                 |
|------------------------|---------------------------------------------------------------------------------|
| *anti-default-move*    | Always select the move that is the opposite of the default move.                |
| *anti-tit-for-tat*     | Start with a default move. Then, select the move that is the opposite of the opponent's move in the previous round. |
| *best-response*        | Start with a default move. Then, select a move that would give you the highest payoff in response to the opponent's move in the previous round. |
| *default-move*         | Always select the default move.                                                 |
| *random*               | Select one of the possible moves with uniform probability.                      |
| *tit-for-tat*          | Start with a default move. Then, mirror the opponent's move in the previous round. |
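
Of the strategies in the table, *best-response* is the least obvious, so here is a plain-Python sketch of it. The payoffs are those from the Battle of the Sexes description; the function name `best_response`, its parameters, and the tie-breaking rule (first move listed wins) are assumptions for illustration, since the actual strategies are defined in the files under `STRATEGIES_PATH`.

```python
BOS_MOVES = ("comedy", "action")

# Payoffs (player 1, player 2), with player 1 as the comedy enthusiast.
BOS_PAYOFFS = {
    ("comedy", "comedy"): (2, 1),
    ("action", "action"): (1, 2),
    ("comedy", "action"): (0, 0),
    ("action", "comedy"): (0, 0),
}

def best_response(opponent_last_move, player_index, default_move):
    """Pick the move with the highest own payoff against the opponent's last move."""
    if opponent_last_move is None:
        return default_move  # nothing observed yet

    def own_payoff(my_move):
        # Order the pair as (player 1 move, player 2 move) before the lookup.
        if player_index == 0:
            pair = (my_move, opponent_last_move)
        else:
            pair = (opponent_last_move, my_move)
        return BOS_PAYOFFS[pair][player_index]

    # max() keeps the first move listed when payoffs tie.
    return max(BOS_MOVES, key=own_payoff)


opening = best_response(None, player_index=0, default_move="comedy")
reply_p1 = best_response("action", player_index=0, default_move="comedy")
reply_p2 = best_response("comedy", player_index=1, default_move="action")
print(opening, reply_p1, reply_p2)
```

In this game, the best response is always to match the opponent's previous move, because any mismatch pays 0 to both players.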


In [None]:
strategies_path = normalize_path(config.get("Paths", "STRATEGIES_PATH"))

For each strategy, we add to the pool a clone of the agent that keeps the formalized game rules and uses that strategy.

In [None]:
strategies = [os.path.join(strategies_path, strat_name) for strat_name in os.listdir(strategies_path)]
agent_pool = AgentPool()
agent = Agent(agent_json=agent_json_path, autoformalization_on=False) # agent with formalized game rules

for strategy_path in strategies:
    clone = agent.clone(agent_json_path, strategy_path)
    agent_pool.add_agent(clone)

agent.release_solver()
del agent

We set the parameters of the tournament. This time, the tournament is round-robin, as we want to test each strategy against every other strategy, including itself.

In [None]:
num_rounds = 10
match_maker = lambda agents: list(itertools.combinations_with_replacement(agents, 2))
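
To make the round-robin pairing concrete, the sketch below applies `itertools.combinations_with_replacement` to the six strategy names from the table above. Strings stand in for agent objects here, which is an assumption made purely for illustration.

```python
import itertools

toy_strategies = [
    "anti-default-move",
    "anti-tit-for-tat",
    "best-response",
    "default-move",
    "random",
    "tit-for-tat",
]

# Every unordered pair, including each strategy paired with itself.
pairings = list(itertools.combinations_with_replacement(toy_strategies, 2))
self_play = [pair for pair in pairings if pair[0] == pair[1]]

# n strategies yield n * (n + 1) / 2 pairings.
print(len(pairings), len(self_play))
```

With six strategies this gives 21 matches, six of which are self-play. Plain `itertools.combinations` would drop the self-play matches and leave 15.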

In this case, target payoffs are not specified.

In [None]:
tournament = Environment(
    agent_pool=agent_pool,
    num_rounds=num_rounds,
    match_maker=match_maker,
)

In [None]:
tournament.play_tournament()

In [None]:
winners = tournament.get_winners()

In [None]:
print("Winners are:")
for winner in winners:
    print(f"Agent {winner.name} with strategy {winner.strategy_name} and payoff {winner.mind.get_total_payoff()}")

In [None]:
agent_pool.clean_agents()

## 4. 👥 Case Studies (⚠️ this section requires an API key)

We will create a function using the code from the examples above to autoformalize case studies.

In [None]:
def case_study(game_description, num_rounds=4):
    # Get prompt templates
    prompt_template_path = normalize_path(config.get("Paths", "GAME_TEMPLATE_PATH"))
    prompt_template = read_file(prompt_template_path)
    prompt = prompt_template.format(game_description=game_description)
    feedback_template_path = normalize_path(config.get("Paths", "FEEDBACK_TEMPLATE_PATH"))
    feedback_prompt = read_file(feedback_template_path)
    agent_json_path = normalize_path(config.get("Paths", "STRATEGY_AGENT_JSON"))

    # Game and strategy data
    game_data = DataObject(
        nl_description=game_description,
        instruction_prompt=prompt,
        feedback_prompt=feedback_prompt,
        mode=Mode.AUTOFORMALIZATION,
    )

    tit_for_tat_path = normalize_path(config.get("Paths", "STRATEGY_PATH"))
    strategy_data = DataObject(rules_path=tit_for_tat_path, mode=Mode.RULES_PATH)

    # Create agent pool
    agent_pool = AgentPool()
    
    # Create an agent
    agent = Agent(game_data, strategy_data, max_attempts=5)

    if agent.status != AgentStatus.CORRECT:
        print("No syntactically correct agent created, try again!")

    else:
        agent_pool.add_agent(agent) # add an agent to the pool

        # Create and add clone:
        agents_num = len(agent_pool.valid_agents)
        clone = agent.clone(agent_json_path)
        agent_pool.add_agent(clone)

        match_maker = lambda agents: [(agents[i], agents[i+agents_num]) for i in range(agents_num)]

        tournament = Environment(
            agent_pool=agent_pool,
            num_rounds=num_rounds,
            match_maker=match_maker,
        )

        tournament.play_tournament()
        
        if agent.status != AgentStatus.CORRECT:
            print(f"Agent status is {agent.status}")

        else:
            agent.describe()
            agent.print_game(payoffs_only=True)

        agent_pool.clean_agents()

### Case study: The Cuban Missile Crisis

<p align="center">
  <img src="https://upload.wikimedia.org/wikipedia/commons/9/96/Soviet_b-59_submarine.jpg" width="400">
</p>

[Source](https://en.wikipedia.org/wiki/Cuban_Missile_Crisis)

In [None]:
nuclear_crisis = "The Cuban Missile Crisis (...) was a 13-day confrontation between the governments of the United States and the Soviet Union, when American deployments of nuclear missiles in Italy and Turkey were matched by Soviet deployments of nuclear missiles in Cuba. The crisis lasted from 16 to 28 October 1962. The confrontation is widely considered the closest the Cold War came to escalating into full-scale nuclear war."
nuclear_crisis

In [None]:
case_study(nuclear_crisis)

### Case study: Autonomous cars

<p align="center">
  <img src="DATA/assets/cars.png" width="400">
</p>

In [None]:
cars = "Two autonomous cars, Car A and Car B, arrive at a four-way intersection at the same time. There is no traffic light or clear priority rule (e.g., right-of-way is ambiguous or not perfectly synchronized). Each car must decide between two strategies: proceed through the intersection without yielding or yield and let the other car go first. If both cars proceed simultaneously, a collision will occur; if both yield, a delay results."
cars

In [None]:
case_study(cars)