### Note

Make sure CMake is installed on your machine before installing `multi_agent_ale_py`.

This notebook might not work on Windows machines, since `multi_agent_ale_py` is not officially supported on Windows, but you can still give it a go.

In [1]:
from IPython.display import clear_output, Video

In [2]:
%pip install stable-baselines3[extra] pettingzoo
%pip install multi_agent_ale_py
%pip install supersuit

clear_output()

# Contents

In this notebook, you will train a multi-agent RL policy using Stable-Baselines3.

Make sure to include a visualization that shows the performance of your trained policy.

In multi-agent RL, multiple agents interact with each other in a shared environment, either cooperatively or competitively, and we train one or more models/policies to choose the actions of each agent.

## PettingZoo

In this notebook, you will use the PettingZoo library, an open-source framework for multi-agent reinforcement learning (MARL). It is inspired by Gym and likewise offers a standardized interface across its environments. The PettingZoo documentation, including the list of available environments, is [here](https://pettingzoo.farama.org/index.html).
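To make the idea of a standardized multi-agent interface concrete, here is a minimal sketch of a parallel-style episode loop, where every live agent submits an action at once and the environment answers with per-agent dictionaries. `ToyParallelEnv`, its scoring rule, and `run_random_episode` are hypothetical stand-ins written for illustration only; they mimic the shape of the `reset`/`step` calls and are not PettingZoo code.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in that only mimics the shape of a parallel-style API."""

    def __init__(self, max_cycles=3):
        self.possible_agents = ["archer_0", "knight_0"]
        self.max_cycles = max_cycles
        self.agents = []
        self.cycle = 0

    def reset(self, seed=None):
        # seed is accepted for interface parity; this toy env is deterministic.
        self.agents = list(self.possible_agents)
        self.cycle = 0
        observations = {agent: [0.0] for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, infos

    def step(self, actions):
        self.cycle += 1
        out_of_time = self.cycle >= self.max_cycles
        acting = list(self.agents)
        observations = {agent: [float(self.cycle)] for agent in acting}
        # Toy rule (an assumption for this sketch): choosing action 1 scores a point.
        rewards = {agent: float(actions[agent] == 1) for agent in acting}
        terminations = {agent: False for agent in acting}
        truncations = {agent: out_of_time for agent in acting}
        infos = {agent: {} for agent in acting}
        if out_of_time:
            self.agents = []  # no live agents left, so the episode is over
        return observations, rewards, terminations, truncations, infos


def run_random_episode(env, seed=0):
    """Play one episode with random actions and return each agent's total reward."""
    rng = random.Random(seed)
    observations, infos = env.reset(seed=seed)
    totals = {agent: 0.0 for agent in env.possible_agents}
    while env.agents:
        # One action per live agent, keyed by agent name.
        actions = {agent: rng.choice([0, 1]) for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] += reward
    return totals


totals = run_random_episode(ToyParallelEnv(max_cycles=3))
```

The same loop structure should carry over to a real parallel environment, with the random action choice replaced by a policy.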

## [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/)

The environment you'll use is the Knights Archers Zombies (KAZ) environment, a 2D game. Here's how it works:

Zombies walk from the top border of the screen down to the bottom border in unpredictable paths. The agents you control are knights and archers (default 2 knights and 2 archers) that are initially positioned at the bottom border of the screen. Each agent can rotate clockwise or counter-clockwise and move forward or backward. Each agent can also attack to kill zombies. When a knight attacks, it swings a mace in an arc in front of its current heading direction. When an archer attacks, it fires an arrow in a straight line in the direction of the archer’s heading. The game ends when all agents die (collide with a zombie) or a zombie reaches the bottom screen border. A knight is rewarded 1 point when its mace hits and kills a zombie. An archer is rewarded 1 point when one of their arrows hits and kills a zombie. There are two possible observation types for this environment, vectorized and image-based.

Hint: Use the vectorized observations because it's faster to train the model that way. It's recommended to use the PPO algorithm but you can use anything else if you want.

In [3]:
from __future__ import annotations

import glob
import os
import time

import supersuit as ss

from pettingzoo.butterfly import knights_archers_zombies_v10

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from IPython.display import HTML

# Add SB3 imports as needed

## Creating the environment

In [4]:
env_fn = knights_archers_zombies_v10
env_kwargs = dict(max_cycles=256, max_zombies=4, vector_state=True)

env = env_fn.parallel_env(**env_kwargs)  # parallel env so we can create a vector env and parallelize the training.

# Hint:
# Use SuperSuit to convert the env into a vectorized env so that one policy
# can be shared across all agents (parameter sharing). This requires the
# black_death wrapper, because agents can die mid-episode.

# Then convert the env into the SB3 vectorized environment format.
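Why the hint asks for a black-death wrapper: a vectorized environment expects the same number of agent slots on every step, but KAZ agents can die mid-episode. The sketch below illustrates the idea in plain Python, assuming a dead agent's slot is simply filled with zeros while one shared policy acts on every slot. `pad_dead_agents` and `shared_policy` are hypothetical helpers for illustration, not SuperSuit or SB3 functions.

```python
def pad_dead_agents(observations, possible_agents, obs_size):
    """Return one observation per possible agent, zero-filled for agents that are gone."""
    zeros = [0.0] * obs_size
    return [list(observations.get(agent, zeros)) for agent in possible_agents]


def shared_policy(observation):
    """Hypothetical stand-in for a single network whose parameters all agents share."""
    return 1 if sum(observation) > 0 else 0


possible_agents = ["archer_0", "archer_1", "knight_0", "knight_1"]

# Suppose archer_1 and knight_0 have already died: only two observations remain.
live_observations = {"archer_0": [0.5, 0.1], "knight_1": [-0.2, -0.3]}

# The batch keeps a fixed size of four, so it can be stacked like a vector env.
batch = pad_dead_agents(live_observations, possible_agents, obs_size=2)

# The same policy is applied to every slot, which is what parameter sharing means here.
actions = [shared_policy(obs) for obs in batch]
```

Because every slot goes through the same policy, each agent's experience contributes to training one model instead of four separate ones.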



## Training the policy

Declare and train your policy here.

## Visualization

Helper functions and code for creating a visualization are provided below, but feel free to write your own.

In [8]:
def frames_to_video(frames, fps=24):
    """Render a list of image frames as an inline HTML5 video."""
    height, width = frames[0].shape[:2]
    fig = plt.figure(figsize=(width / 100, height / 100), dpi=100)
    ax = plt.axes()
    ax.set_axis_off()

    # Grayscale frames are 2-D arrays. The colormap is chosen once here,
    # because set_data() only takes the image array and no cmap argument.
    if frames[0].ndim == 2:
        im = ax.imshow(frames[0], cmap='gray')
    else:  # Color image
        im = ax.imshow(frames[0])

    def init():
        im.set_data(frames[0])
        return im,

    def update(frame):
        im.set_data(frames[frame])
        return im,

    interval = 1000 / fps  # delay between frames in milliseconds
    anim = FuncAnimation(fig, update, frames=len(frames), init_func=init, blit=True, interval=interval)
    plt.close(fig)  # avoid also showing the static figure in the notebook
    return HTML(anim.to_html5_video())

In [9]:
t_env = env_fn.env(render_mode="rgb_array", **env_kwargs)

In [10]:
t_env.reset()
frames = []
running = True

while running:
    for agent in t_env.agent_iter():
        obs, reward, termination, truncation, info = t_env.last()

        # Stop recording as soon as any agent is terminated or truncated.
        if termination or truncation:
            running = False
            break

        # Replace this random action with one generated by your own policy.
        act = t_env.action_space(agent).sample()

        t_env.step(act)
        frames.append(t_env.render())
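One possible way to swap in your own policy is to wrap the rollout in a helper that takes the action choice as a callback. The sketch below is a variant of the loop above, not a replacement for it: instead of stopping at the first finished agent, it steps finished agents with `None` and keeps recording until every agent is done. `collect_frames` and `choose_action` are hypothetical names introduced here.

```python
def collect_frames(aec_env, choose_action):
    """Roll out one episode on an agent-iteration env and return the rendered frames.

    choose_action is a hypothetical callback: (agent_name, observation) -> action.
    """
    aec_env.reset()
    frames = []
    for agent in aec_env.agent_iter():
        obs, reward, termination, truncation, info = aec_env.last()
        # A finished agent takes no real action; it is stepped with None instead.
        if termination or truncation:
            action = None
        else:
            action = choose_action(agent, obs)
        aec_env.step(action)
        frames.append(aec_env.render())
    return frames
```

With a trained SB3 model stored in a variable such as `model` (an assumed name), a call might look like `collect_frames(t_env, lambda agent, obs: model.predict(obs, deterministic=True)[0])`. Random play is `collect_frames(t_env, lambda agent, obs: t_env.action_space(agent).sample())`.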

In [None]:
frames_to_video(frames, 60)

In [None]:
t_env.close()