# SuperSuit Test
This notebook tests the functionality and capabilities of SuperSuit, a library of small environment wrappers developed by the Farama Foundation for reinforcement learning. More information can be found here: ```https://github.com/Farama-Foundation/SuperSuit```

The motivation for using this library was this article from Towards Data Science: ```https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b```

In the article, this package was used in conjunction with stable_baselines3 to train agents in a multiagent style, where the policy outputs the action for a single agent, but the same policy is shared by all agents.
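To make the shared-policy idea concrete, here is a minimal sketch in plain Python. The names `shared_policy` and `act_for_all` are hypothetical, and the "policy" is a toy function rather than the network trained in the article; the point is only that one function is called once per agent.

```python
from typing import Dict, List


def shared_policy(observation: List[float]) -> float:
    """Toy stand-in for a learned policy: one agent's observation in, one action out."""
    mean = sum(observation) / len(observation)
    # Clip to [-1, 1], the range of the continuous action spaces printed in this notebook.
    return max(-1.0, min(1.0, mean))


def act_for_all(observations: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply the SAME policy to every agent's own observation."""
    return {agent: shared_policy(obs) for agent, obs in observations.items()}


# Hypothetical, hand-written observations (real ones are images).
observations = {
    "piston_0": [0.2, 0.4],
    "piston_1": [-3.0, -1.0],
    "piston_2": [0.0, 0.0],
}
actions = act_for_all(observations)
print(actions)
```

Each agent still receives its own action, but there is only one set of policy parameters to train.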

To test whether this library will work for our intended purpose, I must first check how PettingZoo (the multiagent RL library this package was developed for) provides observations and actions. PettingZoo offers a new generation of reinforcement learning environments for testing new algorithms. More information about PettingZoo can be found here: ```https://www.pettingzoo.ml/#```

In [19]:
from pettingzoo.butterfly import pistonball_v4
import numpy as np
import supersuit as ss

env = pistonball_v4.parallel_env()

In [2]:
env.reset()
print('Number of agents in environment: ',env.num_agents)
print()
print('All agents:', env.agents)
print()
print('Action spaces:', env.action_spaces)

# Collecting all observation spaces
obs_spaces = []
for agent in env.agents:
    obs_spaces.append(env.observation_space(agent))

Number of agents in environment:  20

All agents: ['piston_0', 'piston_1', 'piston_2', 'piston_3', 'piston_4', 'piston_5', 'piston_6', 'piston_7', 'piston_8', 'piston_9', 'piston_10', 'piston_11', 'piston_12', 'piston_13', 'piston_14', 'piston_15', 'piston_16', 'piston_17', 'piston_18', 'piston_19']

Action spaces: {'piston_0': Box([-1.], [1.], (1,), float32), 'piston_1': Box([-1.], [1.], (1,), float32), 'piston_2': Box([-1.], [1.], (1,), float32), 'piston_3': Box([-1.], [1.], (1,), float32), 'piston_4': Box([-1.], [1.], (1,), float32), 'piston_5': Box([-1.], [1.], (1,), float32), 'piston_6': Box([-1.], [1.], (1,), float32), 'piston_7': Box([-1.], [1.], (1,), float32), 'piston_8': Box([-1.], [1.], (1,), float32), 'piston_9': Box([-1.], [1.], (1,), float32), 'piston_10': Box([-1.], [1.], (1,), float32), 'piston_11': Box([-1.], [1.], (1,), float32), 'piston_12': Box([-1.], [1.], (1,), float32), 'piston_13': Box([-1.], [1.], (1,), float32), 'piston_14': Box([-1.], [1.], (1,), float32), 'p



In [3]:
obs_spaces[2]

Box([[[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 ...

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]
  [0 0 0]
  ...
  [0 0 0]
  [0 0 0]
  [0 0 0]]], [[[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 ...

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
  [255 255 255]
  [255 255 255]
  [255 255 255]]

 [[255 255 255]
  [255 255 255]
  [255 255 255]
  ...
 

Eureka! So SuperSuit WILL work for our environment! As stated by the documentation for the multiagent functionality of SuperSuit:

"...(takes) the following assumptions: no agent death or generation, homogenous action and observation spaces. Returns a gym vector environment where each 'environment' in the vector represents one agent."
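The two assumptions in that quote can be sketched without any RL library. The helpers below are hypothetical, and each space is described by a simple `(shape, dtype)` tuple rather than a real gym space object. They check that all agents share one space, then flatten a per-agent dictionary into an ordered batch, which is roughly the view a vector environment presents.

```python
from typing import Dict, Hashable, List


def spaces_are_homogeneous(spaces: Dict[str, Hashable]) -> bool:
    """True when every agent has the same space description."""
    return len(set(spaces.values())) <= 1


def dict_to_batch(per_agent: Dict[str, object], agent_order: List[str]) -> List[object]:
    """Flatten {agent: value} into a list with a fixed agent order."""
    return [per_agent[agent] for agent in agent_order]


def batch_to_dict(batch: List[object], agent_order: List[str]) -> Dict[str, object]:
    """Inverse of dict_to_batch: hand each batch entry back to its agent."""
    return dict(zip(agent_order, batch))


agents = [f"piston_{i}" for i in range(20)]
# Stand-in for the Box([-1.], [1.], (1,), float32) action spaces printed above.
action_spaces = {agent: ((1,), "float32") for agent in agents}
assert spaces_are_homogeneous(action_spaces)

# Hypothetical per-agent actions, one per piston.
actions = {agent: 0.1 * i for i, agent in enumerate(agents)}
batch = dict_to_batch(actions, agents)
restored = batch_to_dict(batch, agents)
print(len(batch), restored == actions)
```

The fixed agent order is what makes "no agent death or generation" matter: if agents appeared or disappeared, index `i` of the batch would stop referring to the same agent.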

Let's see an example of how this works:


In [None]:
# Code taken from:
# https://github.com/Farama-Foundation/SuperSuit
from stable_baselines3 import PPO

if __name__ == '__main__':
    env = pistonball_v4.parallel_env(n_pistons=20, time_penalty=-0.1, continuous=True, random_drop=True, random_rotate=True, ball_mass=0.75, ball_friction=0.3, ball_elasticity=1.5, max_cycles=125)
    # Keep only the blue channel, then downscale to 84x84 to shrink the observations
    env = ss.color_reduction_v0(env, mode='B')
    env = ss.resize_v0(env, x_size=84, y_size=84)
    # Stack the 3 most recent frames into each observation
    env = ss.frame_stack_v1(env, 3)  # This will also come in very handy soon for our environment!
    # Expose each agent as one 'environment' of a vector env, then concatenate 8 copies
    env = ss.pettingzoo_env_to_vec_env_v1(env)
    env = ss.concat_vec_envs_v1(env, 8, num_cpus=1, base_class='stable_baselines3')
    model = PPO('CnnPolicy', env, verbose=3, gamma=0.95, n_steps=256, ent_coef=0.0905168, learning_rate=0.00062211, vf_coef=0.042202, max_grad_norm=0.9, gae_lambda=0.99, n_epochs=5, clip_range=0.3, batch_size=256)
    model.learn(total_timesteps=2_000)
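It helps to work out how much data these settings imply. The arithmetic below is a sketch under two assumptions: that `concat_vec_envs_v1(env, 8, ...)` yields 8 copies of the 20-agent vector environment, and that stable_baselines3 collects `n_steps` transitions from every sub-environment before each PPO update. The variable names are mine, not part of either library.

```python
import math

n_pistons = 20          # agents per environment copy
num_vec_envs = 8        # copies requested from concat_vec_envs_v1
n_steps = 256           # PPO steps collected per sub-environment per rollout
batch_size = 256        # PPO minibatch size
total_timesteps = 2_000

# Each agent in each copy counts as one sub-environment (assumption stated above).
n_envs = n_pistons * num_vec_envs
rollout_size = n_envs * n_steps
minibatches_per_epoch = rollout_size // batch_size
rollouts_needed = math.ceil(total_timesteps / rollout_size)

print(n_envs, rollout_size, minibatches_per_epoch, rollouts_needed)
```

Under those assumptions a single rollout already exceeds `total_timesteps=2_000`, so this call is a smoke test of the pipeline rather than a meaningful training run.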

I am having issues getting my custom environment set up for training using the methods provided by SuperSuit.

The cell below lets a user see the effects of SuperSuit's modifying wrappers on the observation_space, action_space, and other attributes of the environment after it has been wrapped.

Play with the cell below by commenting out the lines you do not wish to execute, thus investigating the effects of only the wrappers of interest.

In [51]:
env = pistonball_v4.parallel_env(n_pistons=20, time_penalty=-0.1, continuous=True, random_drop=True, random_rotate=True, ball_mass=0.75, ball_friction=0.3, ball_elasticity=1.5, max_cycles=125)
env = ss.color_reduction_v0(env, mode='B')
env = ss.resize_v0(env, x_size=84, y_size=84)
# env = ss.frame_stack_v1(env, 3) # This will also come in very handy soon for our environment!
env = ss.pettingzoo_env_to_vec_env_v1(env)
# env = ss.concat_vec_envs_v1(env, 1, num_cpus=1, base_class='stable_baselines3')

In [52]:
obs = env.reset()
env.observation_space.shape

(84, 84)
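The `(84, 84)` result can be reasoned about by tracking shapes through each wrapper. The functions below are hypothetical shape calculators, not SuperSuit code, and the raw input shape is a made-up placeholder. They assume that single-channel color reduction drops the channel axis, that resizing replaces the height and width, and that stacking 2D frames appends a new last axis.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def color_reduction_shape(shape: Shape) -> Shape:
    """(H, W, 3) -> (H, W): keeping a single channel drops the channel axis."""
    assert len(shape) == 3 and shape[2] == 3, "expected an RGB image shape"
    return shape[:2]


def resize_shape(shape: Shape, x_size: int, y_size: int) -> Shape:
    """Replace the height and width, keeping any trailing axes."""
    return (y_size, x_size) + tuple(shape[2:])


def frame_stack_shape(shape: Shape, stack_size: int) -> Shape:
    """(H, W) -> (H, W, stack_size). Only the 2D case is sketched here (assumption)."""
    assert len(shape) == 2, "this sketch only covers 2D observations"
    return shape + (stack_size,)


raw = (100, 60, 3)  # hypothetical raw RGB observation shape
after_color = color_reduction_shape(raw)
after_resize = resize_shape(after_color, x_size=84, y_size=84)
after_stack = frame_stack_shape(after_resize, 3)
print(after_color, after_resize, after_stack)
```

With the `frame_stack_v1` line commented out, as in the cell above, the pipeline stops at `after_resize`, which matches the printed `(84, 84)`.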

In [13]:
env = pistonball_v4.parallel_env()
obs = env.reset()
obs

{'piston_0': array([[[ 58,  64,  65],
         [ 58,  64,  65],
         [ 58,  64,  65],
         ...,
         [255, 255, 255],
         [255, 255, 255],
         [255, 255, 255]],
 
        [[ 58,  64,  65],
         [ 58,  64,  65],
         [ 58,  64,  65],
         ...,
         [255, 255, 255],
         [255, 255, 255],
         [255, 255, 255]],
 
        [[ 58,  64,  65],
         [ 58,  64,  65],
         [ 58,  64,  65],
         ...,
         [255, 255, 255],
         [255, 255, 255],
         [255, 255, 255]],
 
        ...,
 
        [[ 58,  64,  65],
         [ 58,  64,  65],
         [ 58,  64,  65],
         ...,
         [255, 255, 255],
         [255, 255, 255],
         [255, 255, 255]],
 
        [[ 58,  64,  65],
         [ 58,  64,  65],
         [ 58,  64,  65],
         ...,
         [255, 255, 255],
         [255, 255, 255],
         [255, 255, 255]],
 
        [[ 58,  64,  65],
         [ 58,  64,  65],
         [ 58,  64,  65],
         ...,
         [255, 2