# Tutorial 4: Environment conversions

Cogment Lab supports a few environment conversions out of the box. These are mainly meant to enable human-AI interaction in environments that are not inherently multi-agent.

## Observer conversion

`cogment_lab.envs.conversions.observer`

This conversion comes in two varieties: `GymObserverAEC` and `GymObserverParallel`. They use the PettingZoo AEC and Parallel APIs, respectively.

The difference is that the AEC API steps through the agents one at a time. This makes it a bit slower, but it allows the RL agent's action to be included in the observer's observation. The Parallel API, on the other hand, processes the RL agent's action and the observer's observation at the same time. This makes it faster, but the observer's observation cannot include the RL agent's action.

In this environment conversion, we take a Gymnasium environment, and turn it into a PettingZoo environment with an extra agent, the observer. The observer does not take any actions, but receives the main agent's observations, and optionally the main agent's actions (in the AEC API).
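To make the idea concrete, here is a minimal, dependency-free sketch of a parallel-style observer wrapper. It is not the `cogment_lab` implementation: `ToyEnv` and `ObserverWrapper` are hypothetical names, the toy environment only mimics the Gymnasium `reset`/`step` return shapes, and giving the observer a reward of zero is an assumption. Only the agent names `gym` and `observer` come from this tutorial.

```python
class ToyEnv:
    """Hypothetical stand-in for a single-agent Gymnasium-style environment."""

    def reset(self):
        self.position = 0
        return self.position, {}

    def step(self, action):
        # The action is -1, 0 or +1; the episode ends once position reaches 3.
        self.position += action
        terminated = self.position >= 3
        return self.position, float(terminated), terminated, False, {}


class ObserverWrapper:
    """Exposes a single-agent env as a two-agent env with dict-keyed returns."""

    agents = ["gym", "observer"]

    def __init__(self, env):
        self.env = env

    def reset(self):
        obs, info = self.env.reset()
        # Both agents receive the main agent's observation.
        return {a: obs for a in self.agents}, {a: info for a in self.agents}

    def step(self, actions):
        # Only the main agent's action reaches the wrapped environment;
        # whatever the observer sends is ignored.
        obs, reward, terminated, truncated, info = self.env.step(actions["gym"])
        observations = {a: obs for a in self.agents}
        rewards = {"gym": reward, "observer": 0.0}  # assumption: observer earns nothing
        terminations = {a: terminated for a in self.agents}
        truncations = {a: truncated for a in self.agents}
        infos = {a: info for a in self.agents}
        return observations, rewards, terminations, truncations, infos


env = ObserverWrapper(ToyEnv())
observations, _ = env.reset()
observations, rewards, terminations, _, _ = env.step({"gym": 1, "observer": None})
print(observations, rewards)
```

In this sketch the observer sees the same observation as the main agent but never the main agent's action, which matches the Parallel variant described above. An AEC-style variant could append the action to the observer's observation, since the observer would be stepped after the main agent.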

The typical use is to assign a human `web_ui` actor as the observer, to watch an agent's performance.

In [1]:
from cogment_lab.actors import RandomActor
from cogment_lab.envs.pettingzoo import ParallelEnvironment
from cogment_lab.process_manager import Cogment
from cogment_lab.utils.runners import process_cleanup


In [2]:
process_cleanup()

cog = Cogment(log_dir="logs/tutorial4")

Processes terminated successfully.


`GymObserverParallel` is just a class implementing `CogmentEnv` defined inside `cogment_lab`, so we can create it like any other parallel environment.

In [3]:
cenv = ParallelEnvironment(
    env_path="cogment_lab.envs.conversions.observer.GymObserverParallel",
    make_kwargs={"gym_env_name": "MountainCar-v0"},
    render=True,
)

await cog.run_env(cenv, "mcar-observer", 9011, log_file="env.log")

True

In [4]:
actor = RandomActor(cenv.env.action_space("gym"))

await cog.run_actor(actor, "random", 9021, log_file="random.log")

True

In [5]:
# The web UI needs an action list, but the environment ignores the observer's actions.
MOUNTAIN_CAR_ACTIONS = ["no-op", "ArrowLeft", "ArrowRight"]

await cog.run_web_ui(actions=MOUNTAIN_CAR_ACTIONS, log_file="human.log", fps=60)

True

In [6]:
trial_id = await cog.start_trial(
    env_name="mcar-observer",
    session_config={"render": True},
    actor_impls={
        "gym": "random",
        "observer": "web_ui",
    },
)

data = await cog.get_trial_data(trial_id)

In [8]:
# Make sure to stop the web UI before moving on.
# You will need to open a new tab in your browser for the next experiment.
cog.stop_service("web_ui")

## Teacher conversion

`cogment_lab.envs.conversions.teacher`

Another thing we can do is introduce a teacher agent. Just like the observer, the teacher receives the main agent's observations. Unlike the observer, the teacher can also take actions, overriding those of the main agent. 

The teacher's action space is always a dictionary of the form `{"active": Discrete(2), "action": <original action space>}`. The `active` field indicates whether the teacher's action should be used instead of the main agent's action. The `action` field is the action to use if `active` is `1`.
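The override rule can be sketched in a few lines of plain Python. This is an illustration of the semantics described above, not the library's code; `resolve_action` is a hypothetical helper name.

```python
def resolve_action(agent_action, teacher_action):
    """Return the action that is actually applied to the environment.

    `teacher_action` is a dict with the keys "active" and "action",
    mirroring the teacher's action space described above.
    """
    if teacher_action["active"] == 1:
        # The teacher is intervening, so its action replaces the agent's.
        return teacher_action["action"]
    # The teacher is passive, so the main agent's action goes through unchanged.
    return agent_action


# The teacher is passive: the agent's action (2) is used.
print(resolve_action(2, {"active": 0, "action": 0}))
# The teacher is active: its action (1) overrides the agent's action (2).
print(resolve_action(2, {"active": 1, "action": 1}))
```

Note that when `active` is `0`, the value stored in the `action` field does not matter.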

In [9]:
cenv = ParallelEnvironment(
    env_path="cogment_lab.envs.conversions.teacher.GymTeacherParallel",
    make_kwargs={"gym_env_name": "MountainCar-v0"},
    render=True,
)

await cog.run_env(cenv, "mcar-teacher", 9012, log_file="env.log")

True

The action map for the teacher is more complex: for each key, we need to specify whether the teacher is actively overriding the main agent and, if so, which action to take. Remember that `no-op` corresponds to no buttons being pressed, so it maps to an inactive teacher.

In [10]:
MOUNTAIN_CAR_ACTIONS = {
    "no-op": {"active": 0, "action": 0},
    "ArrowDown": {"active": 1, "action": 0},   # Stop
    "ArrowLeft": {"active": 1, "action": 1},   # Left
    "ArrowRight": {"active": 1, "action": 2},  # Right
}

await cog.run_web_ui(actions=MOUNTAIN_CAR_ACTIONS, log_file="human.log", fps=60)

True
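Since every key in such a map follows the same pattern, the dictionary can also be generated from a plain key-to-action mapping. The helper below is a hypothetical convenience, not part of `cogment_lab`. It assumes that `no-op` should always map to an inactive teacher, and that the `action` value of an inactive entry is arbitrary.

```python
def make_teacher_action_map(key_to_action, idle_action=0):
    """Build a teacher action map from a {key name: environment action} dict.

    Every listed key produces an active override. The special "no-op" entry
    (no key pressed) leaves the main agent in control; its "action" value is
    a placeholder that is never applied.
    """
    action_map = {"no-op": {"active": 0, "action": idle_action}}
    for key, action in key_to_action.items():
        action_map[key] = {"active": 1, "action": action}
    return action_map


teacher_actions = make_teacher_action_map(
    {"ArrowDown": 0, "ArrowLeft": 1, "ArrowRight": 2}
)
print(teacher_actions)
```

The resulting dictionary has the same contents as the hand-written `MOUNTAIN_CAR_ACTIONS` above.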

In [11]:
trial_id = await cog.start_trial(
    env_name="mcar-teacher",
    session_config={"render": True},
    actor_impls={
        "gym": "random",
        "teacher": "web_ui",
    },
)

data = await cog.get_trial_data(trial_id)


In [12]:
data["teacher"].actions

{'action': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        0]),
 'active': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     

In [13]:
await cog.cleanup()
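The recorded teacher actions can be summarized once the trial is over. The sketch below uses two hypothetical helpers, `intervention_rate` and `applied_teacher_actions`, which work on any sequence of integers. The short lists in the example are made-up illustration data, not output from the trial above.

```python
def intervention_rate(active):
    """Fraction of steps on which the teacher overrode the main agent."""
    flags = [int(flag) for flag in active]
    return sum(flags) / len(flags) if flags else 0.0


def applied_teacher_actions(actions, active):
    """Teacher actions that were actually applied (those with active == 1)."""
    return [int(act) for act, flag in zip(actions, active) if int(flag) == 1]


# Made-up example data with the same layout as data["teacher"].actions.
example = {"action": [0, 1, 2, 0], "active": [0, 1, 1, 0]}
print(intervention_rate(example["active"]))
print(applied_teacher_actions(example["action"], example["active"]))
```

Assuming `data["teacher"].actions` is a dictionary with `action` and `active` arrays, as in the output above, the same helpers could be called as `intervention_rate(data["teacher"].actions["active"])`.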