# Tutorial 2: Cogment Lab Usage

In this tutorial, we introduce the basic usage of Cogment Lab. It assumes that you have already installed Cogment Lab and launched the background services via `cogmentlab launch base`. If not, please refer to the first tutorial.

## Some background

Cogment is based on a microservice architecture. This means that each environment and agent runs as a separate service (typically, but not strictly necessarily, in a separate process). These services communicate with each other via gRPC, through the intermediate layer of the Cogment Orchestrator.

The basic unit of interaction in Cogment (and by extension, in Cogment Lab) is a *trial*, which contains a single RL episode -- from reset to termination, involving a single environment and one or more agents. While this is a bit restrictive, the upside is that the agents aren't restricted to being RL agents, but can in fact be real humans. This is the main use case for Cogment Lab: running RL experiments with human participants.

First, let's import some useful components.

In [1]:
from cogment_lab import Cogment
from cogment_lab.envs import GymEnvironment
from cogment_lab.actors import RandomActor, ConstantActor

import gymnasium as gym

The central piece of Cogment Lab is the `Cogment` class. It is the main entry point for interacting with Cogment itself, and is used to launch environments, actors, and trials.

In [2]:
cog = Cogment(log_dir="logs/tutorial2")

Let's launch an environment. You can use any Gymnasium or PettingZoo environment.
In this tutorial, we focus on Gymnasium, and use the `CartPole-v1` environment.

## Environment

We create an environment locally by instantiating a subclass of `CogmentEnv`, in this case `GymEnvironment`.

In [3]:
cenv = GymEnvironment(
  env_id="CartPole-v1",  # Environment ID, as registered in Gymnasium
  render=True,  # True if we want to ever render the environment; requires pygame
)

Given a local environment, we can run it in a subprocess by calling `cog.run_env()`. Currently, all environments have to run in a separate process due to how Cogment works. This is likely to become optional in the future. In any case, the environment has to run as an async coroutine or task. Fortunately, Jupyter notebooks support this out of the box.

In [4]:
await cog.run_env(cenv, 
  env_name="cartpole",  # Unique name for the environment 
  port=9011,  # Port through which the env communicates with Cogment; has to be free and unique
  log_file="env.log"  # File to which the environment logs are written
)

True
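
Outside a notebook, top-level `await` is not available, so the same calls need an explicit event loop. The sketch below shows the usual `asyncio.run` pattern. Here `launch_service` is a hypothetical stand-in for an awaitable call such as `cog.run_env(...)`; it is not part of Cogment Lab.

```python
import asyncio


async def launch_service(name: str, port: int) -> bool:
    """Hypothetical stand-in for an awaitable launch call such as cog.run_env(...)."""
    await asyncio.sleep(0)  # yield to the event loop once, then report success
    return True


async def main() -> bool:
    # In a notebook this await could sit at the top level of a cell;
    # in a plain script it has to live inside a coroutine.
    return await launch_service("cartpole", port=9011)


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main() to completion, and closes the loop.
    print(asyncio.run(main()))
```

In a real script, the body of `main` would contain the same `await cog.run_env(...)` call as the cell above.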

Now we have two copies of the environment in existence. One is the local instance, which we usually don't need to use, but it can be useful e.g. to extract the observation and action spaces:

In [5]:
assert isinstance(cenv.env, gym.Env)
obs_space = cenv.env.observation_space
act_space = cenv.env.action_space

print(f"Observation space: {obs_space}")
print(f"Action space: {act_space}")

Observation space: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
Action space: Discrete(2)


The second copy is the one that runs in the separate process. Once we have launched it, we cannot interact with it directly. Instead, Cogment handles it automatically when we start running experiments.

## Actor

The second thing we need to run an experiment is an actor. Let's start with a simple random actor -- it will just sample actions from the action space.

In [6]:
actor = RandomActor(action_space=act_space)

Just like the environment, we'll need to run the actor in a subprocess:

In [7]:
await cog.run_actor(actor, 
  actor_name="random",  # Unique name for the actor
  port=9021,  # Port through which the actor communicates with Cogment; has to be free and unique
  log_file="actor.log"  # File to which the actor logs are written
)

True

We can create and use multiple actors -- let's create a second one, which will always output the same action.

In [8]:
actor2 = ConstantActor(action=0)

await cog.run_actor(actor2, 
  actor_name="constant",  # Unique name for the actor
  port=9022,  # Port through which the actor communicates with Cogment; has to be free and unique
  log_file="actor2.log"  # File to which the actor logs are written
)

True
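
Every service launched so far needs its own free port. If a launch fails, one quick thing to check is whether the port is already taken. The helper below is a minimal sketch using only the standard library; `is_port_free` is a hypothetical name, not a Cogment Lab function, and the check is only a snapshot (another process could grab the port right after it returns).

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another socket already holds the port.
            return False
    return True


# Ports used so far in this tutorial; they also have to differ from each other.
ports = {"cartpole": 9011, "random": 9021, "constant": 9022}
assert len(set(ports.values())) == len(ports), "each service needs its own port"
```

Call `is_port_free(9023)` before picking a port for an additional actor; calling it on a port that a running service already holds should return `False`.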

## Experiment

Now that we have an environment and some actors, we can run an experiment. We do this by calling `cog.start_trial()`, which asks Cogment to run a single episode with a given environment and actor, and returns a trial ID. The trial runs asynchronously, so in more complex scenarios we can launch an experiment, do something else, and then fetch the data with `cog.get_trial_data()`. Here, we'll just get the data immediately.

In [9]:
trial_id = await cog.start_trial(
    env_name="cartpole",  # Name of the environment to use
    actor_impls={
        "gym": "random",  # Name of the actor to use. For Gymnasium environments, the key is always "gym"
    },
)
data = await cog.get_trial_data(trial_id)

The data is a dictionary indexed by the actor name (here, "gym"). Each entry is a `TrialData` object, which contains the typical information we need from an RL experiment:

In [10]:
print(f"Observation shape: {data['gym'].observations.shape}")
print(f"Action shape: {data['gym'].actions.shape}")
print(f"Reward shape: {data['gym'].rewards.shape}")
print(f"Done shape: {data['gym'].done.shape}")
print(f"Next observation shape: {data['gym'].next_observations.shape}")

Observation shape: (38, 4)
Action shape: (38,)
Reward shape: (38,)
Done shape: (38,)
Next observation shape: (38, 4)


Note that you can customize the fields in `data['gym']` by passing the `fields` argument to `cog.get_trial_data()`.
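
All of these arrays share the same leading time dimension, so per-episode statistics are a few NumPy calls away. The sketch below uses `EpisodeArrays`, a hypothetical stand-in with the same field names as the entry printed above, filled with placeholder values. It is not the real `TrialData` class, and the assumption that only the last step is flagged as done is ours.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class EpisodeArrays:
    """Hypothetical stand-in with the same array fields as data['gym']."""

    observations: np.ndarray
    actions: np.ndarray
    rewards: np.ndarray
    done: np.ndarray
    next_observations: np.ndarray


def summarize(ep: EpisodeArrays) -> dict:
    """Episode length, undiscounted return, and whether the last step is flagged done."""
    return {
        "length": int(ep.rewards.shape[0]),
        "return": float(ep.rewards.sum()),
        "ended": bool(ep.done[-1]),
    }


T = 38  # number of steps, matching the shapes printed above
episode = EpisodeArrays(
    observations=np.zeros((T, 4), dtype=np.float32),  # placeholder values
    actions=np.zeros(T, dtype=np.int64),
    rewards=np.ones(T, dtype=np.float32),  # CartPole-v1 gives +1 per step
    done=np.arange(T) == T - 1,  # assumed: only the final step is flagged
    next_observations=np.zeros((T, 4), dtype=np.float32),
)
summary = summarize(episode)
```

With the real data, the same function could be applied to `data['gym']`, provided the fields are NumPy arrays as the printed shapes suggest.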

Since we used a random actor, the actions should be random as well:

In [11]:
print(data["gym"].actions)

[0 1 0 0 0 1 0 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 0 1 0 0 1 1 0 0 1 1 1 1 0 0 1
 0]


We can also use the other actor we defined earlier. Let's run another trial, this time with the constant actor:

In [12]:
trial_id = await cog.start_trial(
    env_name="cartpole",
    actor_impls={
        "gym": "constant",
    },
)
data = await cog.get_trial_data(trial_id)

As expected, the actions are always the same:

In [13]:
print(data["gym"].actions)

[0 0 0 0 0 0 0 0 0 0 0]
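
So far we have awaited the trial data right after starting each trial. The "launch, do something else, then fetch" pattern mentioned earlier can be sketched with plain `asyncio`. Both coroutines below are hypothetical stand-ins that only mimic the shape of `cog.start_trial` and `cog.get_trial_data`; they do not talk to Cogment.

```python
import asyncio


async def start_trial(env_name: str, actor: str) -> str:
    """Hypothetical stand-in for cog.start_trial: returns a trial ID right away."""
    await asyncio.sleep(0)
    return f"{env_name}-{actor}"


async def get_trial_data(trial_id: str) -> dict:
    """Hypothetical stand-in for cog.get_trial_data: waits until the trial is over."""
    await asyncio.sleep(0.01)  # pretend the episode takes a moment
    return {"gym": f"data for {trial_id}"}


async def main() -> tuple:
    trial_id = await start_trial("cartpole", "random")
    # Schedule the fetch in the background instead of blocking on it.
    fetch = asyncio.create_task(get_trial_data(trial_id))
    # Do something that does not need the trial data yet.
    other_work = sum(range(1000))
    # Collect the result once we actually need it.
    data = await fetch
    return other_work, data


results = asyncio.run(main())
```

In a notebook, the same idea amounts to awaiting `cog.get_trial_data(trial_id)` in a later cell than the one that started the trial.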


## Human experiments

Everything we've done so far is pretty basic -- we could have done the same thing with a simple Python script. The real power of Cogment Lab comes from the ability to run experiments with human participants. Let's see how this works.

In [14]:
await cog.run_web_ui(
  app_port=8000,  # Port through which the web UI is accessible
  cogment_port=8999,  # Port through which the web UI communicates with Cogment; has to be free and unique
  actions=[
    "no-op",  # If nothing is pressed, the action is 0, i.e. the index of "no-op" 
    "ArrowRight"  # If the right arrow is pressed, the action is 1, i.e. the index of "ArrowRight"
  ],
  log_file="human.log"
)

True
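
The position of each entry in `actions` determines the action index sent to the environment. The sketch below illustrates that lookup; `key_to_action` is a hypothetical helper, not the web UI's actual implementation, and treating unlisted keys as "no-op" is an assumption.

```python
from typing import Optional, Sequence

ACTIONS = ["no-op", "ArrowRight"]  # same list as passed to run_web_ui above


def key_to_action(pressed_key: Optional[str], actions: Sequence[str] = ACTIONS) -> int:
    """Map the currently pressed key to its index in `actions`.

    Returns 0 (the index of "no-op") when nothing is pressed or when the key
    is not listed; the latter is an assumption made for this sketch.
    """
    if pressed_key in actions:
        return list(actions).index(pressed_key)
    return 0


print(key_to_action(None))          # nothing pressed
print(key_to_action("ArrowRight"))  # right arrow held down
```

In `CartPole-v1`, action 0 pushes the cart to the left and action 1 pushes it to the right, so with this mapping releasing the key pushes left and holding the right arrow pushes right.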

Let's launch a trial using the web UI as one of the actors:

In [15]:
trial_id = await cog.start_trial(
    env_name="cartpole",
    session_config={"render": True},  # Tell cogment that we want to use the renders of the environment
    actor_impls={
        "gym": "web_ui",
    },
)

data = await cog.get_trial_data(trial_id)

You may see that the cell above is still running. This is because Cogment is waiting for the human - you!

Open your browser and go to `http://localhost:8000`. Click the Start button, and then press (and release) your right arrow key to try to balance the cartpole.

After it inevitably falls (or times out, if you're good), go back here and check your result below:

In [16]:
print(f"Your reward is {data['gym'].rewards.sum()}")

Your reward is 13.0


## Cleanup

Finally, let's clean up the resources we used. This is particularly important if you launched the web UI, as it can be pretty moody when it comes to closing automatically.

In [17]:
await cog.cleanup()
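
In a longer script, it may be worth making sure cleanup runs even when something fails midway. A `try`/`finally` block is one way to do that. The sketch below uses `FakeCogment`, a hypothetical stand-in that only provides an awaitable `cleanup()` and a method that always fails; it is not the real `Cogment` class.

```python
import asyncio


class FakeCogment:
    """Hypothetical stand-in exposing an awaitable cleanup(), like cog.cleanup()."""

    def __init__(self) -> None:
        self.cleaned_up = False

    async def failing_experiment(self) -> None:
        raise RuntimeError("something went wrong mid-experiment")

    async def cleanup(self) -> None:
        self.cleaned_up = True


async def run_safely(cog: FakeCogment) -> None:
    try:
        await cog.failing_experiment()
    except RuntimeError as err:
        print(f"experiment failed: {err}")
    finally:
        # Runs whether or not the experiment raised, so services get shut down.
        await cog.cleanup()


fake_cog = FakeCogment()
asyncio.run(run_safely(fake_cog))
```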