[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1n3uLs7P5wg0yLsgaa0ipfudQ_QwknT5v?usp=sharing)

# About the tutorial
Datasets are essential in both supervised and unsupervised machine learning. In a typical reinforcement learning (RL) setting, however, the agent must interact with an environment to collect the data it learns from. Environments therefore play a role in RL similar to the one datasets play in supervised and unsupervised learning. In this tutorial, we explain how to use RLHive environments. Note that this tutorial covers single-agent environments only.

# Introduction and Setup

### RLHive Installation

For installation, you can check [this notebook](https://colab.research.google.com/drive/11YirxgoVD7gjN02TdAeyFXOL1qH7Eydv?usp=sharing).

### How to install environments

RLHive currently supports the following environments:

* Gym classic control
* Atari
* Minigrid (single-agent grid world)
* Marlgrid (multi-agent)
* Pettingzoo (multi-agent)

To install Gym, run `pip install gym==0.26.0`. You can also install the dependencies for the environments that RLHive ships with by running `pip install rlhive[<env_names>]`, where `<env_names>` is a comma-separated list made up of `atari`, `gym_minigrid`, and `pettingzoo`.

Marlgrid is also supported, but it must be installed separately. MinAtar, in addition, can be used directly through Gym.

* To install Marlgrid, run `pip install marlgrid@https://github.com/kandouss/marlgrid/archive/refs/heads/master.zip`
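Because several environments are optional extras, it can help to check which environment packages are importable before writing a config. The sketch below uses only the standard library; the module names in `ENV_PACKAGES` (`gym`, `gym_minigrid`, `pettingzoo`, `marlgrid`) are assumed to be the import names of the packages listed above, and `available_env_packages` is a hypothetical helper.

```python
import importlib.util

# Assumed import names of the optional environment packages mentioned above.
ENV_PACKAGES = ["gym", "gym_minigrid", "pettingzoo", "marlgrid"]


def available_env_packages(names=ENV_PACKAGES):
    """Return a dict mapping each module name to True if it can be imported."""
    # find_spec returns None (without importing the module) when a
    # top-level module cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in available_env_packages().items():
    print(f"{name:>14}: {'installed' if ok else 'missing'}")
```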

In [None]:
!pip install ruamel.yaml
!pip install pyglet
!pip install git+https://github.com/chandar-lab/RLHive.git@dev
!pip install gymnasium
!pip install RLHive['gym_minigrid']

In [None]:
import torch
import hive
from hive.utils.registry import registry
from hive.envs.base import BaseEnv
from hive.envs.gym_env import GymEnv
from hive.envs.env_spec import EnvSpec
from ruamel import yaml
import sys
import os
import os.path
import numpy as np
%matplotlib inline

In [3]:
import gymnasium as gym
import gym_minigrid
from gym_minigrid.wrappers import ReseedWrapper
from gymnasium.spaces import Discrete

# Creating environments

Every environment used in RLHive should be a subclass of `hive.envs.base.BaseEnv`. It should provide a `reset` function that resets the environment to a new episode and returns a tuple `(observation, turn)`, and a `step` function that takes an action, performs the step in the environment, and returns a tuple `(observation, reward, terminated, truncated, turn, info)`. The elements of these tuples are:

* `terminated` is `True` if the episode ends for a reason that is part of the task, such as completing it.
* `truncated` is `True` if the episode is cut short by a time limit or another reason that is not part of the task MDP.
* `info` is a dictionary of auxiliary diagnostic information for debugging, learning, and logging. For instance, it could contain the individual reward terms that are combined to produce the total reward.
* `turn` is the index of the agent whose turn it is (relevant in multi-agent environments).
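To make this contract concrete, here is a toy, dependency-free class that follows the same `reset`/`step` return conventions, together with a rollout loop that stops on either `terminated` or `truncated`. `CountdownEnv` and `rollout` are hypothetical names for illustration only; the class does not subclass `BaseEnv`, it only mimics the tuples described above.

```python
import random


class CountdownEnv:
    """Hypothetical single-agent env mimicking RLHive's reset/step tuples."""

    def __init__(self, start=5, max_steps=10):
        self._start = start
        self._max_steps = max_steps
        self._turn = 0  # single agent: always agent 0

    def reset(self):
        self._value = self._start
        self._num_steps = 0
        return self._value, self._turn  # (observation, turn)

    def step(self, action):
        # Action 1 decrements the counter; any other action leaves it unchanged.
        self._num_steps += 1
        if action == 1:
            self._value -= 1
        terminated = self._value == 0  # task completed
        truncated = not terminated and self._num_steps >= self._max_steps  # time limit
        reward = 1.0 if terminated else 0.0
        info = {"num_steps": self._num_steps}
        return self._value, reward, terminated, truncated, self._turn, info


def rollout(env, policy):
    """Run one episode and return (total_reward, number_of_steps, last_info)."""
    obs, turn = env.reset()
    total, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, turn, info = env.step(policy(obs))
        total += reward
        steps += 1
        if terminated or truncated:
            return total, steps, info


print(rollout(CountdownEnv(), lambda obs: 1))  # (1.0, 5, {'num_steps': 5})
print(rollout(CountdownEnv(), lambda obs: random.choice([0, 1])))
```

The always-decrement policy ends with `terminated=True` after 5 steps, while a policy that never decrements is stopped by `truncated=True` after `max_steps` steps.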

The `reward` return value can be a single number, an array, or a dictionary. If it is a number, the same reward is given to every agent. If it is an array, each agent gets the reward at its index in the runner. If it is a dictionary, the keys should be the agent ids and the values the rewards for those agents.
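The three accepted reward formats can be reduced to one per-agent lookup. The helper below is a hypothetical sketch, not RLHive's internal code, and it assumes `agent_ids` lists the agents in runner order so that array position `i` belongs to `agent_ids[i]`.

```python
from numbers import Number


def reward_for_agent(reward, agent_ids, agent_id):
    """Return the reward for `agent_id` given a number, sequence, or dict reward.

    Hypothetical helper: `agent_ids` lists the agents in runner order.
    """
    if isinstance(reward, Number):
        return reward  # the same reward for every agent
    if isinstance(reward, dict):
        return reward[agent_id]  # keyed by agent id
    return reward[agent_ids.index(agent_id)]  # indexed by runner position


agents = ["agent_0", "agent_1"]
print(reward_for_agent(1.0, agents, "agent_1"))  # 1.0
print(reward_for_agent([0.5, -0.5], agents, "agent_1"))  # -0.5
print(reward_for_agent({"agent_0": 2, "agent_1": 3}, agents, "agent_0"))  # 2
```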


### `GymEnv`

[OpenAI Gym](https://www.gymlibrary.dev/), which provides a flexible way to design environments, initialize them, and interact with them, has become widely used among RL researchers.

If your environment is a Gym environment and you do not need to preprocess the observations it generates, you can use `hive.envs.gym_env.GymEnv` directly.

In [None]:
env = GymEnv("CartPole-v0")

### `EnvSpec`

Each environment should also provide an `EnvSpec` object that describes its observation and action spaces. These are stored as lists with one element per agent. The agent uses this information to build its network according to the format of valid observations and actions.
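Because the spaces are stored as per-agent lists, an agent typically reads its own entry and derives its network sizes from it. The sketch below uses a hypothetical `SimpleSpec` dataclass and the stand-ins `BoxLike`/`DiscreteLike` in place of `EnvSpec` and Gym spaces, so it runs without RLHive installed. With the real classes, the same information comes from `env_spec.observation_space[i]` and `env_spec.action_space[i]`: a `Box` space exposes `shape` and a `Discrete` space exposes `n`.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class BoxLike:
    shape: tuple  # stands in for a Box observation space


@dataclass
class DiscreteLike:
    n: int  # stands in for a Discrete action space


@dataclass
class SimpleSpec:
    """Hypothetical stand-in for EnvSpec: one space per agent."""
    env_name: str
    observation_space: list
    action_space: list
    env_info: dict = field(default_factory=dict)


def network_sizes(spec, agent_index=0):
    """Return (input_dim, output_dim) of a flat network for the given agent."""
    obs_space = spec.observation_space[agent_index]
    act_space = spec.action_space[agent_index]
    return prod(obs_space.shape), act_space.n


cartpole_like = SimpleSpec("CartPole-like", [BoxLike((4,))], [DiscreteLike(2)])
print(network_sizes(cartpole_like))  # (4, 2)
```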

In [5]:
env_spec = env.env_spec
obs_spec, act_spec = env_spec.observation_space[0], env_spec.action_space[0]
print("Environment name : \n", env_spec.env_name)
print("Environment observation space: \n", obs_spec)
print("Environment action space: \n", act_spec)
print("Environment info: \n", env_spec.env_info)

Environment name : 
 CartPole-v0
Environment observation space: 
 Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)
Environment action space: 
 Discrete(2)
Environment info: 
 {}


### Environment basic methods

To work with any environment, we `reset` it to a new initial state and then call `step` to perform an action and collect the updated information from the environment. Since rendering matters for image-based environments, you can also call the `render` function. Finally, when we're done with the environment, we `close` it.
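Since `close` should run even if an episode raises an error partway through, one option is to wrap the environment in `contextlib.closing`, which calls `close()` when the `with` block exits. `DummyEnv` below is a hypothetical stand-in that only records whether it was closed; any object with a `close` method can be wrapped the same way.

```python
from contextlib import closing


class DummyEnv:
    """Hypothetical environment that only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def reset(self):
        return 0, 0  # (observation, turn)

    def step(self, action):
        # (observation, reward, terminated, truncated, turn, info)
        return 0, 0.0, True, False, 0, {}

    def close(self):
        self.closed = True


dummy_env = DummyEnv()
with closing(dummy_env):
    obs, turn = dummy_env.reset()
    obs, reward, terminated, truncated, turn, info = dummy_env.step(0)
print(dummy_env.closed)  # True
```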

In [6]:
obs, turn = env.reset()
print("Environment initial observation : \n", obs)
print("Environment initial turn: \n", turn)

Environment initial observation : 
 [ 0.03139541  0.0282384  -0.02267022  0.02850169]
Environment initial turn: 
 0


The `turn` indicates the agent ID, which is always 0 in a single-agent setting.
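In a multi-agent environment, the same `turn` value can select which agent acts next. The sketch below assumes a hypothetical list of per-agent policies indexed by `turn`, and two agents that simply alternate; in the single-agent case the list has one element and `turn` is always 0.

```python
def act(policies, observation, turn):
    """Route the observation to the policy of the agent whose turn it is."""
    return policies[turn](observation)


# Hypothetical policies: agent 0 always picks action 0, agent 1 always picks 1.
policies = [lambda obs: 0, lambda obs: 1]
print([act(policies, None, t % 2) for t in range(4)])  # [0, 1, 0, 1]

single_agent = [lambda obs: 2]
print(act(single_agent, None, 0))  # 2
```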

In [7]:
num_steps = 100

for t in range(num_steps):
    obs, reward, terminated, truncated, turn, info = env.step(act_spec.sample()) # Random policy
    if terminated or truncated:
        break

env.close()

### Custom environment

You can also create your own custom environment based on `GymEnv`. If you need extra preprocessing, or want to change how the environment or its `EnvSpec` is created, subclass `GymEnv` and override `create_env()`, `create_env_spec()`, or both.
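The override pattern is the template-method idea: the base constructor calls the creation hooks, so a subclass only replaces the hook it needs. The classes below are hypothetical, dependency-free stand-ins: `SketchGymEnv` is not RLHive's `GymEnv`, and RLHive's exact hook signatures may differ. Here the subclass adds observation scaling by overriding only `create_env`.

```python
class SketchGymEnv:
    """Hypothetical base class whose constructor delegates to two hooks."""

    def __init__(self, env_name, **kwargs):
        self.create_env(env_name, **kwargs)
        self.env_spec = self.create_env_spec(env_name, **kwargs)

    def create_env(self, env_name, **kwargs):
        # Stand-in "environment": a function returning a fixed raw observation.
        self._env = lambda: [0.0, 50.0, 100.0]

    def create_env_spec(self, env_name, **kwargs):
        return {"env_name": env_name, "obs_dim": 3}

    def reset(self):
        return self._env(), 0  # (observation, turn)


class ScaledEnv(SketchGymEnv):
    """Subclass that overrides only create_env to add preprocessing."""

    def create_env(self, env_name, scale=100.0, **kwargs):
        super().create_env(env_name, **kwargs)
        raw_env = self._env
        # Wrap the raw environment so every observation is divided by `scale`.
        self._env = lambda: [x / scale for x in raw_env()]


print(ScaledEnv("toy").reset())  # ([0.0, 0.5, 1.0], 0)
print(ScaledEnv("toy").env_spec)  # {'env_name': 'toy', 'obs_dim': 3}
```

`create_env_spec` is inherited unchanged, which mirrors overriding just one of the two hooks.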


In [8]:
class MiniGridEnv(GymEnv):
    def __init__(self, env_name, num_players=1, seed=42, **kwargs):
        super().__init__(env_name, num_players, seed=seed, **kwargs)

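    def create_env(self, env_name, seed, **kwargs):
        # Build the Gym environment and reseed it with the same seed on every reset
        self._env = gym.make(env_name, **kwargs)
        self._env = ReseedWrapper(self._env, seeds=[seed])

    def create_env_spec(self, env_name, **kwargs):
        # Hook for customizing the EnvSpec; this example keeps GymEnv's default
        env_spec = super().create_env_spec(env_name, **kwargs)
        return env_spec

    def step(self, action):
        # Hook for custom step logic; this example defers to GymEnv
        return super().step(action)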

We can also create an environment from scratch by inheriting from `hive.envs.base.BaseEnv`. For instance, the following cell defines `GridEnv`, a 1$\times$7 grid whose cells are indexed from -3 to 3, left to right. The agent always starts in cell 0, and at each step it can move right (if possible), move left (if possible), or stay in its current cell. The agent receives a reward of 1 only when it is in cell 1, and 0 otherwise. Episodes are truncated after `max_steps` steps (20 by default).
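The transition rule can be written as a small pure function, which also makes the boundary arithmetic easy to check. In Python, `//` rounds toward negative infinity, so `-7 // 2` is `-4`; the lower bound of a grid indexed from -3 to 3 must therefore be written as `-(7 // 2)`. `grid_step` is a hypothetical name, and the rollout assumes the 20-step default episode length of `GridEnv`: the policy "move right once, then stay" collects the maximum return of 20.

```python
NUM_CELLS = 7
HALF = NUM_CELLS // 2  # 3, so cells run from -3 to 3


def grid_step(cell, action):
    """Move left (-1), stay (0) or right (+1), clipped to the grid [-3, 3]."""
    next_cell = min(HALF, max(-HALF, cell + action))
    reward = 1 if next_cell == 1 else 0
    return next_cell, reward


print(-NUM_CELLS // 2, -(NUM_CELLS // 2))  # -4 -3

# Policy: step right until reaching cell 1, then stay there.
cell, total = 0, 0
for _ in range(20):
    action = 1 if cell < 1 else 0
    cell, reward = grid_step(cell, action)
    total += reward
print(total)  # 20
```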

In [9]:
class GridEnv(BaseEnv):
    def __init__(self, env_name='GridEnv', max_steps=20, **kwargs):
        self._num_grid = 7  # cells are indexed -3, ..., 3
        self._observation = 0
        self._num_steps = 0
        self._max_steps = max_steps

        # BaseEnv stores the EnvSpec and the number of players (1 here)
        super().__init__(self.create_env_spec(env_name, **kwargs), 1)

    def create_env_spec(self, env_name, **kwargs):
        half = self._num_grid // 2
        # Observations: cell index in [-3, 3]; actions: -1 (left), 0 (stay), 1 (right)
        observation_spaces = [Discrete(self._num_grid, start=-half)]
        action_spaces = [Discrete(3, start=-1)]
        return EnvSpec(
            env_name=env_name,
            observation_space=observation_spaces,
            action_space=action_spaces,
        )

    def reset(self):
        self._observation = 0
        self._num_steps = 0
        return self._observation, self._turn

    def step(self, action):
        self._num_steps += 1
        half = self._num_grid // 2

        # Move one cell, staying inside [-half, half].
        # Use -half rather than -self._num_grid // 2, which evaluates to -4.
        if action == 1:
            self._observation = min(half, self._observation + 1)
        elif action == -1:
            self._observation = max(-half, self._observation - 1)

        # Reward 1 only in cell 1
        reward = 1 if self._observation == 1 else 0

        # The task never terminates; the episode is truncated after max_steps steps
        truncated = self._num_steps >= self._max_steps
        info = {}

        return self._observation, reward, False, truncated, self._turn, info

    # The remaining BaseEnv methods are no-ops for this toy environment
    def render(self):
        pass

    def close(self):
        pass

    def save(self, save_dir=None):
        pass

    def seed(self, seed=None):
        pass

In [10]:
env = GridEnv()
env_spec = env.env_spec
obs_spec, act_spec = env_spec.observation_space[0], env_spec.action_space[0]
print("Environment name : \n", env_spec.env_name)
print("Environment observation space: \n", obs_spec)
print("Environment action space: \n", act_spec)
print("Environment info: \n", env_spec.env_info)

Environment name : 
 GridEnv
Environment observation space: 
 Discrete(7, start=-3)
Environment action space: 
 Discrete(3, start=-1)
Environment info: 
 {}


In [12]:
terminated = truncated = False
env.reset()

while not terminated and not truncated:
    obs, reward, terminated, truncated, turn, info = env.step(act_spec.sample())
    print("Cell {}, Reward {}".format(obs, reward))

env.close()

Cell 1, Reward 1
Cell 2, Reward 0
Cell 2, Reward 0
Cell 2, Reward 0
Cell 1, Reward 1
Cell 2, Reward 0
Cell 2, Reward 0
Cell 2, Reward 0
Cell 1, Reward 1
Cell 0, Reward 0
Cell 1, Reward 1
Cell 0, Reward 0
Cell 0, Reward 0
Cell -1, Reward 0
Cell -2, Reward 0
Cell -1, Reward 0
Cell -1, Reward 0
Cell 0, Reward 0
Cell 1, Reward 1
Cell 0, Reward 0


#### Registering environments
The registry module `hive.utils.registry` is used to register classes with the RLHive registry. Consider registering the `GridEnv` we created earlier:

In [13]:
# `type` is the base class of the registered object; environments use BaseEnv
registry.register(name='GridEnv', constructor=GridEnv, type=BaseEnv)
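Conceptually, a registry is a lookup table from a name to a constructor, grouped by the kind of object being registered, so that a config can refer to an environment by name. The `TinyRegistry` class below is a hypothetical illustration of that idea, not RLHive's implementation; rejecting duplicate names is a design choice of this sketch.

```python
class TinyRegistry:
    """Hypothetical name -> constructor table, grouped by a type label."""

    def __init__(self):
        self._table = {}

    def register(self, name, constructor, type_name):
        group = self._table.setdefault(type_name, {})
        if name in group:
            raise ValueError(f"{name!r} is already registered as a {type_name}")
        group[name] = constructor

    def create(self, type_name, name, **kwargs):
        # Look up the constructor by name and build the object from kwargs,
        # the way a config-driven runner would.
        return self._table[type_name][name](**kwargs)


class ToyEnv:
    def __init__(self, max_steps=20):
        self.max_steps = max_steps


toy_registry = TinyRegistry()
toy_registry.register("ToyEnv", ToyEnv, "env")
print(toy_registry.create("env", "ToyEnv", max_steps=5).max_steps)  # 5
```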

For Gym-based environments (e.g. `MiniGridEnv`), you can also use `gym.register`:

In [14]:
gym.register(id = 'MyMiniGrid', entry_point = MiniGridEnv)

More than one environment can be registered at once using the `register_all` method. Consider registering two environments `Env1` and `Env2` (inheriting `BaseEnv`):

In [15]:
# Minimal placeholder environments: a real environment would build its EnvSpec
# and call super().__init__(env_spec, num_players) in __init__
class Env1(BaseEnv):
    def __init__(self, env_name = 'Env1', **kwargs):
        pass
    def reset(self):
        pass
    def step(self, action):
        pass
    def render(self):
        pass
    def close(self):
        pass
    def save(self):
        pass

class Env2(BaseEnv):
    def __init__(self, env_name = 'Env2', **kwargs):
        pass
    def reset(self):
        pass
    def step(self, action):
        pass
    def render(self):
        pass
    def close(self):
        pass
    def save(self):
        pass

In [16]:
registry.register_all(
    BaseEnv,
    {
        "Env1": Env1,
        "Env2": Env2,
    },
)
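When building the mapping passed to `register_all`, each name must be a distinct key. A Python dict literal silently keeps only the last value for a repeated key, so a copy-paste slip registers fewer environments than intended without raising any error. The check below uses hypothetical placeholder classes `EnvA` and `EnvB` to show the effect, and one way to derive the keys from the classes themselves.

```python
class EnvA:
    pass


class EnvB:
    pass


# A repeated key is not an error: the later entry overwrites the earlier one.
buggy = {"EnvA": EnvA, "EnvA": EnvB}
print(len(buggy), buggy["EnvA"].__name__)  # 1 EnvB

# Building the mapping from the classes themselves avoids mismatched names.
fixed = {cls.__name__: cls for cls in (EnvA, EnvB)}
print(sorted(fixed))  # ['EnvA', 'EnvB']
```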