# Tutorial 3: Processors & Spaces

In this tutorial, we will discuss [spaces](https://eagerx.readthedocs.io/en/master/guide/api_reference/utilities/space.html) and [processors](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html).

The following will be covered:
- How to specify a [Space](https://eagerx.readthedocs.io/en/master/guide/api_reference/utilities/space.html).
- Creating a custom [Processor](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html).
- How to add a [Processor](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html).

In the remainder of this tutorial, we will go into more detail on these concepts.

Furthermore, at the end of this notebook you will find exercises.
For the exercises you will have to add/modify a couple of lines of code, which are marked by

```python

# START EXERCISE [BLOCK_NUMBER]

# END EXERCISE [BLOCK_NUMBER]
```

## Pendulum Swing-up

We will create an environment for solving the classic control problem of swinging up an underactuated pendulum, very similar to the [Pendulum-v1 environment](https://www.gymlibrary.ml/environments/classic_control/pendulum/).
Our goal is to swing up this pendulum to the upright position and keep it there, while minimizing the velocity of the pendulum and the input voltage.

Since the dynamics of a pendulum actuated by a DC motor are well known, we can simulate the pendulum by integrating the corresponding ordinary differential equations (ODEs):


$$\mathbf{x} = \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix}, \qquad \dot{\mathbf{x}} = \begin{bmatrix} \dot{\theta} \\ \frac{1}{J}\left(\frac{K}{R}u - mgl \sin{\theta} - b \dot{\theta} - \frac{K^2}{R}\dot{\theta}\right)\end{bmatrix}$$

with $\theta$ the angle w.r.t. upright position, $\dot{\theta}$ the angular velocity, $u$ the input voltage, $J$ the inertia, $m$ the mass, $g$ the gravitational constant, $l$ the length of the pendulum, $b$ the motor viscous friction constant, $K$ the motor constant and $R$ the electric resistance.
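
As a rough illustration of how such an ODE can be integrated, the sketch below transcribes the right-hand side above term by term and advances it with a classic fourth-order Runge-Kutta step, holding $u$ constant over each step. It is a standalone sketch, not the integrator used by `OdeEngine`; the function names `pendulum_rhs` and `rk4_step` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def pendulum_rhs(x, u, J, m, g, l, b, K, R):
    """Right-hand side of the pendulum ODE: returns [theta_dot, theta_ddot]."""
    theta, theta_dot = x
    theta_ddot = (K / R * u - m * g * l * np.sin(theta) - b * theta_dot - K**2 / R * theta_dot) / J
    return np.array([theta_dot, theta_ddot])


def rk4_step(x, u, dt, params):
    """One classic Runge-Kutta (RK4) step with the input u held constant over dt."""
    k1 = pendulum_rhs(x, u, **params)
    k2 = pendulum_rhs(x + 0.5 * dt * k1, u, **params)
    k3 = pendulum_rhs(x + 0.5 * dt * k2, u, **params)
    k4 = pendulum_rhs(x + dt * k3, u, **params)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


# Hypothetical parameter values, for illustration only.
params = dict(J=1e-4, m=0.05, g=9.81, l=0.04, b=1e-4, K=0.05, R=9.0)

x = np.array([0.0, 0.0])  # state [theta, theta_dot]
dt = 1.0 / 30.0           # one step at 30 Hz
for _ in range(30):       # simulate one second with a constant input of 1 V
    x = rk4_step(x, u=1.0, dt=dt, params=params)
print(x)
```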


## Activate GPU (Colab only)

When in Colab, you'll need to enable GPUs for the notebook:

- Navigate to Edit→Notebook Settings
- Select GPU from the Hardware Accelerator drop-down

In [1]:
#@title Notebook Setup

#@markdown In order to be able to run the code, we need to install the *eagerx_tutorials* package.

try:
    import eagerx_tutorials
except ImportError:
    if "google.colab" in str(get_ipython()):
        print("Installing pybullet with pip.")
        import gdown
        gdown.download(id="15IKv71tEK11p1B6iZ1HX8r_MY2ibcS_h", quiet=True)
        !pip install pybullet-3.2.5-cp310-cp310-linux_x86_64.whl >> /tmp/pybullet_install.txt 2>&1
    !{"echo 'Installing eagerx-tutorials with pip.' && pip install eagerx-tutorials >> /tmp/eagerx_install.txt 2>&1"}

# Setup interactive notebook
# Required in interactive notebooks only.
from eagerx_tutorials import helper
helper.setup_notebook()

# Import eagerx
import eagerx
eagerx.set_log_level(eagerx.WARN)

Not running on CoLab.
Installing eagerx-gui



## Let's get started

We will again create an environment with the *Pendulum* object, like we did in the [first](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/1_environment_creation.ipynb) and [second](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/2_reset_and_step.ipynb) tutorials.

In essence, an [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html) in eagerx is a collection of sensors, actuators and states. In the object definition, an appropriate [Space](https://eagerx.readthedocs.io/en/master/guide/api_reference/utilities/space.html) is associated with each of them, so that it is easy to infer which values can legitimately be sent to the actuators, which values can be expected from the sensors, and what the set of valid initial states is. [Space](https://eagerx.readthedocs.io/en/master/guide/api_reference/utilities/space.html) is a subclass of [gym.spaces.Space](https://www.gymlibrary.ml/content/spaces/) and essentially wraps the [Box](https://www.gymlibrary.ml/content/spaces/#box) space. Unlike a regular Box, it allows the bounds to be left undefined when they are unknown.
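
To make "legitimate values" concrete: a Box-like space constrains the data type, the shape and the element-wise bounds of a value. The hypothetical helper `box_contains` below sketches these three checks in plain NumPy for the actuator `u` of the *Pendulum* (`Space([-2.], [2.], (1,), float32)`, as printed further below). It is a simplification that requires an exact dtype match, whereas gym's `Box.contains` only requires that the value's dtype can be safely cast to the space's dtype.

```python
import numpy as np


def box_contains(value, low, high, shape, dtype) -> bool:
    """Hypothetical helper: is `value` a legitimate element of a Box-like space?"""
    arr = np.asarray(value)
    if arr.dtype != np.dtype(dtype):  # simplification: exact dtype match required
        return False
    if arr.shape != tuple(shape):
        return False
    return bool(np.all(arr >= low) and np.all(arr <= high))


# Actuator `u` of the Pendulum: Space([-2.], [2.], (1,), float32)
u_space = dict(low=-2.0, high=2.0, shape=(1,), dtype="float32")

print(box_contains(np.array([1.5], dtype=np.float32), **u_space))  # True
print(box_contains(np.array([3.0], dtype=np.float32), **u_space))  # False: out of bounds
print(box_contains(np.array([1.5]), **u_space))                    # False: float64, not float32
print(box_contains(np.float32(1.5), **u_space))                    # False: shape () instead of (1,)
```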

Let's use the *Pendulum* object to explain this.
Remember that we can print information about an object as follows:

<!-- However, first we would like to clarify the converter types of EAGERx, i.e. [Converter](https://eagerx.readthedocs.io/en/master/guide/api_reference/converter/converter.html), [SpaceConverter](https://eagerx.readthedocs.io/en/master/guide/api_reference/converter/space_converter.html) and [Processor](https://eagerx.readthedocs.io/en/master/guide/api_reference/converter/processor.html). -->
<!-- The [Converter](https://eagerx.readthedocs.io/en/master/guide/api_reference/converter/converter.html) allows to convert messages from one message type into another. -->
<!-- The [SpaceConverter](https://eagerx.readthedocs.io/en/master/guide/api_reference/converter/space_converter.html) allows to connect entities to actions and observations and create the appropriate [Gym spaces](https://gym.openai.com/docs/#spaces). -->
<!-- Finally, the [Processor](https://eagerx.readthedocs.io/en/master/guide/api_reference/converter/processor.html) allows to convert messages without changing the message type. -->

In [2]:
from eagerx_tutorials.pendulum.objects import Pendulum
Pendulum.info()

   entity_type: `Pendulum`
   module: `eagerx_tutorials.pendulum.objects`
   file: `/home/r2ci/eagerx-dev/eagerx_tutorials/eagerx_tutorials/pendulum/objects.py`

Supported engines:
 - eagerx_ode.engine/OdeEngine

Make this spec with:
   spec = Pendulum.make(name: str, actuators: List[str] = None, sensors: List[str] = None, states: List[str] = None, rate: float = 30.0, render_shape: List[int] = None, render_fn: str = None)

class Pendulum:
   make(name: str, actuators: List[str] = None, sensors: List[str] = None, states: List[str] = None, rate: float = 30.0, render_shape: List[int] = None, render_fn: str = None):
      sensors:
       - theta: Space(-999.0, 999.0, (), float32)
       - theta_dot: Space(-999.0, 999.0, (), float32)
       - image: Space(uint8)
       - u_applied: Space([-2.], [2.], (1,), float32)
      actuators:
       - u: Space([-2.], [2.], (1,), float32)
      engine_states:
       - model_state: Space([-3.14 -9.  ], [3.14 9.  ], (2,), float32)
       - model_paramete

The printed info shows, amongst other things, the sensors, actuators and states of the *Pendulum* and their associated spaces.

For example, the sensor `theta` has data type [np.float32](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.single) and its values are expected to lie within [-999, 999]. From this, we can already see that `theta` is unwrapped (i.e. not wrapped to [$-\pi$, $\pi$]). We will see later on that this hampers learning, so we will introduce a processor that wraps the angle.
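
Why an unwrapped angle is a problem can be seen with a small worked example. The cost used later in this tutorial penalizes the observed angle as `th**2`. Two angles that differ by a full revolution describe exactly the same pendulum pose, yet without wrapping they would be penalized very differently. The two angles below are arbitrary examples.

```python
import numpy as np

theta_a = 0.1              # 0.1 rad away from upright
theta_b = 0.1 + 2 * np.pi  # the same pose, one full revolution later

# Both angles describe the same physical pose ...
print(np.isclose(np.cos(theta_a), np.cos(theta_b)), np.isclose(np.sin(theta_a), np.sin(theta_b)))  # True True

# ... but an angle penalty of th**2 treats them very differently.
print(theta_a**2)  # about 0.01
print(theta_b**2)  # about 40.7
```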

The actuator `u` also has data type [np.float32](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.single) and accepts values within [-2, 2].

To specify a `Space`, you must always specify the data type, so that eagerx knows how to (de)serialize messages when running distributed. You are encouraged to also specify the expected `shape` and bounds (`low`, `high`) whenever possible, so that eagerx can perform checks on individual parts of your environment and provide informative error messages. Moreover, if an `actuator`/`sensor` is connected to the environment as an `action`/`observation`, you are required to fully specify the `Space` so that eagerx can infer the action and observation spaces. Sometimes this is not possible in advance, as is the case with sensor `image`: its associated space only specifies the data type [np.uint8](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.ubyte), because the shape can only be inferred once the `render_shape` argument is passed to the `Pendulum.make` method.
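
The idea of a partially specified space can be sketched with a hypothetical dataclass: only the data type is mandatory, bounds and shape may stay unknown until more information (such as `render_shape`) is available, and only a fully specified space could serve as an action or observation space. `SpaceSketch` is an illustrative stand-in, not eagerx's `Space` class.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpaceSketch:
    """Hypothetical stand-in for a space whose bounds and shape are optional."""

    dtype: str                               # always required
    low: Optional[float] = None              # optional lower bound
    high: Optional[float] = None             # optional upper bound
    shape: Optional[Tuple[int, ...]] = None  # optional shape

    def is_fully_specified(self) -> bool:
        # Needed before the space can act as an action or observation space.
        return self.low is not None and self.high is not None and self.shape is not None


# Like sensor `image`: only the data type is known when the object is defined.
image_space = SpaceSketch(dtype="uint8")
print(image_space.is_fully_specified())  # False

# Once render_shape is known, bounds and shape can be filled in (3 color channels).
render_shape = [480, 480]
image_space = replace(image_space, low=0, high=255, shape=(*render_shape, 3))
print(image_space.is_fully_specified())  # True
print(image_space.shape)                 # (480, 480, 3)
```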

In [3]:
#@markdown Let's make the *Pendulum* object and inspect the space for the `image` sensor.

pendulum = Pendulum.make("pendulum", actuators=["u"], sensors=["theta", "theta_dot", "image"], states=["model_state"], render_shape=[480, 480])
pendulum.sensors.image

Params for ('pendulum', 'sensors', 'image'): 

processor: null
rate: 15.0
space:
  dtype: uint8
  high: 255
  low: 0
  shape:
  - 480
  - 480
  - 3

After making the pendulum and inspecting the `image` sensor, we see that its space is now fully defined: data type [np.uint8](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.ubyte), values expected to lie within [0, 255], and a shape of [480, 480, 3]. This shape corresponds to the `render_shape` argument we provided to the `make` method, plus a trailing dimension of 3 color channels.

In some cases, you may want to modify the space (i.e. modify the bounds `low`/`high`, the `dtype`, or the `shape`) to fit the context of a specific environment after having made the specification of an `Object` or `Node`. For example, we can easily modify the `shape` of sensor `image` as follows:


In [4]:
pendulum.sensors.image.space.shape = [500, 500, 3]
pendulum.sensors.image

Params for ('pendulum', 'sensors', 'image'): 

processor: null
rate: 15.0
space:
  dtype: uint8
  high: 255
  low: 0
  shape:
  - 500
  - 500
  - 3

When we connect the object within a [graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html?highlight=graph) to, for example, another object or node, we should make sure that at least the data types match (else an error is raised). In addition, we probably want the `shape` and bounds (`low`/`high`) to match as well (though this is not strongly enforced), as otherwise unexpected errors may occur.
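
The rule above (a dtype mismatch is an error, whereas shape and bound mismatches are only worth flagging) can be sketched as follows. `check_connection` is a hypothetical helper operating on plain dictionaries; eagerx performs its own checks internally, and this is not its implementation.

```python
import warnings

import numpy as np


def check_connection(src: dict, tgt: dict) -> None:
    """Hypothetical check for connecting a source space to a target space."""
    # Data types must match: a mismatch is treated as an error.
    if np.dtype(src["dtype"]) != np.dtype(tgt["dtype"]):
        raise TypeError(f"dtype mismatch: {src['dtype']} -> {tgt['dtype']}")
    # Shape and bounds should match too, but a mismatch only triggers a warning.
    for key in ("shape", "low", "high"):
        if not np.array_equal(np.asarray(src[key]), np.asarray(tgt[key])):
            warnings.warn(f"{key} mismatch: {src[key]} -> {tgt[key]}")


src = {"dtype": "float32", "shape": (1,), "low": -2.0, "high": 2.0}

check_connection(src, dict(src))             # identical spaces: passes silently
check_connection(src, {**src, "high": 3.0})  # emits a UserWarning about `high`
try:
    check_connection(src, {**src, "dtype": "float64"})
except TypeError as err:
    print(err)  # dtype mismatch: float32 -> float64
```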

It may happen that the output of an object does not perfectly fit our desired format. In this tutorial, for example, we would like the angle sensor measurements `theta` to be wrapped so that they always lie within [$-\pi$, $\pi$]. This is easily achieved with a [Processor](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html). Processors preprocess incoming or outgoing messages into the desired format (e.g. changing the data type, modifying the shape, changing the bounds). This avoids having to modify the `Pendulum` object's code, and increases modularity and compatibility between the various objects defined within eagerx. **Note! The user is responsible for modifying the corresponding Space if a Processor is used.**

Next, we will define a new processor that wraps sensor `theta` so that it always lies between [$-\pi$, $\pi$] by subclassing [Processor](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html) and specifying the following:
- [make()](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html#eagerx.core.entities.Processor.make): Makes the parameter specification of the processor.
- [initialize()](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html#eagerx.core.entities.Processor.initialize): Initializes the processor.
- [convert()](https://eagerx.readthedocs.io/en/master/guide/api_reference/processor/index.html#eagerx.core.entities.Processor.convert): Converts a message.


In [5]:
from eagerx.core.specs import ProcessorSpec
import numpy as np


class WrappingProcessor(eagerx.Processor):

    @classmethod
    def make(cls, dtype: str = "float32") -> ProcessorSpec:
        spec = cls.get_specification()  # Creates a base parameter specification object
        spec.config.dtype = dtype  # Adds dtype parameter to specification.
        return spec

    def initialize(self, spec: ProcessorSpec):
        self.dtype = spec.config.dtype

    def convert(self, theta: np.ndarray):
        
        # START EXERCISE 1.1
        # Instead of the normalized angle, the convert method should return the decomposed angle: [cos(theta), sin(theta)].  
        processed_theta = theta - 2 * np.pi * np.floor((theta + np.pi) / (2 * np.pi))
        # END EXERCISE 1.1
        
        return processed_theta.astype(self.dtype)
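
To see what `convert` does before the exercise changes it, the wrapping expression can be evaluated on its own with NumPy. Angles already inside the interval are left unchanged, others are shifted by a multiple of $2\pi$. Note that $\pi$ itself is mapped to $-\pi$, so strictly speaking the result lies in the half-open interval [$-\pi$, $\pi$). The sample angles are arbitrary.

```python
import numpy as np


def wrap(theta):
    # Same expression as in WrappingProcessor.convert (before Exercise 1.1).
    return theta - 2 * np.pi * np.floor((theta + np.pi) / (2 * np.pi))


thetas = np.array([0.0, 3.0, 4.0, -4.0, 7.0, np.pi])
wrapped = wrap(thetas)
print(wrapped)

# The wrapped angle describes the same pose: cosine and sine are unchanged.
print(np.allclose(np.cos(wrapped), np.cos(thetas)), np.allclose(np.sin(wrapped), np.sin(thetas)))  # True True
```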

In [6]:
#@markdown We make the specification of this processor by calling classmethod `make` in a similar way to how we made the specification for Object `Pendulum`. We then proceed to add the processor to sensor `theta` and modify its space accordingly.

# Add a processor to sensor `theta`.
pendulum.sensors.theta.processor = WrappingProcessor.make(dtype="float32")

# START EXERCISE 1.2
# Modify the bounds `low`, `high` according to the expected decomposed angle bounds.
pendulum.sensors.theta.space.low = -np.pi
pendulum.sensors.theta.space.high = np.pi
pendulum.sensors.theta.space.shape = []  # For now, shape==[] because the wrapped angle is simply a scalar. Modify for the decomposed angle!
# END EXERCISE 1.2

In [7]:
#@markdown Next we will construct the graph with the Pendulum similar to the previous tutorials.

# Define rate in Hz
rate = 30.0

# Initialize empty graph
graph = eagerx.Graph.create()

# Add pendulum to the graph
graph.add(pendulum)

# Connect the pendulum to an action and observation
# We will now explicitly set the window size
graph.connect(action="voltage", target=pendulum.actuators.u, window=1)
graph.connect(source=pendulum.sensors.theta, observation="angle", window=1)
graph.connect(source=pendulum.sensors.theta_dot, observation="angular_velocity", window=1)

# Render image
graph.render(source=pendulum.sensors.image, rate=rate)

# Make OdeEngine
from eagerx_ode.engine import OdeEngine
engine = OdeEngine.make(rate=rate)
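
The `window` argument in the `connect` calls above controls how many messages are buffered per action or observation (windows are covered in more detail in another tutorial). Assuming a window simply keeps the `window` most recent messages, it behaves much like a bounded queue, and `step()` later picks the most recent entry with `[-1]`. The standard-library sketch below uses `collections.deque` as an analogy; it is not how eagerx buffers messages internally.

```python
from collections import deque

import numpy as np

# A window of size 3 keeps only the three most recent messages.
window = deque(maxlen=3)
for msg in [0.1, 0.2, 0.3, 0.4]:  # e.g. four consecutive angle measurements
    window.append(np.float32(msg))

observation = np.array(window)  # shape (3,); the oldest message (0.1) was dropped
latest = observation[-1]        # most recent message, like observation["angle"][-1]
print(observation.shape, latest)

# With window=1, as in the graph above, only the latest message remains.
window_1 = deque([np.float32(m) for m in [0.1, 0.2, 0.3]], maxlen=1)
print(np.array(window_1).shape)  # (1,)
```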

In [8]:
#@markdown Finally, we will define the environment and create it like we did in the previous tutorials.

from typing import Dict


class PendulumEnv(eagerx.BaseEnv):
    def __init__(self, name: str, rate: float, graph: eagerx.Graph, engine: eagerx.Engine, render_mode="human"):
        """Initializes an environment with EAGERx dynamics.

        :param name: The name of the environment. Everything related to this environment
                     (parameters, topics, nodes, etc...) will be registered under namespace: "/[name]".
        :param rate: The rate (Hz) at which the environment will run.
        :param graph: The graph consisting of nodes and objects that describe the environment's dynamics.
        :param engine: The physics engine that will govern the environment's dynamics.
        :param render_mode: Defines the render mode (e.g. "human", "rgb_array").
        """
        # Make the backend specification
        from eagerx.backends.single_process import SingleProcess
        backend = SingleProcess.make()
        
        # Maximum episode length
        self.max_steps = 100
        
        # Step counter
        self.steps = None
        super().__init__(name, rate, graph, engine, backend, force_start=True, render_mode=render_mode)
    
    def step(self, action: Dict):
        """A method that runs one timestep of the environment's dynamics.

        :param action: A dictionary of actions provided by the agent.
        :returns: A tuple (observation, reward, terminated, truncated, info).

              - observation: Observations of the current timestep that comply with
                             the :func:`~eagerx.core.env.BaseEnv.observation_space`.

              - reward: amount of reward returned after previous action

              - terminated: whether the episode has ended due to a terminal state, in which case further step() calls will
                            return undefined results

              - truncated: whether the episode has ended due to a time limit, in which case further step() calls will
                           return undefined results

              - info: contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
        """
        # Take step
        observation = self._step(action)
        self.steps += 1
        
        # Get angle and angular velocity
        # Take the last (most recent) element of each observation window (windows are covered in another tutorial)

        # START EXERCISE 1.3
        # Reconstruct theta, since it is no longer observed directly by the agent.
        th = observation["angle"][-1]
        # END EXERCISE 1.3

        thdot = observation["angular_velocity"][-1]
        
        # Convert the 1-element numpy array to a Python float
        u = float(action["voltage"][0])

        # Calculate cost
        # Penalize angle error, angular velocity and input voltage
        cost = th**2 + 0.1 * (thdot / (1 + 10 * abs(th))) ** 2 + 0.01 * u ** 2  

        # Determine whether the episode is over
        # Currently just a timeout once more than `max_steps` steps have been taken
        terminated = False
        truncated = self.steps > self.max_steps
        info = {}

        # Render
        if self.render_mode == "human":
            self.render()
        return observation, -cost, terminated, truncated, info
    
    def reset(self, seed=None, options=None):
        """Resets the environment to an initial state and returns an initial observation.

        :returns: A tuple (observation, info) with the initial observation and an (empty) info dictionary.
        """
        # Determine reset states
        states = self.state_space.sample()
            
        # Perform reset
        observation = self._reset(states)
        info = {}

        # Reset step counter
        self.steps = 0
        
        # Render
        if self.render_mode == "human":
            self.render()
        return observation, info

    
# Initialize Environment
env = PendulumEnv(name="PendulumEnv", rate=rate, graph=graph, engine=engine)

[WARN]: Backend 'SINGLE_PROCESS' does not support multiprocessing, so all nodes are launched in the ENVIRONMENT process.


When we print the `action_space` and `observation_space` that eagerx infers from the spaces, we notice that, indeed, the agent uses the processed space for `theta`.

In [9]:
# Print action & observation space
print("action_space: ", env.action_space)
print("observation_space: ", env.observation_space)

action_space:  Dict('voltage': Space([-2.], [2.], (1,), float32))
observation_space:  Dict('angle': Box(-3.1415927, 3.1415927, (1,), float32), 'angular_velocity': Box(-999.0, 999.0, (1,), float32))
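
The cost in `PendulumEnv.step()` combines three penalties. Evaluating the same expression for a few hand-picked states helps to see its scale; `pendulum_cost` is a hypothetical standalone copy of the formula, and the inputs are arbitrary. Because the velocity term is divided by $1 + 10|\theta|$, angular velocity appears to be penalized mostly close to the upright position, which leaves the agent free to build up speed while swinging up.

```python
def pendulum_cost(th: float, thdot: float, u: float) -> float:
    """Standalone copy of the cost expression used in PendulumEnv.step()."""
    return th**2 + 0.1 * (thdot / (1 + 10 * abs(th))) ** 2 + 0.01 * u**2


print(pendulum_cost(0.0, 0.0, 0.0))  # 0.0: upright, at rest, no voltage
print(pendulum_cost(0.0, 2.0, 0.0))  # 0.4: velocity is fully penalized when upright
print(pendulum_cost(3.0, 2.0, 0.0))  # about 9.0004: the same velocity barely counts far from upright
print(pendulum_cost(0.0, 0.0, 2.0))  # 0.04: maximum voltage
```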


In [10]:
#@title Training

#@markdown Finally, we will wrap the environment and train the agent using [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/), again similar to the first two tutorials.

import stable_baselines3 as sb3
from stable_baselines3.common.env_checker import check_env
from eagerx.wrappers import Flatten

# Stable Baselines3 expects flattened actions & observations
# Convert observation and action space from Dict() to Box(), normalize actions
env = Flatten(env)
env = helper.RescaleAction(env, min_action=-1.0, max_action=1.0)

# Check that env follows Gym API and returns expected shapes
check_env(env)

# Initialize learner
model = sb3.SAC("MlpPolicy", env, verbose=1)

# Train for 150 s of simulated time (150 s * 30 Hz = 4500 steps)
model.learn(total_timesteps=int(150 * rate))

env.shutdown()

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 101      |
|    ep_rew_mean     | -790     |
| time/              |          |
|    episodes        | 4        |
|    fps             | 61       |
|    time_elapsed    | 6        |
|    total_timesteps | 404      |
| train/             |          |
|    actor_loss      | 14.3     |
|    critic_loss     | 1.22     |
|    ent_coef        | 0.914    |
|    ent_coef_loss   | -0.143   |
|    learning_rate   | 0.0003   |
|    n_updates       | 303      |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 101      |
|    ep_rew_mean     | -789     |
| time/              |          |
|    episodes        | 8        |
|    fps             | 59       |
|    time_elapsed    | 13       |
|    total_timesteps | 808      |
| train/             
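
In the training cell, `helper.RescaleAction(env, min_action=-1.0, max_action=1.0)` lets the agent act in [-1, 1] while the actuator `u` expects voltages in [-2, 2]. Assuming the helper applies the usual affine map (as gym's `RescaleAction` wrapper does), the conversion looks like the sketch below; `rescale` is a hypothetical function, not the helper's source.

```python
import numpy as np


def rescale(a, min_action=-1.0, max_action=1.0, low=-2.0, high=2.0):
    """Affine map from the agent's range [min_action, max_action] to the actuator range [low, high]."""
    a = np.asarray(a, dtype=np.float64)
    return low + (high - low) * (a - min_action) / (max_action - min_action)


print(rescale([-1.0, 0.0, 0.5, 1.0]))  # [-2.  0.  1.  2.]
```

A symmetric agent range around zero is a common choice because many policy networks squash their outputs into [-1, 1].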

# Exercises

In these exercises you will improve the sample efficiency of the learning problem by modifying the processor.

For these exercises, you will need to modify or add some lines of code in the cells above.
These lines are indicated by the following comments:

```python
# START EXERCISE [BLOCK_NUMBER]

# END EXERCISE [BLOCK_NUMBER]
```

However, feel free to play with the other code as well if you are interested.
We recommend that you restart and run all code after each section (in Colab there is the option *Restart and run all* under *Runtime*).


## 1. Angle Decomposition

In the code as provided above, we reduced the observation space by normalizing (wrapping) $\theta$ to [$-\pi$, $\pi$].
This improves the sample efficiency, but we can do even better.
Normalizing $\theta$ results in discontinuous observations of $\theta$: the observation jumps from $\pi$ to $-\pi$ when the angle increases past $\pi$, and from $-\pi$ to $\pi$ when it decreases past $-\pi$.
Many (reinforcement) learning algorithms have difficulties with such discontinuities.
Therefore, it is better to choose a representation of $\theta$ without discontinuities, e.g. its cosine and sine components: $[\cos(\theta), \sin(\theta)]$.
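
The difference between the two representations can be checked numerically. The sketch below sweeps $\theta$ in steps of 0.01 rad through $\pi$ and measures the largest change between consecutive samples: the wrapped angle jumps by almost $2\pi$, whereas the point $[\cos(\theta), \sin(\theta)]$ only moves by about 0.01. The sweep range is arbitrary.

```python
import numpy as np


def wrap(theta):
    # Wrapping expression from WrappingProcessor.convert, maps to [-pi, pi).
    return theta - 2 * np.pi * np.floor((theta + np.pi) / (2 * np.pi))


# Sweep the angle in steps of 0.01 rad through pi.
thetas = np.linspace(np.pi - 0.05, np.pi + 0.05, 11)

wrapped = wrap(thetas)
decomposed = np.stack([np.cos(thetas), np.sin(thetas)], axis=-1)  # shape (11, 2)

# Largest change between consecutive samples in each representation.
max_jump_wrapped = np.max(np.abs(np.diff(wrapped)))
max_jump_decomposed = np.max(np.linalg.norm(np.diff(decomposed, axis=0), axis=-1))
print(max_jump_wrapped)     # close to 2*pi
print(max_jump_decomposed)  # about 0.01
```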


### Add your code to the following blocks: 

1.1 Instead of the normalized angle, the `convert` method should return the decomposed angle: $[\cos(\theta), \sin(\theta)]$.  
1.2 The bounds (`low`, `high`) and `shape` of the space of `theta` should be updated accordingly.  
1.3 The `step()` method should be updated as well. Reconstruct $\theta$, since it is no longer observed directly by the agent.  