# Tutorial 2: Specifying the Reset and Step Function of the Environment

In this tutorial, we will show how to create a gym environment using [EAGERx](https://eagerx.readthedocs.io/en/master/) while specifying the [step function](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.step_fn) and [reset function](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset_fn).

This tutorial covers the following concepts of EAGERx:
- Extracting observations in the [step_fn](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.step_fn)
- Resetting states using the [reset_fn](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset_fn)
- The `window` argument of the [connect method](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html?highlight=connect#eagerx.core.graph.Graph.connect)
- Simulating delays using the `delay` argument of the [connect method](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html?highlight=connect#eagerx.core.graph.Graph.connect)

The remainder of this tutorial covers these concepts in more detail.

## Pendulum Swing-up

We will create an environment for solving the classic control problem of swinging up an underactuated pendulum, very similar to the [Pendulum-v0 environment](https://gym.openai.com/envs/Pendulum-v0/).
Our goal is to swing up this pendulum to the upright position and keep it there, while minimizing the velocity of the pendulum and the input voltage.

Since the dynamics of a pendulum actuated by a DC motor are well known, we can simulate the pendulum by integrating the corresponding ordinary differential equations (ODEs):


$\mathbf{x} = \begin{bmatrix} \theta \\ \dot{\theta} \end{bmatrix} \\ \dot{\mathbf{x}} = \begin{bmatrix} \dot{\theta} \\ \frac{1}{J}(\frac{K}{R}u - mgl \sin{\theta} - b \dot{\theta} - \frac{K^2}{R}\dot{\theta})\end{bmatrix}$

with $\theta$ the angle w.r.t. upright position, $\dot{\theta}$ the angular velocity, $u$ the input voltage, $J$ the inertia, $m$ the mass, $g$ the gravitational constant, $l$ the length of the pendulum, $b$ the motor viscous friction constant, $K$ the motor constant and $R$ the electric resistance.
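As a rough illustration of what "integrating the ODEs" means, the sketch below transcribes the right-hand side of the equation above term by term and advances it with a simple forward Euler step. It is not the integrator used by EAGERx; the function names and all parameter values are placeholders chosen for this example only.

```python
import math


def pendulum_ode(theta, theta_dot, u, J, m, l, b, K, R, g=9.81):
    """Right-hand side of the pendulum ODE: returns (d(theta)/dt, d(theta_dot)/dt)."""
    theta_ddot = (
        K / R * u
        - m * g * l * math.sin(theta)
        - b * theta_dot
        - K**2 / R * theta_dot
    ) / J
    return theta_dot, theta_ddot


def euler_step(theta, theta_dot, u, dt, **params):
    """Advance the state by one forward Euler step of size dt (seconds)."""
    d_theta, d_theta_dot = pendulum_ode(theta, theta_dot, u, **params)
    return theta + dt * d_theta, theta_dot + dt * d_theta_dot


# Placeholder parameters (assumed values, not taken from the Pendulum object)
params = dict(J=4e-4, m=0.05, l=0.04, b=1e-5, K=0.05, R=8.0)

# Simulate one second at 30 Hz with zero input voltage
theta, theta_dot = 0.1, 0.0
for _ in range(30):
    theta, theta_dot = euler_step(theta, theta_dot, u=0.0, dt=1 / 30, **params)
```

In practice a higher-order ODE solver is preferable to forward Euler; the Euler step is used here only because it makes the update rule explicit.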

## Notebook Setup

In order to be able to run the code, we need to install the *eagerx_tutorials* package and ROS.

In [5]:
try:
    import eagerx_tutorials
except ImportError:
    !{"pip install eagerx-tutorials  >> /tmp/eagerx_install.txt"}
if 'google.colab' in str(get_ipython()):
  !{"curl 'https://raw.githubusercontent.com/eager-dev/eagerx_tutorials/master/scripts/setup_colab.sh' > ~/setup_colab.sh"}
  !{"bash ~/setup_colab.sh"}

# Setup interactive notebook
# Required in interactive notebooks only.
from eagerx_tutorials import helper
helper.setup_notebook()
env = None

# Allows reloading of registered entities from changed files
# Required in interactive notebooks only.
%reload_ext autoreload
%autoreload 1

Not running on CoLab.
Execute ROS commands as "!...".
ROS noetic available.


## Let's get started

We start by importing the required packages and initializing EAGERx.

In [6]:
import eagerx
import eagerx_tutorials.pendulum  # Registers Pendulum
import eagerx_ode  # Registers OdeBridge

# Initialize eagerx (starts roscore if not already started.)
eagerx.initialize("eagerx_core")

... logging to /home/jelle/.ros/log/9f2e2914-c14f-11ec-899a-774a4722e19d/roslaunch-jelle-Alienware-m15-R4-66309.log
started roslaunch server http://145.94.60.89:41743/
ros_comm version 1.15.14


SUMMARY

PARAMETERS
 * /rosdistro: noetic
 * /rosversion: 1.15.14

NODES



[WARN] [1650544220.688696]: Roscore cannot run as another roscore/master is already running. Continuing without re-initializing the roscore.


Next, we make the *Pendulum* object and add it to an empty graph, just like we did in the [first tutorial](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/pendulum_1.ipynb).

We will again connect the *voltage* actuator of the *Pendulum* to an action that we will call *voltage_action* and connect the *angle_sensor* to an observation, which we will call *angle_observation*.
However, we will now go a bit more into detail on the [connect method](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html?highlight=connect#eagerx.core.graph.Graph.connect).
When connecting outputs, sensors or actions, we can specify, among other things, the `window` of the connection.
It determines which of the messages received by a node are available when its callback is called.
In some cases it makes sense to use only the last message; in others you would like to receive all messages that arrived between calls.
This can be achieved by setting the `window` size:

- `window` $= 1$: Only the last received input message is available to the receiver.
- `window` $= x > 1$: The last $x$ received input messages are available to the receiver ($1 \le$ number of received messages $\le$ `window`).
- `window` $= 0$: All input messages received since the last call to the node's callback are available.

This is particularly relevant when connecting to observations, since it affects the size of the observation space.
When connecting to an observation with `window` $= 0$, this observation will **not** be included in the observation space of the agent, because its dimensions might change every time step and are therefore unknown beforehand.
Also note that for observations with `window` $= x > 1$, at time step $t < x$ the first message is repeated $x - t$ times to ensure that the dimensions of the observation space are consistent.
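The window semantics described above can be mimicked with plain Python lists. The helpers below are hypothetical and are not part of the EAGERx API; they only illustrate which messages a receiver would see for a given `window`, and how an observation could be padded during the first time steps. Storing messages oldest first is an assumption made for this sketch.

```python
def available_messages(history, num_new, window):
    """Return the messages a callback would see for a given window size.

    history: all messages received so far, oldest first.
    num_new: how many of them arrived since the previous callback.
    """
    if window == 0:
        # All messages since the previous callback (possibly none)
        return history[len(history) - num_new:]
    # Only the last `window` messages (fewer if not enough were received yet)
    return history[-window:]


def pad_observation(messages, window):
    """Repeat the first message so that exactly `window` entries are returned."""
    missing = window - len(messages)
    return [messages[0]] * missing + list(messages)


history = [0.1, 0.2, 0.3, 0.4]  # three of these arrived since the last callback
print(available_messages(history, 3, window=1))  # [0.4]
print(available_messages(history, 3, window=2))  # [0.3, 0.4]
print(available_messages(history, 3, window=0))  # [0.2, 0.3, 0.4]
print(pad_observation([0.1], window=3))          # [0.1, 0.1, 0.1]
```

Note that only the `window` $= 0$ case returns a list whose length depends on how many messages happened to arrive, which is why such an observation cannot be given a fixed-size space.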

In [13]:
# Define rate (Hz)
rate = 30.0

# Make pendulum
pendulum = eagerx.Object.make("Pendulum", "pendulum", actuators=["voltage"], sensors=["angle_sensor"], states=["model_state"])

# Initialize empty graph
graph = eagerx.Graph.create()

# Add pendulum to the graph
graph.add(pendulum)

# Connect the pendulum to an action and observation
# We will now explicitly set the window size
graph.connect(action="voltage_action", target=pendulum.actuators.voltage, window=1)
graph.connect(source=pendulum.sensors.angle_sensor, observation="angle_observation", window=1)

# Make OdeBridge
bridge = eagerx.Bridge.make("OdeBridge", rate=rate)

Using the [*eagerx_gui* package](https://github.com/eager-dev/eagerx_gui), we see that the graph looks as follows:


```python
graph.gui()
```
<img src="./figures/tutorial_1_gui.svg" width=720>

We will now define the [step function](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.step_fn).
Here we define the `reward` and fill the `info` dictionary at each time step.
Since we want to stabilize the pendulum in the upright position while minimizing the angular velocity and the input voltage, we define the reward as the negative of a weighted sum of $\theta^2$, $\dot{\theta}^2$ and $u^2$.
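
Written out with the weights that the step function in this tutorial uses (1, 0.1 and 0.001), the reward at each time step is:

$r = -\left(\theta^2 + 0.1\,\dot{\theta}^2 + 0.001\,u^2\right)$

where $\theta$ is first normalized to one revolution around the upright position, so that the penalty does not grow with the number of full rotations.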

We will elaborate a bit more on this step function.
The step function is an argument to the [EagerxEnv](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv).
This function is called by the EAGERx environment at every time step and returns the same values as the `step()` method of OpenAI Gym environments, i.e. `observation` (**dict**), `reward` (**float**), `done` (**boolean**) and `info` (**dict**).
More information on this can be found [here](https://gym.openai.com/docs/#observations).
The inputs to the step function in EAGERx are:

- `previous_observation` (**dict**): The `observation` at the previous timestep.
- `observation` (**dict**): The `observation` at the current timestep.
- `action` (**dict**): The agent's action at the current timestep. 
- `steps` (**int**): The number of timesteps since the start of the episode (since the last reset).

Note that `observation` is both an input and an output of this function. It should only be used for extracting information and should not be modified.

The keys of the `observation` and `action` dictionaries correspond, respectively, to the values of the `observation` and `action` arguments provided to the [connect method](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html?highlight=connect#eagerx.core.graph.Graph.connect).
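
To make the dictionary layout concrete, the sketch below builds dummy `observation` and `action` dictionaries from plain Python lists. EAGERx passes NumPy arrays rather than lists, and the numeric values and the exact array shapes are assumptions made for illustration. The keys are the names passed to `connect`, and with `window=1` the observation holds a single message along its first dimension.

```python
# Dummy data for illustration only; EAGERx supplies NumPy arrays instead of lists.
observation = {"angle_observation": [[0.5, -0.1]]}  # window=1 -> one message
action = {"voltage_action": [1.5]}  # assumed to hold a single value

# Index 0 selects the single message kept by window=1
th, thdot = observation["angle_observation"][0]
u = float(action["voltage_action"][0])
```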


In [None]:
import numpy as np
from typing import Dict

# Define step function
def step_fn(previous_observation: Dict[str, np.ndarray], observation: Dict[str, np.ndarray], action: Dict[str, np.ndarray], steps: int):
    
    # Get observation and action
    # Take first element because of window size
    state = observation["angle_observation"][0] 
    # Convert from numpy array to float
    u = float(action["voltage_action"])
    
    # Get angle and angular velocity
    th, thdot = state
    
    # Normalize angle so it lies in [-pi, pi)
    th -= 2 * np.pi * np.floor((th + np.pi) / (2 * np.pi))
    
    # Calculate cost
    # Penalize angle error, angular velocity and input voltage
    cost = th**2 + 0.1 * thdot**2 + 0.001 * u**2  
    
    # Determine when the episode is over
    # Currently just a timeout once more than 100 steps have elapsed
    done = steps > 100
    
    # Set info, tell the algorithm the termination was due to a timeout
    # (the episode was truncated)
    info = {"TimeLimit.truncated": steps > 100}
    
    return observation, -cost, done, info
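
The angle normalization and the cost computed in the step function can be checked in isolation. The sketch below reproduces both with the standard `math` module instead of NumPy; the helper names `wrap_angle` and `cost` are introduced here for illustration and do not exist in EAGERx.

```python
import math


def wrap_angle(th):
    """Map an angle in radians to the half-open interval [-pi, pi)."""
    return th - 2 * math.pi * math.floor((th + math.pi) / (2 * math.pi))


def cost(th, thdot, u):
    """Weighted sum of squared angle error, angular velocity and input voltage."""
    th = wrap_angle(th)
    return th**2 + 0.1 * thdot**2 + 0.001 * u**2


# Three quarters of a turn is the same pose as minus a quarter turn
quarter_turn_back = wrap_angle(1.5 * math.pi)

# A full revolution away from upright is penalized like the upright pose itself
cost_after_full_turn = cost(2 * math.pi, 0.0, 0.0)
```

Without the normalization, a pendulum that has spun around several times would keep receiving a large angle penalty even when it is standing upright.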