In [2]:
import helper
if  'google.colab' in str(get_ipython()):
    !{"pip install eagerx-tutorials  >> /tmp/eagerx_install.txt"}
    !{helper.get_tutorial_path() + "/../scripts/setup_colab.sh"}

# Setup interactive notebook
helper.setup_notebook()

# Required in interactive notebooks only.
# Allows reloading of registered entities from changed files
%load_ext autoreload
%autoreload 1

Unsupported Ubuntu version: 20.04
This colab setup script only works with 16.04, 18.04, or 20.04
Not running on CoLab
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Tutorial 1: EAGERx Environment Creation and Training

In this tutorial, we will show a simple example of how to create a Gym environment using [EAGERx](https://eagerx.readthedocs.io/en/master/).
We will then use this environment to train a policy using [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/).

The aim of this tutorial is to show some of the key concepts of EAGERx:
- Creating a [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) with an [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html)
- Using this [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) and a [Bridge](https://eagerx.readthedocs.io/en/master/guide/api_reference/bridge/index.html) to create an [EAGERx Environment](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html)
- Training a policy with the [EAGERx Environment](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html)

In the remainder of this tutorial, we will go into more detail on these concepts.


## Pendulum Swing-up

We will create an environment for solving the classic control problem of swinging up an underactuated pendulum, very similar to the [Pendulum-v0 environment](https://gym.openai.com/envs/Pendulum-v0/).
Our goal is to swing up this pendulum to the upright position and keep it there, while minimizing the velocity of the pendulum and the input voltage.

Since the dynamics of a pendulum actuated by a DC motor are well known, we can simulate the pendulum by integrating its equations of motion:


<img src="./figures/eom.svg" />

where theta is the angle with respect to the upright position, theta dot the angular velocity, u the input voltage, J the inertia, m the mass, g the gravitational acceleration, l the length of the pendulum, b the motor viscous friction constant, K the motor constant, and R the electrical resistance.
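
The equations of motion are only available as a figure here, so the following is a minimal sketch under an assumption: it uses a common DC-motor pendulum model, `J * theta_ddot = m*g*l*sin(theta) + (K/R)*u - (b + K**2/R)*theta_dot`, with theta measured from upright. The parameter values and the names `pendulum_ode`, `rk4_step` and `simulate` are hypothetical and chosen for illustration; they are not taken from EAGERx.

```python
import math

# Hypothetical parameter values, for illustration only.
PARAMS = dict(J=0.000189, m=0.0563, l=0.0437, b=3.6e-05, K=0.0502, R=9.83)
G = 9.81  # gravitational acceleration [m/s^2]


def pendulum_ode(state, u, J, m, l, b, K, R):
    """Assumed model: theta is the angle w.r.t. upright, u the input voltage."""
    theta, theta_dot = state
    torque = m * G * l * math.sin(theta) + (K / R) * u - (b + K**2 / R) * theta_dot
    return (theta_dot, torque / J)


def rk4_step(state, u, dt, **params):
    """Advance the state by dt with one classic 4th-order Runge-Kutta step."""

    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))

    k1 = pendulum_ode(state, u, **params)
    k2 = pendulum_ode(shifted(state, k1, dt / 2), u, **params)
    k3 = pendulum_ode(shifted(state, k2, dt / 2), u, **params)
    k4 = pendulum_ode(shifted(state, k3, dt), u, **params)
    return tuple(
        x + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
        for x, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )


def simulate(state, u, duration, dt=1 / 300):
    """Integrate with a constant input voltage u for the given duration [s]."""
    for _ in range(round(duration / dt)):
        state = rk4_step(state, u, dt, **PARAMS)
    return state


# A small offset from upright grows when no voltage is applied,
# because upright is an unstable equilibrium of this model.
theta, theta_dot = simulate((0.01, 0.0), u=0.0, duration=0.1)
print(f"theta after 0.1 s: {theta:.4f} rad")
```

In the tutorial itself, this integration is handled by the simulator, so a sketch like this is only useful for building intuition about the dynamics.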



<img src="./figures/pendulum.GIF" width="480" height="480" />


## How to run this Notebook

EAGERx makes use of ROS 1 functionality, so ROS 1 must be [installed](http://wiki.ros.org/ROS/Installation) on your system.
It must also be sourced:
```bash
source /opt/ros/<distro>/setup.bash
```
where `<distro>` should be replaced with your ROS distribution, e.g. `melodic` or `noetic`.
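
To check from Python whether ROS has been sourced, one option is to look at the `ROS_DISTRO` environment variable, which the ROS setup script exports. The helper below is a minimal sketch; `sourced_ros_distro` is a hypothetical name, and a set variable is an indication rather than a guarantee that the installation works.

```python
import os


def sourced_ros_distro(environ=None):
    """Return the sourced ROS distribution name, or None if none is found.

    Sourcing /opt/ros/<distro>/setup.bash exports ROS_DISTRO, so its
    presence suggests (but does not prove) that ROS is set up.
    """
    environ = os.environ if environ is None else environ
    return environ.get("ROS_DISTRO") or None


distro = sourced_ros_distro()
if distro is None:
    print("ROS does not appear to be sourced in this environment.")
else:
    print(f"ROS distribution: {distro}")
```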

There are two ways to install the Python dependencies: using pip or from source.

### Installation using pip
The Python dependencies can be installed by running (this will also install `eagerx`):
```bash
pip3 install eagerx-tutorials
```

### Installation from source
Clone this repository:
```bash
git clone git@github.com:eager-dev/eagerx_tutorials.git ; cd eagerx_tutorials
```
Then install [Poetry](https://python-poetry.org/) (if not installed yet):
```bash
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
```
Install this package and its dependencies using Poetry (this will also install JupyterLab if it is not installed yet):
```bash
poetry install
```
Then open JupyterLab from within the Poetry environment:
```bash
poetry run jupyter lab
```

## Let's get started

First, we import and initialize EAGERx.
As mentioned before, EAGERx makes use of ROS functionality for communication. During initialization, a ROS master is started if one isn't running already.
Note that we set the log level to `INFO` here; setting it to `DEBUG` gives more output, which can be useful when debugging.

In [3]:
import eagerx
eagerx.initialize("eagerx_core", anonymous=True, log_level=eagerx.log.INFO)

... logging to /home/r2ci/.ros/log/c3d64436-bc18-11ec-acc3-0d651ac60889/roslaunch-r2ci-Alienware-m15-R4-138473.log
started roslaunch server http://127.0.0.1:43629/
ros_comm version 1.15.14


SUMMARY

PARAMETERS
 * /rosdistro: noetic
 * /rosversion: 1.15.14

NODES



[WARN] [1650040092.655931]: Roscore cannot run as another roscore/master is already running. Continuing without re-initializing the roscore.


An Object is an entity within EAGERx that consists of sensors, actuators and states. An actuator is an input to an object, a sensor is an output of an object, and a state is something that we can reset at the beginning of an episode.

We are going to create one object (the pendulum). For this first tutorial, we don't want to go into too much detail, so we start with an existing object. If you are interested, you can find its definition [here](https://github.com/eager-dev/eagerx_tutorials/blob/master/eagerx_tutorials/pendulum/objects.py).
Note that we import the pendulum.
While this might look like an unused import, it is not.
During the import, the pendulum object is registered, so we can make it based on its ID, i.e. *Pendulum*.

In [4]:
import eagerx_tutorials.pendulum  # Registers Pendulum

# Create pendulum
pendulum = eagerx.Object.make(
    "Pendulum", "pendulum", actuators=["voltage"], sensors=["angle_sensor"], states=["model_state"],
)
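
The idea that an import registers an entity, which can then be created by its ID, can be illustrated with a generic registry pattern. The following is a minimal sketch and not EAGERx's actual implementation: `REGISTRY`, `register`, `make` and the toy `Pendulum` class are hypothetical names used only for this illustration.

```python
REGISTRY = {}  # hypothetical global registry: ID -> class


def register(entity_id):
    """Class decorator that stores the decorated class under the given ID."""

    def decorator(cls):
        REGISTRY[entity_id] = cls
        return cls

    return decorator


def make(entity_id, name, **kwargs):
    """Look up a registered class by its ID and instantiate it."""
    if entity_id not in REGISTRY:
        raise KeyError(f"'{entity_id}' is not registered; was its module imported?")
    return REGISTRY[entity_id](name, **kwargs)


# In a real package this class would live in its own module, so that
# importing the module is what triggers the registration.
@register("Pendulum")
class Pendulum:
    def __init__(self, name, sensors=()):
        self.name = name
        self.sensors = list(sensors)


pendulum = make("Pendulum", "pendulum", sensors=["angle_sensor"])
print(type(pendulum).__name__, pendulum.name, pendulum.sensors)
```

With this pattern, forgetting the import means the ID is never added to the registry, which is why an import that looks unused can still be required.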

Next, we create a [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) and add the pendulum to it.

The graph describes how nodes and objects are interconnected.
This makes the creation of an environment modular.
Users can implement nodes and objects once and easily create new environments by reusing these implementations.
It also allows complex environments to be constructed using nodes and objects as basic building blocks.

In [5]:
# Define rate (depends on rate of ode)
rate = 30.0

# Initialize empty graph
graph = eagerx.Graph.create()

# Add pendulum to the graph
graph.add(pendulum)

# Connect the pendulum to an action and observation
graph.connect(action="action", target=pendulum.actuators.voltage)
graph.connect(source=pendulum.sensors.angle_sensor, observation="observation", window=1)

It is also possible to inspect the graph using the eagerx-gui package.
It can be installed as follows:
```bash
pip3 install eagerx-gui
```
Jupyter notebooks have limited support for interactive applications, so we cannot open the GUI here.
But if we were to run
```python
graph.gui()
```
The output would be as follows:
<img src="./figures/tutorial_1_gui.svg">

Next, we will create the [Bridge](https://eagerx.readthedocs.io/en/master/guide/api_reference/bridge/index.html).
Since objects can have implementations for multiple physics engines and real systems, we need to initialize the appropriate bridge.
In our case, we will use the [OdeBridge](https://github.com/eager-dev/eagerx_ode), which allows us to simulate systems based on ordinary differential equations (ODEs).
In other tutorials, we will go into more detail on the bridge and how you can create your own bridge.
For now, what we want to make clear is that the bridge has the `is_reactive` argument.
When set to `True`, the environment will run reactively, which ensures that all messages, e.g. the actions and observations, are synchronized.

In [6]:
import eagerx_ode  # Registers OdeBridge

# Define the bridge
bridge = eagerx.Bridge.make("OdeBridge", rate=rate, is_reactive=True, process=eagerx.process.ENVIRONMENT)

Just like in normal Gym environments, we will create a step function in which we calculate the reward at each time step and check for termination conditions.

Note that we obtain the values of the keys *action* and *observation*, which correspond to the names of the actions and observations shown above in the screenshot of the GUI.

In [7]:
import numpy as np

# Define step function
def step_fn(prev_obs, obs, action, steps):
    # Get observation and action
    state = obs["observation"][0]
    u = action["action"][0]
    
    # Calculate reward
    sin_th, cos_th, thdot = state
    th = np.arctan2(sin_th, cos_th)
    
    cost = th**2 + 0.1 * thdot**2 + 0.001 * u**2
    
    # Determine done flag
    done = steps > 500
    
    # Set info:
    info = dict()
    
    return obs, -cost, done, info
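
To get a feeling for the reward, the cost from `step_fn` can be evaluated for a few hand-picked situations. This is a minimal sketch: `swing_up_cost` is a hypothetical helper that repeats the cost formula above, using `math.atan2` instead of `np.arctan2` to stay dependency-free, and the sample values are made up for illustration.

```python
import math


def swing_up_cost(sin_th, cos_th, thdot, u):
    """Same cost as in step_fn: angle, velocity and input are all penalized."""
    th = math.atan2(sin_th, cos_th)
    return th**2 + 0.1 * thdot**2 + 0.001 * u**2


# Upright and at rest with no input: zero cost, the best possible outcome.
upright = swing_up_cost(sin_th=0.0, cos_th=1.0, thdot=0.0, u=0.0)

# Hanging down and at rest: th = pi, so the cost is pi**2 (about 9.87).
hanging = swing_up_cost(sin_th=0.0, cos_th=-1.0, thdot=0.0, u=0.0)

# Upright, but moving at 2 rad/s with an input of 3 (hypothetical values):
# 0.1 * 2**2 + 0.001 * 3**2 = 0.409.
moving = swing_up_cost(sin_th=0.0, cos_th=1.0, thdot=2.0, u=3.0)

for label, cost in [("upright", upright), ("hanging", hanging), ("moving", moving)]:
    print(f"{label}: cost = {cost:.3f}")
```

Because the reward is the negative cost, it is at most zero, and the angle term dominates unless the pendulum is close to upright.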

Having created a graph, a bridge and a step function, we can now construct the EAGERx environment.

In [8]:
from eagerx.wrappers import Flatten

# Initialize Environment
env = Flatten(eagerx.EagerxEnv(name="rx", rate=rate, graph=graph, bridge=bridge, step_fn=step_fn))

[INFO] [1650040093.062496]: Pre-existing parameters under namespace "/rx" deleted.
[INFO] [1650040093.089659]: Node "/rx/env/supervisor" initialized.
[INFO] [1650040093.248857]: Node "/rx/bridge" initialized.
[INFO] [1650040093.373650]: Node "/rx/environment" initialized.
Error : 'states'
[2022-04-15 18:28:13.382225][13847][register_object][bridge      ][cb_pre_reset] on_error: 'states', None
[2022-04-15 18:28:13.382545][13847][register_object][bridge      ][init_bridge_] on_error: 'states', None
[2022-04-15 18:28:13.383329][13847][register_object][bridge      ][cb_post_rese] on_error: 'states', None
Error : 'states'
Error : 'states'
Error : 'states'


  File "/home/r2ci/.cache/pypoetry/virtualenvs/eagerx-tutorials-t4w5hBSU-py3.8/lib/python3.8/site-packages/rx/core/operators/map.py", line 37, in on_next
    result = _mapper(value)
  File "/home/r2ci/.cache/pypoetry/virtualenvs/eagerx-tutorials-t4w5hBSU-py3.8/lib/python3.8/site-packages/eagerx/core/rx_operators.py", line 966, in get_object_params
    state_params = obj_params["states"]

The environment we have created can be used like any other Gym environment.
We will now train a policy to swing up the pendulum using the Soft Actor-Critic (SAC) reinforcement learning algorithm as implemented in [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/).
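
As a quick sanity check on the training budget used in the next cell, the arithmetic below converts simulated time into environment steps. It is a minimal sketch that assumes an episode lasts roughly 500 steps, as implied by the `steps > 500` condition in `step_fn`; the variable names are chosen for illustration.

```python
rate = 30.0  # environment rate in Hz, as defined earlier in this tutorial

train_seconds = 180  # 3 minutes of simulated time
total_timesteps = int(train_seconds * rate)

step_duration = 1.0 / rate  # simulated seconds per environment step
episode_steps = 500  # assumed approximate episode length
episode_seconds = episode_steps * step_duration
approx_episodes = total_timesteps / episode_steps

print(f"total timesteps: {total_timesteps}")
print(f"episode duration: about {episode_seconds:.1f} s of simulated time")
print(f"episodes in the budget: about {approx_episodes:.1f}")
```

Under these assumptions, the budget amounts to 5400 steps, or on the order of ten episodes.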

In [9]:
import stable_baselines3 as sb

# Initialize learner
model = sb.SAC("MlpPolicy", env, verbose=1, device="cpu")

# Train for 3 minutes (sim time)
model.learn(total_timesteps=int(180 * rate))

env.shutdown()

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
[INFO] [1650040094.024678]: Nodes initialized.


[WARN] [1650040098.170840]: Parameters for object registry request (/rx/bf) not found on parameter server. Timeout: object (/rx/bf) not registered.


[reset] KEYBOARD INTERRUPT


KeyboardInterrupt: 