# Tutorial 1: EAGERx Environment Creation and Training

In this tutorial, we will show a simple example of how to create a gym environment using [EAGERx](https://eagerx.readthedocs.io/en/master/).
Also, we will use this environment to train a policy using [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/).

The aim of this tutorial is to show some of the key concepts of EAGERx:
- Creating a [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) with an [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html)
- Using this [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) and a [Bridge](https://eagerx.readthedocs.io/en/master/guide/api_reference/bridge/index.html) to create an [EAGERx Environment](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html)

In the remainder of this tutorial, we will go into more detail on these concepts.


## Pendulum Swing-up

We will create an environment for solving the classic control problem of swinging up an underactuated pendulum, very similar to the [Pendulum-v0 environment](https://gym.openai.com/envs/Pendulum-v0/).
Our goal is to swing up this pendulum to the upright position and keep it there, while minimizing the velocity of the pendulum and the input voltage.


## How to run this Notebook

EAGERx makes use of ROS 1 functionality, so ROS 1 should be [installed](http://wiki.ros.org/ROS/Installation) on your system.
It should also be sourced:
```bash
source /opt/ros/<distro>/setup.bash
```
Here, `<distro>` should be replaced with your ROS distribution, e.g. `melodic` or `noetic`.
Furthermore, the Python dependencies can be installed by running (this will also install `eagerx`):
```bash
pip install eagerx-tutorials
```
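
Before continuing, it can help to confirm that the packages this tutorial imports are visible to the current Python interpreter. The following is a minimal sketch using only the standard library. The helper name `check_modules` is hypothetical (it is not part of EAGERx), and the module names are the ones imported later in this notebook.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: bool}, True if the top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Top-level modules imported later in this tutorial.
    required = ["eagerx", "eagerx_tutorials", "eagerx_ode", "stable_baselines3", "numpy"]
    for name, found in check_modules(required).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any module is reported as missing, re-running the `pip install` command above in the same environment is a reasonable first step.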

Now we are ready to go!
First, we import and initialize EAGERx.
As mentioned before, EAGERx makes use of ROS functionality for communication, and during initialization a ROS master is started if one is not already running.
Note that we set the log level to `INFO` here; setting it to `DEBUG` produces more output, which can be useful when debugging.

In [1]:
import eagerx
eagerx.initialize("eagerx_core", anonymous=True, log_level=eagerx.log.INFO)

... logging to /home/jelle/.ros/log/b4cdd1e0-bbd5-11ec-92a9-31cb7b131af7/roslaunch-jelle-Alienware-m15-R4-36551.log
started roslaunch server http://145.94.158.246:41279/
ros_comm version 1.15.14


SUMMARY

PARAMETERS
 * /rosdistro: noetic
 * /rosversion: 1.15.14

NODES

auto-starting new master
process[master]: started with pid [36585]
ROS_MASTER_URI=http://localhost:11311
setting /run_id to b4cdd1e0-bbd5-11ec-92a9-31cb7b131af7
process[rosout-1]: started with pid [36610]
started core service [/rosout]


<roslaunch.parent.ROSLaunchParent at 0x7f1c88b94d30>

Next, we create a [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) and add an [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html) to it.

The Graph describes how Nodes and Objects are interconnected.
In this way, the creation of an environment becomes modular.
This allows users to implement Nodes and Objects once, and then easily create new environments by reusing these implementations.
It also makes it possible to construct complex environments using Nodes and Objects as basic building blocks.

An Object is an entity within EAGERx that consists of sensors, actuators and states. An actuator is an input to an object, a sensor is an output of an object, and a state is something that we can reset at the beginning of an episode.
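
To make the three kinds of components concrete, here is a purely illustrative sketch in plain Python. The `ObjectSketch` class is hypothetical and is not an EAGERx class; it only mirrors the description above. The component names are the ones used for the pendulum created later in this tutorial.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectSketch:
    """Illustrative stand-in for the Object concept, NOT part of the EAGERx API."""

    name: str
    actuators: List[str] = field(default_factory=list)  # inputs to the object
    sensors: List[str] = field(default_factory=list)  # outputs of the object
    states: List[str] = field(default_factory=list)  # reset at the start of each episode


# The pendulum has one actuator, one sensor and one resettable state.
pendulum_sketch = ObjectSketch(
    name="pendulum",
    actuators=["voltage"],
    sensors=["angle_sensor"],
    states=["model_state"],
)
print(pendulum_sketch)
```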

We are going to create one object (the pendulum).



In [2]:
import eagerx_tutorials.pendulum  # Registers Pendulum
from eagerx.core.graph import Graph


# Define rate in Hz (depends on the rate of the ODE)
rate = 30.0

# Initialize empty graph
graph = Graph.create()

# Create pendulum
pendulum = eagerx.Object.make(
    "Pendulum", "pendulum", actuators=["voltage"], sensors=["angle_sensor"], states=["model_state"],
)

graph.add(pendulum)

# Connect the pendulum to an action and observation
graph.connect(action="action", target=pendulum.actuators.voltage)
graph.connect(source=pendulum.sensors.angle_sensor, observation="observation", window=1)

In [3]:
import eagerx_ode  # Registers OdeBridge

# Define the bridge
bridge = eagerx.Bridge.make("OdeBridge", rate=rate, is_reactive=True, process=eagerx.process.ENVIRONMENT)

In [4]:
import numpy as np

# Define step function
def step_fn(prev_obs, obs, action, steps):
    # Get observation and action
    state = obs["observation"][0]
    u = action["action"][0]
    
    # Calculate reward
    sin_th, cos_th, thdot = state
    th = np.arctan2(sin_th, cos_th)
    
    cost = th**2 + 0.1 * thdot**2 + 0.001 * (u**2)
    
    # Determine done flag
    done = steps > 500
    
    # Set info:
    info = dict()
    
    return obs, -cost, done, info
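
The cost used in `step_fn` can be checked in isolation. The sketch below copies the same expression into a standalone function (the name `pendulum_cost` is hypothetical) and evaluates it for two configurations. It assumes, as the cost expression implies, that `th = 0` corresponds to the upright position.

```python
import numpy as np


def pendulum_cost(sin_th, cos_th, thdot, u):
    """Same cost expression as in step_fn: angle, velocity and input penalties."""
    th = np.arctan2(sin_th, cos_th)  # recover the angle from its sine and cosine
    return th**2 + 0.1 * thdot**2 + 0.001 * (u**2)


# Upright and at rest with zero input: every term vanishes.
upright = pendulum_cost(sin_th=0.0, cos_th=1.0, thdot=0.0, u=0.0)

# Hanging straight down and at rest: th = pi, so the cost is pi**2.
hanging = pendulum_cost(sin_th=0.0, cos_th=-1.0, thdot=0.0, u=0.0)

print(upright, hanging)
```

Because the reward is the negative cost, the best achievable reward per step is zero, and a pendulum that stays hanging down collects roughly `-9.87` per step.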

In [5]:
from eagerx.core.env import EagerxEnv
from eagerx.wrappers import Flatten

# Initialize Environment
env = Flatten(EagerxEnv(name="rx", rate=rate, graph=graph, bridge=bridge, step_fn=step_fn))

[INFO] [1649928709.106482]: Node "/rx/env/supervisor" initialized.
[INFO] [1649928709.255296]: Node "/rx/bridge" initialized.
[INFO] [1649928709.376627]: Node "/rx/environment" initialized.
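
The training cell that follows uses `total_timesteps=int(180 * rate)`. As a quick sanity check, the arithmetic below shows how that budget translates into episodes. It takes the episode length of 501 steps from the `ep_len_mean` value reported in the training output, which is consistent with `done = steps > 500` in `step_fn`. The variable names are illustrative only.

```python
rate = 30.0  # environment rate in Hz, as defined for the graph
sim_seconds = 180  # 3 minutes of simulated time
episode_steps = 501  # ep_len_mean reported in the training log

total_timesteps = int(sim_seconds * rate)
full_episodes, leftover_steps = divmod(total_timesteps, episode_steps)
episode_seconds = episode_steps / rate

print(total_timesteps, full_episodes, leftover_steps, round(episode_seconds, 1))
# prints: 5400 10 390 16.7
```

So the budget covers ten full episodes of about 16.7 simulated seconds each, plus part of an eleventh.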


In [6]:
import stable_baselines3 as sb

# Initialize learner
model = sb.SAC("MlpPolicy", env, verbose=1, device="cpu")

# Train for 3 minutes (sim time)
model.learn(total_timesteps=int(180 * rate))

env.shutdown()

[INFO] [1649928709.450252]: Adding object "pendulum" of type "Pendulum" to the simulator.
[INFO] [1649928709.632045]: Node "/rx/pendulum/angle_sensor" initialized.
[INFO] [1649928709.718788]: Node "/rx/pendulum/pendulum_actuator" initialized.
[INFO] [1649928709.820664]: Waiting for nodes "['pendulum/pendulum_actuator', 'pendulum/applied']" to be initialized.
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
[INFO] [1649928710.061550]: Nodes initialized.
[INFO] [1649928710.244832]: Pipelines initialized.
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 501       |
|    ep_rew_mean     | -3.82e+03 |
| time/              |           |
|    episodes        | 4         |
|    fps             | 100       |
|    time_elapsed    | 19        |
|    total_timesteps | 2004      |
| train/             |           |
|    actor_loss      | 68.6      |
|    critic_loss     | 0.557     |
|    ent_coef        | 0.70