# Tutorial 1: EAGERx Environment Creation and Training

In this tutorial, we will show a simple example of how to create a gym environment using [EAGERx](https://eagerx.readthedocs.io/en/master/).
We will then use this environment to train a policy with [Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/).

The aim of this tutorial is to show some of the key concepts of EAGERx:
- Creating a [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) with an [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html)
- Using this [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) and a [Bridge](https://eagerx.readthedocs.io/en/master/guide/api_reference/bridge/index.html) to create an [EAGERx Environment](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html)

In the remainder of this tutorial, we will go into more detail on these concepts.


## Pendulum Swing-up

We will create an environment for solving the classic control problem of swinging up an underactuated pendulum, very similar to the [Pendulum-v0 environment](https://gym.openai.com/envs/Pendulum-v0/).
Our goal is to swing up this pendulum to the upright position and keep it there, while minimizing the velocity of the pendulum and the input voltage.


## How to run this Notebook

EAGERx makes use of ROS 1 functionality, so ROS 1 must be [installed](http://wiki.ros.org/ROS/Installation) on your system.
It must also be sourced in the shell from which you start this notebook:
```bash
source /opt/ros/<distro>/setup.bash
```
Here, `<distro>` should be replaced with your ROS distribution, e.g. `melodic` or `noetic`.
Furthermore, the Python dependencies can be installed by running (this will also install `eagerx`):
```bash
pip install eagerx-tutorials
```
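Before continuing, it can help to confirm that ROS is actually sourced in the environment the notebook runs in. The sketch below relies only on the `ROS_DISTRO` environment variable, which the ROS `setup.bash` script sets; the helper name `ros_distro` is made up for this example and is not part of EAGERx.

```python
import os


def ros_distro(environ=os.environ):
    """Return the name of the sourced ROS distribution, or None if ROS is not sourced."""
    return environ.get("ROS_DISTRO")


distro = ros_distro()
if distro is None:
    print("ROS does not appear to be sourced; run the `source` command above first.")
else:
    print(f"ROS distribution in use: {distro}")
```

If the message reports that ROS is not sourced, source it in a terminal and restart the notebook from that same terminal.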

Now we are ready to go!
First, we import EAGERx and initialize it.
As mentioned before, EAGERx uses ROS functionality for communication, so a ROS master is started during initialization if one is not already running.
We set the log level to `INFO` here. Setting it to `DEBUG` produces more output, which can be useful when debugging.

In [1]:
import eagerx
eagerx.initialize("eagerx_core", anonymous=True, log_level=eagerx.log.INFO)

Next, we create a [Graph](https://eagerx.readthedocs.io/en/master/guide/api_reference/graph/graph.html) and add an [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html) to it.

The Graph describes the interconnect of [Nodes](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/node.html) and [Objects](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html).
In this way, the creation of an environment becomes modular.
This allows users to create an implementation for [Nodes](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/node.html) and [Objects](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html) once, and easily create new environments by reusing these implementations.
It also makes it possible to construct complex environments using [Nodes](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/node.html) and [Objects](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html) as basic building blocks.

An [Object](https://eagerx.readthedocs.io/en/master/guide/api_reference/object/index.html) is an entity within EAGERx that consists of sensors, actuators and states. An actuator is an input to an object, a sensor is an output of an object, and a state is something that we can reset at the beginning of an episode.
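To make these three roles concrete, here is a toy stand-in written in plain Python. It is not the EAGERx API: the class `ToyPendulum` and its methods are hypothetical names chosen only for illustration, and the convention that an angle of 0 means upright is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyPendulum:
    """Hypothetical illustration of an Object's three roles (not the EAGERx API)."""

    theta: float = math.pi  # state: angle in rad (assumed: 0 = upright)
    theta_dot: float = 0.0  # state: angular velocity in rad/s
    voltage: float = 0.0  # most recent actuator command

    def reset(self, theta: float, theta_dot: float) -> None:
        """State: something we can set at the beginning of an episode."""
        self.theta = theta
        self.theta_dot = theta_dot
        self.voltage = 0.0

    def actuate(self, voltage: float) -> None:
        """Actuator: an input to the object."""
        self.voltage = voltage

    def sense(self) -> tuple:
        """Sensor: an output of the object."""
        return (math.sin(self.theta), math.cos(self.theta), self.theta_dot)


toy = ToyPendulum()
toy.reset(theta=0.0, theta_dot=0.0)  # start the episode upright and at rest
toy.actuate(1.5)  # send an input to the object
print(toy.sense())  # read an output from the object
```

In EAGERx itself, these roles are addressed through the names passed to `eagerx.Object.make`, such as `pendulum_input`, `pendulum_output` and `model_state` in the cell that follows.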



In [None]:
import eagerx_tutorials  # Registers Pendulum
from eagerx.core.graph import Graph


# Define rate (depends on rate of ode)
rate = 30.0

# Initialize empty graph
graph = Graph.create()

# Create pendulum
pendulum = eagerx.Object.make(
    "Pendulum", "pendulum", actuators=["pendulum_input"], sensors=["pendulum_output"], states=["model_state"],
)

graph.add(pendulum)

# Connect the pendulum to an action and observation
graph.connect(action="action", target=pendulum.actuators.pendulum_input)
graph.connect(source=pendulum.sensors.pendulum_output, observation="observation", window=1)

Output of the environment initialization and training cells further below:

[INFO] [1649867476.421185]: Node "/rx/env/supervisor" initialized.
[INFO] [1649867476.552694]: Waiting for nodes "['bridge']" to be initialized.
[INFO] [1649867477.188952]: Node "/rx/environment" initialized.
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
[INFO] [1649867478.324800]: Nodes initialized.
[INFO] [1649867478.585955]: Pipelines initialized.
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 501       |
|    ep_rew_mean     | -4.02e+03 |
| time/              |           |
|    episodes        | 4         |
|    fps             | 76        |
|    time_elapsed    | 26        |
|    total_timesteps | 2004      |
| train/             |           |
|    actor_loss      | 69.7      |
|    critic_loss     | 1.06      |
|    ent_coef        | 0.642     |
|    ent_coef_loss   | -0.0887   |
|    learning_rate   | 0.0003    |
|    n_updates       | 1903      |
----------------------------------
----

In [None]:
import eagerx_ode  # Registers OdeBridge

# Define bridges
bridge = eagerx.Bridge.make("OdeBridge", rate=rate, is_reactive=True)

In [None]:
import numpy as np

# Define step function
def step_fn(prev_obs, obs, action, steps):
    # Get observation and action
    state = obs["observation"][0]
    u = action["action"][0]
    
    # Calculate reward
    sin_th, cos_th, thdot = state
    th = np.arctan2(sin_th, cos_th)
    
    cost = th**2 + 0.1 * thdot**2 + 0.001 * (u**2)
    
    # Determine done flag
    done = steps > 500
    
    # Set info:
    info = dict()
    
    return obs, -cost, done, info
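
To get a feel for the reward computed in `step_fn`, the sketch below evaluates the same cost for two hand-picked situations, without starting EAGERx. The function name `pendulum_reward` and the mock `obs` and `action` dictionaries are assumptions made for this example; they only mimic the layout that `step_fn` indexes into. The standard-library `math.atan2` is used in place of `np.arctan2`.

```python
import math


def pendulum_reward(sin_th, cos_th, thdot, u):
    """Negative of the cost used in step_fn: th^2 + 0.1 * thdot^2 + 0.001 * u^2."""
    th = math.atan2(sin_th, cos_th)  # recover the angle; th = 0 is the rewarded position
    return -(th**2 + 0.1 * thdot**2 + 0.001 * u**2)


# Mock inputs (assumed layout): one observation of [sin(th), cos(th), thdot]
obs = {"observation": [[0.0, -1.0, 0.0]]}  # pendulum hanging down, at rest
action = {"action": [0.0]}  # no input voltage

sin_th, cos_th, thdot = obs["observation"][0]
r_down = pendulum_reward(sin_th, cos_th, thdot, action["action"][0])
r_up = pendulum_reward(0.0, 1.0, 0.0, 0.0)  # upright, at rest, no input

print(f"hanging down: {r_down:.3f}, upright: {r_up:.3f}")
```

Hanging down at rest gives a reward of minus pi squared, while the upright position at rest with zero input gives the maximum reward of zero. The velocity and voltage terms carry small weights, so the angle dominates the reward.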

In [None]:
from eagerx.core.env import EagerxEnv
from eagerx.wrappers import Flatten

# Initialize Environment
env = Flatten(EagerxEnv(name="rx", rate=rate, graph=graph, bridge=bridge, step_fn=step_fn))

In [None]:
import stable_baselines3 as sb

# Initialize learner
model = sb.SAC("MlpPolicy", env, verbose=1, device="cpu")

# Train for 30 minutes (sim time): 1800 s at `rate` steps per second
model.learn(total_timesteps=int(1800 * rate))
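
The relation between `total_timesteps`, the rate and the simulated time can be checked with a few lines of arithmetic. This sketch only uses numbers that appear in this tutorial: the rate of 30 Hz and the episode length of 501 steps reported as `ep_len_mean` in the training log. The variable names are chosen for this example.

```python
rate = 30.0  # Hz, as defined earlier in this tutorial
total_timesteps = int(1800 * rate)  # value passed to model.learn

# Each environment step advances the simulation by 1 / rate seconds
sim_seconds = total_timesteps / rate

# Episode length taken from ep_len_mean in the training log
steps_per_episode = 501
episodes = total_timesteps / steps_per_episode

print(f"{total_timesteps} steps = {sim_seconds / 60:.0f} min of simulated time")
print(f"about {episodes:.0f} episodes of {steps_per_episode / rate:.1f} s each")
```

Wall-clock time differs from simulated time. At the 76 steps per second reported as `fps` in the log, the same number of steps would take roughly 12 minutes, although this figure depends on the machine.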