# Tutorial 7: Reset Routines

Resetting an object to a desired state is non-trivial in the real world compared to simulation, where a reset can often be carried out with a single command. In this tutorial, we will discuss how to include reset routines in your graph that reset an object's state and work both in simulation and in the real world.

The following will be covered:
<!-- - Defining an object's state with an [`EngineState`](https://eagerx.readthedocs.io/en/master/guide/api_reference/engine_state/index.html).  -->
- Defining the reset routine with a [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html).
- Resetting the object's state with the [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html).

In the remainder of this tutorial, we will go more into detail on this concept.

Furthermore, at the end of this notebook you will find an exercise.
For the exercise you will have to add/modify a couple of lines of code, which are marked by

```python

# START EXERCISE [BLOCK_NUMBER]

# END EXERCISE [BLOCK_NUMBER]
```

## Pendulum Swing-up

We will assume that we already have the object definition of the underactuated pendulum that we used in the [first](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/1_environment_creation.ipynb) tutorial with its dynamics simulated with the [OdeEngine](https://github.com/eager-dev/eagerx_ode). 

Our goal is to create a [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) that can reset the pendulum to a desired state (i.e. $\theta=\theta_\text{des}$ and $\dot{\theta}=0$) without requiring a simulator reset. In other words, the [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) will receive the desired state as a target and it will send actuator commands until the pendulum has reached this state.

## Activate GPU (Colab only)

When in Colab, you'll need to enable GPUs for the notebook:

- Navigate to Edit→Notebook Settings
- Select GPU from the Hardware Accelerator drop-down

## Notebook Setup

In order to be able to run the code, we need to install the *eagerx_tutorials* package.

In [2]:
try:
    import eagerx_tutorials
except ImportError:
    !{"echo 'Installing eagerx-tutorials with pip.' && pip install eagerx-tutorials >> /tmp/eagerx_install.txt 2>&1"}

# Setup interactive notebook
# Required in interactive notebooks only.
from eagerx_tutorials import helper
helper.setup_notebook()

# Import eagerx
import eagerx
eagerx.set_log_level(eagerx.WARN)

Not running on CoLab.


## How do [ResetNodes](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) work?

As mentioned before, resetting an object is non-trivial in the real world compared to simulation, where a reset can often be carried out with a single command. We developed [`ResetNodes`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) in EAGERx to allow users to easily define reset routines that may, for example, use pre-defined controllers to reset an object to a desired state.

The structure of a [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) is very similar to a conventional [`Node`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/node.html). However, the [`callback()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.callback) of a [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) is skipped until the agent/user calls [`.reset()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset) on the gym environment. At that moment, the desired state that was selected in the [reset function](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset_fn) (covered in [this tutorial](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/2_reset_and_step.ipynb)) is sent to the [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) as a `target` state.

The [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) takes over control and starts sending commands to the object's actuators until the object's current state is equal (or close) to that target state. In other words, after the agent/user calls [`.reset()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset), the [`ResetNode.callback()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.callback) is called at the specified node `rate` with the connected `inputs` together with the `target` state, and it produces `outputs` that bring the object's state closer to the desired state. During each callback, the [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) assesses the status of the reset routine (i.e. whether the `target` state was reached) and communicates this status to the engine with a message.

**Important**: To ensure input-output synchronization in [`sync`](https://eagerx.readthedocs.io/en/master/guide/api_reference/engine/index.html#eagerx.core.entities.Engine.sync) mode, the [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) must be placed in between the actions commanded by the agent/user and the object actuator. Over the course of an episode, the reset node simply feeds through all commands. Only after [`.reset()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset) is called will the commands be produced by [`ResetNode.callback()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.callback). The reset node's `rate` is constrained to be equal to the rate of the commands that it must feed through.

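
To make the switch between feeding through and resetting concrete, here is a minimal, framework-free sketch. It does not use the EAGERx API: `ToyResetNode`, `start_reset()`, `step()` and `proportional_routine()` are hypothetical names introduced only for illustration, and the status message that a real reset node sends to the engine is reduced here to clearing an attribute.

```python
from typing import Callable, Optional, Tuple


class ToyResetNode:
    """Hypothetical stand-in that either feeds commands through or runs a reset routine."""

    def __init__(self, routine: Callable[[float, float], Tuple[float, bool]]):
        self.routine = routine  # maps (state, target) -> (command, done)
        self.target: Optional[float] = None  # None means that no reset is in progress

    def start_reset(self, target: float) -> None:
        # Corresponds to the moment the agent/user calls .reset() on the environment.
        self.target = target

    def step(self, agent_command: float, state: float) -> float:
        if self.target is None:
            # During an episode: pass the agent's command through unchanged.
            return agent_command
        # During a reset: ignore the agent and let the routine produce the command.
        command, done = self.routine(state, self.target)
        if done:
            self.target = None  # hand control back to the agent
        return command


def proportional_routine(state: float, target: float) -> Tuple[float, bool]:
    """Toy reset routine: proportional command, done once the error is small."""
    error = target - state
    return 0.5 * error, abs(error) < 0.1


node = ToyResetNode(proportional_routine)
print(node.step(1.5, state=0.0))  # episode: the agent's command is fed through
node.start_reset(target=1.0)
print(node.step(1.5, state=0.0))  # reset: the routine's command replaces the agent's
```

Because the same object sits between the agent and the actuator in both phases, the actuator always receives exactly one command per tick, which is the property the synchronization requirement above relies on.
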
## How to define a [ResetNode](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html)?

We can create a reset node by inheriting from the class [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html).
This class has the following abstract methods we need to implement:

- [`make()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.make): Makes the parameter specification of the node.
- [`initialize()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.initialize): Initializes the node.
- [`reset()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.reset): Resets the node at the beginning of an episode.
- [`callback()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.callback): Called at the rate of the node after the agent/user calls [`.reset()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/env/index.html#eagerx.core.env.EagerxEnv.reset) on the gym environment. It receives all connected `inputs` and `targets` as arguments.


## An example [ResetNode](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html)

To illustrate how [`ResetNodes`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) work, we will again create an environment with the *Pendulum* object, like we did in the [first](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/1_environment_creation.ipynb) and [second](https://colab.research.google.com/github/eager-dev/eagerx_tutorials/blob/master/tutorials/pendulum/2_reset_and_step.ipynb) tutorials. We will add a [`ResetNode`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html) called `angle_reset` and connect it as illustrated in the graph below:

<img src="./figures/tutorial_7_gui.svg" width=720>

- We connect the actions commanded by the agent (i.e. `voltage`) to the `feedthrough` connection `u` (light blue color) and we connect the `angle_reset`'s output `u` to the pendulum's actuator `u`. In this way, the `voltage` actions will be fed through to the output `u` during an episode, while the `angle_reset`'s [`callback()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.callback) will produce the outputs during a reset.
- We connect the pendulum's `model_state` to the `angle_reset`'s target `goal`. In this way, the `angle_reset`'s [`callback()`](https://eagerx.readthedocs.io/en/master/guide/api_reference/node/reset_node.html#eagerx.core.entities.ResetNode.callback) will receive the desired `model_state` as an argument.
- To assess the status of the reset routine (i.e. whether the desired `model_state` was reached), we connect the two pendulum sensors `theta` and `dtheta` as inputs to `angle_reset`.

Below, the definition of this reset node is given. Currently, the node uses a [PID](https://en.wikipedia.org/wiki/PID_controller) controller to reset the pendulum to a desired angle with zero angular velocity. If the reset takes too long, we time out and consider the reset finished regardless of whether the target state was reached.

In the exercise of this tutorial, we will modify this routine to, instead, apply random actions for a fixed amount of time (disregarding the desired state).

In [3]:
from typing import Optional, List
from eagerx import Space, ResetNode
from eagerx.core.specs import ResetNodeSpec
from eagerx.utils.utils import Msg
import numpy as np


def wrap_angle(angle):
    return angle - 2 * np.pi * np.floor((angle + np.pi) / (2 * np.pi))


class ResetAngle(ResetNode):
    @classmethod
    def make(
        cls,
        name: str,
        rate: float,
        threshold: float = 0.1,
        timeout: float = 5.0,
        gains: Optional[List[float]] = None,
        u_range: Optional[List[float]] = None,
    ) -> ResetNodeSpec:
        """This ResetAngle node resets the pendulum to a desired angle with zero angular velocity. Note that this controller
        only works properly when resetting the pendulum near the downward facing equilibrium.

        :param name: Node name
        :param rate: Rate at which callback is called. Must be equal to the rate of the nodes that are connected to the feedthroughs.
        :param threshold: Maximum absolute difference between the current and goal state for the reset to be considered complete.
        :param timeout: Maximum time (seconds) before considering the reset finished (regardless of whether the goal was reached).
        :param gains: Gains [Kp, Kd, Ki] of the PID controller used to reset.
        :param u_range: Min and max action.
        :return: Specification.
        """
        # Get base parameter specification with default parameters
        spec = cls.get_specification()

        # Modify default node params
        spec.config.update(name=name, rate=rate, process=eagerx.process.ENVIRONMENT, color="grey")
        spec.config.update(inputs=["theta", "dtheta"], targets=["goal"], outputs=["u"])
        spec.config.update(u_range=u_range, threshold=threshold, timeout=timeout)
        # Proportional (Kp), derivative (Kd) and integral (Ki) gains
        spec.config.gains = gains if isinstance(gains, list) else [1.0, 0.5, 0.0]

        # Define the space of the output `u` from the action range
        c = Space(low=[u_range[0]], high=[u_range[1]], dtype="float32")
        spec.outputs.u.space = c
        return spec

    def initialize(self, spec: ResetNodeSpec):
        self.threshold = spec.config.threshold
        self.timeout = spec.config.timeout
        self.u_min, self.u_max = spec.config.u_range
        
        # Create a simple PID controller
        from eagerx_tutorials.pendulum.pid import PID
        gains = spec.config.gains
        self.controller = PID(u0=0.0, kp=gains[0], kd=gains[1], ki=gains[2], dt=1 / self.rate)

    @eagerx.register.states()
    def reset(self):
        # Reset the internal state of the PID controller (i.e. the error term).
        self.controller.reset()
        self.ts_start_routine = None

    @eagerx.register.inputs(theta=Space(shape=(), dtype="float32"), 
                            dtheta=Space(shape=(), dtype="float32"))
    @eagerx.register.targets(goal=Space(low=[-3.14, -9.0], high=[3.14, 9.0], dtype="float32"))
    @eagerx.register.outputs(u=Space(dtype="float32"))
    def callback(self, t_n: float, goal: Msg, theta: Msg, dtheta: Msg):
        if self.ts_start_routine is None:
            self.ts_start_routine = t_n

        # Convert messages to floats and numpy array
        theta = theta.msgs[-1]  # Take the last received message
        dtheta = dtheta.msgs[-1]  # Take the last received message
        goal = np.array(goal.msgs[-1], dtype="float32")  # Take the last received message

        # Define downward angle as theta=0 (resolve downward discontinuity)
        theta += np.pi
        goal[0] += np.pi

        # Wrap angle between [-pi, pi]
        theta = wrap_angle(theta)
        goal[0] = wrap_angle(goal[0])

        # Overwrite the desired velocity to be zero.
        goal[1] = 0.0

        # Calculate the action using the PID controller
        # START EXERCISE 1.2
        # Select random actions instead.
        u = self.controller.next_action(theta, ref=goal[0])
        
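        # PID: Determine if we have reached our goal state, or time out if the routine takes too long.
        # Random Actions: We time out after a fixed duration and simply assume that we are done.
        reached = np.isclose(np.array([theta, dtheta]), goal, atol=self.threshold).all().item()
        timed_out = (t_n - self.ts_start_routine) > self.timeout
        done = reached or timed_out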
        # END EXERCISE 1.2
        
        # Clip actions
        u = np.clip(u, self.u_min, self.u_max)  # Clip u to range
        
        # Prepare output message for transmission.
        # This must contain a message for every registered & selected output and target.
        # For targets, this message decides whether the goal state has been reached (or we, for example, timeout the reset).
        # The name for this target message is the registered target name + "/done".
        output_msgs = {"u": np.array([u], dtype="float32"), "goal/done": done}
        return output_msgs
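
The logic inside `callback()` can also be traced outside of EAGERx. The sketch below reproduces one iteration of the routine in plain Python, including the timeout described earlier, under a few assumptions: `SimplePID` is a hypothetical stand-in for the `PID` helper imported from `eagerx_tutorials` (the actual implementation may differ), `reset_step()` is an illustrative name, and the closeness check uses a plain absolute tolerance rather than `np.isclose`.

```python
import math


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return angle - 2 * math.pi * math.floor((angle + math.pi) / (2 * math.pi))


class SimplePID:
    """Hypothetical PID controller exposing the same reset()/next_action() calls as used above."""

    def __init__(self, kp: float, kd: float, ki: float, dt: float):
        self.kp, self.kd, self.ki, self.dt = kp, kd, ki, dt
        self.reset()

    def reset(self) -> None:
        # Clear the accumulated error terms at the start of every reset routine.
        self.integral = 0.0
        self.prev_error = None

    def next_action(self, y: float, ref: float) -> float:
        error = ref - y
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative + self.ki * self.integral


def reset_step(pid, t_n, t_start, theta, dtheta, goal_theta, u_min, u_max, threshold, timeout):
    """One iteration of the reset routine: returns the clipped action and the done flag."""
    # Shift by pi so that the downward position maps to 0, away from the wrap-around point.
    theta = wrap_angle(theta + math.pi)
    goal_theta = wrap_angle(goal_theta + math.pi)
    u = pid.next_action(theta, ref=goal_theta)
    u = min(max(u, u_min), u_max)  # clip the action to the allowed range
    # The desired angular velocity is always zero, as in the node above.
    reached = abs(theta - goal_theta) <= threshold and abs(dtheta) <= threshold
    timed_out = (t_n - t_start) > timeout
    return u, reached or timed_out


pid = SimplePID(kp=2.0, kd=0.2, ki=1.0, dt=1 / 30)
u, done = reset_step(pid, t_n=0.0, t_start=0.0, theta=3.0, dtheta=0.0, goal_theta=3.14,
                     u_min=-2.0, u_max=2.0, threshold=0.1, timeout=5.0)
print(u, done)
```

Shifting both angles by $\pi$ before wrapping matters because the downward position sits at $\theta=\pm\pi$: without the shift, two angles just on either side of the discontinuity would appear almost $2\pi$ apart to the controller.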


After defining & registering the reset node above, we can create it and add it to the graph. We will then proceed to connect it according to the GUI visualization of the intended graph shown above.

In [4]:
# Define rate in Hz
rate = 30.0

# Initialize empty graph
graph = eagerx.Graph.create()

# Create a pendulum
from eagerx_tutorials.pendulum.objects import Pendulum
pendulum = Pendulum.make("pendulum", actuators=["u"], sensors=["theta", "dtheta", "image"], states=["model_state"])

# Add pendulum to the graph
graph.add(pendulum)

# Connect the pendulum sensors to observations
graph.connect(source=pendulum.sensors.theta, observation="angle")
graph.connect(source=pendulum.sensors.dtheta, observation="angular_velocity")

# Create the reset node
# Take the scalar bounds of the one-dimensional actuator space
u_min = pendulum.actuators.u.space.low[0]
u_max = pendulum.actuators.u.space.high[0]
reset = ResetAngle.make("reset_angle", rate, gains=[2.0, 0.2, 1.0], u_range=[u_min, u_max])

# Add the reset node to the graph
graph.add(reset)

# Connect the pendulum state as the reset's target.
graph.connect(source=pendulum.states.model_state, target=reset.targets.goal)

# Connect the agent's voltage action to the reset node's feedthrough.
# During normal operation, the ResetNode simply feeds the voltage action through to reset.outputs.u.
graph.connect(action="voltage", target=reset.feedthroughs.u)

# When env.reset() is called, no voltage actions are being sent by the agent, because we are resetting.
# At that moment, the ResetNode's callback will be called instead to produce the voltages. 
graph.connect(source=reset.outputs.u, target=pendulum.actuators.u)

# To decide on the voltage actions that will bring the current pendulum state closer to the desired target state,
# the ResetNode needs to know the pendulum's current angle and angular velocity. Hence, we connect them as inputs.
# These inputs are also used by the reset node to determine whether the target state has been reached.
# If so, the reset node signals EagerxEnv that the desired state was reached.
graph.connect(source=pendulum.sensors.theta, target=reset.inputs.theta)
graph.connect(source=pendulum.sensors.dtheta, target=reset.inputs.dtheta)

# Define the render source
graph.render(source=pendulum.sensors.image, rate=rate)

Next, we will define a reset function for the environment that selects desired states. Unfortunately, we cannot sample any angle because the PID controller won't be able to reset to angles near the upright position due to the underactuation. Hence, we will start by always selecting the downward position of the pendulum. 

In the exercise, we will select angles that are sampled around the downward position of the pendulum to improve state-space coverage.

In [5]:
from typing import Dict
import numpy as np


class PendulumEnv(eagerx.BaseEnv):
    def __init__(self, name: str, rate: float, graph: eagerx.Graph, engine: eagerx.Engine):
        """Initializes an environment with EAGERx dynamics.

        :param name: The name of the environment. Everything related to this environment
                     (parameters, topics, nodes, etc...) will be registered under namespace: "/[name]".
        :param rate: The rate (Hz) at which the environment will run.
        :param graph: The graph consisting of nodes and objects that describe the environment's dynamics.
        :param engine: The physics engine that will govern the environment's dynamics.
        """
        # Make the backend specification
        from eagerx.backends.single_process import SingleProcess
        backend = SingleProcess.make()
        
        # Maximum episode length
        self.max_steps = 100
        
        # Step counter
        self.steps = None
        super().__init__(name, rate, graph, engine, backend, force_start=True)
    
    def step(self, action: Dict):
        """A method that runs one timestep of the environment's dynamics.

        :param action: A dictionary of actions provided by the agent.
        :returns: A tuple (observation, reward, done, info).

            - observation: Dictionary of observations of the current timestep.

            - reward: amount of reward returned after previous action

            - done: whether the episode has ended, in which case further step() calls will return undefined results

            - info: contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
        """
        # Take step
        observation = self._step(action)
        self.steps += 1
        
        # Get angle and angular velocity
        # Take first element because of window size (covered in other tutorial)
        th = observation["angle"][0]
        thdot = observation["angular_velocity"][0]

        # Convert from numpy array to float
        u = float(action["voltage"])

        # Calculate cost
        # Penalize angle error, angular velocity and input voltage
        cost = th**2 + 0.1 * thdot**2 + 0.001 * u**2  

        # Determine whether the episode is over
        # Currently just a timeout once the step counter exceeds max_steps
        done = self.steps > self.max_steps

        # Set info, tell the algorithm the termination was due to a timeout
        # (the episode was truncated)
        info = {"TimeLimit.truncated": self.steps > self.max_steps}
        
        return observation, -cost, done, info
    
    def reset(self) -> Dict:
        """Resets the environment to an initial state and returns an initial observation.

        :returns: The initial observation.
        """
        # Determine reset states
        states = self.state_space.sample()
        
        # START EXERCISE 1.1
        # Sample angles near the downward position.
        # Hint: angles are in [-pi, pi] and upright is theta=0.
        states["pendulum/model_state"] = np.array([3.14, 0], dtype="float32")
        # END EXERCISE 1.1

        # Perform reset
        observation = self._reset(states)

        # Reset step counter
        self.steps = 0
        return observation
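
As a quick sanity check of `step()`, the cost and the termination rule can be evaluated on their own. The helper names `pendulum_cost` and `episode_length` below are hypothetical and exist only for this illustration; the formulas are copied from the environment above.

```python
def pendulum_cost(th: float, thdot: float, u: float) -> float:
    """Same weighting as in step(): angle, angular velocity and voltage penalties."""
    return th**2 + 0.1 * thdot**2 + 0.001 * u**2


def episode_length(max_steps: int) -> int:
    """Count the steps taken until `done = steps > max_steps` becomes true."""
    steps = 0
    done = False
    while not done:
        steps += 1  # step() increments the counter before checking for termination
        done = steps > max_steps
    return steps


print(pendulum_cost(3.14, 0.0, 0.0))  # cost of hanging still at the downward position
print(episode_length(100))            # number of steps in one episode
```

With `max_steps = 100` an episode lasts 101 steps, because `done` only becomes true once the counter exceeds `max_steps`. This agrees with the `ep_len_mean` of 101 reported in the training log further below.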

We will proceed with defining the engine and initializing the environment. 

Finally, we train the agent using [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/), again similar to the preceding tutorials.
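
The cell below wraps the environment in `RescaleAction`, so that the agent acts in $[-1, 1]$ while the pendulum still receives voltages in its native range. As a rough illustration of such an affine rescaling (a sketch, not the wrapper's source code; the function name `rescale` is hypothetical and the default bounds of $\pm 2$ are taken from the action space printed in the output):

```python
def rescale(action: float, low: float = -2.0, high: float = 2.0,
            min_action: float = -1.0, max_action: float = 1.0) -> float:
    """Map an action from [min_action, max_action] linearly onto [low, high]."""
    fraction = (action - min_action) / (max_action - min_action)
    scaled = low + (high - low) * fraction
    # Guard against actions that fall outside the normalized range.
    return min(max(scaled, low), high)


for a in (-1.0, 0.0, 0.5, 1.0):
    print(a, "->", rescale(a))
```

Normalized actions are a common choice for algorithms such as SAC, whose policy output is squashed to a bounded interval.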

In [6]:
import stable_baselines3 as sb3
from stable_baselines3.common.env_checker import check_env
from eagerx.wrappers import Flatten
from gym.wrappers.rescale_action import RescaleAction

# Make the engine
from eagerx_ode.engine import OdeEngine
engine = OdeEngine.make(rate=rate)

# Initialize Environment
env = PendulumEnv(name="PendulumEnv", rate=rate, graph=graph, engine=engine)

# Print action & observation space
print("action_space: ", env.action_space)
print("observation_space: ", env.observation_space)

# Stable Baselines3 expects flattened actions & observations
# Convert observation and action space from Dict() to Box(), normalize actions
env = Flatten(env)
env = RescaleAction(env, min_action=-1.0, max_action=1.0)

# Check that env follows Gym API and returns expected shapes
check_env(env)

# Toggle render
env.render("human")

# Initialize learner
model = sb3.SAC("MlpPolicy", env, verbose=1)

# Train for 1 minute (sim time)
model.learn(total_timesteps=int(60 * rate))

env.shutdown()

[WARN]: Backend 'SINGLE_PROCESS' does not support multiprocessing, so all nodes are launched in the ENVIRONMENT process.
action_space:  Dict(voltage:Space([-2.], [2.], (1,), float32))
observation_space:  Dict(angle:Box([-999.], [999.], (1,), float32), angular_velocity:Box([-999.], [999.], (1,), float32))
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 101       |
|    ep_rew_mean     | -1.03e+03 |
| time/              |           |
|    episodes        | 4         |
|    fps             | 71        |
|    time_elapsed    | 5         |
|    total_timesteps | 404       |
| train/             |           |
|    actor_loss      | 19        |
|    critic_loss     | 0.462     |
|    ent_coef        | 0.915     |
|    ent_coef_loss   | -0.134    |
|    learning_rate   | 0.0003    |
|    n_updates       | 303       |
------------------------------

# Exercise

In this exercise you will modify the reset routine defined above. 

For this exercise, you will need to modify or add some lines of code in the cells above.
These lines are indicated by the following comments:

```python
# START EXERCISE [BLOCK_NUMBER]

# END EXERCISE [BLOCK_NUMBER]
```

However, feel free to play with the other code as well if you are interested.
We recommend that you restart and run all code after each section (in Colab, use the option *Restart and run all* under *Runtime*).

## 1. Modify the reset procedure


### Add your code to the following blocks: 

1.1 Change the `reset()` method of the environment, such that the desired angles are sampled randomly around the downward position of the pendulum.
This will improve state-space coverage and speed up learning.  
1.2 Next, modify the callback of the reset node such that we do not use the PID controller, but perform random actions for 2 seconds before considering the reset finished. 
This will improve state-space coverage even more, because we now also allow for resets with non-zero angular velocity.