# ICAPS24 SkDecide Tutorial: solving problems (possibly imported from Gym) with Reinforcement Learning and Cartesian Genetic Programming

Alexandre Arnold, Guillaume Povéda, Florent Teichteil-Königsbuch

Credits to [IMACS](https://imacs.polytechnique.fr/) and especially to Nolwen Huet

This tutorial shows how to load a domain in scikit-decide and solve it with techniques from two different communities:

*   [Reinforcement Learning](https://en.wikipedia.org/wiki/Reinforcement_learning) (RL)
*   [Cartesian Genetic Programming](https://en.wikipedia.org/wiki/Cartesian_genetic_programming) (CGP)

<div class="alert alert-block alert-warning">
<b>Special notice for binder + sb3:</b>
<a href=https://stable-baselines3.readthedocs.io/en/master/>stable-baselines3</a> algorithms appear to be <em>extremely slow</em> on <a href=https://mybinder.org/>Binder</a>, and we have not found an explanation for it. We strongly advise you either to run the notebook locally or on Colab, or to skip the cells that use sb3 algorithms (here, the PPO solver).
</div>

## Prerequisites

Concerning the Python kernel to use for this notebook:
- If running locally, be sure to use an environment with
  - `scikit-decide[all]`
  - `renderlab` (to render `gymnasium` environments)
  - `moviepy==1.0.3` and `opencv-python` (both needed by `renderlab`)
- If running on Colab, the next cell does it for you.
- If running on Binder, the environment should be ready.

In [None]:
# On Colab: install the library
on_colab = "google.colab" in str(get_ipython())
if on_colab:
    import glob
    import json
    import sys

    using_nightly_version = True

    if using_nightly_version:
        # look for nightly build download url
        release_curl_res = !curl -L   -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/airbus/scikit-decide/releases/tags/nightly
        release_dict = json.loads(release_curl_res.s)
        release_download_url = sorted(
            release_dict["assets"], key=lambda d: d["updated_at"]
        )[-1]["browser_download_url"]
        print(release_download_url)

        # download and unzip
        !wget --output-document=release.zip {release_download_url}
        !unzip -o release.zip

        # get proper wheel name according to python version used
        wheel_pythonversion_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
        wheel_path = glob.glob(
            f"dist/scikit_decide*{wheel_pythonversion_tag}*manylinux*.whl"
        )[0]

        skdecide_pip_spec = f"{wheel_path}[all]"
    else:
        skdecide_pip_spec = "scikit-decide[all]"

    # install scikit-decide with all extras + renderlab
    !pip install {skdecide_pip_spec} renderlab moviepy==1.0.3 opencv-python
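
The nightly branch of the cell above does two things: it picks the most recently updated release asset, then matches the wheel on a CPython tag such as `cp310`. The offline sketch below reproduces that logic on made-up data. The asset entries, URLs and file names are invented for illustration, and `python_tag` and `pick_wheel` are hypothetical helpers, not part of scikit-decide.

```python
import fnmatch
import sys

# Made-up entries shaped like the "assets" field of a GitHub release (illustration only)
assets = [
    {"updated_at": "2024-05-01T02:00:00Z", "browser_download_url": "https://example.invalid/old.zip"},
    {"updated_at": "2024-06-01T02:00:00Z", "browser_download_url": "https://example.invalid/new.zip"},
]

# Timestamps in this fixed ISO 8601 format sort chronologically as plain strings
latest_url = sorted(assets, key=lambda d: d["updated_at"])[-1]["browser_download_url"]


def python_tag(major=sys.version_info.major, minor=sys.version_info.minor):
    """Return the CPython wheel tag for a Python version, e.g. 'cp310' for 3.10."""
    return f"cp{major}{minor}"


def pick_wheel(filenames, tag):
    """Return the first manylinux scikit-decide wheel matching the tag (hypothetical helper)."""
    matches = fnmatch.filter(filenames, f"dist/scikit_decide*{tag}*manylinux*.whl")
    if not matches:
        # Clearer than the bare IndexError that indexing an empty list would raise
        raise FileNotFoundError(f"no manylinux wheel found for tag {tag!r}")
    return matches[0]


# Made-up file names following the usual wheel naming pattern
wheels = [
    "dist/scikit_decide-0.0.0-cp39-cp39-manylinux_2_17_x86_64.whl",
    "dist/scikit_decide-0.0.0-cp310-cp310-manylinux_2_17_x86_64.whl",
    "dist/scikit_decide-0.0.0-cp310-cp310-win_amd64.whl",
]

print(latest_url)
print(pick_wheel(wheels, python_tag(3, 10)))
```

Raising an explicit error when no wheel matches is a design choice of this sketch; the notebook cell itself simply indexes the `glob` result.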

## Loading a domain

Once a problem is formalized as a scikit-decide domain, it can be tackled by any compatible solver. Domains can be created from scratch or imported from various formats. Here we demonstrate how to import an environment from [Gymnasium](https://gymnasium.farama.org) (the new official fork of OpenAI Gym, a standard API often used in RL communities), such as [Cart Pole](https://gymnasium.farama.org/environments/classic_control/cart_pole):

In [None]:
# Patch for renderlab, which was written against an older version of IPython:
# re-expose `display` under IPython.core.display
import IPython.core.display
from IPython.display import display

IPython.core.display.display = display

import gymnasium as gym
from renderlab import RenderFrame

from skdecide.hub.domain.gym import GymDomain

# Select a Gymnasium environment
ENV_NAME = "CartPole-v1"

# Create a domain factory, a callable returning a skdecide domain (used by solvers)
def domain_factory(record_videos=False):

    # Create a Gymnasium environment
    env = gym.make(ENV_NAME, render_mode="rgb_array")

    # Maybe wrap it with RenderFrame to record/play episode videos (works in Colab)
    if record_videos:
        env = RenderFrame(env, "./render")

    # Return a skdecide domain from a Gymnasium environment
    return GymDomain(env)


# In simple cases, domain_factory can be created in one line:
# domain_factory = lambda: GymDomain(gym.make(ENV_NAME))

The `rollout` utility provides a quick way to run episodes in the domain, taking random actions (or following a solver policy, as shown later):

In [None]:
from skdecide.utils import rollout

# Instantiate one domain (used for rollouts)
domain = domain_factory(record_videos=True)

# Do a random rollout of the domain (random actions are taken when no solver is specified);
# try verbose=True for more printing
rollout(domain, num_episodes=1, max_steps=1000, verbose=False)

# Watch the last episode as a video by calling play() on the underlying
# Gymnasium environment (works in Colab)
domain.unwrapped().play()
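
Conceptually, a random rollout is a loop over the Gymnasium `reset`/`step` interface. The sketch below mimics that loop on a hypothetical toy environment. `ToyCounterEnv` and `random_rollout` are invented here for illustration; this is not how scikit-decide implements `rollout`, only an approximation of the idea.

```python
import random


class ToyCounterEnv:
    """Hypothetical stand-in for a Gymnasium environment (illustration only)."""

    def __init__(self, target=5):
        self.target = target
        self.position = 0

    def reset(self, seed=None):
        # Gymnasium-style reset: returns (observation, info)
        self.position = 0
        return self.position, {}

    def step(self, action):
        # Gymnasium-style step: returns (observation, reward, terminated, truncated, info)
        self.position += action  # action is -1 or +1
        terminated = abs(self.position) >= self.target
        return self.position, 1.0, terminated, False, {}


def random_rollout(env, actions, max_steps, rng):
    """Run one episode with uniformly random actions; return (total_reward, steps)."""
    env.reset()
    total_reward = 0.0
    steps = 0
    for steps in range(1, max_steps + 1):
        action = rng.choice(actions)
        _obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward, steps


total, n_steps = random_rollout(ToyCounterEnv(target=3), [-1, 1], 1000, random.Random(0))
print(f"episode ended after {n_steps} steps with total reward {total}")
```

Passing a policy instead of `rng.choice` would turn this into the solver-driven rollout used later in the notebook.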

## Solving the domain

One of the key benefits of scikit-decide is its ability to connect the same domain definition to many different solvers from various communities. To demonstrate this versatility, we show how to solve the domain loaded above with both Reinforcement Learning and Cartesian Genetic Programming:

### With Reinforcement Learning (RL)

Scikit-decide provides wrappers for several RL libraries, such as [RLlib](https://docs.ray.io/en/latest/rllib/index.html) and [Stable-Baselines3](https://stable-baselines3.readthedocs.io). We use the latter in this example:

In [None]:
from stable_baselines3 import PPO

from skdecide.hub.solver.stable_baselines import StableBaseline

# Check domain compatibility with StableBaseline RL solver (good practice)
assert StableBaseline.check_domain(domain)

# Instantiate solver with parameters of choice (e.g. type of algo/neural net, learning steps...)
solver = StableBaseline(
    domain_factory,
    algo_class=PPO,
    baselines_policy="MlpPolicy",
    learn_config={"total_timesteps": 10000},
    verbose=1,
)

# Solve with RL
solver.solve()

# Save solution
solver.save("saved_solution")

Now we can run episodes with `rollout` using the latest solver policy:

In [None]:
# Visualize solution (pass solver to rollout to use its policy)
rollout(domain, solver, num_episodes=1, max_steps=1000, verbose=False)
domain.unwrapped().play()

It is always possible to reload a saved solution (especially useful in a new Python session) and possibly continue learning from there. By running this cell a couple of times, you should see increasingly better solutions:

In [None]:
# Reload solution (optional here, but required in a new Python session)
solver.load("saved_solution")

# Continue learning
solver.solve()

# Save updated solution
solver.save("saved_solution")

# Visualize updated solution
rollout(domain, solver, num_episodes=1, max_steps=1000, verbose=False)
domain.unwrapped().play()

After using a solver, it is good practice to clean it up as shown below (not critical here, but sometimes useful for C++ parallel solvers in scikit-decide). This is done automatically if you use the solver within a `with` statement, as the CGP subsection below demonstrates.

In [None]:
# Clean up solver after use (good practice)
solver._cleanup()
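
To see why a `with` statement makes the manual call unnecessary, here is a minimal sketch of Python's context-manager protocol. `ToySolver` is a hypothetical class written for this illustration; it only assumes that a solver's `__exit__` triggers its cleanup, and does not reproduce scikit-decide's actual code.

```python
class ToySolver:
    """Hypothetical solver illustrating cleanup through the context-manager protocol."""

    def __init__(self):
        self.cleaned_up = False

    def solve(self):
        print("solving...")

    def _cleanup(self):
        # Release whatever resources the solver holds (nothing real in this toy)
        self.cleaned_up = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the "with" block exits, whether normally or through an exception
        self._cleanup()
        return False  # do not swallow exceptions


with ToySolver() as toy_solver:
    toy_solver.solve()

print(toy_solver.cleaned_up)
```

Because `__exit__` also runs when the block raises, cleanup happens even if `solve()` fails midway, which a trailing manual call would miss.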

### With Cartesian Genetic Programming (CGP)

In [None]:
from skdecide.hub.solver.cgp import CGP

# Check domain compatibility with CGP solver (good practice)
assert CGP.check_domain(domain)

# Instantiate solver with parameters of choice (using "with" syntax to avoid manual clean up)
with CGP(domain_factory, folder_name="TEMP_CGP", n_it=50) as solver:

    # Solve with CGP
    solver.solve()

    # Visualize solution
    rollout(domain, solver, num_episodes=1, max_steps=1000, verbose=False)
    domain.unwrapped().play()

In this example, you may find that RL often finds better solutions than CGP (although this depends on the solver parameters and the random seed). This is highly problem-dependent, however: try re-running this notebook after setting `ENV_NAME = "MountainCarContinuous-v0"` at the beginning and you may observe the opposite. This illustrates the value of having a wide catalog of solvers from which to pick the best one for each specific problem!