# Gym environment demo: Continuous Mountain Car

In this notebook we solve the continuous mountain car problem from [OpenAI Gym](https://gym.openai.com/), a toolkit providing environments that are usually solved with reinforcement learning algorithms.
Continuous Mountain Car, a standard testing domain in Reinforcement Learning (RL), is a problem in which an under-powered car must drive up a steep hill. Note that we use the *continuous* version of the mountain car here because it has a shaped (i.e. not sparse) reward that can be exploited when solving, as opposed to the other "Mountain Car" environments. A reward is called *sparse* when it is informative only at rare events (typically when the goal is reached), and *shaped* when intermediate rewards give the agent feedback along the way.


This problem has been chosen for three reasons:
  - Show how scikit-decide can be used to solve Gym environments (the de-facto standard in the RL community);
  - Highlight that, by doing so, you can use not only solvers from the RL community (like the ones in StableBaselines3), but also solvers from other communities, such as genetic programming and planning/search (which explores an underlying search graph), that can sometimes be very efficient;
  - Use the *continuous* version of the mountain car, whose shaped (non-sparse) reward makes it easier to solve than the other "Mountain Car" environments.

Therefore in this notebook we will go through the following steps:
  - Wrap a Gym environment in a scikit-decide domain;
  - Use a classical RL algorithm like PPO to solve our problem;
  - Give CGP (Cartesian Genetic Programming) a try on the same problem;
  - Finally, use IW (Iterated Width), from the planning community, on the same problem.

We will conclude the notebook with an analysis of the three solvers.


**For local Jupyter users:** you will need to install [`ffmpeg`](https://www.ffmpeg.org/) before running this notebook, as it is used to record the solutions as mp4 movies.

In [None]:
from enum import Enum
from typing import NamedTuple, Optional, Any, List, Callable
from copy import deepcopy
from time import sleep
from collections import deque
import random
from math import sqrt, ceil
from base64 import b64encode
import glob
import os

from IPython.display import HTML, display
import ipywidgets as widgets
import matplotlib.pyplot as plt
from stable_baselines3 import PPO
import gym

from skdecide.hub.solver.stable_baselines import StableBaseline
from skdecide import DeterministicPlanningDomain, Space, Value
from skdecide.hub.domain.gym import GymDomain, GymWidthDomain, GymDiscreteActionDomain, GymPlanningDomain
from skdecide.builders.domain import UnrestrictedActions, Renderable
from skdecide.utils import rollout, match_solvers, load_registered_solver
from skdecide.hub.space.gym import ListSpace, EnumSpace, MultiDiscreteSpace
from skdecide.hub.solver.iw import IW
from skdecide.hub.solver.cgp import CGP  # Cartesian Genetic Programming


When running this notebook on remote servers such as Colab or Binder, rendering the gym environment fails because no actual display device exists. We therefore start a virtual display to make it work.

In [None]:
if "DISPLAY" not in os.environ:
    import pyvirtualdisplay
    _display = pyvirtualdisplay.Display(visible=False, size=(1400, 900))
    _display.start()

## Domain selection

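In this environment, a car starts at rest at a random position near the bottom of a valley and must reach the flag on top of the right hill. An observation is a pair (position, velocity), with position in [-1.2, 0.6] and velocity in [-0.07, 0.07]; an action is a single continuous force in [-1, 1]. The engine is too weak to drive straight up the slope, so the car has to swing back and forth to build momentum. Each step costs 0.1 × force², reaching the goal (position ≥ 0.45) yields +100 and ends the episode, and episodes are otherwise cut after 999 steps.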
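To make the "under-powered" claim concrete, here is a minimal pure-Python sketch of the environment's transition function. The constants (engine power 0.0015, gravity term 0.0025·cos(3·position), goal at 0.45) come from our reading of gym's `Continuous_MountainCarEnv` source and should be treated as assumptions; `run_episode`, `full_throttle` and `pump_energy` are hypothetical names used only here. Pushing right at full throttle is not enough, whereas always pushing in the direction of motion builds up momentum and reaches the flag.

```python
import math
from typing import Callable, Tuple

# Constants assumed from gym's Continuous_MountainCarEnv source.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.45
POWER = 0.0015


def step(position: float, velocity: float, force: float) -> Tuple[float, float, float, bool]:
    """One transition: returns (position, velocity, reward, done)."""
    force = min(max(force, -1.0), 1.0)
    # Engine force minus gravity along the slope of the sin(3 * position) hill.
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position = min(max(position + velocity, MIN_POSITION), MAX_POSITION)
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops against the left wall
    done = position >= GOAL_POSITION and velocity >= 0.0
    reward = (100.0 if done else 0.0) - 0.1 * force ** 2
    return position, velocity, reward, done


Policy = Callable[[float, float], float]


def run_episode(policy: Policy, start_position: float = -0.5,
                max_steps: int = 999) -> Tuple[bool, int, float]:
    """Roll out policy(position, velocity) -> force from rest; return (reached_goal, steps, return)."""
    position, velocity, total = start_position, 0.0, 0.0
    for t in range(1, max_steps + 1):
        position, velocity, reward, done = step(position, velocity, policy(position, velocity))
        total += reward
        if done:
            return True, t, total
    return False, max_steps, total


def full_throttle(position: float, velocity: float) -> float:
    return 1.0  # always push towards the goal


def pump_energy(position: float, velocity: float) -> float:
    return 1.0 if velocity >= 0 else -1.0  # push in the direction of motion


print(run_episode(full_throttle))  # gets stuck oscillating in the valley
print(run_episode(pump_energy))
```

Note that the successful policy first moves *away* from the goal, which is exactly the kind of behaviour the solvers below have to discover.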

Choose the gym environment we would like to use.

In [None]:
ENV_NAME = 'MountainCarContinuous-v0'

Define a domain factory using the `GymDomain` proxy available in scikit-decide, for solving purposes,
and another domain factory that wraps the environment in a `gym.wrappers.Monitor`, used during rollouts to record movies.

In [None]:
domain_factory = lambda: GymDomain(gym.make(ENV_NAME))
domain4movie_factory = lambda: GymDomain(gym.wrappers.Monitor(gym.make(ENV_NAME), "tmp_gym_recording", force=True))

Here is a snapshot of the environment after a reset.

In [None]:
domain = domain_factory()
domain.reset()
plt.imshow(domain.render(mode="rgb_array"))
plt.axis("off")
domain.close()

## Solve & Play

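### With Reinforcement Learning (StableBaseline)
Proximal Policy Optimization (PPO) is a policy-gradient, actor-critic RL algorithm: it alternates between collecting experience with the current policy and updating that policy, while clipping the probability ratio between the new and old policies so that each update stays close to the previous one. We use its StableBaselines3 implementation with a multilayer-perceptron policy (`'MlpPolicy'`), trained for 50,000 environment steps.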
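At the heart of PPO (Proximal Policy Optimization) is its clipped surrogate objective, `L_clip = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A))`, where `r` is the probability ratio between the new and old policies for the action taken and `A` its estimated advantage. The pure-Python sketch below evaluates it on a small batch, for intuition only: StableBaselines3 computes it on tensors, together with value-function and (optional) entropy terms. `clip_eps=0.2` mirrors the default `clip_range` of StableBaselines3's `PPO`; `ppo_clip_objective` is a hypothetical name.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_log_probs: Sequence[float], old_log_probs: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Average clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Positive advantage: a ratio of 1.5 only counts as 1.2.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
# Negative advantage: the worse (clipped) term min(-0.5, -0.8) is kept.
print(ppo_clip_objective([math.log(0.5)], [0.0], [-1.0]))
```

Because the objective stops rewarding ratios outside `[1 - eps, 1 + eps]`, the optimizer has no incentive to move the policy far from the one that collected the data, which is what keeps each update "proximal".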

Check the compatibility of the domain with the chosen solver.

In [None]:
domain = domain_factory()
assert StableBaseline.check_domain(domain)
domain.close()

Define a solver factory (class to use with default arguments).

In [None]:
solver_factory = lambda: StableBaseline(PPO, 'MlpPolicy', learn_config={'total_timesteps': 50000})

Solve and play the solution using a monitor wrapper to get a movie. The `with` statement ensures that the solver is properly cleaned up after use.

In [None]:
with solver_factory() as solver:
    # solve
    GymDomain.solve_with(solver, domain_factory)
    # create a domain wrapped in a monitor for recording during rollout
    domain4movie = domain4movie_factory()
    # rollout
    try:
        rollout(domain4movie, solver, num_episodes=1, max_steps=1000, max_framerate=None, outcome_formatter=None)
    finally:
        domain4movie.close()
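The cleanup guarantee of the `with` statement comes from Python's context-manager protocol: `__exit__` runs whenever the block is left, even when an exception is raised inside it. A minimal sketch with a hypothetical `ToySolver` class (not scikit-decide's actual `Solver`):

```python
class ToySolver:
    """Hypothetical stand-in for a solver that holds resources (processes, files, ...)."""

    def __init__(self) -> None:
        self.cleaned_up = False

    def __enter__(self) -> "ToySolver":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self.cleaned_up = True  # a real solver would release its resources here
        return False  # returning False lets any exception propagate


toy = ToySolver()
try:
    with toy:
        raise RuntimeError("solving failed")
except RuntimeError:
    pass
print(toy.cleaned_up)  # True: __exit__ ran despite the exception
```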

Display the recorded movie.

In [None]:
videofilename = glob.glob("tmp_gym_recording/openaigym.video.*.video000000.mp4")[0]
with open(videofilename,'rb') as mp4:
    data_url = "data:video/mp4;base64," + b64encode(mp4.read()).decode()
display(HTML(f"<video alt='solution movie' controls autoplay preload='auto'><source src='{data_url}' type='video/mp4'></video>"))
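This display cell is repeated after each solver. As a possible refactoring, here is a minimal sketch of a hypothetical helper `video_html` that builds the embedding HTML and fails with a clear message when no recording matches the pattern (indexing `glob(...)[0]` directly raises a bare `IndexError` instead). It is checked here on a dummy file rather than on a real recording.

```python
import glob
import os
import tempfile
from base64 import b64encode


def video_html(pattern: str) -> str:
    """Hypothetical helper: embed the first file matching `pattern` in an HTML5 <video> tag."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no video file matches {pattern!r}")
    with open(matches[0], "rb") as mp4:
        data_url = "data:video/mp4;base64," + b64encode(mp4.read()).decode()
    return f"<video controls autoplay preload='auto'><source src='{data_url}' type='video/mp4'></video>"


# In the notebook one would call:
# display(HTML(video_html("tmp_gym_recording/openaigym.video.*.video000000.mp4")))
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "dummy.mp4"), "wb") as f:
        f.write(b"not really a movie")
    html = video_html(os.path.join(tmp_dir, "*.mp4"))
print(html[:60])
```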

### With Cartesian Genetic Programming (CGP)
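Cartesian Genetic Programming (CGP) is an evolutionary method in which a program is encoded as a fixed-length list of integers describing a directed acyclic graph of function nodes laid out on a grid. Candidate programs (here, policies mapping observations to actions) are improved over generations by mutation and selection, according to how well each program performs on the problem.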
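To make the encoding concrete, here is a self-contained toy version of CGP in pure Python: a single-row genome of integer triples (function, input, input), decoded feed-forward, and evolved with the (1 + λ) strategy commonly used with CGP. The function set, genome layout and symbolic-regression task are illustrative assumptions; this is not how scikit-decide's `CGP` solver is implemented, and in the notebook the evolved program maps observations to actions rather than fitting a formula.

```python
import operator
import random
from typing import Callable, List, Sequence, Tuple

# Function set; every function has arity 2 in this toy version.
FUNCTIONS: List[Callable[[int, int], int]] = [operator.add, operator.sub, operator.mul]

Node = Tuple[int, int, int]            # (function index, first input address, second input address)
Genome = Tuple[List[Node], List[int]]  # (nodes, output addresses)


def random_genome(n_inputs: int, n_nodes: int, n_outputs: int, rng: random.Random) -> Genome:
    nodes = []
    for i in range(n_nodes):
        n_sources = n_inputs + i  # feed-forward: read program inputs or earlier nodes only
        nodes.append((rng.randrange(len(FUNCTIONS)), rng.randrange(n_sources), rng.randrange(n_sources)))
    outputs = [rng.randrange(n_inputs + n_nodes) for _ in range(n_outputs)]
    return nodes, outputs


def evaluate(genome: Genome, inputs: Sequence[int]) -> List[int]:
    nodes, outputs = genome
    values = list(inputs)  # addresses 0..n_inputs-1 hold the inputs, then one address per node
    for func_idx, a, b in nodes:  # for simplicity, inactive nodes are evaluated too
        values.append(FUNCTIONS[func_idx](values[a], values[b]))
    return [values[o] for o in outputs]


def mutate(genome: Genome, n_inputs: int, rate: float, rng: random.Random) -> Genome:
    nodes, outputs = genome
    new_nodes = []
    for i, (f, a, b) in enumerate(nodes):
        n_sources = n_inputs + i
        f = rng.randrange(len(FUNCTIONS)) if rng.random() < rate else f
        a = rng.randrange(n_sources) if rng.random() < rate else a
        b = rng.randrange(n_sources) if rng.random() < rate else b
        new_nodes.append((f, a, b))
    n_addresses = n_inputs + len(nodes)
    new_outputs = [rng.randrange(n_addresses) if rng.random() < rate else o for o in outputs]
    return new_nodes, new_outputs


def evolve(fitness: Callable[[Genome], int], n_inputs: int, n_nodes: int, n_outputs: int,
           generations: int = 100, lam: int = 4, rate: float = 0.1, seed: int = 0):
    """(1 + lambda) evolution strategy; returns (best genome, its fitness, fitness history)."""
    rng = random.Random(seed)
    parent = random_genome(n_inputs, n_nodes, n_outputs, rng)
    parent_fit = fitness(parent)
    history = [parent_fit]
    for _ in range(generations):
        children = [mutate(parent, n_inputs, rate, rng) for _ in range(lam)]
        best_fit, best_child = max(((fitness(c), c) for c in children), key=lambda pair: pair[0])
        if best_fit >= parent_fit:  # ">=" also accepts neutral offspring (neutral drift)
            parent, parent_fit = best_child, best_fit
        history.append(parent_fit)
    return parent, parent_fit, history


# Toy task: recover x * x + y from samples; fitness is minus the squared error.
SAMPLES = [((x, y), x * x + y) for x in range(-3, 4) for y in range(-3, 4)]


def regression_fitness(genome: Genome) -> int:
    return -sum((evaluate(genome, inputs)[0] - target) ** 2 for inputs, target in SAMPLES)


best, best_fit, history = evolve(regression_fitness, n_inputs=2, n_nodes=10, n_outputs=1)
print(best_fit)
```

Since the parent is only replaced by an offspring that is at least as good, the best fitness never decreases across generations.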

Check the compatibility of the domain with the chosen solver.

In [None]:
domain = domain_factory()
assert CGP.check_domain(domain)
domain.close()

Define a solver factory (class to use with default arguments).

In [None]:
solver_factory = lambda: CGP('TEMP_CGP', n_it=25)

Solve and play the solution using a monitor wrapper to get a movie. The `with` statement ensures that the solver is properly cleaned up after use.

In [None]:
with solver_factory() as solver:
    # solve
    GymDomain.solve_with(solver, domain_factory)
    # create a domain wrapped in a monitor for recording during rollout
    domain4movie = domain4movie_factory()
    # rollout
    try:
        rollout(domain4movie, solver, num_episodes=1, max_steps=1000, max_framerate=None, outcome_formatter=None)
    finally:
        domain4movie.close()

Display the recorded movie.

In [None]:
videofilename = glob.glob("tmp_gym_recording/openaigym.video.*.video000000.mp4")[0]
with open(videofilename,'rb') as mp4:
    data_url = "data:video/mp4;base64," + b64encode(mp4.read()).decode()
display(HTML(f"<video alt='solution movie' controls autoplay preload='auto'><source src='{data_url}' type='video/mp4'></video>"))

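### With Classical Planning (IW)
Iterated Width (IW) is a blind search algorithm from classical planning. IW(k) is a breadth-first search that prunes every generated state whose *novelty* is greater than k, where the novelty of a state is the size of the smallest tuple of features that is true in it for the first time during the search; IW runs IW(1), IW(2), ... until a solution is found. Here the search is performed directly on the simulator, with features computed from the observations.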
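To illustrate novelty-based pruning, here is a textbook IW sketch in pure Python on a hypothetical toy domain (an empty 4×4 grid whose atoms are the values of x and of y). It is not scikit-decide's implementation, which among other things takes the domain's `state_features` (here `bee2_features`) as atoms and a custom `node_ordering`. With only x and y as atoms, IW(1) explores just the two axes: it finds (3, 0) but misses (3, 3), which needs IW(2), where each (x, y) pair counts as a new tuple.

```python
from collections import deque
from itertools import combinations
from typing import Hashable, Iterable, Iterator, List, Optional, Set, Tuple

State = Tuple[int, int]


def is_novel(atoms: Iterable[Hashable], seen: Set[tuple], k: int) -> bool:
    """Record all tuples of at most k atoms; the state is novel if at least one is new."""
    novel = False
    ordered = sorted(atoms)  # canonical order (atoms must be sortable) so each tuple is stored once
    for size in range(1, k + 1):
        for tup in combinations(ordered, size):
            if tup not in seen:
                seen.add(tup)
                novel = True
    return novel


def iw_k(initial, successors, atoms_of, is_goal, k: int) -> Optional[List[str]]:
    """IW(k): breadth-first search that prunes generated states with novelty greater than k."""
    seen: Set[tuple] = set()
    is_novel(atoms_of(initial), seen, k)
    queue = deque([(initial, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, child in successors(state):
            if is_novel(atoms_of(child), seen, k):
                queue.append((child, plan + [action]))
    return None


def iterated_width(initial, successors, atoms_of, is_goal, max_width: int):
    """IW: run IW(1), IW(2), ... and return (width, plan) for the first width that succeeds."""
    for k in range(1, max_width + 1):
        plan = iw_k(initial, successors, atoms_of, is_goal, k)
        if plan is not None:
            return k, plan
    return None


# Hypothetical toy domain: an empty SIZE x SIZE grid.
SIZE = 4
MOVES = {"right": (1, 0), "up": (0, 1), "left": (-1, 0), "down": (0, -1)}


def grid_successors(state: State) -> Iterator[Tuple[str, State]]:
    x, y = state
    for name, (dx, dy) in MOVES.items():
        if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE:
            yield name, (x + dx, y + dy)


def grid_atoms(state: State) -> List[Tuple[str, int]]:
    return [("x", state[0]), ("y", state[1])]


print(iterated_width((0, 0), grid_successors, grid_atoms, lambda s: s == (3, 0), max_width=2))
print(iterated_width((0, 0), grid_successors, grid_atoms, lambda s: s == (3, 3), max_width=2))
```

The choice of features therefore matters a lot: the same search is cheap when the goal can be reached through states that each bring a new single feature value, and gets more expensive as larger tuples are needed.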

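Here we need to further wrap the domain so that IW can be used on it: IW searches by restoring the simulator to previously visited states (hence the `set_state`/`get_state` hooks of `GymPlanningDomain`), needs a finite set of actions (`GymDiscreteActionDomain` discretizes the continuous action space) and computes novelty on state features (`GymWidthDomain`).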

In [None]:
class D(GymPlanningDomain, GymWidthDomain, GymDiscreteActionDomain):
    pass


class GymDomainForWidthSolvers(D):
    def __init__(self, gym_env: gym.Env,
                 set_state: Callable[[gym.Env, D.T_memory[D.T_state]], None] = None,
                 get_state: Callable[[gym.Env], D.T_memory[D.T_state]] = None,
                 termination_is_goal: bool = True,
                 continuous_feature_fidelity: int = 5,
                 discretization_factor: int = 3,
                 branching_factor: int = None,
                 max_depth: int = 1000) -> None:
        GymPlanningDomain.__init__(self,
                                   gym_env=gym_env,
                                   set_state=set_state,
                                   get_state=get_state,
                                   termination_is_goal=termination_is_goal,
                                   max_depth=max_depth)
        GymDiscreteActionDomain.__init__(self,
                                         discretization_factor=discretization_factor,
                                         branching_factor=branching_factor)
        GymWidthDomain.__init__(self, continuous_feature_fidelity=continuous_feature_fidelity)
        gym_env._max_episode_steps = max_depth
    
    def state_features(self, s):
        return self.bee2_features(s)
    
    def heuristic(self, s):
        return Value(cost=0)
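`GymDomainForWidthSolvers` turns continuous quantities into finite ones in two places: actions (`discretization_factor`) and, presumably, state features (`continuous_feature_fidelity`). We do not reproduce scikit-decide's exact schemes; the sketch below only illustrates the general idea with two hypothetical helpers, assuming uniform grids over the bounds of the gym spaces.

```python
from typing import List


def discretize_actions(low: float, high: float, n: int) -> List[float]:
    """Hypothetical helper: n evenly spaced action values covering [low, high]."""
    if n == 1:
        return [(low + high) / 2]
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def feature_bin(value: float, low: float, high: float, n_bins: int) -> int:
    """Hypothetical helper: index of the bin containing value among n_bins equal bins of [low, high]."""
    ratio = (value - low) / (high - low)
    return min(max(int(ratio * n_bins), 0), n_bins - 1)  # clamp so both bounds map to valid bins


# Mountain car: force in [-1, 1], position in [-1.2, 0.6] (bounds of the gym spaces).
print(discretize_actions(-1.0, 1.0, 3))
print([feature_bin(p, -1.2, 0.6, 5) for p in (-1.2, -0.5, 0.0, 0.45, 0.6)])
```

Coarser grids keep the search small but may merge states that IW should distinguish; finer grids do the opposite.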


We redefine the domain factories accordingly.

In [None]:
domain_factory = lambda: GymDomainForWidthSolvers(gym.make(ENV_NAME))
domain4movie_factory = lambda: GymDomainForWidthSolvers(gym.wrappers.Monitor(gym.make(ENV_NAME), "tmp_gym_recording", force=True))

Check the compatibility of the domain with the chosen solver.

In [None]:
domain = domain_factory()
assert IW.check_domain(domain)
domain.close()

Define a solver factory (class to use with default arguments).

In [None]:
default_args = {
    'state_features': lambda d, s: d.bee2_features(s),
    'node_ordering': lambda a_gscore, a_novelty, a_depth, b_gscore, b_novelty, b_depth: a_novelty > b_novelty,
    'parallel': False,
    'debug_logs': False,
    'domain_factory': domain_factory,
}
solver_factory = lambda: IW(**default_args)

Solve and play the solution using a monitor wrapper to get a movie. The `with` statement ensures that the solver is properly cleaned up after use.

In [None]:
with solver_factory() as solver:
    # solve
    GymDomain.solve_with(solver, domain_factory)
    # create a domain wrapped in a monitor for recording during rollout
    domain4movie = domain4movie_factory()
    # rollout
    try:
        rollout(domain4movie, solver, num_episodes=1, max_steps=1000, max_framerate=None, outcome_formatter=None)
    finally:
        domain4movie.close()

Display the recorded movie.

In [None]:
videofilename = glob.glob("tmp_gym_recording/openaigym.video.*.video000000.mp4")[0]
with open(videofilename,'rb') as mp4:
    data_url = "data:video/mp4;base64," + b64encode(mp4.read()).decode()
display(HTML(f"<video alt='solution movie' controls autoplay preload='auto'><source src='{data_url}' type='video/mp4'></video>"))