# Gym environment demo: Mountain Car Continuous

[OpenAI Gym](https://gym.openai.com/) is a toolkit for developing and comparing reinforcement learning algorithms.

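In this notebook, we show how to use a Gym domain (namely "Mountain Car Continuous") and solve it with two solvers available in the scikit-decide hub: a reinforcement learning solver (PPO from Stable Baselines) and Cartesian Genetic Programming (CGP).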

In [None]:
from enum import Enum
from typing import NamedTuple, Optional, Any, List
from copy import deepcopy
from time import sleep
from collections import deque
import random
from math import sqrt, ceil

import ipywidgets as widgets
import matplotlib.pyplot as plt
from stable_baselines3 import PPO
import gym

from skdecide.hub.solver.stable_baselines import StableBaseline
from skdecide import DeterministicPlanningDomain, Space, Value
from skdecide.hub.domain.gym import GymDomain
from skdecide.builders.domain import UnrestrictedActions, Renderable
from skdecide.utils import rollout, match_solvers, load_registered_solver
from skdecide.hub.space.gym import ListSpace, EnumSpace, MultiDiscreteSpace
from skdecide.hub.solver.lazy_astar import LazyAstar
from skdecide.hub.solver.cgp import CGP  # Cartesian Genetic Programming


## Domain selection

Choose the Gym environment to use:

In [None]:
ENV_NAME = 'MountainCarContinuous-v0'
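
To give a feel for what this environment asks of a solver, here is a minimal pure-Python sketch of the Mountain Car Continuous dynamics. The constants and update rule are reproduced from memory of the Gym source and should be treated as assumptions to check against your installed version. The `run_energy_pumping` policy is a hypothetical hand-written baseline, not one of the scikit-decide solvers used below.

```python
import math

# Assumed constants (believed to match Gym's MountainCarContinuous-v0;
# verify against the installed Gym source before relying on them).
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION, GOAL_VELOCITY = 0.45, 0.0
POWER = 0.0015


def step(position, velocity, action):
    """Advance the car by one time step; returns (position, velocity, reward, done)."""
    force = min(max(action, -1.0), 1.0)  # the applied force is clipped to [-1, 1]
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    done = position >= GOAL_POSITION and velocity >= GOAL_VELOCITY
    # Bonus for reaching the goal, minus a cost proportional to the squared action.
    reward = (100.0 if done else 0.0) - 0.1 * action ** 2
    return position, velocity, reward, done


def run_energy_pumping(position=-0.5, max_steps=999):
    """Hypothetical baseline policy: always push in the direction of motion."""
    velocity, total_reward = 0.0, 0.0
    for t in range(1, max_steps + 1):
        action = 1.0 if velocity >= 0 else -1.0
        position, velocity, reward, done = step(position, velocity, action)
        total_reward += reward
        if done:
            return t, total_reward, True
    return max_steps, total_reward, False


steps, total_reward, reached = run_energy_pumping()
print(f"reached goal: {reached} after {steps} steps, return {total_reward:.1f}")
```

The engine force (`POWER`) is weaker than the steepest pull of the slope, so the car cannot drive straight up from rest. It has to swing back and forth to build momentum, which is what makes the task non-trivial for a learning algorithm.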

Define a domain factory using the `GymDomain` proxy available in scikit-decide.

In [None]:
domain_factory = lambda: GymDomain(gym.make(ENV_NAME))
domain = domain_factory()

Test rendering the Gym environment in a matplotlib figure.

In [None]:
domain.reset()
_, ax = plt.subplots(1, 1)
img = ax.imshow(domain.render(mode='rgb_array'))

## Solve & Play

### With Reinforcement Learning (StableBaseline)

Check that the domain is compatible.

In [None]:
assert StableBaseline.check_domain(domain)

Define the solver factory (the class to use together with its default arguments).

In [None]:
solver_factory = lambda: StableBaseline(PPO, 'MlpPolicy', learn_config={'total_timesteps': 30000})

Solve and store for later reuse. The `with` statement ensures that the solver is properly cleaned up after use.

In [None]:
with solver_factory() as solver:
    GymDomain.solve_with(solver, domain_factory)
    solver.save('TEMP_Baselines')
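
As a generic illustration of the cleanup guarantee that the `with` statement provides, the sketch below uses a hypothetical `ToySolver` class (not part of scikit-decide) that implements Python's context manager protocol. What scikit-decide solvers actually release on exit is not shown here.

```python
class ToySolver:
    """Hypothetical stand-in for a solver holding resources; not a scikit-decide class."""

    def __init__(self):
        self.resources_open = False

    def __enter__(self):
        self.resources_open = True  # acquire resources when entering the block
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.resources_open = False  # runs on exit, even if the block raised
        return False  # do not swallow exceptions


with ToySolver() as toy:
    inside = toy.resources_open

print(inside, toy.resources_open)  # True False
```

Because `__exit__` runs even when the body of the block raises, the cleanup does not depend on the solve step succeeding.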

Play the solution by reloading the stored solver and running one episode.

In [None]:
with solver_factory() as solver:
    GymDomain.solve_with(solver, domain_factory, load_path='TEMP_Baselines')
    rollout(domain, solver, num_episodes=1, max_steps=1000, max_framerate=30, outcome_formatter=None)
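
Conceptually, a rollout repeatedly asks the policy for an action and applies it until the episode ends or the step budget runs out. The sketch below shows that loop on a hypothetical `CountdownEnv` toy environment. `toy_rollout` is an illustrative helper, not the `skdecide.utils.rollout` implementation, which, as its arguments above suggest, also handles rendering and frame-rate limiting.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends when the counter reaches 0."""

    def __init__(self, start=5):
        self.start = start
        self.counter = start

    def reset(self):
        self.counter = self.start
        return self.counter

    def step(self, action):
        self.counter -= action
        done = self.counter <= 0
        return self.counter, -1.0, done  # observation, reward, termination flag


def toy_rollout(env, policy, num_episodes=1, max_steps=1000):
    """Run the policy for a number of episodes and return each episode's total reward."""
    returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:
                break  # stop early once the episode has terminated
        returns.append(total)
    return returns


returns = toy_rollout(CountdownEnv(start=5), policy=lambda obs: 1, num_episodes=2)
print(returns)  # [-5.0, -5.0]
```

The `max_steps` cap matters for a task like Mountain Car, where a poor policy may never reach the goal and the episode would otherwise not end on its own.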

### With Cartesian Genetic Programming (CGP)

Check that the domain is compatible.

In [None]:
assert CGP.check_domain(domain)

Define the solver factory (the class to use together with its default arguments).

In [None]:
solver_factory = lambda: CGP('TEMP_CGP', n_it=25)

Solve and play the solution. The `with` statement ensures that the solver is properly cleaned up after use.

*NB: CGP derives from `Restorable`, but `_load()` is not yet implemented, so the solution is not stored for later reuse here.*

In [None]:
with solver_factory() as solver:
    GymDomain.solve_with(solver, domain_factory)
    rollout(domain, solver, num_episodes=2, max_steps=1000, max_framerate=30, outcome_formatter=None)