## Classic Control

### MountainCar Continuous (v0)

_An underpowered car must climb a one-dimensional hill to reach a target.
Unlike MountainCar v0, the action (engine force applied) is allowed to be a continuous value._

_The target is on top of a hill on the right-hand side of the car. 
If the car reaches it or goes beyond, the episode terminates._

_On the left-hand side, there is another hill. 
Climbing this hill can be used to gain potential energy and accelerate towards the target. 
On top of this second hill, the car cannot go further than a position equal to -1.2 (the lower bound of the observation space), as if there were a wall. 
Hitting this limit does not generate a penalty (it might in a more challenging version)._

[Environment Source code (GitHub)](https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py)

In [1]:
from IPython.display import Markdown, display, clear_output

In [2]:
import engine
import envs

In [3]:
env = engine.instantiate(envs.MOUNTAIN_CAR_CONTINUOUS)

print("Observation space type:", type(env.observation_space))
print("Observation space size:", env.observation_space.shape, "*", env.observation_space.dtype)
print("Observation space max values:", env.observation_space.high)
print("Observation space min values:", env.observation_space.low)

display(Markdown('---'))

print("Action space type:", env.action_space)

display(Markdown('---'))

print("Reward range:", env.reward_range)

Observation space type: <class 'gym.spaces.box.Box'>
Observation space size: (2,) * float32
Observation space max values: [0.6  0.07]
Observation space min values: [-1.2  -0.07]


---

Action space type: Box([-1.], [1.], (1,), float32)


---

Reward range: (-inf, inf)


### Observation Space - _Box(2)_

| Index | Observation Type | Min Value | Max Value |
|:-----:|------------------|:---------:|:---------:|
|   0   | Car Position     |   -1.2    |    0.6    |
|   1   | Car Velocity     |   -0.07   |   0.07    |

### Action Space - _Box(1)_

| Action | Action Type           | Min Value | Max Value |
|:------:|-----------------------|:---------:|:---------:|
|   0    | The Power Coefficient |   -1.0    |    1.0    |

_**Note**: Actual driving force is calculated by multiplying the power coefficient by power (0.0015)._
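
The note above can be made concrete with a minimal sketch of a single physics update. This is not the environment's code verbatim: the function name `step_dynamics` is hypothetical, and the gravity term `0.0025 * cos(3 * position)` and the wall behaviour are assumptions based on a reading of the linked Gym source. The position and velocity bounds come from the observation space table.

```python
import math

POWER = 0.0015                          # scaling from the note above
MIN_POSITION, MAX_POSITION = -1.2, 0.6  # bounds from the observation space
MAX_SPEED = 0.07                        # velocity bound from the observation space
GRAVITY = 0.0025                        # assumption: taken from the linked Gym source


def step_dynamics(position, velocity, action):
    """Advance the car by one step (hypothetical helper, not the Gym API)."""
    # The power coefficient is clipped to the action space [-1, 1].
    coefficient = min(max(action, -1.0), 1.0)
    # Driving force = power coefficient * power, opposed by the slope of the hill.
    velocity += coefficient * POWER - GRAVITY * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    # The left boundary behaves like a wall: the car stops instead of bouncing.
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


# Full throttle to the right, starting at rest in the valley.
print(step_dynamics(-0.5, 0.0, 1.0))
```

With full throttle the velocity changes by at most 0.0015 per step before gravity is applied, which is why the car usually has to swing back and forth to build up momentum.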

### Reward
A reward of 100 is awarded when the agent reaches the flag (position = 0.45) on top of the mountain.
The reward is reduced at every step in proportion to the energy consumed, i.e. the squared action.
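
As a rough illustration of the reward described above, the per-step reward can be sketched as follows. The function name `step_reward` is hypothetical. The energy penalty of `0.1 * action ** 2` is an assumption based on the linked Gym source, and the goal test is simplified to the position check only.

```python
GOAL_POSITION = 0.45   # flag position from the text above
ENERGY_COST = 0.1      # assumption: penalty coefficient from the linked Gym source


def step_reward(position, action):
    """Reward for one step, given the new position and the action taken."""
    # Bonus for reaching the flag, otherwise nothing.
    reward = 100.0 if position >= GOAL_POSITION else 0.0
    # Energy penalty grows with the square of the power coefficient.
    reward -= ENERGY_COST * action ** 2
    return reward


print(step_reward(-0.5, 0.5))   # ordinary step: small negative reward
print(step_reward(0.5, 1.0))    # step that reaches the flag
```

Under this sketch an agent that never reaches the flag can only accumulate negative reward, so doing nothing (action 0) is a local optimum that exploration has to escape.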

### Starting State
The position of the car is assigned a uniform random value in [-0.6, -0.4].
The starting velocity of the car is always set to 0.
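
The starting state can be reproduced with a short sketch. The helper `sample_start_state` is hypothetical and uses Python's standard `random` module rather than the environment's own random number generator.

```python
import random


def sample_start_state(rng=random):
    """Return (position, velocity) for the start of an episode."""
    position = rng.uniform(-0.6, -0.4)  # uniform in [-0.6, -0.4]
    velocity = 0.0                      # the car always starts at rest
    return position, velocity


# A seeded generator makes the sampled start reproducible.
print(sample_start_state(random.Random(0)))
```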

### Termination Conditions
1) The car position reaches or exceeds 0.45
2) The episode length reaches 999 steps (the limit Gym registers for MountainCarContinuous-v0)
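
The two conditions can be combined into a single check, sketched below. The function `episode_over` and its `max_steps` parameter are hypothetical: the step limit is passed in by the caller rather than hard-coded, and the goal test follows the description that reaching the target or going beyond it ends the episode.

```python
GOAL_POSITION = 0.45  # flag position from the text above


def episode_over(position, steps_taken, max_steps):
    """Return True when either termination condition holds."""
    reached_goal = position >= GOAL_POSITION
    out_of_time = steps_taken >= max_steps
    return reached_goal or out_of_time


# Still in the valley with steps to spare, so the episode continues.
print(episode_over(-0.5, steps_taken=3, max_steps=10))
```

Keeping the step limit as an argument mirrors how Gym applies time limits through a wrapper around the environment rather than inside the physics code.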

### Solved Requirements
The environment is considered solved when the reward exceeds 90.

In [4]:
from agents import RandomAgent

agent = RandomAgent(env)

In [5]:
for i in range(5):
    clear_output(wait=True)
    reward = engine.run(env, agent)
    print("Iteration", i, "->", "Reward", reward)

Iteration 4 -> Reward -34.19517459694104
