## Classic Control

### CartPole (v1)

_A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track.  
The system is controlled by applying a force of +1 or -1 to the cart.  
The pendulum starts upright, and the goal is to prevent it from falling over.  
A reward of +1 is provided for every step that the pole remains upright.  
The episode ends when the pole is more than 12 degrees from vertical, or the cart moves more than 2.4 units from the center._

[Environment Source code (GitHub)](https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py)

In [1]:
from IPython.display import Markdown, display

In [2]:
import engine
import envs

In [3]:
from utils import log

logger = log.setup_logger()

In [4]:
env = engine.instantiate(envs.CART_POLE)

print("Observation space type:", type(env.observation_space))
print("Observation space size:", env.observation_space.shape, "*", env.observation_space.dtype)
print("Observation space max values:", env.observation_space.high)
print("Observation space min values:", env.observation_space.low)

display(Markdown('---'))

print("Action space type:", env.action_space)

display(Markdown('---'))

print("Reward range:", env.reward_range)

2022-05-22 00:45:05,949 - INFO - engine - New environment created - CartPole-v1 


Observation space type: <class 'gym.spaces.box.Box'>
Observation space size: (4,) * float32
Observation space max values: [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
Observation space min values: [-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]


---

Action space type: Discrete(2)


---

Reward range: (-inf, inf)


### Observation Space - _Box(4)_

| Index | Observation Type      | Min Value           | Max Value         |
| :---: | --------------------- | :-----------------: | :---------------: |
| 0     | Cart Position         | -4.8                | 4.8               |
| 1     | Cart Velocity         | _-Inf_              | _Inf_             |
| 2     | Pole Angle            | -0.41887 rad (-24°) | 0.41887 rad (24°) |
| 3     | Pole Angular Velocity | _-Inf_              | _Inf_             |
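
The finite bounds in the table are twice the termination thresholds, so an observation can be well inside the `Box` while the episode has already ended. The short sketch below reproduces the two finite bounds from the thresholds used in the linked `cartpole.py`; the constant names here are illustrative, not the names used in the source.

```python
import math

# Termination thresholds (illustrative constant names)
X_THRESHOLD = 2.4                     # cart position limit, in track units
THETA_THRESHOLD = math.radians(12)    # pole angle limit: 12 degrees in radians

# The observation-space bounds are twice the termination thresholds
position_bound = 2 * X_THRESHOLD
angle_bound = 2 * THETA_THRESHOLD

print("Cart position bound:", position_bound)
print("Pole angle bound (rad):", angle_bound)
print("Pole angle bound (deg):", math.degrees(angle_bound))
```

The printed angle bound agrees with the `4.1887903e-01` value reported by `env.observation_space.high` above, up to float32 precision.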

### Action Space - _Discrete(2)_

| Action | Action Type            |
|:------:| ---------------------- |
|   0    | Push Cart to the Left  |
|   1    | Push Cart to the Right |

### Reward
The reward is +1 for every step taken, including the termination step. The reward threshold for CartPole-v1 is 475.
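
Because every step yields +1, the undiscounted return of an episode equals its length. Agents trained with a discount factor see a different quantity, though. The sketch below computes a discounted return for a constant-reward episode; `discounted_return` is a hypothetical helper, and `gamma=0.99` is assumed only because that value appears in the saved model's filename later in this notebook.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t], accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A full-length CartPole-v1 episode: 500 steps, each rewarded with +1
episode_rewards = [1.0] * 500

print("Undiscounted return:", discounted_return(episode_rewards, gamma=1.0))
print("Discounted return:  ", discounted_return(episode_rewards, gamma=0.99))
```

With a constant reward of 1, the discounted return is bounded by 1 / (1 - gamma), which is 100 for gamma = 0.99, so it saturates long before the 500-step limit is reached.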

### Starting State
Calling `env.reset()` initializes each of the four observation values to a uniform random value between -0.05 and 0.05.
![image](https://user-images.githubusercontent.com/53531617/148705742-ec6e0832-1f2d-422b-adce-9f1af5a264dc.png)
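
As a rough illustration of the starting state, the sketch below draws four values uniformly from ±0.05 using only the standard library. `sample_initial_state` is a hypothetical helper for illustration; the environment itself uses its own seeded random generator inside `reset()`.

```python
import random


def sample_initial_state(rng=random):
    """Draw [cart position, cart velocity, pole angle, pole angular velocity]."""
    return [rng.uniform(-0.05, 0.05) for _ in range(4)]


# Seeding a dedicated generator makes the draw reproducible
state = sample_initial_state(random.Random(0))
print(state)
```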

### Termination Conditions
1) Pole Angle exceeds ±12°  
2) Cart Position exceeds ±2.4 (center of the cart reaches the edge of the display)  
3) Episode length reaches 200 steps for v0 (500 for v1)  
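
The three conditions above can be sketched as a single predicate. This is a minimal illustration, not the environment's code: `episode_over` and the constant names are hypothetical, and the step limit is assumed to be applied once the step count reaches the maximum, as a time-limit wrapper typically does.

```python
import math

X_THRESHOLD = 2.4                      # cart position limit
THETA_THRESHOLD = math.radians(12)     # pole angle limit (12 degrees)
MAX_STEPS = {"v0": 200, "v1": 500}     # episode length limit per version


def episode_over(cart_position, pole_angle, steps, version="v1"):
    """Return True if any of the three termination conditions holds."""
    out_of_bounds = abs(cart_position) > X_THRESHOLD
    pole_fell = abs(pole_angle) > THETA_THRESHOLD
    time_limit_hit = steps >= MAX_STEPS[version]
    return out_of_bounds or pole_fell or time_limit_hit


print(episode_over(0.0, 0.0, 10))                 # still balancing
print(episode_over(0.0, math.radians(13), 10))    # pole past 12 degrees
print(episode_over(0.0, 0.0, 500))                # v1 step limit reached
```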

### Solved Requirements
Considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials. That figure applies to v0; for v1, which is used here, the threshold is 475.0.
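
One way to track this criterion during training is a rolling window over the most recent episode rewards. The sketch below is an assumption about how such a check could look, not the logic inside `engine.run`; `make_solved_checker` is a hypothetical helper, and the threshold is passed in explicitly (195.0 as stated above, or the 475 reward threshold given in the Reward section).

```python
from collections import deque


def make_solved_checker(threshold, window=100):
    """Build a function that records episode rewards and reports 'solved'."""
    recent = deque(maxlen=window)  # keeps only the last `window` rewards

    def record(episode_reward):
        recent.append(episode_reward)
        # Only judge once a full window of consecutive trials is available
        return len(recent) == window and sum(recent) / window >= threshold

    return record


check = make_solved_checker(threshold=475.0)
solved = False
for _ in range(100):
    solved = check(500.0)  # pretend every episode reaches the v1 step limit
print("Solved:", solved)
```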

## Random Agent

In [5]:
from agents import RandomAgent

agent = RandomAgent(env)

In [6]:
engine.run_episode(env, agent, train=False, render=True)

36.0

## Actor-Critic Agent

In [7]:
import torch.optim as optim

from agents import ActorCriticAgent

agent = ActorCriticAgent(env)
optimizer = optim.Adam(agent.parameters(), lr=3e-2)

In [8]:
engine.run(env, agent, optimizer=optimizer, train=True, verbose=True, remember_rewards=True, clear_output=True, render=False)

2022-05-22 00:47:33,544 - INFO - engine - Solved after 352 episodes -> 475.68605369556013
2022-05-22 00:47:33,546 - INFO - engine - Terminating


In [9]:
from utils import save_model

save_model(agent, env='carpole', gamma=0.99, lr='3e-2')

2022-05-22 00:48:00,301 - INFO - serialization - Saving model at location D:\Programming\retro-ai\models\ActorCriticAgent_env=carpole_gamma=0.99_lr=3e-2_v=2.pickle


In [10]:
from utils import load_model

agent = load_model('ActorCriticAgent', env='carpole', gamma=0.99, lr='3e-2')

In [11]:
engine.run_episode(env.env, agent, train=False, render=True)


KeyboardInterrupt: 