# Pendulum-v0
The inverted pendulum swingup problem is a classic problem in the control literature. In this version of the problem, the pendulum starts in a random position, and the goal is to swing it up so it stays upright.
https://gym.openai.com/envs/Pendulum-v0/

https://github.com/openai/gym/blob/master/gym/envs/classic_control/pendulum.py

### Action Space
The action is a `ndarray` with shape `(1,)` representing the torque applied to the free end of the pendulum. Min = -2.0, Max = 2.0
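An agent can enforce the torque bound itself before stepping the environment. The sketch below assumes NumPy and the ±2.0 limits above. `clip_torque` is a hypothetical helper for illustration and is not part of gym.

```python
import numpy as np

MAX_TORQUE = 2.0  # bound taken from the action space above

def clip_torque(action):
    """Clip a (1,)-shaped torque command to [-MAX_TORQUE, MAX_TORQUE]."""
    action = np.asarray(action, dtype=np.float32).reshape(1)
    return np.clip(action, -MAX_TORQUE, MAX_TORQUE)

print(clip_torque([3.5]))   # [2.]
print(clip_torque([-0.7]))  # [-0.7]
```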

### Observation Space
The observation is a `ndarray` with shape `(3,)` representing the x-y coordinates of the pendulum's free end and its angular velocity:
x = cos(theta) (-1.0, 1.0), y = sin(theta) (-1.0, 1.0), angular velocity (-8.0, 8.0)
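Encoding the angle as (cos, sin) keeps the observation continuous where theta wraps from pi to -pi. The angle can be recovered with `arctan2`. The following is a minimal sketch. `observe` and `angle_from_obs` are hypothetical helper names.

```python
import numpy as np

def observe(theta, theta_dot):
    """Build the (3,) observation [cos(theta), sin(theta), theta_dot]."""
    return np.array([np.cos(theta), np.sin(theta), theta_dot], dtype=np.float32)

def angle_from_obs(obs):
    """Recover theta in (-pi, pi] from the cos/sin components."""
    return float(np.arctan2(obs[1], obs[0]))

obs = observe(np.pi / 2, 1.5)
print(round(angle_from_obs(obs), 4))  # 1.5708
```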

### Starting State
The starting state is a random angle in *[-pi, pi]* and a random angular velocity in *[-1,1]*.
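The following sketches the start-state draw. It assumes uniform sampling over the two intervals above and uses a seeded NumPy `Generator` for reproducibility. `sample_initial_state` is a hypothetical name, not gym's internal API.

```python
import numpy as np

def sample_initial_state(rng):
    """Draw (theta, theta_dot) uniformly from [-pi, pi] x [-1, 1]."""
    high = np.array([np.pi, 1.0])
    return rng.uniform(low=-high, high=high)

rng = np.random.default_rng(0)  # seeded so the draw is reproducible
theta, theta_dot = sample_initial_state(rng)
```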

### Episode Termination
The episode ends after 200 time steps; the environment has no other termination condition.
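A step limit like this is usually enforced by a wrapper rather than by the environment itself. The following sketch illustrates the idea behind gym's `TimeLimit` wrapper. `StepLimit`, `NeverEnds`, and the `truncated` info key are hypothetical names chosen for illustration. They use the old 4-tuple `step` API that `Pendulum-v0` follows.

```python
class StepLimit:
    """Force done=True once max_steps calls to step() have been made."""

    def __init__(self, env, max_steps=200):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            done = True
            info = dict(info, truncated=True)  # ended by the time limit, not by failure
        return obs, reward, done, info


class NeverEnds:
    """Toy environment whose episodes never finish on their own."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, -1.0, False, {}


toy_env = StepLimit(NeverEnds(), max_steps=200)
toy_env.reset()
steps, done = 0, False
while not done:
    _, _, done, _ = toy_env.step(None)
    steps += 1
print(steps)  # 200
```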

## Loading

In [1]:
# Standard Libraries
import os
from pathlib import Path

# Third party libraries
import gym
import numpy as np
from stable_baselines3 import A2C, SAC, PPO, TD3
from stable_baselines3.sac.policies import MlpPolicy  # MlpPolicy because the Pendulum observation is a feature vector, not an image
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.env_util import make_vec_env

# Local imports
from utils import file_exists, evaluate_model, eval_env_random_actions



## Initialization

In [2]:
env_name = 'Pendulum-v0'
# env = gym.make(env_name)
env = make_vec_env(env_name, n_envs=4, seed=0)

# Create folder to save models
directory_path = 'models'
Path(directory_path).mkdir(parents=True, exist_ok=True)
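`make_vec_env(env_name, n_envs=4, seed=0)` builds four Monitor-wrapped copies of the environment and steps them together. Each vectorised step therefore adds 4 to Stable-Baselines3's `total_timesteps` counter. Every episode lasts exactly 200 steps, so the number of finished episodes is `total_timesteps // 200`. The check below uses figures from the training log further down. The helper name `completed_episodes` is hypothetical.

```python
EPISODE_LEN = 200  # Pendulum-v0 episodes last 200 steps
N_ENVS = 4         # matches n_envs=4 passed to make_vec_env

def completed_episodes(total_timesteps, episode_len=EPISODE_LEN):
    """Episodes finished after total_timesteps steps summed over all env copies."""
    return total_timesteps // episode_len

print(N_ENVS * EPISODE_LEN)        # 800: one full episode in each copy
print(completed_episodes(800))     # 4
print(completed_episodes(97_600))  # 488
```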

In [3]:
print(env.action_space)
print(env.observation_space)

Box([-2.], [2.], (1,), float32)
Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)


In [4]:
eval_env_random_actions(env)



Episode: 1
	Score: -1797.0309192717953
Episode: 2
	Score: -1285.3918029458525
Episode: 3
	Score: -1527.9164403148561
Episode: 4
	Score: -986.1787407837879
Episode: 5
	Score: -1743.828233613667
Episode: 6
	Score: -924.1606614664096
Episode: 7
	Score: -1605.583462796717
Episode: 8
	Score: -1584.9500539956225
Episode: 9
	Score: -898.4340564061871

		Mean reward: -1372.608263510544 Num episodes: 10
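`eval_env_random_actions` comes from the local `utils` module, which is not shown. The following is a hypothetical, dependency-free stand-in that scores uniformly random torques over 200-step episodes. The pendulum update is modelled loosely on gym's `pendulum.py`. Its constants (g = 10, m = l = 1, dt = 0.05, max speed 8, max torque 2) are assumptions taken from that file, and the exact update order may differ between gym versions.

```python
import numpy as np

G, M, L, DT = 10.0, 1.0, 1.0, 0.05
MAX_SPEED, MAX_TORQUE, EPISODE_LEN = 8.0, 2.0, 200

def angle_normalize(x):
    """Wrap an angle to [-pi, pi)."""
    return ((x + np.pi) % (2 * np.pi)) - np.pi

def pendulum_step(theta, theta_dot, u):
    """Advance the pendulum one step; return (theta, theta_dot, reward)."""
    u = float(np.clip(u, -MAX_TORQUE, MAX_TORQUE))
    cost = angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * u ** 2
    theta_dot = theta_dot + (3 * G / (2 * L) * np.sin(theta) + 3.0 / (M * L ** 2) * u) * DT
    theta_dot = float(np.clip(theta_dot, -MAX_SPEED, MAX_SPEED))
    theta = theta + theta_dot * DT
    return theta, theta_dot, -cost

def random_episode_return(rng):
    """Sum of rewards over one episode driven by uniformly random torques."""
    theta, theta_dot = rng.uniform(low=[-np.pi, -1.0], high=[np.pi, 1.0])
    total = 0.0
    for _ in range(EPISODE_LEN):
        u = rng.uniform(-MAX_TORQUE, MAX_TORQUE)
        theta, theta_dot, reward = pendulum_step(theta, theta_dot, u)
        total += reward
    return total

rng = np.random.default_rng(0)
scores = [random_episode_return(rng) for _ in range(10)]
print(f"Mean reward: {np.mean(scores):.1f} Num episodes: {len(scores)}")
```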


## Create model

In [5]:
model = SAC('MlpPolicy', env, train_freq=1, gradient_steps=2, verbose=1)
evaluate_model(model, num_episodes=100)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Mean reward: -1389.8263 Num episodes: 100


-1389.8263

## Training

In [6]:
num_steps = 100_000
model.learn(num_steps)

# Evaluate
### Rewards
# The minimum per-step reward is -16.2736044, while the maximum is zero (pendulum upright, zero angular velocity, no torque applied).

# Evaluate on a separate, Monitor-wrapped single environment
eval_env = Monitor(gym.make(env_name))
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=100)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

----------------------------------
| rollout/           |           |
|    ep_len_mean     | 200       |
|    ep_rew_mean     | -1.62e+03 |
| time/              |           |
|    episodes        | 4         |
|    fps             | 20        |
|    time_elapsed    | 39        |
|    total_timesteps | 800       |
| train/             |           |
|    actor_loss      | 48.7      |
|    critic_loss     | 0.108     |
|    ent_coef        | 0.663     |
|    ent_coef_loss   | -0.617    |
|    learning_rate   | 0.0003    |
|    n_updates       | 1398      |
----------------------------------
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 200       |
|    ep_rew_mean     | -1.45e+03 |
| time/              |           |
|    episodes        | 8         |
|    fps             | 18        |
|    time_elapsed    | 87        |
|    total_timesteps | 1600      |
| train/             |           |
|    actor_loss      | 92        |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -377     |
| time/              |          |
|    episodes        | 64       |
|    fps             | 17       |
|    time_elapsed    | 722      |
|    total_timesteps | 12800    |
| train/             |          |
|    actor_loss      | 60       |
|    critic_loss     | 0.914    |
|    ent_coef        | 0.0655   |
|    ent_coef_loss   | -0.124   |
|    learning_rate   | 0.0003   |
|    n_updates       | 25398    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -362     |
| time/              |          |
|    episodes        | 68       |
|    fps             | 17       |
|    time_elapsed    | 764      |
|    total_timesteps | 13600    |
| train/             |          |
|    actor_loss      | 52       |
|    critic_loss     | 0.702    |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -157     |
| time/              |          |
|    episodes        | 124      |
|    fps             | 18       |
|    time_elapsed    | 1363     |
|    total_timesteps | 24800    |
| train/             |          |
|    actor_loss      | 35.1     |
|    critic_loss     | 0.7      |
|    ent_coef        | 0.0327   |
|    ent_coef_loss   | -0.519   |
|    learning_rate   | 0.0003   |
|    n_updates       | 49398    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -156     |
| time/              |          |
|    episodes        | 128      |
|    fps             | 18       |
|    time_elapsed    | 1405     |
|    total_timesteps | 25600    |
| train/             |          |
|    actor_loss      | 27.4     |
|    critic_loss     | 0.515    |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -157     |
| time/              |          |
|    episodes        | 184      |
|    fps             | 18       |
|    time_elapsed    | 1973     |
|    total_timesteps | 36800    |
| train/             |          |
|    actor_loss      | 26       |
|    critic_loss     | 0.397    |
|    ent_coef        | 0.0345   |
|    ent_coef_loss   | 0.0634   |
|    learning_rate   | 0.0003   |
|    n_updates       | 73398    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -161     |
| time/              |          |
|    episodes        | 188      |
|    fps             | 18       |
|    time_elapsed    | 2008     |
|    total_timesteps | 37600    |
| train/             |          |
|    actor_loss      | 24.5     |
|    critic_loss     | 0.336    |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -157     |
| time/              |          |
|    episodes        | 244      |
|    fps             | 19       |
|    time_elapsed    | 2491     |
|    total_timesteps | 48800    |
| train/             |          |
|    actor_loss      | 21.1     |
|    critic_loss     | 0.346    |
|    ent_coef        | 0.0188   |
|    ent_coef_loss   | 0.995    |
|    learning_rate   | 0.0003   |
|    n_updates       | 97398    |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -159     |
| time/              |          |
|    episodes        | 248      |
|    fps             | 19       |
|    time_elapsed    | 2526     |
|    total_timesteps | 49600    |
| train/             |          |
|    actor_loss      | 16.6     |
|    critic_loss     | 0.8      |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -148     |
| time/              |          |
|    episodes        | 304      |
|    fps             | 19       |
|    time_elapsed    | 3044     |
|    total_timesteps | 60800    |
| train/             |          |
|    actor_loss      | 18.8     |
|    critic_loss     | 0.415    |
|    ent_coef        | 0.0198   |
|    ent_coef_loss   | 0.00761  |
|    learning_rate   | 0.0003   |
|    n_updates       | 121398   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -148     |
| time/              |          |
|    episodes        | 308      |
|    fps             | 19       |
|    time_elapsed    | 3082     |
|    total_timesteps | 61600    |
| train/             |          |
|    actor_loss      | 18.1     |
|    critic_loss     | 0.317    |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -140     |
| time/              |          |
|    episodes        | 364      |
|    fps             | 20       |
|    time_elapsed    | 3573     |
|    total_timesteps | 72800    |
| train/             |          |
|    actor_loss      | 14.9     |
|    critic_loss     | 0.312    |
|    ent_coef        | 0.018    |
|    ent_coef_loss   | -0.115   |
|    learning_rate   | 0.0003   |
|    n_updates       | 145398   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -139     |
| time/              |          |
|    episodes        | 368      |
|    fps             | 20       |
|    time_elapsed    | 3608     |
|    total_timesteps | 73600    |
| train/             |          |
|    actor_loss      | 16.8     |
|    critic_loss     | 0.265    |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -135     |
| time/              |          |
|    episodes        | 424      |
|    fps             | 20       |
|    time_elapsed    | 4097     |
|    total_timesteps | 84800    |
| train/             |          |
|    actor_loss      | 15.1     |
|    critic_loss     | 0.345    |
|    ent_coef        | 0.0172   |
|    ent_coef_loss   | 0.251    |
|    learning_rate   | 0.0003   |
|    n_updates       | 169398   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -135     |
| time/              |          |
|    episodes        | 428      |
|    fps             | 20       |
|    time_elapsed    | 4133     |
|    total_timesteps | 85600    |
| train/             |          |
|    actor_loss      | 18.8     |
|    critic_loss     | 0.206    |
...

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -142     |
| time/              |          |
|    episodes        | 484      |
|    fps             | 20       |
|    time_elapsed    | 4620     |
|    total_timesteps | 96800    |
| train/             |          |
|    actor_loss      | 12.5     |
|    critic_loss     | 0.252    |
|    ent_coef        | 0.0176   |
|    ent_coef_loss   | -0.666   |
|    learning_rate   | 0.0003   |
|    n_updates       | 193398   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -141     |
| time/              |          |
|    episodes        | 488      |
|    fps             | 20       |
|    time_elapsed    | 4655     |
|    total_timesteps | 97600    |
| train/             |          |
|    actor_loss      | 15.8     |
|    critic_loss     | 0.494    |
...
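The reward bounds in the `In [6]` comment are per-step values. The sketch below assumes the reward form from gym's `pendulum.py`: the negated cost -(theta² + 0.1·theta_dot² + 0.001·u²), with theta wrapped to [-pi, pi). Plugging in the extremes theta = pi, |theta_dot| = 8 and |u| = 2 gives pi² + 6.4 + 0.004 ≈ 16.2736044. A 200-step episode therefore cannot score below about -3254.7, and its best possible return is 0. The function name `pendulum_reward` is hypothetical.

```python
import math

def pendulum_reward(theta, theta_dot, torque):
    """Per-step reward: negated quadratic cost on angle, angular velocity and torque."""
    theta = ((theta + math.pi) % (2 * math.pi)) - math.pi  # wrap to [-pi, pi)
    return -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)

worst = pendulum_reward(math.pi, 8.0, 2.0)
print(round(worst, 7))                      # -16.2736044
print(pendulum_reward(0.0, 0.0, 0.0) == 0)  # True
print(round(200 * worst, 1))                # -3254.7, the worst possible episode return
```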

## Save

In [7]:
model_file_name = Path(directory_path, env_name + '_' + str(num_steps))
model.save(model_file_name)
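The file name in `In [7]` follows the pattern `<env_name>_<num_steps>` inside `models/`. The following small sketch reproduces that naming. `model_path` is a hypothetical helper.

```python
from pathlib import Path

def model_path(directory, env_name, num_steps):
    """Build the save path used above: <directory>/<env_name>_<num_steps>."""
    return Path(directory, f"{env_name}_{num_steps}")

path = model_path("models", "Pendulum-v0", 100_000)
print(path.name)  # Pendulum-v0_100000
```

Stable-Baselines3 stores the model as a zip archive, typically adding a `.zip` suffix when the path has none. The model can then be restored in a later session with `SAC.load(path, env=env)`.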

In [8]:
evaluate_model(model, num_episodes=100)

Mean reward: -154.47655 Num episodes: 100


-154.47655
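Both `evaluate_model` (local) and `evaluate_policy` report a summary of per-episode returns. The following sketch computes that summary. It assumes a population standard deviation (NumPy's default `ddof=0`). The two returns are made-up placeholders, not results from this notebook, and `summarize_returns` is a hypothetical name.

```python
import numpy as np

def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns."""
    returns = np.asarray(episode_returns, dtype=float)
    return float(returns.mean()), float(returns.std())  # std with ddof=0

mean_r, std_r = summarize_returns([-120.0, -180.0])  # placeholder returns
print(f"mean_reward:{mean_r:.2f} +/- {std_r:.2f}")  # mean_reward:-150.00 +/- 30.00
```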