## Ray RLlib and PPO

PPO, or Proximal Policy Optimization, is a more powerful (and more complicated) algorithm than the DQN we've looked at.

But thanks to Ray's implementations, you can swap it in easily.

Note that we import `PPOConfig` from `ray.rllib.algorithms.ppo`.

By replacing "DQN" with "PPO" you can often get better results quickly, though how much better depends on the environment and hyperparameters.

>
> Interested in PPO details? Check out this writeup: https://jonathan-hui.medium.com/rl-proximal-policy-optimization-ppo-explained-77f014ec3f12
>
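
To give a feel for what "proximal" means, here is a minimal sketch of PPO's clipped surrogate term for a single sample. It is an illustration only, not RLlib's implementation: the function name `clipped_surrogate` is hypothetical, and the default `clip_param=0.2` is chosen for illustration (RLlib's own default may differ).

```python
def clipped_surrogate(ratio, advantage, clip_param=0.2):
    """Return the clipped surrogate term for one sample (illustrative sketch).

    ratio      -- new_policy_prob / old_policy_prob for the action taken
    advantage  -- estimated advantage of that action
    clip_param -- how far the ratio may move from 1.0 (assumed value)
    """
    # Keep the probability ratio inside [1 - clip_param, 1 + clip_param].
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Take the more pessimistic of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped_ratio * advantage)


# A large ratio with a positive advantage is capped at (1 + clip_param) * advantage.
print(clipped_surrogate(1.5, 2.0))
# A large ratio with a negative advantage is not capped, so the penalty is kept in full.
print(clipped_surrogate(1.5, -1.0))
```

The clipping removes the incentive to push the new policy far away from the old one in a single update, which is the main idea behind PPO's stability.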

In [None]:
from ray.rllib.algorithms.ppo import PPOConfig

config = (  # 1. Configure the algorithm,
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)
    .framework("torch")
    .training(model={"fcnet_hiddens": [64, 64]})
    .evaluation(evaluation_num_workers=1)
)

algo = config.build()  # 2. build the algorithm,

# for _ in range(5):
#    print(algo.train())  # 3. train it,

# algo.evaluate()  # 4. and evaluate it.

In [None]:
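# Columns: iteration, min / mean / max episode reward
fmt = '{:3d},{:8.4f},{:8.4f},{:8.4f}'
last_checkpoint = ''
for n in range(3):
    result = algo.train()
    # Use distinct names so the built-ins min() and max() are not shadowed.
    reward_min = result['episode_reward_min']
    reward_mean = result['episode_reward_mean']
    reward_max = result['episode_reward_max']
    # last_checkpoint = algo.save(checkpoint_dir)  # requires checkpoint_dir to be defined first
    print(fmt.format(n, reward_min, reward_mean, reward_max))
print(f'last checkpoint file: {last_checkpoint}')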
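
The format string in the loop above can be tried on its own, without training anything. The sketch below uses a hand-written dictionary in place of a real training result; the reward values are invented for illustration, and `format_row` is a hypothetical helper name.

```python
fmt = '{:3d},{:8.4f},{:8.4f},{:8.4f}'


def format_row(iteration, result):
    """Format one line of min/mean/max episode rewards from a result dict."""
    return fmt.format(
        iteration,
        result['episode_reward_min'],
        result['episode_reward_mean'],
        result['episode_reward_max'],
    )


# Hypothetical values standing in for the dictionary returned by algo.train().
fake_result = {
    'episode_reward_min': 9.0,
    'episode_reward_mean': 21.5,
    'episode_reward_max': 63.0,
}

row = format_row(0, fake_result)
print(row)
```

Each reward is printed with four decimal places in an eight-character column, so successive iterations line up and trends are easy to scan.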