# RLlib Sample Application: FrozenLake-v0

This example uses [RLlib](https://ray.readthedocs.io/en/latest/rllib.html) to train a policy with the `FrozenLake-v0` environment:

  - <https://gym.openai.com/envs/FrozenLake-v0/>

For more background about this problem, see:

  - ["Introduction to Reinforcement Learning: the Frozen Lake Example"](https://reinforcementlearning4.fun/2019/06/09/introduction-reinforcement-learning-frozen-lake-example/)  
[Rodolfo Mendes](https://twitter.com/rodmsmendes)
  - ["Gym Tutorial: The Frozen Lake"](https://reinforcementlearning4.fun/2019/06/16/gym-tutorial-frozen-lake/)  
[Rodolfo Mendes](https://twitter.com/rodmsmendes)
  
---

First, let's make sure that Ray and RLlib are installed, as well as Gym…

In [None]:
!pip install "ray[rllib]"
!pip install gym

Then start Ray…

In [2]:
import ray
import ray.rllib.agents.ppo as ppo

ray.shutdown()
ray.init(ignore_reinit_error=True)

2020-07-06 13:52:49,950	INFO resource_spec.py:212 -- Starting Ray with 3.27 GiB memory available for workers and up to 1.65 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 13:52:50,569	INFO services.py:1165 -- View the Ray dashboard at localhost:8266


{'node_ip_address': '192.168.1.65',
 'raylet_ip_address': '192.168.1.65',
 'redis_address': '192.168.1.65:24796',
 'object_store_address': '/tmp/ray/session_2020-07-06_13-52-49_937536_84181/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-07-06_13-52-49_937536_84181/sockets/raylet',
 'webui_url': 'localhost:8266',
 'session_dir': '/tmp/ray/session_2020-07-06_13-52-49_937536_84181'}

After a successful launch, the Ray dashboard will be running on a local port:

In [3]:
print("Dashboard URL: http://{}".format(ray.get_webui_url()))

Dashboard URL: http://localhost:8266


Open that URL in another tab to view the Ray dashboard as the example runs. We'll also set up a checkpoint location to store the trained policy:

In [8]:
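import os
import shutil

# Start from a clean checkpoint directory (relative to the current working directory).
CHECKPOINT_ROOT = "tmp/ppo/froz"
shutil.rmtree(CHECKPOINT_ROOT, ignore_errors=True)

# Caution: this deletes ALL earlier results under ~/ray_results, not only this example's.
ray_results = os.path.join(os.path.expanduser("~"), "ray_results")
shutil.rmtree(ray_results, ignore_errors=True)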

Next we'll train an RLlib policy with the [`FrozenLake-v0` environment](https://gym.openai.com/envs/FrozenLake-v0/).

In [9]:
SELECT_ENV = "FrozenLake-v0"

config = ppo.DEFAULT_CONFIG.copy()
config["log_level"] = "WARN"

agent = ppo.PPOTrainer(config, env=SELECT_ENV)



The loop below trains for `N_ITER = 10` iterations and saves a checkpoint after each one. Increase `N_ITER` to give the rewards more time to improve.

In [10]:
N_ITER = 10
s = "{:3d} reward {:6.2f}/{:6.2f}/{:6.2f} len {:6.2f} saved {}"

for n in range(N_ITER):
    result = agent.train()
    file_name = agent.save(CHECKPOINT_ROOT)

    print(s.format(
        n + 1,
        result["episode_reward_min"],
        result["episode_reward_mean"],
        result["episode_reward_max"],
        result["episode_len_mean"],
        file_name
        ))

  1 reward   0.00/  0.02/  1.00 len   7.83 saved tmp/ppo/froz/checkpoint_1/checkpoint-1
  2 reward   0.00/  0.02/  1.00 len   7.40 saved tmp/ppo/froz/checkpoint_2/checkpoint-2
  3 reward   0.00/  0.02/  1.00 len   7.21 saved tmp/ppo/froz/checkpoint_3/checkpoint-3
  4 reward   0.00/  0.03/  1.00 len   7.36 saved tmp/ppo/froz/checkpoint_4/checkpoint-4
  5 reward   0.00/  0.03/  1.00 len   7.26 saved tmp/ppo/froz/checkpoint_5/checkpoint-5
  6 reward   0.00/  0.05/  1.00 len   7.57 saved tmp/ppo/froz/checkpoint_6/checkpoint-6
  7 reward   0.00/  0.05/  1.00 len   7.82 saved tmp/ppo/froz/checkpoint_7/checkpoint-7
  8 reward   0.00/  0.07/  1.00 len   7.42 saved tmp/ppo/froz/checkpoint_8/checkpoint-8
  9 reward   0.00/  0.07/  1.00 len   7.87 saved tmp/ppo/froz/checkpoint_9/checkpoint-9
 10 reward   0.00/  0.09/  1.00 len   8.84 saved tmp/ppo/froz/checkpoint_10/checkpoint-10


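Do the episode rewards increase after multiple iterations?
Each line reports the minimum, mean, and maximum episode reward. A rising mean (the middle value) indicates that the policy is improving. In the run above it climbs from `0.02` to `0.09`.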
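
One way to make the "are rewards increasing?" check less subjective is to fit a straight line through the mean rewards and look at the sign of its slope. The sketch below does this in plain Python for the ten mean values printed above. The helper name `reward_trend` is hypothetical and not part of RLlib. In a live notebook you could instead append `result["episode_reward_mean"]` to a list inside the training loop.

```python
# Mean episode rewards copied from the ten training iterations printed above.
reward_means = [0.02, 0.02, 0.02, 0.03, 0.03, 0.05, 0.05, 0.07, 0.07, 0.09]


def reward_trend(means):
    """Return the least-squares slope of mean reward per training iteration.

    Hypothetical helper for illustration; it is not an RLlib function.
    """
    n = len(means)
    if n < 2:
        raise ValueError("need at least two iterations to estimate a trend")
    x_mean = (n - 1) / 2          # mean of the iteration indices 0..n-1
    y_mean = sum(means) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(means))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


slope = reward_trend(reward_means)
improving = slope > 0
print("slope per iteration: {:.4f}, improving: {}".format(slope, improving))
```

A positive slope over only ten noisy iterations is weak evidence, so treat it as a hint to keep training and not as proof of convergence.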

Also, print a summary of the policy's model to see the network architecture that was trained…

In [11]:
policy = agent.get_policy()
model = policy.model
print(model.base_model.summary())

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
observations (InputLayer)       [(None, 16)]         0                                            
__________________________________________________________________________________________________
fc_1 (Dense)                    (None, 256)          4352        observations[0][0]               
__________________________________________________________________________________________________
fc_value_1 (Dense)              (None, 256)          4352        observations[0][0]               
__________________________________________________________________________________________________
fc_2 (Dense)                    (None, 256)          65792       fc_1[0][0]                       
______________________________________________________________________________________________
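
The `Param #` column can be checked by hand. A fully connected (dense) layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes that the `16` inputs correspond to a one-hot encoding of the 16 tiles in the 4x4 map. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Parameter count of a dense layer: weights (inputs x units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Assumption: the observation is a one-hot vector over the 16 tiles of the 4x4 map,
# which matches the (None, 16) input shape in the summary above.
N_STATES = 4 * 4

print(dense_params(N_STATES, 256))  # fc_1 and fc_value_1 -> 4352
print(dense_params(256, 256))       # fc_2 -> 65792
```

Both results agree with the summary: `fc_1` and `fc_value_1` read the observation directly, while `fc_2` reads the 256 outputs of `fc_1`.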

Next we'll use the [`rollout` script](https://ray.readthedocs.io/en/latest/rllib-training.html#evaluating-trained-policies) to evaluate the trained policy.

This visualizes the "character" agent operating within the simulation: trying to find a walkable path to a goal tile.
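
To get a feel for what "a walkable path" means here, the sketch below runs a breadth-first search over the 4x4 map that the rollout renders (`S` start, `F` frozen, `H` hole, `G` goal). This is not how PPO finds a path, and it ignores the fact that `FrozenLake-v0` is slippery by default, so moves in the real environment do not always go in the chosen direction. The helper name `shortest_safe_path` is hypothetical.

```python
from collections import deque

# The 4x4 map as rendered in the rollout output: S=start, F=frozen, H=hole, G=goal.
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]


def shortest_safe_path(lake_map):
    """Return the fewest moves from S to G that avoid holes, or None if no path exists.

    Hypothetical helper; assumes deterministic moves (no slipping).
    """
    n_rows, n_cols = len(lake_map), len(lake_map[0])
    start = next(
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if lake_map[r][c] == "S"
    )
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if lake_map[r][c] == "G":
            return dist
        # Try the four moves: left, down, right, up.
        for dr, dc in ((0, -1), (1, 0), (0, 1), (-1, 0)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n_rows and 0 <= nc < n_cols
            if inside and (nr, nc) not in seen and lake_map[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


print(shortest_safe_path(LAKE_MAP))  # 6
```

Six moves is the best case on this map without slipping. The trained policy has to learn a route from reward alone, and the only reward is `1` for reaching `G`.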

In [12]:
! rllib rollout \
    tmp/ppo/froz/checkpoint_10/checkpoint-10 \
    --config "{\"env\": \"FrozenLake-v0\"}" \
    --run PPO \
    --steps 2000

2020-07-06 13:56:12,503	INFO resource_spec.py:212 -- Starting Ray with 3.47 GiB memory available for workers and up to 1.76 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-07-06 13:56:13,156	INFO services.py:1165 -- View the Ray dashboard at localhost:8267
2020-07-06 13:56:13,480	INFO trainer.py:585 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
2020-07-06 13:56:13,481	INFO trainer.py:612 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
2020-07-06 13:56:19,194	INFO trainable.py:423 -- Restored on 192.168.1.65 from checkpoint: tmp/ppo/froz/checkpoint_10/checkpoint-10
2020-07-06 13:56:19,194	INFO trainable.py:430 -- Current state after restoring: {'_iteration': 10, '_timesteps_total': None, '_time_total': 44.25560688972473, '_episodes_total': 5237}
  (Down)
SFFF
FHFH
FFFH
HFFG
  (Right)
S

The rollout uses the tenth (last) saved checkpoint, `checkpoint_10`, evaluated through `2000` steps.
Modify the path to view other checkpoints.
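
The checkpoint paths printed during training follow a `checkpoint_<n>/checkpoint-<n>` pattern under `CHECKPOINT_ROOT`. A small helper can build the path for any iteration. The name `checkpoint_path` is hypothetical, and the layout is an assumption based only on the output of the Ray version used in this notebook. Other versions may name checkpoints differently.

```python
CHECKPOINT_ROOT = "tmp/ppo/froz"


def checkpoint_path(root, iteration):
    """Build a checkpoint path using the layout seen in the training output above.

    Hypothetical helper; the layout is inferred from this notebook's output only.
    """
    if iteration < 1:
        raise ValueError("iterations are numbered from 1")
    return "{}/checkpoint_{}/checkpoint-{}".format(root, iteration, iteration)


# Path of the checkpoint saved after the fifth training iteration.
print(checkpoint_path(CHECKPOINT_ROOT, 5))
```

Pass the resulting string to `rllib rollout` in place of the hard-coded path to compare how earlier checkpoints behave.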

---

Finally, launch [TensorBoard](https://ray.readthedocs.io/en/latest/rllib-training.html#getting-started), then open the URL it prints to visualize key metrics from training with RLlib…

In [None]:
!pip install tensorflow
!tensorboard --logdir=$HOME/ray_results/