# WindyHighway

This notebook demonstrates the `WindyHighway` class, which implements the game described in the Policy Gradient Methods video (Part 6) and solves it with REINFORCE, optionally with a baseline.

In [1]:
from high_reward_highway import WindyHighway
import numpy as np

### Specify model hyperparameters and other configurations

In [2]:
protos_per_dim = 7
distance_scaler = .2
alpha = .02 # Step size. If this is too large, the policy can get stuck in a deterministic strategy, i.e. it starts exploiting too early.
n_iters = 1200
seed = 0
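The comment on `alpha` can be illustrated with a small, self-contained sketch. It does not use the `WindyHighway` class. It assumes a plain softmax policy over two actions and a made-up return `G`, then applies a single REINFORCE-style update with a small and a large step size. The function names and the values of `G` and the large step size are hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    z = prefs - np.max(prefs)
    e = np.exp(z)
    return e / e.sum()

def one_reinforce_step(theta, action, G, alpha):
    """Return theta + alpha * G * grad log pi(action) for a softmax policy."""
    pi = softmax(theta)
    grad_log_pi = -pi            # gradient of log pi(action) w.r.t. theta ...
    grad_log_pi[action] += 1.0   # ... is one_hot(action) - pi
    return theta + alpha * G * grad_log_pi

theta0 = np.zeros(2)  # two actions, initially equally likely
G = 10.0              # hypothetical return observed after taking action 0

pi_small = softmax(one_reinforce_step(theta0, action=0, G=G, alpha=0.02))
pi_large = softmax(one_reinforce_step(theta0, action=0, G=G, alpha=5.0))
print(pi_small)  # still close to uniform
print(pi_large)  # almost all probability on action 0
```

With the large step size, one lucky episode is enough to make the policy nearly deterministic, after which the other action is almost never sampled again.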

### WindyHighway with no baseline and no reward shift

First, initialize and run the algorithm for `n_iters` episodes.

In [3]:
WH_no_reward_shift = WindyHighway(alpha=alpha, distance_scaler=distance_scaler, protos_per_dim=protos_per_dim, seed=seed, with_baseline=False, reward_shift=0)
WH_no_reward_shift.run(n_iters)
# Average the G0 returns over the 100 most recent episodes.
ave_return = sum(WH_no_reward_shift.G0s[-100:])/100
print(f'\nAve Return of most recent 100 eps: {ave_return:.2f}\n')

100%|█████████████████████████████████████████████████████████████████████████████| 1200/1200 [00:04<00:00, 285.96it/s]


Ave Return of most recent 100 eps: 16.20






Next, inspect the results. Call `help()` on any plotting method to read its docstring (e.g. `help(WH_no_reward_shift.plot_returns)`).

### Plot the G0 returns for each episode with a 100-episode running-average window

In [4]:
WH_no_reward_shift.plot_returns(window=100)
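How `plot_returns` smooths the curve is not shown in this notebook. As a rough sketch of what a running-average window does, the hypothetical helper below computes a trailing mean with NumPy. It is an illustration under that assumption, not the class's actual implementation.

```python
import numpy as np

def running_average(values, window=100):
    """Mean of each consecutive block of `window` values (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return np.array([])  # not enough episodes to fill one window
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(values, kernel, mode="valid")

returns = np.arange(10.0)                   # toy per-episode returns 0, 1, ..., 9
smoothed = running_average(returns, window=4)
print(smoothed)                             # first entry is mean(0, 1, 2, 3) = 1.5
```

The same helper could be applied to the `G0s` list, e.g. `running_average(WH_no_reward_shift.G0s, window=100)`, to get the numbers behind a smoothed curve.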



In [5]:
WH_no_reward_shift.plot_trajectory()

In [6]:
WH_no_reward_shift.plot_policy_probs(8, width=600, height=600)



### To Run with a Baseline

To run with a baseline, repeat the steps above but set `with_baseline = True`. You may also want to adjust other settings; for example, you could use the config below.

In [7]:
protos_per_dim_value = 5
distance_scaler_value = .2
alpha_value = .05 # Should be larger than alpha, since the baseline value needs to be learned before actions can be learned.
with_baseline = True
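To make the role of the baseline concrete, here is a minimal sketch of REINFORCE with a learned scalar baseline on a toy three-action problem with one-step episodes. It does not use `WindyHighway`. The reward means, the function name, and the step sizes `alpha_theta` and `alpha_w` are assumptions chosen for illustration. As in the config above, the baseline step size is set larger than the policy step size.

```python
import numpy as np

def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    z = prefs - np.max(prefs)
    e = np.exp(z)
    return e / e.sum()

def reinforce_with_baseline(n_episodes=5000, alpha_theta=0.05, alpha_w=0.1, seed=0):
    """REINFORCE with a scalar baseline on a toy one-step problem."""
    rng = np.random.default_rng(seed)
    true_means = np.array([1.0, 2.0, 3.0])  # hypothetical expected returns per action
    theta = np.zeros(3)  # policy parameters (action preferences)
    w = 0.0              # baseline: running estimate of the expected return
    for _ in range(n_episodes):
        pi = softmax(theta)
        a = rng.choice(3, p=pi)
        G = true_means[a] + rng.normal(scale=0.5)  # noisy return of the episode
        delta = G - w                # return relative to the baseline
        w += alpha_w * delta         # move the baseline toward observed returns
        grad_log_pi = -pi
        grad_log_pi[a] += 1.0        # one_hot(a) - pi for a softmax policy
        theta += alpha_theta * delta * grad_log_pi
    return softmax(theta), w

probs, baseline = reinforce_with_baseline()
print(probs)     # action probabilities after training
print(baseline)  # learned estimate of the expected return
```

Subtracting `w` means an action is reinforced only when its return beats the current estimate of the average, which tends to reduce the variance of the updates compared with using `G` directly.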