## Render Taxi Environment using Learned Q-Table

First we load the Q-Table learned by 'Taxi_QLearning.py' (stored in the pickle file 'q_table.p') and have the agent act greedily with respect to it, always picking the action with the highest Q-value. We store the rendered scene at every timestep in a 'frames' list, and define a 'run' function that executes N episodes and returns the frames collected during them.

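```python
import pickle

import gym
import numpy as np

# Taxi-v2 is the environment the Q-table was trained on (newer gym releases replace it with Taxi-v3).
# `.env` unwraps gym's TimeLimit wrapper, so episodes are not cut off at 200 steps.
env = gym.make("Taxi-v2").env

# Start by loading in a learned Q-Table from training
with open("q_table.p", "rb") as fp:
    q_table = pickle.load(fp)

def run(num_episodes):
    frames = []  # rendered frames for animation, collected across all episodes
    for _ in range(num_episodes):
        state = env.reset()
        epochs, penalties = 0, 0
        done = False

        while not done:
            # Greedy policy: pick the action with the highest learned Q-value
            action = np.argmax(q_table[state])
            state, reward, done, info = env.step(action)

            if reward == -10:  # illegal pickup or dropoff
                penalties += 1

            # Put each rendered frame into a dict for animation
            frames.append({
                'frame': env.render(mode='ansi'),
                'state': state,
                'action': action,
                'reward': reward,
            })

            epochs += 1
            if epochs >= 5000:  # Cap an episode at 5k timesteps
                break
    return frames
```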
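
The loop in 'run()' is a greedy rollout: at each step the agent takes the action with the highest Q-value for the current state, and a manual step cap stands in for gym's time limit (presumably because '.env' strips the 'TimeLimit' wrapper). The sketch below shows the same pattern without gym, using a hypothetical five-cell corridor ('corridor_step') and hand-filled Q-tables; neither is part of the Taxi setup, and the reward values are illustrative.

```python
import numpy as np

# Hypothetical toy environment (not Taxi): states 0..4 on a line, goal at state 4.
# Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4

def corridor_step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 20 if done else -1  # illustrative rewards
    return next_state, reward, done

def greedy_rollout(q_table, start_state, max_steps=5000):
    """Follow argmax_a Q[s, a] until the episode ends or max_steps is reached."""
    state, done, steps, total_reward = start_state, False, 0, 0
    trajectory = []
    while not done and steps < max_steps:
        action = int(np.argmax(q_table[state]))  # greedy: no exploration
        state, reward, done = corridor_step(state, action)
        trajectory.append((state, action, reward))
        total_reward += reward
        steps += 1
    return trajectory, total_reward

# Q-table that always prefers "right" (column 1): reaches the goal in 4 steps.
q_right = np.zeros((N_STATES, 2))
q_right[:, 1] = 1.0
traj, total = greedy_rollout(q_right, start_state=0)
print(len(traj), total)  # 4 17

# Q-table that always prefers "left": never reaches the goal, so the cap ends the episode.
q_left = np.zeros((N_STATES, 2))
q_left[:, 0] = 1.0
traj, total = greedy_rollout(q_left, start_state=0, max_steps=10)
print(len(traj), total)  # 10 -10
```

Note that 'np.argmax' returns the first index on ties, so a state whose Q-values are all equal (for example, one never updated during training) always yields action 0. The step cap is what keeps such a policy from looping forever, as the second rollout shows.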

Next we define a function that plays back the scenes stored in the list of frames.

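```python
from time import sleep

from IPython.display import clear_output

def print_frames(frames):
    for i, frame in enumerate(frames):
        clear_output(wait=True)  # replace the previous frame instead of scrolling
        rendered = frame['frame']
        # Depending on the gym version, render(mode='ansi') returns a StringIO or a str
        if hasattr(rendered, 'getvalue'):
            rendered = rendered.getvalue()
        print(rendered)
        print("Timestep: {}".format(i + 1))
        print("State: {}".format(frame['state']))
        print("Action: {}".format(frame['action']))
        print("Reward: {}".format(frame['reward']))
        sleep(.1)
```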
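
'run()' counts 'penalties' but never returns them. Because each frame dict also records the step's 'reward', the same statistics can be recovered from the returned list. A minimal sketch, assuming Taxi's reward scheme of -10 for an illegal pickup or dropoff and +20 for a successful dropoff; 'summarize_frames' is a hypothetical helper, not part of the original notebook.

```python
def summarize_frames(frames):
    """Aggregate the per-step rewards stored in a list of frame dicts."""
    rewards = [frame['reward'] for frame in frames]
    return {
        'timesteps': len(rewards),
        'penalties': sum(1 for r in rewards if r == -10),  # illegal pickup/dropoff
        'successes': sum(1 for r in rewards if r == 20),   # successful dropoff
        'total_reward': sum(rewards),
    }

# Hypothetical frames carrying only the 'reward' key.
demo = [{'reward': r} for r in (-1, -1, -10, -1, 20)]
print(summarize_frames(demo))
# {'timesteps': 5, 'penalties': 1, 'successes': 1, 'total_reward': 7}
```

In the notebook this would be called on the list returned by 'run()', e.g. 'summarize_frames(run(10))'. 'successes' equals the number of episodes only if none of them hit the 5000-step cap.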

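Now we can call 'run()' and use the returned frames to play back what the agent did. Because 'print_frames' clears the output before drawing each frame, only the final frame remains on screen once playback ends, and its 'Timestep' counts frames across all ten episodes rather than within a single episode.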

```python
f = run(10)  # run ten episodes
print_frames(f)  # play them all back
```

```
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)

Timestep: 362
State: 16
Action: 5
Reward: 20
```
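
The printed 'State' and 'Action' are plain integers. Assuming the encoding used by gym's Taxi environment (500 states packed from taxi row, taxi column, passenger location and destination, with landmarks 0 to 3 being R, G, Y, B, passenger index 4 meaning "in the taxi", and actions 0 to 5 being south, north, east, west, pickup, dropoff), they can be unpacked with a sketch like the one below. The environment exposes its own 'decode' method; 'decode_state' and 'describe' here are standalone hypothetical helpers written for illustration.

```python
# Assumed Taxi layout: landmark indices 0-3 are R, G, Y, B; passenger index 4 means "in taxi".
LOCATIONS = ['R', 'G', 'Y', 'B']
ACTIONS = ['South', 'North', 'East', 'West', 'Pickup', 'Dropoff']

def decode_state(state):
    """Unpack state = ((row * 5 + col) * 5 + passenger) * 4 + destination."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    row, col = divmod(state, 5)
    return row, col, passenger, destination

def describe(state, action):
    """Human-readable summary of one (state, action) pair."""
    row, col, passenger, destination = decode_state(state)
    where = 'in taxi' if passenger == 4 else LOCATIONS[passenger]
    return (f"taxi at ({row}, {col}), passenger {where}, "
            f"destination {LOCATIONS[destination]}, action {ACTIONS[action]}")

print(describe(16, 5))
# taxi at (0, 0), passenger in taxi, destination R, action Dropoff
```

Decoding the final frame this way places the taxi on the R landmark, which is also the destination, matching the rendered board. The passenger index still reads 4 on that terminal step, which suggests the gym version used here does not update it on a successful dropoff.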
