# Reinforcement Learning Taxi Tutorial

This notebook contains code I wrote to help myself understand Q tables in reinforcement learning and to experiment with OpenAI's gym package. It walks through a class I built to solve the Taxi problem in gym, using RL code from [this O'Reilly tutorial](https://www.oreilly.com/learning/introduction-to-reinforcement-learning-and-openai-gym). Some code is taken directly from that post and expanded upon in the Taxi_Tutorial class.

In [123]:
#Import necessary packages.
import gym
import time  # lets me pause between steps when I want
import numpy as np

In [165]:
env = gym.make("Taxi-v2")  # create the Taxi environment (newer gym releases register it as "Taxi-v3")

In [166]:
# Definition of the tutorial class; the cell after this one calls its methods.
class Taxi_Tutorial():
    '''
    Small class to package all Taxi Tutorial learnings together.
    '''
    def __init__(self):
        '''
        Sets up attributes for the class including the empty Q table,
        number of possible states in the environment, number of actions,
        what those actions are, and a learning rate.
        '''
        self.number_of_states = env.observation_space.n
        self.number_of_actions = env.action_space.n
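        # Action indices follow gym's Taxi environment:
        # 0 = south, 1 = north, 2 = east, 3 = west, 4 = pickup, 5 = dropoff.
        self.action_dictionary = {
            0: 'down',
            1: 'up',
            2: 'right',
            3: 'left',
            4: 'pickup',
            5: 'dropoff',
        }
        self.Q = np.zeros([self.number_of_states, self.number_of_actions])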
        self.alpha = 0.618
        
    def random_solve(self):
        '''
        Showcases the simplest approach to this environment: sample random
        actions from the action space until the passenger is dropped off
        (reward of 20), then print the number of steps it took. Also stores
        that number in the last_step_count attribute.
        '''
        state = env.reset()
        steps = 0
        reward = None
        while reward != 20:
            state, reward, done, info = env.step(env.action_space.sample())
            steps += 1
        print(f'Total steps for random solver was {steps}')
        self.last_step_count = steps
    
    def train(self, verbose=True):
        '''
        Main training loop. Updates the Q table with the standard Q-learning
        update rule, pictured here:
        https://commons.wikimedia.org/wiki/File:Q-l%C3%A6ring_formel_1.png

        Output: None. After training, self.Q holds values that let the greedy
        policy complete the problem in few steps.
        '''
        for episode in range(1, 1001):
            done = False
            G = 0  # total reward collected this episode
            state = env.reset()
            while not done:
                action = np.argmax(self.Q[state])  #1 pick the greedy action for this state
                state2, reward, done, info = env.step(action)  #2 take it and observe the result
                #3 move Q[state, action] toward reward + best value of the next state
                self.Q[state, action] += self.alpha * (
                    reward + np.max(self.Q[state2]) - self.Q[state, action]
                )
                G += reward
                state = state2
            if verbose and episode % 50 == 0:
                print('Episode {} Total Reward: {}'.format(episode, G))
    
    def show_solution(self, seconds_paused=2):
        '''
        Run only after train() has been called. Prints, step by step, the path
        the RL model takes using the Q table that train() filled in.
        
        INPUT: Optional: Number of seconds to pause between steps.
        OUTPUT: None, prints all values.
        '''
        state = env.reset()
        env.render()  # show the starting position
        done = False
        tot_reward = 0
        while not done:
            print(f'Q table values currently are {self.Q[state, :]}')  # examine the Q table row for this state
            state, reward, done, info = env.step(np.argmax(self.Q[state, :]))  # take the greedy action
            tot_reward += reward
            print(f'reward is {reward} and total is {tot_reward}')
            env.render()  # render() prints the board itself; wrapping it in print() would also print "None"
            time.sleep(seconds_paused)
            
    def count_solution_steps(self):
        '''
        Run only after train() has been called. Resets the environment to a
        random starting state, follows the greedy policy from the Q table,
        and prints the number of steps needed to complete the challenge.
        '''
        state = env.reset()  # random starting point
        done = False
        steps = 0
        while not done:
            state, reward, done, info = env.step(np.argmax(self.Q[state, :]))  # take the greedy action
            steps += 1  # count every step, including the first one
        print(f'Total steps for trained solver was {steps}')
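
For reference, here is the update in train() written out as math. The notation is my own reading of the code, not something taken from the O'Reilly post: $s$ is the current state, $a$ the action taken, $r$ the reward, $s'$ the next state, and $\alpha$ the learning rate (0.618 in the class).

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

The general rule includes a discount factor $\gamma$. The code has no such term, which as far as I can tell corresponds to $\gamma = 1$:

$$
Q(s, a) \leftarrow Q(s, a) + 0.618 \left[ r + \max_{a'} Q(s', a') - Q(s, a) \right]
$$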

In [164]:
tt = Taxi_Tutorial()
tt.random_solve()
tt.train()
tt.count_solution_steps()

Total steps for random solver was 1277
Total steps for trained solver was 12
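
The Taxi Q table has too many rows to read by eye, so here is a minimal sketch of the same greedy update on a toy problem small enough to print in full. Everything in it is hypothetical and made up for illustration: a five-cell corridor instead of gym, the names `step`, `greedy`, `train_corridor` and `greedy_steps`, and plain Python lists instead of NumPy so it runs with no installs. Only the update rule, the learning rate of 0.618 and the Taxi-style rewards (-1 per move, +20 at the goal) mirror the class above.

```python
# Toy corridor (hypothetical, not part of gym): states 0..4, start at 0, goal at 4.
N_STATES = 5
GOAL = 4
LEFT, RIGHT = 0, 1
ALPHA = 0.618  # same learning rate as the Taxi_Tutorial class


def step(state, action):
    """Move one cell; -1 per move, +20 for reaching the goal (Taxi-style rewards)."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    done = next_state == GOAL
    reward = 20 if done else -1
    return next_state, reward, done


def greedy(Q, state):
    """Index of the largest Q value in this state's row (first index wins ties)."""
    return max(range(len(Q[state])), key=lambda a: Q[state][a])


def train_corridor(episodes=500, max_steps=1000):
    """Fill a Q table using the same greedy update as Taxi_Tutorial.train()."""
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # one row per state, one column per action
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < max_steps:
            action = greedy(Q, state)
            state2, reward, done = step(state, action)
            # Nudge Q[state][action] toward reward + best value of the next state.
            Q[state][action] += ALPHA * (reward + max(Q[state2]) - Q[state][action])
            state = state2
            steps += 1
    return Q


def greedy_steps(Q, max_steps=1000):
    """Count the moves the greedy policy needs to get from state 0 to the goal."""
    state, done, steps = 0, False, 0
    while not done and steps < max_steps:
        state, _, done = step(state, greedy(Q, state))
        steps += 1
    return steps


Q = train_corridor()
for s, row in enumerate(Q):
    print(s, [round(v, 2) for v in row])  # columns are [LEFT, RIGHT]
print("greedy steps from state 0:", greedy_steps(Q))
```

Reading the printed table, the RIGHT column should climb by one per cell as you approach the goal, since the final move is worth 20 and every move before it costs 1. That is the same pattern the Taxi table learns, just in a form you can check by hand.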
