# Exploring the Gym Taxi-v3 environment
Useful reference: [LearnDataSci: reinforcement Q-learning from scratch in Python with OpenAI Gym](https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/)
> Passenger locations:
> - 0: R(ed)
> - 1: G(reen)
> - 2: Y(ellow)
> - 3: B(lue)
> - 4: in taxi (the taxi is drawn as a yellow cell when empty and a green cell once the passenger is aboard)
>
> Destinations:
> - 0: R(ed)
> - 1: G(reen)
> - 2: Y(ellow)
> - 3: B(lue)
>
> Actions: there are 6 discrete deterministic actions:
> - 0: move south
> - 1: move north
> - 2: move east
> - 3: move west
> - 4: pick up passenger
> - 5: drop off passenger

> Summary: 5x5x5x4 = 500 possible states (25 taxi positions on the 5x5 grid, times 5 passenger locations, times 4 destinations).
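
Each of the 500 states is a single integer that packs the four components together. Gym's `TaxiEnv` provides its own `encode` and `decode` methods (reachable via `env.unwrapped`); the standalone functions below are a hypothetical re-implementation, written only to make the arithmetic visible. They assume the packing order taxi row, taxi column, passenger location, destination.

```python
# Hypothetical helpers mirroring the Taxi state layout (not gym's own code):
# state = ((taxi_row * 5 + taxi_col) * 5 + passenger_location) * 4 + destination


def encode(taxi_row, taxi_col, passenger_location, destination):
    """Pack the four state components into one index in [0, 500)."""
    i = taxi_row
    i = i * 5 + taxi_col             # 25 grid cells
    i = i * 5 + passenger_location   # 5 passenger locations (4 = in taxi)
    i = i * 4 + destination          # 4 destinations
    return i


def decode(state):
    """Unpack a state index into (taxi_row, taxi_col, passenger_location, destination)."""
    destination = state % 4
    state //= 4
    passenger_location = state % 5
    state //= 5
    taxi_col = state % 5
    taxi_row = state // 5
    return taxi_row, taxi_col, passenger_location, destination


print(decode(368))         # (3, 3, 2, 0): taxi at row 3, col 3, passenger at Y, destination R
print(encode(4, 4, 4, 3))  # 499, the largest state index
```

Because the radices are 5, 5, 5 and 4, every index from 0 to 499 round-trips through `decode` and `encode`.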

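Each step in the environment returns a tuple `(observation, reward, done, info)`:
- `observation`: the new state, an integer from 0 to 499
- `reward`: -1 per step, +20 for a successful drop-off, -10 for an illegal pickup or drop-off
- `done`: `True` once the passenger has been dropped off at the destination, which ends the episode (the wrapper added by `gym.make` also sets it after 200 steps)
- `info`: a dict holding the transition probability, e.g. `{'prob': 1.0}`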
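
This notebook uses the pre-0.26 gym API. From gym 0.26 onward (and in Gymnasium), `step()` returns five values instead, splitting `done` into `terminated` and `truncated`. The helper below, `unpack_step`, is a hypothetical sketch that normalizes either form to the 4-tuple described above, assuming those are the only two shapes it will see.

```python
def unpack_step(result):
    """Normalize a gym step result to (observation, reward, done, info).

    Hypothetical helper: pre-0.26 gym returns 4 items; gym >= 0.26 and
    Gymnasium return 5 items, with done split into terminated and truncated.
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        return observation, reward, terminated or truncated, info
    observation, reward, done, info = result
    return observation, reward, done, info


# A step result recorded later in this notebook (old 4-tuple form)
print(unpack_step((368, -1, False, {'prob': 1.0})))
```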

> Blue letter: current passenger pickup location.
> Purple letter: current destination.

```python
import gym
```

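```python
# Pre-0.26 gym API: step() returns a 4-tuple and render() prints the map as ANSI text.
# gym.make wraps the TaxiEnv (a TimeLimit wrapper); env.unwrapped is the raw env.
env = gym.make('Taxi-v3')
env.render()
```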

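```text
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
```

In the colored render, R is the destination (purple), Y is the passenger's location (blue), and the empty taxi is the yellow cell at row 1, column 4 (0-indexed).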



## States and Actions

```python
print("Actions {}".format(env.action_space))
print("States {}".format(env.observation_space))
```

```text
Actions Discrete(6)
States Discrete(500)
```
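
`Discrete(n)` is the set of integers 0 to n-1. The class below is a hypothetical, minimal stand-in (not gym's implementation) that shows what `sample()` returns for each space; note that sampling the action space can only ever produce 0 to 5, never a full state index.

```python
import random


class Discrete:
    """Hypothetical stand-in for gym.spaces.Discrete(n): the integers 0..n-1."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        # Uniform draw from 0..n-1
        return random.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n

    def __repr__(self):
        return f"Discrete({self.n})"


action_space = Discrete(6)          # 0 south, 1 north, 2 east, 3 west, 4 pickup, 5 drop-off
observation_space = Discrete(500)   # one index per encoded state
print(action_space, observation_space)  # Discrete(6) Discrete(500)
print(action_space.sample(), observation_space.sample())
```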


## Rendering different states
### Random steps


```python
# Reset the environment to a random initial state
env.reset()
# Take one step north (action 1); step returns (observation, reward, done, info)
result = env.step(1)
env.render()
```

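```text
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (North)
```

In the colored render, R is the destination (purple), G is the passenger's location (blue), and the empty taxi is the yellow cell at row 1, column 1. The last line names the action just taken.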


### Trying out a move
> Set the environment to a random state

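```python
# Caveats: action_space.sample() draws an action (0-5), not a state (0-499),
# and assigning env.s only sets an attribute on the wrapper returned by
# gym.make, not on env.unwrapped, so the render below does not change.
env.s = env.action_space.sample()
print(env.s)
env.render()
```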


```text
0
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (North)
```

The render is identical to the previous one: the empty taxi is still at row 1, column 1.
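
Why the assignment had no effect can be shown without gym. The classes below are hypothetical stand-ins: `InnerEnv` plays the unwrapped `TaxiEnv`, which (in the pre-0.26 gym assumed here) keeps its current state in the attribute `s`, and `Wrapper` mimics a gym wrapper, where attribute reads fall through to the inner env but writes land on the wrapper. With the real environment, the corresponding fix would be `env.unwrapped.s = env.observation_space.sample()` (or a state built with `env.unwrapped.encode(3, 1, 2, 0)`) followed by `env.render()`.

```python
class InnerEnv:
    """Hypothetical stand-in for the unwrapped TaxiEnv, which stores its state in `s`."""

    def __init__(self):
        self.s = 124

    def render(self):
        return f"rendering state {self.s}"


class Wrapper:
    """Hypothetical stand-in for a gym wrapper: reads fall through, writes do not."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Only called when normal lookup fails, so reading env.s reaches the inner env
        return getattr(self.env, name)

    def render(self):
        return self.env.render()

    @property
    def unwrapped(self):
        return self.env


env = Wrapper(InnerEnv())
env.s = 0                    # creates a new attribute on the wrapper only
print(env.s, env.render())   # 0 rendering state 124
env.unwrapped.s = 0          # changes the state the inner env actually renders
print(env.render())          # rendering state 0
```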


Move south (action 0) and visualize the new state

```python
# Step one timestep south (action 0). env.step returns
# (observation, reward, done, info); the observation is the new state index.
result = env.step(0)
print(result)
env.render()
```

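```text
(124, -1, False, {'prob': 1.0})
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (South)
```

State 124 decodes to the taxi at row 1, column 1 with the passenger at G (index 1) and the destination R (index 0), which matches the colored render. The move cost the usual -1 per-step reward.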


Take random movement actions and visualize each step

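```python
from IPython.display import clear_output
from time import sleep
import random

env.reset()
for tick in range(100):  # take up to 100 random moves
    k = random.randint(0, 3)  # movement actions only: 0-3 (south, north, east, west)
    newState = env.step(k)  # (observation, reward, done, info)
    print(newState)
    env.render()
    sleep(.2)
    clear_output(wait=True)  # clear when the next frame arrives, animating the taxi
    if newState[2]:  # stop early if the environment ends the episode
        break
```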

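```text
(368, -1, False, {'prob': 1.0})
+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (South)
```

Final state 368: the empty taxi (yellow) is at row 3, column 3, the passenger waits at Y (blue), and the destination is R (purple).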


Transition table `env.P` for the final state from above

```python
print(newState[0])  # final observation (state index)
env.P[newState[0]]  # {action: [(probability, next_state, reward, done)]}
```

```text
368

{0: [(1.0, 468, -1, False)],
 1: [(1.0, 268, -1, False)],
 2: [(1.0, 388, -1, False)],
 3: [(1.0, 368, -1, False)],
 4: [(1.0, 368, -10, False)],
 5: [(1.0, 368, -10, False)]}
```

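`env.P` maps each action to a list of possible outcomes: `{action: [(probability, next_state, reward, done)]}`. Taxi is deterministic, so every action has a single outcome with probability 1.0. From state 368, moving west (3) is blocked by a wall, so the taxi stays in state 368; pickup (4) and drop-off (5) are illegal here and cost -10.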
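
Because each entry is a list, code that reads `env.P` usually weights outcomes by their probability rather than assuming a single outcome. The sketch below copies the table printed above into a plain dict (`P_368` and `ACTION_NAMES` are hypothetical names) and computes each action's probability-weighted immediate reward, flagging actions that leave the state unchanged.

```python
# Transition table for state 368, copied from the env.P output above:
# {action: [(probability, next_state, reward, done), ...]}
P_368 = {
    0: [(1.0, 468, -1, False)],
    1: [(1.0, 268, -1, False)],
    2: [(1.0, 388, -1, False)],
    3: [(1.0, 368, -1, False)],
    4: [(1.0, 368, -10, False)],
    5: [(1.0, 368, -10, False)],
}

ACTION_NAMES = ["south", "north", "east", "west", "pickup", "drop-off"]


def expected_reward(transitions):
    """Probability-weighted immediate reward over all outcomes of one action."""
    return sum(prob * reward for prob, _next_state, reward, _done in transitions)


for action, transitions in P_368.items():
    next_states = [next_state for _prob, next_state, _reward, _done in transitions]
    note = " (state unchanged)" if next_states == [368] else ""
    print(f"{ACTION_NAMES[action]}: expected reward {expected_reward(transitions)}, "
          f"next {next_states}{note}")
```

In a stochastic environment the same loop still works, since every outcome in the list contributes in proportion to its probability.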