# 🚕 The Taxi Problem (Taxi-v3) 🚕

## Description:
There are four designated locations in the grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the episode starts, the taxi starts off at a random square and the passenger is at a random location. The taxi drives to the passenger's location, picks up the passenger, drives to the passenger's destination (another one of the four specified locations), and then drops off the passenger. Once the passenger is dropped off, the episode ends.

[OpenAI Link](https://gym.openai.com/envs/Taxi-v3/)  
[GitHub Link](https://github.com/openai/gym/blob/master/gym/envs/toy_text/taxi.py)

## Observations:
There are 500 discrete states: 25 taxi positions (a 5 × 5 grid) × 5 possible passenger locations (the four designated locations plus the case when the passenger is in the taxi) × 4 destination locations.

## Passenger locations:
- 0: R(ed)
- 1: G(reen)
- 2: Y(ellow)
- 3: B(lue)
- 4: in taxi

## Destinations:
- 0: R(ed)
- 1: G(reen)
- 2: Y(ellow)
- 3: B(lue)
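
The 500 states can be read as a mixed-radix number over (taxi row, taxi column, passenger location, destination). The sketch below assumes the encoding used in the linked `taxi.py` source, `((row * 5 + col) * 5 + passenger) * 4 + destination`, with row 0 taken to be the top row of the grid. The names `encode_state` and `decode_state` are illustrative helpers, not part of the Gym API.

```python
def encode_state(taxi_row, taxi_col, passenger, destination):
    """Pack the four state components into one integer in [0, 500)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger) * 4 + destination


def decode_state(state):
    """Invert encode_state: integer -> (taxi_row, taxi_col, passenger, destination)."""
    state, destination = divmod(state, 4)
    state, passenger = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger, destination


# Taxi in the bottom-left cell (row 4, column 0), passenger at R (0),
# destination Y (2).
print(encode_state(4, 0, 0, 2))  # 402
print(decode_state(248))         # (2, 2, 2, 0)
```

Decoding a state this way makes the raw integers returned by the environment easier to interpret while debugging.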

## Actions:
There are 6 discrete deterministic actions:
- 0: move south
- 1: move north
- 2: move east
- 3: move west
- 4: pickup passenger
- 5: drop off passenger
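
To avoid bare integers in later code, the action indices listed above can be given names. The following is a minimal sketch; `TaxiAction` and `MOVE_DELTAS` are hypothetical helpers introduced here for illustration, and the row/column deltas assume that row 0 is the top row of the grid.

```python
from enum import IntEnum


class TaxiAction(IntEnum):
    """Hypothetical helper: names for the six action indices."""
    SOUTH = 0
    NORTH = 1
    EAST = 2
    WEST = 3
    PICKUP = 4
    DROPOFF = 5


# (row, column) change of each movement action, before walls and grid
# borders are taken into account.
MOVE_DELTAS = {
    TaxiAction.SOUTH: (1, 0),
    TaxiAction.NORTH: (-1, 0),
    TaxiAction.EAST: (0, 1),
    TaxiAction.WEST: (0, -1),
}

print(TaxiAction(2).name)      # EAST
print(int(TaxiAction.PICKUP))  # 4
```

Because `IntEnum` members are integers, `int(TaxiAction.EAST)` gives the plain index whenever one is required.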

## Import Packages

In [1]:
import gym
import numpy as np
import pandas as pd
from pylab import plt
from IPython import display
plt.style.use('seaborn')
np.random.seed(100)
import warnings; warnings.simplefilter('ignore')

In [2]:
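import random  # needed by set_seeds; not imported in the first cell

def set_seeds(seed=42):
    # Seed Python, NumPy, and the environment for reproducible runs.
    # Call only after `env` has been created.
    random.seed(seed)
    np.random.seed(seed)
    env.seed(seed)  # classic Gym API (pre-0.26)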

## Environment

In [3]:
env = gym.make('Taxi-v3')

## Action Space

In [4]:
env.action_space  # type of action space

Discrete(6)

In [5]:
env.action_space.n  # number of discrete actions

6

In [6]:
env.action_space.sample()  # sample action

3

In [7]:
[env.action_space.sample() for _ in range(10)]

[4, 0, 2, 4, 2, 5, 5, 2, 4, 0]

## Observation Space

In [8]:
np.set_printoptions(precision=4, suppress=True)

In [9]:
env.observation_space  # type of observation space

Discrete(500)

In [10]:
o = env.reset()
o

248

In [11]:
env.reset()

402
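
The bare integer returned by `env.reset()` above matches older Gym releases. Gym 0.26 and later, as well as Gymnasium, return an `(observation, info)` tuple instead. The hypothetical helper `reset_observation` below is a small sketch that accepts either form; it is not part of any library.

```python
def reset_observation(reset_result):
    """Return only the observation from the value returned by env.reset()."""
    if isinstance(reset_result, tuple):
        # Newer API: (observation, info)
        return reset_result[0]
    # Older API: the observation itself
    return reset_result


print(reset_observation(248))        # older-style return value
print(reset_observation((248, {})))  # newer-style return value, placeholder info
```

Both calls print `248`, so code written against the helper does not depend on the installed version.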

## Taking Action

The cell highlighted in yellow is the taxi, the **blue** letter is the passenger's current pick-up location, and the **purple** (magenta) letter is the destination.

In [12]:
env.render()

+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+

Taxi: bottom-left cell, on Y (yellow highlight). Passenger: R (blue). Destination: Y (purple).



In [13]:
a = env.action_space.sample()  # random action
a

2

In [14]:
r = env.step(a)  # taking action, capturing new observations
r  # (observation, reward, done, info)

(402, -1, False, {'prob': 1.0})
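
The four-element tuple above follows the older Gym convention `(observation, reward, done, info)`. Gym 0.26 and later, as well as Gymnasium, return five elements, `(observation, reward, terminated, truncated, info)`. The sketch below normalizes both shapes to the four-element form used in this notebook; `unpack_step` is a hypothetical helper name.

```python
def unpack_step(step_result):
    """Normalize env.step() output to (observation, reward, done, info)."""
    if len(step_result) == 5:
        # Newer API: episode end is split into terminated and truncated.
        obs, reward, terminated, truncated, info = step_result
        return obs, reward, terminated or truncated, info
    # Older API: already (observation, reward, done, info).
    obs, reward, done, info = step_result
    return obs, reward, done, info


old_style = (402, -1, False, {"prob": 1.0})
new_style = (402, -1, False, False, {"prob": 1.0})
print(unpack_step(old_style))
print(unpack_step(new_style))
```

Both calls yield `(402, -1, False, {'prob': 1.0})`.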

In [15]:
env.render()

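+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (East)

The taxi has not moved: a wall (`|`) separates the bottom-left cell from its eastern neighbor, so the state stays 402 while the step still costs a reward of -1.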


In [16]:
env.step(1)

(302, -1, False, {'prob': 1.0})

In [17]:
env.render()

+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (North)

Taxi: leftmost column, one row above Y (yellow highlight). Passenger: R (blue). Destination: Y (purple).


In [18]:
env.step(1)

(202, -1, False, {'prob': 1.0})

In [19]:
env.render()

+---------+
|R: | : :G|
| : | : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (North)

Taxi: leftmost column, middle row (yellow highlight). Passenger: R (blue). Destination: Y (purple).
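
The three steps above (east into a wall, then north twice) can be reproduced without the environment. The sketch below assumes the wall layout shown in the rendered grid, where `:` between two cells is passable and `|` is a wall, and the state encoding from the linked `taxi.py` source. `move` and `encode_state` are illustrative helpers, not the environment's own implementation, and pick-up and drop-off are left out.

```python
MAP = [
    "+---------+",
    "|R: | : :G|",
    "| : | : : |",
    "| : : : : |",
    "| | : | : |",
    "|Y| : |B: |",
    "+---------+",
]


def move(row, col, action):
    """Apply a movement action (0-3) and return the new (row, col)."""
    line = MAP[row + 1]  # text row of the taxi's current grid row
    if action == 0:                                  # south
        row = min(row + 1, 4)
    elif action == 1:                                # north
        row = max(row - 1, 0)
    elif action == 2 and line[2 * col + 2] == ":":   # east, no wall
        col += 1
    elif action == 3 and line[2 * col] == ":":       # west, no wall
        col -= 1
    return row, col


def encode_state(taxi_row, taxi_col, passenger, destination):
    """Assumed encoding: mixed radix over (row, col, passenger, destination)."""
    return ((taxi_row * 5 + taxi_col) * 5 + passenger) * 4 + destination


row, col = 4, 0                  # taxi starts in the bottom-left cell
passenger, destination = 0, 2    # passenger at R, destination Y
states = []
for action in (2, 1, 1):         # east, north, north
    row, col = move(row, col, action)
    states.append(encode_state(row, col, passenger, destination))
print(states)  # [402, 302, 202]
```

The printed states match the observations returned by `env.step` in the cells above.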
