# Reinforcement Learning Sprint Challenge - play Taxi

For the sprint challenge, we will apply the techniques we have learned to play
[Taxi](https://gym.openai.com/envs/Taxi-v2/), an environment in the OpenAI Gym.
In this task the agent controls a taxi that can navigate between four locations.
The goal is to pick up a passenger at one location and drop them off at
another. You get 20 points for each successful drop-off but lose 1 point for
each step you take, and there is an additional 10-point penalty for illegal
pick-up/drop-off actions.
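
To make the scoring concrete, here is a minimal sketch of the arithmetic. It assumes each step yields exactly one of the three rewards described above (+20, -1, or -10), so an illegal action costs 10 rather than 11. The function name `episode_score` and its parameters are hypothetical, introduced only for illustration.

```python
def episode_score(total_steps, illegal_actions, delivered):
    """Total reward for one Taxi episode.

    Assumption: every step earns exactly one reward, namely
    +20 for a successful drop-off, -10 for an illegal
    pick-up/drop-off, and -1 for any other step.
    """
    special_steps = illegal_actions + int(delivered)
    if special_steps > total_steps:
        raise ValueError("more special steps than total steps")
    ordinary_steps = total_steps - special_steps
    return -1 * ordinary_steps - 10 * illegal_actions + 20 * int(delivered)


# 12 steps, no illegal actions, ending in a drop-off: 11 * (-1) + 20
print(episode_score(12, 0, True))   # 9
# 50 steps, 3 illegal actions, no delivery: 47 * (-1) + 3 * (-10)
print(episode_score(50, 3, False))  # -77
```

Under these assumptions, a short and clean delivery gives a small positive score, while wandering and illegal actions quickly drive the total negative.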

You can create the environment and watch a random agent play with this code:

```python
import gym

env = gym.make('Taxi-v2')
state = env.reset()
env.render()

total_reward = 0
done = False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    env.render()

print('Total reward:', total_reward)
```

You'll see that a random agent doesn't do very well: in one trial run, the score
reached -713 before the environment terminated.


Sample output (truncated): the environment renders the taxi grid after each
random action, here Pickup, Dropoff, South, West, West, West, North, and so on.

## Instructions

Make a Python notebook in which you work on the goals below. You can develop in
whatever environment you wish, but to turn in your work, add the notebook to
the `ML-Reinforcement-Learning` repository in the `sprintchallenge/` directory.
Add, commit, and push, and it will appear in your already-open pull request.

The goals involve trying to beat a score in Taxi. Be sure to measure the score
of your approach after it is trained, not during training. This snippet
measures performance by running the simulation repeatedly and averaging the
total rewards:

```python
episodes = 1000   # number of evaluation episodes
max_steps = 99    # cap on steps per episode
rewards = []      # total reward of each episode

for episode in range(episodes):
    state = env.reset()  # Assuming you already have env created as above
    total_rewards = 0

    for step in range(max_steps):
        action = env.action_space.sample()  # TODO your policy here!
        state, reward, done, info = env.step(action)  # take the chosen action
        total_rewards += reward
        if done:
            break
    rewards.append(total_rewards)

print('Average score over time:', sum(rewards) / episodes)
```
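
If you want to compare several approaches, the same loop can be wrapped in a function that takes the policy as an argument. The following is a minimal sketch, not part of the original assignment: the name `evaluate_policy` and the convention that a policy is a callable mapping a state to an action are assumptions made for illustration. It also assumes the Gym API used above, where `reset()` returns a state and `step()` returns a 4-tuple.

```python
def evaluate_policy(env, policy, episodes=1000, max_steps=99):
    """Return the average total reward of `policy` over `episodes` runs.

    `policy` is assumed to be a callable: state -> action.
    No learning happens here; this only measures performance.
    """
    rewards = []
    for _ in range(episodes):
        state = env.reset()
        total_reward = 0
        for _ in range(max_steps):
            action = policy(state)
            state, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
        rewards.append(total_reward)
    return sum(rewards) / episodes
```

With `env` created as above, a random baseline would be `evaluate_policy(env, lambda state: env.action_space.sample())`. A trained agent can be scored the same way by passing its own state-to-action function.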

