# 1. What is RL? 

![rl](data/environment.png)

## Taxi as an example of an "easy" environment 

In [1]:
import pickle
with open('data/optimum_policy.pck','rb') as f:
    policy = pickle.load(f)

In [2]:
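import gym
import time
from IPython.display import clear_output

def step_taxi(optimum_policy, n_games=10):
    """Render n_games Taxi episodes played by following optimum_policy."""
    games_solved = 0
    taxi = gym.make('Taxi-v2')
    while games_solved < n_games:
        obs = taxi.reset()
        end = False
        while not end:
            try:  # TODO: update policy for taxi new version
                # The policy is keyed by the observation converted to a string.
                action = optimum_policy[str(obs)]
            except KeyError:
                # State missing from the saved policy: abandon this episode.
                break
            obs, _, end, _ = taxi.step(action)
            clear_output(True)  # wait for new output before clearing the cell
            taxi.render()
            time.sleep(0.25)
        #time.sleep(0.75)
        games_solved += 1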

In [16]:
step_taxi(policy,n_games=15)

+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
  (Dropoff)


## What is a policy?

In [3]:
print(list(policy.items())[:10])

[('274', 0), ('421', 1), ('208', 0), ('478', 1), ('199', 0), ('138', 3), ('362', 1), ('422', 1), ('191', 3), ('185', 1)]
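
The printed pairs suggest that the policy is a plain lookup table from a state id (stored as a string) to an action id. Below is a minimal sketch of reading such a table, assuming the usual Taxi action numbering (0 south, 1 north, 2 east, 3 west, 4 pickup, 5 dropoff). `sample_policy` and `describe_action` are hypothetical names used only for illustration, and the entries are copied from the output above.

```python
# Action ids as documented for Gym's Taxi environment (assumed to apply here).
ACTION_NAMES = {0: "south", 1: "north", 2: "east", 3: "west", 4: "pickup", 5: "dropoff"}

# A few (state, action) pairs copied from the printed output.
sample_policy = {'274': 0, '421': 1, '208': 0, '138': 3}

def describe_action(policy, obs):
    """Look up the action for an observation and return its readable name."""
    action = policy[str(obs)]  # keys are stringified state ids
    return ACTION_NAMES[action]

for state in (274, 138):
    print(state, "->", describe_action(sample_policy, state))
```

Storing the keys as strings is why `step_taxi` calls `str(obs)` before indexing the policy.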


### Visual representation of a policy

We use graph theory to represent the state space of the environment, so there is no need to dive into Markov chain theory.

![graph](data/policy%20graph.png)

![graph](data/policy_tree.png)
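
The graph view can be sketched in a few lines: states are nodes, and the policy contributes one outgoing edge per state (the transition caused by the chosen action). The example below is only an illustration on a hypothetical 2x3 grid world, not the Taxi environment; `toy_policy`, `next_state` and `follow` are made-up names.

```python
# Hypothetical 2x3 grid world: states are (row, col); the goal is (1, 2).
MOVES = {"south": (1, 0), "north": (-1, 0), "east": (0, 1), "west": (0, -1)}
N_ROWS, N_COLS, GOAL = 2, 3, (1, 2)

def next_state(state, action):
    """Deterministic transition; moves into a wall leave the state unchanged."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return (new_row, new_col)
    return state

# Hypothetical policy: go east along each row, then south to the goal.
toy_policy = {(0, 0): "east", (0, 1): "east", (0, 2): "south",
              (1, 0): "east", (1, 1): "east"}

# Policy graph: one outgoing edge per non-terminal state.
policy_graph = {s: next_state(s, a) for s, a in toy_policy.items()}

def follow(graph, start, goal):
    """Walk the graph from start to goal (assumes the policy reaches the goal)."""
    path = [start]
    while path[-1] != goal:
        path.append(graph[path[-1]])
    return path

print(follow(policy_graph, (0, 0), GOAL))
```

Following the edges from any state traces the trajectory the agent would take under that policy.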

# 2. How is it used?

Real-world examples:

- [Factories](https://www.technologyreview.com/s/601045/this-factory-robot-learns-a-new-job-overnight)
- [Trading](https://www.pit.ai/). (Not sure about this one; I have no way to tell if it's legit.)
- [Business and RL](https://www.marutitech.com/businesses-reinforcement-learning/)
- [Quora post](https://www.quora.com/What-are-some-practical-applications-of-reinforcement-learning)
- [DeepMind](https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/)

## What it should accomplish in the future

Talk about DeepMind and where it is heading.
Talk a bit about autonomous robots.

- [DeepMind Lab](https://deepmind.com/blog/open-sourcing-deepmind-lab/): too experimental to use seriously.
- [DeepMind Lab paper](https://arxiv.org/pdf/1612.03801.pdf).

# 3. Packages for RL environments

## [OpenAI Gym](https://gym.openai.com/docs)
- Show some environments.
- Explain API.
- Hackable environments for custom research.
- Show some example of an experimental algorithm in a customized environment.
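
The classic Gym interface used by `step_taxi` boils down to `reset()` returning an observation and `step(action)` returning `(observation, reward, done, info)`. Here is a minimal sketch of a custom environment exposing that interface. `CorridorEnv` and `run_episode` are hypothetical, and the class deliberately does not inherit from `gym.Env`, so the example runs without Gym installed.

```python
import random

class CorridorEnv:
    """Hypothetical toy: walk right along a corridor of cells to reach the end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply action (0 = left, 1 = right); return (obs, reward, done, info)."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # small step penalty, bonus at the goal
        return self.position, reward, done, {}

def run_episode(env, choose_action, max_steps=100):
    """Run one episode with the reset/step loop and return the total reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(choose_action(obs))
        total += reward
        if done:
            break
    return total

random.seed(0)
print(run_episode(CorridorEnv(), lambda obs: random.choice([0, 1])))  # random agent
print(run_episode(CorridorEnv(), lambda obs: 1))                      # always move right
```

Because the agent only sees `reset` and `step`, the same loop works for any environment that follows this contract, which is what makes the environments easy to swap or customize.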


## [OpenAI Universe](https://universe.openai.com/)
- [Github](https://github.com/openai/universe)
- Show some environments.
- Explain differences with gym.
- VNC and vectorization for environment parallelization.
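
As I understand it, a vectorized environment takes a list of actions in `step` and returns lists of observations, rewards and done flags, one entry per environment instance. The following is a rough sketch of that idea in plain Python, not Universe's actual implementation. `CounterEnv` and `VectorizedEnv` are hypothetical, and the instances are stepped one after another rather than in parallel.

```python
class CounterEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, float(action), done, {}

class VectorizedEnv:
    """Step a list of environments together, resetting finished ones."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, action_n):
        observation_n, reward_n, done_n = [], [], []
        for env, action in zip(self.envs, action_n):
            obs, reward, done, _ = env.step(action)
            if done:
                obs = env.reset()  # start the next episode immediately
            observation_n.append(obs)
            reward_n.append(reward)
            done_n.append(done)
        return observation_n, reward_n, done_n

vec = VectorizedEnv([CounterEnv(horizon=2), CounterEnv(horizon=3)])
print(vec.reset())
for _ in range(3):
    print(vec.step([1, 0]))
```

A real parallel setup would run each instance in its own process or remote session; the batched interface the agent sees would stay the same.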


# 4. RL backend: Building agents

## [Keras-rl](https://github.com/matthiasplappert/keras-rl)

- Uses Keras to build the model that the agent will use; Keras wraps Theano and TensorFlow.
- Available models.
- API and functionality.
- [Example](https://github.com/matthiasplappert/keras-rl/blob/master/examples/duel_dqn_cartpole.py) of a deep RL agent with a few lines of code. 

## [Universe starter agent](https://github.com/openai/universe-starter-agent)

- TensorFlow as backend.
- Lets you dive into the low-level details of how parallelization is achieved.
- Easy to run an [A3C](https://arxiv.org/abs/1602.01783) example, but very difficult to build new agents from scratch.

# 5. Educational resources

## Reinforcement learning

- [Introduction to deep learning and self driving cars](https://www.youtube.com/watch?v=1L0TKZQcUtA&list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf)
- [David Silver](https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT)
- [Nando de Freitas](https://www.youtube.com/user/ProfNandoDF/videos)

## TensorFlow

- [Intro (2h15min)](https://www.youtube.com/watch?v=vq2nnJ4g6N0)
- [Big Data University online course (ML0120EN)](https://bigdatauniversity.com/blog/learn-tensorflow-and-deep-learning-together/)
- [Udacity course](https://www.udacity.com/course/deep-learning--ud730)