# Deep Reinforcement Learning Tutorial for Python in 20 Minutes
## From the YouTube channel: Nicholas Renotte
        https://www.youtube.com/watch?v=cO5g5qLrLSo&ab_channel=NicholasRenotte

Worked with supervised learning?
Maybe you’ve dabbled with unsupervised learning. 

But what about reinforcement learning?

It can be a little tricky to get set up with RL. You need to manage environments, build your deep learning models, and work out how to save your models so you can reuse them. But that shouldn't stop you!

Why?

Because RL is powering the next generation of advancements in IoT environments and even gaming, and its use cases are growing by the minute. That said, getting started doesn't need to be a pain: you can get up and running in just 20 minutes with Keras-RL and OpenAI Gym.

In this video you’ll learn how to:
1. Create OpenAI Gym environments like CartPole
2. Build a Deep Learning model for Reinforcement Learning using TensorFlow and Keras
3. Train a Reinforcement Learning model with Deep Q-learning (DQN) using Keras-RL

GitHub repo for the project: https://github.com/nicknochnack/Tenso...

Want to learn more?
OpenAI Gym: https://gym.openai.com/envs/
Keras-RL: https://keras-rl.readthedocs.io/


# Documentation:
    https://keras-rl.readthedocs.io/en/latest/agents/dqn/

In [None]:
import gym
import random
from IPython import display
import tensorflow as tf

print(tf.__version__)

print('Imported ...')

# 1 Test the Environment:

In [2]:
env = gym.make('CartPole-v0')
actions = env.action_space.n 
states = env.observation_space.shape[0]
print('actions number: {} | states number: {}\n'.format(actions, states))

print('Done...')

actions number: 2 | states number: 4

Done...
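The two actions and four state values reported above correspond to physical quantities of the cart-pole system. As a rough sketch, the helper below labels an observation vector using the ordering documented for CartPole. `describe_observation` is a hypothetical helper (not part of Gym), and the sample values are made up for illustration.

```python
# Field order follows the documented CartPole observation layout.
OBSERVATION_FIELDS = (
    "cart_position",
    "cart_velocity",
    "pole_angle",
    "pole_angular_velocity",
)
ACTION_NAMES = {0: "push cart left", 1: "push cart right"}


def describe_observation(observation):
    """Pair each observation value with its name (hypothetical helper)."""
    if len(observation) != len(OBSERVATION_FIELDS):
        raise ValueError("expected 4 values, got {}".format(len(observation)))
    return dict(zip(OBSERVATION_FIELDS, observation))


# Illustrative values only, not taken from a real rollout.
sample = [0.02, -0.15, 0.03, 0.21]
labelled = describe_observation(sample)
for name, value in labelled.items():
    print("{:>22}: {:+.2f}".format(name, value))
print("actions:", ACTION_NAMES)
```

In the notebook you could pass the array returned by `env.reset()` (converted with `list(...)` if you like) instead of the made-up `sample`.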


In [3]:
episodes = 5

for episode in range(episodes):
    state = env.reset()  # start a new episode
    done = False
    score = 0
    step = 0
    while not done:
        # Pick a random action (0 or 1); no learning happens here.
        action = random.choice(range(actions))
        state, reward, done, info = env.step(action)
        score += reward
        step += 1
        print('Episode: ', episode, ' | Steps: ', step, ' | Score: ', score)
        display.clear_output(wait=True)  # replace the previous line of output
        env.render()

env.close()
print('finished...')

finished...
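The loop above unpacks four values from `env.step`, which matches Gym releases before 0.26. Newer Gym and Gymnasium releases return five values (splitting `done` into `terminated` and `truncated`) and make `env.reset` return an `(observation, info)` pair. If you are not sure which version you have, a small adapter can normalize both shapes. `unpack_step` and `unpack_reset` below are hypothetical helpers, not part of either library, and the tuples are hand-written stand-ins for real environment output.

```python
def unpack_step(result):
    """Normalize a step result to (state, reward, done, info).

    Accepts the older 4-tuple or the newer 5-tuple
    (state, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        state, reward, terminated, truncated, info = result
        return state, reward, bool(terminated or truncated), info
    if len(result) == 4:
        state, reward, done, info = result
        return state, reward, bool(done), info
    raise ValueError("unexpected step result of length {}".format(len(result)))


def unpack_reset(result):
    """Return only the observation from either reset() style."""
    # Newer APIs return (observation, info), where info is a dict.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


# Hand-written tuples standing in for real environment output.
old_style = ([0.0, 0.1, 0.0, -0.1], 1.0, False, {})
new_style = ([0.0, 0.1, 0.0, -0.1], 1.0, False, True, {})
print(unpack_step(old_style)[2])
print(unpack_step(new_style)[2])
```

Inside your own loop this would read `state, reward, done, info = unpack_step(env.step(action))`. As far as I can tell, Keras-RL's training loop expects the 4-value form, so the adapter only helps in loops you write yourself.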


# 2 Make a Deep Learning Model with Keras:

In [4]:
import numpy as np
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

print('Done...')

Done...


In [5]:
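def build_model(states, actions):
    model = Sequential()
    # Keras-RL feeds observations with shape (window_length, states); window_length is 1 here.
    model.add(Flatten(input_shape=(1, states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    # Linear output: one Q-value estimate per action.
    model.add(Dense(actions, activation='linear'))
    return model

print('finished...')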

finished...
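To make the role of the linear output layer concrete, here is a plain-Python sketch of the same 4 → 24 → 24 → 2 forward pass. It uses random weights and a made-up observation, so the numbers mean nothing; the point is that the network returns one Q-value per action and that the greedy action is the index of the largest one. The `dense` and `random_layer` helpers are illustrative and are not how Keras implements its layers.

```python
import random


def dense(inputs, weights, biases, relu=False):
    """One fully connected layer: out[j] = b[j] + sum_i x[i] * W[i][j]."""
    outputs = []
    for j in range(len(biases)):
        total = biases[j] + sum(x * weights[i][j] for i, x in enumerate(inputs))
        outputs.append(max(0.0, total) if relu else total)
    return outputs


def random_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [random_layer(4, 24, rng), random_layer(24, 24, rng), random_layer(24, 2, rng)]

state = [0.02, -0.15, 0.03, 0.21]  # made-up observation
hidden = dense(state, *layers[0], relu=True)
hidden = dense(hidden, *layers[1], relu=True)
q_values = dense(hidden, *layers[2])  # linear output: one Q-value per action

greedy_action = max(range(len(q_values)), key=lambda a: q_values[a])
print("Q-values:", q_values)
print("greedy action:", greedy_action)
```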


In [6]:
model = build_model(states,actions)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 4)                 0         
_________________________________________________________________
dense (Dense)                (None, 24)                120       
_________________________________________________________________
dense_1 (Dense)              (None, 24)                600       
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 50        
Total params: 770
Trainable params: 770
Non-trainable params: 0
_________________________________________________________________
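The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below redoes that arithmetic; `dense_params` is just an illustrative helper.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_states, n_actions = 4, 2  # values printed earlier for CartPole-v0
layer_sizes = [(n_states, 24), (24, 24), (24, n_actions)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(counts)       # [120, 600, 50]
print(sum(counts))  # 770
```

The Flatten layer only reshapes its input, which is why it contributes 0 parameters.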


# 3 Build Agent with Keras-RL

In [15]:
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

print('finished...')

finished...
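`BoltzmannQPolicy` picks actions at random, but in proportion to a softmax over the Q-values, so better-looking actions are chosen more often while the others still get explored. The sketch below shows that idea in plain Python. It is not the library's code: the function names are made up, the max is subtracted for numerical stability (my choice), and the Q-values are invented. The temperature `tau` controls how greedy the sampling is.

```python
import math
import random


def boltzmann_probabilities(q_values, tau=1.0):
    """Softmax over Q-values with temperature tau."""
    scaled = [q / tau for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, tau=1.0, rng=random):
    """Draw an action index in proportion to its Boltzmann probability."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q_values = [1.0, 2.0]  # made-up Q-values for the two CartPole actions
for tau in (0.1, 1.0, 10.0):
    probs = boltzmann_probabilities(q_values, tau)
    print("tau={:<4} probs={}".format(tau, [round(p, 3) for p in probs]))
print("sampled action:", sample_action(q_values, rng=random.Random(0)))
```

A low `tau` makes the policy almost greedy; a high `tau` pushes it toward uniform random choice.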


In [None]:

def build_agent(actions, model):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    agent = DQNAgent(policy=policy, memory=memory,
                     nb_actions=actions, model=model,
                     nb_steps_warmup=10, target_model_update=1e-1)
    return agent


agent = build_agent(actions, model)
# DQNAgent.compile takes an optimizer and metrics; it sets up its own loss internally.
agent.compile(Adam(learning_rate=1e-3), metrics=['mae'])

print('finished ...')
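A note on `target_model_update = 1e-1`: to my understanding, Keras-RL treats a value below 1 as a soft-update coefficient (the target network drifts toward the trained network a little on every step), while a value of 1 or more means "copy the weights every that many steps". The sketch below shows the soft-update rule on a plain list of made-up weights. `soft_update` is an illustrative helper, and the online weights are held fixed here, whereas in real training they keep changing.

```python
def soft_update(target_weights, online_weights, tau):
    """Blend online weights into the target: tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_weights, online_weights)]


online = [1.0, -2.0, 0.5]  # made-up weights of the network being trained
target = [0.0, 0.0, 0.0]   # made-up weights of the target network

tau = 1e-1  # same value as target_model_update in build_agent
for step in range(1, 4):
    target = soft_update(target, online, tau)
    print(step, [round(t, 3) for t in target])
```

With `tau = 0.1` the target closes 10% of the remaining gap on each update, so it lags behind the online network and gives more stable learning targets.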

# 4 Train the Agent:

In [53]:
agent.fit(env, nb_steps = 50000, visualize = False, verbose =1)

print('done...')

Training for 50000 steps ...
Interval 1 (0 steps performed)
50 episodes - episode_reward: 196.220 [165.000, 200.000] - loss: 5.955 - mae: 39.873 - mean_q: 80.015

Interval 2 (10000 steps performed)
52 episodes - episode_reward: 195.192 [147.000, 200.000] - loss: 4.602 - mae: 37.066 - mean_q: 74.221

Interval 3 (20000 steps performed)
51 episodes - episode_reward: 195.176 [133.000, 200.000] - loss: 3.824 - mae: 36.375 - mean_q: 72.827

Interval 4 (30000 steps performed)
51 episodes - episode_reward: 195.216 [122.000, 200.000] - loss: 4.932 - mae: 39.421 - mean_q: 78.911

Interval 5 (40000 steps performed)
done, took 356.335 seconds
done...
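To get a feel for the `mean_q` column, it helps to look at what a Q-value is trained toward. The sketch below computes the one-step Q-learning target and the discounted return of a full 200-step episode with a reward of 1 per step. It assumes a discount factor of 0.99, which I believe is the `DQNAgent` default since the notebook does not set `gamma`; the next-state Q-values are made up.

```python
GAMMA = 0.99  # assumed: default discount factor of DQNAgent (not set in the notebook)


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), no bootstrap if done."""
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return reward + bootstrap


def discounted_return(n_steps, reward=1.0, gamma=GAMMA):
    """Sum of reward * gamma**t for t = 0 .. n_steps - 1."""
    return sum(reward * gamma ** t for t in range(n_steps))


print(td_target(1.0, [74.0, 76.5], done=False))  # made-up next-state Q-values
print(td_target(1.0, [74.0, 76.5], done=True))
print(round(discounted_return(200), 2))
```

Under these assumptions the discounted return at the start of a perfect episode is about 86.6, so `mean_q` values in the 70s and 80s are in the range you would expect for an agent that usually survives all 200 steps.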


Note: I changed line 180 of the file below:
gedit ~/anaconda3/envs/nadour_ai/lib/python3.8/site-packages/tensorflow/python/keras/engine/training_v1.py

# 5 Test the Agent:

In [43]:
scores = agent.test(env, nb_episodes = 100, visualize = False)
print(np.mean(scores.history['episode_reward']))

Testing for 100 episodes ...
Episode 1: reward: 200.000, steps: 200
Episode 2: reward: 200.000, steps: 200
Episode 3: reward: 200.000, steps: 200
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 200.000, steps: 200
Episode 6: reward: 200.000, steps: 200
Episode 7: reward: 200.000, steps: 200
Episode 8: reward: 176.000, steps: 176
Episode 9: reward: 200.000, steps: 200
Episode 10: reward: 200.000, steps: 200
Episode 11: reward: 200.000, steps: 200
Episode 12: reward: 200.000, steps: 200
Episode 13: reward: 200.000, steps: 200
Episode 14: reward: 200.000, steps: 200
Episode 15: reward: 200.000, steps: 200
Episode 16: reward: 200.000, steps: 200
Episode 17: reward: 200.000, steps: 200
Episode 18: reward: 200.000, steps: 200
Episode 19: reward: 200.000, steps: 200
Episode 20: reward: 200.000, steps: 200
Episode 21: reward: 200.000, steps: 200
Episode 22: reward: 200.000, steps: 200
Episode 23: reward: 200.000, steps: 200
Episode 24: reward: 200.000, steps: 200
Episode 25: reward: 
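The test output is cut off above, but the 24 complete episodes that are visible can still be summarized. The sketch below does this with the standard library only. The 195.0 threshold is the commonly cited "solved" target for CartPole-v0 (an average over 100 consecutive episodes), so checking it against 24 episodes is only indicative.

```python
from statistics import mean

# Rewards of the 24 complete episodes visible in the log above.
episode_rewards = [200.0] * 7 + [176.0] + [200.0] * 16

SOLVED_THRESHOLD = 195.0  # commonly cited CartPole-v0 target (average over 100 episodes)

average = mean(episode_rewards)
print("episodes:", len(episode_rewards))
print("mean reward:", average)
print("min reward:", min(episode_rewards))
print("above threshold:", average >= SOLVED_THRESHOLD)
```

In the notebook, the full list is available from the `history['episode_reward']` entry of the object returned by `agent.test`.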

In [54]:
env.close()

# 6 Save the Agent Model:

In [47]:
agent.save_weights('CartPole-v0-QAgent.h5')
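To reuse the saved weights later (for example in a fresh kernel), the agent has to be rebuilt and compiled before the weights are loaded. Below is a minimal sketch of that round trip. `reload_agent` is a hypothetical helper, it expects the `build_model` and `build_agent` functions defined earlier to be passed in, and the optimizer settings are an assumption (a compile call with `mae` as the metric). The imports are deferred into the function so that defining it has no side effects.

```python
def reload_agent(weights_path, build_model, build_agent, env_name="CartPole-v0"):
    """Rebuild the environment, network and agent, then restore saved weights.

    Hypothetical helper: build_model and build_agent are the functions
    defined earlier in this notebook.
    """
    import gym  # deferred imports: only needed when the function is called
    from tensorflow.keras.optimizers import Adam

    env = gym.make(env_name)
    actions = env.action_space.n
    states = env.observation_space.shape[0]

    model = build_model(states, actions)
    agent = build_agent(actions, model)
    # The agent must be compiled before weights can be loaded.
    agent.compile(Adam(learning_rate=1e-3), metrics=["mae"])
    agent.load_weights(weights_path)
    return env, agent
```

Usage would look like `env, agent = reload_agent('CartPole-v0-QAgent.h5', build_model, build_agent)`, followed by `agent.test(env, nb_episodes=5, visualize=False)` to confirm the restored agent behaves like the trained one.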
