Understanding Reinforcement Learning

Deep Reinforcement Learning requires a good amount of intuition in both deep learning and reinforcement learning. Even though the theoretical part is not that hard to understand, messy code definitely makes it harder to follow. Let's try to change that :D


Section 1

OpenAI Gym

Since we are working with OpenAI's Gym a lot, let's build a better intuition about it!

OpenAI Gym environments
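
Here is a minimal sketch of the standard Gym interaction loop (assuming the classic pre-0.26 `gym` API, where `reset()` returns just the observation and `step()` returns four values; the environment name is only an example):

```python
import gym

env = gym.make("CartPole-v1")
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # random policy, for illustration
    obs, reward, done, info = env.step(action)  # classic 4-tuple return
    total_reward += reward

print("Episode reward:", total_reward)
env.close()
```

Every Gym environment follows this same reset/step contract, which is what lets the same agent code run across many different tasks.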


Section 2

Policy Gradient methods

In supervised learning, it is simple to create a system that maps inputs X to outputs Y, since there is a dataset containing input and output examples. In reinforcement learning, on the other hand, there is no such dataset of labeled examples. Policy gradients are one way to solve this problem. The whole idea relies on encouraging actions that received good reward and discouraging actions that received bad reward. The general formulation is minimizing the loss -log(p(y | x)) * A. Here A represents the advantage, and for the most vanilla version we can use discounted rewards.
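
As a concrete illustration, here is a minimal REINFORCE sketch in PyTorch (the framework choice and network sizes are assumptions for illustration, not taken from this repo's notebooks). It implements exactly the loss above: -log p(a | s) weighted by normalized discounted returns as the advantage A:

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

def discount(rewards, gamma):
    """Discounted returns: G_t = r_t + gamma * G_{t+1}."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    return torch.tensor(returns)

for episode in range(500):
    obs = env.reset()                            # classic gym API
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)

    returns = discount(rewards, gamma)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Encourage actions with high return, discourage actions with low return.
    loss = -(torch.stack(log_probs) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Normalizing the returns is a common variance-reduction trick; it does not change which actions are encouraged relative to each other.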

Extra resources:

  1. My blog post
  2. Pong from pixels
  3. Hands-On Machine Learning with Scikit-Learn and TensorFlow Chapter 16
  4. Policy Gradients lecture by Pieter Abbeel
  5. Continuous control with deep reinforcement learning
  6. Better Exploration with Parameter Noise

Section 3

Deep Q Networks

Before we start with DQN, let's talk about the Q function first. Q(s, a) is a function that maps a given s (state) and a (action) pair to the expected total reward until the terminal state. It is basically how much reward we are going to get if we take action a in state s. The reason we combine this idea with a neural network is that it is almost impossible to store Q-values for every state in the environment, so we let the network approximate them instead.
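
The network is trained toward the Bellman target r + gamma * max_a' Q(s', a'). Here is a minimal sketch of that core update in PyTorch (the network shape is arbitrary, and the replay buffer and target network of the full DQN algorithm are omitted for brevity):

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_loss(s, a, r, s_next, done):
    """TD error for a batch of transitions:
    target = r + gamma * max_a' Q(s', a'), zero beyond terminal states."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a) taken
    with torch.no_grad():
        q_next = q_net(s_next).max(dim=1).values           # max_a' Q(s', a')
        target = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_sa, target)
```

In a full implementation, the transitions (s, a, r, s', done) would be sampled from a replay buffer, and the target would come from a separate, slowly updated target network for stability.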