# Reinforcement Learning
### Goal of lesson
- Understand how Reinforcement Learning works
- Learn about Agent and Environment
- See how the agent iterates and receives rewards based on its actions
- See how it continuously learns new things
- Create your own Reinforcement Learning model from scratch

### Reinforcement Learning simply explained
- Given a set of rewards or punishments, learn what actions to take in the future
- The second large group of Machine Learning

### Environment
<img src='img/reinforcement-learning.png' width=600 align='left'>

### Agent
- The environment gives the agent a state
- The agent takes an action
- The environment gives a state and reward (or punishment)

This is how robots are taught how to walk
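
The state, action, reward cycle described above can be sketched as a loop. This is only an illustration: `CountdownEnvironment` and `always_decrement` are hypothetical names invented for this example, not part of the lesson's code.

```python
class CountdownEnvironment:
    """Hypothetical toy environment: the state is a counter the agent should bring to 0."""

    def __init__(self, start=3):
        self.state = start

    def step(self, action):
        # The environment applies the action, then returns the new state and a reward.
        self.state += action
        reward = 1 if self.state == 0 else 0
        return self.state, reward


def always_decrement(state):
    """Hypothetical agent policy: always choose the action -1."""
    return -1


env = CountdownEnvironment(start=3)
state = env.state                              # the environment gives the agent a state
history = []
while state != 0:
    action = always_decrement(state)           # the agent takes an action
    state, reward = env.step(action)           # the environment gives a state and reward
    history.append((action, state, reward))

print(history)
```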

### Markov Decision Process
- Model for decision-making, representing states, actions, and their rewards
- Set of states $S$
- a state $s$
- Set of actions $Actions(s)$
- an action $a$
- Transition model $P(s'|s, a)$
- Reward function $R(s, a, s')$
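
As a rough sketch, the components listed above can be written down as plain Python data. The two-state "cold/warm" problem, its probabilities, and its rewards are made up purely for illustration.

```python
# Hypothetical two-state MDP, invented for illustration only.
states = ["cold", "warm"]                      # set of states S


def actions(s):
    """Actions(s): here the same two actions are available in every state."""
    return ["heat", "wait"]


# Transition model P(s' | s, a), stored as {(s, a): {s': probability}}.
P = {
    ("cold", "heat"): {"warm": 0.9, "cold": 0.1},
    ("cold", "wait"): {"cold": 1.0},
    ("warm", "heat"): {"warm": 1.0},
    ("warm", "wait"): {"warm": 0.5, "cold": 0.5},
}


def R(s, a, s_next):
    """Reward function R(s, a, s'): +1 for ending up warm, -1 for ending up cold."""
    return 1 if s_next == "warm" else -1


def expected_reward(s, a):
    """Average the reward over all possible next states, weighted by P(s' | s, a)."""
    return sum(p * R(s, a, s_next) for s_next, p in P[(s, a)].items())


print(expected_reward("cold", "heat"))
```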

### Q-learning (one model)
- Method for learning a function $Q(s, a)$: an estimate of the value of performing action $a$ in state $s$

### Q-learning
- Start with $Q(s, a) = 0$ for all $s, a$
- Update $Q$ when we take an action
- $\alpha$ : learning rate 
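- $\gamma$ : discount factor (how much future rewards count compared to the immediate reward)
- $s', a'$ : the next state and an action available in it
- How to update the $Q$ table:
> $Q_{new}(s, a) = Q_{old}(s, a) + \alpha(\text{reward} + \gamma\max_{a'} Q_{old}(s', a') - Q_{old}(s, a)) = (1 - \alpha)Q_{old}(s, a) + \alpha(\text{reward} + \gamma\max_{a'} Q_{old}(s', a'))$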

In other words, each update blends the previous estimate with the new experience, weighted by the learning rate $\alpha$.
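
A single update step might look like the sketch below. The function name `q_update`, the table shape (one row per state, one column per action), and the values $\alpha = 0.5$ and $\gamma = 0.5$ are assumptions chosen for the example.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.5):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    best_next = np.max(q_table[next_state])          # max over a' of Q(s', a')
    target = reward + gamma * best_next
    q_table[state, action] = (1 - alpha) * q_table[state, action] + alpha * target
    return q_table[state, action]


# Worked example with 3 states and 2 actions (assumed sizes).
q_table = np.zeros((3, 2))
q_table[2] = [0.0, 1.0]                              # pretend state 2 is already known to be good
new_value = q_update(q_table, state=1, action=1, reward=0, next_state=2)
print(new_value)  # (1 - 0.5) * 0 + 0.5 * (0 + 0.5 * 1.0) = 0.25
```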

### $\epsilon$-Greedy Decision Making
**Explore vs Exploit**
- With probability $\epsilon$ take a random move
- Otherwise, take action $a$ with maximum $Q(s, a)$
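
The rule above could be sketched as follows. The helper name `choose_action` and the default $\epsilon = 0.1$ are assumptions; the Q table is taken to be a NumPy array indexed as `q_table[state, action]`.

```python
import random

import numpy as np


def choose_action(q_table, state, possible_actions, epsilon=0.1):
    """Epsilon-greedy choice among the actions allowed in `state`."""
    if random.random() < epsilon:
        # Explore: pick any allowed action at random.
        return random.choice(possible_actions)
    # Exploit: pick the allowed action with the highest Q(state, action).
    return max(possible_actions, key=lambda a: q_table[state, a])


q_table = np.zeros((2, 2))
q_table[0, 1] = 1.0
greedy_action = choose_action(q_table, 0, [0, 1], epsilon=0.0)   # never explores
random_action = choose_action(q_table, 0, [0, 1], epsilon=1.0)   # always explores
print(greedy_action, random_action)
```

A larger $\epsilon$ means more exploration; a smaller one relies more on what has already been learned.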

### Simple task
<img src='img/field.png' width=600 align='left'>


- Start at a random point
- Move left or right
- Avoid the red box
- Find the green box

![Field](img/field-3.png)

> #### Programming Notes:
> - Libraries used
>     - [**numpy**](http://numpy.org) - scientific computing with Python ([Lecture on NumPy](https://youtu.be/BpzpU8_j0-c))
>     - [**random**](https://docs.python.org/3/library/random.html) - pseudo-random generators
> - Functionality and concepts used
>     - **Object-Oriented Programming (OOP)**: [Lecture on Object Oriented Programming](https://youtu.be/hbO9xo6RfDM)

> ### Resources
> #### What if there are more states?
> - [Reinforcement Learning from Scratch](https://youtu.be/y4LEVVE2mV8)

In [2]:
import numpy as np
import random

In [27]:
# implement the field shown above
class Field:
    def __init__(self):
        # a list representing all 11 cells (states) in figure 3 above
        # negative reward state (-1), neutral state (0), positive reward state (1)
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]   # this is our field
        self.state = random.randrange(0, len(self.states)) # random integer from [0, 11)
        
    def done(self):
        # the episode is over once we stand on a cell with reward -1 or 1
        return self.states[self.state] != 0
   
    def get_possible_actions(self):
        actions = [0, 1]  # 0: left, 1: right
        if self.state == 0:
            actions.remove(0)
        if self.state == len(self.states) - 1:
            actions.remove(1)
        return actions
    
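    def update_next_state(self, action):
        if action == 0:  # go left
            if self.state == 0:
                # already at the left edge: stay put and get a penalty
                return self.state, -10
            self.state -= 1
        if action == 1:  # go right
            if self.state == len(self.states) - 1:
                # already at the right edge: stay put and get a penalty
                return self.state, -10
            self.state += 1
        # the reward is the value of the cell we moved to
        reward = self.states[self.state]
        return self.state, reward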

In [28]:
field = Field()
field.state, field.done(), field.get_possible_actions() 

(6, False, [0, 1])

In [31]:
field.update_next_state(1)

(3, 0)

In [32]:
field = Field()
# one row per state, one column per action (0: left, 1: right)
q_table = np.zeros((len(field.states), 2)) # note: watched the video up to 23:20
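
The notebook stops here. As a hedged sketch of how the pieces above could fit together, the block below trains a Q table on the field. It is not the lecture's implementation: the hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`), the seed, and the compact copy of `Field` with a -10 penalty for walking into a wall are assumptions made so the example is self-contained.

```python
import random

import numpy as np


class Field:
    """Compact copy of the field environment, repeated so this block runs on its own."""

    def __init__(self):
        self.states = [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
        self.state = random.randrange(0, len(self.states))

    def done(self):
        return self.states[self.state] != 0

    def update_next_state(self, action):
        if action == 0:                                  # left
            if self.state == 0:
                return self.state, -10
            self.state -= 1
        if action == 1:                                  # right
            if self.state == len(self.states) - 1:
                return self.state, -10
            self.state += 1
        return self.state, self.states[self.state]


# Assumed hyperparameters, chosen only for this example.
alpha, gamma, epsilon, episodes = 0.5, 0.5, 0.5, 1000

random.seed(0)                                           # assumed seed, for repeatable runs
q_table = np.zeros((11, 2))                              # 11 states, 2 actions

for _ in range(episodes):
    field = Field()                                      # new episode, random start
    while not field.done():
        state = field.state
        if random.random() < epsilon:                    # explore
            action = random.choice([0, 1])
        else:                                            # exploit
            action = int(np.argmax(q_table[state]))
        next_state, reward = field.update_next_state(action)
        q_table[state, action] = (1 - alpha) * q_table[state, action] + alpha * (
            reward + gamma * np.max(q_table[next_state])
        )

policy = np.argmax(q_table, axis=1)                      # best action per state
print(policy)
```

After training, the learned policy should point right (1) in the cells between the two boxes and left (0) in the cells to the right of the positive-reward cell. The rows for the two terminal cells are never updated and stay at zero.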