# Reinforcement Learning

- Use reinforcement learning (Q-learning) to find a path to the goal in this field.

![field](../images/field1.PNG)

```python
import random
import numpy as np
```

## Step 1: Create a field
**It is a two-dimensional (2D) field**
- **__init__**:
    - Use a list of list with integer values to represent all the states
        - Goal end state should be 1, illegal states -1, other states 0
    - Set the initial state to a random cell within the field
- **done**:
    - Return **True** if the current state is terminal, i.e. its value is non-zero (the goal 1 or an illegal -1)
- **get_possible_actions**:
    - Start from a list of all actions, **actions = [2, 4, 6, 8]**:
        - action = 2 is up
        - action = 4 is left
        - action = 6 is right
        - action = 8 is down
    - Then remove every action that would move the agent off the edge of the field.
    - Finally, return the remaining actions
- **update_next_state**:
    - Get the current state
    - If the move would leave the field (illegal), return the current state and a reward of -10
    - Otherwise, update the state and return it together with the reward according to the new state

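```python
# The goal is to find the green cell and avoid the red ones.
class Field:
    def __init__(self):
        self.states = [
            [-1,0,0,0,0,0,0,0,0,0,0],
            [-1,0,0,0,0,0,0,1,0,0,0],
            [0,0,0,0,0,0,0,0,0,0,0]]
        # Start in a random cell of the field.
        self.state = (random.randrange(0,len(self.states)),random.randrange(0, len(self.states[0])))

    def done(self):
        # Terminal when the current cell is the goal (1) or illegal (-1).
        if self.states[self.state[0]][self.state[1]] != 0:
            return True
        else:
            return False

    def get_possible_actions(self):
        # action: 2 -> up, 4 -> left, 6 -> right, 8 -> down
        actions = [2,4,6,8]
        if self.state[0] == 0:
            actions.remove(2)
        if self.state[0] == len(self.states)-1:
            actions.remove(8)
        if self.state[1] == 0:
            actions.remove(4)
        if self.state[1] == len(self.states[0])-1:
            actions.remove(6)
        return actions

    def update_next_state(self, action):
        x,y = self.state
        # Moving off the edge is illegal: stay in place with a reward of -10.
        if action == 2:
            if x == 0:
                return self.state, -10
            x -= 1
        elif action == 4:
            if y == 0:
                return self.state, -10
            y -= 1
        elif action == 6:
            if y == len(self.states[0])-1:
                return self.state, -10
            y += 1
        elif action == 8:
            if x == len(self.states)-1:
                return self.state, -10
            x += 1
        self.state = (x, y)
        # Assumption: the reward is the value of the new cell (1 = goal, -1 = illegal, 0 = otherwise).
        return self.state, self.states[x][y]
```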

```python
field = Field()
field.state, field.done(), field.get_possible_actions()
```

```
((1, 5), False, [2, 4, 6, 8])
```

## Step 2: Train the model
- Create a $q$-table initialized to all 0
    - Use **q_table = np.zeros(...)** *(insert values for ...)*
- Set **alpha = 0.5**, **gamma = 0.5**, and **epsilon = 0.5**
- Create a *for*-loop with 10000 iterations (episodes)
    - Create a new field
    - While the field is not done
        - Get the possible actions and assign them to **actions**
        - With probability epsilon take a random action, otherwise take the best action
            - HINT: **random.uniform(0, 1) < epsilon**
            - HINT: Random action: **random.choice(actions)**, and best action: **np.argmax(q_table[field.state])**. Note that **np.argmax** returns a column index (0 to 3), so map between that index and the actions **[2, 4, 6, 8]** when reading or updating the $q$-table
        - Get the current state and assign it to **cur_x, cur_y**
        - Move to the next state and get it as **next_x, next_y** together with the **reward**
        - Update **q_table[cur_x, cur_y, action] = (1 - alpha)*q_table[cur_x, cur_y, action] + alpha*(reward + gamma*np.max(q_table[next_x, next_y]))**
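
The update in the last bullet is the tabular Q-learning rule

$$Q(s, a) \leftarrow (1 - \alpha)\,Q(s, a) + \alpha\left(r + \gamma \max_{a'} Q(s', a')\right)$$

Here $s = (x, y)$ is the current cell, $s'$ is the next cell, and $r$ is the reward. Below is a minimal sketch of the whole training loop, under several assumptions:

- It re-declares a compact version of `Field` so that it runs on its own. The helper names `ACTIONS`, `MOVES`, `GRID`, `inside`, and `train` are hypothetical.
- The reward is the value of the new cell (1 at the goal, -1 on a red cell, 0 otherwise), with -10 for trying to leave the field.
- The $q$-table has one column per action, and `ACTIONS.index(action)` selects that column.

```python
import random

import numpy as np

# Compact re-declaration of the Field environment (hypothetical names) so the block runs alone.
# Same action convention: 2 = up, 4 = left, 6 = right, 8 = down.
ACTIONS = [2, 4, 6, 8]
MOVES = {2: (-1, 0), 4: (0, -1), 6: (0, 1), 8: (1, 0)}
GRID = [
    [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
ROWS, COLS = len(GRID), len(GRID[0])


def inside(x, y):
    return 0 <= x < ROWS and 0 <= y < COLS


class Field:
    def __init__(self):
        # Start in a random cell, as in the Field class above.
        self.state = (random.randrange(ROWS), random.randrange(COLS))

    def done(self):
        # Terminal on the goal (1) or an illegal cell (-1).
        x, y = self.state
        return GRID[x][y] != 0

    def get_possible_actions(self):
        # Keep only the moves that stay inside the field.
        x, y = self.state
        return [a for a in ACTIONS if inside(x + MOVES[a][0], y + MOVES[a][1])]

    def update_next_state(self, action):
        x, y = self.state[0] + MOVES[action][0], self.state[1] + MOVES[action][1]
        if not inside(x, y):
            return self.state, -10  # illegal move: stay put
        self.state = (x, y)
        return self.state, GRID[x][y]  # assumed reward: value of the new cell


def train(episodes=10000, alpha=0.5, gamma=0.5, epsilon=0.5):
    q_table = np.zeros((ROWS, COLS, len(ACTIONS)))
    for _ in range(episodes):
        field = Field()
        while not field.done():
            actions = field.get_possible_actions()
            if random.uniform(0, 1) < epsilon:
                action = random.choice(actions)  # explore
            else:
                # Exploit: argmax gives a column index, mapped back to an action.
                action = ACTIONS[int(np.argmax(q_table[field.state]))]
            a = ACTIONS.index(action)  # column of this action in the q-table
            cur_x, cur_y = field.state
            (next_x, next_y), reward = field.update_next_state(action)
            q_table[cur_x, cur_y, a] = (1 - alpha) * q_table[cur_x, cur_y, a] + alpha * (
                reward + gamma * np.max(q_table[next_x, next_y])
            )
    return q_table
```

Usage: call `random.seed(0)` and then `q_table = train()`. The greedy action in a cell is `ACTIONS[int(np.argmax(q_table[x, y]))]`.

The exploit branch does not filter out off-edge actions, so it can still choose one. In that case the -10 reward pushes that entry down, which is the role the -10 plays in `update_next_state`. Terminal cells are never updated and stay at 0, so bootstrapping from them adds nothing.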

## Step 3: Solve a task
- To record the path, create a variable **path = np.zeros((3, 11))**
- Create a field with **Field()**
- To count steps, assign **steps = 1**
- Set the start state in the path to **np.nan**
- Then loop while the field is not done:
    - Get the best **action** from the $q$-table
    - Move to the next **state**
    - Update **path** at the new state with **steps**
    - Increment **steps** by one
- Finally, display the **path**
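
The steps above can be sketched as a greedy rollout. This standalone version re-declares the grid and the move rule instead of using `Field`; the names `solve`, `GRID`, `MOVES`, and the `max_steps` guard are all hypothetical. It takes the start cell as an argument; in the notebook you would pass `Field().state`. As in `update_next_state`, a move off the edge leaves the agent in place. The `max_steps` guard stops a poorly trained table from looping forever.

```python
import numpy as np

# Hypothetical re-declaration of the grid and move rule used by Field.
# Action convention as in Field: 2 = up, 4 = left, 6 = right, 8 = down.
ACTIONS = [2, 4, 6, 8]
MOVES = {2: (-1, 0), 4: (0, -1), 6: (0, 1), 8: (1, 0)}
GRID = [
    [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [-1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def solve(q_table, start, max_steps=100):
    """Follow the greedy policy from start; path holds the step number of each visited cell."""
    rows, cols = len(GRID), len(GRID[0])
    path = np.zeros((rows, cols))
    x, y = start
    path[x, y] = np.nan  # mark the start cell
    steps = 1
    # Stop on a terminal cell (goal or illegal) or after max_steps moves.
    while GRID[x][y] == 0 and steps <= max_steps:
        action = ACTIONS[int(np.argmax(q_table[x, y]))]
        nx, ny = x + MOVES[action][0], y + MOVES[action][1]
        if 0 <= nx < rows and 0 <= ny < cols:  # off-edge moves leave the agent in place
            x, y = nx, ny
        path[x, y] = steps
        steps += 1
    return path


# Toy table for illustration: every cell prefers column 2 (action 6, right).
toy_q = np.zeros((3, 11, 4))
toy_q[:, :, 2] = 1.0
print(solve(toy_q, (1, 5)))
```

With this toy table, a start at (1, 5) walks right through (1, 6) into the goal at (1, 7). The path then holds `nan` at the start and step numbers 1 and 2 along the way. With a table from training, the same call traces the learned route.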