# An Introduction to Reinforcement Learning Concepts and a Practical Guide

In this guide, we will cover:
1. What Reinforcement Learning is.
2. Reinforcement Learning basic concepts and terminology.
3. Applications of Reinforcement Learning.
4. A practical guide to performing Reinforcement Learning with code. 


## 1. What is Reinforcement Learning?

Most of you have heard of an AI that plays games and was actually able to defeat a world-renowned South Korean player at the game. Well... that is reinforcement learning at a glance. Let me show you how through a simple analogy.

Imagine that you have a dog and that you're trying to teach it a new trick. In this situation, every time the dog's response is the desired one, you give it food, and when the response is not desired, you don't! Hard, I know! The idea is that the dog will learn from every positive experience it has had before, so it will keep making the desired response because it knows that it will be rewarded. It will also learn what not to do from the undesired responses. The dog is learning from its past experiences, which influence its present choices. This is reinforcement learning!

### Definition of Reinforcement Learning

Reinforcement Learning is a subset of Machine Learning in which agents take actions with the aim of maximizing a cumulative reward. It differs from Supervised Learning (SL): in SL, the training data is labelled, so the model is trained with the correct answer. In Reinforcement Learning (RL), there is no given answer; the agent decides which action to take and learns from its past experiences.



## 2. Basic Concepts and Terminology

Here I will go over the main terms you need to understand:

1. Agent - The one who makes decisions based on the rewards and punishments. In our analogy, the dog is the agent.
2. Environment - The world in which the agent lives and interacts. In our analogy, this could be your backyard or house.
3. State - The current situation returned by the environment. An example of a state could be your dog standing in your living room while you use a specific word in a certain tone.
4. Action - One of the possible moves that the agent can make; taking an action moves the agent from one state to another. In our analogy, this could be the dog going from standing to running to fetch the stick.
5. Reward - An immediate return sent back from the environment to evaluate the last action. In our analogy, the dog is given a treat when it performs the desired trick and a penalty of "No" if it performs the undesired action.
6. Policy - The strategy for choosing an action, given a state, in expectation of better outcomes.

#### Putting it all together ...
Reinforcement Learning (RL) is about training an agent that interacts with its environment. The agent transitions between different scenarios of the environment, referred to as states, by performing actions. Actions, in return, yield rewards, which could be positive, negative or zero. The agent's sole purpose is to maximize the cumulative reward over an episode, which is everything that happens between an initial state and a terminal state. We decide on rewards that align with the tasks we want to accomplish.
Hence, we reinforce the agent to perform certain actions by providing it with positive rewards, and to stray away from others by providing negative rewards. This is how an agent learns to develop a strategy or policy.
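
To make the loop described above concrete, here is a minimal sketch in plain Python. The `CorridorEnv` class, its reward values and both policies are hypothetical illustrations invented for this example; they are not part of Gym or of the Space Invaders code later in this guide.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: the agent walks along a corridor.

    States are positions 0..length-1. Reaching the last position ends the episode.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left (clamped to the corridor).
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        # Illustrative rewards: +10 for reaching the goal, -1 for every other step.
        reward = 10 if done else -1
        return self.position, reward, done


def random_policy(state):
    # A policy maps a state to an action; this one ignores the state entirely.
    return random.choice([0, 1])


def always_right_policy(state):
    return 1


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the cumulative reward."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


print(run_episode(CorridorEnv(), always_right_policy))  # 7: three steps at -1, then +10
print(run_episode(CorridorEnv(), random_policy))        # varies, never more than 7
```

A learning algorithm's job is to move the agent from something like `random_policy` towards something like `always_right_policy`, using only the rewards it observes.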




## 3. Applications of Reinforcement Learning

1. It can be used in robotics for industrial automation.
2. It can be used in machine learning and data processing.
3. It can be used to create training systems that provide custom instruction and materials according to the requirement of students.
4. Aircraft control and robot motion control.
5. Business strategy planning.

## 4. A practical guide to performing Reinforcement Learning with code

### AI Learns to Play Atari Space Invaders using Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) is a subset of Machine Learning that combines neural networks with RL to help agents achieve their goals. We will use DRL techniques to teach our agent how to play the famous Atari Space Invaders game. If you want to learn more about the game beforehand, see: http://www.atarimania.com/game-atari-2600-vcs-space-invaders_s6947.html

For the game, we need an environment where our agent can perform actions, get a score and observe the current state. Fortunately, OpenAI Gym already has this environment built for us. The library provides the API that gives our agent all the information it will need. We'll be using Gym's Space Invaders environment for our game (originally registered as 'SpaceInvaders-v0'; newer Gym releases expose it as 'ALE/SpaceInvaders-v5', which is the ID used in the code below).

## Install Dependencies
We use gym, keras-rl2 (a reinforcement learning library), atari-py (a Python binding for Atari games), tensorflow and pygame. Let's install them:

In [1]:
! pip install gym tensorflow atari_py keras-rl2 pygame





## Test a random environment using gym


In [2]:
import gym
import random
import numpy as np



In the line of code below, we run the import_roms module of our installed library, atari-py. We need a ROM because it is a digital copy of our video game. You can do a quick Google search or download it here: https://www.emulatorgames.net/space-invaders-roms/ . Pass the path of the folder where you saved the ROM on your local machine.

In [3]:
! python -m atari_py.import_roms "C:/Users/Supreme Computers/Documents/4.1/Knowledge-Based Systems/myroms/"



copying space_invaders.bin from C:/Users/Supreme Computers/Documents/4.1/Knowledge-Based Systems/myroms/Space Invaders (1980) (Atari, Richard Maurer - Sears) (CX2632 - 49-75153) ~.bin to C:\Users\Supreme Computers\AppData\Local\Programs\Python\Python39\lib\site-packages\atari_py\atari_roms\space_invaders.bin


In the lines of code below, we create the Space Invaders environment. We read the height, width and number of channels of a frame from the observation space, since they define the input to our neural network, and the number of available actions from the action space. We can also call the get_action_meanings() function to see what the actions are: the agent can do nothing (NOOP), fire, move right or left, or move right or left while firing.

In [5]:
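# Create the environment; render_mode='human' opens a window that shows the game
env = gym.make('ALE/SpaceInvaders-v5', render_mode='human')
# Frame dimensions (the network's input) and the number of discrete actions (its output)
height, width, channels = env.observation_space.shape
actions = env.action_space.n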

In [6]:
env.unwrapped.get_action_meanings()

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

Below, we test the environment for 5 episodes to see how it performs. We reset the environment at the start of each episode. While the episode runs, the agent chooses a random action at each step and receives the next state and a reward. Each reward is added to the episode's running score, which is printed when the episode ends. Here's how the agent performs without reinforcement learning.

As you will notice, the scores are random and do not improve with experience. Let's now use RL to improve the agent's scores per game.

In [7]:
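episodes = 5
for episode in range(1, episodes+1):
    state = env.reset()  # start a new game
    done = False
    score = 0

    while not done:
        # Pick one of the 6 actions uniformly at random (no learning yet)
        action = random.choice([0,1,2,3,4,5])
        # Older Gym API: step() returns (next_state, reward, done, info)
        n_state, reward, done, info = env.step(action)
        score+=reward
    print('Episode:{} Score:{}'.format(episode, score))
env.close()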

Episode:1 Score:120.0
Episode:2 Score:40.0
Episode:3 Score:110.0
Episode:4 Score:95.0
Episode:5 Score:125.0
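
The random-agent loop above unpacks four values from `env.step()`, which matches older Gym releases. From Gym 0.26 onwards (and in Gymnasium), `step()` returns five values, with `done` split into `terminated` and `truncated`, and `reset()` returns an `(observation, info)` pair. If you are unsure which version you have, a small helper along these lines can smooth over the difference. The name `unpack_step` is my own and is not part of Gym.

```python
def unpack_step(result):
    """Normalize the tuple returned by env.step() to (observation, reward, done, info).

    Hypothetical helper (not a Gym function). Accepts either the older
    4-tuple (obs, reward, done, info) or the newer 5-tuple
    (obs, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        # The episode is over if it ended naturally or was cut short.
        return observation, reward, bool(terminated or truncated), info
    observation, reward, done, info = result
    return observation, reward, bool(done), info


# Usage with stand-in tuples instead of a real environment:
print(unpack_step(("frame", 5.0, False, {})))         # old-style result
print(unpack_step(("frame", 5.0, False, True, {})))   # new-style result, truncated
```

Inside the loop you would then write `n_state, reward, done, info = unpack_step(env.step(action))`. Note that keras-rl2, used later in this guide, expects the older four-value API.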


## Create Deep Learning Model using Keras

Import the packages we need for the model, listed below:

In [8]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Convolution2D
from tensorflow.keras.optimizers import Adam

Start building the model using the height, width and channels (pulled from the observation space) and the number of actions (pulled from the action space). These values define the input and output shapes of our model.

We assign Sequential() to model so that frames pass through the layers in order.
model.add() is used to stack layers in our network.

The first layer is a Convolution2D layer with the following parameters:
1. The number of filters (32). The filters are what the model trains to detect objects in the frames, such as enemy ships.
2. The size of the filters (8x8) and the stride (4x4).
3. The ReLU activation function and the input shape, which is a stack of 3 frames.

The second and third layers are also convolutional layers, just with different parameters.
We then flatten the convolutional output into a single vector and pass it through dense (fully connected) layers. The first dense layer has 512 units. The second dense layer compresses this to 256 units. The third dense layer has one unit per action, which is 6 units.
Finally, build the model by calling the build_model function and assigning the result to model.

In [9]:
def build_model(height, width, channels, actions):
    model = Sequential()
    model.add(Convolution2D(32, (8,8), strides=(4,4), activation='relu', input_shape=(3,height, width, channels)))
    model.add(Convolution2D(64, (4,4), strides=(2,2), activation='relu'))
    model.add(Convolution2D(64, (3,3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model



In [15]:
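# Only needed when re-running this notebook: discard a previously built model
# before rebuilding it. On a fresh run there is nothing to delete.
if 'model' in globals():
    del model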

In [16]:
model = build_model(height, width, channels, actions)

In [17]:
model.summary()


Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 3, 51, 39, 32)     6176      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 3, 24, 18, 64)     32832     
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 3, 22, 16, 64)     36928     
_________________________________________________________________
flatten_1 (Flatten)          (None, 67584)             0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               34603520  
_________________________________________________________________
dense_5 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_6 (Dense)              (None, 6)                
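
The shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes 210 x 160 RGB frames (inferred from the summary rather than stated in this guide) and Keras's default 'valid' padding. The helper names are my own.

```python
def conv_output_size(size, kernel, stride=1):
    """Output length along one axis for a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights of a square-kernel Conv2D layer plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


# Assumed frame shape and the window of 3 stacked frames used in build_model.
height, width, channels = 210, 160, 3
window = 3

# (filters, kernel size, stride) for the three convolutional layers.
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

h, w, c = height, width, channels
shapes, params = [], []
for filters, kernel, stride in layers:
    params.append(conv_params(kernel, c, filters))
    h = conv_output_size(h, kernel, stride)
    w = conv_output_size(w, kernel, stride)
    c = filters
    shapes.append((h, w, c))

flattened = window * h * w * c
dense_512_params = flattened * 512 + 512

print(shapes)            # [(51, 39, 32), (24, 18, 64), (22, 16, 64)]
print(params)            # [6176, 32832, 36928]
print(flattened)         # 67584
print(dense_512_params)  # 34603520
```

Almost all of the model's parameters sit in the first dense layer, because the flattened convolutional output is so large.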

## Build the RL Agent

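Finally, build the RL agent with keras-rl. Import DQNAgent, the agent itself, and SequentialMemory, which lets the agent retain experience from previous steps and games. Also import EpsGreedyQPolicy, which picks a random action with probability epsilon and otherwise the action with the highest predicted Q-value, and LinearAnnealedPolicy, which linearly decays epsilon during training so the agent explores less as it gets closer to a good strategy.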

In [18]:
from rl.agents import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

Define the build_agent function, which takes the model and the number of actions. First define the policy by wrapping the epsilon-greedy policy in the linear annealed policy. Then define the memory using SequentialMemory with a limit of 1000 stored steps and a window length of 3 (meaning each state the model sees is a stack of the 3 most recent frames, which captures what our previous steps looked like).
Then define the agent by passing in the model, memory, and policy. Enable the dueling network, which splits the Q-value into a state value and per-action advantages and helps the model learn when an action matters and when it does not. Finally, pass the number of actions and the number of warm-up steps to collect before learning starts.
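
To show what the annealed epsilon-greedy policy does, here is a plain-Python sketch of the schedule and the action choice. It reflects my understanding of how keras-rl behaves during training rather than the library's actual code, and the function names are my own. The default values mirror the ones passed to LinearAnnealedPolicy in this guide.

```python
import random


def annealed_epsilon(step, value_max=1.0, value_min=0.1, nb_steps=10000):
    """Linearly decay epsilon from value_max to value_min over nb_steps, then hold."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit (best Q-value)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])


for step in (0, 5000, 10000, 50000):
    print(step, round(annealed_epsilon(step), 3))

# With epsilon = 0 the choice is purely greedy: index 1 has the highest Q-value.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```

Early in training epsilon is close to 1, so nearly every action is random. By step 10,000 only about one action in ten is random, and the rest follow the network's Q-value estimates.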

In [19]:
def build_agent(model, actions):
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000)
    memory = SequentialMemory(limit=1000, window_length=3)
    dqn = DQNAgent(model=model, memory=memory, policy=policy,
                  enable_dueling_network=True, dueling_type='avg', 
                   nb_actions=actions, nb_steps_warmup=1000
                  )
    return dqn
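
The dueling option with dueling_type='avg' combines a single state value with one advantage per action. A minimal sketch of that combination, assuming the usual formulation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)), is shown below. The function name and the numbers are illustrative only.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value V(s) and advantages A(s, a) into Q-values ('avg' variant)."""
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean keeps V(s) and A(s, a) from drifting against each other.
    return [state_value + (advantage - mean_advantage) for advantage in advantages]


# Illustrative numbers: a state worth 2.0 with three possible actions.
print(dueling_q_values(2.0, [1.0, 2.0, 3.0]))  # [1.0, 2.0, 3.0]

# When all advantages are equal, the action does not matter and every Q-value equals V(s).
print(dueling_q_values(2.0, [0.5, 0.5, 0.5]))  # [2.0, 2.0, 2.0]
```

The second call illustrates the point made above: in states where no action is better than another, the network only needs to get the state value right.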

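Assign the result of build_agent to dqn and compile it with the Adam optimizer at a learning rate of 0.0001. Then fit the agent, training it for 1,000 steps. For reference, Google DeepMind suggests that 10 to 40 million steps is the right order of magnitude for a well-trained agent, but for the purposes of this guide I keep the run very short. Note that nb_steps_warmup is also 1000, so this run only fills the replay memory and no weight updates happen yet, which is why the training log shows loss: --. Pass a larger nb_steps (for example 10,000) to see actual learning.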

In [20]:
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-4))

In [21]:
dqn.fit(env, nb_steps=1000, visualize=False, verbose=2)

Training for 1000 steps ...




 330/1000: episode: 1, duration: 73.034s, episode steps: 330, steps per second:   5, episode reward: 50.000, mean reward:  0.152 [ 0.000, 30.000], mean action: 2.436 [0.000, 5.000],  loss: --, mean_q: --, mean_eps: --
 845/1000: episode: 2, duration: 80.824s, episode steps: 515, steps per second:   6, episode reward: 180.000, mean reward:  0.350 [ 0.000, 30.000], mean action: 2.643 [0.000, 5.000],  loss: --, mean_q: --, mean_eps: --
done, took 178.273 seconds


<tensorflow.python.keras.callbacks.History at 0x19751ab92b0>

## Test Model

Let's test the model for 10 episodes and output the mean reward. The scores vary a lot from episode to episode, and after such a short training run they are best read as a starting point rather than as evidence of a learned strategy. If the model were trained for 40M steps, it should arrive at higher rewards and a far better strategy.

In [22]:
scores = dqn.test(env, nb_episodes=10, visualize=False)
print(np.mean(scores.history['episode_reward']))



Testing for 10 episodes ...
Episode 1: reward: 30.000, steps: 377
Episode 2: reward: 70.000, steps: 467
Episode 3: reward: 65.000, steps: 514
Episode 4: reward: 140.000, steps: 723
Episode 5: reward: 45.000, steps: 497
Episode 6: reward: 160.000, steps: 477
Episode 7: reward: 140.000, steps: 706
Episode 8: reward: 215.000, steps: 792
Episode 9: reward: 465.000, steps: 818
Episode 10: reward: 225.000, steps: 684
155.5
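
To put the test scores in context, they can be compared with the random agent's scores from earlier in this guide. The sketch below simply re-enters the numbers printed in the two outputs; with so few episodes it is a rough comparison, not a statistical test.

```python
from statistics import mean, median

# Scores copied from the random-agent run and the dqn.test run in this guide.
random_scores = [120.0, 40.0, 110.0, 95.0, 125.0]
test_scores = [30.0, 70.0, 65.0, 140.0, 45.0, 160.0, 140.0, 215.0, 465.0, 225.0]

random_mean = mean(random_scores)
test_mean = mean(test_scores)
test_median = median(test_scores)

print(random_mean)   # 98.0
print(test_mean)     # 155.5
print(test_median)   # 140.0
```

The median sits below the mean because a single 465-point episode pulls the average up, so a longer evaluation would give a more reliable picture.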


## The end
I hope you learned an interesting concept from today's practical guide.