# Reinforcement learning 

# Basic overview

From [Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning):

***Reinforcement learning (RL)** is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.*

The 4 main components of any RL algorithm are therefore the following:

* `Agent` - an entity (computer program) that makes decisions. 

* `Action` - decision made by an agent. 

* `Environment` - an interface for the agents to interact with. The environment accepts actions and responds with the result and a new set of observations. 

* `Reward` - a function that assigns a value (reward) to each action that an agent can take.

The interaction between the 4 components: 

![](media/chapter-1/reinforcement-learning-overview.png)
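
The interaction loop between the four components can be sketched in a few lines of Python. This is only an illustrative sketch: the `Environment`, `Agent`, and `run_episode` names, the two-action setup, and the reward values are all assumptions made for this example and do not come from any particular library. The agent here acts at random and does not learn yet.

```python
import random
from typing import List, Tuple


class Environment:
    """Hypothetical toy environment: action 1 is rewarded, any other action is penalized."""

    def step(self, action: int) -> Tuple[float, int]:
        reward = 1.0 if action == 1 else -1.0
        observation = 0  # this toy environment always returns the same observation
        return reward, observation


class Agent:
    """Hypothetical agent that picks actions uniformly at random (no learning yet)."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation: int) -> int:
        return self.rng.randrange(self.n_actions)


def run_episode(agent: Agent, env: Environment, n_steps: int) -> Tuple[float, List[Tuple[int, float]]]:
    """Run the agent-environment loop and return the cumulative reward and the history."""
    observation = 0
    total_reward = 0.0
    history = []
    for _ in range(n_steps):
        action = agent.act(observation)         # the agent decides
        reward, observation = env.step(action)  # the environment responds
        total_reward += reward
        history.append((action, reward))
    return total_reward, history
```

Calling `run_episode(Agent(n_actions=2), Environment(), n_steps=10)` returns the cumulative reward together with the list of `(action, reward)` pairs seen along the way.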

# How does the agent learn? 

The goal of RL is for the agent to take actions that maximize its cumulative reward. The programmer (you) must define an environment and a reward-generating function such that the agent learns with each iteration: "good" actions are rewarded while "bad" actions are penalized.

To put it mathematically, the reward function maps each action that can be taken in the environment to a reward value.

If we define 

$A$ - action set 

$f_{E}$ - environment function 

$R$ - reward set

Then the output that gets fed to the agent is:

$f_{E}: A \rightarrow R$

Or 

$f_{E}(a) = r$, $a \in A$, $r \in R$
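
As a small illustration of this mapping, the environment function can be written as a lookup from actions to rewards. The action names and reward values below are made up for the example; only the shape of the mapping ($A \rightarrow R$) comes from the text.

```python
# Hypothetical action set A and reward assignment, chosen only for illustration.
ACTIONS = ["left", "right", "stay"]
REWARD_TABLE = {"left": -1.0, "right": 1.0, "stay": 0.0}


def f_E(action: str) -> float:
    """Environment function: maps an action a in A to a reward r in R."""
    if action not in REWARD_TABLE:
        raise ValueError(f"unknown action: {action!r}")
    return REWARD_TABLE[action]


# The reward set R is the set of values f_E can return.
REWARDS = {f_E(a) for a in ACTIONS}
```

Here `f_E("right")` returns `1.0`, and `REWARDS` contains the three values `-1.0`, `0.0`, and `1.0`.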

Basically, an agent keeps a ledger in its internal memory that maps each action to a certain reward. After a reward is received, the internal state is updated. If we define the internal state at time $t$ as $w_{t}$, then the high-level logic of an agent "learning" is:

$$w_{t + 1} = w_{t} + \alpha f_{E}(a_{t}) $$

Where 

$\alpha$ - a positive constant (the learning rate).

$a_{t}$ - action taken at time $t$. 

$w_{t}$ - internal state at time $t$.

![](media/chapter-1/rl-learning.png)

After each action, the agent updates the internal state and then the cycle is repeated. 
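
The update rule can be tried out with a minimal sketch. Several things here are assumptions made for the example rather than part of the text: the internal state $w$ is stored as one number per action (the "ledger"), only the entry of the action just taken is updated, actions are chosen at random, and the action names and rewards are invented.

```python
import random
from typing import Dict

ALPHA = 0.1  # learning rate (alpha), a positive constant


def f_E(action: str) -> float:
    """Hypothetical environment function with made-up rewards."""
    return {"left": -1.0, "right": 1.0, "stay": 0.0}[action]


def learn(n_steps: int, seed: int = 0) -> Dict[str, float]:
    """Apply w_{t+1} = w_t + alpha * f_E(a_t) to the ledger entry of the chosen action."""
    rng = random.Random(seed)
    actions = ["left", "right", "stay"]
    w = {a: 0.0 for a in actions}  # internal state w_0: nothing learned yet
    for _ in range(n_steps):
        a_t = rng.choice(actions)           # action taken at time t (random here)
        w[a_t] = w[a_t] + ALPHA * f_E(a_t)  # update the internal state
    return w


w = learn(n_steps=300)
best_action = max(w, key=w.get)  # the action with the highest ledger value
```

After enough steps the ledger entry for the rewarded action is the largest, so `best_action` is `"right"`. Note that under this simplified rule the entries keep growing in magnitude with every repetition of an action; the sketch only shows the direction of learning, not a converging estimate.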