# An Introduction to Reinforcement Learning

> A brief introduction to the main concepts

## What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of Machine Learning in which an agent learns, by trial and error, to make decisions in a dynamic environment so as to maximize its cumulative reward.

The typical Reinforcement Learning loop looks like this:

1. The agent receives an *observation*, representing the current state of the environment.
2. Based on the observation and its current *policy*, the agent selects an *action*.
3. The agent performs the action; the environment moves to a new state and returns a *reward*.
4. The loop repeats from step 1 with the new observation.
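The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `CorridorEnv`, its `reset`/`step` methods, `random_policy`, and `run_episode` are hypothetical names invented here (not any library's API), and the environment is a toy corridor where the agent earns a reward of 1 for reaching the last cell.

```python
import random
from typing import Callable, Tuple


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, the goal is the last cell."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action (-1 = left, +1 = right); return (observation, reward, done)."""
        # Clip the move so the agent stays inside the corridor
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def random_policy(observation: int) -> int:
    """A trivial policy that ignores the observation and moves left or right at random."""
    return random.choice([-1, 1])


def run_episode(env: CorridorEnv, policy: Callable[[int], int], max_steps: int = 100) -> float:
    """Run the observe -> act -> reward loop and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # select an action from the observation
        observation, reward, done = env.step(action)  # act; receive reward and next observation
        total_reward += reward
        if done:
            break
    return total_reward
```

For example, `run_episode(CorridorEnv(), lambda obs: 1)` always moves right and returns `1.0`, whereas `run_episode(CorridorEnv(), random_policy)` may or may not reach the goal within `max_steps`.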

## Markov Decision Process

Formally, RL can be understood as solving a *Markov Decision Process* (MDP). An MDP is a $5$-tuple $(S, A, T, \pi_0, R)$, with:

- $S$: state space of the environment
- $A$: action space of the environment, i.e., the actions that can be performed by the agent
- $T: S \times A \times S \to [0,1]$: transition function, which describes how actions affect the state of the environment
- $\pi_0: S \to [0,1]$: probability distribution over initial states
- $R: S \times A \times S \to \mathbb{R}$: reward function
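The components of the tuple can be written down directly as Python data structures. The sketch below assumes a made-up two-state MDP: the state names `"cool"`/`"hot"`, the action names `"slow"`/`"fast"`, all probabilities and rewards, and the helper names `is_distribution`, `sample`, and `rollout` are hypothetical. It draws the first state from $\pi_0$, then repeatedly draws the next state from $T$ and scores each transition with $R$.

```python
import random

# Hypothetical MDP with two states and two actions (all names and numbers are made up).
S = ["cool", "hot"]
A = ["slow", "fast"]

# Transition function T(s, a, s') stored as T[(s, a)][s'].
T = {
    ("cool", "slow"): {"cool": 1.0, "hot": 0.0},
    ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
    ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
    ("hot", "fast"): {"cool": 0.0, "hot": 1.0},
}

# Initial state distribution pi_0(s).
pi_0 = {"cool": 1.0, "hot": 0.0}


def R(s: str, a: str, s_next: str) -> float:
    """Reward function R(s, a, s'): "fast" pays more, but landing in "hot" costs 3."""
    return (2.0 if a == "fast" else 1.0) - (3.0 if s_next == "hot" else 0.0)


def is_distribution(dist: dict) -> bool:
    """Check that every value lies in [0, 1] and that the values sum to 1."""
    return all(0.0 <= p <= 1.0 for p in dist.values()) and abs(sum(dist.values()) - 1.0) < 1e-9


def sample(dist: dict, rng: random.Random) -> str:
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def rollout(policy: dict, steps: int, seed: int = 0) -> list:
    """Sample (s, a, r, s') tuples: start from pi_0, then follow a deterministic policy."""
    rng = random.Random(seed)
    s = sample(pi_0, rng)
    trajectory = []
    for _ in range(steps):
        a = policy[s]
        s_next = sample(T[(s, a)], rng)
        trajectory.append((s, a, R(s, a, s_next), s_next))
        s = s_next
    return trajectory
```

With the policy `{"cool": "slow", "hot": "slow"}`, the agent starts in `"cool"` and stays there, collecting a reward of `1.0` per step.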

The agent selects actions using a *policy* $\pi: S \to A$, which maps each state to an action.
The objective is usually to find an optimal policy $\pi^*$, i.e., one that maximizes the expected cumulative reward.
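To make "cumulative reward" concrete, a common choice (assumed here, since this section does not define it) is the expected discounted return $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$ with a discount factor $\gamma \in [0, 1)$. When $T$ and $R$ are fully known, one classical way to compute $\pi^*$ is *value iteration*: repeatedly apply $V(s) \leftarrow \max_{a} \sum_{s'} T(s, a, s')\,[R(s, a, s') + \gamma V(s')]$ until $V$ stops changing, then act greedily with respect to $V$. The sketch below applies it to a hypothetical two-state MDP; the states, actions, rewards, $\gamma = 0.9$, and the function names are assumptions chosen for illustration.

```python
# Hypothetical two-state MDP: T[(s, a)] maps each reachable next state to its probability.
S = ["cool", "hot"]
A = ["slow", "fast"]
T = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
    ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
    ("hot", "fast"): {"hot": 1.0},
}


def R(s, a, s_next):
    """Made-up rewards: "fast" pays more, but landing in "hot" costs 3."""
    return (2.0 if a == "fast" else 1.0) - (3.0 if s_next == "hot" else 0.0)


def q_value(V, s, a, gamma):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[(s, a)].items())


def value_iteration(gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality update, then extract the greedy policy."""
    V = {s: 0.0 for s in S}
    while True:
        new_V = {s: max(q_value(V, s, a, gamma) for a in A) for s in S}
        delta = max(abs(new_V[s] - V[s]) for s in S)
        V = new_V
        if delta < tol:
            break
    policy = {s: max(A, key=lambda a: q_value(V, s, a, gamma)) for s in S}
    return V, policy
```

With these numbers, `value_iteration()` returns the policy `{"cool": "slow", "hot": "slow"}` with $V^*(\text{cool}) = 10$ and $V^*(\text{hot}) = 80/11 \approx 7.27$: "fast" can pay more right away but risks an expensive state. Value iteration needs $T$ and $R$ up front; an RL agent typically does not know them and has to learn from the rewards it observes instead.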

> **Questions**
>
> 1. Why does the Cartesian product that defines the domain of the transition function include the state space $S$ twice?
> 2. Why does the transition function return a value between 0 and 1?
> 3. Assume that from the current state $s$, you can reach state $s_A$ with a high immediate reward, or state $s_B$ with a low immediate reward. Is $s_A$ always preferable to $s_B$?
> 4. Can teaching a dog a new trick be understood as a Markov Decision Process? If yes, what are the state space, action space, and the reward function?

## Bookmarks

- [Kaggle: Intro to Game AI and Reinforcement Learning](https://www.kaggle.com/learn/intro-to-game-ai-and-reinforcement-learning)