# Reinforcement Learning ([FreeCodeCamp](https://www.youtube.com/watch?v=vufTSJbzKGU&t=32s))

## Basics of Reinforcement Learning:

1. Reinforcement Learning is a type of machine learning in which an agent learns to make decisions by interacting with an environment.

<center>
<img src="https://gymnasium.farama.org/_images/AE_loop.png" width=400 alt="Demo of the Reinforcement Learning"/>
</center>

2. An agent is an entity that interacts with an environment in order to learn how to make decisions that maximize a specific goal or objective. The agent can be thought of as an autonomous decision-maker that receives inputs (observations) from the environment, performs actions, and receives rewards or penalties based on those actions.
3. The environment, on the other hand, is the external system or context in which the agent operates. The environment can be a simulation, a physical system, or any other system that the agent interacts with. The environment provides feedback to the agent in the form of rewards or punishments, which the agent uses to learn how to make better decisions.
4. The agent receives feedback in the form of rewards or penalties, and its goal is to maximize the total reward it receives over time.
5. The agent follows a trial-and-error process of taking actions, observing the consequences, and adjusting its behavior based on the rewards it receives. Over time, the agent learns to identify the actions that lead to the highest rewards and avoid those that lead to penalties.
6. RL agents can adapt to new situations and environments. They can adjust their behavior based on the feedback they receive and can continually improve their performance over time.
7. RL agents can learn to optimize their behavior over time, leading to more efficient decision making and better performance.
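The trial-and-error loop in points 4 to 7 can be illustrated with a minimal sketch, assuming the simplest possible setting: a two-armed bandit, where the environment has a single state and the agent only has to learn which of two actions pays more on average. The win probabilities, `pull`, and `train` are hypothetical, and epsilon-greedy exploration is just one common way to balance trying new actions against repeating good ones.

```python
import random

# Hypothetical two-action environment: each action pays +1 (reward) or -1 (penalty).
WIN_PROB = [0.3, 0.8]  # assumed values; action 1 is the better choice on average


def pull(action: int, rng: random.Random) -> float:
    """Environment step: return +1 with the action's win probability, else -1."""
    return 1.0 if rng.random() < WIN_PROB[action] else -1.0


def train(steps: int = 2000, epsilon: float = 0.1, seed: int = 0) -> list:
    """Epsilon-greedy agent that learns the average reward of each action."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # agent's estimate of each action's average reward
    counts = [0, 0]      # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                        # explore: random action
        else:
            action = max(range(2), key=lambda a: values[a])  # exploit the best estimate
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental mean: move the estimate part of the way toward the new reward.
        values[action] += (reward - values[action]) / counts[action]
    return values


values = train()
print(values)  # the estimate for action 1 should end up well above action 0
```

Lowering `epsilon` makes the agent exploit sooner, but it also raises the risk of settling on the worse action if its early rewards happened to be unlucky.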

### **Real-world use cases of RL:**
- <u>**Robotics:**</u> Reinforcement learning can be used to train robots to perform tasks such as grasping objects, navigating in complex environments, and interacting with humans. For example, RL has been used to train robots to play table tennis, where the robot learns to adjust its movements based on the position of the ball.
- <u>**Game Playing:**</u> Reinforcement learning has been applied to game playing, particularly in the development of artificial intelligence agents that can play games such as chess, Go, and Atari games. For example, AlphaGo, a computer program developed by Google DeepMind, combined reinforcement learning with supervised learning from human games and Monte Carlo tree search to play Go at a world-class level.
- <u>**Autonomous Driving:**</u> Reinforcement learning can be used to train autonomous vehicles to make decisions in complex and dynamic environments. For example, RL can be used to train a self-driving car to navigate through traffic, avoid obstacles, and make safe and efficient driving decisions.
- <u>**Personalized Recommendations:**</u> RL can be used to provide personalized recommendations to users based on their preferences and behavior. For example, RL can be used to optimize the recommendations of a video streaming service, learning what content to recommend to users to maximize user engagement and satisfaction.

### What is **[Gymnasium](https://gymnasium.farama.org/)**?

- It provides a collection of environments or "tasks" that can be used to test and develop reinforcement learning algorithms. These environments are typically game-like, with well-defined rules and a reward structure, making them useful for evaluating and comparing different reinforcement learning algorithms.
- Gymnasium is the maintained fork of OpenAI's Gym library. It includes a set of interfaces and tools for interacting with the environments, such as observation spaces, action spaces, and rewards. This makes it easy to build and test reinforcement learning algorithms in a standardized way.
- These environments range from simple games, like Pong or Breakout, to more complex physics simulations, like robotic control or car racing. Gymnasium environments are designed to be easy to use, and all of them share that standard interface.

### Getting started with Gymnasium

The basic process for using Gymnasium to train a Reinforcement Learning model is as follows:
1. Define the environment you want to work with.
2. Create an instance of the environment.
3. Define the agent's policy (i.e., how it decides which action to take).
4. Interact with the environment, taking actions and receiving rewards.
5. Update the agent's policy based on the rewards it receives.
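The five steps above can be sketched without installing anything by assuming a hypothetical `CorridorEnv` that copies the shape of Gymnasium's API: `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. With the real library, steps 1 and 2 would typically be a call such as `gym.make("Blackjack-v1")`. The policy update shown is tabular Q-learning, picked here as one simple option rather than the only way to do step 5.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment that copies the shape of Gymnasium's API.

    The agent starts in cell 0 and is rewarded for reaching the last cell.
    """

    def __init__(self, length: int = 5, max_steps: int = 20):
        self.length = length
        self.max_steps = max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}  # (observation, info)

    def step(self, action: int):
        move = 1 if action == 1 else -1  # action 1 = right, action 0 = left
        self.pos = max(0, min(self.length - 1, self.pos + move))
        self.steps += 1
        terminated = self.pos == self.length - 1  # reached the goal cell
        truncated = self.steps >= self.max_steps  # hit the time limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def greedy(q_row, rng):
    """Pick the higher-valued action, breaking ties at random."""
    if q_row[0] == q_row[1]:
        return rng.randrange(2)
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = CorridorEnv()                          # steps 1-2: define and create the env
    q = [[0.0, 0.0] for _ in range(env.length)]  # step 3: policy = epsilon-greedy over Q
    for _ in range(episodes):
        obs, info = env.reset()
        done = False
        while not done:                          # step 4: act and observe rewards
            explore = rng.random() < epsilon
            action = rng.randrange(2) if explore else greedy(q[obs], rng)
            next_obs, reward, terminated, truncated, info = env.step(action)
            # step 5: Q-learning update; a terminal state has no future value
            future = 0.0 if terminated else gamma * max(q[next_obs])
            q[obs][action] += alpha * (reward + future - q[obs][action])
            obs, done = next_obs, terminated or truncated
    return q


q = train()
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(len(q) - 1)]
print(policy)  # greedy action per non-goal cell
```

Note that a truncated step still bootstraps from the next state's value, because running out of time does not mean the state itself is worthless; only a true terminal state drops the future term.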

### Main concepts of Gymnasium

1. <u>**Observation and Action Spaces:**</u> An observation space is the set of possible states that an agent can observe in the environment. An action space is the set of possible actions that an agent can take in the environment.
2. <u>**Episode:**</u> An episode is one complete run through an environment, starting from the initial state and continuing until a terminal state is reached or the episode is cut short (truncated), for example by a time limit. Each episode is composed of a sequence of states, actions, and rewards.
3. <u>**Wrapper:**</u> A wrapper is a tool in Gymnasium that allows you to modify an environment's behavior without changing its code. Wrappers can be used to add features such as time limits, reward shaping, and action masking.
4. <u>**Benchmark:**</u> Gymnasium provides a set of benchmark environments, which are standardized tasks that can be used to evaluate and compare reinforcement learning algorithms. These benchmarks include classic control tasks, Atari games, and robotics tasks.
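To make episodes and wrappers concrete, here is a minimal sketch under stated assumptions: `EndlessEnv` is a hypothetical environment that never terminates on its own, and `TimeLimitSketch` is a hand-written stand-in for the kind of time limit that Gymnasium ships as `gymnasium.wrappers.TimeLimit`. The wrapper changes the episode's behavior (it sets `truncated`) without touching the wrapped environment's code.

```python
class EndlessEnv:
    """Hypothetical environment whose episodes never end on their own."""

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, False, {}  # obs, reward, terminated, truncated, info


class TimeLimitSketch:
    """Hypothetical wrapper: truncates any env after `max_episode_steps` steps."""

    def __init__(self, env, max_episode_steps: int):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self, seed=None):
        self.elapsed = 0
        return self.env.reset(seed=seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            truncated = True  # the wrapper, not the env, ends the episode
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """One episode: from reset() until terminated or truncated."""
    obs, info = env.reset()
    total, length = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(0)
        total += reward
        length += 1
        if terminated or truncated:
            return total, length


print(run_episode(TimeLimitSketch(EndlessEnv(), max_episode_steps=5)))  # (5.0, 5)
```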

### Introduction to [Blackjack](https://gymnasium.farama.org/environments/toy_text/blackjack/)

<center>

![Blackjack example](../Images/image-1.png)
</center>

**Here are the basic rules of Blackjack:**
1. The game is played with one or more decks of standard playing cards.
2. Each player is dealt two cards, and the dealer is also dealt two cards, with one card face down.
3. The value of each card is determined by its rank. Aces can be worth 1 or 11, face cards (kings, queens, and jacks) are worth 10, and all other cards are worth their face value.
4. Players have the option to "hit" and take additional cards to improve their hand, or "stand" and keep their current hand.
5. The dealer must hit until their hand has a value of 17 or more.
6. If a player's hand goes over 21, they "bust" and lose the game.
7. If the dealer's hand goes over 21 (and the player has not already busted), the player wins the game.
8. If neither the player nor the dealer busts, the hand with the highest total value that is less than or equal to 21 wins the game.
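The card-value and dealer rules (3 and 5) can be sketched as two small functions, assuming cards are encoded as in the Gymnasium environment's observation (ace = 1, face cards = 10). `hand_value` and `dealer_play` are hypothetical helper names, and the `deck` argument is an assumed list of upcoming cards.

```python
def hand_value(cards):
    """Best total for a Blackjack hand (ace = 1, face cards = 10).

    At most one ace can count as 11, since two would already make 22.
    """
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10  # count one ace as 11 when that does not bust
    return total


def dealer_play(hand, deck):
    """The dealer hits until the hand is worth 17 or more.

    `deck` is a hypothetical list of upcoming card values; cards are taken from its end.
    """
    hand = list(hand)
    while hand_value(hand) < 17:
        hand.append(deck.pop())
    return hand


print(hand_value([1, 1]), hand_value([1, 6, 10]))  # 12 17
print(dealer_play([10, 6], deck=[5, 9]))            # [10, 6, 9]
```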

#### **Action Space**

The action space is `Discrete(2)`: a single integer in {0, 1} indicating whether to stick or hit.

0: Stick

1: Hit

#### **Observation Space**

The observation consists of a 3-tuple containing: the player’s current sum, the value of the dealer’s one showing card (1-10 where 1 is ace), and whether the player holds a usable ace (0 or 1).
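A sketch of how that 3-tuple could be assembled from the raw hands, assuming the dealer's first card is the one showing. `usable_ace` and `observe` are hypothetical names; an ace is usable when counting it as 11 does not push the total over 21.

```python
def usable_ace(hand):
    """1 if the hand holds an ace that can count as 11 without busting, else 0."""
    return int(1 in hand and sum(hand) + 10 <= 21)


def observe(player_hand, dealer_hand):
    """Build (player sum, dealer showing card, usable ace) from raw card lists."""
    player_sum = sum(player_hand) + 10 * usable_ace(player_hand)
    return (player_sum, dealer_hand[0], usable_ace(player_hand))


print(observe([1, 6], [10, 7]))  # (17, 10, 1)
```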

#### **Starting State**

The starting state is initialized in the following range.

#### **Observation**

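| **Observation** | **Min** | **Max** |
| :--: | :--: | :--: |
| Player current sum | 4 | 21 |
| Dealer showing card value | 1 | 10 |
| Usable Ace | 0 | 1 |

A two-card hand is worth at least 4 (2 + 2) and at most 21 (an ace counted as 11 plus a ten-valued card), and the dealer's showing card uses the 1-10 encoding described above, with 1 for an ace.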
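The player's starting-sum range can be checked by brute force over every possible two-card hand, assuming the same card encoding (ace = 1, ten-valued cards = 10) and counting one ace as 11 whenever that does not bust the hand. `hand_total` is a hypothetical helper.

```python
from itertools import product

# Card values: ace = 1, 2-9 at face value, 10 for ten/jack/queen/king.
CARD_VALUES = range(1, 11)


def hand_total(hand):
    total = sum(hand)
    return total + 10 if 1 in hand and total + 10 <= 21 else total


starting_sums = {hand_total(h) for h in product(CARD_VALUES, repeat=2)}
print(min(starting_sums), max(starting_sums))  # 4 21
```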

#### **Rewards**

win game: +1

lose game: -1

draw game: 0
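The reward scheme can be written as a function of the two final totals. This is a minimal sketch assuming both totals are already computed; `reward` is a hypothetical name, and it encodes the bust and comparison rules listed above (a busted player loses even if the dealer would also bust).

```python
def reward(player_total: int, dealer_total: int) -> int:
    """Reward for a finished hand: +1 win, -1 loss, 0 draw."""
    if player_total > 21:
        return -1  # player busts: loss
    if dealer_total > 21:
        return 1   # dealer busts and the player did not: win
    # Otherwise the higher total wins; equal totals are a draw.
    return (player_total > dealer_total) - (player_total < dealer_total)


print(reward(20, 18), reward(22, 25), reward(19, 19))  # 1 -1 0
```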

#### **Episode End**

The episode ends when one of the following happens:

**Termination:**
- The player hits and the sum of their hand exceeds 21.
- The player sticks.

Note that an ace is always counted as usable (11) unless doing so would bust the player.
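Putting the termination rules together, a full episode under a simple fixed policy can be simulated and its average reward estimated by Monte Carlo sampling. This is a sketch under stated assumptions: cards are drawn with replacement (an infinite deck), the policy "hit below 20, otherwise stick" is hypothetical, and natural blackjacks get no special treatment.

```python
import random

# One card of each rank; the three face cards count as 10 (ace = 1).
DECK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]


def draw(rng):
    return rng.choice(DECK)  # assumption: infinite deck, drawn with replacement


def total(hand):
    s = sum(hand)
    return s + 10 if 1 in hand and s + 10 <= 21 else s  # ace counts 11 unless it busts


def play_episode(rng, stick_at=20):
    """One episode with a hypothetical threshold policy: hit below `stick_at`, else stick."""
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    while total(player) < stick_at:  # action 1 (hit)
        player.append(draw(rng))
        if total(player) > 21:
            return -1                # termination: the player hit and busted
    # Termination: the player sticks (action 0); the dealer then hits until 17 or more.
    while total(dealer) < 17:
        dealer.append(draw(rng))
    p, d = total(player), total(dealer)
    if d > 21:
        return 1                     # dealer busts
    return (p > d) - (p < d)         # +1 win, -1 loss, 0 draw


rng = random.Random(0)
returns = [play_episode(rng) for _ in range(10_000)]
print(sum(returns) / len(returns))  # Monte Carlo estimate of the policy's expected reward
```

Changing `stick_at` and comparing the averages is a crude form of policy evaluation; learning methods such as Monte Carlo control automate that search over policies.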

