## Reinforcement Learning
- This technique differs from most of the machine learning techniques we have seen so far: it trains agents to interact with environments, such as games.
- Rather than feeding our machine learning model millions of examples, we let the model come up with its own examples by exploring an environment.
- The concept is simple. Humans learn by exploring and learning from mistakes and past experiences so let's have our computer do the same.

### RL Basics
- **Environment:** Where the agent operates; like a game level.
- **Agent:** The entity exploring the environment; e.g., game character.
- **State:** Agent's situation, like its location in the environment.
- **Action:** Agent's interactions; moving, jumping, or not doing anything.
- **Reward:** Outcome of actions; positive or negative, guiding the agent's goal.

The most important part of reinforcement learning is determining how to reward the agent. After all, the goal of the agent is to maximize its rewards. This means we should reward the agent appropriately, so that maximizing reward leads it to the desired goal.

#### Q-Learning:
- Q-Learning is a powerful technique in machine learning. It's like teaching a smart agent by filling a table.
- This table has rows for different situations (states) and columns for actions. Each entry shows the expected reward for an action in a situation. The agent learns by updating this table based on experiences.
- After learning, the agent knows the best action for any situation by looking at the highest reward in that situation's row.
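
As a minimal sketch of the table described above, a Q-Table can be held in a NumPy array with one row per state and one column per action. The sizes, the values, and the helper name `best_action` are made up for illustration; they do not come from any particular environment.

```python
import numpy as np

# Hypothetical Q-Table: 3 states (rows) x 4 actions (columns).
# The numbers are invented purely to illustrate the lookup.
q_table = np.array([
    [0.0, 3.0, 1.0, 2.0],   # state 0
    [2.0, 0.5, 0.0, 1.0],   # state 1
    [1.0, 0.0, 4.0, 0.5],   # state 2
])

def best_action(q_table, state):
    """Return the index of the action with the highest value in this state's row."""
    return int(np.argmax(q_table[state]))

for state in range(q_table.shape[0]):
    print(f"state {state}: best action = {best_action(q_table, state)}")
```

Looking up the best action is just an `argmax` over one row, which is why the table form is convenient for small environments.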

**Consider this example:**
![Screenshot%202023-08-12%20164202.png](attachment:Screenshot%202023-08-12%20164202.png)

- There is a drawback here. As you can see, the agent will pick the highest reward in each state, i.e., S1-3, S2-2, S3-4. This is the local-maximum problem: when an agent sees the highest reward in a specific state, it will blindly pick it.
- To find a possibly better reward, the agent has to explore the other states as well. There are two ways that our agent can decide which action to take:
    1. Randomly picking a valid action 
    2. Using the current Q-Table to find the best action.

- **Initial Exploration:** At the start, the agent takes random actions to explore and discover the environment's possibilities.

- **Learning Progression:** Over time, the agent relies more on its learned Q-Table values, making smarter choices.

- **Balanced Approach:** The agent balances random actions for discovery and Q-Table guidance for better decisions, avoiding getting stuck in limited options.

- **Continuous Improvement:** After each action, the agent notes new states and rewards, using them to update the Q-Table.

- **Stopping Conditions:** The agent stops when it hits a time limit, achieves the goal, or reaches the end of the environment.
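
One common way to combine the two choices (random action versus Q-Table lookup) is an epsilon-greedy rule. The sketch below assumes a parameter named `epsilon` for the probability of acting randomly and a simple multiplicative decay; neither the name nor the decay schedule comes from the notes above, they are only one reasonable option.

```python
import numpy as np

def choose_action(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the best known one."""
    n_actions = q_table.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # explore: any valid action
    return int(np.argmax(q_table[state]))     # exploit: use the Q-Table

rng = np.random.default_rng(0)

# Hypothetical 3-state, 4-action table where action 2 looks best in state 0.
q_table = np.zeros((3, 4))
q_table[0, 2] = 1.0

epsilon = 0.9                 # assumed starting value: mostly random at first
for episode in range(5):
    action = choose_action(q_table, 0, epsilon, rng)
    epsilon = max(0.01, epsilon * 0.5)   # assumed decay: rely on the table more over time
```

Starting with a high `epsilon` gives the initial exploration phase, and shrinking it gives the gradual shift toward Q-Table guidance.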

#### Updating Q-Values
The formula for updating the Q-Table after each action is as follows:
> $Q[state, action] = Q[state, action] + \alpha \cdot (reward + \gamma \cdot \max(Q[newState, :]) - Q[state, action])$

- $\alpha$ stands for the **Learning Rate**

- $\gamma$ stands for the **Discount Factor**

**Learning Rate (α):**

- α is a constant that controls how much a Q-Table changes with each update.
- High α: Big changes, quick learning. Low α: Small changes, gradual learning.
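- Changing α affects how quickly, and how smoothly, the Q-Table values settle. It does not decide how often the agent explores; that depends on how often it takes random actions.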

**Discount Factor (γ):**

- γ balances focus on present and future rewards.
- High γ: Values future rewards. Low γ: Values immediate rewards.
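
The update formula can be written almost word for word in NumPy. This is a small sketch, not a full training loop; the function name `update_q` and the numbers in the usage lines are chosen only for illustration.

```python
import numpy as np

def update_q(q_table, state, action, reward, new_state, alpha, gamma):
    """Apply one Q-Learning update in place and return the table."""
    # Best value reachable from the new state, discounted by gamma.
    target = reward + gamma * np.max(q_table[new_state, :])
    # Move the old estimate toward the target by a fraction alpha.
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table

# Hypothetical 2-state, 2-action table.
q_table = np.zeros((2, 2))
q_table[1, 0] = 1.0

update_q(q_table, state=0, action=1, reward=1.0, new_state=1, alpha=0.5, gamma=0.9)
print(q_table[0, 1])
```

In this example the target is 1.0 + 0.9 × 1.0 = 1.9, and with α = 0.5 the entry moves halfway from 0 toward it, giving 0.95.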

## Q-Learning Example
For this example we will use the Q-Learning algorithm to train an agent to navigate a popular environment from the [OpenAI Gym](https://gym.openai.com/). The OpenAI Gym was developed so programmers could practice machine learning using unique environments. Interesting fact: Elon Musk is one of the founders of OpenAI!

Let's start by looking at the OpenAI Gym environment we will use, Frozen Lake.

This is how the environment looks:

    S F F F       (S: starting point, safe)
    F H F H       (F: frozen surface, safe)
    F F F H       (H: hole, stuck forever)
    H F F G       (G: goal, safe)

- It's actually easy to find several correct solutions by hand, for example: RIGHT → RIGHT → DOWN → DOWN → DOWN → RIGHT
- The agent needs to reach the goal in the minimum number of actions. In this example, the minimum number of actions to complete the game is 6. We need to remember this fact to check whether our agent has really mastered ❄️Frozen Lake.
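
The two claims above can be checked without any library by treating the map as a plain grid. This sketch assumes movement is deterministic (each action moves the agent exactly one tile, with no slipping) and that bumping into the edge leaves the agent in place; the helper names `follow` and `shortest_path_length` are made up for this check.

```python
from collections import deque

GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
MOVES = {"LEFT": (0, -1), "DOWN": (1, 0), "RIGHT": (0, 1), "UP": (-1, 0)}

def in_bounds(row, col):
    return 0 <= row < len(GRID) and 0 <= col < len(GRID[0])

def follow(path):
    """Walk a list of moves from S and return the tile the agent ends on."""
    row, col = 0, 0
    for move in path:
        d_row, d_col = MOVES[move]
        if in_bounds(row + d_row, col + d_col):
            row, col = row + d_row, col + d_col
        if GRID[row][col] == "H":      # fell into a hole: stuck forever
            return "H"
    return GRID[row][col]

def shortest_path_length():
    """Breadth-first search from S to G that never steps on a hole."""
    queue, seen = deque([((0, 0), 0)]), {(0, 0)}
    while queue:
        (row, col), steps = queue.popleft()
        if GRID[row][col] == "G":
            return steps
        for d_row, d_col in MOVES.values():
            nxt = (row + d_row, col + d_col)
            if in_bounds(*nxt) and nxt not in seen and GRID[nxt[0]][nxt[1]] != "H":
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None

path = ["RIGHT", "RIGHT", "DOWN", "DOWN", "DOWN", "RIGHT"]
print(follow(path), shortest_path_length())
```

Under these assumptions the hand-written path ends on `G`, and the search confirms that no route shorter than 6 moves exists.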

**We will code our Q-Learning example in the next lesson.**