# Intro

The plan is to have a player blob (blue) that tries to reach the food blob (green) as quickly as possible while avoiding the enemy blob (red). Now, we could make this super smooth and high-definition, but we already know we're going to be breaking it down into discrete observation spaces anyway. Instead, let's just start in a discrete space. Something between a 10x10 and a 20x20 grid should suffice. Note that the larger you go, the larger your Q-table will be, both in the memory it takes up and in the time it takes for the model to actually learn. So, our environment will be a 10 x 10 grid, where we have 1 player, 1 enemy, and 1 food. For now, only the player will be able to move, in an attempt to reach the food, which yields a reward.

## Explanation
### 1. Hyperparameters and Constants
__Grid and Episodes:__

SIZE: Defines the size of the grid environment as 10x10.
The Q-table grows quickly with this value: a 10x10 Q-table, for example, is ~15MB in this case, while a 20x20 one is ~195MB.
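To see why the table grows so fast, it helps to count the states. Assuming, as in the Q-table initialization section, that an observation is two (dx, dy) offsets whose components each range from -(SIZE-1) to SIZE-1, there are (2·SIZE-1)^4 states with four Q-values each. This sketch counts table entries only; the megabyte figures above also depend on how the table is stored.

```python
def q_table_entries(size: int, n_actions: int = 4) -> tuple[int, int]:
    """Return (number of states, number of Q-values) for a size x size grid."""
    values_per_axis = 2 * size - 1     # offsets range from -(size-1) to size-1
    n_states = values_per_axis ** 4    # (dx_food, dy_food, dx_enemy, dy_enemy)
    return n_states, n_states * n_actions

for size in (10, 20):
    states, q_values = q_table_entries(size)
    print(f"{size}x{size}: {states:,} states, {q_values:,} Q-values")
# 10x10: 130,321 states, 521,284 Q-values
# 20x20: 2,313,441 states, 9,253,764 Q-values
```

Doubling the grid side multiplies the state count by roughly 17.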

HM_EPISODES: The total number of episodes (iterations) for which the agent will be trained.

__Rewards and Penalties:__

MOVE_PENALTY: The penalty (negative reward) for each move made by the player.

ENEMY_PENALTY: The penalty for the player colliding with the enemy.

FOOD_REWARD: The reward for the player reaching the food.
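One way these three constants could combine into a per-step reward is sketched below. The numeric values are assumed examples, not necessarily the notebook's, and checking the enemy first is a design choice: if the food and enemy ever share a cell, the collision wins.

```python
# Assumed example values; the notebook's actual constants may differ.
MOVE_PENALTY = 1
ENEMY_PENALTY = 300
FOOD_REWARD = 25

def step_reward(player_pos, food_pos, enemy_pos):
    """Reward for the step that just moved the player to player_pos."""
    if player_pos == enemy_pos:
        return -ENEMY_PENALTY   # collided with the enemy
    if player_pos == food_pos:
        return FOOD_REWARD      # reached the food
    return -MOVE_PENALTY        # every other move costs a little

print(step_reward((3, 4), (3, 4), (0, 0)))  # 25
```

The small per-move penalty is what pushes the agent toward reaching the food in as few moves as possible.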

__Exploration-Exploitation Parameters:__

epsilon: Initial probability of choosing a random action (exploration).

EPS_DECAY: Factor by which epsilon decays after each episode, reducing exploration over time.
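Because the decay is multiplicative, epsilon after n episodes is simply epsilon · EPS_DECAY^n. A minimal sketch, with assumed starting and decay values:

```python
EPS_DECAY = 0.9998  # assumed example decay factor
epsilon = 0.9       # assumed starting exploration probability

history = []
for episode in range(25_000):
    # ... one episode would be played here using the current epsilon ...
    history.append(epsilon)
    epsilon *= EPS_DECAY  # shrink exploration once per episode

# Closed form for the value used in episode n: 0.9 * EPS_DECAY ** n
print(round(history[0], 4), round(history[10_000], 4), round(epsilon, 4))
```

A decay factor this close to 1 keeps the agent exploring for thousands of episodes before it settles into mostly exploiting.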

__Display Control:__

SHOW_EVERY: Controls how often (in terms of episodes) the environment is visually displayed.

__Q-Learning Parameters:__

start_q_table: A filename to load a pre-trained Q-table or None to start fresh.

LEARNING_RATE: Determines how much newly acquired information overrides old information.

DISCOUNT: Discount factor for future rewards.
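LEARNING_RATE and DISCOUNT meet in the standard tabular Q-learning update, new_q = (1 - LEARNING_RATE) · current_q + LEARNING_RATE · (reward + DISCOUNT · max_future_q). A sketch with assumed example values for both parameters:

```python
import numpy as np

LEARNING_RATE = 0.1  # assumed example value
DISCOUNT = 0.95      # assumed example value

def updated_q(current_q: float, reward: float, next_q_values) -> float:
    """Standard Q-learning update for one (state, action) pair."""
    max_future_q = float(np.max(next_q_values))  # best value available from the new state
    target = reward + DISCOUNT * max_future_q    # one-step bootstrapped estimate
    # LEARNING_RATE sets how much the new estimate overrides the old value
    return (1 - LEARNING_RATE) * current_q + LEARNING_RATE * target

new_q = updated_q(-2.0, -1.0, [-1.0, -3.0, 0.5, -2.0])
print(round(new_q, 4))  # -1.8525
```

In the training loop, `current_q` would be the Q-value of the action just taken in the old observation, and `next_q_values` the four Q-values of the new observation.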

__Identifiers and Colors:__

PLAYER_N, FOOD_N, ENEMY_N: Numeric identifiers for the player, food, and enemy in the environment.

d: A dictionary mapping these identifiers to RGB color values for visualization.
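To show how the identifiers and the color dictionary could be used, here is a sketch that paints the three blobs onto a small image array. The identifier values and colors are assumptions, and the `render_grid` helper is hypothetical.

```python
import numpy as np

SIZE = 10
PLAYER_N, FOOD_N, ENEMY_N = 1, 2, 3  # assumed identifier values
# Assumed colors as (R, G, B): blue player, green food, red enemy
d = {PLAYER_N: (0, 0, 255), FOOD_N: (0, 255, 0), ENEMY_N: (255, 0, 0)}

def render_grid(player_xy, food_xy, enemy_xy, size=SIZE):
    """Hypothetical helper: return a size x size x 3 uint8 image of the grid."""
    env = np.zeros((size, size, 3), dtype=np.uint8)  # black background
    # The player is drawn last, so it stays visible if it overlaps another blob
    for (x, y), ident in ((food_xy, FOOD_N), (enemy_xy, ENEMY_N), (player_xy, PLAYER_N)):
        env[y, x] = d[ident]  # row index is y, column index is x
    return env

env = render_grid((0, 0), (1, 2), (3, 4))
```

To display it, the array can be turned into an image with PIL's `Image.fromarray(env)` and enlarged. If it is handed to OpenCV instead, keep in mind that OpenCV interprets the channels as BGR, so either convert the array or store the color tuples in BGR order.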

### 2. Blob class
__Blob Class:__

Represents an entity (player, food, or enemy) on the grid.

__Constructor (`__init__`):__

Initializes the blob at a random position within the grid.

__`__str__` Method:__

Returns a string representation of the blob's coordinates, useful for debugging.

__`__sub__` Method:__

Defines the subtraction operation between two blobs, returning their relative offset as a tuple (dx, dy) = (self.x - other.x, self.y - other.y).

__`action` Method:__

Takes an action (0-3) that moves the blob diagonally in one of four directions.

__`move` Method:__

Moves the blob based on provided x and y values or randomly if not provided.
Ensures the blob remains within grid boundaries.
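The class described above can be sketched as follows. The mapping of actions 0 to 3 onto diagonal directions is an assumption, and the sketch tests `x is None` rather than `not x` so that an explicit step of 0 is not mistaken for "no value given".

```python
import numpy as np

SIZE = 10  # grid size, assumed to match the SIZE constant

class Blob:
    """A player, food, or enemy on a SIZE x SIZE grid (sketch)."""

    def __init__(self, size: int = SIZE):
        self.size = size
        self.x = int(np.random.randint(0, size))  # random starting column
        self.y = int(np.random.randint(0, size))  # random starting row

    def __str__(self):
        return f"{self.x}, {self.y}"

    def __sub__(self, other):
        # Relative offset (dx, dy) of this blob with respect to `other`
        return (self.x - other.x, self.y - other.y)

    def action(self, choice: int):
        # Four diagonal moves; this particular 0-3 mapping is assumed
        moves = {0: (1, 1), 1: (-1, -1), 2: (-1, 1), 3: (1, -1)}
        self.move(*moves[choice])

    def move(self, x=None, y=None):
        # Use the given step, or a random step in {-1, 0, 1} for a missing axis
        self.x += int(np.random.randint(-1, 2)) if x is None else x
        self.y += int(np.random.randint(-1, 2)) if y is None else y
        # Clamp to the grid so the blob can never leave it
        self.x = min(max(self.x, 0), self.size - 1)
        self.y = min(max(self.y, 0), self.size - 1)
```

Calling `move()` with no arguments gives the random wandering that the food or enemy could use later, while the player moves only through `action()`.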
### 3. Q-table initialization
__Q-Table:__
The Q-table is a dictionary that maps observations (states) to a list of Q-values corresponding to each possible action.

__Initialization:__
If start_q_table is None, the code initializes the Q-table with random values for all possible states.
Each state is a tuple of two offsets, (player - food, player - enemy), where each offset is itself an (x, y) pair whose components range from -(SIZE-1) to SIZE-1. Each entry in the table holds four Q-values, one for each possible action.

__Loading a Pre-trained Q-Table:__
If start_q_table is not None, it loads an existing Q-table from a file using pickle.
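Both branches can be sketched in a few lines. The initial range of -5 to 0 for the random Q-values is an assumption, and the file name in the comment is hypothetical.

```python
import pickle
from itertools import product

import numpy as np

SIZE = 10
start_q_table = None  # or a path such as "qtable.pickle" (hypothetical name)

if start_q_table is None:
    offsets = range(-SIZE + 1, SIZE)  # every possible dx or dy value
    q_table = {
        # key: ((dx_food, dy_food), (dx_enemy, dy_enemy)); value: one Q-value per action
        ((x1, y1), (x2, y2)): np.random.uniform(-5, 0, size=4).tolist()
        for x1, y1, x2, y2 in product(offsets, repeat=4)
    }
else:
    with open(start_q_table, "rb") as f:
        q_table = pickle.load(f)  # a dict saved earlier with pickle.dump
```

Saving works the same way in reverse: `pickle.dump(q_table, f)` on a file opened in `"wb"` mode.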

### 4. Main training loop
At the start of each episode, the player, food, and enemy are initialized as Blob objects at random positions on the grid.
Every SHOW_EVERY episodes, the code sets show to True and prints the current episode number and the average reward over the last SHOW_EVERY episodes; in all other episodes, show is False.
This way the environment is only rendered at intervals, which lets us observe the agent's behavior without drawing every single episode.
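The outer loop can be sketched as below. `run_episode` is a hypothetical callback standing in for the per-episode work (creating fresh Blobs, stepping, updating the Q-table) and is assumed to return that episode's total reward.

```python
import numpy as np

def train(hm_episodes, show_every, run_episode):
    """Skeleton of the training loop; run_episode(show) is a hypothetical callback."""
    episode_rewards = []
    shown = []
    for episode in range(hm_episodes):
        show = episode % show_every == 0  # render only every SHOW_EVERY episodes
        if show:
            recent = episode_rewards[-show_every:]
            mean = np.mean(recent) if recent else float("nan")  # nothing to average yet at episode 0
            print(f"on #{episode}, mean reward of last {show_every} episodes: {mean}")
            shown.append(episode)
        episode_rewards.append(run_episode(show))
    return episode_rewards, shown

# Usage with a dummy episode that always returns -1
rewards, shown = train(3000, 1000, lambda show: -1.0)
```

Keeping the full `episode_rewards` list also makes it easy to plot a moving average with matplotlib after training.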

### 5. Episode execution
__Observations and Actions:__

obs: The current state, represented by the relative positions of the player to the food and to the enemy.

The agent selects an action using an epsilon-greedy strategy: with probability epsilon, it takes a random action (exploration); otherwise, it chooses the action with the highest Q-value for the current state (exploitation).
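A minimal sketch of the epsilon-greedy choice, assuming the Q-table is a dict from observation tuples to four Q-values. The `choose_action` name is hypothetical.

```python
import numpy as np

def choose_action(q_table, obs, epsilon, n_actions=4, rng=None):
    """Epsilon-greedy action selection (sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(0, n_actions))  # explore: uniformly random action
    return int(np.argmax(q_table[obs]))         # exploit: best known action for obs

q_table = {((0, 0), (1, 1)): [0.0, 2.0, -1.0, 0.5]}
print(choose_action(q_table, ((0, 0), (1, 1)), epsilon=0.0))  # 1
```

The returned integer is what would then be passed to `player.action(action)`.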

__Action Execution:__

The chosen action is executed by calling player.action(action), which moves the player on the grid.

# Requirements

```python
import numpy as np
from PIL import Image  # for creating the visual environment
import cv2  # for showing the environment live
import matplotlib.pyplot as plt
import pickle  # to save/load Q-tables
from matplotlib import style  # to make prettier charts
import time  # used to keep track of our saved Q-tables
```