# Intro to the project purpose
The purpose of this work is to `conduct a reinforcement learning (RL) project comparing tabular and function approximation methods covered up to Chapter 10`.   

## We are going to explore:
- Exploration strategies.
- Learning rate impacts.
- Reward structures. 

# Intro to the Environment
In this project we use the Car Racing environment from [Gymnasium](https://gymnasium.farama.org/environments/box2d/car_racing/).   
## Box2D
Before explaining the environment, we need to introduce Box2D.   
[Box2D](https://github.com/erincatto/box2d) is a physics engine for 2D games.   
It makes the game world interactive and keeps its behavior close to real-world physics (see the [Box2D docs](https://box2d.org/documentation/)).   
Car Racing uses Box2D as its physics engine.
## Car Racing
>Note: this section is summarized from the [Gymnasium Car Racing documentation](https://gymnasium.farama.org/environments/box2d/car_racing/).

Car Racing is a `top-down racing environment` rendered as pixels that generates a random track every episode.   
The goal is for the car to complete the whole track as efficiently as possible without failing.   
The environment can generate an RGB buffer to view the world, along with indicators that show the `true speed, ABS sensors, steering wheel position, and gyroscope`. Those indicators are shown for viewing purposes only; they are not part of the reward and are not provided as separate state variables. The observation is solely the pixels of the environment.   
### Action Space
Car Racing offers 2 types of action space: continuous and discrete.
#### Continuous
- 0: steering [-1,1], -1 for full left, 1 for full right
- 1: gas [0,1]
- 2: braking [0,1]
#### Discrete
- 0: do nothing
- 1: left
- 2: right
- 3: gas
- 4: brake

We cover only the discrete action space, as it is simpler and tabular methods cannot handle a continuous action space.
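
To illustrate why a finite action set suits tabular methods, here is a minimal sketch of one table row per state with one value per discrete action. The names `Action`, `q_table`, and `greedy_action` are our own illustrative choices, and the string state key is only a placeholder, since the real observation is an image.

```python
from collections import defaultdict
from enum import IntEnum


class Action(IntEnum):
    """The five discrete actions listed above."""
    NOTHING = 0
    LEFT = 1
    RIGHT = 2
    GAS = 3
    BRAKE = 4


# One row of action values per state; unseen states start at zero.
# The state key is a placeholder (assumption), not a real observation.
q_table = defaultdict(lambda: [0.0] * len(Action))


def greedy_action(state):
    """Return the action with the highest stored value for this state."""
    row = q_table[state]
    best_index = max(range(len(row)), key=row.__getitem__)
    return Action(best_index)


q_table["some_state"][Action.GAS] = 1.5
print(greedy_action("some_state").name)  # GAS
```

With a continuous action space there would be no finite row to store, which is why the table only works with the discrete variant.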
### Observation Space
The observation space represents the environment as seen by a top-down camera.
It is an image of the car and the race track.
The space is a **96x96 RGB image**, which gives 96x96x3 = 27,648 values per observation.
### Reward Structure
The default reward returned by the environment's `step` function is:
- -0.1 for each frame.
- $\frac{1000}{N}$ for every track tile visited, where N is the total number of tiles in the track.
- -100 when the car dies (car death is explained below).
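
As a worked example of the three reward terms above, the sketch below adds them up for a whole episode. The function name `episode_return` and the numbers used (732 frames, 300 tiles) are hypothetical, chosen only to show the arithmetic.

```python
def episode_return(frames, tiles_visited, total_tiles, went_out_of_bounds=False):
    """Sum the reward terms listed above over one episode."""
    reward = -0.1 * frames                           # time penalty per frame
    reward += (1000.0 / total_tiles) * tiles_visited  # 1000/N per visited tile
    if went_out_of_bounds:
        reward -= 100.0                              # penalty when the car dies
    return reward


# Hypothetical episode: every tile of a 300-tile track visited in 732 frames.
print(round(episode_return(732, 300, 300), 1))  # 926.8
```

Visiting every tile always contributes 1000 in total, so a faster lap only improves the return through the smaller frame penalty.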
### Episode Termination
The episode terminates when either:
- All tiles are visited.
- The car goes out of bounds of the playfield, and therefore dies.
### Environment Arguments
- `lap_complete_percent=0.95`: the fraction of tiles that must be visited for the lap to count as completed.
- `domain_randomize=False`: set to `True` to randomize the colors of the track and the field.
- `continuous=True`: selects the continuous (`True`) or discrete (`False`) action space.

# Literature Review

## The problem of large state size
Through initial research (with ChatGPT) and discussions with our instructor, we found that Car Racing only produces pixels as observations, and that this is huge (`27,648` values per observation) for tabular and function approximation methods. Therefore, before proceeding with the project, we must find a way to reduce the state size or handle it differently.
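
To get a feel for the scale, the sketch below estimates how many distinct observations a table would need to index if raw pixels were used as the state. It assumes each of the 27,648 values is an 8-bit integer (256 possible levels); the variable names are ours.

```python
import math

HEIGHT, WIDTH, CHANNELS = 96, 96, 3
LEVELS_PER_VALUE = 256  # assumption: 8-bit color channels

# Number of values in one flattened observation.
flat_size = HEIGHT * WIDTH * CHANNELS

# The number of distinct observations is LEVELS_PER_VALUE ** flat_size.
# That integer is too large to print comfortably, so count its decimal
# digits with logarithms instead of building the number itself.
state_count_digits = math.floor(flat_size * math.log10(LEVELS_PER_VALUE)) + 1

print(flat_size)           # 27648
print(state_count_digits)  # number of decimal digits in the state count
```

A state count with tens of thousands of digits cannot be stored in any table, which is why the raw pixels have to be reduced before tabular methods can be applied.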
## A blog for dealing with Car Racing
We walked through a blog written by an engineer who refers to himself as [Mike](https://notanymike.github.io/Solving-CarRacing/).
The blog covers how to train a Car Racing agent with Proximal Policy Optimization (PPO). PPO is a deep RL method, but his findings are nevertheless insightful for our case.
### Discrete Actions Case
Car Racing's discrete action space lets the agent perform only one action at a time (do nothing, turn left, turn right, gas, brake). This is simple, but it creates a new problem when dealing with track curves.
In a real scenario, a driver brakes AND turns at the same time. This might affect performance in our case if we do not handle it.
The author handled this with soft discrete actions: a larger set of actions that perform hard or soft turns, giving full or partial values for turning combined with accelerating or braking.
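
The idea of soft discrete actions can be sketched as a lookup from a discrete index to a continuous `(steering, gas, brake)` triple. The action names and the numeric values below are our own illustrative assumptions, not the ones used in the blog, and the sketch assumes the environment is run with the continuous action space so that the triples can be passed to it.

```python
# name -> (steering, gas, brake), using the continuous ranges described earlier:
# steering in [-1, 1], gas in [0, 1], brake in [0, 1].
# All values here are hypothetical, chosen only for illustration.
SOFT_ACTIONS = {
    "coast": (0.0, 0.0, 0.0),
    "gas": (0.0, 1.0, 0.0),
    "brake": (0.0, 0.0, 1.0),
    "hard_left": (-1.0, 0.0, 0.0),
    "hard_right": (1.0, 0.0, 0.0),
    "soft_left_gas": (-0.5, 0.5, 0.0),
    "soft_right_gas": (0.5, 0.5, 0.0),
    "soft_left_brake": (-0.5, 0.0, 0.5),
    "soft_right_brake": (0.5, 0.0, 0.5),
}
ACTION_NAMES = list(SOFT_ACTIONS)


def to_continuous(index):
    """Translate a discrete action index into a continuous action triple."""
    return SOFT_ACTIONS[ACTION_NAMES[index]]


print(len(ACTION_NAMES))                              # 9
print(to_continuous(ACTION_NAMES.index("soft_left_brake")))  # (-0.5, 0.0, 0.5)
```

The agent still chooses from a finite set, so tabular methods remain applicable, while combined actions such as braking into a turn become available.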
### Simplifying the observation space
The author simplified the observation space by:
1. Removing the bottom panel that shows stats for the human observer. It is useless to the agent, so removing it avoids wasted training.