# Useful Resources
* [Manual of the game](https://www.gamesdatabase.org/Media/SYSTEM/Atari_2600/Manual/formated/Freeway_-_1981_-_Zellers.pdf)

# 1. Description 

## 1.1. The problem addressed

- The nature of your environment

- What are the terminal states

- How is the reward function defined

- All parameters employed in your methods (discount factor, step size, etc.)

## 1.2. The MDP formulation

- How the problem was modeled

- Implementation specifics and restrictions

## 1.3. The discretization model adopted

# 2. Implementation

## 2.1. Setup

### Imports

In [1]:
# Install the dependencies:
#!pip install gym
#!pip install gym[atari]

In [2]:
import sys
sys.path.append('../')  # Enable importing from `src` folder

In [3]:
import gym
import time
import src.agents as agents
import src.environment as environment
import src.utils as utils

### Environment

We use the OpenAI Gym framework in this study.

In [4]:
env, initial_state = environment.get_env()

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

Action Space: Discrete(3)
Observation Space: Box(0, 255, (128,), uint8)


The agent in this game has three possible actions:

* 0: Stay
* 1: Move forward
* 2: Move back

Each observation is the 128 bytes of the Atari 2600's RAM, with every byte taking a value from 0 to 255.
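
One way to work with the 128-byte RAM observation is to keep only a few bytes and use them as a compact, hashable state. The sketch below is only an illustration under stated assumptions: `CHICKEN_Y_INDEX`, `CAR_X_INDICES`, and `reduce_state` are hypothetical names, the byte indices are placeholders rather than verified Freeway RAM addresses, and a synthetic 128-byte vector stands in for a real observation.

```python
from typing import Sequence, Tuple

RAM_SIZE = 128  # matches the Box(0, 255, (128,), uint8) observation space

# ASSUMPTION: placeholder indices for illustration only, not verified addresses.
CHICKEN_Y_INDEX = 14
CAR_X_INDICES = (108, 109, 110)


def reduce_state(ram: Sequence[int], bucket: int = 16) -> Tuple[int, ...]:
    """Keep a few RAM bytes and coarsen the car bytes into buckets."""
    if len(ram) != RAM_SIZE:
        raise ValueError(f"expected {RAM_SIZE} bytes, got {len(ram)}")
    chicken_y = int(ram[CHICKEN_Y_INDEX])
    # Integer division groups nearby values into the same bucket.
    cars = tuple(int(ram[i]) // bucket for i in CAR_X_INDICES)
    return (chicken_y,) + cars


# Synthetic observation standing in for what the environment returns.
fake_ram = bytes(range(RAM_SIZE))
state = reduce_state(fake_ram)
print(state)
```

The resulting tuple can serve as a dictionary key in a tabular method; a larger `bucket` gives fewer distinct states at the cost of coarser position information.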

### Baseline

As a simple baseline, we use an agent that always moves **up**.

In [5]:
scores = environment.run(agents.Baseline, render=False, n_runs=5)
scores

Score #0: 23
Score #1: 21
Score #2: 21
Score #3: 23
Score #4: 21


[23, 21, 21, 23, 21]

In [6]:
# Mean score
print("Mean score:", sum(scores) / len(scores))

Mean score: 21.8


The baseline usually scores between 21 and 23 points, as shown in the images below:

![Baseline 1](./img/baseline_1.png)
![Baseline 2](./img/baseline_2.png)
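
The baseline policy described above can be sketched in a few lines. This is not the code in `src.agents` or `src.environment`, which is not shown here: `AlwaysUpAgent`, `StubEnv`, and `run_episode` are hypothetical names, the toy environment is not Freeway (it has no cars), and the four-value `step` return is an assumption about the environment interface.

```python
from typing import List, Tuple

UP = 1  # action index for "move forward", per the action list above


class AlwaysUpAgent:
    """Hypothetical stand-in for a baseline agent: ignores the state."""

    def act(self, state) -> int:
        return UP


class StubEnv:
    """Toy environment (not Freeway): rewards 1 after every `lane_count` UP steps."""

    def __init__(self, lane_count: int = 10, max_steps: int = 100):
        self.lane_count = lane_count
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.steps += 1
        if action == UP:
            self.position += 1
        reward = 0.0
        if self.position == self.lane_count:  # crossed the road
            reward = 1.0
            self.position = 0
        done = self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, agent) -> float:
    """Play one episode and return the accumulated reward."""
    state = env.reset()
    done = False
    score = 0.0
    while not done:
        state, reward, done, _ = env.step(agent.act(state))
        score += reward
    return score


scores: List[float] = [run_episode(StubEnv(), AlwaysUpAgent()) for _ in range(5)]
print("Mean score:", sum(scores) / len(scores))
```

Because the toy environment is deterministic, every run gives the same score; the variation between 21 and 23 in the real game comes from collisions with cars, which this sketch does not model.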

## 2.2. Monte Carlo Control

## 2.3. Q-learning (or a variation such as Double Q-learning)

## 2.4. SARSA ($λ$)

## 2.5. Linear function approximator

# 3. Evaluation

The system must be evaluated according to the quality of the solutions found, with a critical analysis of the relationship between the adopted parameters and solution performance. Graphs and tables showing the evolution of the solutions are expected. Additional comparisons with the literature are welcome but not mandatory.

## 3.1. Computational cost

## 3.2. Optimality

## 3.3. Influence of reward function

## 3.4. State and action space sizes

# 4. Discussion

## 4.1. The advantages and disadvantages of bootstrapping in your problem

## 4.2. How did the reward function influence the quality of the solution? Was your group able to achieve the expected policy given the reward function defined?

## 4.3. How did function approximation influence the results? What were the advantages and disadvantages of using it in your problem?