<a href="https://colab.research.google.com/github/AlmeidaAlin3/FreewayGame/blob/main/aline.almeida/a_freeway.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Useful Resources
* [Game manual](https://www.gamesdatabase.org/Media/SYSTEM/Atari_2600/Manual/formated/Freeway_-_1981_-_Zellers.pdf)
* [Freeway.asm source code](http://bjars.com/source/Freeway.asm)

# 1. Description 

## 1.1. The problem addressed

- The nature of your environment

- What are the terminal states

- How is the reward function defined

- All parameters employed in your methods (discount factor, step size, etc.)

## 1.2. The MDP formulation

- How the problem was modeled

- Implementation specifics and restrictions

## 1.3. The discretization model adopted

# 2. Implementation

## 2.1. Setup

### Libraries


In [1]:
# Install the dependencies:
#!pip install gym
#!pip install gym[atari]

In [2]:
import sys
sys.path.append('../')  # Enable importing from `src` folder

In [3]:
import gym
import time
#import src.agents as agents
#import src.environment as environment
#import src.utils as utils

In [4]:
def convert_score(hex_score: int) -> int:
    """Convert the score from the binary-coded decimal (BCD) representation used in RAM to a base-10 integer."""
    return (hex_score // 16) * 10 + (hex_score % 16)
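Atari 2600 games commonly store scores in binary-coded decimal (BCD): each 4-bit nibble of a byte holds one decimal digit, so the byte `0x23` (35 in base 10) means a score of 23. The decoder above relies on exactly this. A minimal sketch, assuming the score always fits in one byte (0 to 99), pairing it with a hypothetical inverse `encode_score` helper that is handy for testing:

```python
def convert_score(hex_score: int) -> int:
    """Decode a one-byte BCD value: high nibble = tens digit, low nibble = units digit."""
    return (hex_score // 16) * 10 + (hex_score % 16)


def encode_score(score: int) -> int:
    """Hypothetical inverse helper (not in the notebook): encode 0..99 as one BCD byte."""
    if not 0 <= score <= 99:
        raise ValueError("a single BCD byte only holds 0..99")
    return (score // 10) * 16 + (score % 10)


# 0x23 is 35 in base 10, but read as BCD it is the score 23.
print(0x23, convert_score(0x23))  # prints: 35 23

# Round trip over every score that fits in one byte.
assert all(convert_score(encode_score(s)) == s for s in range(100))
```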

In [5]:
# TODO: We might need to implement another method to "train" the agent.
from abc import ABC
from abc import abstractmethod


class Agent(ABC):
    """
    Abstract class to implement agents.
    It requires an `__init__` method to set the required parameters (such
    as epsilon) and an `act` method that implements the policy of the agent.
    """
    @abstractmethod
    def __init__(self, **params):
        pass
    
    @abstractmethod
    def act(self, ob, reward, game_over):
        pass

    
class Baseline(Agent):
    """The Baseline agent always moves up, regardless of the reward received."""
    def __init__(self):
        pass
    
    def act(self, ob, reward, game_over):
        return 1  # Always move up!

    
if __name__ == '__main__':
    print('Testing agents.py...')
    agent = Baseline()
    print('All good!')

Testing agents.py...
All good!
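The `Agent` docstring mentions parameters such as epsilon being set in `__init__`. A minimal sketch of how a parameterized agent could fit this interface, assuming a hypothetical `NoisyBaseline` that follows the baseline rule but explores at random with probability `epsilon`. All constructor arguments have defaults so that `run`, which calls the agent class with no arguments, can still instantiate it:

```python
import random
from abc import ABC, abstractmethod


class Agent(ABC):
    """Same interface as the notebook's abstract Agent."""
    @abstractmethod
    def __init__(self, **params):
        pass

    @abstractmethod
    def act(self, ob, reward, game_over):
        pass


class NoisyBaseline(Agent):
    """Hypothetical agent: moves up, but with probability `epsilon` picks a random action."""
    def __init__(self, epsilon: float = 0.1, n_actions: int = 3, seed=None):
        self.epsilon = epsilon
        self.n_actions = n_actions
        self.rng = random.Random(seed)  # private RNG, so a run can be reproduced from `seed`

    def act(self, ob, reward, game_over):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore: 0, 1 or 2
        return 1  # otherwise follow the baseline rule: move up


agent = NoisyBaseline(epsilon=0.0)
print(agent.act(None, 0, False))  # prints: 1
```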


In [6]:
import gym
#import src.agents as agents
#import src.utils as utils

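def get_env():
    """Create the Freeway environment with RAM observations and return it with the initial observation."""
    env = gym.make('Freeway-ram-v0')
    state = env.reset()

    return (env, state)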

In [7]:
from typing import Type


def run(agent_cls: Type[Agent], render: bool = False, n_runs: int = 1, verbose: bool = True):
    """Play `n_runs` episodes, each with a fresh environment and agent, and return the final scores."""
    scores = []  # Final Player 1 score of each run

    for i in range(n_runs):
        env, initial_state = get_env()
        agent = agent_cls()

        game_over = False
        action = agent.act(initial_state, 0, False)

        while not game_over:
            if render:
                time.sleep(0.05)
                env.render()

            # We won't use the fourth returned value, the `info` dictionary.
            ob, reward, game_over, _ = env.step(action)
            action = agent.act(ob, reward, game_over)
            # input()  # Workaround: wait for next action

        # Small hack: RAM byte 103 contains the Player 1 score (BCD-encoded).
        player_score = convert_score(ob[103])

        if verbose:
            print(f"Score #{i}: {player_score}")

        scores.append(player_score)

        # TODO: Big concern: depending on the seed being used, we achieve different results.
        # Using the Baseline agent, we sometimes get 21 or 23 (even 24) points, depending on the run.
        # We need to find a way to set a fixed value for the seed.
        env.close()

    return scores
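Regarding the seed TODO in `run`: in gym releases of that period, the `-v0` Atari environments such as `Freeway-ram-v0` use sticky actions (the previous action is repeated with probability 0.25) and a frame skip sampled between 2 and 4 steps, both driven by the environment's own RNG. This likely explains why the Baseline agent scores differently across runs. A minimal sketch of a seeding helper, assuming the pre-0.26 gym API where `env.seed(seed)` exists and is called before `env.reset()`; `seed_everything` is a hypothetical name:

```python
import random

import numpy as np


def seed_everything(seed: int, env=None) -> None:
    """Seed Python's and NumPy's global RNGs and, optionally, a gym environment."""
    random.seed(seed)
    np.random.seed(seed)
    if env is not None:
        # Assumes the old gym API (< 0.26): seeds the emulator, which drives
        # sticky actions and the random frame skip. Call before `env.reset()`.
        env.seed(seed)
        env.action_space.seed(seed)


seed_everything(42)
first = (random.random(), float(np.random.rand()))
seed_everything(42)
second = (random.random(), float(np.random.rand()))
print(first == second)  # prints: True
```

In `get_env`, this would mean calling `seed_everything(seed, env)` right after `gym.make(...)`. Whether a fixed seed removes all run-to-run variance for a given gym version is worth verifying empirically; the `-v4` variants, which disable sticky actions, are another option to compare against.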

### Colab graphical requirements 

In [8]:
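import numpy as np
import pandas as pd
import shutil
import os

# Headless rendering on Colab: OpenGL bindings, a virtual X server (Xvfb),
# pyglet, and ffmpeg to encode the videos recorded by the Monitor.
!apt-get install python-opengl -y
!apt install xvfb -y
!pip install pyvirtualdisplay
!pip install https://github.com/pyglet/pyglet/archive/pyglet-1.5-maintenance.zip
!apt-get install ffmpeg -y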

from pyvirtualdisplay import Display
import gym
from gym import wrappers
from gym import envs
import matplotlib.pyplot as plt

Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-opengl is already the newest version (3.1.0+dfsg-1).
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
xvfb is already the newest version (2:1.19.6-1ubuntu4.8).
0 upgraded, 0 newly installed, 0 to remove and 14 not upgraded.
Collecting https://github.com/pyglet/pyglet/archive/pyglet-1.5-maintenance.zip
  Using cached https://github.com/pyglet/pyglet/archive/pyglet-1.5-maintenance.zip
Building wheels for collected packages: pyglet
  Building wheel for pyglet (setup.py) ... done
  Created wheel for pyglet: filename=pyglet-1.5.11-cp36-none-any.whl size=1088881 sha256=92b65f636bcbaa61b8df0a8ed91e9c960b5d0875c3660a54c3e67bcb04d3d5a2
  Stored in directory: /root/.cache/pip/wheels/cc/bf/c6/31ad24e254cf2ffea48e575a12344c295076167cba5e4a208e
Successfully built pyglet
Readin

### Environment

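We use the OpenAI Gym framework in this study. The environment is `Freeway-ram-v0`, whose observations are the 128 bytes of the Atari 2600 RAM rather than screen images.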

In [9]:
env, initial_state = get_env()

print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

Action Space: Discrete(3)
Observation Space: Box(0, 255, (128,), uint8)


The agent in this game has three possible actions:

* 0: Stay (`NOOP`)
* 1: Move forward (`UP`)
* 2: Move back (`DOWN`)
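Raw integers such as the `return 1` in `Baseline.act` are easy to misread. A small sketch, assuming a hypothetical `FreewayAction` enum whose names mirror the list above; because `IntEnum` members are integers, they can be passed to `env.step` unchanged. With the gym versions used here, `env.unwrapped.get_action_meanings()` should report the emulator's own names for the same three actions.

```python
from enum import IntEnum


class FreewayAction(IntEnum):
    """Hypothetical named constants for Freeway's three discrete actions."""
    STAY = 0     # NOOP
    FORWARD = 1  # UP: towards the far side of the highway
    BACK = 2     # DOWN


# IntEnum members compare equal to plain ints, so `env.step(FreewayAction.FORWARD)`
# behaves exactly like `env.step(1)`.
print(FreewayAction.FORWARD == 1)  # prints: True
print(len(FreewayAction))          # prints: 3
```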

TODO: Talk a bit about the observation space of 128 bytes of RAM...
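The size of the raw observation space follows directly from `Box(0, 255, (128,), uint8)`: each of the 128 bytes takes 256 values, giving $256^{128} = 2^{1024}$ possible observations, far too many for a tabular method. This is why a discretization (Section 1.3) matters. A minimal sketch, assuming NumPy, of a hypothetical `discretize` helper that keeps only a few selected bytes and buckets each one; the indices used in the demo are arbitrary placeholders, and which bytes are informative would need to be checked against Freeway.asm:

```python
import numpy as np

RAM_SIZE = 128  # length of the RAM observation

# Each byte takes 256 values, so the raw observation space has 256**128 = 2**1024 elements.
raw_space_size = 256 ** RAM_SIZE
print(raw_space_size == 2 ** 1024)  # prints: True
print(len(str(raw_space_size)))     # prints: 309 (decimal digits)


def discretize(ram: np.ndarray, indices, n_bins: int = 16) -> tuple:
    """Hypothetical discretizer: keep the bytes at `indices` and map each to one of `n_bins` buckets."""
    selected = ram[list(indices)].astype(np.int64)  # widen first to avoid uint8 overflow
    buckets = selected * n_bins // 256              # 0..255 -> 0..n_bins-1
    return tuple(int(b) for b in buckets)


# Demo on a fake RAM vector; indices 0, 1 and 2 are placeholders, not real game variables.
ram = np.zeros(RAM_SIZE, dtype=np.uint8)
ram[:3] = [0, 100, 255]
state = discretize(ram, [0, 1, 2], n_bins=4)
print(state)  # prints: (0, 1, 3)

# Keeping k bytes with n_bins buckets each leaves n_bins**k discrete states.
n_states = 4 ** len(state)
print(n_states)  # prints: 64
```

The returned tuple is hashable, so it can be used directly as a key in a dictionary-based Q-table.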

### Baseline

As a simple baseline, we use an agent that always moves **up**.

In [11]:
scores = run(Baseline, render=False, n_runs=5)
scores

Score #0: 23
Score #1: 21
Score #2: 21
Score #3: 21
Score #4: 24


[23, 21, 21, 21, 24]

In [12]:
# Mean score
print("Mean score:", sum(scores) / len(scores))

Mean score: 22.0
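With only five runs and scores ranging from 21 to 24, the mean alone hides the run-to-run variation. A small sketch using Python's standard `statistics` module to report the spread as well, applied to the Baseline scores listed above; the sample standard deviation here is $\sqrt{8/4} = \sqrt{2} \approx 1.414$:

```python
import statistics

scores = [23, 21, 21, 21, 24]  # Baseline scores from the five runs above

mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation (n - 1 in the denominator)
print("Mean score:", float(mean))        # prints: Mean score: 22.0
print("Std. deviation:", round(std, 3))  # prints: Std. deviation: 1.414
```

Reporting mean and standard deviation over the same number of runs makes later comparisons between agents fairer.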


### Game rendering

In [26]:
display = Display(visible=0, size=(600, 600))
display.start()

env, _ = get_env()
monitor_dir = os.getcwd()
# Record episodes whose id is a multiple of 1000 (here, only episode 0).
env = wrappers.Monitor(env, monitor_dir, video_callable=lambda ep_id: ep_id % 1000 == 0, force=True)
# Reset through the Monitor so recording starts, and keep this episode's initial observation.
initial_state = env.reset()

# Choose the agent:
agent = Baseline()

game_over = False
action = agent.act(initial_state, 0, False)

while not game_over:
    ob, reward, game_over, _ = env.step(action)
    action = agent.act(ob, reward, game_over)
env.close()


from IPython.display import HTML
from base64 import b64encode

# Embed the first recorded .mp4 file in the notebook as a base64 data URL.
video = sorted(v for v in os.listdir(monitor_dir) if v.endswith('.mp4'))
print(len(video))
with open(os.path.join(monitor_dir, video[0]), 'rb') as f:
    vid_1 = f.read()
data_url_1 = "data:video/mp4;base64," + b64encode(vid_1).decode()
HTML("""
<video width=500 height=500 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url_1)

1


## 2.2. Monte Carlo Control
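Monte Carlo control learns action values from complete episodes: once an episode ends, the return $G_t = R_{t+1} + \gamma G_{t+1}$ is computed backwards and used as the target for $Q(S_t, A_t)$, and the policy is then made $\epsilon$-greedy with respect to $Q$. A generic textbook sketch (every-visit, constant step size), not the implementation of this study; the dictionary-based `Q`, the function names, and the default $\gamma$, $\alpha$ and $\epsilon$ are illustrative assumptions:

```python
import random
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """G_t for every step, computed backwards: G_t = r_{t+1} + gamma * G_{t+1}.

    `rewards[t]` is the reward received after the action taken at step t.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def mc_update(Q, episode, gamma=0.99, alpha=0.1):
    """Every-visit, constant-alpha update; `episode` is a list of (state, action, reward)."""
    rewards = [r for _, _, r in episode]
    for (s, a, _), g in zip(episode, discounted_returns(rewards, gamma)):
        Q[(s, a)] += alpha * (g - Q[(s, a)])


def epsilon_greedy(Q, state, n_actions=3, epsilon=0.1):
    """Policy improvement: act greedily w.r.t. Q, but at random with probability epsilon."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])


# Illustrative values only, not tuned for Freeway.
Q = defaultdict(float)
mc_update(Q, [("s0", 1, 0.0), ("s1", 1, 1.0)], gamma=0.5, alpha=0.5)
print(Q[("s0", 1)], Q[("s1", 1)])  # prints: 0.25 0.5
```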

## 2.3. Q-learning (or a variation such as Double Q-learning)
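Q-learning is an off-policy TD method: after each step it moves $Q(s, a)$ towards $r + \gamma \max_{a'} Q(s', a')$, regardless of which action the behaviour policy takes next. A generic tabular sketch of that single update, assuming a dictionary-based `Q` keyed by (state, action) and hypothetical default values for $\alpha$ and $\gamma$; it is not the implementation of this study:

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, done, n_actions=3, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    # A terminal transition has no bootstrapped future value.
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Illustrative values only: the target is 1 + 0.5 * 2 = 2, and Q moves halfway from 0 to it.
Q = defaultdict(float)
Q[("s1", 1)] = 2.0
print(q_learning_update(Q, "s0", 1, 1.0, "s1", False, alpha=0.5, gamma=0.5))  # prints: 1.0
```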

## 2.4. SARSA($\lambda$)
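SARSA($\lambda$) combines the on-policy TD error $\delta = r + \gamma Q(s', a') - Q(s, a)$ with eligibility traces, so each error also updates recently visited pairs in proportion to their decaying trace. A generic tabular sketch with accumulating traces, assuming dictionary-based `Q` and `E` (the traces must be cleared at the start of every episode) and hypothetical default parameters; it is not the implementation of this study:

```python
from collections import defaultdict


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.99, lam=0.9):
    """One step of tabular SARSA(lambda) with accumulating eligibility traces."""
    q_next = 0.0 if done else Q[(s_next, a_next)]
    delta = r + gamma * q_next - Q[(s, a)]
    E[(s, a)] += 1.0  # accumulate the trace of the pair just visited
    for key in list(E):
        Q[key] += alpha * delta * E[key]  # every traced pair shares the TD error
        E[key] *= gamma * lam             # then all traces decay
    return delta


# Illustrative two-step episode: the final reward also reaches the earlier pair ("s0", 1).
Q, E = defaultdict(float), defaultdict(float)
sarsa_lambda_step(Q, E, "s0", 1, 0.0, "s1", 1, False, alpha=0.5, gamma=1.0, lam=0.5)
sarsa_lambda_step(Q, E, "s1", 1, 1.0, "s2", 1, True, alpha=0.5, gamma=1.0, lam=0.5)
print(Q[("s0", 1)], Q[("s1", 1)])  # prints: 0.25 0.5
```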

## 2.5. Linear function approximator
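With a linear approximator, $\hat{q}(s, a, \mathbf{w}) = \mathbf{w}_a^\top \boldsymbol{\phi}(s)$, so the gradient with respect to $\mathbf{w}_a$ is just the feature vector $\boldsymbol{\phi}(s)$. A generic semi-gradient Q-learning sketch, assuming NumPy, one weight vector per action, and a hypothetical `LinearQ` class; the features could be, for instance, the RAM bytes scaled to $[0, 1]$ (`ob / 255.0`), which is an assumption rather than a choice made in this study:

```python
import numpy as np


class LinearQ:
    """Hypothetical linear approximator: Q(s, a) = w[a] . phi(s), one weight vector per action."""

    def __init__(self, n_features, n_actions=3, alpha=0.01, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))
        self.alpha = alpha  # illustrative defaults, not tuned for Freeway
        self.gamma = gamma

    def q_values(self, phi):
        return self.w @ phi  # one value per action

    def update(self, phi, a, r, phi_next, done):
        """Semi-gradient Q-learning: the gradient of w[a] . phi with respect to w[a] is phi."""
        target = r if done else r + self.gamma * np.max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[a]
        self.w[a] += self.alpha * td_error * phi
        return td_error


q = LinearQ(n_features=2, alpha=0.5, gamma=0.5)
phi = np.array([1.0, 0.0])
print(q.update(phi, 1, 1.0, phi, True))  # prints: 1.0
print(q.w[1])                            # weights of action 1 are now [0.5, 0.0]
```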

# 3. Evaluation

The system must be evaluated according to the quality of the solutions found, and a critical evaluation of the relationship between the adopted parameters and solution performance is expected. Graphs and tables representing the evolution of the solutions are expected. Additional comparisons with the literature are welcome, although they are not mandatory.

## 3.1. Computational cost

## 3.2. Optimality

## 3.3. Influence of reward function

## 3.4. State and action space sizes

# 4. Discussion

## 4.1. The advantages and disadvantages of bootstrapping in your problem

## 4.2. How did the reward function influence the quality of the solution? Was your group able to achieve the expected policy given the reward function defined?

## 4.3. How did function approximation influence the results? What were the advantages and disadvantages of using it in your problem?