<a href="https://colab.research.google.com/github/DionisiusMayr/FreewayGame/blob/main/aline.almeida/a_freeway.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Useful Resources
* [Manual of the game](https://www.gamesdatabase.org/Media/SYSTEM/Atari_2600/Manual/formated/Freeway_-_1981_-_Zellers.pdf)
* [Freeway.asm source code](http://bjars.com/source/Freeway.asm) 

# 1. Description 

## 1.1. The problem addressed

- The nature of your environment

- What are the terminal states

- How is the reward function defined

- All parameters employed in your methods (discount factor, step size, etc.)

## 1.2. The MDP formulation

- How the problem was modeled

- Implementation specifics and restrictions

## 1.3. The discretization model adopted

```
      14  # Chicken Y
    , 16  # Chicken Lane Collide
    , 18  # Chicken Collision flag (with the bottom car)
    , 22  # Car X Direction
    , 23, 24, 25, 26, 27, 28, 29, 30, 31, 32  # Z Car Patterns
    , 33, 34, 35, 36, 37, 38, 39, 40, 41, 42  # Car Motion Timers
    , 43, 44, 45, 46, 47, 48, 49, 50, 51, 52  # Car Motions
    , 87, 88  # Car Shape Ptr
    # TODO: test if this makes any difference
    , 89, 90  # Chicken Shape Ptr
    # TODO: test if this makes any difference
    , 106, 107  # Chicken Sounds
    , 108, 109, 110, 111, 112, 113, 114, 115, 116, 117  # Car X Coords
```
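
As an illustration of how such a list of RAM indices could be used, the sketch below reduces a 128-byte observation to a small hashable state. It assumes the observation is indexable by RAM address; `RAM_MASK` and `mask_state` are hypothetical names, and the mask shown is only a subset of the indices listed above.

```python
from typing import Sequence, Tuple

# Hypothetical subset of the indices above: chicken Y, lane collision, car X coords.
RAM_MASK = [14, 16] + list(range(108, 118))


def mask_state(ram: Sequence[int], mask: Sequence[int] = RAM_MASK) -> Tuple[int, ...]:
    """Keep only the selected RAM bytes and return them as a hashable tuple."""
    return tuple(int(ram[i]) for i in mask)


# Fake 128-byte observation standing in for the emulator RAM.
fake_ram = list(range(128))
state = mask_state(fake_ram)
print(len(state), state[:2])  # 12 (14, 16)
```

A tuple is used so the reduced state can serve directly as a dictionary key in a tabular method.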

# 2. Implementation

## 2.1. Setup

### Initialization


In [1]:
## Install the dependencies:

#!pip install gym
#!pip install gym[atari]

In [2]:
## Enable importing from "src" folder

import sys
sys.path.append('../')  

In [3]:
## import the libraries

import gym
import time
#import src.agents as agents
#import src.environment as environment
#import src.utils as utils

In [4]:
## Convert hex score values to int 

def convert_score(hex_score: int) -> int:
    """Convert the score from the hex representation used in memory to base 10."""
    return (hex_score // 16) * 10 + (hex_score % 16)
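
A short worked example may make the conversion clearer. The function treats each hex digit of the byte as one decimal digit (a binary-coded-decimal style layout), so the byte `0x21` is read as the score 21 rather than 33. The function is repeated here only to keep the sketch self-contained.

```python
def convert_score(hex_score: int) -> int:
    """Read each hex digit of the byte as one decimal digit."""
    return (hex_score // 16) * 10 + (hex_score % 16)


print(convert_score(0x21))  # 21
print(convert_score(33))    # 21, since 33 == 0x21
```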

In [5]:
## Baseline agent

from abc import ABC
from abc import abstractmethod

class Agent(ABC):
    """
    Abstract class to implement agents.
    It requires an `__init__` method to set the required parameters (such
    as epsilon) and an `act` method that implements the policy of the agent.
    """
    @abstractmethod
    def __init__(self, **params):
        pass
    
    @abstractmethod
    def act(self, ob, reward, game_over):
        pass

    
class Baseline(Agent):
    """The Baseline agent always moves up, regardless of the reward received."""
    def __init__(self):
        pass
    
    def act(self, ob, reward, game_over):
        return 1  # Always move up!

    
if __name__ == '__main__':
    print('Testing agents.py...')
    agent = Baseline()
    print('All good!')

Testing agents.py...
All good!


In [6]:
## The Freeway environment

import gym
#import src.agents as agents
#import src.utils as utils

def get_env():
    env = gym.make('Freeway-ram-v0')
    state = env.reset()
    
    return (env, state)

In [7]:
## Run the episodes and return the scores

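def run(agent_cls: type, render: bool = False, n_runs: int = 1, verbose: bool = True):
    """Play `n_runs` episodes with a fresh `agent_cls()` each time and return the scores."""
    scores = []  # Final score of each run

    for i in range(n_runs):
        env, initial_state = get_env()
        agent = agent_cls()  # `agent_cls` is an Agent subclass, not an instance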

        game_over = False
        action = agent.act(initial_state, 0, False)

        while not game_over:
            if render:
                time.sleep(0.05)
                env.render()
                
            ob, reward, game_over, _ = env.step(action)
            action = agent.act(ob, reward, game_over)

        player_score = convert_score(ob[103])  # Byte 103 holds the Player 1 score
        if verbose:
            print(f"Score #{i}: {player_score}")

        scores.append(player_score)
        env.close()
        
    return scores

### Colab graphical requirements 

In [8]:
import numpy as np 
import pandas as pd 
import shutil
import os

!apt-get install python-opengl -y
!apt install xvfb -y
!pip install pyvirtualdisplay
!pip install https://github.com/pyglet/pyglet/archive/pyglet-1.5-maintenance.zip
!apt-get install ffmpeg -y

from pyvirtualdisplay import Display
import gym
from gym import wrappers
from gym import envs
import matplotlib.pyplot as plt


### Environment

In [9]:
## init the environment

env, initial_state = get_env()
print("Action Space:", env.action_space)
print("Observation Space:", env.observation_space)

Action Space: Discrete(3)
Observation Space: Box(0, 255, (128,), uint8)


The agent in this game has three possible actions:

* 0: Stay
* 1: Move forward
* 2: Move back
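
To avoid magic numbers in agent code, the three actions could be given names. This is a minimal sketch; the `Action` enum is a hypothetical helper and is not part of `gym`.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the three discrete actions listed above."""
    STAY = 0
    FORWARD = 1
    BACK = 2


# IntEnum members compare equal to plain ints, so they can be passed to env.step().
print(Action.FORWARD == 1)  # True
```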

### Baseline agent


As a simple baseline, we use an agent that always moves **up**.

In [10]:
## Run the baseline agent

scores = run(Baseline, render=False, n_runs=5)
scores

Score #0: 21
Score #1: 23
Score #2: 21
Score #3: 21
Score #4: 21


[21, 23, 21, 21, 21]

In [11]:
## Mean score of the baseline agent
print("Mean score:", sum(scores) / len(scores))

Mean score: 21.4
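
Since the scores vary between runs, it may help to report the spread together with the mean. A minimal sketch using the standard library on the five baseline scores shown above:

```python
import statistics

# Baseline scores from the five runs above.
scores = [21, 23, 21, 21, 21]

mean_score = statistics.mean(scores)
spread = statistics.stdev(scores)  # Sample standard deviation
print(f"mean={mean_score:.1f} stdev={spread:.2f}")
```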


### Game rendering

In [12]:
display = Display(visible=0, size=(1000, 1000))
display.start()

env, _ = get_env()
monitor_dir = os.getcwd()
env = wrappers.Monitor(env, monitor_dir, video_callable=lambda ep_id: ep_id % 1000 == 0, force=True)
initial_state = env.reset()  # Keep the initial state of the wrapped env for the first action

#Choose the agent:
agent = Baseline()

game_over = False
action = agent.act(initial_state, 0, False)

while not game_over:
    ob, reward, game_over, _ = env.step(action)
    action = agent.act(ob, reward, game_over)
env.close()


from IPython.display import HTML
from base64 import b64encode

# Collect the videos written by the Monitor wrapper and embed the first one
video = sorted(v for v in os.listdir('./') if v.endswith('.mp4'))
print(len(video))
with open(video[0], 'rb') as f:
    vid_1 = f.read()
data_url_1 = "data:video/mp4;base64," + b64encode(vid_1).decode()
HTML("""
<video width=400 height=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url_1)

1


## 2.2. Monte Carlo Control

In [14]:
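# Run one Monte Carlo episode and update the policy
from typing import List  # Needed for the `List[int]` annotation below

def MonteCarloES(RAM_mask: List[int], render: bool=False):
    # Uses the global `env` and `agent`; `episode` is expected to come from the project's source modules
    epi = episode.generate_episode(env, agent, RAM_mask=RAM_mask, render=render)
    return agent.update_policy(epi)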

In [13]:
## Getting the info we care about directly from the RAM
RAM_mask = [
      14  # Chicken Y
    , 16  # Chicken Lane Collide
    , 108, 109, 110, 111, 112, 113, 114, 115, 116, 117  # Car X Coords
]


## initialize the environment
env, initial_state = get_env()  # `get_env` is defined earlier in this notebook


## Monte Carlo agent
agent = MonteCarloControl(gamma=0.95, available_actions=2, N0=0.5)


## Time a single episode
%time MonteCarloES(RAM_mask=RAM_mask, render=False)

#list of scores and rewards
scores = []
total_rewards = []

%%time
n_runs = 1000

for i in range(n_runs):
    render = i % 201 == 200

    score, total_reward = MonteCarloES(RAM_mask=RAM_mask, render=render)

    scores.append(score)
    total_rewards.append(total_reward)

    print(f"Run [{i:3}] - Total reward: {total_reward:7.2f} Mean score: {sum(scores) / len(scores):.2f} Mean score (last 10): {sum(scores[-10:]) / len(scores[-10:]):5.2f} Score: {score:2}")
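
The agent above is created with `gamma=0.95`. As a hedged illustration of what the discount factor does in a Monte Carlo method, the sketch below computes the discounted return of every step of one episode. It is not the project's `MonteCarloControl` implementation; `discounted_returns` is a hypothetical helper, and `rewards[t]` is assumed to be the reward received after step `t`.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.95) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

In a Monte Carlo control method, returns like these are typically averaged per state-action pair to estimate the action values.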

## 2.3. Q-learning (or some variation like Double Q-learning)

## 2.4. SARSA ($\lambda$)

## 2.5. Linear function approximator

# 3. Evaluation

The system must be evaluated on the quality of the solutions found, with a critical analysis of how the adopted parameters affect solution performance. Graphs and tables showing the evolution of the solutions are expected. Comparisons with the literature are welcome but not mandatory.
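
One simple way to show the evolution of the solutions is to smooth the per-episode scores before plotting them. The sketch below is only an illustration; `moving_average` is a hypothetical helper, and the window size is an arbitrary choice.

```python
from typing import List


def moving_average(values: List[float], window: int = 10) -> List[float]:
    """Mean of the last `window` values at each position (shorter at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # Drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


print(moving_average([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

The resulting list has the same length as the input, so it can be plotted against the episode index, for example with `matplotlib.pyplot.plot`.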

## 3.1. Computational cost

## 3.2. Optimality

## 3.3. Influence of reward function

## 3.4. State and action space sizes

# 4. Discussion

## 4.1. The advantages and disadvantages of bootstrapping in your problem

## 4.2. How did the reward function influence the quality of the solution? Was your group able to achieve the expected policy given the reward function defined?

## 4.3. How did function approximation influence the results? What were the advantages and disadvantages of using it in your problem?