# Test Reinforcement Learning in PyBoy (Tetris)

## Imports

### Installations

In [2]:
!pip install pyboy



In [4]:
!pip install 'stable-baselines3[extra]'


Tetris ROM: https://www.emulatorgames.net/roms/gameboy-color/tetris/

PyBoy GitHub: https://github.com/Baekalfen/PyBoy

Stable-Baselines3 documentation: https://stable-baselines3.readthedocs.io/en/master/

### Imports

In [1]:
# Emulation
from pyboy import PyBoy
from pyboy.utils import WindowEvent
import time
import numpy as np

# CNN
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import Sequential, layers, optimizers
from tensorflow.keras.callbacks import EarlyStopping

# RL
import gymnasium as gym # RL environment API (maintained fork of OpenAI Gym)
from stable_baselines3 import PPO # RL model
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy



## PyBoy setup

### Tips

#### About PyBoy

Installation: https://github.com/Baekalfen/PyBoy/wiki/Installation

PyBoy API Documentation: https://docs.pyboy.dk/index.html

<pre>Inputs:

| GameBoy | Keyboard  | Value for     |
|         |           | pyboy button  |
| ------- | --------- | ------------- |
| Up      | Up        | 'up'          |
| Down    | Down      | 'down'        |
| Left    | Left      | 'left'        |
| Right   | Right     | 'right'       |
| A       | A         | 'a'           |
| B       | B         | 'b'           |
| Start   | Return    | 'start'       |
| Select  | Backspace | 'select'      |</pre>

<pre>Other keyboard inputs:
-Esc : Quit
-D : Debug
-Space : Toggle unlimited FPS
-Z : Save state
-X : Load state
-I : Toggle screen recording (as gif)
-, : Rewind backwards
-. : Rewind forward</pre>

<pre>PyBoy inputs:
-pyboy.button('a') : press 'A' for 1 frame; it is released after the next 'pyboy.tick()'
-pyboy.button('a', 3) : press 'A' for 3 frames; it is released after 3 'pyboy.tick()' calls
-pyboy.button_press('a') : hold 'A' until 'pyboy.button_release('a')' (or a release event sent with 'pyboy.send_input()')</pre>

<pre>Other:
-pyboy.tick() : advance the game by one frame
-pyboy.tick(X, True) : advance the game X frames, rendering only the last frame
-pyboy.tick(X, False) : advance the game X frames without rendering any frame
-pyboy.set_emulation_speed(0) : set the emulation speed (as a multiple of real time); 0 means no speed limit
-pyboy.screen.image : current screen image (PIL Image)</pre>

#### About Tetris

##### In PyBoy

<pre>Get Tetris info:
-tetris = pyboy.game_wrapper
-tetris.score : score
-tetris.level : level
-tetris.lines : number of cleared lines
-tetris.next_tetromino() : type of the next tetromino (O, Z, S, L, J, T, I)
-tetris.game_area() : current game tiles as an array

Printing 'tetris' shows:
-Score, level and lines
-Sprites on the current screen: IDs, positions, shapes, tile IDs, and whether they are on screen
-A representation of the current game screen</pre>

<pre>Tile values (in 'tetris.game_area()' or when printing 'tetris', with 'tetris.game_area_mapping(tetris.mapping_compressed, 0)'):
0 : empty tile
1 : tile occupied by a 'J' tetromino
2 : tile occupied by a 'Z' tetromino
3 : tile occupied by an 'O' tetromino
4 : tile occupied by an 'L' tetromino
5 : tile occupied by a 'T' tetromino
6 : tile occupied by an 'S' tetromino
7 : tile occupied by an 'I' tetromino
8 : tile occupied by the Game Over wall</pre>
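These tile values make the board easy to inspect with NumPy. A minimal sketch, assuming the board comes from `tetris.game_area()` with the compressed mapping used in the cells below; `TILE_NAMES`, `count_tiles` and `is_game_over` are hypothetical helpers, and the 18 × 10 board in the example is an assumption for illustration.

```python
import numpy as np

# Tile values from the table above (compressed mapping)
TILE_NAMES = {0: 'empty', 1: 'J', 2: 'Z', 3: 'O', 4: 'L', 5: 'T', 6: 'S', 7: 'I', 8: 'Game Over wall'}
GAME_OVER_TILE = 8


def count_tiles(game_area):
    """Count how many cells of each kind are on the board, keyed by tile name."""
    values, counts = np.unique(np.asarray(game_area), return_counts=True)
    return {TILE_NAMES.get(int(v), f'unknown ({int(v)})'): int(c) for v, c in zip(values, counts)}


def is_game_over(game_area):
    """True as soon as any Game Over wall tile (8) is on the board."""
    return bool(np.any(np.asarray(game_area) == GAME_OVER_TILE))


# Example: an empty 18 x 10 board with an 'I' tetromino lying on the bottom row
board = np.zeros((18, 10), dtype=np.int8)
board[-1, :4] = 7
print(count_tiles(board))   # {'empty': 176, 'I': 4}
print(is_game_over(board))  # False
```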

##### Tetris rules

https://harddrop.com/wiki/Tetris_(Game_Boy)
<pre>
Base scoring:
-slow drop (piece falls on its own) = +0 pt
-soft drop (hold the Down arrow)    = +1 pt per row dropped
-1 line                             = +40 pts
-2 lines                            = +100 pts
-3 lines                            = +300 pts
-4 lines                            = +1200 pts

10 lines = 1 level

Line-clear score = base score * (level + 1)   (the multiplier is 1 at level 0)

The goal of the game is to reach the 100th line.
The ending screen is shown after losing if at least 100 lines were reached.
The ending depends on the game type, score, and lines:
> In A-Type
  -100 000 to 149 999 points: small missile launch
  -150 000 to 199 999 points: medium-sized missile launch
  -200 000 or more points: big missile launch
> In B-Type
  -25 lines on level 9: Russian musicians and dancers
  -25 lines on level 9 + height 5: Russian musicians and dancers + Buran shuttle launch</pre>
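The scoring rules reduce to a small formula: `base score * (level + 1)` also covers level 0, where the multiplier is 1. A minimal sketch with hypothetical helper names, assuming a game started at level 0 (the level rule for higher start levels is not covered here):

```python
# Base points per line clear (Game Boy Tetris, table above)
LINE_CLEAR_POINTS = {0: 0, 1: 40, 2: 100, 3: 300, 4: 1200}


def line_clear_score(lines_cleared, level):
    """Points for clearing 'lines_cleared' lines at once: base points * (level + 1)."""
    return LINE_CLEAR_POINTS[lines_cleared] * (level + 1)


def soft_drop_score(rows_dropped):
    """Soft drop (holding Down): +1 point per row dropped, no level multiplier."""
    return rows_dropped


def level_from_lines(lines):
    """Current level for a game started at level 0: one level every 10 lines."""
    return lines // 10


print(line_clear_score(4, 0))  # 1200: a Tetris at level 0
print(line_clear_score(1, 9))  # 400: a single at level 9
print(level_from_lines(100))   # 10
```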

### Start emulation

In [2]:
rom_path = 'ROMs/GameBoy/Tetris.gb'

#### Tetris emulation tutorial

https://github.com/Baekalfen/PyBoy/wiki/Example-Tetris

>1.Start the game (skip the title screen and go directly in game) with a fixed seed, so the tetrominos are always the same\
2.Take a screenshot and start recording a GIF\
3.Play for 1000 frames\
4.Auto-play the first tetromino (move it to the right)\
5.Take another screenshot and stop recording\
6.Close the game

In [33]:
pyboy = PyBoy(rom_path)

pyboy.set_emulation_speed(0) # No speed limit
assert pyboy.cartridge_title == "TETRIS"

tetris = pyboy.game_wrapper
tetris.game_area_mapping(tetris.mapping_compressed, 0)
tetris.start_game(timer_div=0x00) # The timer_div works like a random seed in Tetris
pyboy.tick() # To render screen after `.start_game`

pyboy.screen.image.save("Tetris1.png")

pyboy.send_input(WindowEvent.SCREEN_RECORDING_TOGGLE)

tetromino_at_0x00 = tetris.next_tetromino()
assert tetromino_at_0x00 == "Z", tetris.next_tetromino()
assert tetris.score == 0
assert tetris.level == 0
assert tetris.lines == 0

# Checking that a reset on the same `timer_div` results in the same Tetromino
tetris.reset_game(timer_div=0x00)
assert tetris.next_tetromino() == tetromino_at_0x00, tetris.next_tetromino()

blank_tile = 0
first_brick = False

for frame in range(1000):
    pyboy.tick(1, True)

    # The playing "technique" is just to move the Tetromino to the right.
    if frame % 2 == 0: # Even frames to let PyBoy release the button on odd frames
        pyboy.button("right")

    # Illustrating how we can extract the game board quite simply. This can be used to read the tile identifiers.
    game_area = tetris.game_area()
    # game_area is accessed as [<row>, <column>].
    # 'game_area[-1,:]' is asking for all (:) the columns in the last row (-1)
    if not first_brick and any(filter(lambda x: x != blank_tile, game_area[-1, :])):
        first_brick = True
        print("First brick touched the bottom!")
        print(tetris)

# Final game board:
print(tetris)

pyboy.screen.image.save("Tetris2.png")
pyboy.send_input(WindowEvent.SCREEN_RECORDING_TOGGLE)

# We shouldn't have made any progress with the moves we made
assert tetris.score == 0
assert tetris.level == 0
assert tetris.lines == 0

# Assert there is something on the bottom of the game area
assert any(filter(lambda x: x != blank_tile, game_area[-1, :]))

tetris.reset_game(timer_div=0x00)
assert tetris.next_tetromino() == tetromino_at_0x00, tetris.next_tetromino()
# After resetting, we should have a clean game area
# ('game_area' still holds the old board, so read it again before checking)
game_area = tetris.game_area()
assert all(tile == blank_tile for tile in game_area[-1, :])

tetris.reset_game(timer_div=0x55) # The timer_div works like a random seed in Tetris
assert tetris.next_tetromino() != tetromino_at_0x00, tetris.next_tetromino()

# Testing that it defaults to random Tetrominos
selection = set()
for _ in range(10):
    tetris.reset_game()
    selection.add(tetris.next_tetromino())
assert len(selection) > 1 # If it's random, we will see more than one kind

pyboy.stop()

First brick touched the bottom!
Tetris:
Score: 0
Level: 0
Lines: 0
Sprites on screen:
Sprite [4]: Position: (72, 128), Shape: (8, 8), Tiles: (Tile: 129), On screen: True
Sprite [5]: Position: (80, 128), Shape: (8, 8), Tiles: (Tile: 129), On screen: True
Sprite [6]: Position: (88, 128), Shape: (8, 8), Tiles: (Tile: 129), On screen: True
Sprite [7]: Position: (88, 136), Shape: (8, 8), Tiles: (Tile: 129), On screen: True
Sprite [8]: Position: (120, 112), Shape: (8, 8), Tiles: (Tile: 130), On screen: True
Sprite [9]: Position: (128, 112), Shape: (8, 8), Tiles: (Tile: 130), On screen: True
Sprite [10]: Position: (128, 120), Shape: (8, 8), Tiles: (Tile: 130), On screen: True
Sprite [11]: Position: (136, 120), Shape: (8, 8), Tiles: (Tile: 130), On screen: True
Tiles on screen:
       0   1   2   3   4   5   6   7   8   9
____________________________________________
0  |   0   0   0   0   0   0   0   0   0   0
1  |   0   0   0   0   0   0   0   0   0   0
2  |   0   0   0   0   0   0   0   0   

#### Custom functions

`print()` with some colors (ANSI escape codes):

In [3]:
def cstr(s):
    if s == ' 0 ': # empty
        return cstr_with_arg(s=s, fg_color='white', bold=False)
    elif s == ' 1 ': # J
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='bright red', bold=True)
    elif s == ' 2 ': # Z
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='bright blue', bold=True)
    elif s == ' 3 ': # O
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='bright green', bold=True)
    elif s == ' 4 ': # L
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='magenta', bold=True)
    elif s == ' 5 ': # T
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='yellow', bold=True)
    elif s == ' 6 ': # S
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='cyan', bold=True)
    elif s == ' 7 ': # I
        return cstr_with_arg(s=s, fg_color='pure white', bg_color='grey', bold=True)
    elif s == ' 8 ': # Game Over wall
        return cstr_with_arg(s=s, fg_color='black', bg_color='pure red', bold=True)

    return cstr_with_arg(s=s, fg_color='black', bold=False)

In [4]:
def get_color_id(color, is_fg):
    # ANSI color codes : https://en.wikipedia.org/wiki/ANSI_escape_code#Colors
    # RGB values : https://g.co/kgs/K5ciwD1

    if isinstance(color, str):
        color = color.lower()

    if color == 'black':
        return 30 if is_fg else 40
    elif color == 'red':
        return 31 if is_fg else 41
    elif color == 'green':
        return 32 if is_fg else 42
    elif color == 'yellow':
        return 33 if is_fg else 43
    elif color == 'blue':
        return 34 if is_fg else 44
    elif color == 'magenta':
        return 35 if is_fg else 45
    elif color == 'cyan':
        return 36 if is_fg else 46
    elif color == 'white':
        return 37 if is_fg else 47
    elif color == 'gray' or color == 'grey':
        return 90 if is_fg else 100
    elif color == 'bright red':
        return 91 if is_fg else 101
    elif color == 'bright green':
        return 92 if is_fg else 102
    elif color == 'bright yellow':
        return 93 if is_fg else 103
    elif color == 'bright blue':
        return 94 if is_fg else 104
    elif color == 'bright magenta':
        return 95 if is_fg else 105
    elif color == 'bright cyan':
        return 96 if is_fg else 106
    elif color == 'bright white':
        return 97 if is_fg else 107
    elif color == 'pure white':
        return '38;2;255;255;255' if is_fg else '48;2;255;255;255'
    elif color == 'pure red':
        return '38;2;255;0;0' if is_fg else '48;2;255;0;0'
    elif color == 'pure green':
        return '38;2;64;192;64' if is_fg else '48;2;64;192;64'
    elif color == 'pure blue':
        return '38;2;0;0;255' if is_fg else '48;2;0;0;255'

    return 30 if is_fg else 40 # black by default

In [5]:
def cstr_with_arg(s, fg_color, bold, bg_color=None):
    fg_color_id = get_color_id(fg_color, True)
    bg_color_id = get_color_id(bg_color, False)
    bold_id = 1 if bold else 0

    color = f'{fg_color_id};{bg_color_id}' if bg_color is not None else fg_color_id

    return f"\x1b[{bold_id}m\x1b[{color}m{s}\x1b[0m"

In [6]:
# TEST
print(f"     |   |   |   |   |   |   |   |Game|Not\
      \nempty| J | Z | O | L | T | S | I |Over|used\
      \n  {' '.join([cstr(f' {n} ') for n in range(0, 10)])}")

     |   |   |   |   |   |   |   |Game|Not      
empty| J | Z | O | L | T | S | I |Over|used      
  [0m[37m 0 [0m [1m[38;2;255;255;255;101m 1 [0m [1m[38;2;255;255;255;104m 2 [0m [1m[38;2;255;255;255;102m 3 [0m [1m[38;2;255;255;255;45m 4 [0m [1m[38;2;255;255;255;43m 5 [0m [1m[38;2;255;255;255;46m 6 [0m [1m[38;2;255;255;255;100m 7 [0m [1m[30;48;2;255;0;0m 8 [0m [0m[30m 9 [0m


Custom print of the game area:

In [7]:
def better_game_area(game_area, with_indexes=True):
    colored_game_area = ''
    if with_indexes:
        colored_game_area = '      0  1  2  3  4  5  6  7  8  9\n-----------------------------------\n'

    for row_index, row in enumerate(game_area):
        if with_indexes:
            colored_game_area += f"{row_index:02d} | "
        for tile in row:
            colored_game_area += cstr(f' {tile} ')
        colored_game_area += '\n'

    return colored_game_area

Get the tetromino form from its tile ID (and the ID from the form):

In [8]:
def get_tetromino_form(tetromino_id):
    if tetromino_id == 0:
        return 'empty'
    if tetromino_id == 1:
        return 'J'
    if tetromino_id == 2:
        return 'Z'
    if tetromino_id == 3:
        return 'O'
    if tetromino_id == 4:
        return 'L'
    if tetromino_id == 5:
        return 'T'
    if tetromino_id == 6:
        return 'S'
    if tetromino_id == 7:
        return 'I'
    if tetromino_id == 8:
        return 'Game Over Wall'
    return None

In [9]:
def get_tetromino_id(tetromino_form):
    if tetromino_form == 'empty':
        return 0
    if tetromino_form == 'J':
        return 1
    if tetromino_form == 'Z':
        return 2
    if tetromino_form == 'O':
        return 3
    if tetromino_form == 'L':
        return 4
    if tetromino_form == 'T':
        return 5
    if tetromino_form == 'S':
        return 6
    if tetromino_form == 'I':
        return 7
    if tetromino_form == 'Game Over Wall':
        return 8
    return -1

In-game functions:

In [13]:
def reset_values():
    reward = 0
    start_time = time.time()
    play_time = 0
    tetromino = tetris.next_tetromino()
    last_second = 0
    time_without_scoring = 0
    no_scoring_timer_decr = no_scoring_timer
    last_lines = 0
    last_levels = 0
    last_spawner_area = np.zeros((2, 4), dtype='int8')
    nb_tetromino_used_at_game_over = 1
    return reward, start_time, play_time, tetromino, last_second, time_without_scoring, no_scoring_timer_decr, \
            last_lines, last_levels, last_spawner_area, nb_tetromino_used_at_game_over

In [14]:
def game_over(play_time, reward, score, lines, nb_tetrominos_used):
    # TODO: Save these values in a DataFrame
    print(cstr_with_arg('GAME OVER', 'pure red', True))
    minutes = int(play_time // 60)
    seconds = int(play_time - minutes * 60)
    milliseconds = int((play_time - minutes * 60 - seconds)*1000)
    # Use the arguments (not the globals) so the function reports the values it was given
    print(f"Game info:\
                \n-Total Rewards:{cstr_with_arg(reward, 'pure green' if reward > 0 else 'pure red', True)}\
                \n-Game Score:{cstr_with_arg(score, 'pure green' if score > 0 else 'pure red', True)}\
                \n-Lines:{cstr_with_arg(lines, 'pure green' if lines >= 100 else 'pure red', True)}\
                \n-Time:{'{:02d}:{:02d}.{:03d}'.format(minutes, seconds, milliseconds)}\
                \n-Tetrominos used:{nb_tetrominos_used}")

#### Custom Tetris emulation

In [10]:
# \ Constant values: /
#  ------------------
normal_fps = 59.73 # Game Boy runs at 59.73 frames per second
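In the emulation loop below, `fps = 1 / delta_time` and `time_scale = fps / normal_fps`, so `delta_time * time_scale` is roughly `1 / normal_fps`: each `pyboy.tick()` adds about one Game Boy frame to `play_time`, whatever the emulation speed. A minimal sketch of that frame-based clock, which counts ticks instead of measuring wall-clock time (the helper names are hypothetical):

```python
NORMAL_FPS = 59.73  # Same value as 'normal_fps' above


def game_time_from_frames(frames, fps=NORMAL_FPS):
    """In-game seconds after 'frames' ticks, independent of the emulation speed."""
    return frames / fps


def frames_for_game_time(seconds, fps=NORMAL_FPS):
    """Ticks needed to emulate 'seconds' of in-game time (e.g. for 'pyboy.tick(frames, False)')."""
    return round(seconds * fps)


def format_game_time(seconds):
    """MM:SS.mmm, the layout printed by 'game_over()' (rounded rather than truncated)."""
    minutes, rest = divmod(seconds, 60)
    return f'{int(minutes):02d}:{rest:06.3f}'


print(frames_for_game_time(264.133))  # 15777 frames for 4min 24sec 133ms
print(format_game_time(264.133))      # 04:24.133
```

Counting frames also makes a run reproducible: the same inputs on the same frames give the same game, while wall-clock timing depends on the machine.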

In [23]:
# \ Configuration: /
#  ----------------
skip_title_screen = True
print_fps = False
print_spawner = False
print_game_area = True
auto_restart_at_game_over = True
auto_restart_at_100_lines = True
game_random_seed = None
game_speed = 0 # 0 = No speed limit

# Rewards balancing:
base_scoring_multiplier = 1
no_scoring_timer = 10
no_scoring_penalty = -10
other_screen_timer = 5
other_screen_penalty = -10
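With `no_scoring_timer = 10` and `no_scoring_penalty = -10`, the main loop does not penalize at a fixed rate: each penalty restarts the clock and shortens the next deadline by one second (down to 1 second) until the agent scores again. A minimal sketch of that schedule as a pure function (hypothetical name); the loop itself checks `time_without_scoring > no_scoring_timer_decr` once per frame, so real trigger times can differ by about a frame.

```python
def no_scoring_penalty_times(seconds_without_scoring, start_timer=10, min_timer=1):
    """In-game times (seconds) at which the 'no scoring' penalty fires when the agent never scores.

    Mirrors the main loop: after each penalty the clock restarts and the deadline
    shrinks by 1 second, down to 'min_timer'. Scoring would restore 'start_timer'.
    """
    times = []
    elapsed, timer = 0, start_timer
    while elapsed + timer <= seconds_without_scoring:
        elapsed += timer
        times.append(elapsed)
        timer = max(min_timer, timer - 1)
    return times


penalty_times = no_scoring_penalty_times(60, start_timer=10)
print(penalty_times[:4])          # [10, 19, 27, 34]
print(len(penalty_times) * -10)   # -150: total penalty after 1 minute without scoring
```

The shrinking deadline makes idling increasingly expensive, which pushes the agent toward clearing lines early rather than stalling.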

In [12]:
# \ Values to save: /
#  -----------------
# TODO
nb_lines_at_game_over = 0
score_at_game_over = 0
reward_at_game_over = 0
time_at_game_over = 0
nb_tetromino_used_at_game_over = 1
inputs_at_time = {} # ex: {1.2sec : Left, 2.5sec : Down} but maybe we can use "experience replay"?

In [25]:
pyboy = PyBoy(rom_path)

pyboy.set_emulation_speed(game_speed)

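# Needed even without skipping the title screen: 'tetris' is used in the whole loop below
tetris = pyboy.game_wrapper
tetris.game_area_mapping(tetris.mapping_compressed, 0) # Tile values 0-8, see the tile values table above

if skip_title_screen:
    # Skip the title screen and go directly in game
    tetris.start_game(timer_div=game_random_seed)
    pyboy.tick()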

# \ Start values: /
#  ---------------
tetromino = tetris.next_tetromino()
score = 0
reward = 0
start_time = time.time() # not used
play_time = 0
fps = 0
last_time_fps = time.time()
time_scale = 1
delta_time = 0
last_second = 0
time_without_scoring = 0
no_scoring_timer_decr = no_scoring_timer
last_lines = 0
last_levels = 0
last_spawner_area = np.zeros((2, 4), dtype='int8') # The spawner area is the (2, 4) region at the top of the game area (rows 1-2, columns 3-6)

while pyboy.tick():
    # TODO: Does rewind action (input ',') affect correctly delta_time, fps and time_scale? Allow RL model to use this action?
    delta_time = max(time.time() - last_time_fps, 1e-6) # Avoid a division by zero when two ticks share the same timestamp
    fps = round(1 / delta_time, 0)
    time_scale = fps / normal_fps

    play_time += delta_time * time_scale
    time_without_scoring += delta_time * time_scale

    spawner_area = tetris.game_area()[1:3,3:7]

    # 1sec passed (on current game speed)
    if last_second + 1 < play_time:
        if print_fps:
            print(f'FPS: {fps}')
        last_second += 1

    # TODO: if current_screen == 'in_game': > detect with a CNN model, to predict current_screen value
    # > all our code below
    # else:
    # > reward penalty

    # Check time without scoring (for penalty)
    if time_without_scoring > no_scoring_timer_decr:
        reward += no_scoring_penalty
        print(cstr_with_arg(f'{no_scoring_penalty}pts because {no_scoring_timer_decr}sec passed without scoring !', 'pure red', True))
        print(f"  Reward={cstr_with_arg(reward, 'pure green' if reward > 0 else 'pure red', True)}")
        time_without_scoring = 0
        no_scoring_timer_decr = max(1, no_scoring_timer_decr - 1) # Shorten the next deadline, down to 1sec

    # Check when a new tetromino spawn
    if not np.array_equal(last_spawner_area, spawner_area):
        unique = np.unique(last_spawner_area, return_counts=True)
        count_last = {value:count for value, count in zip(unique[0], unique[1])}

        unique = np.unique(spawner_area, return_counts=True)
        count_curr = {value:count for value, count in zip(unique[0], unique[1])}

        for t_id in range(1, 8):
            if count_curr.get(t_id, 0) - count_last.get(t_id, 0) == 4:
                print(cstr_with_arg(f'Add new tetromino {get_tetromino_form(t_id)}', 'pure green', True))
                nb_tetromino_used_at_game_over += 1
                if print_spawner:
                    print(better_game_area(spawner_area, False))

                adding_score = (tetris.score - score) * base_scoring_multiplier
                reward += adding_score if adding_score > 0 else 0
                print(f'Level: {tetris.level} | Lines: {tetris.lines}')
                if tetris.lines > last_lines:
                    print(f"{cstr_with_arg(f'{tetris.lines - last_lines} line(s)!', 'pure green', True)}")
                    last_lines = tetris.lines
                if tetris.level > last_levels:
                    print(f"{cstr_with_arg('Level up!', 'pure green', True)}")
                    last_levels = tetris.level
                if adding_score > 0:
                    print(f"{cstr_with_arg(f'+{tetris.score - score} pt(s) with Tetromino {tetromino}', 'pure green', True)}")
                    time_without_scoring = 0
                    no_scoring_timer_decr = no_scoring_timer
                print(f"Reward={cstr_with_arg(reward, 'pure green' if reward > 0 else 'pure red', True)}")
                if print_game_area:
                    print(better_game_area(tetris.game_area()))

                score = tetris.score
                tetromino = get_tetromino_form(t_id)

                break

    # Check Game Over (tile 8 = Game Over wall), or the 100 lines goal
    if 8 in tetris.game_area() or (tetris.lines >= 100 and auto_restart_at_100_lines):
        game_over(play_time, reward, tetris.score, tetris.lines, nb_tetromino_used_at_game_over)
        if auto_restart_at_game_over:
            tetris.reset_game(timer_div=game_random_seed)
            # reset values
            reward, start_time, play_time, tetromino, last_second, time_without_scoring, no_scoring_timer_decr, \
            last_lines, last_levels, last_spawner_area, nb_tetromino_used_at_game_over = reset_values()
            score = 0 # 'score' is not reset by 'reset_values()'

    last_spawner_area = spawner_area
    last_time_fps = time.time()

pyboy.stop()

[1m[38;2;64;192;64mAdd new tetromino S[0m
Level: 0 | Lines: 0
Reward=[1m[38;2;255;0;0m0[0m
      0  1  2  3  4  5  6  7  8  9
-----------------------------------
00 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
01 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;46m 6 [0m[1m[38;2;255;255;255;46m 6 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
02 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;46m 6 [0m[1m[38;2;255;255;255;46m 6 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
03 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
04 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0

[1m[38;2;255;0;0m-10pts because 7sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-40[0m
[1m[38;2;64;192;64mAdd new tetromino O[0m
Level: 0 | Lines: 0
Reward=[1m[38;2;255;0;0m-40[0m
      0  1  2  3  4  5  6  7  8  9
-----------------------------------
00 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
01 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;102m 3 [0m[1m[38;2;255;255;255;102m 3 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
02 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;102m 3 [0m[1m[38;2;255;255;255;102m 3 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
03 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[3

[1m[38;2;255;0;0m-10pts because 2sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-90[0m
[1m[38;2;255;0;0m-10pts because 1sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-100[0m
[1m[38;2;255;0;0m-10pts because 1sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-110[0m
[1m[38;2;255;0;0m-10pts because 1sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-120[0m
[1m[38;2;255;0;0m-10pts because 1sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-130[0m
[1m[38;2;64;192;64mAdd new tetromino L[0m
Level: 0 | Lines: 0
Reward=[1m[38;2;255;0;0m-130[0m
      0  1  2  3  4  5  6  7  8  9
-----------------------------------
00 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
01 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;45m 4 [0m[1m[38;2;255;255;255;45m 4 [0m[1m[38;2;255;255;2

[1m[38;2;255;0;0m-10pts because 1sec passed without scoring ![0m
  Reward=[1m[38;2;255;0;0m-180[0m
[1m[38;2;64;192;64mAdd new tetromino I[0m
Level: 0 | Lines: 0
Reward=[1m[38;2;255;0;0m-180[0m
      0  1  2  3  4  5  6  7  8  9
-----------------------------------
00 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
01 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;100m 7 [0m[1m[38;2;255;255;255;100m 7 [0m[1m[38;2;255;255;255;100m 7 [0m[1m[38;2;255;255;255;100m 7 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
02 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;255;255;101m 1 [0m[1m[38;2;255;255;255;101m 1 [0m[1m[38;2;255;255;255;101m 1 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m
03 | [0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[0m[37m 0 [0m[1m[38;2;255;

In [87]:
# Close the game windows if the previous cell crashes
pyboy.stop()
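The spawn check in the main loop compares tile counts in the spawner area between two frames and reports a new tetromino when one tile ID gains exactly 4 cells. A minimal sketch of the same rule as a pure NumPy function (hypothetical name), so it can be tested without the emulator; `np.bincount` replaces the `np.unique` dictionaries of the loop for brevity and gives the same counts.

```python
import numpy as np


def new_tetromino_in_spawner(last_spawner_area, spawner_area):
    """Return the tile ID (1-7) of a tetromino that just appeared in the spawner area, else None.

    Same rule as the main loop: a new tetromino is detected when the number of
    cells of one tetromino ID grows by exactly 4 between two frames.
    """
    if np.array_equal(last_spawner_area, spawner_area):
        return None
    last_counts = np.bincount(np.ravel(last_spawner_area), minlength=9)
    curr_counts = np.bincount(np.ravel(spawner_area), minlength=9)
    for t_id in range(1, 8):
        if curr_counts[t_id] - last_counts[t_id] == 4:
            return t_id
    return None


# Spawner area (rows 1-2, columns 3-6) when the 'S' tetromino of the output above appears
empty = np.zeros((2, 4), dtype='int8')
s_piece = np.array([[0, 6, 6, 0], [6, 6, 0, 0]], dtype='int8')
print(new_tetromino_in_spawner(empty, s_piece))    # 6
print(new_tetromino_in_spawner(s_piece, s_piece))  # None
```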

## Learning Models

### CNN Model (predict screen type)

Learn all screen types (Title Screen, Pause, In Game, Game Over, Leaderboard, etc.) and predict the current screen type, so that all our code runs only on the "In Game" screen and the game can be restarted automatically when Game Over is reached.
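Before training such a classifier, each frame and its label need a fixed numeric form. A minimal sketch, assuming frames arrive as RGB or RGBA arrays of shape (144, 160, channels), for example `np.asarray(pyboy.screen.image)`; the class list and helper names are hypothetical, and the grayscale conversion uses the standard ITU-R BT.601 luma weights.

```python
import numpy as np

# Hypothetical screen classes, taken from the list above
SCREEN_TYPES = ['title', 'pause', 'in_game', 'game_over', 'leaderboard']


def preprocess_screen(frame):
    """Convert an RGB(A) Game Boy frame (144, 160, 3 or 4) to grayscale in [0, 1], shape (144, 160, 1)."""
    rgb = np.asarray(frame, dtype=np.float32)[..., :3]  # Drop the alpha channel if present
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)  # BT.601 luma
    return (gray / 255.0)[..., np.newaxis]  # Channel axis expected by 2D conv layers


def one_hot_label(screen_type):
    """One-hot vector for a screen type (same layout as Keras 'to_categorical')."""
    label = np.zeros(len(SCREEN_TYPES), dtype=np.float32)
    label[SCREEN_TYPES.index(screen_type)] = 1.0
    return label


white_frame = np.full((144, 160, 4), 255, dtype=np.uint8)
print(preprocess_screen(white_frame).shape)  # (144, 160, 1)
print(one_hot_label('in_game'))
```

Grayscale is enough here since the original Game Boy screen only has four shades, and it divides the input size by three.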

In [None]:
# TODO

### RL Model (aim for the WR!)

Tetris (Game Boy, 1989) speedruns: https://www.speedrun.com/fr-FR/tetrisgb?h=100_Lines_Level_0_Start&x=z277yr42
> World Record "100 Lines, Level 0 Start" to beat: 4min 24sec 133ms...

<pre>Rewards:
-X * points scored

Penalties:
-X sec without scoring
-Not being on the "In Game" screen</pre>
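The reward list above can be written as one function per agent step. A minimal sketch under simplifying assumptions: the `in_game` flag would come from the planned screen classifier (not implemented yet), the off-screen penalty is applied immediately instead of after `other_screen_timer`, and the no-scoring deadline is fixed rather than shrinking as in the main loop. The function name is hypothetical; the defaults mirror the configuration cell.

```python
def step_reward(score_delta, seconds_without_scoring, in_game,
                scoring_multiplier=1, no_scoring_timer=10,
                no_scoring_penalty=-10, other_screen_penalty=-10):
    """Reward for one agent step, following the Rewards/Penalties list above.

    Defaults mirror 'base_scoring_multiplier', 'no_scoring_timer',
    'no_scoring_penalty' and 'other_screen_penalty' from the configuration cell.
    """
    if not in_game:
        # Outside the "In Game" screen (title, pause, Game Over...): flat penalty
        return other_screen_penalty
    reward = scoring_multiplier * max(score_delta, 0)
    if score_delta <= 0 and seconds_without_scoring > no_scoring_timer:
        reward += no_scoring_penalty
    return reward


print(step_reward(40, 0, True))   # 40: one line cleared at level 0
print(step_reward(0, 12, True))   # -10: too long without scoring
print(step_reward(0, 12, False))  # -10: not in game
```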

Video of reinforcement learning on Pokémon Red (Game Boy, emulated with PyBoy): https://www.youtube.com/watch?v=DcYLT37ImBY
> And the project's GitHub: https://github.com/PWhiddy/PokemonRedExperiments

Custom PyBoy environment subclassing Gymnasium's environment class: https://rotational.io/blog/reinforcement-learning-automation-and-tetris/
> Needed to train it with the PPO model
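A Gymnasium environment needs a discrete action set and a fixed-shape observation. A minimal sketch of both pieces as plain functions; the action list and helper names are hypothetical. In a `gymnasium.Env` subclass they could back `action_space = gym.spaces.Discrete(len(ACTIONS))` and an `observation_space` such as `gym.spaces.Box(0, 1, shape=(18, 10), dtype=np.uint8)` (the 18 × 10 board shape is an assumption).

```python
import numpy as np

# Hypothetical discrete action set: index -> 'pyboy.button()' name ('' = do nothing)
ACTIONS = ('', 'left', 'right', 'down', 'a', 'b')


def action_to_button(action_index):
    """Map a discrete action index to a PyBoy button name, or None for 'no input'."""
    return ACTIONS[int(action_index)] or None


def board_observation(game_area):
    """Binary occupancy grid: 1 where a tile is filled (any tetromino or the wall), else 0."""
    return (np.asarray(game_area) != 0).astype(np.uint8)


# Usage inside a hypothetical env.step(action):
#     button = action_to_button(action)
#     if button is not None:
#         pyboy.button(button)
#     pyboy.tick()
#     observation = board_observation(tetris.game_area())
print(action_to_button(2))                                     # right
print(board_observation(np.array([[0, 7], [8, 0]])).tolist())  # [[0, 1], [1, 0]]
```

A binary board drops the tetromino types, which keeps the observation small; the tile IDs could be kept instead if the agent should plan around piece shapes.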

Proximal Policy Optimization (PPO) model documentation: https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html

In [None]:
# TODO