# Introduction, Motivation and/or Problem Statement 

The ability of an AI to play video games at a level comparable to a human has long been of great interest to the AI community. AI has succeeded in a wide variety of games, but in many cases the AI is fed the game data directly rather than obtaining it from what is shown on the screen. As such, these models do not accurately reflect a human player, since they have access to exact data values and information that a human player could not infer from looking at the screen. However, Mnih et al. (https://arxiv.org/pdf/1312.5602.pdf) showed that Atari games such as Breakout can be played by a deep learning model that infers all the information it needs directly from the screen through the use of a CNN. This type of model is a better representation of an AI playing a game in the same way a human player would.

The game Flappy Bird is a popular choice for testing new deep learning models designed for games. These models fall under the area of reinforcement learning. When training reinforcement learning models to play games, common approaches include genetic methods such as the NEAT algorithm, tabular Q-Learning, and Deep Q-Learning, which combines Q-Learning with a neural network and experience replay. Many of these methods read the game's variables directly and, as a result, rely on self-written implementations of the game that do not reflect what is actually shown on the screen. Furthermore, a model that knows the exact values of these variables is not reflective of a human player playing the game.

As a result, the aim of this paper is to implement a model that applies CNN techniques to the game 'Flappy Bird' to identify game objects and provide their data to a second algorithm for reinforcement learning. This should allow the model to be used on any implementation of 'Flappy Bird', regardless of game screen size or implementation-specific variables, giving it an edge over other models in applicability. In addition, a CNN that gathers information from the pixels is more representative of a human player, who acts solely on what is shown on the screen. Based on prior research, we hypothesise that the Deep Q-Learning model, with its capability to generalise actions over large datasets, will be the most suitable approach to combine with this CNN model.

# Data Sources or RL Task

The task is to teach a model to play the game of Flappy Bird perfectly. The game involves a bird which is usually controlled by a human player. The bird has one action available, *flap*, which makes it gain height and temporarily stops it from falling. There is also a sequence of vertical pipe pairs, each separated by a gap, and the player scores a point each time the bird flies through a gap without hitting the pipes. If the bird flies into one of the pipes or hits the ground, the player loses. The goal of the game is to fly through the pipe gaps for as long as possible and achieve the highest score without colliding with the pipes or the ground. This is summarised in the images below.

<img src="./notebook_assets/trainingBird.gif" width="200" height="400" />

<em>Training Bird</em>

<img src="./notebook_assets/Flappybird.gif" width="700" height="400" />

<em>Trained Bird</em>

Most attempts at Flappy Bird with deep learning models rely on the input data being provided directly to the model. However, there have been attempts at using a CNN to infer the data from the screen. GitHub user yenchenlin (https://github.com/yenchenlin/DeepLearningFlappyBird) built a working implementation of a CNN + Q-Learning model which is able to play the game. It should be noted, however, that the game in this implementation is quite forgiving: it has a lower pipe velocity and slower bird movement, allowing the model to make a mistake without dying. This is not an accurate representation of the Flappy Bird game, which was originally quite fast paced. To preserve accuracy, we decided to make the game as difficult as the original, keeping a high pipe velocity and quick bird movements. As a result, if the model makes a single mistake it will always lose.

The game can be played by running the following cells, provided the image assets are available.

In [8]:
class Bird:
    def __init__(self):
        self.vel = 0
        self.x = BIRD_SIZE
        self.y = SCREEN_HEIGHT // 2 - 25
        self.tick_count = 0

    def move_bird(self):
        self.vel += GRAVITY
        self.y += self.vel

    def flap(self):
        self.vel = -10.5

In [9]:
import pygame
import random
import neat
import pickle
import os
import torch

SCREEN_WIDTH = 550
SCREEN_HEIGHT = 850
BIRD_SIZE = 50
PIPE_WIDTH = 75
GAP_SIZE = 200
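# keep the pipe image's 52:854 aspect ratio; integer division so the size passed to pygame.transform.scale is whole pixels
PIPE_HEIGHT = PIPE_WIDTH * 854 // 52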
VISUALISE = True
MAX_FRAME_RATE = 60
FITNESS_THRESHOLD = 100
GRAVITY = 0.6


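# convert_alpha() requires an active display, so pygame.display.set_mode() must have been called before this cell is run
pipe_img = pygame.transform.scale(pygame.image.load(os.path.join("imgs","pipe.png")).convert_alpha(), (PIPE_WIDTH, PIPE_HEIGHT))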
bg_img = pygame.transform.scale(pygame.image.load(os.path.join("imgs","bg.png")).convert_alpha(), (600, 900))
bird_img = pygame.transform.scale(pygame.image.load(os.path.join("imgs","bird1.png")).convert_alpha(), (BIRD_SIZE, BIRD_SIZE))
base_img = pygame.transform.scale2x(pygame.image.load(os.path.join("imgs","base.png")).convert_alpha())
screenshots = torch.empty(0,3,600,800)
frames = []
output = []
outputs = torch.empty(0,3)
moved = False

class Game:
    def __init__(self, screen):
        self.screen = screen
        self.clock = pygame.time.Clock()
        self.bird = Bird()
        self.pipe_height1 = random.randint(100, 400)
        self.pipe_x1 = SCREEN_WIDTH
        self.pipe_height2 = random.randint(100, 400)
        # manually found starting value for x2 for pipes to be equidistant
        self.pipe_x2 = 1.5 * SCREEN_WIDTH + 40
        self.score = 0

    def move_pipes(self):
        self.pipe_x1 -= 5
        self.pipe_x2 -= 5
        if self.pipe_x1 < -PIPE_WIDTH:
            self.pipe_x1 = SCREEN_WIDTH
            self.pipe_height1 = random.randint(100, SCREEN_HEIGHT - GAP_SIZE)
            self.score += 1
            return True
        elif self.pipe_x2 < -PIPE_WIDTH:
            self.pipe_x2 = SCREEN_WIDTH
            self.pipe_height2 = random.randint(100, SCREEN_HEIGHT - GAP_SIZE)
            self.score += 1
            return True
        return False


    def check_collisions(self, bird):
        collision_occurred = False
        if bird.y > SCREEN_HEIGHT or bird.y < -BIRD_SIZE:
            collision_occurred = True
        if bird.x + BIRD_SIZE > self.pipe_x1 and bird.x < self.pipe_x1 + PIPE_WIDTH:
            if bird.y < self.pipe_height1 or bird.y + BIRD_SIZE > self.pipe_height1 + GAP_SIZE:
                collision_occurred = True
        elif bird.x + BIRD_SIZE > self.pipe_x2 and bird.x < self.pipe_x2 + PIPE_WIDTH:
            if bird.y < self.pipe_height2 or bird.y + BIRD_SIZE > self.pipe_height2 + GAP_SIZE:
                collision_occurred = True
        return collision_occurred

    def process_game_events(self):
        global moved
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                return True
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_UP:
                    self.bird.flap()
                    moved = True
        return False

    def draw_game(self, birds):
        global SCREEN
        self.screen.blit(bg_img, (0,0))
        for bird in birds:
            #pygame.draw.rect(self.screen, (255, 255, 255,10), (bird.x, bird.y, BIRD_SIZE, BIRD_SIZE))
            self.screen.blit(bird_img, (bird.x,bird.y))
        #pygame.draw.rect(self.screen, (255, 255, 255), (self.pipe_x1, 0, PIPE_WIDTH, self.pipe_height1))
        #pygame.draw.rect(self.screen, (255, 255, 255), (self.pipe_x1, self.pipe_height1 + GAP_SIZE, PIPE_WIDTH, SCREEN_HEIGHT - self.pipe_height1 - GAP_SIZE))
        self.screen.blit(pipe_img, (self.pipe_x1, self.pipe_height1 + GAP_SIZE))
        self.screen.blit(pygame.transform.flip(pipe_img, False, True), (self.pipe_x1, self.pipe_height1 - PIPE_HEIGHT))
        
        #pygame.draw.rect(self.screen, (255, 255, 255), (self.pipe_x2, 0, PIPE_WIDTH, self.pipe_height2))
        #pygame.draw.rect(self.screen, (255, 255, 255), (self.pipe_x2, self.pipe_height2 + GAP_SIZE, PIPE_WIDTH, SCREEN_HEIGHT - self.pipe_height2 - GAP_SIZE))
        self.screen.blit(pipe_img, (self.pipe_x2, self.pipe_height2 + GAP_SIZE))
        self.screen.blit(pygame.transform.flip(pipe_img, False, True), (self.pipe_x2, self.pipe_height2 - PIPE_HEIGHT))

        pygame.display.update()

    def run(self):
        # initial screen
        if VISUALISE: 
            self.draw_game([self.bird])
        # wait on the initial screen until any key is pressed
        initial_running = True
        while initial_running:
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    pygame.quit()
                    exit(0)
                if event.type == pygame.KEYDOWN:
                    initial_running = False
        # Game loop
        while True:
            # if quit signal returned, end game 
            if self.process_game_events(): break
            # move game entities
            self.bird.move_bird()
            self.move_pipes()
            # if collision occurred, end game
            if self.check_collisions(self.bird): break
            # Update display
            if VISUALISE: self.draw_game([self.bird])
            pygame.display.update()
            self.clock.tick(MAX_FRAME_RATE)
        print(self.score)

In [None]:
def load_network():
    local_dir = os.path.abspath('')
    config_path = os.path.join(local_dir, "config.txt")
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction, neat.DefaultSpeciesSet, neat.DefaultStagnation, config_path)
    with open("best.pickle", "rb") as f:
        genome = pickle.load(f)
    return neat.nn.FeedForwardNetwork.create(genome, config)

In [None]:
# play the game
pygame.init()
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
game_instance = Game(screen)
network = load_network()
game_instance.run()
pygame.quit()

4


The RL task is to train the bird to fly through the pipes without error and so achieve a high score. As outlined in the introduction, rather than reading the game's variables directly, the model uses a CNN to identify game objects from the screen and provides the resulting data to a Deep Q-Learning model for reinforcement learning. This should allow the model to be used on any implementation of Flappy Bird, regardless of game screen size or particular game assets.

# Exploratory Analysis of Data or RL Tasks

The RL task relies on having input information to base its decision on. In this game, the input data values are:

- The Bird's x coordinate
- The Bird's y coordinate
- The first pipe's x coordinate
- The first pipe's y coordinate
- The second pipe's x coordinate
- The second pipe's y coordinate

The output of a given model will be the decision on whether to flap or not (a boolean value).

The constants for this environment include:

Scaled game
- Screen width: 288
- Screen height: 512
- Vertical gap between pipes: 100
- Pipe velocity: 4

Non-scaled game
- Screen width: 550
- Screen height: 850
- Vertical gap between pipes: 150
- Pipe velocity: 4

Whilst most models for games are fed this data directly by the program, the project will also use a CNN approach which relies only on the screen pixels and no direct data. The CNN model will approximate the input data values listed above and feed them into a Deep Q-Learning model. For this type of model, preprocessing is applied to images of the game. The training data is generated by running a model trained with the Q-Learning method and saving each frame as a tensor together with the corresponding input data values (defined above). Each frame of the game has 3 channels corresponding to red, green, and blue. In pygame, each of these values ranges from 0 to 255, so to normalise them each channel value is divided by 255. The training data values are also normalised by dividing by either the screen height or the screen width, depending on the value. For example, the bird's y-coordinate lies in the range 0 to SCREEN_HEIGHT, so it is divided by SCREEN_HEIGHT to give a normalised value between 0 and 1.

A diagram showing the preprocessing of the image is shown below.

<img src="imgs/preprocessing.png">
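The normalisation described above can be sketched as follows. This is a minimal illustration rather than the project's actual preprocessing code: the helper names `normalise_frame` and `normalise_labels` are hypothetical, the frame is assumed to arrive as a `(height, width, 3)` array of 8-bit RGB values, and the screen size is taken from the non-scaled game.

```python
import numpy as np

# Assumed screen size (non-scaled game).
SCREEN_WIDTH = 550
SCREEN_HEIGHT = 850


def normalise_frame(frame_rgb):
    """Scale 8-bit RGB values to [0, 1] and move channels first.

    frame_rgb: array of shape (height, width, 3) with values in 0..255.
    Returns a float32 array of shape (3, height, width).
    """
    frame = np.asarray(frame_rgb, dtype=np.float32) / 255.0
    return np.transpose(frame, (2, 0, 1))


def normalise_labels(values, axes):
    """Divide each coordinate by the screen width ('x') or height ('y')."""
    scale = {"x": SCREEN_WIDTH, "y": SCREEN_HEIGHT}
    return np.array(
        [value / scale[axis] for value, axis in zip(values, axes)],
        dtype=np.float32,
    )


# Example: a white 4x2 frame, a bird y-coordinate and a pipe x-coordinate.
frame = normalise_frame(np.full((4, 2, 3), 255, dtype=np.uint8))
labels = normalise_labels([425, 275], ["y", "x"])
```

Keeping the axis of each label explicit means the same helper works for any subset of the input data values.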

# Models and/or Methods


### CNN Model Structure
- Convolution layer with 32 filters and kernel size 5
- Maxpooling kernel size 10
- Convolution layer with 64 filters and kernel size 5
- Fully connected layer with 100 hidden nodes
- Output layer with 3 nodes corresponding to the data values

All layers have a ReLU activation except the output layer, which has a sigmoid activation. Each of the 3 output values therefore lies between 0 and 1.
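The layer list above could be written in PyTorch roughly as below. This is a sketch under stated assumptions, not the project's actual model: the class name `FrameRegressor` is hypothetical, the input frame size of 3 x 128 x 72 is chosen only for illustration, and the convolutions are assumed to use no padding and the default stride.

```python
import torch
from torch import nn


class FrameRegressor(nn.Module):
    """Hypothetical CNN that maps a game frame to normalised data values."""

    def __init__(self, in_shape=(3, 128, 72), n_outputs=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_shape[0], 32, kernel_size=5),  # 32 filters, kernel 5
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=10),               # max pooling, kernel 10
            nn.Conv2d(32, 64, kernel_size=5),           # 64 filters, kernel 5
            nn.ReLU(),
            nn.Flatten(),
        )
        # Run a dummy frame through the feature extractor to find the
        # flattened size, so the sketch works for other input sizes too.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, *in_shape)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(n_flat, 100),       # fully connected, 100 hidden nodes
            nn.ReLU(),
            nn.Linear(100, n_outputs),    # one node per data value
            nn.Sigmoid(),                 # keeps every output in (0, 1)
        )

    def forward(self, x):
        return self.head(self.features(x))


model = FrameRegressor()
predictions = model(torch.rand(2, 3, 128, 72))  # batch of 2 random frames
```

Because the targets are normalised coordinates, a regression loss such as `nn.MSELoss` would be a natural fit for training a network of this shape.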

### Deep Q-Learning Model Structure
- Add 
- Our 
- Final
- Structure

### Combining These Models
To combine the models, the CNN outputs (Bird x, Bird y, First Pipe x, First Pipe y, Second Pipe x, Second Pipe y) will be used directly as inputs to the Deep Q-Learning model. Note that because velocity proved difficult to infer from the screen within the available research time, it was decided to take the velocity straight from the game values. There are a few ways that velocity could instead have been inferred from the image. One is to change the rotation of the bird image depending on its velocity. Another is to input a sequence of images rather than a single image, for example by joining 5 successive frames together into a larger image, which would let the CNN see the change in the bird's position and so infer its velocity.

The outputs of the CNN lie in the range 0 to 1, so they are scaled back up to the screen size to give the coordinate positions for the input data values. These coordinates are then fed into the Q-Learning model, creating a combined CNN Q-Learning model: a CNN identifies the data values from the pixels, these are used as inputs to a Deep Q-Learning model, and that model finally outputs the "Flap" or "No Flap" decision.
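The hand-off between the two models can be illustrated with the short sketch below. It is an assumption-laden example rather than the project's code: the helper names `to_pixels`, `build_state` and `choose_action` are hypothetical, the screen size is that of the non-scaled game, and the Q-model is assumed to return one value per action in the order (no flap, flap).

```python
# Assumed screen size (non-scaled game).
SCREEN_WIDTH = 550
SCREEN_HEIGHT = 850


def to_pixels(normalised, axes):
    """Scale CNN outputs in [0, 1] back to pixel coordinates."""
    scale = {"x": SCREEN_WIDTH, "y": SCREEN_HEIGHT}
    return [value * scale[axis] for value, axis in zip(normalised, axes)]


def build_state(cnn_outputs, axes, bird_velocity):
    """Combine rescaled CNN outputs with the velocity read from the game."""
    return to_pixels(cnn_outputs, axes) + [bird_velocity]


def choose_action(q_values):
    """Greedy choice: flap only if its Q-value is the larger one.

    q_values: pair (q_no_flap, q_flap), an assumed output order.
    """
    q_no_flap, q_flap = q_values
    return q_flap > q_no_flap


# Example: a bird y-coordinate and a pipe x-coordinate from the CNN,
# plus the bird's velocity taken directly from the game.
state = build_state([0.5, 0.25], ["y", "x"], bird_velocity=-10.5)
flap = choose_action((0.1, 0.7))
```

In use, `state` would be passed to the Deep Q-Learning model on every frame and the resulting Q-values passed to `choose_action`.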

# Results

We need to basically just put all of the metrics here, and what they are.

# Discussion

Talk about each of the results, the trends that are there, what they could mean. Etc.