# Let DQN play Flappy Bird

## 1. Abstract

Flappy Bird became popular for combining very simple controls with high difficulty. The player either taps the screen to make the bird flap upward or does nothing and lets it fall, with the goal of steering the bird through the gaps between pipes. Although the controls are simple, scoring well is hard, which makes training an AI agent to play the game well an interesting project.

<img src="image/FlappyBird.PNG" width="100" height="200">

Our group thought it would be a good model for learning how a Deep Q-Network (DQN) works: in this game the agent has only two actions in each state, and the state can be reduced to a handful of parameters. That may be why so many DQN tutorials use Flappy Bird as their example.

## 2. Game Environment

The game environment, found on GitHub, is a Python-based version of Flappy Bird. The function below is the game's main function: it updates the game state and draws the game screen. We modified its parameters and return values to speed up training. The returned value `info` is the game state used for Q-learning.

In [3]:
def frame_step(self, input_actions):
        pygame.event.pump()

        reward = 1.0
        terminal = False

        # input_actions[0] == 1: do nothing
        # input_actions[1] == 1: flap the bird
        if input_actions[1] == 1:
            if self.playery > -2 * PLAYER_HEIGHT:
                self.playerVelY = self.playerFlapAcc
                self.playerFlapped = True
                #SOUNDS['wing'].play()

        # check for score
        playerMidPos = self.playerx + PLAYER_WIDTH / 2
        for pipe in self.upperPipes:
            pipeMidPos = pipe['x'] + PIPE_WIDTH / 2
            if pipeMidPos <= playerMidPos < pipeMidPos + 4:
                self.score += 1
                #SOUNDS['point'].play()
                reward = 500

        # playerIndex basex change
        if (self.loopIter + 1) % 3 == 0:
            self.playerIndex = next(PLAYER_INDEX_GEN)
        self.loopIter = (self.loopIter + 1) % 30
        self.basex = -((-self.basex + 100) % self.baseShift)

        # player's movement
        if self.playerVelY < self.playerMaxVelY and not self.playerFlapped:
            self.playerVelY += self.playerAccY
        if self.playerFlapped:
            self.playerFlapped = False
        self.playery += min(self.playerVelY, BASEY - self.playery - PLAYER_HEIGHT)
        if self.playery < 0:
            self.playery = 0

        # move pipes to left
        for uPipe, lPipe in zip(self.upperPipes, self.lowerPipes):
            uPipe['x'] += self.pipeVelX
            lPipe['x'] += self.pipeVelX

        # add new pipe when first pipe is about to touch left of screen
        if 0 < self.upperPipes[0]['x'] < 5:
            newPipe = getRandomPipe()
            self.upperPipes.append(newPipe[0])
            self.lowerPipes.append(newPipe[1])

        # remove first pipe if its out of the screen
        if self.upperPipes[0]['x'] < -PIPE_WIDTH:
            self.upperPipes.pop(0)
            self.lowerPipes.pop(0)

        # check if crash here
        isCrash = checkCrash({'x': self.playerx, 'y': self.playery,
                              'index': self.playerIndex},
                             self.upperPipes, self.lowerPipes)
        if isCrash:
            #SOUNDS['hit'].play()
            #SOUNDS['die'].play()
            terminal = True
            #reward = - abs(self.playery - self.lowerPipes[0]['y']) /40.0
            self.__init__()
            reward = -10

        FPSCLOCK.tick(FPS)

        # The original version returned image_data; we return a compact state vector instead.
        info = np.array([
            self.playery,             # bird's vertical position
            self.playerAccY,          # per-frame increment of the vertical velocity
            self.playerFlapAcc,       # vertical velocity assigned on a flap
            self.lowerPipes[0]['x'],  # x of the first pipe pair
            self.lowerPipes[0]['y'],  # y of the first lower pipe
            self.upperPipes[0]['y'],  # y of the first upper pipe
        ])
        return info, reward, terminal
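
The interface above can be tried out without the game itself. The following is a minimal sketch, not code from the project: `encode_action` builds the one-hot action that `frame_step` reads, and `td_target` shows how the returned `reward` and `terminal` flag would typically enter a one-step Q-learning target. The helper names, `next_q_values`, and the discount factor `gamma` are assumptions introduced for illustration.

```python
from typing import List, Sequence


def encode_action(flap: bool) -> List[int]:
    """One-hot action in the layout frame_step reads.

    Index 0 set means "do nothing", index 1 set means "flap".
    """
    return [0, 1] if flap else [1, 0]


def td_target(reward: float, terminal: bool,
              next_q_values: Sequence[float], gamma: float = 0.99) -> float:
    """One-step Q-learning target (gamma is an assumed discount factor)."""
    if terminal:
        # After a crash there is no successor state to bootstrap from.
        return float(reward)
    # Otherwise add the discounted value of the best next action.
    return reward + gamma * max(next_q_values)


print(encode_action(True))                # [0, 1]
print(td_target(-10, True, [3.0, 5.0]))   # -10.0
```

On a crash the target is just the crash reward, because `frame_step` resets the game and the episode ends there.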

## 3. Design of Deep Q-Network

### 3.1 The structure of DQN


### 3.2 The adjustment process of the Neural Network

### 3.3 Training Model part 1

In [1]:
%%HTML
<video width="320" height="240" controls>
  <source src="video/Normal_SameHigh.mp4" type="video/mp4">
</video>

In [2]:
def getRandomPipe():
    """returns a randomly generated pipe"""
    # y of gap between upper and lower pipe
    gapYs = [20, 30, 40, 50, 60, 70, 80, 90]
    index = random.randint(0, len(gapYs)-1)
    #gapY = gapYs[index]
    gapY = gapYs[0]

    gapY += int(BASEY * 0.2)
    pipeX = SCREENWIDTH + 10

    return [
        {'x': pipeX, 'y': gapY - PIPE_HEIGHT*1.2},  # upper pipe
        {'x': pipeX, 'y': gapY + PIPEGAPSIZE*1.2},  # lower pipe
    ]

Here we first fix gapY at a constant value. As the video shows, all the pipes then sit at the same height, which speeds up training and lets us quickly check whether our network is learning at all.
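
The only thing that changes between this stage and a randomized one is how `gapY` is picked. A small sketch of that switch, assuming a hypothetical helper `choose_gap_y` with a `randomize` flag (neither is part of the game code; the candidate list is copied from `getRandomPipe`):

```python
import random

# Same candidate values as gapYs in getRandomPipe.
GAP_YS = [20, 30, 40, 50, 60, 70, 80, 90]


def choose_gap_y(randomize: bool, rng: random.Random) -> int:
    """Return the first candidate when randomize is False, else a uniform draw."""
    if not randomize:
        return GAP_YS[0]
    return rng.choice(GAP_YS)


rng = random.Random(0)  # seeded only to make the sketch repeatable
fixed = [choose_gap_y(False, rng) for _ in range(5)]
varied = [choose_gap_y(True, rng) for _ in range(5)]
print(fixed)   # [20, 20, 20, 20, 20]
print(varied)  # five values drawn from GAP_YS
```

In the game code the base offset `int(BASEY * 0.2)` is still added to the chosen value afterwards; the sketch leaves that step out.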

### 3.4 Training Model part 2

In [5]:
%%HTML
<video width="320" height="240" controls>
  <source src="video/Normal_Random.mp4" type="video/mp4">
</video>

```
index = random.randint(0, len(gapYs)-1)  
gapY = gapYs[index]  
```
Here gapY is drawn at random, so in the video the pipes appear at different heights. In the previous model the position of the pipes did not matter, but here it is an important part of the state.
```
{'x': pipeX, 'y': gapY - PIPE_HEIGHT*1.2},  
{'x': pipeX, 'y': gapY + PIPEGAPSIZE*1.2},  
```
Given our time limit, we scaled both pipe offsets by a factor of 1.2 relative to the original version, which widens the opening the bird flies through. This also speeds up learning and left us time to try different parameters.
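
To see what the factor does to the opening, the arithmetic can be sketched as follows. This is an illustration under stated assumptions, not project code: it assumes each pipe's `y` is the top edge of its sprite and that the upper pipe sprite is `PIPE_HEIGHT` tall. The function name `pipe_opening` and the numeric values for `PIPE_HEIGHT` and `PIPEGAPSIZE` are placeholders.

```python
def pipe_opening(gap_y: float, pipe_height: float,
                 pipe_gap_size: float, scale: float = 1.0) -> float:
    """Vertical distance between the upper pipe's bottom and the lower pipe's top."""
    upper_y = gap_y - pipe_height * scale    # top edge of the upper pipe
    lower_y = gap_y + pipe_gap_size * scale  # top edge of the lower pipe
    upper_bottom = upper_y + pipe_height     # bottom edge of the upper pipe
    return lower_y - upper_bottom


PIPE_HEIGHT = 320   # placeholder value, assumed for illustration
PIPEGAPSIZE = 100   # placeholder value, assumed for illustration

original = pipe_opening(100, PIPE_HEIGHT, PIPEGAPSIZE, scale=1.0)
scaled = pipe_opening(100, PIPE_HEIGHT, PIPEGAPSIZE, scale=1.2)
print(original, scaled)
```

Under these assumptions the opening equals `PIPEGAPSIZE` at scale 1.0 and grows to `1.2 * PIPEGAPSIZE + 0.2 * PIPE_HEIGHT` at scale 1.2, because the upper pipe is also shifted upward.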

## 4. Conclusion

## Reference

Our game model: https://github.com/floodsung/DRL-FlappyBird/blob/master/game/wrapped_flappy_bird.py