# Reinforcement Learning Techniques in Flappy Bird

Reinforcement learning has been the foundation of breakthroughs in Go, chess, and protein folding [^1][^2][^3]. It works by letting an agent, often a neural network, act in a simulated environment and rewarding it for good behavior. Here, I apply reinforcement learning to the mobile game Flappy Bird [^4].

This is a continuation of a project I started two years ago as the final project for a deep learning course. I didn't get it working then, but I have kept working on it since.

[^1]: [Mastering the Game of Go without Human Knowledge](https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf)\
[^2]: [Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm](https://arxiv.org/pdf/1712.01815.pdf)\
[^3]: [Highly Accurate Protein Structure Prediction with AlphaFold](https://www.nature.com/articles/s41586-021-03819-2)\
[^4]: [Flappy Bird](https://flappy-bird.co)

## Overview of Reinforcement Learning
In reinforcement learning, an agent interacts with an environment in a loop: the agent picks an action, and the environment responds with two things, (1) a reward signal and (2) a new state. The agent's job is to learn how its actions relate to those responses.

``` python
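# `env` and `agent` are placeholders for any environment and learner.
observation = env.get_initial_state()
done = False
while not done:
    action = agent.act(observation)                      # choose an action
    observation, reward, done, info = env.step(action)   # environment responds
    agent.learn(observation, reward)                     # update from feedback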
```
Formally, the environment is defined by four things [^6]: (1) the state space $S$, (2) the action space $A$, (3) the transition function $P$, and (4) the reward function $R$.

- The state space $S$ is the set of all possible states the agent can observe. 
- The action space $A$ is the set of all possible actions the agent can take.
- The transition function $P(s' \mid s, a) \in [0,1]$ is the probability of transitioning from state $s \in S$ to state $s' \in S$ given action $a \in A$.
- The reward function $R(s, a, s') \in \mathbb{R}$ is the reward the agent receives for taking action $a$ in state $s$ and transitioning to state $s'$.
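
To make the four components concrete, here is a minimal sketch of a hypothetical two-state environment that has nothing to do with Flappy Bird. The state names, action names, probabilities, and rewards are all invented for illustration, and the reward is written as a function of $(s, a, s')$.

```python
import random

# Hypothetical toy environment; every name and number here is an
# illustrative assumption, not part of the Flappy Bird setup.
STATES = ["low", "high"]       # state space S
ACTIONS = ["wait", "climb"]    # action space A

# Transition function P: maps (s, a) to a distribution over next states s'.
P = {
    ("low", "wait"):   {"low": 1.0},
    ("low", "climb"):  {"low": 0.2, "high": 0.8},
    ("high", "wait"):  {"low": 0.5, "high": 0.5},
    ("high", "climb"): {"high": 1.0},
}


def R(s, a, s_next):
    """Reward function R(s, a, s'): +1 for ending up in 'high', else 0."""
    return 1.0 if s_next == "high" else 0.0


def step(s, a, rng=random):
    """Sample s' from P(. | s, a) and return (s', reward)."""
    next_states = list(P[(s, a)].keys())
    probs = list(P[(s, a)].values())
    s_next = rng.choices(next_states, weights=probs, k=1)[0]
    return s_next, R(s, a, s_next)
```

Each row of `P` sums to 1, which is what makes it a valid probability distribution over next states.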

When the agent interacts with the environment, its interaction can be described as a sequence of states, actions, and rewards:

$$
(s_0, a_0, r_0, s_1, a_1, r_1, \cdots)
$$
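
Such a sequence can be recorded with a short rollout loop. The sketch below uses a hypothetical `CountdownEnv` stand-in and a fixed policy so that it runs without any dependencies; neither is part of the Flappy Bird setup.

```python
class CountdownEnv:
    """Hypothetical stand-in environment: the state is a counter that
    decreases by the action value until it reaches zero."""

    def __init__(self, start=5):
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        self.state = max(0, self.state - action)
        done = self.state == 0
        reward = 1.0 if done else -0.1  # small step cost, bonus at the end
        return self.state, reward, done


def rollout(env, policy, max_steps=100):
    """Return the trajectory as a list of (s_t, a_t, r_t) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


# A fixed policy that always subtracts 2.
trajectory = rollout(CountdownEnv(start=5), policy=lambda s: 2)
print(trajectory)  # [(5, 2, -0.1), (3, 2, -0.1), (1, 2, 1.0)]
```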

[^6]: [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2020.pdf)

## Flappy Bird Environment
I used the `flappy-bird-gymnasium` package to create the environment [^7].
The Flappy Bird environment is characterized by the same four components: a state space, an action space, a transition function, and a reward function.
1. The state space is $S \subseteq \mathbb{R}^{12}$: each state is a vector of 12 numbers representing various distances, velocities, and angles related to the bird and the pipes [^8].
2. The action space $A = \{0, 1\}$ allows the agent to either "do nothing" or "flap".
3. The transition function $P$ is unknown to the agent, which only experiences it through sampled transitions.
4. The reward function $R$ provides feedback based on the agent's state transitions, rewarding survival and successful pipe navigation and penalizing death. The reward function is defined as:

$$
R(s, a, s') = \begin{cases} +0.1 & \text{if the agent is alive} \\ +1.0 & \text{if the agent passes through a pipe} \\ -1.0 & \text{if the agent dies} \end{cases}
$$
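
The cases above overlap (an agent that passes a pipe is also alive), so any implementation has to pick an order in which to check them. The sketch below assumes death is checked first, then passing a pipe, then plain survival. That ordering is my assumption and may not match what the package does internally; `flappy_reward` and its boolean arguments are hypothetical names.

```python
def flappy_reward(alive: bool, passed_pipe: bool) -> float:
    """Piecewise reward from the definition above.

    The precedence of the cases (death, then pipe, then survival)
    is an assumption made for this sketch.
    """
    if not alive:
        return -1.0  # the agent died on this transition
    if passed_pipe:
        return 1.0   # the agent passed through a pipe
    return 0.1       # the agent survived the step
```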


[^7]: [Flappy Bird Gymnasium](https://github.com/Kautenja/flappy-bird-gymnasium)\
[^8]: [Flappy Bird State Vector](https://github.com/markub3327/flappy-bird-gymnasium)

## Sanity Check for the Flappy Bird Environment

Before exploring reinforcement learning techniques, it is worth sanity checking the Flappy Bird environment. Knowing how the environment behaves under random actions makes later issues easier to debug.

``` python
# Importing flappy_bird_gymnasium registers "FlappyBird-v0" with gymnasium
import flappy_bird_gymnasium
import gymnasium
import numpy as np
from tqdm import tqdm
from agent import Agent
```

``` python
import gymnasium
import cv2
from IPython.display import Video

# Create the environment
env = gymnasium.make("FlappyBird-v0",
                     render_mode="rgb_array",
                     use_lidar=False,
                     normalize_obs=True)

# Video writer: 30 fps, frame size given as (width, height)
fourcc = cv2.VideoWriter_fourcc(*'avc1')
out = cv2.VideoWriter('flappy_bird.mp4', fourcc, 30.0, (288, 512))

env.reset()
while True:
    # Take a random action and render the resulting frame
    obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    image = env.render()

    # Convert the image from RGB to BGR
    image_bgr = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

    # Write the frame to the video file
    out.write(image_bgr)

    # Stop when the episode ends, whether by death or truncation
    if terminated or truncated:
        break

# Release resources
env.close()
out.release()

# Display the converted video in the notebook
Video("flappy_bird.mp4", embed=True)
```
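
Watching the video is a useful first check, and a few numeric checks can complement it. The helper below is a minimal sketch, not part of the package: `sanity_check` is a hypothetical function that assumes the Gymnasium API, where `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`.

```python
import math


def sanity_check(env, n_steps=1000):
    """Step `env` with random actions and collect basic statistics.

    Assumes the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    obs, _ = env.reset()
    obs_dim = len(obs)
    seen_rewards = set()
    episode_lengths = []
    length = 0
    for _ in range(n_steps):
        action = env.action_space.sample()
        obs, reward, terminated, truncated, _ = env.step(action)
        length += 1
        # The observation should keep its size and contain only finite values.
        assert len(obs) == obs_dim, "observation size changed mid-run"
        assert all(math.isfinite(float(x)) for x in obs), "non-finite observation"
        seen_rewards.add(round(float(reward), 3))
        if terminated or truncated:
            episode_lengths.append(length)
            length = 0
            obs, _ = env.reset()
    return {
        "obs_dim": obs_dim,
        "rewards": sorted(seen_rewards),
        "episode_lengths": episode_lengths,
    }
```

Called on a freshly created environment (one that has not been closed), for example `stats = sanity_check(env)`, the result can be compared against the description above: `stats["obs_dim"]` should be 12, and `stats["rewards"]` should contain only values that the reward function can produce.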

### Agent Class
