## Enter Your Token

Before starting the game, please assign the provided token to the variable below.

In [2]:


token = "[token here]"



# Flappy Bird Played by AI

This notebook demonstrates how to use [gymnasium](https://gymnasium.farama.org/index.html) to train a Reinforcement Learning (RL) model to play a Flappy Bird game. It is based on code found [here](https://jeffersonlab.github.io/AIOP-PIER/examples/Flappy_Bird_gymnasium/).

In this version, designed for Google Colab, the game itself is implemented in JavaScript to allow for smooth play in the browser. A Python version of the game also exists at the above link and is used for the actual model training. The Python and JavaScript versions are made to be pixel-for-pixel identical so that the AI trained in Python can be used in JavaScript.

## Human Playable Game
The first cell below pulls in a JavaScript version of a simple Flappy Bird game. Running the cell loads the game in a small canvas below the cell. Press the space bar to flap, and press *enter* (or *return*) to start a new game once the game ends.

This human playable version demonstrates what you have to do to play the game and, therefore, what the AI is going to need to learn in order to play it.

*Note: you may need to click on the game canvas itself to give it keyboard focus.*

In [None]:
import random
pt = token+str(random.randint(100,1000000))

from IPython.display import display, HTML
display(HTML(f'''
    <canvas id="flappyCanvas" width="512" height="256"></canvas>
    <script>
        // Set the player token as a global variable
        var playerToken = "{pt}";
        console.log("Player token from Python:", playerToken);
    </script>
    <script src="https://jeffersonlab.github.io/AIOP-PIER/examples/Flappy_Bird_gymnasium/flappy_game.js"></script>
'''))

Every time you play the game, please submit your score below!

In [None]:
my_score=input("Enter your score: ")


from IPython.display import display, HTML
display(HTML(f'''
    <canvas id="flappyCanvas" width="512" height="256"></canvas>
    <script>
        // Set the player token as a global variable
        var playerToken = "{pt}";
        var myScore = "{my_score}";

    </script>
    <script src="https://jeffersonlab.github.io/AIOP-PIER/examples/Flappy_Bird_gymnasium/manual_score.js"></script>
'''))

## Download Python packages needed to train the AI

This next cell will download a Python version of the above game that uses the same format (pixel-for-pixel) as the JavaScript version. The training will use the Python version, but it will not actually draw the screen and will speed up time so that many games are played every second.

The following cell will also install the Python AI packages that will be used to define and train an AI model to play the game.

In [None]:
!wget https://jeffersonlab.github.io/AIOP-PIER/examples/Flappy_Bird_gymnasium/flappy_game.py
%pip install gymnasium stable_baselines3 onnx

## RL Learning Environment

The following cell contains Python code that defines the Reinforcement Learning (RL) environment using the *gymnasium* package. *Gymnasium* is a popular tool for doing RL, a category of AI that is useful for problems that unfold as a time sequence. Playing a game falls into this category since the game really is just a series of frames being drawn one after the other. These are called *time steps*. What makes this different from other AI applications is that the AI needs to decide what to do **now** in order to reach a desirable outcome **in the future**. Specifically, should it flap or not flap at the current time step in order to miss the next obstacle it is heading towards?

In [None]:
import gymnasium as gym
import numpy as np
import flappy_game as game

class FlappyEnv(gym.Env):

    def __init__(self, render_mode=None):

        # Action space
        self.action_space = gym.spaces.Discrete(2) # 0=no flap  1=flap

        # Observation space: ydiff, y_velocity, x_obstacle
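        # The bounds are created as float32 so they match the dtype of the space
        self.observation_space = gym.spaces.Box(low=np.array([-game.SCREEN_HEIGHT, -np.inf, 0], dtype=np.float32),
                                                high=np.array([+game.SCREEN_HEIGHT, +np.inf, game.SCREEN_WIDTH], dtype=np.float32),
                                                dtype=np.float32)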

        # Create instance of playable FlappyGame object
        self.game = game.FlappyGame()

        # Reset game and instantiate objects
        self.reset()

    def step(self, action):
        # Advance the game by one frame; the return value says whether the bird collided
        collision = self.game.step_game(action)
        self.done = collision

        obs = self.game.get_current_state(self.game.player, self.game.obstacles)

        # Small reward for staying alive
        reward = 0.3

        # Large reward for passing a pipe
        if self.game.score > self.previous_score:
            reward += 10
            self.previous_score = self.game.score

        # Penalty for crashing (replaces any reward earned this step)
        if collision:
            reward = -100

        terminated = self.done
        truncated = False
        info = {}
        return obs, reward, terminated, truncated, info

    # gymnasium's reset signature also includes "options", so accept it here
    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.game.reset()
        self.done = False
        self.previous_score = 0  # Reset previous score

        obs = self.game.get_current_state(self.game.player, self.game.obstacles)
        info = {}
        return obs, info


    def render(self):
        self.game.render_frame()
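
The reward values in `step` only make sense once they are added up over a whole game. As a rough sketch of that idea, the snippet below computes a *discounted return*, where rewards further in the future count a little less. The discount factor of 0.99 is an assumption (it is the default `gamma` of PPO in stable_baselines3), the helper names are made up for illustration, and the step at which a pipe is passed is a hypothetical number.

```python
# Reward values copied from FlappyEnv.step
ALIVE_REWARD = 0.3
PIPE_REWARD = 10.0
CRASH_REWARD = -100.0

# Assumption: 0.99 is the default discount factor (gamma) of PPO in stable_baselines3
GAMMA = 0.99


def discounted_return(rewards, gamma=GAMMA):
    """Sum the rewards, weighting the reward at time step t by gamma**t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def episode_rewards(crash_step, pipe_steps=()):
    """Hypothetical helper: per-step rewards for a game that ends in a crash at crash_step."""
    rewards = []
    for t in range(1, crash_step + 1):
        if t == crash_step:
            rewards.append(CRASH_REWARD)  # the crash penalty replaces the step reward
        elif t in pipe_steps:
            rewards.append(ALIVE_REWARD + PIPE_REWARD)
        else:
            rewards.append(ALIVE_REWARD)
    return rewards


# A game that hits the first obstacle versus one that passes a pipe first.
# The pipe being passed at step 80 is a made-up value for illustration.
early_crash = episode_rewards(crash_step=75)
one_pipe = episode_rewards(crash_step=150, pipe_steps={80})

print(f"crash at step 75:            {discounted_return(early_crash):.2f}")
print(f"one pipe, crash at step 150: {discounted_return(one_pipe):.2f}")
```

The second game ends with the same crash penalty, but because the crash happens later and a pipe was passed on the way, its return is higher. That difference is the signal the training uses to prefer actions that keep the bird alive.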


## Train the AI Model

The following cell will use the **FlappyEnv** class defined in the previous cell to train an AI model using *Reinforcement Learning* (RL). Running this will take a while so you might want to start it running and then read on while it goes.

The line that sets the value of *net_arch* determines how big and complex the model is that we are training and using. The values [64, 64, 32] specify that the model should use 3 *hidden* layers: the first with 64 nodes, the second with 64 nodes, and the third with 32 nodes. You can add another number to this list to add a fourth hidden layer, or change these numbers to adjust the size of the model. The more layers there are and the bigger the numbers, the more parameters the model has. In principle, a bigger model can learn more, but it takes longer to train. A bigger model may also find it harder to learn a simple task, for example when the model is far more complicated than the problem it is trying to solve. This is why there is an *art* to defining a model's *architecture* so that it actually solves the problem it is supposed to.
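
To get a feel for how *net_arch* changes the model size, the sketch below counts the weights and biases in a stack of fully connected hidden layers fed by the 3 observation values. The function name is made up for illustration. It counts only the hidden layers of a single network, so the real model built by stable_baselines3, which adds output layers and a critic, will have more parameters than this.

```python
def count_mlp_parameters(input_size, hidden_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    previous = input_size
    for size in hidden_sizes:
        total += previous * size + size  # weight matrix plus one bias per node
        previous = size
    return total


OBSERVATION_SIZE = 3  # ydiff, y_velocity, x_obstacle

for arch in ([64, 64, 32], [128, 64, 32], [128, 64, 32, 16]):
    print(arch, count_mlp_parameters(OBSERVATION_SIZE, arch))
```

Most of the parameters sit between the widest neighboring layers, so widening a layer usually grows the model faster than appending a small extra layer.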

In [None]:
from stable_baselines3 import PPO, A2C
import time

start_time = time.time()
# Create the environment
flappyenv = FlappyEnv(render_mode="human")

# Define the policy architecture
policy_kwargs = dict(
    net_arch=[64,64, 32],  # Adjust the architecture as needed
)

# Create the PPO model
model = PPO("MlpPolicy", flappyenv, policy_kwargs=policy_kwargs, learning_rate=2e-4, verbose=1,n_steps=512)
#model = A2C("MlpPolicy", flappyenv, policy_kwargs=policy_kwargs, learning_rate=3e-4,n_steps=512,verbose=1)
# Train the model
model.learn(total_timesteps=200000)

end_time = time.time()
elapsed_time = end_time - start_time
print(f"Training time: {elapsed_time:.2f} seconds")
# Save the fully trained PPO model (actor, critic, action)
model_name_ppo = "flappy_bird_rl_model.ppo"
model.save(model_name_ppo)
print(f"PPO model saved to: {model_name_ppo}")


## Checking and Converting the Model

First off, you should look at the output of the above training to see if the model learned anything. In this case, the further the model was able to make the bird go without a collision, the better it learned. Look at the last block of numbers in the training output and find the top number, *ep_len_mean*. This value is the average number of frames the AI was able to go without a collision over several recent games. Each game is called an *episode*, and its length is the number of frames (time steps) before a collision. For this game, if the player runs into the first obstacle, the episode will be 75 time steps long. So if this value is less than, say, 80, then the model really hasn't learned anything useful. A value above 200 is OK, and bigger values are better. If it looks like your model did not learn, then you can run the previous cell again.

The training cell saves the trained model to a file called *flappy_bird_rl_model.ppo*. This file actually holds 3 different AI models and some other info, but we really only need 2 of them to have the AI play the game. (The third model, known as the "critic", is only used to help speed up the training process.) We also need to convert these models to a form that is easier for JavaScript to use, since it does not understand the .ppo format. The following cell will copy the *policy* and *action* models we need from the *.ppo* file into two *ONNX* files. ONNX is a common format for storing AI models so they can be used in various languages.
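
The rule of thumb above can be written down as a small check. This is only a sketch: the function name and the wording of the messages are made up, and the text gives no verdict for values between 80 and 200, so that range is labeled as partial learning here as an assumption.

```python
FIRST_OBSTACLE_STEPS = 75  # episode length when the bird hits the first obstacle
NOT_LEARNED_BELOW = 80     # below this, the model has not learned anything useful
OK_ABOVE = 200             # above this, the model is doing OK


def judge_training(ep_len_mean):
    """Hypothetical helper: turn ep_len_mean into a short verdict."""
    if ep_len_mean < NOT_LEARNED_BELOW:
        return "not learned: the bird usually hits the first obstacle"
    if ep_len_mean > OK_ABOVE:
        return "ok: the bird usually survives past the first obstacle"
    # Assumption: the text does not classify this range
    return "partial: some learning, consider training again or for longer"


for value in (FIRST_OBSTACLE_STEPS, 120, 350):
    print(value, "->", judge_training(value))
```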

In [None]:
import torch

# Load PPO model on the CPU so that it matches the CPU tensors used for the ONNX export below
model = PPO.load(model_name_ppo, device="cpu")
print(f"Loaded {model_name_ppo}")

# Create the environment
flappyenv = FlappyEnv(render_mode="human")

# Save the policy model
model_name_policy = "flappy_bird_rl_policy_model.onnx"
policy_net = model.policy.mlp_extractor.policy_net
obs,_ = flappyenv.reset()
dummy_policy_input = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
torch.onnx.export(policy_net, dummy_policy_input,  model_name_policy)
print(f"policy model saved to: {model_name_policy}")

# Save the action model
model_name_action  = "flappy_bird_rl_action_model.onnx"
action_net = model.policy.action_net
dummy_action_input = policy_net(dummy_policy_input)
torch.onnx.export(action_net, dummy_action_input,  model_name_action)
print(f"action model saved to: {model_name_action}")
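
The two exported files form a two-stage pipeline: the policy model turns an observation into a list of features, and the action model turns those features into one score per action. The sketch below imitates that flow in plain Python with tiny random weights. Everything in it is illustrative: the layer sizes, weights, and function names are made up, the tanh activation is assumed to match the stable_baselines3 default, and picking the highest score is an assumption about how the game chooses its action.

```python
import math
import random


def linear(weights, biases, inputs):
    """One fully connected layer: a weighted sum plus a bias for each output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_layer(rng, n_out, n_in):
    """Hypothetical stand-in for trained weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
# Toy sizes: 3 observations -> 4 -> 4 features -> 2 action scores
policy_layers = [random_layer(rng, 4, 3), random_layer(rng, 4, 4)]
action_layer = random_layer(rng, 2, 4)


def policy_net(obs):
    """Stage 1: observation -> features (assumed tanh after each hidden layer)."""
    features = obs
    for weights, biases in policy_layers:
        features = [math.tanh(v) for v in linear(weights, biases, features)]
    return features


def action_net(features):
    """Stage 2: features -> one score per action (0 = no flap, 1 = flap)."""
    weights, biases = action_layer
    return linear(weights, biases, features)


def choose_action(obs):
    """Run both stages and pick the action with the highest score."""
    scores = action_net(policy_net(obs))
    return max(range(len(scores)), key=lambda i: scores[i])


obs = [12.0, -3.5, 200.0]  # made-up ydiff, y_velocity, x_obstacle
print("chosen action:", choose_action(obs))
```

This is also why the export cell feeds the output of `policy_net` into the `action_net` export: the second model expects features, not raw observations, as its input.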

### One more conversion ...

Getting the models into ONNX form is not quite enough. The JavaScript code that actually plays the game runs in your browser and not on the Google Colab computers. Thus, we need to copy the .onnx files to your local computer. This could be done in a lot of ways, but here we will encode them as long strings in *base64* format so they can be transferred directly from this notebook to your computer.

In [None]:
import base64

def load_onnx_base64(file_path):
    with open(file_path, "rb") as f:
        return base64.b64encode(f.read()).decode('utf-8').replace('\n', '').replace('\r', '')

policy_base64 = load_onnx_base64("flappy_bird_rl_policy_model.onnx")
action_base64 = load_onnx_base64("flappy_bird_rl_action_model.onnx")
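
As a quick illustration of what base64 does, the sketch below encodes some stand-in bytes (not a real model file), decodes them again, and checks that nothing was lost. It also shows that the encoded text is about one third longer than the raw bytes and contains no quote characters, which is what makes it safe to paste inside a quoted string.

```python
import base64

# Stand-in for the contents of an .onnx file (3072 arbitrary bytes)
fake_model_bytes = bytes(range(256)) * 12

# Encode to text, then decode back to bytes
encoded = base64.b64encode(fake_model_bytes).decode("ascii")
decoded = base64.b64decode(encoded)

print("raw bytes:    ", len(fake_model_bytes))
print("base64 chars: ", len(encoded))
print("round trip ok:", decoded == fake_model_bytes)
print("contains quotes:", "'" in encoded or '"' in encoded)
```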

## Testing the model

The following cell will load a different variation of the JavaScript game that was loaded at the top of this notebook. The only difference is that instead of checking whether the human pressed the space bar to flap, this version will load the AI models and use those to play the game.

It is interesting to note that the AI could play the game at a much faster rate than is being shown. The game actually spends a lot of time sleeping between frames so that it looks smooth to humans!

In [None]:
import random

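# Reuse the player token from the first game cell, or create one if that cell was not run
if "pt" not in globals() or not pt:
    pt = token + str(random.randint(100, 1000000))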

from IPython.display import Javascript, display, HTML

display(HTML(f'''
    <canvas id="flappyCanvas" width="512" height="256"></canvas>

    <script>
      var policyModelBase64 = '{policy_base64}';
      var actionModelBase64 = '{action_base64}';
      var aiplayerToken = "{pt}";
    </script>

    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
    <script src="https://jeffersonlab.github.io/AIOP-PIER/examples/Flappy_Bird_gymnasium/flappy_game_AI2.js"></script>
'''))

## Adjusting the model

At this point you may want to see if you can improve the model to play the game better. This could include:

- Retrain for more time steps (increase the value of *total_timesteps*).
- Adjust the model architecture by changing *net_arch* to add more layers or change the nodes per layer. For example, does a *deeper* model with more layers but fewer nodes per layer work better than a *fat* model that has fewer hidden layers and more nodes per layer?
- Continue training the current model for more steps. This would require splitting the cell above where the model is defined and trained so that you have a cell that starts with "model.learn(total_timesteps=200000)". Just running that cell will continue training the existing model object without recreating it from scratch. Do you think you can do it?