# How to Design an Atari Breakout RL Agent 🤖

## OpenAI Gym 💪 🧠
[Gym](https://www.gymlibrary.dev/) is an open-source Python library for developing and comparing reinforcement learning algorithms. Gym provides the environment, and it is up to us to implement the reinforcement learning algorithm. Today we will use it to create and run an instance of the Atari game Breakout. Gym gives us access to the game state, the game rewards, and the available actions, which, if you remember, are the necessary parts of our RL framework.

The code that we will use to build our Deep RL agent can be found [here](https://github.com/dmitryelj/data-science-tutorials/blob/master/ai_breakout_game.py). It is based on [this](https://wandb.ai/cleanrl/cleanrl.benchmark/runs/lqyi4g2g/files/code/cleanrl/ppo_atari_visual.py) demo, which has some really nice comparisons that can be seen [here](https://wandb.ai/cleanrl/cleanrl.benchmark/reports/Atari--VmlldzoxMTExNTI).

<img src="Media/Test.gif" width="200" align="center">

### The RL Framework for Breakout:
- **Action:** Move the paddle left and right 
- **State:** The 210x160 RGB image frame 
- **Reward:** Amount the game score increases 

Let's get started with Gym!

In [1]:
import gym 
import cv2
import time 
import imageio

We need to set the environment variable `ALE_PY_ROM_DIR` to the directory containing the ROM `.bin` files so that we can use the `ALE/` namespace.

In [2]:
from ale_py import ALEInterface
ale = ALEInterface()

from ale_py.roms import Breakout
ale.loadROM(Breakout)

A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
[Powered by Stella]
Game console created:
  ROM file:  /Users/justinvalentine/opt/anaconda3/envs/Test/lib/python3.9/site-packages/ale_py/roms/breakout.bin
  Cart Name: Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:  f34f08e5eb96e500e851a80be3277a56
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        2048
  Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random seed is 1667547198


## Creating the Atari Game Instance 

In [3]:
# create the Atari environment
env = gym.make("ALE/Breakout-v5")

# Try ME! (many more at https://www.gymlibrary.dev/environments/atari/complete_list/)
#env = gym.make("ALE/Asteroids-v5")
#env = gym.make("ALE/MsPacman-v5")
#env = gym.make("ALE/SpaceInvaders-v5")


# list to store image frames (would not use if not in notebook)
image_lst = []

# run for 5 episodes
for episode in range(5):
    
    # put the environment into its start state
    env.reset() 
    
    # run until the episode ends (terminated or truncated)
    done = False 
    
    while not done:
        
        # Agent chooses a random action
        action = env.action_space.sample()

        # Agent takes the action and gets the information from the environment 
        state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
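        # Create image frame (cv2 takes the size as (width, height); change it to rescale the GIF)
        img = cv2.resize(state, (160, 210), interpolation=cv2.INTER_CUBIC)
        # Gym already returns RGB frames, which is what imageio expects, so no color conversion is needed
        image_lst.append(img)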

        time.sleep(0.01)
        
# terminate the environment
env.close()

# Save gif 
imageio.mimsave('Media/gym-ex.gif', image_lst, fps=30)

A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
[Powered by Stella]


## Applying Reinforcement Learning to Breakout 
Recall that in reinforcement learning the main goal of our agent is to maximize its expected future reward. This means that at each time step the agent tries to select the best action given its current state.
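To make "expected future reward" concrete, here is a minimal sketch of how the future reward is commonly totalled up as a discounted return. The discount factor `gamma` and the function name `discounted_returns` are assumptions for illustration; they do not come from the tutorial's script.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Made-up episode: the ball breaks a brick on steps 2 and 4
episode_rewards = [0.0, 0.0, 1.0, 0.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))
# [0.3125, 0.625, 1.25, 0.5, 1.0]
```

A smaller `gamma` makes the agent care more about immediate rewards, while a value close to 1 makes distant rewards count almost as much as near ones.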

However, it is not enough for the agent to select the greedy action at every time step; the agent must also explore its other options. The agent then 'learns' which actions are good in which states and 'remembers' these results.
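One simple way to balance greedy choices with exploration is an epsilon-greedy rule, sketched below. This is an illustration only: the value estimates are made up, and the linked script is based on a PPO demo, where exploration typically comes from sampling actions from the policy's output distribution instead.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    # exploit: index of the highest estimated value
    return max(range(len(action_values)), key=lambda a: action_values[a])


# Hypothetical value estimates for Breakout's four actions (NOOP, FIRE, RIGHT, LEFT)
estimates = [0.1, 0.5, 0.2, 0.0]

rng = random.Random(0)
picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count(1) / len(picks))  # mostly the greedy action, with occasional exploration
```

With `epsilon=0` the agent never explores, and with `epsilon=1` it behaves like the random agent from the Gym example above.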

So how can we train the agent to play like a human, when all the agent can do is move the paddle and watch what happens? 🤔

### The Challenges
The first challenge we need to address is that each state of the game contains a lot of information! In this case, each frame of the game is a 210x160 RGB image! That means that the state of the game at any time step is encoded by 210x160=33,600 pixels, or 100,800 values once the three color channels are counted... This is too much. But how can we reduce the total amount of information while keeping the important features present? 🤔

### Deep Reinforcement Learning
RL + Neural Networks = Deep Reinforcement Learning 🤯

In their 2013 paper ([here](https://www.deepmind.com/publications/playing-atari-with-deep-reinforcement-learning)), DeepMind implemented several solutions for using neural networks to encode state information. Although our game image is aesthetically appealing to us humans, it is not well suited to the neural network, so we will need to modify it. Some of the methods used by DeepMind to pre-process the state information are outlined below. 👇

**Steps:**
1. Rescale the frame to an 84x84 grayscale image (84x84=7,056 values instead of 210x160x3=100,800, roughly a 14x reduction in state data)
2. We also need to encode the game dynamics, because a static image is not enough for the NN to know the direction of the ball. We can encode the dynamics into the state by stacking 4 successive frames
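The two steps above can be sketched with NumPy alone. This is a simplified stand-in, not the wrappers' actual code: it uses nearest-neighbour sampling where the real pipeline would normally call `cv2.resize`, it feeds in random arrays in place of real game frames, and the helper names `to_grayscale`, `downsample`, and `preprocess` are hypothetical.

```python
from collections import deque

import numpy as np


def to_grayscale(frame_rgb):
    """Collapse an (H, W, 3) RGB frame to (H, W) using standard luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (frame_rgb.astype(np.float32) @ weights).astype(np.uint8)


def downsample(gray, size=84):
    """Nearest-neighbour resize to (size, size); a stand-in for cv2.resize."""
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    return gray[np.ix_(rows, cols)]


def preprocess(frame_rgb):
    return downsample(to_grayscale(frame_rgb))


# Stand-in for real game frames: random 210x160 RGB images
rng = np.random.default_rng(0)
frames = deque(maxlen=4)  # only the 4 most recent frames are kept
for _ in range(6):
    raw = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
    frames.append(preprocess(raw))

state = np.stack(frames, axis=-1)
print(state.shape)  # (84, 84, 4)
print(raw.size / frames[-1].size)  # values per raw frame vs. per processed frame
```

The `deque` with `maxlen=4` drops the oldest frame automatically, so the stacked state always holds the most recent four frames.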

The full code used to make the environment optimized for training is presented below:

In [None]:
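def make_env(gym_id, seed, keep_rgb=False):
    # Returns a factory so that several identical environments can be created
    def make_func():
        env = gym.make(gym_id)
        env = NoopResetEnv(env, noop_max=30)   # start each episode with up to 30 random no-ops
        if keep_rgb:
            env = RGBSaveEnv(env)              # presumably keeps the original RGB frame for display
        env = MaxAndSkipEnv(env, skip=4)       # repeat each action for 4 frames
        env = RecordEpisodeStatistics(env)     # track episode return and length
        env = EpisodicLifeEnv(env)             # treat losing a life as the end of an episode
        env = FireResetEnv(env)                # press FIRE after reset to launch the ball
        env = WarpFrame(env)                   # convert to an 84x84 grayscale frame
        env = ClipRewardEnv(env)               # clip rewards to -1, 0, or +1
        env = FrameStack(env, 4)               # stack the last 4 frames
        env = ImageToPyTorch(env)              # move channels first for PyTorch

        # seed everything for reproducibility
        env.seed(seed)
        env.action_space.seed(seed)
        env.observation_space.seed(seed)
        return env
    return make_func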

Visual representation of what the NN is "seeing"

<img src="Media/Transformation.gif" width="300" align="center">
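Two of the wrappers in `make_env` are easy to picture in isolation: `MaxAndSkipEnv` and `ClipRewardEnv`. The following is a pure-Python sketch of the idea behind them, not the wrappers' real implementation. The toy `env_step` function and its two-pixel "frames" are made up for illustration.

```python
def step_with_skip(env_step, action, skip=4):
    """Repeat `action` for `skip` frames, sum the rewards, and return the
    pixel-wise max of the last two frames (which hides sprite flicker)."""
    total_reward = 0.0
    last_frames = []
    done = False
    for _ in range(skip):
        frame, reward, done = env_step(action)
        total_reward += reward
        last_frames = (last_frames + [frame])[-2:]  # keep the two newest frames
        if done:
            break
    max_frame = [max(pixels) for pixels in zip(*last_frames)]
    return max_frame, total_reward, done


def clip_reward(reward):
    """Reduce any reward to its sign: -1, 0, or +1."""
    return (reward > 0) - (reward < 0)


def make_toy_step():
    """Hypothetical environment: tiny 2-pixel frames, one reward of 7 on step 2."""
    counter = {"n": 0}

    def env_step(action):
        counter["n"] += 1
        n = counter["n"]
        frame = [n, 10 - n]
        reward = 7.0 if n == 2 else 0.0
        return frame, reward, False

    return env_step


frame, reward, done = step_with_skip(make_toy_step(), action=0)
print(frame, reward, clip_reward(reward))  # [4, 7] 7.0 1
```

Skipping frames means the agent makes a decision only every fourth frame, and clipping keeps the reward scale the same across games with very different scoring.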

## How Does The Code Work?  💭
After our agent selects an action, the Gym environment feeds us our 84x84x4 state. This data is then processed by 3 convolutional layers (which are good at processing images). These layers are followed by a hidden layer, which is "connected" to two outputs: the "actor" and the "critic". More info on the Actor-Critic method is available [here](https://keras.io/examples/rl/actor_critic_cartpole/). This is only a very high-level description of what is going on, and there are a lot of steps taken to improve the training of the NN. For more info, check out DeepMind's paper [here](https://www.deepmind.com/publications/playing-atari-with-deep-reinforcement-learning).
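To get a feel for what the three convolutional layers do to the 84x84x4 state, the sketch below traces how the feature-map size shrinks layer by layer. The kernel sizes, strides, and channel counts are assumed values that are commonly used for Atari agents. Check the linked script for the sizes it actually uses.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output width/height of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# (out_channels, kernel, stride): assumed values, not read from the script
conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

size, channels = 84, 4  # the stacked 84x84x4 state
for out_channels, kernel, stride in conv_layers:
    size = conv_out(size, kernel, stride)
    channels = out_channels
    print(f"{channels} feature maps of {size}x{size}")

flat_features = channels * size * size
print(flat_features)  # 3136 inputs to the hidden layer
```

Under these assumptions, the hidden layer receives 3,136 values. The actor head would then output one score per available action, and the critic head a single value estimate for the state.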

## How Do I Train My Own Model?
Make sure you have installed all the dependencies outlined in the README.md file, then paste the following into your terminal:

`python3 ai_breakout_game.py --num-envs=8 --total-timesteps=3000 --multithreading=True`

## How Do I Run My Model?
`python3 ai_breakout_game.py --run=1 --model-filename="ai_breakout_game3000.mdl"`