# OpenAI Gym
[Gym](https://www.gymlibrary.dev/) is an open-source Python library for developing and testing reinforcement learning algorithms. Today we will use it to create and run an instance of the Atari game Breakout. Gym gives us access to the game state, the game rewards, and the available actions, which, as you may remember, are the necessary parts of our RL framework.

<img src="Media/Test.gif" width="200" align="center">

### The RL Framework for Breakout:
- **Action:** Move the paddle left and right 
- **State:** The 210x160 RGB image frame 
- **Reward:** Amount the game score increases 


### Packages

In [1]:
import gym 
import cv2
import time 
import imageio

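We need to set the environment variable `ALE_PY_ROM_DIR` to the directory containing the ROM `.bin` files so that we can use environments in the `ALE/` namespace. The cell below loads the Breakout ROM to check that it is available.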

In [2]:
from ale_py import ALEInterface
ale = ALEInterface()

from ale_py.roms import Breakout
ale.loadROM(Breakout)

A.L.E: Arcade Learning Environment (version 0.8.0+919230b)
[Powered by Stella]
Game console created:
  ROM file:  /Users/justinvalentine/opt/anaconda3/envs/Test/lib/python3.9/site-packages/ale_py/roms/breakout.bin
  Cart Name: Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:  f34f08e5eb96e500e851a80be3277a56
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        2048
  Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random seed is 1667373141


## Creating the Atari Breakout Instance 

In [8]:
env = gym.make("ALE/Breakout-v5")  # creates a game instance of Atari Breakout

env.reset()
image_lst = []

step_num, total_reward = 0, 0

while step_num < 100:
    # Sample a random action from the action space
    action = env.action_space.sample()

    # Get the next state and reward after taking the action
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

    # The state is already a 210x160 RGB array, so store it directly as a GIF frame
    # (converting with COLOR_BGR2RGB would swap the red and blue channels)
    image_lst.append(state)

    # Start a new game if the episode has ended
    if terminated or truncated:
        env.reset()

    time.sleep(0.01)
    step_num += 1

env.close()

imageio.mimsave('Media/random-action.gif', image_lst, fps=30)

<img src="Media/random-action.gif" width="300" align="center">

## Applying Reinforcement Learning to Breakout 
Recall that in reinforcement learning, the main goal of our agent is to maximize its expected future reward. This means that at each time step, the agent tries to select the best action given its current state.
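
One common way to make "future reward" concrete is the discounted return, where a reward received `k` steps in the future is weighted by `gamma**k`. The sketch below is illustrative only: the discount factor `gamma`, the helper name `discounted_return`, and the example rewards are assumptions and are not defined elsewhere in this notebook.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_k gamma**k * rewards[k] for a list of per-step rewards."""
    total = 0.0
    # Work backwards so each step is a single multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards from four consecutive steps (one brick hit on step 3).
example_rewards = [0.0, 0.0, 1.0, 0.0]
print(discounted_return(example_rewards, gamma=0.5))
```

With `gamma` below 1, a reward that arrives sooner counts for more than the same reward arriving later.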

However, it is not enough for the agent to select the greedy action at every time step; it must also explore its other options. The agent then 'learns' which actions are good in which states and 'remembers' these results.
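
A simple way to balance greedy choices with exploration is an epsilon-greedy rule: with a small probability the agent picks a random action, otherwise it picks the action it currently believes is best. This is a minimal sketch of that idea, not necessarily the strategy used later; the function name `epsilon_greedy` and the value estimates are hypothetical.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        # Explore: ignore the estimates and try any action.
        return rng.randrange(len(action_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(action_values)), key=lambda a: action_values[a])


# Hypothetical value estimates for four actions in a single state.
estimates = [0.1, 0.0, 0.7, 0.3]
print(epsilon_greedy(estimates, epsilon=0.0))  # epsilon = 0 means always greedy
print(epsilon_greedy(estimates, epsilon=0.1))  # greedy most of the time
```

Setting `epsilon` to 0 recovers the purely greedy agent, while setting it to 1 gives the fully random agent from the previous section.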

So how can we train the agent to play like a human, when all the agent can do is move the paddle and watch what happens?

### The Challenges
The first challenge we need to address is that each state of the game contains a lot of information. Each frame of the game is a 210x160 RGB image, so the state at any time step is encoded by 210x160 = 33,600 pixels, each with three color channels, for a total of 100,800 values. This is too much. But how can we reduce the total amount of information while keeping the important features?

### Deep Reinforcement Learning
Well, we can look to the [2013 paper by DeepMind](https://www.deepmind.com/publications/playing-atari-with-deep-reinforcement-learning), whose solution was to use neural networks to encode state information.

Our game instance looks good, and a human could learn to play from it, but it is not suitable as input to the neural network, so we will need to modify it.

**Steps:**
1. Rescale the frame to an 84x84 grayscale image (roughly a 14x reduction in state data, from 210x160x3 = 100,800 values to 84x84 = 7,056)
2. Encode the game dynamics, because a single static image is not enough for the NN to know the direction of the ball. We can do this by stacking 4 successive frames into one state
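
The two steps above can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: the helper `preprocess` is hypothetical, grayscale is approximated by averaging the color channels, the resize uses nearest-neighbour sampling (a real pipeline would more likely use an image routine such as `cv2.resize`), and the input frames are random pixels standing in for real game frames.

```python
from collections import deque

import numpy as np


def preprocess(frame, size=84):
    """Convert a (210, 160, 3) uint8 RGB frame to a (size, size) grayscale frame."""
    # Average the colour channels to get one intensity value per pixel.
    gray = frame.mean(axis=2)
    # Nearest-neighbour downsampling: pick `size` evenly spaced rows and columns.
    rows = np.linspace(0, gray.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size).astype(int)
    return gray[np.ix_(rows, cols)].astype(np.uint8)


# Stand-in for four consecutive game frames (random pixels, not real gameplay).
rng = np.random.default_rng(0)
raw_frames = [
    rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8) for _ in range(4)
]

# Keep only the 4 most recent processed frames; older ones drop off automatically.
stack = deque(maxlen=4)
for raw in raw_frames:
    stack.append(preprocess(raw))

# Combine the frames into one array of shape (frames, height, width).
state = np.stack(list(stack), axis=0)
print(raw_frames[0].size, "->", stack[0].size)  # values per frame before and after
print(state.shape)
```

Because the stacked state holds several consecutive frames, the difference between them carries the motion information (such as the ball's direction) that a single frame lacks.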

The full code used to prepare the environment for training is presented below:

In [None]:
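# The wrapper classes below must be imported or defined before this cell will run.
# The comments describe the usual behaviour of Atari wrappers with these names.
env = gym.make("BreakoutNoFrameskip-v4")
env = NoopResetEnv(env, noop_max=30)  # take up to 30 random no-op steps after each reset
env = MaxAndSkipEnv(env, skip=4)      # repeat each action for 4 frames
env = RecordEpisodeStatistics(env)    # record episode returns and lengths
env = EpisodicLifeEnv(env)            # treat losing a life as the end of an episode
env = FireResetEnv(env)               # press FIRE after reset to launch the ball
env = WarpFrame(env)                  # rescale frames to 84x84 grayscale
env = ClipRewardEnv(env)              # clip rewards to -1, 0, or +1
env = FrameStack(env, 4)              # stack the 4 most recent frames
env = ImageToPyTorch(env)             # presumably reorders axes to PyTorch's channels-first layout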
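
To make two of the wrappers above more concrete, here is a rough sketch of the ideas behind action repetition (`MaxAndSkipEnv`) and reward clipping (`ClipRewardEnv`). It is not the wrappers' actual code: `CountingEnv`, `skip_step`, and `clip_reward` are hypothetical names, the toy environment returns a simplified `(obs, reward, done)` tuple rather than Gym's five values, and the max over recent frames suggested by the wrapper's name is left out.

```python
class CountingEnv:
    """Hypothetical toy environment: the observation is a step counter and
    every third step yields a reward of 7."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def step(self, action):
        self.t += 1
        reward = 7.0 if self.t % 3 == 0 else 0.0
        done = self.t >= self.episode_length
        return self.t, reward, done


def skip_step(env, action, skip=4):
    """Repeat `action` for up to `skip` steps and sum the rewards."""
    total_reward, obs, done = 0.0, None, False
    for _ in range(skip):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            # Stop repeating as soon as the episode ends.
            break
    return obs, total_reward, done


def clip_reward(reward):
    """Map a reward to -1.0, 0.0, or 1.0 according to its sign."""
    return float((reward > 0) - (reward < 0))


toy_env = CountingEnv()
obs, reward, done = skip_step(toy_env, action=0)
print(obs, reward, clip_reward(reward), done)
```

Repeating actions means the agent makes a decision only every few frames, and clipping keeps rewards on the same scale regardless of how many points a brick is worth.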

<img src="Media/Transformation.gif" width="300" align="center">