## Python Packages Needed
- **Gym** by OpenAI is a library that provides a standard reinforcement learning environment interface (train models, define reward functions and inputs, and visualize performance).
- **Stable-Baselines** is a library of popular reinforcement learning algorithms (Deep Q-Network, Advantage Actor-Critic, Proximal Policy Optimization, etc.).
- **Gym-Retro** by OpenAI is a library that extends that environment interface to hundreds of retro games, provided you have the game ROM (train models in those game environments, define reward functions, and integrate games not already added).

## Environment Initial Setup
I use Google Colab for training because you can change the runtime and run on the NVIDIA Tesla GPUs that Google provides. Go to *Runtime* in the Colab menu, select *Change runtime type*, and set the hardware accelerator to **GPU**.

In [0]:
!pip list #you will see that stable-baselines, gym, and tensorflow are already installed
#pip install stable-baselines
#pip install gym

In [1]:
!pip install gym-retro #only install needed in colab

Collecting gym-retro
  Downloading https://files.pythonhosted.org/packages/67/2b/bee76fbe439a8a600854fb41fafcfad7efa57d1f3107bbca48ac4a1387cd/gym_retro-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (162.0MB)
     |████████████████████████████████| 162.0MB 260kB/s
Installing collected packages: gym-retro
Successfully installed gym-retro-0.7.0


In [2]:
from google.colab import drive #using Google Drive for my storage File system since using Google Colab to train
drive.mount('/content/drive') #you should see a drive folder appear in the left sidebar

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


## Imports Needed
- Gym
- Numpy
- Gym-Retro (as retro)
- Stable-Baselines (specific models needed -> https://stable-baselines.readthedocs.io/en/master/)

***You can also use a self-built model or other models! Then you wouldn't need to import Stable-Baselines.***

In [0]:
import gym 
import numpy as np
import retro
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines import A2C, PPO2

In [4]:
!python -m retro.import.sega_classics

Steam Username: phoenixtechnerd
Steam Password (leave blank if cached): 
Steam Guard code: 4VB45
Downloading games...
ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
Installing games...
Importing SonicTheHedgehog-Genesis
Importing BioHazardBattle-Genesis
Importing SuperThunderBlade-Genesis
Importing AlienSoldier-Genesis
Importing GoldenAxe-Genesis
Importing StreetsOfRage2-Genesis
Importing SonicAndKnuckles3-Genesis
Importing ShadowDancerTheSecretOfShinobi-Genesis
Importing ColumnsIII-Genesis
Importing ComixZone-Genesis
Importing GainGround-Genesis
Importing CrackDown-Genesis
Importing RevengeOfShinobi-Genesis
Impor

## Making the Action Space More Discrete
For *Gym-Retro*, the standard action array (the buttons the agent can choose to press) has a length of 12 and is defined as follows:
- ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]

To reduce training time, I recommend shrinking the action space to fit the specific game. Here is an example from Sonic that keeps only the useful moves, with sub-arrays for buttons that can be pressed together.

In [0]:
class SonicDiscretizer(gym.ActionWrapper):
    """
    Wrap a gym-retro environment and make it use discrete
    actions for the Sonic game.
    """
    def __init__(self, env):
        super(SonicDiscretizer, self).__init__(env)
        buttons = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]
        actions = [['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'], ['DOWN'],['DOWN', 'B'], ['B']]
        self._actions = []
        for action in actions:
            arr = np.array([False] * 12)
            for button in action:
                arr[buttons.index(button)] = True
            self._actions.append(arr)
        self.action_space = gym.spaces.Discrete(len(self._actions))

    def action(self, a):
        return self._actions[a].copy()
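
To make the mapping concrete, here is a minimal sketch in plain Python (no Gym or NumPy needed) of the table the wrapper builds: each discrete action index becomes a 12-element on/off button array. The helper name `build_action_table` is hypothetical and exists only for this illustration.

```python
# Button order copied from the wrapper above.
BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]

# The same seven button combinations used in SonicDiscretizer.
COMBOS = [["LEFT"], ["RIGHT"], ["LEFT", "DOWN"], ["RIGHT", "DOWN"], ["DOWN"], ["DOWN", "B"], ["B"]]


def build_action_table(buttons, combos):
    """Hypothetical helper: turn each button combination into a boolean mask."""
    table = []
    for combo in combos:
        mask = [False] * len(buttons)
        for button in combo:
            mask[buttons.index(button)] = True
        table.append(mask)
    return table


action_table = build_action_table(BUTTONS, COMBOS)

# Discrete action 5 is "DOWN" + "B", so exactly those two slots are True.
pressed = [name for name, on in zip(BUTTONS, action_table[5]) if on]
print(len(action_table), pressed)
```

The agent then picks one of 7 indices instead of searching over the 2^12 = 4096 raw on/off combinations of 12 buttons.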

## Creating Your First Retro Environment

To see if a particular retro game is supported, visit https://github.com/openai/retro/tree/master/retro/data/stable


***Each game's folder contains 4 primary files along with the different level/game states:***
- **scenario.json** (where you define your reward function and done condition)
- **data.json** (which defines the memory addresses of the game variables you can use in the scenario.json file)
- **metadata.json** (which provides a default state; you probably won't touch it much)
- **rom.sha** (the hash of the ROM being used, which must correspond to the game ROM you have)
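
As a rough illustration of the **rom.sha** check, the sketch below compares a file's SHA-1 digest with the text of a `rom.sha` file. It assumes the file holds a single SHA-1 hex digest of the raw ROM bytes; Gym-Retro's own importer may normalize the ROM before hashing, so treat this as a sanity check rather than the library's exact logic. The function name `rom_matches` and the stand-in data are hypothetical.

```python
import hashlib


def rom_matches(rom_bytes, sha_file_text):
    """Return True if the SHA-1 of rom_bytes equals the digest stored in rom.sha."""
    expected = sha_file_text.strip().lower()
    actual = hashlib.sha1(rom_bytes).hexdigest()
    return actual == expected


# Stand-in data: a real check would read the ROM file and the game's rom.sha instead.
fake_rom = b"not a real rom"
fake_sha_file = hashlib.sha1(fake_rom).hexdigest() + "\n"

print(rom_matches(fake_rom, fake_sha_file))            # True
print(rom_matches(b"different bytes", fake_sha_file))  # False
```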

In [0]:
env = retro.make(game='SonicTheHedgehog2-Genesis', state='EmeraldHillZone.Act1', scenario='/content/updatedScenario.json')
env = DummyVecEnv([lambda: SonicDiscretizer(env)]) #note that the discretizer-wrapped environment is wrapped in the vectorized environment. The discretizer is only needed if you discretize the action space

## Creating Your Own Scenario File

You may have noticed in the above *retro.make* function that the keyword argument *scenario=* has a file passed in. This is how you update your reward function using the available variables in the **data.json** file. 

If you are using Google Colab, you can upload this file into the notebook and it will be at */content/filename*.

```
{
  "done": {
    "variables": {
      "lives": {
        "op": "zero"
      }
    }
  },
  "reward": {
    "variables": {
      "score": {
        "reward": 3
      },
      "rings": {
        "reward": 50.0
      }
    }
  }
}
```
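
To get a feel for what this file asks for, here is a simplified model of how the reward and done condition could be evaluated each step. It assumes rewards are computed from the per-step increase of each variable multiplied by its `reward` weight, and that `"op": "zero"` ends the episode when the variable reaches 0. That is my reading of the defaults, not Gym-Retro's actual implementation, and the helper names `step_reward` and `is_done` are hypothetical.

```python
import json

SCENARIO_JSON = """
{
  "done": {"variables": {"lives": {"op": "zero"}}},
  "reward": {"variables": {"score": {"reward": 3}, "rings": {"reward": 50.0}}}
}
"""


def step_reward(scenario, previous, current):
    """Simplified model: weight * increase of each rewarded variable since the last step."""
    total = 0.0
    for name, spec in scenario["reward"]["variables"].items():
        delta = current[name] - previous[name]
        if delta > 0:  # assumption: decreases earn nothing unless a penalty is configured
            total += spec["reward"] * delta
    return total


def is_done(scenario, current):
    """Simplified model: the episode ends when a variable with op "zero" reaches 0."""
    return any(
        spec.get("op") == "zero" and current[name] == 0
        for name, spec in scenario["done"]["variables"].items()
    )


scenario = json.loads(SCENARIO_JSON)
before = {"score": 100, "rings": 2, "lives": 3}
after = {"score": 110, "rings": 3, "lives": 3}

print(step_reward(scenario, before, after))  # 3 * 10 + 50.0 * 1 = 80.0
print(is_done(scenario, after))              # False
```

With these weights, a single ring is worth as much as roughly 17 points of score, which nudges the agent toward collecting rings.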



## Using Stable-Baselines Models 

In this demo I am using Proximal Policy Optimization (https://openai.com/blog/openai-baselines-ppo/), but other Stable-Baselines models such as A2C or DQN can be used.


***Also, you can use other models or your own models in a similar way!***

In [0]:
model_id = 'sonic2ppo2dis'
model = PPO2(CnnPolicy, env, n_steps=2048, verbose=1) #define the specific model; CnnPolicy means the screen pixels are the input
model.learn(total_timesteps=500000) #total timesteps to train the model. On Google Colab you can do about 1 to 1.5 million at a time
model.save('/content/drive/My Drive/Sonic2-Discrete/' + model_id) #save the model as a pkl file on the Drive mounted at /content/drive, to either load and continue training or use the trained model

## Loading A Model to Continue Training, Update the Reward Function, or Change State

Given the pkl file that Stable-Baselines saves, you can load the model with a different reward function or game state, or just train longer on Google Colab (about 1 to 1.5 million timesteps before you use up the allocated memory).

In [0]:
env = retro.RetroEnv(game='SonicAndKnuckles3-Genesis', state='AngelIslandZone.Act1', record='/content/drive/My Drive/Sonic2/') #paths assume Drive is mounted at /content/drive as above
env = DummyVecEnv([lambda: SonicDiscretizer(env)])
model = PPO2.load('/content/drive/My Drive/Sonic2Speedrun/sonic2Speedrun_TransferGreedyContest_7-3mil', env)

In [0]:
model_id = 'sonic3Speedrun_TransferGreedyContest_3mil'
model.learn(total_timesteps=1000000)
model.save('/content/drive/My Drive/Sonic3/' + model_id)

## Predicting with Your Model

Reinforcement learning algorithms balance exploration and exploitation. During learning, exploration is critical, while the current best policy is still used to exploit. When you are predicting, the model is just exploiting and is not exploring the environment to find other, possibly better, outcomes.
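
The difference can be sketched in plain Python: exploiting means always taking the action the policy rates highest, while exploring means sampling from the policy's probabilities. The probability values and function names below are made up for illustration and are not taken from a trained model.

```python
import random


def greedy_action(probabilities):
    """Exploit: always take the action the policy rates highest."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def sampled_action(probabilities, rng):
    """Explore: draw an action at random, weighted by the policy's probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


# Made-up policy output over the 7 discrete Sonic actions (illustrative numbers only).
policy_probs = [0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.10]
rng = random.Random(0)

print(greedy_action(policy_probs))  # always 1, the most probable action
print({sampled_action(policy_probs, rng) for _ in range(200)})  # usually several different actions
```

Stable-Baselines exposes this choice through the `deterministic` argument of `model.predict`; it is worth checking its default in the version you have installed.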

***If you are predicting, make sure the environment was created with a path for the record parameter so you can visualize the model afterwards.***

In [0]:
obs = env.reset()
done = False

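while not done:
    action, _states = model.predict(obs) #predicting with the model
    obs, rewards, dones, info = env.step(action) #variables under the hood
    done = dones[0] #the vectorized env returns one done flag per environment; without this the loop never ends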

## What Does My Model See

- ***Obs***: the input observation (in this case, the screen itself as pixel values)
- ***Rewards***: the intermittent rewards fed to the agent, as defined in the scenario file
- ***Dones***: whether the done condition (also defined in the scenario file) was met
- ***Info***: all variables defined in data.json and their current values
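
Because the environment is vectorized, each of these values comes back as a batch with one entry per environment. The sketch below uses a hypothetical stand-in class (`FakeVecEnv`, not a real Gym or Stable-Baselines class) to show how the four values are unpacked and how the first done flag ends the loop.

```python
class FakeVecEnv:
    """Hypothetical stand-in for a one-environment vectorized env; not a real Gym class."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [[0, 0, 0]]  # batch containing one fake "screen"

    def step(self, actions):
        self.t += 1
        obs = [[self.t] * 3]
        rewards = [1.0]                               # one reward per environment
        dones = [self.t >= self.episode_length]       # one done flag per environment
        infos = [{"score": 10 * self.t, "lives": 3}]  # one info dict per environment
        return obs, rewards, dones, infos


fake_env = FakeVecEnv(episode_length=4)
obs = fake_env.reset()
total_reward, done = 0.0, False

while not done:
    obs, rewards, dones, infos = fake_env.step([0])  # always send action 0
    total_reward += rewards[0]
    done = dones[0]

print(total_reward, infos[0])
```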

## Sweet Now I Got a BK2 File??
To convert your bk2 file (a recording of the actions or buttons that were selected) to a more viewable format like mp4, use the playback script that ships with Retro:

In [0]:
!python -m retro.scripts.playback_movie /content/drive/My\ Drive/SOR3_Dis/StreetsOfRage3-Genesis-1Player.Axel.Stage1-000001.bk2 #notice the escape character on My\ Drive
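
If you would rather not escape spaces by hand, one option is to let Python quote the path for you. This is a small sketch using the standard library's `shlex`; the recording path below is hypothetical.

```python
import shlex

# Hypothetical recording path with a space in it, as on a mounted Google Drive.
movie_path = "/content/drive/My Drive/recordings/example-000001.bk2"

# shlex.quote wraps the path so the shell treats it as a single argument.
command = "python -m retro.scripts.playback_movie " + shlex.quote(movie_path)
print(command)

# Splitting the command the way a shell would gives the path back in one piece.
print(shlex.split(command)[-1] == movie_path)
```

In a notebook cell, `!{command}` should then run the quoted command.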