# 1. Load and Test the Environment

<em>Load the base environment, the Super Mario Bros video game, and test that it works.</em>

Note: The <a href="https://pypi.org/project/gym-super-mario-bros/"><u>Super Mario Bros environment</u></a> was created by Christian Kauten using OpenAI Gym and the Nes-py emulator. <a href="https://gym.openai.com/"><u>OpenAI Gym</u></a> is a popular framework that aims to standardize environments for reinforcement learning. It features many pre-built environments but also allows for custom environments. <a href="https://pypi.org/project/nes-py/"><u>Nes-py</u></a> is an NES emulator designed for custom OpenAI Gym environments.

In [None]:
# Install the environment
!pip install gym-super-mario-bros

Note: If you receive an error during installation on Windows, install <a href="https://visualstudio.microsoft.com/vs/community/"><u>Visual Studio</u></a> with the "Desktop development with C++" workload (Nes-py compiles a C++ extension) and try again.

In [None]:
# Import the environment
import gym_super_mario_bros

In [None]:
# Create the base environment
env = gym_super_mario_bros.make('SuperMarioBros-v3')

In [None]:
# Test the base environment
done = True # Start with 'done' set to True so the game is reset before the first step

for step in range(2000): # Run the game for 2,000 frames
    if done:
        state = env.reset() # Start (or restart) the game
    state, reward, done, info = env.step(env.action_space.sample()) # Take a random action
    env.render() # Display the game

env.close() # Close the game
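The test loop uses the classic Gym API, where `reset()` returns only the observation and `step()` returns four values. Gym 0.26 and later changed this: `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. If the test cell fails with an unpacking error, small adapters like the hypothetical `unpack_step` and `unpack_reset` below (a sketch, not part of Gym or gym-super-mario-bros) can normalize both shapes:

```python
def unpack_step(result):
    """Normalize a Gym step() result to the classic (obs, reward, done, info) tuple.

    Hypothetical helper: accepts either the classic 4-tuple or the
    Gym >= 0.26 5-tuple (obs, reward, terminated, truncated, info).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # The episode is over if it ended naturally or was cut off by a time limit
        return obs, reward, terminated or truncated, info
    obs, reward, done, info = result
    return obs, reward, done, info


def unpack_reset(result):
    """Return only the observation from reset(), whichever API is in use."""
    # Gym >= 0.26 returns (obs, info); the classic API returns obs alone
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result
```

In the test loop, these would be used as `state = unpack_reset(env.reset())` and `state, reward, done, info = unpack_step(env.step(action))`.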

# 2. Preprocess the Environment

<em>Preprocess the base environment so the AI agent can train effectively.</em>

Note: In its current state, the base environment is complex and hard to learn in. Simplifying the environment enables the AI agent to train effectively. This step serves the same purpose as preprocessing data for supervised or unsupervised learning. In reinforcement learning, the data comes from the environment, so the environment itself must be preprocessed.

In [None]:
# Install Nes-py for controller support
!pip install nes-py

In [None]:
# Install Stable Baselines3 for reinforcement learning algorithms
!pip install stable-baselines3[extra]

In [None]:
# Import the Joypad wrapper to emulate an NES controller
from nes_py.wrappers import JoypadSpace
# Import simplified movement so the AI agent has fewer actions to take
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
# Import grayscaling to remove color
from gym.wrappers import GrayScaleObservation
# Import Vectorization wrappers to improve training performance
from stable_baselines3.common.vec_env import VecFrameStack, DummyVecEnv

# Import MatplotLib to demonstrate the preprocessing changes
import matplotlib.pyplot as plt
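# Work around the "OMP: Error #15" crash that can occur when plotting, which is likely caused by
# two copies of the OpenMP runtime being loaded (e.g. by PyTorch and NumPy); this allows the duplicate to load
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'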

### A. Simplify Actions

In [None]:
# Show the actions the AI Agent can take
env.action_space

In [None]:
# Simplify the actions the AI Agent can take
env = JoypadSpace(env, SIMPLE_MOVEMENT)

In [None]:
# Show the actions the AI agent can take after simplifying the actions
env.action_space

Note: Notice how the action space went from Discrete(256) to Discrete(7). Originally, the AI agent could choose from 256 actions (every combination of the NES controller's 8 buttons), but now it can only choose from 7.
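Why 256? Each NES controller state fits in one byte, with one bit per button, so there are 2^8 = 256 possible button combinations. The sketch below assumes the bit layout that nes-py's JoypadSpace uses internally and reproduces the contents of SIMPLE_MOVEMENT, to show how each of the 7 simplified actions maps to one controller byte. The names `BUTTON_BITS` and `encode` are illustrative, not part of either library.

```python
# Assumed bit layout of an NES controller byte (one bit per button),
# mirroring the mapping used by nes-py's JoypadSpace
BUTTON_BITS = {
    "right": 0b10000000,
    "left": 0b01000000,
    "down": 0b00100000,
    "up": 0b00010000,
    "start": 0b00001000,
    "select": 0b00000100,
    "B": 0b00000010,
    "A": 0b00000001,
    "NOOP": 0b00000000,
}

# Copy of gym_super_mario_bros.actions.SIMPLE_MOVEMENT (7 button combinations)
SIMPLE_MOVEMENT = [
    ["NOOP"],
    ["right"],
    ["right", "A"],
    ["right", "B"],
    ["right", "A", "B"],
    ["A"],
    ["left"],
]

def encode(buttons):
    """OR together the bits of every pressed button into one controller byte."""
    byte = 0
    for button in buttons:
        byte |= BUTTON_BITS[button]
    return byte

# Eight real buttons, each pressed or not: 2**8 = 256 raw actions
n_buttons = len([b for b in BUTTON_BITS if b != "NOOP"])
print(2 ** n_buttons)  # 256

# With JoypadSpace, action i presses the buttons in SIMPLE_MOVEMENT[i]
action_to_byte = [encode(combo) for combo in SIMPLE_MOVEMENT]
print(action_to_byte)  # [0, 128, 129, 130, 131, 1, 64]
```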

### B. Grayscale

In [None]:
# Show what the environment currently looks like and the state shape
state = env.reset()
plt.imshow(state)
state.shape

Note: If you receive an "access violation" error, restart the kernel, create the environment again, but do NOT test the environment again.

In [None]:
# Grayscale the environment
env = GrayScaleObservation(env, keep_dim=True)

In [None]:
# Show what the environment looks like and the state shape after grayscaling
state = env.reset()
plt.imshow(state.squeeze(), cmap='gray') # Drop the single channel axis and display in gray
state.shape

Note: Notice how the colors change. Also, notice how the state shape shrinks from (240, 256, 3) to (240, 256, 1). Originally, there were 3 color channels (RGB); now there is only 1 (gray).
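In the Gym versions this notebook targets, GrayScaleObservation converts each frame with OpenCV's RGB-to-gray conversion, which is a weighted sum of the three color channels. The NumPy sketch below approximates that conversion with the standard ITU-R BT.601 luminance weights (an assumption about OpenCV's internals; OpenCV uses fixed-point arithmetic, so values can differ by one) and shows where the (240, 256, 1) shape comes from. `to_grayscale` is a hypothetical helper.

```python
import numpy as np

# ITU-R BT.601 luminance weights for R, G, B
RGB_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(frame, keep_dim=True):
    """Convert an (H, W, 3) uint8 RGB frame to grayscale."""
    gray = np.rint(frame.astype(np.float64) @ RGB_WEIGHTS).astype(np.uint8)
    # keep_dim=True keeps a trailing channel axis: (H, W) -> (H, W, 1)
    return gray[..., np.newaxis] if keep_dim else gray

# A dummy frame with the NES screen size used by the Mario environment
frame = np.zeros((240, 256, 3), dtype=np.uint8)
frame[0, 0] = [255, 255, 255]  # one white pixel
gray = to_grayscale(frame)
print(frame.shape, "->", gray.shape)  # (240, 256, 3) -> (240, 256, 1)
print(gray[0, 0, 0])  # 255
```

Keeping the channel axis (keep_dim=True) matters here because the frame-stacking step stacks frames along that last axis.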

### C. Vectorize

In [None]:
# Show the state shape
state = env.reset()
state.shape

In [None]:
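# Vectorize the environment: DummyVecEnv wraps the single environment in Stable Baselines3's vectorized interface
env = DummyVecEnv([lambda: env])
# Stack the 4 most recent frames along the last (channel) axis
env = VecFrameStack(env, 4, channels_order='last')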

In [None]:
# Show the state shape after vectorizing
state = env.reset()
state.shape

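Note: Notice how the state shape changes from (240, 256, 1) to (1, 240, 256, 4). DummyVecEnv adds a leading dimension for the number of environments (here, 1), and VecFrameStack stacks the 4 most recent frames along the last axis. A single frame cannot show which way Mario or an enemy is moving, so seeing 4 frames at a time lets the AI agent infer motion, which makes training more effective.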
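To make the stacking concrete, here is a simplified sketch of the idea behind VecFrameStack with channels_order='last'. It is not Stable Baselines3's code: the hypothetical `FrameStack` class zero-fills the stack on reset, then on each step shifts older frames toward the front and writes the newest frame into the last channel.

```python
import numpy as np

class FrameStack:
    """Simplified sketch of stacking the last n frames along the channel axis."""

    def __init__(self, n_stack, frame_shape):
        h, w, c = frame_shape
        self.c = c
        # Leading 1 = the "number of environments" dimension added by DummyVecEnv
        self.stacked = np.zeros((1, h, w, c * n_stack), dtype=np.uint8)

    def reset(self, frame):
        self.stacked[...] = 0
        self.stacked[0, ..., -self.c:] = frame  # newest frame goes last
        return self.stacked.copy()

    def step(self, frame):
        # Shift older frames toward the front, dropping the oldest one
        self.stacked = np.roll(self.stacked, shift=-self.c, axis=-1)
        self.stacked[0, ..., -self.c:] = frame
        return self.stacked.copy()

stack = FrameStack(n_stack=4, frame_shape=(240, 256, 1))
obs = stack.reset(np.full((240, 256, 1), 1, dtype=np.uint8))
for t in range(2, 5):
    obs = stack.step(np.full((240, 256, 1), t, dtype=np.uint8))
print(obs.shape)     # (1, 240, 256, 4)
print(obs[0, 0, 0])  # [1 2 3 4], oldest to newest
```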

# 3. Train the AI Agent

<em>Train the AI Agent using a complex model that implements the <a href="https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html"><u>PPO learning algorithm</u></a> and the <a href="https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html?highlight=cnnpolicy#stable_baselines3.ppo.CnnPolicy"><u>Convolutional Neural Network policy</u></a> from <a href="https://stable-baselines3.readthedocs.io/en/master/"><u>Stable Baselines3</u></a>.</em>

Note: PPO, or Proximal Policy Optimization, is a popular reinforcement learning algorithm introduced in 2017. It is a policy gradient algorithm, meaning it directly optimizes its policy (discussed below). The authors of the <a href="https://arxiv.org/abs/1707.06347"><u>Proximal Policy Optimization Algorithms</u></a> paper report that PPO strikes a favorable balance between sample complexity, simplicity, and wall-clock time. They found that PPO performs comparably to or better than other popular algorithms, such as <a href="https://arxiv.org/abs/1611.01224"><u>ACER</u></a> and <a href="https://arxiv.org/abs/1502.05477"><u>TRPO</u></a>.
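The core of PPO is its clipped surrogate objective. For each sampled step, the ratio r = π_new(a|s) / π_old(a|s) is clipped to [1 - ε, 1 + ε], and the smaller of the clipped and unclipped terms (each multiplied by the advantage estimate A) is kept, so a single update cannot push the policy too far. The NumPy sketch below computes this objective for a few made-up numbers; it illustrates the formula from the paper and is not Stable Baselines3's implementation (whose PPO defaults to clip_range=0.2).

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective from the PPO paper (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s) for each sampled step
    advantage : estimated advantage A_t for each sampled step
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # The minimum removes any incentive to move the ratio outside [1 - eps, 1 + eps]
    return np.mean(np.minimum(unclipped, clipped))

ratio = np.array([1.5, 0.5, 1.0])
advantage = np.array([1.0, -1.0, 2.0])
objective = ppo_clip_objective(ratio, advantage)
print(objective)  # ≈ 0.8; the loss minimized during training is its negative
```

In the first step, the ratio 1.5 is clipped to 1.2, so the agent gets no extra credit for moving further in a good direction; in the second, the ratio 0.5 is clipped to 0.8 but the minimum keeps the more pessimistic value of -0.8.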

Note: In Stable Baselines3, the policy is the neural network that maps observations to actions (for PPO, it also estimates how good each state is). Which policy to use depends on what the observations look like. CnnPolicy works well when observations are images, like the Super Mario Bros frames. MlpPolicy, on the other hand, works well when observations are flat vectors of numbers, such as positions and velocities. CnnPolicy, or Convolutional Neural Network policy, first passes each observation through a convolutional neural network, a feed-forward network whose convolutional layers are specialized for analyzing images. By default, Stable Baselines3 uses the "Nature CNN" architecture from DeepMind's DQN paper: three convolutional layers followed by a fully connected layer with 512 outputs.
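A quick way to see what the CNN does to a stacked Mario observation is to track the feature map size through each layer. The sketch below assumes Stable Baselines3's default NatureCNN layers (32 filters of 8x8 with stride 4, 64 of 4x4 with stride 2, 64 of 3x3 with stride 1, no padding) and a channels-first (4, 240, 256) input, which is what Stable Baselines3 feeds the network after transposing the channels-last observation. `conv_out` is an illustrative helper.

```python
def conv_out(size, kernel, stride):
    """Output length of a convolution with no padding."""
    return (size - kernel) // stride + 1

# (out_channels, kernel_size, stride) of the three conv layers in NatureCNN
NATURE_CNN_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

channels, height, width = 4, 240, 256  # 4 stacked grayscale frames
for out_channels, kernel, stride in NATURE_CNN_LAYERS:
    height = conv_out(height, kernel, stride)
    width = conv_out(width, kernel, stride)
    channels = out_channels
    print(f"conv {kernel}x{kernel}/{stride}: {channels} x {height} x {width}")

n_flatten = channels * height * width
print("flattened features:", n_flatten)  # 46592, mapped to 512 by the fully connected layer
```

The printed shapes are 32 x 59 x 63, 64 x 28 x 30, and 64 x 26 x 28. Most of the network's parameters sit in the final fully connected layer, because Mario frames are much larger than the 84x84 Atari frames NatureCNN was designed for.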

In [None]:
# Install a CUDA-enabled build of PyTorch so training can run on an NVIDIA GPU (-y skips the confirmation prompt)
!conda install -y pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

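Note: This step is optional. Stable Baselines3 already installs PyTorch as a dependency; this command installs a CUDA-enabled build, which only helps on NVIDIA GPUs that support <a href="https://developer.nvidia.com/cuda-downloads"><u>CUDA</u></a>. Without such a GPU, training still runs on the CPU, just more slowly.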

In [None]:
# Import PPO to use for the learning algorithm
from stable_baselines3 import PPO
# Import a callback for saving models
from stable_baselines3.common.callbacks import CheckpointCallback

In [None]:
# Define the folder where checkpoints are saved
save_path = './Saved Models/'

In [None]:
# Setup callback for saving the model
callback = CheckpointCallback(save_freq = 25000, save_path = save_path, name_prefix = 'MarioAI')
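With a single environment, each env.step() call is one timestep, so this callback saves the model every 25,000 timesteps. Assuming Stable Baselines3's checkpoint naming pattern (name prefix, then timestep count, then "_steps.zip") and that training reaches the full 2,000,000 timesteps, the sketch below lists the files that should appear in './Saved Models/'.

```python
save_freq = 25_000
total_timesteps = 2_000_000
name_prefix = "MarioAI"

# One environment: a checkpoint is written every save_freq timesteps
checkpoint_steps = range(save_freq, total_timesteps + 1, save_freq)
checkpoint_files = [f"{name_prefix}_{steps}_steps.zip" for steps in checkpoint_steps]

print(len(checkpoint_files))  # 80
print(checkpoint_files[0])    # MarioAI_25000_steps.zip
print(checkpoint_files[-1])   # MarioAI_2000000_steps.zip
```

Each checkpoint is a full copy of the model, so 80 of them can take noticeable disk space; a larger save_freq trades finer snapshots for fewer files.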

In [None]:
# Create the model
model = PPO('CnnPolicy', env, verbose=1)

In [None]:
# Train the model for 2,000,000 timesteps
model.learn(total_timesteps=2000000, callback=callback)

Note: If you receive an "access violation" error, restart the kernel, create and preprocess the environment again, but do NOT test the environment again.

# 4. Evaluate the AI Agent

<em>Load a model and watch the AI Agent play.</em>

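Note: The steps below load and run a pre-trained AI agent, 'Control_6000000_steps', saved after 6,000,000 timesteps. To watch a model you trained above instead, load one of the 'MarioAI_..._steps' checkpoints from './Saved Models/', for example 'MarioAI_2000000_steps'.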

In [None]:
# Load the model
model = PPO.load('./Saved Models/Control_6000000_steps', env=env)
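To load your own most recent checkpoint instead of the pre-trained one, you need the file with the highest timestep count. The hypothetical helper below finds it, assuming checkpoints are named '<prefix>_<timesteps>_steps.zip' as CheckpointCallback saves them. It compares the step counts as numbers because sorting names alphabetically would rank 'MarioAI_975000_steps' above 'MarioAI_2000000_steps'.

```python
import os
import re

def latest_checkpoint(save_path, name_prefix="MarioAI"):
    """Return the path of the checkpoint with the most timesteps, or None.

    Hypothetical helper: assumes files named '<name_prefix>_<timesteps>_steps.zip'.
    """
    pattern = re.compile(rf"^{re.escape(name_prefix)}_(\d+)_steps\.zip$")
    best_steps, best_path = -1, None
    for filename in os.listdir(save_path):
        match = pattern.match(filename)
        if match and int(match.group(1)) > best_steps:
            best_steps = int(match.group(1))
            best_path = os.path.join(save_path, filename)
    return best_path
```

Usage in the notebook would be `model = PPO.load(latest_checkpoint('./Saved Models/'), env=env)`.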

In [None]:
# Start the game
state = env.reset()
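# Loop through the game until the kernel is interrupted
while True:
    action, _ = model.predict(state) # Choose an action from the 4 stacked frames
    state, reward, done, info = env.step(action) # The vectorized env restarts automatically when an episode ends
    env.render()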

Note: If you receive an "access violation" error, restart the kernel, create and preprocess the environment again, but do NOT test the environment again.

Note: To stop the game, interrupt the kernel while the cell above is running (click the square Stop button in the Jupyter toolbar), then run the next cell to close the game.

In [None]:
# Close the game
env.close()