# Atari DRL Sample Code

## Training Parameters

To watch the Atari agent during training, or to experiment with the hyperparameters, you can configure the following training parameters:

* `gym_env`: The ID of the Atari environment to train on, such as `Pong-v0`. See https://gym.openai.com/envs/#atari for the available environments.
* `scaled_height`: The height, in pixels, to which each frame is scaled during preprocessing.
* `scaled_width`: The width, in pixels, to which each frame is scaled during preprocessing.
* `k_frames`: Controls how many frames are stacked to represent one state.
* `memory_size`: The maximum capacity of replay memory.
* `memory_alpha`: Specifies how strongly prioritization is applied when sampling from replay memory.
* `memory_beta`: Controls the strength of the importance sampling correction; this is the starting value of beta.
* `memory_beta_increment`: The increment by which beta is linearly annealed towards 1.
* `memory_eps`: A small constant added to each priority so that every experience has a non-zero probability of being drawn.
* `greedy_start`: The starting value for the exploration rate in the epsilon-greedy policy.
* `greedy_end`: The ending value for the exploration rate in the epsilon-greedy policy.
* `greedy_decay`: The decrement by which the exploration rate is linearly annealed towards `greedy_end`.
* `num_episodes`: The total number of episodes that the agent will train for.
* `max_timesteps`: The maximum number of states that the agent can experience for each episode.
* `discount`: The discount factor in the Q-learning algorithm.
* `batch_size`: The batch size used for training.
* `target_update`: The number of episodes between updates of the target network.
* `optim_lr`: The learning rate used in the Adam optimizer.
* `optim_eps`: The epsilon value used in the Adam optimizer.
* `render`: Determines if the Atari environment is rendered during training.
* `plot_reward`: Determines if the reward for each episode is plotted during training.
* `save_rewards`: Determines if the mean rewards for each episode are saved to disk.
* `save_model`: Determines if the target network's state dictionary will be saved to disk after training.
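
The two annealed parameters in the list above (the exploration rate and beta) follow the same pattern, which can be sketched as below. This is only an illustration: `linear_anneal` is a hypothetical helper, not part of the `atari` module, and the sketch assumes the decay is applied once per step, which the parameter descriptions do not specify.

```python
def linear_anneal(start, end, step_size, step):
    """Move linearly from `start` towards `end` by `step_size` per step, then hold at `end`."""
    if start >= end:
        return max(end, start - step_size * step)
    return min(end, start + step_size * step)


# Exploration rate: starts at greedy_start and decays towards greedy_end.
# Assumption: one decay step per call; the library may decay per episode instead.
greedy_start, greedy_end, greedy_decay = 1.0, 0.01, 1.5e-3
epsilons = [linear_anneal(greedy_start, greedy_end, greedy_decay, t) for t in range(1000)]

# Importance sampling exponent: starts at memory_beta and grows towards 1.
memory_beta, memory_beta_increment = 0.4, 1.5e-5
betas = [
    linear_anneal(memory_beta, 1.0, memory_beta_increment, t)
    for t in (0, 20_000, 40_000, 60_000)
]
```

With the values used in this notebook, and under the per-step assumption, the exploration rate would reach `greedy_end` after roughly 660 steps and beta would reach 1 after roughly 40,000 steps.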

### Import Statements

```python
%matplotlib inline
from atari import DQN, AtariAI
```

### Training with DQN without Prioritized Experience Replay

```python
gym_env = 'Pong-v0'
scaled_height = 84
scaled_width = 84
k_frames = 4
memory_size = 10000
greedy_start = 1.
greedy_end = 0.01
greedy_decay = 1.5e-3
num_episodes = 100
max_timesteps = 10000
discount = 0.99
batch_size = 32
target_update = 10
optim_lr = 2.5e-4
optim_eps = 1e-8
render = True
plot_reward = True
save_rewards = True
save_model = True

# The arguments are passed positionally, so their order matters.
AtariAI.train_DQN(gym_env, scaled_height, scaled_width, k_frames, memory_size,
                  greedy_start, greedy_end, greedy_decay, num_episodes,
                  max_timesteps, discount, batch_size, target_update, optim_lr,
                  optim_eps, render, plot_reward, save_rewards, save_model)
```
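
To make the role of `discount` and the target network more concrete, here is a minimal sketch of the one-step Q-learning target that DQN regresses towards. It is not taken from the `atari` module: `dqn_targets` and the sample batch are hypothetical, and plain Python lists stand in for the tensors a real implementation would use.

```python
def dqn_targets(rewards, next_q_values, dones, discount):
    """One-step Q-learning targets: r + discount * max_a Q_target(s', a).

    next_q_values[i] holds the target network's Q-values for the next state
    of transition i. Terminal transitions get no bootstrap term.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else discount * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# Hypothetical two-transition batch; the second transition ends the episode.
targets = dqn_targets(
    rewards=[0.0, 1.0],
    next_q_values=[[0.5, 2.0, 1.0], [3.0, 0.1, 0.2]],
    dones=[False, True],
    discount=0.99,
)
```

Because the Q-values for the next state come from the target network, which changes only every `target_update` episodes, the regression target stays fixed between updates.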

### Training with DDQN and Prioritized Experience Replay

```python
gym_env = 'Pong-v0'
scaled_height = 84
scaled_width = 84
k_frames = 4
memory_size = 50000
memory_alpha = 0.4
memory_beta = 0.4
memory_beta_increment = 1.5e-5
memory_eps = 1e-2
greedy_start = 1.
greedy_end = 0.01
greedy_decay = 1.5e-3
num_episodes = 100
max_timesteps = 10000
discount = 0.99
batch_size = 32
target_update = 10
optim_lr = 2.5e-4
optim_eps = 1e-8
render = True
plot_reward = True
save_rewards = True
save_model = True

# The arguments are passed positionally, so their order matters.
AtariAI.train_DDQN_PER(gym_env, scaled_height, scaled_width, k_frames, memory_size,
                       memory_alpha, memory_beta, memory_beta_increment, memory_eps,
                       greedy_start, greedy_end, greedy_decay, num_episodes,
                       max_timesteps, discount, batch_size, target_update, optim_lr,
                       optim_eps, render, plot_reward, save_rewards, save_model)
```
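
The sketch below illustrates how `memory_alpha`, `memory_beta`, and `memory_eps` typically enter prioritized experience replay, and how the double DQN target differs from the plain DQN target. It assumes the proportional variant of prioritization; whether `train_DDQN_PER` uses that variant is not stated here. All function names and sample values are hypothetical.

```python
import random


def double_dqn_target(reward, online_q_next, target_q_next, done, discount):
    """Double DQN: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(online_q_next)), key=online_q_next.__getitem__)
    return reward + discount * target_q_next[best_action]


def per_probabilities(td_errors, alpha, eps):
    """Sampling probabilities proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(err) + eps) ** alpha for err in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probabilities, beta):
    """Weights (N * P(i)) ** -beta, scaled so that the largest weight is 1."""
    n = len(probabilities)
    weights = [(n * p) ** -beta for p in probabilities]
    largest = max(weights)
    return [w / largest for w in weights]


# Hypothetical TD errors for three stored experiences.
td_errors = [0.0, 0.5, 2.0]
probs = per_probabilities(td_errors, alpha=0.4, eps=1e-2)
weights = importance_weights(probs, beta=0.4)

# Draw a small batch; experiences with larger TD errors are drawn more often.
batch_indices = random.choices(range(len(td_errors)), weights=probs, k=2)

# The online network prefers action 1, so the target network's value for action 1 is used.
target = double_dqn_target(
    1.0, online_q_next=[0.2, 0.9], target_q_next=[0.5, 0.3], done=False, discount=0.99
)
```

In this sketch, `eps` keeps the experience with zero TD error drawable, `alpha` of 0 would make sampling uniform, and `beta` of 1 would fully correct for the non-uniform sampling.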