<a href="https://colab.research.google.com/github/RoyElkabetz/DQN_with_PyTorch_and_Gym/blob/main/dueling_ddqn_main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementing DQN Algorithms with PyTorch and Gym

## Clone the DQN repository

In [2]:
# Run this cell only when working in Google Colab
# clone the Git repository
!git clone https://github.com/RoyElkabetz/DQN_with_PyTorch_and_Gym.git

# add path to .py files for import
import sys
sys.path.insert(1, "/content/DQN_with_PyTorch_and_Gym")

Cloning into 'DQN_with_PyTorch_and_Gym'...
remote: Enumerating objects: 380, done.
remote: Counting objects: 100% (380/380), done.
remote: Compressing objects: 100% (313/313), done.
remote: Total 380 (delta 206), reused 193 (delta 63), pack-reused 0
Receiving objects: 100% (380/380), 37.06 MiB | 27.95 MiB/s, done.
Resolving deltas: 100% (206/206), done.
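
The clone and path setup above only make sense inside Colab. A minimal sketch of guarding the path change so the same notebook can also run locally is shown below. It assumes that Colab has already imported the `google.colab` package into the kernel, which is a commonly used detection idiom rather than a documented guarantee; `running_in_colab` and `REPO_DIR` are hypothetical names introduced for illustration.

```python
import sys

# Hypothetical constant: the directory the clone command above creates in Colab.
REPO_DIR = "/content/DQN_with_PyTorch_and_Gym"


def running_in_colab():
    """Return True if the google.colab package is already loaded (assumed idiom)."""
    return "google.colab" in sys.modules


# Only touch sys.path when running in Colab, and avoid adding duplicates.
if running_in_colab() and REPO_DIR not in sys.path:
    sys.path.insert(1, REPO_DIR)

print("Running in Colab:", running_in_colab())
```

Outside Colab the guard leaves `sys.path` untouched, so imports such as `from agents import DuelingDDQNAgent` would need the repository on the path by other means.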


## Mount your Google Drive to save results and checkpoints

In [3]:
# Mount Google Drive (Colab only)
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## Check GPU availability

In [4]:
# check GPU parameters
!nvidia-smi

Fri Aug 27 19:31:24 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
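
`nvidia-smi` reports what hardware the runtime has, but it does not tell you whether PyTorch can use it. The sketch below is one way to check this from Python. It relies on `torch.cuda.is_available()`, falls back to the CPU when PyTorch is not installed, and uses a hypothetical helper name, `pick_device`. How the agent classes in the repository choose their device is not shown in this notebook, so this is only a sanity check, not a description of their behavior.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing; nothing can run on the GPU through it.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("PyTorch device:", pick_device())
```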

## Install Gym and Atari libraries

In [5]:
!pip install --quiet gym
!pip install --quiet atari_py

## Fetch and install ROMs for Atari library

In [6]:
! wget http://www.atarimania.com/roms/Roms.rar
! mkdir /content/ROM/
! unrar e /content/Roms.rar /content/ROM/
! python -m atari_py.import_roms /content/ROM/

--2021-08-27 19:31:38--  http://www.atarimania.com/roms/Roms.rar
Resolving www.atarimania.com (www.atarimania.com)... 195.154.81.199
Connecting to www.atarimania.com (www.atarimania.com)|195.154.81.199|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11128004 (11M) [application/x-rar-compressed]
Saving to: ‘Roms.rar’


2021-08-27 19:31:56 (614 KB/s) - ‘Roms.rar’ saved [11128004/11128004]


UNRAR 5.50 freeware      Copyright (c) 1993-2017 Alexander Roshal


Extracting from /content/Roms.rar

Extracting  /content/ROM/HC ROMS.zip                                      36%  OK 
Extracting  /content/ROM/ROMS.zip                                         74% 99%  OK 
All OK
copying adventure.bin from HC ROMS/BY ALPHABET (PAL)/A-G/Adventure (PAL).bin to /usr/local/lib/python3.7/dist-packages/atari_py/atari_roms/adventure.bin
copying air_raid.bin from HC ROMS/BY ALPHABET (PAL)/A-G/Air Raid (PAL).bin to /usr/local/lib/python3.7/dist-packages/atari_py/at

## Train a DQN-based agent to play an Atari game

In [10]:
import numpy as np
import gym
from agents import DuelingDDQNAgent
from utils import plot_learning_curve, make_env

record = False
load_checkpoint = False
env_name = 'SpaceInvadersNoFrameskip-v4'
algo = 'DuelingDDQNAgent'

env = make_env(env_name)
if record:
    env = gym.wrappers.Monitor(env, "recording", force=True)
best_score = -np.inf
n_games = 2000
agent = DuelingDDQNAgent(gamma=0.99, epsilon=1.0, lr=1e-4,
                          input_dims=(env.observation_space.shape),
                          n_actions=env.action_space.n,
                          mem_size=20000,
                          eps_min=0.01,
                          batch_size=32,
                          replace=1000,
                          eps_dec=1e-6,
                          chkpt_dir='gdrive/MyDrive/Checkpoints/',
                          algo=algo,
                          env_name=env_name)
if load_checkpoint:
    agent.load_models()

fname = agent.algo + '_' + agent.env_name + '_lr_' + str(agent.lr) + '_' + str(n_games) + '_games'
figure_file = 'gdrive/MyDrive/Checkpoints/' + fname + '.png'

n_steps = 0
scores, eps_history, steps_array = [], [], []

for i in range(n_games):
    done = False
    score = 0
    observation = env.reset()

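    while not done:
        # select an action from the current observation
        action = agent.choose_action(observation)
        # advance the environment by one step
        observation_, reward, done, info = env.step(action)
        score += reward

        # store experience and train only when not evaluating a loaded checkpoint
        if not load_checkpoint:
            agent.store_transition(observation, action, reward, observation_, int(done))
            agent.learn()

        # the next observation becomes the current one
        observation = observation_
        n_steps += 1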
    scores.append(score)
    steps_array.append(n_steps)

    avg_score = np.mean(scores[-100:])
    print('episode ', i, 'score: ', score, 'average score %.1f best score %.1f epsilon %.2f' %
          (avg_score, best_score, agent.epsilon), 'steps ', n_steps)

    if avg_score > best_score:
        if not load_checkpoint:
            agent.save_models()
        best_score = avg_score

    eps_history.append(agent.epsilon)
env.close()

  q_next[dones] = 0.0


episode  0 score:  210.0 average score 210.0 best score -inf epsilon 1.00 steps  705
... saving checkpoint ...
... saving checkpoint ...
episode  1 score:  110.0 average score 160.0 best score 210.0 epsilon 1.00 steps  1242
episode  2 score:  80.0 average score 133.3 best score 210.0 epsilon 1.00 steps  1571
episode  3 score:  580.0 average score 245.0 best score 210.0 epsilon 1.00 steps  2422
... saving checkpoint ...
... saving checkpoint ...
episode  4 score:  360.0 average score 268.0 best score 245.0 epsilon 1.00 steps  3363
... saving checkpoint ...
... saving checkpoint ...
episode  5 score:  15.0 average score 225.8 best score 268.0 epsilon 1.00 steps  3781
episode  6 score:  35.0 average score 198.6 best score 268.0 epsilon 1.00 steps  4186
episode  7 score:  175.0 average score 195.6 best score 268.0 epsilon 1.00 steps  4897
episode  8 score:  50.0 average score 179.4 best score 268.0 epsilon 0.99 steps  5200
episode  9 score:  110.0 average score 172.5 best score 268.0 epsil

KeyboardInterrupt: ignored
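
The log above shows epsilon printed as 1.00 up to step 4897 and as 0.99 at step 5200. That is consistent with a linear decay of `eps_dec=1e-6` per step from `epsilon=1.0` down to `eps_min=0.01`, although the agent's actual schedule lives in the repository and is not shown in this notebook. The sketch below assumes such a linear schedule with one decay per environment step; `epsilon_after` is a hypothetical helper, not part of the repository.

```python
def epsilon_after(steps, eps_start=1.0, eps_min=0.01, eps_dec=1e-6):
    """Epsilon after `steps` decay updates under an assumed linear schedule."""
    return max(eps_min, eps_start - eps_dec * steps)


# Steps needed to reach eps_min under this assumption.
steps_to_min = round((1.0 - 0.01) / 1e-6)
print("steps to reach eps_min:", steps_to_min)

# Epsilon at two step counts from the log, and far past the end of the decay.
for steps in (4897, 5200, 2_000_000):
    print(steps, "%.2f" % epsilon_after(steps))
```

Under this assumption the agent keeps exploring almost at random for the first several hundred thousand steps, so the low scores in the first ten episodes say little about how well it will eventually learn.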

## Plot and save learning curve

In [None]:
plot_learning_curve(steps_array, scores, eps_history, figure_file)
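
The training loop reports `avg_score = np.mean(scores[-100:])`, a trailing average over at most the last 100 episodes. The body of `plot_learning_curve` is not shown in this notebook, so the sketch below only illustrates that trailing average in plain Python, using the first five scores from the log above; `running_average` is a hypothetical helper.

```python
def running_average(scores, window=100):
    """Trailing mean of `scores`, using at most the last `window` entries at each point."""
    averages = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# First five episode scores taken from the training log.
scores = [210.0, 110.0, 80.0, 580.0, 360.0]
print([round(avg, 1) for avg in running_average(scores)])
```

The rounded values match the `average score` column printed for episodes 0 to 4.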