# Chapter 7 DQN Extensions

## Categorical DQN

* [source code](https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter07/07_dqn_distrib.py)
* [Bellemare et al. 2017 Paper](https://arxiv.org/pdf/1707.06887v1.pdf)

### Key idea
* The idea is to estimate the Q value not as a scalar (an expected value) but, literally, as a probability distribution over the possible Q values.
* A useful review paper for reference is [Lowet et al. 2020](https://www.sciencedirect.com/science/article/pii/S0166223620301983).
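
To make the key idea concrete, here is a minimal sketch (not the notebook's own code) of how a categorical distribution over a fixed set of return values relates to an ordinary scalar Q value. It assumes NumPy instead of PyTorch, a random vector standing in for the network output, and a hypothetical helper named `expected_q`. The constants mirror the ones defined later in this notebook.

```python
import numpy as np

# Same values as the constants defined later in this notebook.
Vmin, Vmax, N_ATOMS = -10.0, 10.0, 51

# Fixed support: the N_ATOMS candidate return values (atoms), evenly spaced.
support = np.linspace(Vmin, Vmax, N_ATOMS)


def expected_q(logits):
    """Hypothetical helper: per-atom logits for one action -> (probs, scalar Q)."""
    shifted = logits - logits.max()      # subtract the max for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum()              # softmax over the atoms
    q = float((probs * support).sum())   # expectation of the distribution
    return probs, q


rng = np.random.default_rng(0)
logits = rng.normal(size=N_ATOMS)        # stand-in for one action's network output
probs, q = expected_q(logits)
```

The scalar Q value of classic DQN is recovered as the mean of the distribution, so greedy action selection can still compare one number per action while the network keeps the full shape of the return distribution.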

In [1]:
import gym
import ptan
import numpy as np
import argparse

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import tensorflow as tf

from lib import common

In [3]:
SAVE_STATES_IMG = False
SAVE_TRANSTIONS_IMG = False

if SAVE_STATES_IMG or SAVE_TRANSTIONS_IMG:  # apparently only needed when saving plots
    import matplotlib as mpl
    mpl.use("Agg")
    import matplotlib.pylab as plt  # note: pylab, not pyplot

In [4]:
Vmax = 10
Vmin = -10
N_ATOMS = 51  # number of atoms (bins) in the value distribution
DELTA_Z = (Vmax - Vmin) / (N_ATOMS - 1)  # width of each bin

STATES_TO_EVALUATE = 1000
EVAL_EVERY_FRAME = 100
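
As a quick sanity check on these constants, the sketch below builds the 51 atoms and confirms that neighbouring atoms are `DELTA_Z` apart. It uses `np.linspace`, which is an assumption made for illustration: the notebook itself builds the support with `torch.arange`. `linspace` takes the number of points directly, so the atom count does not depend on floating-point rounding of the step.

```python
import numpy as np

# Same values as the constants in the cell above.
Vmin, Vmax, N_ATOMS = -10, 10, 51
DELTA_Z = (Vmax - Vmin) / (N_ATOMS - 1)   # 20 / 50 = 0.4

# Evenly spaced atoms from Vmin to Vmax, both endpoints included.
support = np.linspace(Vmin, Vmax, N_ATOMS)

# Distance between neighbouring atoms; should match DELTA_Z everywhere.
spacing = np.diff(support)
```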

In [None]:
class DistributionalDQN(nn.Module):
    def __init__(self, input_shape, n_actions):
        super(DistributionalDQN, self).__init__()
        
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), 
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU()
        )

        conv_out_size = self._get_conv_out(input_shape)  # input_shape (rather than conv) is passed here; the reason is clear from the _get_conv_out function
        self.fc = nn.Sequential(
            nn.Linear(conv_out_size, 512),
            nn.ReLU(),
            nn.Linear(512, n_actions * N_ATOMS)
        )

        self.register_buffer("supports", torch.arange(Vmin, Vmax + DELTA_Z, DELTA_Z))
