# Custom model tutorial

In this notebook, you will learn how to replace the default network model with your own customized model by following the five steps below.

(0. Preparation of this notebook)
1. Setting up the training environment
2. Build customized Q-function model for training
3. Create a ModelBuilder
4. Set up the DQN algorithm
5. Run the training

## Preparation

Let's start by installing nnabla-rl and importing the packages required for training.

In [None]:
!pip install nnabla-rl

In [None]:
import gym
import nnabla as nn
from nnabla import functions as NF
from nnabla import parametric_functions as NPF

import nnabla_rl
import nnabla_rl.algorithms as A
import nnabla_rl.functions as RF
import nnabla_rl.writers as W
from nnabla_rl.builders import ModelBuilder, SolverBuilder
from nnabla_rl.models.q_function import DiscreteQFunction
from nnabla_rl.environments.wrappers import NumpyFloat32Env, ScreenRenderEnv
from nnabla_rl.utils.evaluator import EpisodicEvaluator
from nnabla_rl.utils.reproductions import set_global_seed

In [None]:
!bash package_install.sh

In [None]:
%run ./colab_utils.py

In [None]:
nn.clear_parameters()

## Setting up the training environment

Set up the "MountainCar" environment provided by OpenAI Gym.

In [None]:
def build_env(env_name):
    env = gym.make(env_name)
    env = NumpyFloat32Env(env)
    env = ScreenRenderEnv(env)  # for rendering screen
    env.seed(0)  # optional
    return env
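
The wrapper chain in `build_env` can be pictured with a small stand-in. The sketch below is not the nnabla-rl implementation: `ToyEnv` and `Float32Wrapper` are hypothetical names, and it only assumes that a wrapper such as `NumpyFloat32Env` casts observations and rewards to 32-bit floats before they reach the network.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for a gym environment that emits float64 data."""

    def reset(self):
        return np.array([-0.5, 0.0], dtype=np.float64)

    def step(self, action):
        obs = np.array([-0.49, 0.01], dtype=np.float64)
        reward, done, info = -1.0, False, {}
        return obs, reward, done, info


class Float32Wrapper:
    """Illustrative wrapper: forwards calls and casts outputs to float32."""

    def __init__(self, env):
        self._env = env

    def reset(self):
        return np.asarray(self._env.reset(), dtype=np.float32)

    def step(self, action):
        obs, reward, done, info = self._env.step(action)
        return np.asarray(obs, dtype=np.float32), np.float32(reward), done, info


wrapped_env = Float32Wrapper(ToyEnv())
first_obs = wrapped_env.reset()
next_obs, reward, done, info = wrapped_env.step(0)
```

Each wrapper takes an environment and returns an object with the same interface, which is why `build_env` can stack them one after another.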

In [None]:
env_name = "MountainCar-v0"
env = build_env(env_name)
set_global_seed(0) # optional

## Build customized Q-function model for training

Let's prepare a customized network model for training on "MountainCar".  
The DQN algorithm used in this notebook trains a Q-function model,  
so we will implement a customized Q-function model here.  
Implementing a Q-function is easy!

In [None]:
class MountainCarQFunction(DiscreteQFunction):
    def __init__(self, scope_name: str, n_action: int):
        super(MountainCarQFunction, self).__init__(scope_name)
        self._n_action = n_action
    
    def all_q(self, s: nn.Variable) -> nn.Variable:
        with nn.parameter_scope(self.scope_name):
            h = NF.relu(NPF.affine(s, 50, name="affine-1"))
            h = NF.relu(NPF.affine(h, 50, name="affine-2"))
            q = NPF.affine(h, self._n_action, name="pred-q")
        return q
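
To see what `all_q` computes, here is a minimal NumPy sketch of the same architecture: two hidden layers of 50 ReLU units followed by a linear layer with one output per action. It is an illustration only; the randomly initialized `params` dictionary is a hypothetical stand-in for the parameters nnabla would create, and the sizes assume MountainCar-v0 (2 state values, 3 discrete actions).

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, hidden_dim, n_action = 2, 50, 3  # assumed MountainCar-v0 sizes


def affine(x, w, b):
    return x @ w + b


def relu(x):
    return np.maximum(x, 0.0)


# Random weights stand in for the trainable parameters (illustration only).
params = {
    "affine-1": (rng.normal(size=(state_dim, hidden_dim)), np.zeros(hidden_dim)),
    "affine-2": (rng.normal(size=(hidden_dim, hidden_dim)), np.zeros(hidden_dim)),
    "pred-q": (rng.normal(size=(hidden_dim, n_action)), np.zeros(n_action)),
}


def all_q(s):
    h = relu(affine(s, *params["affine-1"]))
    h = relu(affine(h, *params["affine-2"]))
    return affine(h, *params["pred-q"])  # no activation on the output layer


batch = np.array([[-0.5, 0.0], [0.3, 0.02]])  # shape: (batch_size, state_dim)
q_values = all_q(batch)                       # shape: (batch_size, n_action)
greedy_actions = q_values.argmax(axis=1)      # best action index per state
```

The output has one row per input state and one column per action, so the greedy action is simply the column with the largest value.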

## Create a ModelBuilder

To use your customized model, you'll need to create a ModelBuilder.  

In [None]:
class MountainCarQFunctionBuilder(ModelBuilder[DiscreteQFunction]):
    def build_model(self, scope_name, env_info, algorithm_params, **kwargs):
        return MountainCarQFunction(scope_name, env_info.action_dim)
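
The builder defers model construction until the algorithm knows the environment details. The sketch below illustrates that hand-off with plain Python; `FakeEnvInfo`, `FakeQFunction`, `FakeQFunctionBuilder`, `algorithm_setup`, and the scope name `"q"` are all hypothetical stand-ins, not nnabla-rl classes or internals.

```python
from dataclasses import dataclass


@dataclass
class FakeEnvInfo:
    """Hypothetical stand-in for the environment info passed to the builder."""
    action_dim: int


class FakeQFunction:
    """Hypothetical stand-in for a Q-function model."""

    def __init__(self, scope_name, n_action):
        self.scope_name = scope_name
        self.n_action = n_action


class FakeQFunctionBuilder:
    """Mirrors the build_model signature used in this notebook."""

    def build_model(self, scope_name, env_info, algorithm_params, **kwargs):
        return FakeQFunction(scope_name, env_info.action_dim)


def algorithm_setup(builder, env_info):
    # The algorithm, not the user, decides the scope name and supplies env_info.
    return builder.build_model("q", env_info, algorithm_params=None)


model = algorithm_setup(FakeQFunctionBuilder(), FakeEnvInfo(action_dim=3))
```

Passing a builder instead of a ready-made model lets the algorithm choose when to create the network and which arguments to create it with.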

## Set up the DQN algorithm

We are almost ready to start the training. Finally, let's set up the DQN algorithm.

In [None]:
config = A.DQNConfig(
    gpu_id=0,
    gamma=0.9,
    learning_rate=1e-3,
    batch_size=32,
    learner_update_frequency=1,
    target_update_frequency=200,
    start_timesteps=200,
    replay_buffer_size=10000,
    max_explore_steps=10000,
    initial_epsilon=1.0,
    final_epsilon=0.001,
    test_epsilon=0.05,
    grad_clip=None
)
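
To get a feel for the exploration settings, the sketch below computes an epsilon schedule from `initial_epsilon`, `final_epsilon`, and `max_explore_steps`. It assumes a linear decay that is held constant afterwards; the `linear_epsilon` helper is hypothetical, and the schedule used inside the library may differ in detail.

```python
def linear_epsilon(step, initial_epsilon=1.0, final_epsilon=0.001, max_explore_steps=10000):
    """Linearly interpolate epsilon, then hold it at final_epsilon (assumed schedule)."""
    fraction = min(step / max_explore_steps, 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


eps_start = linear_epsilon(0)      # fully random actions at the beginning
eps_half = linear_epsilon(5000)    # halfway through the exploration phase
eps_end = linear_epsilon(10000)    # exploration phase finished
eps_after = linear_epsilon(50000)  # stays at the final value afterwards
```

Under this assumption, the agent acts almost entirely at random early on and almost entirely greedily after the first 10000 steps of the 100000-iteration run.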

In [None]:
dqn = A.DQN(
    env,
    config=config,
    q_func_builder=MountainCarQFunctionBuilder()  # pass the builder to use the customized model
)
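
For context on what the Q-function is trained towards, here is a minimal NumPy sketch of the standard DQN target, `r + gamma * max_a Q_target(s', a)`, with `gamma=0.9` as in the config above. This is the textbook form rather than a copy of the library's internals; `td_target` and the sample numbers are made up for illustration.

```python
import numpy as np


def td_target(reward, next_q_values, non_terminal, gamma=0.9):
    """Standard DQN target; non_terminal is 0.0 where the episode ended."""
    return reward + gamma * non_terminal * next_q_values.max(axis=1)


rewards = np.array([-1.0, -1.0])
next_q = np.array([[0.5, 1.0, -0.2],   # target-network Q-values for s'
                   [2.0, 0.0, 1.0]])
non_terminal = np.array([1.0, 0.0])    # second transition ended the episode
targets = td_target(rewards, next_q, non_terminal)
```

In this scheme, `target_update_frequency` controls how often the target network that produces `next_q` is refreshed from the trained network.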

## Hook (optional)

We will append a RenderHook to the algorithm to visually check the training progress. This step is optional.
Note that this hook may slow down the training.

In [None]:
render_hook = RenderHook(env=env)

In [None]:
dqn.set_hooks([render_hook])

## Run the training

The training takes about 10-20 minutes.  
After that, you should see the cart reach the flag at the top of the mountain (not in every trial; some runs do not get there).

In [None]:
try:
    dqn.train(env, total_iterations=100000)
finally:
    env.close()