# ReplayBuffer tutorial

In this notebook, you will learn how to replace the default replay buffer with a different replay buffer implementation by following the four steps below.

(0. Preparation of this notebook)
1. Set up the training environment
2. Create a ReplayBufferBuilder
3. Set up the DDPG algorithm
4. Run the training

## Preparation

Let's start by installing nnabla-rl and importing the packages required for training.

In [None]:
!pip install nnabla-rl

In [None]:
import gym
import nnabla as nn
from nnabla import functions as NF
from nnabla import parametric_functions as NPF
from nnabla import solvers as NS

import nnabla_rl
import nnabla_rl.algorithms as A
import nnabla_rl.writers as W
import nnabla_rl.functions as RF
import nnabla_rl.replay_buffers as RB
from nnabla_rl.builders import ReplayBufferBuilder
from nnabla_rl.environments.environment_info import EnvironmentInfo
from nnabla_rl.environments.wrappers import NumpyFloat32Env, ScreenRenderEnv
from nnabla_rl.replay_buffer import ReplayBuffer
from nnabla_rl.utils.reproductions import set_global_seed

In [None]:
!bash package_install.sh

In [None]:
%run ./colab_utils.py

In [None]:
nn.clear_parameters()

## Setting up the training environment

Set up the "Pendulum" environment provided by OpenAI Gym.

In [None]:
def build_env(env_name):
    env = gym.make(env_name)
    env = NumpyFloat32Env(env)
    env = ScreenRenderEnv(env)  # for rendering environment
    env.seed(0)
    return env

In [None]:
env_name = "Pendulum-v0"
env = build_env(env_name)
set_global_seed(0)

## Create a ReplayBufferBuilder

The default replay buffer used in the DDPG algorithm samples each experience uniformly at random.  
We will replace it with PrioritizedReplayBuffer, which samples experiences according to their priority (importance).
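To make the difference concrete, here is a minimal NumPy sketch of proportional prioritized sampling as used in prioritized experience replay: an experience with priority p_i is drawn with probability p_i^alpha / sum_k p_k^alpha, and importance-sampling weights (N * P(i))^(-beta), normalized by their maximum, compensate for the non-uniform sampling. This only illustrates the idea and is not nnabla-rl's implementation; the function name `sample_proportional` and the default `alpha`/`beta` values are assumptions.

```python
import numpy as np


def sample_proportional(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Hypothetical helper: sample indices with probability proportional to priority**alpha.

    Returns the sampled indices and their importance-sampling weights,
    normalized so that the largest weight in the batch is 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    priorities = np.asarray(priorities, dtype=np.float64)
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()  # P(i) = p_i^alpha / sum_k p_k^alpha
    indices = rng.choice(len(priorities), size=batch_size, p=probs)
    # w_i = (N * P(i))^-beta down-weights frequently sampled (high-priority) items
    weights = (len(priorities) * probs[indices]) ** (-beta)
    weights /= weights.max()
    return indices, weights


rng = np.random.default_rng(0)
priorities = [0.1, 0.1, 0.1, 5.0]  # the last experience is far more "important"
indices, weights = sample_proportional(priorities, batch_size=1000, rng=rng)
print("sampling frequency per item:", np.bincount(indices, minlength=4) / 1000)
```

With `alpha=0` every item gets probability 1/N and every weight becomes 1, which is exactly the uniform sampling of the default buffer; larger `alpha` values concentrate sampling on high-priority experiences.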

In [None]:
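class PrioritizedReplayBufferBuilder(ReplayBufferBuilder):
    # Overriding build_replay_buffer is enough to change which buffer DDPG uses;
    # the rest of the algorithm stays unchanged.
    def build_replay_buffer(self,  # type: ignore[override]
                            env_info: EnvironmentInfo,
                            algorithm_config: A.DDPGConfig,
                            **kwargs) -> ReplayBuffer:
        # Reuse the buffer size configured for DDPG so the capacity matches the default buffer
        return RB.PrioritizedReplayBuffer(capacity=algorithm_config.replay_buffer_size)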
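The builder acts as a factory: the algorithm, not the user, creates the buffer, and passes its environment info and its own config to the builder when doing so. The toy below mirrors that idea in plain Python so it runs without nnabla-rl. Every class here (`ToyConfig`, `ToyReplayBufferBuilder`, `ToyAlgorithm`, and so on) is hypothetical and is not nnabla-rl's internal code; in particular, the assumption that the algorithm invokes the builder by calling it is a design choice of this sketch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ToyConfig:
    replay_buffer_size: int = 1000


class ToyUniformBuffer:
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest entries are dropped when full


class ToyPrioritizedBuffer(ToyUniformBuffer):
    def __init__(self, capacity):
        super().__init__(capacity)
        self.priorities = deque(maxlen=capacity)  # one priority per stored entry


class ToyReplayBufferBuilder:
    def __call__(self, env_info, algorithm_config, **kwargs):
        return self.build_replay_buffer(env_info, algorithm_config, **kwargs)

    def build_replay_buffer(self, env_info, algorithm_config, **kwargs):
        # Default behaviour: a uniformly sampled buffer
        return ToyUniformBuffer(capacity=algorithm_config.replay_buffer_size)


class ToyPrioritizedReplayBufferBuilder(ToyReplayBufferBuilder):
    def build_replay_buffer(self, env_info, algorithm_config, **kwargs):
        # Only the construction step is overridden
        return ToyPrioritizedBuffer(capacity=algorithm_config.replay_buffer_size)


class ToyAlgorithm:
    def __init__(self, env_info, config, replay_buffer_builder=None):
        builder = replay_buffer_builder or ToyReplayBufferBuilder()
        # The algorithm builds its own buffer, handing over env info and its config
        self.replay_buffer = builder(env_info, config)


default_algo = ToyAlgorithm(env_info=None, config=ToyConfig())
per_algo = ToyAlgorithm(env_info=None, config=ToyConfig(),
                        replay_buffer_builder=ToyPrioritizedReplayBufferBuilder())
print(type(default_algo.replay_buffer).__name__, type(per_algo.replay_buffer).__name__)
```

Because the algorithm supplies the config, the custom builder can size the buffer from `replay_buffer_size` without the user repeating that value.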

## Preparation of Algorithm

We are almost ready to start the training. Finally, let's set up the DDPG algorithm.  
Here, we pass the ReplayBufferBuilder that we just implemented so that it replaces the default buffer.

In [None]:
config = A.DDPGConfig(gpu_id=0, start_timesteps=500)  # set gpu_id=-1 to train on the CPU

In [None]:
ddpg = A.DDPG(
    env_or_env_info=env,
    config=config,
    replay_buffer_builder=PrioritizedReplayBufferBuilder()
)

## Preparation of Hook (optional)

We add a RenderHook to visually check the training status. This step is optional.  
Note that this hook may slow down the training.
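Assuming hooks are invoked synchronously from the training loop, any work a hook does is added directly to the training time, which is why rendering can slow training down. The toy below sketches that pattern in plain Python; `ToyHook`, `SlowRenderHook`, `train`, and the `timing` parameter are hypothetical names for illustration, not nnabla-rl's hook API.

```python
import time


class ToyHook:
    """Hypothetical hook: runs on_hook_called every `timing` iterations."""

    def __init__(self, timing=1):
        self.timing = timing
        self.calls = 0

    def __call__(self, iteration):
        if iteration % self.timing == 0:
            self.calls += 1
            self.on_hook_called(iteration)

    def on_hook_called(self, iteration):
        pass


class SlowRenderHook(ToyHook):
    def on_hook_called(self, iteration):
        time.sleep(0.001)  # stands in for the cost of drawing a frame


def train(total_iterations, hooks):
    for iteration in range(1, total_iterations + 1):
        # ... one environment step and one gradient update would go here ...
        for hook in hooks:
            hook(iteration)  # the loop waits for each hook to finish


every_step = SlowRenderHook(timing=1)
every_tenth = SlowRenderHook(timing=10)
train(100, [every_step, every_tenth])
print(every_step.calls, every_tenth.calls)
```

Running an expensive hook less often (a larger `timing` in this toy) is a common way to trade visual feedback for speed.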

In [None]:
render_hook = RenderHook(env=env)

In [None]:
ddpg.set_hooks([render_hook])

## Run the training

Training takes some time (about 10-20 minutes).  
By the end of training, you should see the agent swinging up the pendulum.

In [None]:
ddpg.train(env, total_iterations=50000)