# 1. Pre-training

Currently, **RLLTE** supports only online pre-training driven by intrinsic rewards. To turn on pre-training mode,
write a `train.py` like the following:

In [2]:
from rllte.agent import PPO
from rllte.env import make_atari_env
from rllte.xplore.reward import RE3

if __name__ == "__main__":
    # env setup
    device = "cuda:0"
    env = make_atari_env(device=device)
    eval_env = make_atari_env(device=device)
    # create agent and turn on pre-training mode
    agent = PPO(env=env, 
                eval_env=eval_env, 
                device=device,
                tag="ppo_atari",
                pretraining=True)
    # create intrinsic reward
    re3 = RE3(observation_space=env.observation_space,
              action_space=env.action_space,
              device=device)
    # set the reward module
    agent.set(reward=re3)
    # start training
    agent.train(num_train_steps=5000)

pygame 2.4.0 (SDL 2.26.4, Python 3.8.16)
Hello from the pygame community. https://www.pygame.org/contribute.html


A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]


[08/29/2023 12:02:06 PM] - [INFO.] - Invoking RLLTE Engine...
[08/29/2023 12:02:06 PM] - [INFO.] - Tag               : ppo_atari
[08/29/2023 12:02:06 PM] - [INFO.] - Device            : NVIDIA GeForce RTX 3090
[08/29/2023 12:02:06 PM] - [DEBUG] - Agent             : PPO
[08/29/2023 12:02:06 PM] - [DEBUG] - Encoder           : MnihCnnEncoder
[08/29/2023 12:02:06 PM] - [DEBUG] - Policy            : OnPolicySharedActorCritic
[08/29/2023 12:02:06 PM] - [DEBUG] - Storage           : VanillaRolloutStorage
[08/29/2023 12:02:06 PM] - [DEBUG] - Distribution      : Categorical
[08/29/2023 12:02:06 PM] - [DEBUG] - Augmentation      : False
[08/29/2023 12:02:06 PM] - [DEBUG] - Intrinsic Reward  : True, RE3
[08/29/2023 12:02:06 PM] - [INFO.] - Pre-training Mode : On
[08/29/2023 12:02:09 PM] - [EVAL.] - S: 0           | E: 0           | L: 30

The log confirms that **pre-training** mode is on. Note that a `reward` module must be specified whenever pre-training mode is on! For all supported reward modules, see the [API Documentation](https://docs.rllte.dev/api/).
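
To build some intuition for what an intrinsic reward such as RE3 provides during pre-training, here is a minimal, self-contained sketch of a k-nearest-neighbour novelty bonus computed on the outputs of a fixed random encoder. It is a simplified illustration only and not RLLTE's `RE3` implementation: the function names `make_random_encoder` and `knn_intrinsic_rewards`, the linear encoder, and the default `k` are all assumptions made for this example.

```python
import math
import random


def make_random_encoder(obs_dim, latent_dim, seed=0):
    """Build a fixed (never trained) random linear encoder. Hypothetical helper."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
               for _ in range(latent_dim)]

    def encode(obs):
        # Project the observation with the frozen random weights.
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    return encode


def knn_intrinsic_rewards(embeddings, k=3):
    """Return log(1 + distance to the k-th nearest neighbour) per embedding."""
    rewards = []
    for i, y_i in enumerate(embeddings):
        # Euclidean distances from y_i to every other embedding in the batch.
        dists = sorted(math.dist(y_i, y_j)
                       for j, y_j in enumerate(embeddings) if j != i)
        if not dists:
            # A single sample has no neighbours, so it gets no bonus.
            rewards.append(0.0)
            continue
        # Clamp k so that small batches still work.
        kth = dists[min(k, len(dists)) - 1]
        rewards.append(math.log(kth + 1.0))
    return rewards


if __name__ == "__main__":
    encode = make_random_encoder(obs_dim=4, latent_dim=8)
    observations = [
        [0.0, 0.0, 0.0, 0.0],
        [0.1, 0.0, 0.0, 0.0],
        [0.0, 0.1, 0.0, 0.0],
        [5.0, 5.0, 5.0, 5.0],  # far from the other observations
    ]
    rewards = knn_intrinsic_rewards([encode(o) for o in observations], k=2)
    for obs, reward in zip(observations, rewards):
        print(obs, round(reward, 3))
```

Because the bonus grows with the distance to neighbouring embeddings, states far from anything seen in the batch tend to earn larger rewards, which is what lets an agent learn useful behaviour during pre-training without relying on the task reward.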

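A pre-training run writes its parameters under a timestamped run directory; the fine-tuning example in this guide loads them from `logs/ppo_atari/2023-08-29-12-02-05/pretrained/pretrained.pth`. Since the timestamp changes with every run, a small helper can look up the newest checkpoint instead of hard-coding the path. The sketch below uses only the standard library; the function name `latest_pretrained_checkpoint` is hypothetical, and the `<log_root>/<tag>/<timestamp>/pretrained/pretrained.pth` layout is an assumption inferred from that one path, so adjust it if your working directory differs.

```python
from pathlib import Path
from typing import Optional


def latest_pretrained_checkpoint(log_root, tag,
                                 filename="pretrained.pth") -> Optional[Path]:
    """Return the newest <log_root>/<tag>/<timestamp>/pretrained/<filename>.

    Returns None when no run of the given tag contains a checkpoint.
    """
    tag_dir = Path(log_root) / tag
    if not tag_dir.is_dir():
        return None
    # Timestamps such as 2023-08-29-12-02-05 sort chronologically as strings,
    # so a reverse name sort visits the most recent run first.
    run_dirs = sorted((p for p in tag_dir.iterdir() if p.is_dir()),
                      key=lambda p: p.name, reverse=True)
    for run_dir in run_dirs:
        candidate = run_dir / "pretrained" / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # "logs" and "ppo_atari" mirror the path used in this guide.
    checkpoint = latest_pretrained_checkpoint("logs", "ppo_atari")
    print(checkpoint if checkpoint is not None else "no checkpoint found")
```

The returned path can then be converted with `str(...)` and passed as `init_model_path` when fine-tuning.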

# 2. Fine-tuning

Once pre-training has finished, the model parameters are saved in the `pretrained` subfolder of the working directory. To
load them, turn off pre-training mode and pass the checkpoint path to `.train()` through the `init_model_path` argument. Run the following cell and you'll see that the pre-trained model parameters are loaded:

In [3]:
from rllte.agent import PPO
from rllte.env import make_atari_env

if __name__ == "__main__":
    # env setup
    device = "cuda:0"
    env = make_atari_env(device=device)
    eval_env = make_atari_env(device=device)
    # create agent and turn off pre-training mode
    agent = PPO(env=env, 
                eval_env=eval_env, 
                device=device,
                tag="ppo_atari",
                pretraining=False)
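    # start training from the pre-trained parameters
    # (replace the path below with the checkpoint from your own pre-training run)
    agent.train(num_train_steps=5000,
                init_model_path="/export/yuanmingqi/code/rllte/examples/logs/ppo_atari/2023-08-29-12-02-05/pretrained/pretrained.pth")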

[08/29/2023 12:06:26 PM] - [INFO.] - Invoking RLLTE Engine...
[08/29/2023 12:06:26 PM] - [INFO.] - Tag               : ppo_atari
[08/29/2023 12:06:26 PM] - [INFO.] - Device            : NVIDIA GeForce RTX 3090
[08/29/2023 12:06:26 PM] - [DEBUG] - Agent             : PPO
[08/29/2023 12:06:26 PM] - [DEBUG] - Encoder           : MnihCnnEncoder
[08/29/2023 12:06:26 PM] - [DEBUG] - Policy            : OnPolicySharedActorCritic
[08/29/2023 12:06:26 PM] - [DEBUG] - Storage           : VanillaRolloutStorage
[08/29/2023 12:06:26 PM] - [DEBUG] - Distribution      : Categorical
[08/29/2023 12:06:26 PM] - [DEBUG] - Augmentation      : False
[08/29/2023 12:06:26 PM] - [DEBUG] - Intrinsic Reward  : False
[08/29/2023 12:06:26 PM] - [INFO.] - Loading Initial Parameters from /export/yuanmingqi/code/rllte/examples/logs/ppo_atari/2023-08-29-12-02-05/pretrained/pre