# Single Agent restoration
This guide restores an RLlib checkpoint for further training or evaluation using the ```tune.run``` API. We use a PPO checkpoint here, but a checkpoint from an agent you trained with the single-agent example should work as well.

This guide uses the ScreenCapturer by default, which requires FFmpeg to be installed on your computer. If you don't want to use it, comment out the import statement and the line where it is applied as a wrapper.
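
Before running the notebook it can help to check whether FFmpeg is reachable at all. The snippet below is a minimal sketch that assumes the recorder invokes an executable named `ffmpeg` found on the `PATH`; the helper name `ffmpeg_available` is ours and not part of malmoenv.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if an executable with the given name is found on PATH."""
    # Assumption: the screen recorder calls a binary named "ffmpeg".
    return shutil.which(executable) is not None


if not ffmpeg_available():
    print("FFmpeg not found on PATH; install it or skip the ScreenCapturer wrapper.")
```

If the check fails, the ScreenCapturer import and wrapper line can be commented out as described above.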

In [None]:
# imports
from pathlib import Path
import os

# malmoenv imports
import malmoenv
from malmoenv.utils.launcher import launch_minecraft
from malmoenv.utils.wrappers import DownsampleObs
from examples.utils.screenrecorder import ScreenCapturer
from examples.utils.utils import update_checkpoint_for_rollout, get_config

import ray
from ray import tune

Define some constants.

In [None]:
ENV_NAME = "malmo"
MISSION_XML = os.path.realpath('missions/mobchase_single_agent.xml')
COMMAND_PORT = 8999  # base port; the Malmo instances listen on COMMAND_PORT + 1, COMMAND_PORT + 2, ...
xml = Path(MISSION_XML).read_text()

CHECKPOINT_FREQ = 100     # in terms of number of algorithm iterations
LOG_DIR = "results/"       # creates a new directory and puts results there

NUM_WORKERS = 1
NUM_GPUS = 0
TOTAL_STEPS = int(1e6)
launch_script = "./launchClient_quiet.sh"

# Path to the checkpoint to restore; replace it with the path to your own checkpoint
checkpoint_file = "/home/mballa/data/PPO/PPO_malmo_5ef58_00000_0_2020-11-15_00-34-50/checkpoint_80/checkpoint-80"
update_checkpoint_for_rollout(checkpoint_file)
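
Instead of hard-coding the checkpoint path, the most recent checkpoint of a trial can be looked up on disk. The following is a sketch only: it assumes the `checkpoint_<N>/checkpoint-<N>` layout visible in the path above, and `latest_checkpoint` is a hypothetical helper, not part of RLlib or malmoenv.

```python
import re
from pathlib import Path


def latest_checkpoint(trial_dir):
    """Return the highest-numbered checkpoint file under trial_dir, or None."""
    # Assumed layout: <trial_dir>/checkpoint_<N>/checkpoint-<N>
    pattern = re.compile(r"checkpoint_(\d+)$")
    best_iter, best_file = -1, None
    for folder in Path(trial_dir).iterdir():
        match = pattern.match(folder.name)
        if not (folder.is_dir() and match):
            continue
        iteration = int(match.group(1))
        candidate = folder / f"checkpoint-{iteration}"
        # Skip folders that do not contain the expected checkpoint file
        if candidate.is_file() and iteration > best_iter:
            best_iter, best_file = iteration, candidate
    return None if best_file is None else str(best_file)
```

The returned string could then be assigned to `checkpoint_file` in place of the literal path.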

Env creator function. This is the part where the ScreenCapturer can be utilised. By default it records the native resolution of Malmo, which is defined in the mission XML file.

The ```format``` argument sets the file format; supported formats are ```gif``` (default) and ```mp4```.
The ```size``` argument expects a tuple of ```(width, height)``` dimensions and will convert the output to this size.
Multiple episodes can be accumulated and recorded into a single video by supplying the number of episodes to the ```accumulate_episodes``` argument.
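
To make the three options concrete, the sketch below collects them into a keyword dictionary and validates the values against the description above. `capturer_kwargs` and `SUPPORTED_FORMATS` are hypothetical names introduced for illustration; only the argument names ```format```, ```size``` and ```accumulate_episodes``` come from this guide.

```python
SUPPORTED_FORMATS = ("gif", "mp4")


def capturer_kwargs(fmt="gif", size=None, accumulate_episodes=None):
    """Build keyword arguments for ScreenCapturer from the documented options."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    kwargs = {"format": fmt}
    if size is not None:
        width, height = size  # (width, height), as described above
        kwargs["size"] = (int(width), int(height))
    if accumulate_episodes is not None:
        kwargs["accumulate_episodes"] = int(accumulate_episodes)
    return kwargs


# Example: record five episodes into one 320x240 mp4 file
kwargs = capturer_kwargs(fmt="mp4", size=(320, 240), accumulate_episodes=5)
# Inside the env creator this would be applied as: env = ScreenCapturer(env, **kwargs)
```

Omitting an option leaves it out of the dictionary, so the wrapper falls back to its own default for that argument.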

In [None]:
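def create_env(config):
    # config is RLlib's env context; worker_index selects the Malmo instance port
    env = malmoenv.make()
    env.init(xml, COMMAND_PORT + config.worker_index, reshape=True)
    env.reward_range = (-float('inf'), float('inf'))

    env = ScreenCapturer(env)  # comment out this line to disable recording
    env = DownsampleObs(env, shape=(84, 84))
    return env

# Register the creator so RLlib can resolve ENV_NAME
# (assumed to match the env name stored in the restored config)
tune.register_env(ENV_NAME, create_env)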

The next step is to load the original config and overwrite some of its parameters. We want the same settings that were used for training, but we don't necessarily want the same hardware for further training or evaluation. Say we trained an agent on a cluster with multiple CPUs and a GPU, but would like to evaluate the checkpoint locally using a single env and no GPU. To do this we simply overwrite the corresponding entries in the config. We can also disable exploration, as shown below. Depending on the chosen algorithm, other configuration options may also be useful for evaluation; see the RLlib documentation for details.

In [None]:
config = get_config(checkpoint_file)
config["num_workers"] = NUM_WORKERS
config["num_gpus"] = NUM_GPUS
config["explore"] = False
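
The same overrides can be wrapped in a small function so that the training config stays untouched. This is a sketch under assumptions: `evaluation_config` is a hypothetical helper, and the dictionary below is a made-up stand-in for what `get_config` returns.

```python
import copy


def evaluation_config(train_config, num_workers=1, num_gpus=0):
    """Return a copy of train_config adjusted for local evaluation without exploration."""
    config = copy.deepcopy(train_config)  # leave the original config unchanged
    config["num_workers"] = num_workers
    config["num_gpus"] = num_gpus
    config["explore"] = False
    return config


# Hypothetical training config, standing in for the output of get_config(...)
train_config = {"env": "malmo", "num_workers": 8, "num_gpus": 1, "explore": True, "lr": 5e-5}
eval_config = evaluation_config(train_config)
```

All other entries, such as the learning rate here, are carried over as they were during training.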

As in the previous examples, the next step is to start the Malmo instances.

In [None]:
GAME_INSTANCE_PORTS = [COMMAND_PORT + 1 + i for i in range(NUM_WORKERS)]
instances = launch_minecraft(GAME_INSTANCE_PORTS, launch_script=launch_script)
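
The `+ 1` offset in the port list mirrors how the env creator picks its port. RLlib numbers remote rollout workers from 1, with index 0 reserved for the local worker, so worker `i` connects to `COMMAND_PORT + i`. The sketch below checks that both sides agree; the worker count of 3 is illustrative and `worker_port` is a hypothetical helper.

```python
COMMAND_PORT = 8999
NUM_WORKERS = 3  # illustrative value; the guide itself uses a single worker


def worker_port(worker_index, base_port=COMMAND_PORT):
    """Port that the env creator connects to for a given RLlib worker index."""
    return base_port + worker_index


# Ports on which the Malmo instances are launched
instance_ports = [COMMAND_PORT + 1 + i for i in range(NUM_WORKERS)]
# Ports requested by remote workers 1..NUM_WORKERS
worker_ports = [worker_port(i) for i in range(1, NUM_WORKERS + 1)]

print(instance_ports)  # [9000, 9001, 9002]
print(instance_ports == worker_ports)  # True
```

One consequence is that setting `num_workers` to 0 would make the local worker (index 0) connect to `COMMAND_PORT` itself, where no instance has been launched.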

With ```tune.run``` we can restore a checkpoint and continue training. Here we use it only to visualise the trained agent; a more thorough evaluation needs a dedicated setup, which is shown in the next example.

In [None]:
tune.run(
    "PPO",
    config=config,
    stop={"timesteps_total": TOTAL_STEPS},
    checkpoint_at_end=False,
    checkpoint_freq=CHECKPOINT_FREQ,
    local_dir=LOG_DIR,
    restore=checkpoint_file
)