This is a snippet showing how to use `PolicySaver` and `Checkpointer` from TF-Agents.

It covers the following steps:
* create a REINFORCE agent
* save the agent's policy with a `PolicySaver`
	* and load the saved policy
* save the agent with a `Checkpointer`
	* and load the agent

Note that this snippet skips the training code because it is not needed to demonstrate saving and loading agents.

# 1. Create a REINFORCE agent

The code in this section comes from a [tf_agents tutorial](https://github.com/tensorflow/agents/blob/master/docs/tutorials/6_reinforce_tutorial.ipynb).

In [None]:
import tensorflow as tf
from tf_agents.agents.reinforce import reinforce_agent
from tf_agents.environments import suite_gym
from tf_agents.environments import tf_py_environment
from tf_agents.networks import actor_distribution_network
from tf_agents.train.utils import train_utils

In [None]:
env_name = "CartPole-v0" # @param {type:"string"}
fc_layer_params = (100,)
learning_rate = 1e-3 # @param {type:"number"}

In [None]:
train_py_env = suite_gym.load(env_name)
eval_py_env = suite_gym.load(env_name)

train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

In [None]:
def createAgent():
    actor_net = actor_distribution_network.ActorDistributionNetwork(
        train_env.observation_spec(),
        train_env.action_spec(),
        fc_layer_params=fc_layer_params)
    
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

    train_step_counter = train_utils.create_train_step()

    tf_agent = reinforce_agent.ReinforceAgent(
        train_env.time_step_spec(),
        train_env.action_spec(),
        actor_network=actor_net,
        optimizer=optimizer,
        normalize_returns=True,
        train_step_counter=train_step_counter)
    tf_agent.initialize()
    return tf_agent, train_step_counter

# 2. Using `PolicySaver`
For details, refer to the [API documentation](https://www.tensorflow.org/agents/api_docs/python/tf_agents/policies/PolicySaver) for `PolicySaver`.

In [None]:
from tf_agents.policies import PolicySaver

## 2.1. Save the agent's policy with a `PolicySaver`

In [None]:
saved_policy_label = "a_saved_policy"

Create an agent:

In [None]:
tf_agent, train_step_counter = createAgent()

Create a policy saver from the agent's policy:

In [None]:
policy = tf_agent.policy
policy_saver = PolicySaver(policy)

During training, save the policy periodically:

In [None]:
#
# run training process ...
#

# Export the current policy as a SavedModel into the `saved_policy_label` directory.
policy_saver.save(saved_policy_label)
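
The cell above saves only once. Saving "periodically" inside a training loop could look like the following minimal sketch. The helper `save_policy_every`, its `interval` parameter, and the `step_XXXXXX` directory naming are hypothetical choices made for illustration; the only TF-Agents call it relies on is `policy_saver.save(export_dir)`.

```python
import os


def save_policy_every(policy_saver, step, interval, base_dir):
    """Save the policy when `step` is a positive multiple of `interval`.

    Returns the export directory that was written, or None if nothing was saved.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if step <= 0 or step % interval != 0:
        return None
    # One sub-directory per saved step (hypothetical naming scheme).
    export_dir = os.path.join(base_dir, f"step_{step:06d}")
    policy_saver.save(export_dir)
    return export_dir
```

Inside a training loop this would be called once per iteration, for example `save_policy_every(policy_saver, int(train_step_counter.numpy()), 100, saved_policy_label)`.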

## 2.2. Load the saved policy

In [None]:
trained_policy = tf.saved_model.load(saved_policy_label)
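
Once loaded, the policy can be used without the original agent. The following is a minimal sketch, assuming the loaded object exposes `action(time_step)` as TF-Agents policies do, and that `env` behaves like the `eval_env` created in section 1. The names `run_episode` and `max_steps` are hypothetical and introduced here only for illustration.

```python
def run_episode(policy, env, max_steps=500):
    """Run a single episode and return (episode_return, num_steps).

    `policy` needs an `action(time_step)` method whose result has an `.action`
    field; `env` needs `reset()` and `step(action)` that return time steps
    with `is_last()` and `reward`.
    """
    time_step = env.reset()
    episode_return = 0.0
    num_steps = 0
    # `max_steps` guards against environments that never terminate.
    while not time_step.is_last() and num_steps < max_steps:
        action_step = policy.action(time_step)
        time_step = env.step(action_step.action)
        episode_return += time_step.reward
        num_steps += 1
    return episode_return, num_steps
```

With the objects from this snippet, the call would be `run_episode(trained_policy, eval_env)`. With a `TFPyEnvironment`, the returned episode return is a tensor with a batch dimension rather than a plain float.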

In [None]:
print(type(trained_policy))
# >> <class 'tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject'>

In [None]:
import shutil

# Clean up: delete the directory that holds the saved policy.
shutil.rmtree(saved_policy_label)

# 3. Using `Checkpointer`
For details, refer to the [API documentation](https://www.tensorflow.org/agents/api_docs/python/tf_agents/utils/common/Checkpointer) for `Checkpointer`.

## 3.1. Save the parameters of an agent

In [None]:
from tf_agents.utils.common import Checkpointer
import os
import shutil

In [None]:
check_pointer_folder_path = "saved_agents"
if os.path.exists(check_pointer_folder_path):
    shutil.rmtree(check_pointer_folder_path)

In [None]:
def createCheckPointer(tf_agent, train_step_counter):
    check_pointer = Checkpointer(
        check_pointer_folder_path,
        max_to_keep=1,
        agent=tf_agent,
        global_step=train_step_counter)
    return check_pointer

In [None]:
tf_agent, train_step_counter = createAgent()
check_pointer = createCheckPointer(tf_agent, train_step_counter)

During training, save the agent periodically:

In [None]:
#
# run training process ...
#

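# Simulate 32 training steps. Update the variable in place:
# `train_step_counter + 32` would rebind the name to a new tensor
# and leave the variable tracked by the checkpointer unchanged.
train_step_counter.assign_add(32)

# Save the agent's parameters together with the step counter.
check_pointer.save(global_step=train_step_counter)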

In [None]:
print(tf_agent.trainable_variables[0][0,:3], train_step_counter)

## 3.2. Load the saved parameters

Replace the old agent with a new one:

In [None]:
tf_agent, train_step_counter = createAgent()
print(tf_agent.trainable_variables[0][0,:3], train_step_counter)

The new agent is freshly initialized, so its parameter values differ from those of the saved agent.

In [None]:
check_pointer = createCheckPointer(tf_agent, train_step_counter)
print(tf_agent.trainable_variables[0][0,:3], train_step_counter)

Constructing a `Checkpointer` on a directory that already contains a checkpoint restores the latest one, so the agent instance now has the same parameter values as the saved agent.
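
Printing the first three values of one variable is only a spot check. To compare every trainable variable, one option is to snapshot the values before the agent is replaced and compare them after the restore. This is a minimal sketch; `snapshot` and `all_equal` are hypothetical helper names, and it assumes each variable can be converted with `np.array`, which holds for `tf.Variable`.

```python
import numpy as np


def snapshot(variables):
    """Copy the current value of every variable into a list of NumPy arrays."""
    return [np.array(v) for v in variables]


def all_equal(before, after):
    """Return True if both snapshots have the same length and equal arrays."""
    return len(before) == len(after) and all(
        np.array_equal(a, b) for a, b in zip(before, after))
```

In this snippet, `saved = snapshot(tf_agent.trainable_variables)` would be taken right after `check_pointer.save(...)`, and `all_equal(saved, snapshot(tf_agent.trainable_variables))` would be checked after the new `Checkpointer` has been constructed.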

In [None]:
import shutil

shutil.rmtree(check_pointer_folder_path)