# Training

The API provides both a [Unity ml-agents](https://github.com/Unity-Technologies/ml-agents) and an [OpenAI Gym](https://github.com/openai/gym) interface. We include training examples for both in [the examples folder](./examples); the former uses ml-agents' own training library which is optimised for the environment, the latter uses [OpenAI baselines](https://github.com/openai/baselines).


In this notebook we show you how to run the `animalai-train` trainers, which are optimized for training on the AnimalAI environment. The library is modular, so you can tinker with it to implement your own algorithms. We strongly recommend that you have a look at its various parts (described at the end of this tutorial) should you wish to make modifications.

## Can your agent self control? - Part II

If you haven't done so already, go through the environment tutorial, where we describe the problem of self-control in animals. We created a curriculum which includes increasingly difficult levels in which the agent must retrieve food, while being introduced to objects similar to those in the final experiment, without encountering the exact testing configuration(s).

Having created a curriculum in the previous notebook, we now need to configure the training environment. The `animalai-train` library provides all the tools you'll need to train using PPO or SAC - we'll be using the former here.

First, we need to set all the hyperparameters of our model, which is done by creating a yaml file as follows:

In [1]:
with open("configurations/training_configurations/train_ml_agents_config_ppo.yaml") as f:
    print(f.read())

AnimalAI:
    trainer: ppo
    epsilon: 0.2
    lambd: 0.95
    learning_rate: 3.0e-4
    learning_rate_schedule: linear
    memory_size: 128
    normalize: false
    sequence_length: 64
    summary_freq: 10000
    use_recurrent: false
    vis_encode_type: simple
    time_horizon: 128
    batch_size: 64
    buffer_size: 2024
    hidden_units: 256
    num_layers: 1
    beta: 1.0e-2
    max_steps: 0.5e7
    num_epoch: 3
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        curiosity:
            strength: 0.01
            gamma: 0.99
            encoding_size: 256


If you're already familiar with RL algorithms in general, these parameters should be fairly self-explanatory. Nonetheless, you can have a look at [this page](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Configuration-File.md) for explanations of the parameters specified for both PPO and SAC.
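
To make a few of these settings concrete, the following sketch derives some quantities from the values printed above. It is only an illustration under stated assumptions: the `linear_lr` helper is hypothetical and mimics a linear decay to zero over `max_steps`, and the update count assumes one pass over the buffer per epoch. Neither is taken from the ml-agents source.

```python
# Values copied from the configuration printed above.
config = {
    "learning_rate": 3.0e-4,
    "batch_size": 64,
    "buffer_size": 2024,
    "num_epoch": 3,
    "max_steps": 0.5e7,
}

# Number of full minibatches that fit in one buffer of experiences.
full_minibatches = config["buffer_size"] // config["batch_size"]
leftover = config["buffer_size"] % config["batch_size"]

# Assumption: each epoch makes one pass over the buffer in minibatches.
updates_per_buffer = full_minibatches * config["num_epoch"]


def linear_lr(step, base_lr=config["learning_rate"], max_steps=config["max_steps"]):
    """Hypothetical helper: learning rate decayed linearly to zero at max_steps."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return base_lr * fraction_left


print(full_minibatches, leftover, updates_per_buffer)
print(linear_lr(0), linear_lr(2_500_000), linear_lr(5_000_000))
```

With these values the buffer does not divide evenly into minibatches (40 experiences are left over). The ml-agents documentation recommends choosing `buffer_size` as a multiple of `batch_size`, so you may want to adjust one of the two.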

You then need to configure the trainer, which is just a named tuple defining parameters such as:
- the paths to the environment and your configuration file (above)
- how many environments to launch in parallel and how many agents to run per environment
- the path to your curriculum
- and many more!

This is all done as follows:

In [3]:
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

from mlagents.trainers.trainer_util import load_config
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai


trainer_config_path = (
    "configurations/training_configurations/train_ml_agents_config_ppo.yaml"
)
environment_path = "env/AnimalAI"
curriculum_path = "configurations/curriculum"
run_id = "Agent_training_2"
base_port = 5005
number_of_environments = 6
number_of_arenas_per_environment = 12

args = RunOptionsAAI(
    trainer_config=load_config(trainer_config_path),
    env_path=environment_path,
    run_id=run_id,
    base_port=base_port,
    num_envs=number_of_environments,
    curriculum_config=curriculum_path,
    n_arenas_per_env=number_of_arenas_per_environment,
)
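
As a rough sanity check on the parallelism settings above, the sketch below works out how many arenas run at once and which ports are likely to be used. It assumes that each environment instance communicates on `base_port` plus its worker index, which is how ml-agents usually assigns ports. Treat the port list as an assumption rather than a guarantee.

```python
base_port = 5005
number_of_environments = 6
number_of_arenas_per_environment = 12

# Each environment hosts several arenas that are simulated side by side.
total_arenas = number_of_environments * number_of_arenas_per_environment

# Assumption: worker i communicates on base_port + i.
ports = [base_port + worker_id for worker_id in range(number_of_environments)]

print(total_arenas)
print(ports)
```

If you launch a second run while the first is still training, picking a `base_port` outside this range should avoid port clashes.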

Once this is done, we're pretty much left with a one-liner! The training library isn't verbose, but you can monitor training via TensorBoard. The first few lines of the cell below just load TensorBoard. Once it has launched and you can see the orange window, click the refresh button in the top right of TensorBoard; graphs will appear after a few training steps.

_Note_: in case you don't want to wait for the model to train, you can jump ahead to the next step as we provide a pre-trained model for inference.

In [4]:
import os
# logging.getLogger('tensorflow').disabled = True

logs_dir = "summaries/"
os.makedirs(logs_dir, exist_ok=True)
%load_ext tensorboard
%tensorboard --logdir {logs_dir}

run_training_aai(0, args)


Converting ./models/Agent_training_2/AnimalAI/frozen_graph_def.pb to ./models/Agent_training_2/AnimalAI.nn
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'visual_observation_0': [-1, 84, 84, 3] => 'policy/main_graph_0_encoder0/conv_1/BiasAdd'
IN: 'vector_observation': [-1, 1, 1, 3] => 'policy/main_graph_0/hidden_0/BiasAdd'
IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice'
IN: 'action_masks': [-1, 1, 1, 6] => 'policy_1/strided_slice_1'
OUT: 'policy/concat/concat', 'action'
DONE: wrote ./models/Agent_training_2/AnimalAI.nn file.


# Testing the Model

The following cell loads the trained model and puts it to the test.

**Team members**

Álvaro Jimenez

Andrés Quintero

Jheyson Villavisan

In [None]:
import warnings
warnings.filterwarnings('ignore')
from animalai.envs.arena_config import ArenaConfig
from animalai_train.run_options_aai import RunOptionsAAI
from animalai_train.run_training_aai import run_training_aai
from mlagents.trainers.trainer_util import load_config

trainer_config_path = (
    "configurations/training_configurations/train_ml_agents_config_ppo.yaml"
)
environment_path = "env/AnimalAI"
curriculum_path = "configurations/curriculum"
run_id = "Agent_training_2"
base_port = 5005
number_of_environments = 6
number_of_arenas_per_environment = 12

args = RunOptionsAAI(
    trainer_config=load_config(trainer_config_path),
    env_path=environment_path,
    run_id=run_id,
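    base_port=base_port + 3,
    load_model=True,  # restore the checkpoint saved under run_id
    train_model=False,  # run inference only, without updating the model
    arena_config=ArenaConfig("configurations/TEST/1-22-3.yml"),  # test arena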
)
run_training_aai(0, args)












INFO:tensorflow:Restoring parameters from ./models/Agent_training_2/AnimalAI\model-2000108.ckpt


In [None]:
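# `environment` only exists if an environment object was created earlier in the session
if globals().get("environment"):
    environment.close()  # takes a few seconds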

You should see the agent get the reward about 50% of the time. It's far from perfect, but it's a good start! Remember, this problem is meant to be hard! You can now have a go at making your own algorithm to train agents that can solve one or more tasks in the `competition_configurations` folder!
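
To put a number on "about 50% of the time", you could record whether each test episode ended with the reward and compute a success rate with a rough error bar. The sketch below uses a made-up list of outcomes purely for illustration. It is not data from the run above, and `success_rate` is a hypothetical helper, not part of `animalai-train`.

```python
import math


def success_rate(outcomes):
    """Return (rate, standard_error) for a list of booleans (True = reward obtained)."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one episode")
    rate = sum(outcomes) / n
    # Standard error of a binomial proportion.
    stderr = math.sqrt(rate * (1.0 - rate) / n)
    return rate, stderr


# Placeholder outcomes for 10 episodes (illustrative only, not measured).
episodes = [True, False, True, True, False, False, True, False, True, False]
rate, stderr = success_rate(episodes)
print(f"success rate: {rate:.2f} +/- {stderr:.2f}")
```

With only a handful of episodes the error bar is wide, so it is worth running many episodes before comparing two agents.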

## Using ML-Agents and AnimalAI for your algorithms

As mentioned earlier, AnimalAI is built on top of ML-Agents, and we strongly recommend that you have a look at the various bits and pieces which you can tinker with in order to implement your own agents. This part provides a brief overview of where you can find the parts at the heart of most RL algorithms. We'll start from high-level controllers and work our way down to the basic bricks of RL algorithms. Should you wish to modify them, you'll need to clone the [ml-agents repository](https://github.com/Unity-Technologies/ml-agents).

- `animalai_train.run_training_aai`: contains the highest level of control for training an agent. You can find all the subroutines you need in order to do so. The most important ones are:
    - `animalai_train.subprocess_env_manager_aai.SubprocessEnvManagerAAI`: an environment manager which derives from `mlagents.trainers.subprocess_env_manager.SubprocessEnvManager` and can run multiple environments in parallel. In practice you shouldn't need to change this part.
    - `mlagents.trainers.trainer_util.TrainerFactory`: a factory in charge of creating the trainers that manage the agents in the environment. In practice we only have a single type of agent in all of the environments, so there will only be one trainer to manage all the agents. **You might need to change this code** if you add a new RL algorithm, as it was designed to handle PPO and SAC.
    - `animalai_train.trainer_controller_aai.TrainerControllerAAI`: derives from `mlagents.trainers.trainer_controller.TrainerController` and is where the training loop is.
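
To illustrate why adding a new algorithm touches the factory, here is a toy sketch of the pattern: a registry maps the `trainer` key from the YAML file to a trainer class. All names below (`ToyTrainer`, `TRAINER_REGISTRY`, `make_trainer`) are hypothetical and do not mirror the real `TrainerFactory` code.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer base class."""

    def __init__(self, name, params):
        self.name = name
        self.params = params


class ToyPPOTrainer(ToyTrainer):
    pass


class ToySACTrainer(ToyTrainer):
    pass


# Registry of known algorithms; a new algorithm needs an entry here.
TRAINER_REGISTRY = {"ppo": ToyPPOTrainer, "sac": ToySACTrainer}


def make_trainer(brain_name, trainer_config):
    """Create the trainer named by the `trainer` key of one config section."""
    params = trainer_config[brain_name]
    algo = params["trainer"]
    if algo not in TRAINER_REGISTRY:
        raise ValueError(f"unknown trainer type: {algo!r}")
    return TRAINER_REGISTRY[algo](brain_name, params)


config = {"AnimalAI": {"trainer": "ppo", "batch_size": 64}}
trainer = make_trainer("AnimalAI", config)
print(type(trainer).__name__)
```

In this toy version, supporting a new algorithm means writing a new subclass and adding one registry entry. The real factory may need more changes, for example to pass algorithm-specific parameters.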

The basic elements which are most likely to be of interest to you:

- **Curriculum**: managed in `animalai_train.meta_curriculum_aai.MetaCurriculumAAI` and `animalai_train.meta_curriculum_aai.CurriculumAAI`.
- **RL algo**: you can find the implementations for PPO and SAC in `mlagents.trainers.ppo.trainer` and `mlagents.trainers.sac.trainer` respectively. They both implement the base class `mlagents.trainers.trainer.trainer`, which you can also implement and plug directly into the overall training setup (managing all the necessary model parameters in the `TrainerFactory` mentioned above).
- **Exploration**: there is a curiosity module already provided in `mlagents.trainers.components.reward_signals`.
- **Buffer**: the agent's replay buffer is in `mlagents.trainers.buffer`.
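
As an illustration of the curriculum idea, the sketch below advances to the next lesson once the measured mean reward exceeds a threshold. This is a simplified, hypothetical model (`ToyCurriculum` and its thresholds are invented for this example) and not the actual `MetaCurriculumAAI` implementation.

```python
class ToyCurriculum:
    """Hypothetical curriculum: move to the next lesson when reward passes a threshold."""

    def __init__(self, thresholds):
        # thresholds[i] is the mean reward needed to leave lesson i.
        self.thresholds = thresholds
        self.lesson = 0

    def update(self, mean_reward):
        """Advance by at most one lesson; return True if the lesson changed."""
        if (
            self.lesson < len(self.thresholds)
            and mean_reward > self.thresholds[self.lesson]
        ):
            self.lesson += 1
            return True
        return False


curriculum = ToyCurriculum(thresholds=[0.5, 0.8])
for reward in [0.2, 0.6, 0.7, 0.9, 0.95]:
    curriculum.update(reward)
print(curriculum.lesson)
```

Advancing by at most one lesson per update keeps the agent from skipping levels after a single lucky measurement.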

There are many more components which you can find; two which are not implemented for AnimalAI, but which are on our todo list, are imitation learning and the option to record player actions in the environment.

That's pretty much all there is to know. We hope you enjoy the environment!