# Exercise 03. Tune the hyperparameters of an RLlib Multi-Agent Model using Ray Tune

© 2019-2022, Anyscale. All Rights Reserved

### Learning objectives
In this tutorial, you will learn:
 * [How to configure Ray Tune to find solid hyperparameters more easily](#configure_ray_tune)
 * [The details behind Ray RLlib resource allocation](#resource_allocation)
 

### How to configure Ray Tune to find solid hyperparameters more easily <a class="anchor" id="configure_ray_tune"></a>

💡 <b>Right-click on the cell below and choose "Enable Scrolling for Outputs"!</b>  This will make it easier to view, since model training output can be very long!

In [None]:
###############
# EXAMPLE USING RAY TUNE API .run() IN A LOOP UNTIL STOP CONDITION
# Note about Ray Tune verbosity.
# Screen verbosity in Ray Tune is defined as verbose = 0, 1, 2, or 3, where:
# 0 = silent
# 1 = only status updates, no logging messages
# 2 = status and brief trial results, includes logging messages
# 3 = status and detailed trial results, includes logging messages
# Defaults to 3.
###############

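import ray
from ray import tune

# NOTE: `config` is assumed to be the RLlib algorithm config object
# defined earlier in this tutorial series; it is not created in this cell.

# To start fresh, shut down Ray if it is already running
# (tune.run() will start it again automatically)
if ray.is_initialized():
    ray.shutdown()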

verbosity = 3 # Tune screen verbosity

# Define trainer runtime config values
checkpoint_freq = 1                     # save a checkpoint every N training iterations (keep >= evaluation_interval)
checkpoint_at_end = True                # always save a final checkpoint
relative_checkpoint_dir = "multiagent_PPO_logs" # write logs here instead of ~/ray_results/


experiment_results = tune.run("PPO", 
    # Stopping criteria: the trial stops as soon as any one of these is reached
    stop={
        # "episode_reward_mean": 400,  # stop once the mean episode reward (sum of rewards per episode) reaches 400
        "training_iteration": 5,  # stop after 5 training iterations (calls to `Algorithm.train()`)
        # "timesteps_total": 100000,  # stop after 100,000 sampling timesteps
    },
              
    # training config params
    config = config.to_dict(),
                    
    # redirect logs instead of default ~/ray_results/
    local_dir = relative_checkpoint_dir,
         
    # checkpoint frequency in training iterations (keep >= evaluation_interval)
    checkpoint_freq = checkpoint_freq,
    checkpoint_at_end = checkpoint_at_end,

    # Tune screen verbosity (0 = silent, 3 = most detailed)
    verbose = verbosity,
                   
    # Define what we are comparing for, when we search for the
    # "best" checkpoint at the end.
    metric="episode_reward_mean",
    mode="max",
    )

print("Training completed.")
print("Best checkpoint: ", experiment_results.best_checkpoint)
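The `stop` dictionary passed to `tune.run` above ends a trial as soon as any listed metric reaches its threshold. As a minimal sketch of that "whichever occurs first" logic, the same rule can be written as a plain Python function. The threshold constants and the hand-written result dictionaries below are illustrative assumptions (the values mirror the commented-out options in the cell above), not output from a real training run.

```python
from typing import Any, Dict

# Illustrative thresholds (assumptions); they mirror the options shown
# in the `stop` dictionary of the training cell.
MAX_ITERATIONS = 5
TARGET_REWARD = 400.0
MAX_TIMESTEPS = 100_000


def should_stop(trial_id: str, result: Dict[str, Any]) -> bool:
    """Return True as soon as any one stopping criterion is met."""
    reached_iterations = result.get("training_iteration", 0) >= MAX_ITERATIONS
    reached_reward = result.get("episode_reward_mean", float("-inf")) >= TARGET_REWARD
    reached_timesteps = result.get("timesteps_total", 0) >= MAX_TIMESTEPS
    return reached_iterations or reached_reward or reached_timesteps


# Hand-written result dictionaries for a quick check (not real training output).
early = {"training_iteration": 2, "episode_reward_mean": 120.0, "timesteps_total": 8_000}
solved = {"training_iteration": 3, "episode_reward_mean": 405.0, "timesteps_total": 12_000}

print(should_stop("trial_0", early))   # False: no threshold reached yet
print(should_stop("trial_0", solved))  # True: the reward target was reached first
```

Ray Tune also accepts a callable with this `(trial_id, result)` signature for the `stop` argument, so a function like this could replace the dictionary when the stopping rule needs more than simple thresholds.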


### The details behind Ray RLlib resource allocation <a class="anchor" id="resource_allocation"></a>
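As a rough illustration of the CPU bookkeeping involved, the sketch below estimates how many CPUs one RLlib trial requests and how many trials could then run side by side. It assumes the classic RLlib config keys `num_workers`, `num_cpus_per_worker`, and `num_cpus_for_driver` with their usual default of 1 CPU each, and it ignores GPUs and evaluation workers. The helper names and the 8-CPU machine are hypothetical, chosen only for this example.

```python
def cpus_per_trial(num_workers: int,
                   num_cpus_per_worker: int = 1,
                   num_cpus_for_driver: int = 1) -> int:
    """Estimate CPUs requested by one trial: the driver plus all rollout workers.

    Simplified assumption: no GPUs and no separate evaluation workers.
    """
    return num_cpus_for_driver + num_workers * num_cpus_per_worker


def max_concurrent_trials(cluster_cpus: int, trial_cpus: int) -> int:
    """Number of trials that fit on the cluster at the same time."""
    return cluster_cpus // trial_cpus


# Hypothetical setup: an 8-CPU machine and 3 rollout workers per trial.
per_trial = cpus_per_trial(num_workers=3)
print(per_trial)                            # 4
print(max_concurrent_trials(8, per_trial))  # 2
```

Under these assumptions, raising `num_workers` speeds up sampling inside a single trial but leaves room for fewer concurrent trials, which matters when Tune is exploring many hyperparameter combinations.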



## Homework



### Exercises

1. 
