# OmniSafe Tutorial - Getting Started

OmniSafe: https://github.com/PKU-Alignment/omnisafe

Documentation: https://omnisafe.readthedocs.io/en/latest/

Safety-Gymnasium: https://www.safety-gymnasium.com/

[Safety-Gymnasium](https://www.safety-gymnasium.com/) is a highly scalable and customizable Safe Reinforcement Learning (Safe RL) library that aims to provide a clear benchmark for Safe RL algorithms and a more standardized set of environments.


## Introduction

In this tutorial, we present some fundamental applications of OmniSafe. Through clear examples, we aim to help you put OmniSafe to work quickly and reliably in your research and production tasks. Together with the subsequent tutorials, this chapter should give you a thorough understanding of the characteristics and design philosophy of OmniSafe.


In [None]:
# Install from PyPI. If OmniSafe is already installed, skip this cell.
!pip install omnisafe

In [None]:
# Install OmniSafe and dependencies from source
## If OmniSafe is already installed, skip this cell.
## Clone the repo
!git clone https://github.com/PKU-Alignment/omnisafe
%cd omnisafe

## Install OmniSafe
!pip install -e .

## Basic Usage Examples

### Train from default configs

By executing four lines of code, we can train a SafeRL agent using PPOLag.

OmniSafe underwent extensive testing and adjustment during development. The **default hyperparameters** are designed to achieve the best overall performance across as many benchmark environments as possible with **minimal tuning** and **tricks**. On first use, you can safely ignore the details and simply enjoy the ready-to-use functionality.

Specify the environment ID and the algorithm, and you are ready to start training.


In [None]:
import omnisafe


env_id = 'SafetyPointGoal0-v0'

agent = omnisafe.Agent('PPOLag', env_id)
agent.learn()

The results of the execution will be automatically saved to the directory where you run the Python script.



### Train from custom dict

Hyperparameters in reinforcement learning have a significant impact on performance. After getting a taste of the default settings, you can explore new insights and methods by specifying parameter values through a dictionary when facing specific problems. You can refer to the default parameters and their formats in the GitHub repo [here](https://github.com/PKU-Alignment/omnisafe/tree/main/omnisafe/configs).

The following code runs two epochs on `SafetyPointGoal1-v0`, with `2048` interactions in total and a policy update every `1024` steps. The number of vectorized environments (`vector_env_nums`) and the degree of parallelism (`parallel`) are both set to `1`.

In [None]:
import omnisafe


env_id = 'SafetyPointGoal1-v0'
custom_cfgs = {
    'train_cfgs': {
        'total_steps': 2048,
        'vector_env_nums': 1,
        'parallel': 1,
    },
    'algo_cfgs': {
        'steps_per_epoch': 1024,
        'update_iters': 1,
    },
    'logger_cfgs': {
        'use_wandb': False,
    },
}

agent = omnisafe.Agent('TRPO', env_id, custom_cfgs=custom_cfgs)
agent.learn()
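
The relationship between these settings is plain integer arithmetic. The sketch below is not part of OmniSafe: `num_epochs` is a hypothetical helper that only reproduces the calculation described above, assuming the number of epochs equals `total_steps` divided by `steps_per_epoch` and that the division should be exact.

```python
def num_epochs(total_steps: int, steps_per_epoch: int) -> int:
    """Return how many epochs a run contains (hypothetical helper)."""
    if total_steps % steps_per_epoch != 0:
        # Assumption: a partial final epoch is treated as a configuration error.
        raise ValueError('total_steps should be a multiple of steps_per_epoch')
    return total_steps // steps_per_epoch


# Same values as in the custom configuration above.
print(num_epochs(total_steps=2048, steps_per_epoch=1024))
```

Running a quick check like this before launching a long job helps catch a mismatch between the two values early.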

### Render and evaluate your policy

RL has made significant progress, but we believe that an algorithm's performance should not be judged solely by the agent's accumulated reward. More emphasis should be placed on whether learning produces meaningful behavior, especially when safety is taken into consideration. It is crucial to assess whether the agent can actually produce decision sequences that satisfy safety constraints.

Therefore, OmniSafe supports fast rendering and evaluation of policy models. After training is complete, you can easily visualize the results and complete a one-stop workflow in OmniSafe, saving valuable time.

The following line of code will plot the training curve of the agent that has been trained above, displaying the changes of reward and cost over the course of the entire training process with respect to the number of interactions.

In [None]:
agent.plot(smooth=1)

After reviewing the training curve, whether it pleases or disappoints you, you will likely want to understand the behavior that produced it. OmniSafe therefore supports the entire workflow from training to visualization and data analysis.

Running visualization on cloud containers necessitates some additional dependencies.

In [None]:
%%bash
apt-get install libosmesa6-dev
apt-get install python3-opengl

In [None]:
%env MUJOCO_GL=osmesa
%env PYOPENGL_PLATFORM=osmesa

In [None]:
agent.render(num_episodes=1, render_mode='rgb_array', width=256, height=256)

Copy the video path printed above into the cell below to play the video.

In [None]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path='', prefix=''):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only those starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            '''<video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(
                mp4, video_b64.decode('ascii')
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))


# Fill in the path of the folder containing your video, as printed above.
show_videos(video_path='')

Alternatively, you may simply wish to observe how the converged policy performs numerically during testing.

In [None]:
agent.evaluate(num_episodes=1)

We aim to provide enough information for you to gain new insights. Combining the training curve, the rendered video, and the numerical evaluation gives you a multidimensional understanding of this experiment.

You can also analyze policies saved from earlier runs. Paste the saved policy path shown above into the following script, which demonstrates the most common usage of `Evaluator`: it renders and evaluates every model under the specified experiment path. Generally, **this code is more flexible and useful on your personal server than in Colab**.

By modifying the code, you can visualize model files under any path. A few parameters are important to understand for visualization.

`render_mode`: This parameter specifies the display mode during visualization, which typically includes `rgb_array`, `depth_array`, `human`. You may encounter difficulties when specifying them on a server without a display, so please refer to [issue72](https://github.com/PKU-Alignment/omnisafe/issues/72) and [issue27](https://github.com/PKU-Alignment/omnisafe/issues/27).

`camera_name`: Its value depends on the design of the environment library. Safety-Gymnasium supports [these](https://www.safety-gymnasium.com/en/latest/api/builder.html#safety_gymnasium.builder.Builder.__init__).

`width, height`: These two parameters are used to specify the image resolution. The larger the numerical value, the higher the image quality, and the greater the demand for hardware resources. We suggest you try different values according to your hardware resources and choose the one that meets your needs.
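
To get a rough feel for how resolution drives resource use, the sketch below estimates the size of one uncompressed frame. It assumes an 8-bit RGB image (three channels, one byte each), which is the usual layout for `rgb_array` frames; `raw_frame_bytes` is a hypothetical helper, not an OmniSafe function.

```python
def raw_frame_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Estimate the size of one uncompressed frame in bytes (hypothetical helper)."""
    return width * height * channels * bytes_per_channel


# Compare a few square resolutions.
for side in (256, 512, 1024):
    size_kib = raw_frame_bytes(side, side) / 1024
    print(f'{side}x{side}: {size_kib:.0f} KiB per frame')
```

Doubling both `width` and `height` quadruples the per-frame size, so memory and encoding time grow quickly with resolution.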

In [None]:
# Single Python File
import os
import omnisafe

# Just fill your experiment's log directory in here.
# Such as: ~/omnisafe/examples/runs/PPOLag-{SafetyPointGoal1-v0}/seed-000-2023-03-07-20-25-48
LOG_DIR = './runs/PPOLag-{SafetyPointGoal1-v0}/seed-000-2023-04-01-02-44-35'
evaluator = omnisafe.Evaluator(render_mode='rgb_array')
for item in os.scandir(os.path.join(LOG_DIR, 'torch_save')):
    if item.is_file() and item.name.split('.')[-1] == 'pt':
        evaluator.load_saved(
            save_dir=LOG_DIR, model_name=item.name, camera_name='track', width=256, height=256
        )
        evaluator.render(num_episodes=1)
        evaluator.evaluate(num_episodes=1)
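
Typing the timestamped directory name by hand is error-prone. As a minimal sketch, the hypothetical helper below picks the most recent run under an experiment directory. It assumes that run directories follow the `seed-XXX-YYYY-MM-DD-HH-MM-SS` naming pattern seen in the example path above; it is not part of OmniSafe.

```python
from pathlib import Path


def latest_run_dir(exp_dir: str) -> Path:
    """Return the newest run directory under exp_dir (hypothetical helper).

    Assumes directory names look like 'seed-000-2023-04-01-02-44-35'.
    """
    runs = [p for p in Path(exp_dir).iterdir() if p.is_dir() and p.name.startswith('seed-')]
    if not runs:
        raise FileNotFoundError(f'no run directories found in {exp_dir}')
    # 'seed-000-2023-04-01-02-44-35' -> '2023-04-01-02-44-35'; zero-padded
    # timestamps sort correctly as plain strings.
    return max(runs, key=lambda p: p.name.split('-', 2)[2])
```

You could then set `LOG_DIR = str(latest_run_dir('./runs/PPOLag-{SafetyPointGoal1-v0}'))` instead of copying the full path.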

### Benchmark your research
During the research process, running a large number of experiments is often a laborious and error-prone task. To allow researchers to focus on valuable work, OmniSafe provides the experiment grid module. With it, you can quickly and reliably run a large number of experiments by providing all the possible parameter values of interest. This module is incredibly convenient, whether you're creating baselines for your algorithm or tuning parameters.





From an implementation perspective, you can simply understand it as:

1. Generating all feasible combinations of parameters.
2. Using Python's multiprocessing tool to execute the same function with different parameters simultaneously, according to the specified parallelism.
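
The two steps above can be sketched with the standard library alone. This is only an illustration of the idea, not OmniSafe's implementation: `expand_grid` and `fake_train` are hypothetical names, and a thread pool stands in for the process pool so that the snippet runs unchanged inside a notebook.

```python
from itertools import product
from multiprocessing.pool import ThreadPool


def expand_grid(grid: dict) -> list:
    """Step 1: build one config dict per combination of parameter values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*(grid[k] for k in keys))]


def fake_train(cfg: dict) -> str:
    """Stand-in for a real training function; it only describes the run."""
    return f"{cfg['algo']} on {cfg['env_id']} (seed {cfg['seed']})"


grid = {
    'env_id': ['SafetyAntVelocity-v1', 'SafetyHopperVelocity-v1'],
    'algo': ['TRPO', 'PPO'],
    'seed': [0],
}
configs = expand_grid(grid)

# Step 2: run the same function on every config, two at a time.
with ThreadPool(processes=2) as pool:
    results = pool.map(fake_train, configs)

for line in results:
    print(line)
```

Two environments, two algorithms, and one seed yield four configurations, each passed to the same function.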

Therefore, you need to define the function that `experiment_grid` will execute. We have already done this for you: you can simply copy the one from our [examples](https://github.com/PKU-Alignment/omnisafe/tree/main/examples) on GitHub, which contain code for almost all the content discussed in this section.

**Note**: You don't need to pay attention to the implementation details of this code, and only need to modify it when you have highly customized requirements.







In [None]:
"""Example of training a policy from exp-x config with OmniSafe."""

import os
import sys
import warnings

import torch

import omnisafe
from omnisafe.common.experiment_grid import ExperimentGrid
from omnisafe.typing import NamedTuple, Tuple


def train(
    exp_id: str, algo: str, env_id: str, custom_cfgs: NamedTuple
) -> Tuple[float, float, float]:
    """Train a policy from exp-x config with OmniSafe.

    Args:
        exp_id (str): Experiment ID.
        algo (str): Algorithm to train.
        env_id (str): The name of test environment.
        custom_cfgs (NamedTuple): Custom configurations.

    Returns:
        The reward, cost, and episode length returned by ``agent.learn()``.
    """
    terminal_log_name = 'terminal.log'
    error_log_name = 'error.log'
    if 'seed' in custom_cfgs:
        terminal_log_name = f'seed{custom_cfgs["seed"]}_{terminal_log_name}'
        error_log_name = f'seed{custom_cfgs["seed"]}_{error_log_name}'
    sys.stdout = sys.__stdout__
    sys.stderr = sys.__stderr__
    print(f'exp-x: {exp_id} is training...')
    if not os.path.exists(custom_cfgs['logger_cfgs']['log_dir']):
        os.makedirs(custom_cfgs['logger_cfgs']['log_dir'], exist_ok=True)
    # pylint: disable-next=consider-using-with
    sys.stdout = open(
        os.path.join(f'{custom_cfgs["logger_cfgs"]["log_dir"]}', terminal_log_name),
        'w',
        encoding='utf-8',
    )
    # pylint: disable-next=consider-using-with
    sys.stderr = open(
        os.path.join(f'{custom_cfgs["logger_cfgs"]["log_dir"]}', error_log_name),
        'w',
        encoding='utf-8',
    )
    agent = omnisafe.Agent(algo, env_id, custom_cfgs=custom_cfgs)
    reward, cost, ep_len = agent.learn()
    return reward, cost, ep_len


Next, create an `ExperimentGrid` instance and call `eg.add` to specify your parameters. Pass the candidate values of each parameter as a list, even when there is only one value.

**Note**: `eg` is simply the variable name chosen here for the `ExperimentGrid` instance; use whatever name you assigned to yours.

In [None]:
eg = ExperimentGrid(exp_name='Tutorial_benchmark')

# Set the algorithms.
base_policy = ['PolicyGradient', 'NaturalPG', 'TRPO', 'PPO']

# Set the environments.
mujoco_envs = [
    'SafetyAntVelocity-v1',
    'SafetyHopperVelocity-v1',
    'SafetyHumanoidVelocity-v1',
]
eg.add('env_id', mujoco_envs)
eg.add('algo', base_policy)
eg.add('logger_cfgs:use_wandb', [False])
eg.add('train_cfgs:vector_env_nums', [1])
eg.add('train_cfgs:torch_threads', [1])
eg.add('train_cfgs:total_steps', [2048])
eg.add('algo_cfgs:steps_per_epoch', [1024])
eg.add('seed', [0])

CUDA is a powerful acceleration tool for machine learning, and we provide support for it as well. You can evenly distribute your experiments among multiple GPUs for execution. Here is an example to illustrate this. (As Colab does not support it, we have commented out the code. You may try to use it on your own machine.)

In [None]:
# # Set the device.
# available_gpus = list(range(torch.cuda.device_count()))
# gpu_id = [0, 1, 2, 3]
# # if you want to use CPU, please set gpu_id = None
# # gpu_id = None

# # Fall back to CPU if any requested GPU ID is unavailable.
# if gpu_id is not None and not set(gpu_id).issubset(available_gpus):
#     warnings.warn('The GPU ID is not available, use CPU instead.')
#     gpu_id = None
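
To illustrate what an even distribution across GPUs can look like, here is a minimal round-robin sketch. It is an assumption made for illustration only: `assign_devices` is a hypothetical helper, and OmniSafe's actual scheduling may differ.

```python
from itertools import cycle


def assign_devices(num_experiments: int, gpu_id=None) -> list:
    """Assign a device string to each experiment in round-robin order (hypothetical helper)."""
    if not gpu_id:
        # No GPU requested or available: everything runs on the CPU.
        return ['cpu'] * num_experiments
    devices = cycle(gpu_id)
    return [f'cuda:{next(devices)}' for _ in range(num_experiments)]


# Twelve experiments spread over four GPUs: three experiments per GPU.
print(assign_devices(12, gpu_id=[0, 1, 2, 3]))
```

This sketch does not need a GPU to run, since it only produces device names.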

Next, you can specify the number of processes to run in parallel and make full use of your machine's capabilities!

One important point to note is that we recommend setting the value of `num_pool` to a number that can evenly divide the total number of tasks. This ensures that your computer's workload is evenly distributed at all times, maximizing its computing power.

In [None]:
# The total number of experiments should be divisible by num_pool.
# Choose this value according to your machine's resources.
eg.run(train, num_pool=12)
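
The grid defined earlier has 3 environments, 4 algorithms, and a single value for every other parameter, so it contains 12 experiments. The sketch below lists the pool sizes that divide this total evenly; `even_pool_sizes` is a hypothetical helper, and capping the pool at the CPU count is an assumption rather than an OmniSafe rule.

```python
import os
from math import prod


def even_pool_sizes(total_experiments: int, max_workers: int) -> list:
    """List the pool sizes up to max_workers that divide the total evenly (hypothetical helper)."""
    upper = min(total_experiments, max_workers)
    return [n for n in range(1, upper + 1) if total_experiments % n == 0]


# Number of candidate values per parameter in the tutorial grid.
grid_sizes = {'env_id': 3, 'algo': 4, 'seed': 1}
total_experiments = prod(grid_sizes.values())

print(total_experiments)
print(even_pool_sizes(total_experiments, max_workers=os.cpu_count() or 1))
```

Choosing the largest listed value that your machine can sustain keeps every worker busy until the last batch finishes.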

If you are using CUDA, the calling method will be slightly different, as you need to pass in `gpu_id`.

In [None]:
# eg.run(train, num_pool=12, gpu_id=gpu_id)


Once the training is complete, you can use various data analysis tools mentioned earlier to analyze the experimental results from different perspectives. These tools are independent modules that can be called either simultaneously during training or separately after training. We will explain in detail how to **use and combine** these modules flexibly in later sections.

In the following code block, you can specify a parameter and analyze the impact of its different values on performance.

`parameter`: specifies the parameter whose impact needs to be analyzed.

`values`: specifies several values that need to be displayed on the same graph for comparison.

`compare_num`: specifies the maximum number of values to be displayed on the same graph for comparison.

`cost_limit`: specifies the cost threshold to be plotted on the graph.

**Note**: `values` and `compare_num` are conflicting parameters and cannot be specified simultaneously. If both are set to None, `compare_num` will default to the maximum feasible value.

Here are two possible ways to use this functionality:

1. Analyzing specified parameter values.

We have completed our benchmark on three different environments, and now we want to compare the performance of PPO and PolicyGradient on these environments.

In [None]:
# Fill in the name of the parameter whose values you want to compare.
# Then either specify the values to compare,
# or specify the maximum number of values to compare in a single graph,
# and the function will automatically generate all possible combinations of graphs.
# The two modes cannot be used at the same time.
eg.analyze(parameter='algo', values=['PPO', 'PolicyGradient'], compare_num=None, cost_limit=None)

2. Comparing all possible scenarios to find the best algorithm.

After a hard day's work, nobody wants to pore over confusing graphs, so let's give our brains a break. You can ask OmniSafe to generate every possible graph that compares up to three algorithms from your current experiment on a single image.

In [None]:
eg.analyze(parameter='algo', values=None, compare_num=3, cost_limit=None)
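
The interplay between `values` and `compare_num` can be sketched as follows. This is an illustration under stated assumptions, not OmniSafe's code: `comparison_groups` is a hypothetical helper, and it assumes each graph shows exactly `compare_num` values, which may differ from how OmniSafe actually groups them.

```python
from itertools import combinations


def comparison_groups(all_values, values=None, compare_num=None) -> list:
    """Return the groups of values that would share a graph (hypothetical helper)."""
    if values is not None and compare_num is not None:
        raise ValueError('specify either values or compare_num, not both')
    if values is not None:
        # Explicit mode: one graph with exactly the requested values.
        return [tuple(values)]
    if compare_num is None:
        # Neither given: fall back to the maximum feasible value.
        compare_num = len(all_values)
    return list(combinations(all_values, compare_num))


algos = ['PolicyGradient', 'NaturalPG', 'TRPO', 'PPO']
for group in comparison_groups(algos, compare_num=3):
    print(group)
```

With four algorithms and `compare_num=3`, this sketch yields four groups, one for each algorithm left out.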

Of course, as before, you can also visualize all models. Leave all the tedious work to OmniSafe and just review the final results.



In [None]:
eg.render(num_episodes=1, render_mode='rgb_array', width=256, height=256)
eg.evaluate(num_episodes=1)

Certainly, you can also use Python code to analyze past experiments using this tool. Here's a simple example:

In [None]:
# Single Python File
from omnisafe.common.statistics_tools import StatisticsTools


# Fill in the path where the experiment grid was run.
path = ''
st = StatisticsTools()
st.load_source(path)
# Fill in the name of the parameter whose values you want to compare.
# Then either specify the values to compare,
# or specify the maximum number of values to compare in a single graph,
# and the function will automatically generate all possible combinations of graphs.
# The two modes cannot be used at the same time.
st.draw_graph(parameter='', values=None, compare_num=2, cost_limit=None, show_image=True)

After working through the examples and explanations above, you have learned the basics of using OmniSafe. In the following section, we will introduce the **CLI** tool in OmniSafe.