
MO-Gymnasium becomes mature

Released by @ffelten on 12 Jun 15:30 · commit 8fe7d78

MO-Gymnasium 1.0.0 Release Notes

We are thrilled to introduce the mature release of MO-Gymnasium, a standardized API and collection of environments designed for Multi-Objective Reinforcement Learning (MORL).

MORL extends RL to scenarios where agents must optimize multiple, possibly conflicting objectives, each represented by a distinct reward function. In this setting, the agent receives a reward vector after each step and learns to make trade-offs between the objectives. For instance, in the well-known MuJoCo HalfCheetah environment, the reward components are combined linearly using predefined weights, as shown in the following code snippet from Gymnasium:

ctrl_cost = self.control_cost(action)
forward_reward = self._forward_reward_weight * x_velocity
reward = forward_reward - ctrl_cost

With MORL, users have the flexibility to determine the compromises they want based on their preferences for each objective. Consequently, the environments in MO-Gymnasium do not have predefined weights: MO-Gymnasium extends Gymnasium to the multi-objective setting, where the agent receives a vectorial reward.
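
To make the contrast concrete, here is a minimal sketch (illustrative only, not actual environment code) of how the same two quantities could be exposed as a reward vector, with the trade-off applied by the user instead of being baked into the environment:

import numpy as np

# Illustrative values standing in for the quantities computed in the snippet above
forward_reward, ctrl_cost = 1.5, 0.2

# A multi-objective environment returns the components as a vector ...
vector_reward = np.array([forward_reward, -ctrl_cost])

# ... and the user, not the environment, decides the trade-off, e.g. at training time
weights = np.array([0.7, 0.3])
scalar_utility = float(np.dot(weights, vector_reward))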

For example, an MORL agent trained on the mo-halfcheetah domain can learn multiple policies, each striking a different balance between saving battery and speed.

This release marks the first mature version of MO-Gymnasium within the Farama Foundation, indicating that the API is stable and that the library has reached a high level of quality.

API

import gymnasium as gym
import mo_gymnasium as mo_gym
import numpy as np

# It follows the original Gymnasium API ...
env = mo_gym.make('minecart-v0')

obs, info = env.reset()
# but vector_reward is a numpy array!
next_obs, vector_reward, terminated, truncated, info = env.step(your_agent.act(obs))

# Optionally, you can scalarize the reward function with the LinearReward wrapper.
# This allows falling back to single-objective RL
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
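
# As a usage sketch (not part of the original example), the wrapped environment then
# behaves like a regular single-objective Gymnasium environment; random actions stand
# in for a trained agent here.
obs, info = env.reset(seed=42)
for _ in range(10):
    action = env.action_space.sample()
    obs, scalar_reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()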

Environments

We support environments ranging from classic MORL literature benchmarks to inherently multi-objective problems from the RL literature, such as the MuJoCo-based tasks. An exhaustive list of environments is available on our documentation website.
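
As a quick sketch, a few of them can be instantiated and inspected as follows (the environment ids are examples from the documentation, the reward_space attribute is assumed to expose the reward vector bounds, and mo-halfcheetah-v4 additionally requires the MuJoCo dependencies):

import mo_gymnasium as mo_gym

# A small sample of supported environment ids; the documentation lists them all
for env_id in ['deep-sea-treasure-v0', 'minecart-v0', 'mo-halfcheetah-v4']:
    env = mo_gym.make(env_id)
    # reward_space (assumed attribute) describes the bounds and dimensionality of the reward vector
    print(env_id, env.observation_space, env.action_space, env.unwrapped.reward_space)
    env.close()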

Wrappers

Additionally, we provide a set of wrappers tailor-made for MORL, such as MONormalizeReward, which normalizes a single component of the reward vector, or LinearReward, which turns the MOMDP into a standard MDP by scalarizing the reward. See also our documentation.
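
A sketch of chaining these wrappers (the idx keyword of MONormalizeReward is an assumption about its signature; please refer to the documentation):

import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make('minecart-v0')
# Normalize only the first component of the reward vector (idx keyword assumed)
env = mo_gym.MONormalizeReward(env, idx=0)
# Optionally collapse the MOMDP into a regular MDP for single-objective agents
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))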

New features and improvements

  • Bump highway-env version in #50
  • Add mo-lunar-lander-continuous-v2 and mo-hopper-2d-v4 environments in #51
  • Add normalized action option to water-reservoir-v0 in #52
  • Accept zero-dimension numpy array as discrete action in #55
  • Update pre-commit versions and fix small spelling mistake in #56
  • Add method to compute the known Pareto front of the fruit tree environment in #57 (see the sketch after this list)
  • Improve reward bounds for the Mario, minecart, mountain car, resource gathering, and reacher environments in #59, #60, #61
  • Add Python 3.11 support, drop Python 3.7 in #65
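
For the fruit tree Pareto front addition mentioned above, a usage sketch (the pareto_front method name and its gamma argument are assumptions based on the changelog entry and documentation):

import mo_gymnasium as mo_gym

env = mo_gym.make('fruit-tree-v0')
# Known Pareto front of the environment, discounted with gamma
# (method name and signature assumed from the changelog entry above)
front = env.unwrapped.pareto_front(gamma=0.99)
print(len(front), 'non-dominated points')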

Bug fixes and documentation updates

  • Fix water-reservoir bug caused by numpy randint deprecation in #53
  • Fix missing edit button in website in #58
  • Fix reward space and add reward bound tests in #62
  • Add MO-Gymnasium logo to docs in #64

Full Changelog: v0.3.4...v1.0.0