Major improvements
New features:
- New algorithm: Deep RL from Human Preferences (thanks to @ejnnr, @norabelrose, et al.)
- Notebooks with examples (thanks to @ernestum)
- Serialized trajectories using NumPy arrays rather than pickles, ensuring stability across versions and saving space on disk (thanks to @norabelrose); see the sketch after this list
- Weights & Biases logging support (thanks to @yawen-d)
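A minimal sketch of the new NumPy-backed trajectory serialization follows. It assumes the save/load helpers live in `imitation.data.types` (later releases expose them via `imitation.data.serialize`) and that `TrajectoryWithRew` takes `obs`, `acts`, `infos`, `terminal`, and `rews`; exact names may differ between versions.

```python
import numpy as np
from imitation.data import types

# A toy 3-step trajectory: obs must hold one more entry than acts.
traj = types.TrajectoryWithRew(
    obs=np.zeros((4, 2), dtype=np.float32),
    acts=np.zeros((3,), dtype=np.int64),
    infos=None,
    terminal=True,
    rews=np.ones(3, dtype=np.float32),
)

# Stored as NumPy arrays rather than pickles, so files remain loadable
# across library versions and take less space on disk.
types.save("demos.npz", [traj])
loaded = types.load("demos.npz")
assert np.allclose(loaded[0].rews, traj.rews)
```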
Improvements:
- Port MCE IRL from JAX to PyTorch, eliminating the JAX dependency (thanks to @qxcv)
- Refactor RewardNet code to be independent of AIRL and shared across algorithms (thanks to @ejnnr); see the sketch after this list
- Add Windows support, including continuous integration (thanks to @taufeeque9)
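Because the refactored RewardNet no longer depends on AIRL, it can be built and queried on its own. The sketch below assumes `BasicRewardNet` lives in `imitation.rewards.reward_nets` and exposes a `predict()` helper taking `(state, action, next_state, done)` NumPy batches; exact names and signatures may vary across releases.

```python
import gym
import numpy as np
from imitation.rewards.reward_nets import BasicRewardNet

env = gym.make("CartPole-v1")

# The reward network only needs the observation and action spaces,
# so it is constructed with no reference to AIRL or any other algorithm.
reward_net = BasicRewardNet(env.observation_space, env.action_space)

# Score a dummy batch of transitions with the (untrained) network.
batch = 8
obs = np.stack([env.observation_space.sample() for _ in range(batch)])
acts = np.array([env.action_space.sample() for _ in range(batch)])
next_obs = np.stack([env.observation_space.sample() for _ in range(batch)])
dones = np.zeros(batch, dtype=bool)

rewards = reward_net.predict(obs, acts, next_obs, dones)
print(rewards.shape)  # (8,)
```

The same network object is intended to be handed to algorithms such as AIRL rather than built internally, which is what makes it shareable across algorithms.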