Major improvements
New features:
- New algorithm: Deep RL from Human Preferences (thanks to @ejnnr, @norabelrose, et al.)
- Notebooks with examples (thanks to @ernestum)
- Serialized trajectories using NumPy arrays rather than pickles, ensuring stability across versions and saving space on disk (thanks to @norabelrose); see the sketch after this list
- Weights & Biases logging support (thanks to @yawen-d)
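A minimal sketch of the new NumPy-backed trajectory serialization follows. It assumes the save/load helpers live in `imitation.data.types` (later releases expose them via `imitation.data.serialize`) and that `TrajectoryWithRew` takes `obs`, `acts`, `infos`, `terminal`, and `rews`; exact names may differ between versions.

```python
import numpy as np
from imitation.data import types

# A toy 3-step trajectory: obs must hold one more entry than acts.
traj = types.TrajectoryWithRew(
    obs=np.zeros((4, 2), dtype=np.float32),
    acts=np.zeros((3,), dtype=np.int64),
    infos=None,
    terminal=True,
    rews=np.ones(3, dtype=np.float32),
)

# Stored as NumPy arrays rather than pickles, so files remain loadable
# across library versions and take less space on disk.
types.save("demos.npz", [traj])
loaded = types.load("demos.npz")
assert np.allclose(loaded[0].rews, traj.rews)
```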
Improvements:
- Port MCE IRL from JAX to PyTorch, eliminating the JAX dependency (thanks to @qxcv)
- Refactor RewardNet code to be independent of AIRL and shared across algorithms (thanks to @ejnnr); see the sketch after this list
- Add Windows support, including continuous integration (thanks to @taufeeque9)
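Because the refactored RewardNet no longer depends on AIRL, it can be built and queried on its own. The sketch below assumes `BasicRewardNet` lives in `imitation.rewards.reward_nets` and exposes a `predict()` helper taking `(state, action, next_state, done)` NumPy batches; exact names and signatures may vary across releases.

```python
import gym
import numpy as np
from imitation.rewards.reward_nets import BasicRewardNet

env = gym.make("CartPole-v1")

# The reward network only needs the observation and action spaces,
# so it is constructed with no reference to AIRL or any other algorithm.
reward_net = BasicRewardNet(env.observation_space, env.action_space)

# Score a dummy batch of transitions with the (untrained) network.
batch = 8
obs = np.stack([env.observation_space.sample() for _ in range(batch)])
acts = np.array([env.action_space.sample() for _ in range(batch)])
next_obs = np.stack([env.observation_space.sample() for _ in range(batch)])
dones = np.zeros(batch, dtype=bool)

rewards = reward_net.predict(obs, acts, next_obs, dones)
print(rewards.shape)  # (8,)
```

The same network object is intended to be handed to algorithms such as AIRL rather than built internally, which is what makes it shareable across algorithms.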