Release Notes

If you need any of the features from the pre-release version listed under "Upcoming", you can install coax directly from the main branch:

$ pip install git+https://github.com/coax-dev/coax.git@main

Upcoming

  • ...

v0.1.13

  • Switch from legacy gym to gymnasium (#21)
  • Upgrade dependencies.
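For reference, here is a minimal sketch of the gymnasium step API that this switch targets (the CartPole-v1 environment id is only an illustration): reset() now also returns an info dict, and step() returns separate terminated/truncated flags instead of a single done flag.

    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=13)
    done = False
    while not done:
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated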

v0.1.12

v0.1.11

  • Bug fix: #21
  • Fix deprecation warnings from using jax.tree_multimap and gym.envs.registry.env_specs (see the tree_map sketch after this list).
  • Upgrade dependencies.
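The jax.tree_multimap deprecation amounts to a rename: jax.tree_map accepts multiple trees with the same structure, so the fix is a drop-in replacement (newer JAX versions spell it jax.tree_util.tree_map). A minimal sketch:

    import jax

    params = {"w": 1.0, "b": 2.0}
    grads = {"w": 0.1, "b": 0.2}

    # Deprecated: jax.tree_multimap(lambda p, g: p - 0.01 * g, params, grads)
    # Replacement: jax.tree_map maps over multiple trees with the same structure.
    new_params = jax.tree_map(lambda p, g: p - 0.01 * g, params, grads)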

v0.1.10

  • Bug fixes: #16
  • Replace the old jax.ops.index* scatter operations with the new jax.numpy.ndarray.at interface (see the sketch after this list).
  • Upgrade dependencies.
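The scatter-operation migration above replaces the functional jax.ops.index_update / index_add helpers with the ndarray.at property. A minimal sketch of the new style:

    import jax.numpy as jnp

    x = jnp.zeros(5)

    # Old (removed): jax.ops.index_update(x, jax.ops.index[2], 7.0)
    # New: out-of-place scatter via the .at interface.
    x = x.at[2].set(7.0)    # [0., 0., 7., 0., 0.]
    x = x.at[1:3].add(1.0)  # [0., 1., 8., 0., 0.]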

v0.1.9

Bumped the version to drop the hard dependency on ray.

v0.1.8

Implemented stochastic q-learning using quantile regression in coax.StochasticQ; see example: IQN <examples/cartpole/iqn>

  • Use coax.utils.quantiles for equally spaced quantile fractions as in QR-DQN.
  • Use coax.utils.quantiles_uniform for uniformly sampled quantile fractions as in IQN.
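The sketch below only illustrates the difference between the two kinds of quantile fractions in plain JAX (equally spaced bin midpoints vs. freshly sampled fractions); it is not the coax implementation, and the array sizes are arbitrary.

    import jax
    import jax.numpy as jnp

    num_quantiles = 8

    # Equally spaced quantile fractions (QR-DQN style): midpoints of equal-width bins.
    tau_qr = (jnp.arange(num_quantiles) + 0.5) / num_quantiles

    # Uniformly sampled quantile fractions (IQN style): fresh draws on every call.
    rng = jax.random.PRNGKey(42)
    tau_iqn = jnp.sort(jax.random.uniform(rng, shape=(num_quantiles,)))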

v0.1.7

This is a minor release; only the dependencies were updated.

v0.1.6

  • Added basic support for distributed agents; see example: Ape-X DQN <examples/atari/apex_dqn>
  • Fixed issues with serialization of jit-compiled functions; see jax#5043 and jax#5153.
  • Added support for sample weights in reward tracers.

v0.1.5

  • Implemented coax.td_learning.SoftQLearning.
  • Added soft q-learning stub <examples/stubs/soft_qlearning> and script <examples/atari/dqn_soft>.
  • Added serialization utils: coax.utils.dump, coax.utils.dumps, coax.utils.load, coax.utils.loads.
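A minimal usage sketch of the serialization utils, assuming the usual call pattern of dump(obj, filepath) / load(filepath) and their in-memory counterparts dumps(obj) / loads(blob); the object and path below are purely illustrative.

    import coax

    pi = {"params": [1.0, 2.0, 3.0]}  # stands in for any picklable coax object

    # In-memory round trip.
    blob = coax.utils.dumps(pi)
    pi_restored = coax.utils.loads(blob)

    # File-based round trip (illustrative path).
    coax.utils.dump(pi, "/tmp/pi.coax")
    pi_restored = coax.utils.load("/tmp/pi.coax")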

v0.1.4

Implemented Prioritized Experience Replay:

  • Implemented SegmentTree <coax.experience_replay.SegmentTree> that allows for batched updating.
  • Implemented SumTree <coax.experience_replay.SumTree> subclass that allows for batched weighted sampling.
  • Dropped TransitionSingle (only use TransitionBatch <coax.reward_tracing.TransitionBatch> from now on).
  • Added TransitionBatch.from_single <coax.reward_tracing.TransitionBatch.from_single> constructor.
  • Added TransitionBatch.idx <coax.reward_tracing.TransitionBatch.idx> field to identify specific transitions.
  • Added TransitionBatch.W <coax.reward_tracing.TransitionBatch.W> field to collect sample weights.
  • Made all td_learning <coax.td_learning> and policy_objectives <coax.policy_objectives> updaters compatible with TransitionBatch.W <coax.reward_tracing.TransitionBatch.W>.
  • Implemented the PrioritizedReplayBuffer <coax.experience_replay.PrioritizedReplayBuffer> class itself (see the sketch after this list).
  • Added scripts and notebooks: agent stub <examples/stubs/dqn_per> and pong <examples/atari/dqn_per>.
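The SumTree and PrioritizedReplayBuffer entries above boil down to sampling transitions in proportion to their priorities and correcting for the skew with importance weights. The plain-numpy sketch below only illustrates that idea (the real SumTree does the weighted lookup in O(log n), and none of the names here are coax API):

    import numpy as np

    rng = np.random.default_rng(13)

    priorities = np.array([0.1, 2.0, 0.5, 1.0, 0.2])  # e.g. based on TD errors
    probs = priorities / priorities.sum()

    # Batched weighted sampling, here via a cumulative-sum search for clarity.
    cumulative = np.cumsum(probs)
    idx = np.searchsorted(cumulative, rng.uniform(size=4))  # batch of 4 indices

    # Importance-sampling weights that undo the non-uniform sampling (beta = 1).
    weights = 1.0 / (len(priorities) * probs[idx])
    weights = weights / weights.max()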

Other utilities:

  • Added FrameStacking <coax.wrappers.FrameStacking> wrapper that respects the gym.spaces API and is compatible with the jax.tree_util module.
  • Added data summary (min, median, max) for arrays in pretty_repr <coax.utils.pretty_repr> util.
  • Added StepwiseLinearFunction <coax.utils.StepwiseLinearFunction> utility, which is handy for hyperparameter schedules; see example usage here <examples/stubs/dqn_per>.
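StepwiseLinearFunction is presumably a piecewise-linear interpolator over (timestep, value) breakpoints, which is what makes it convenient for annealing hyperparameters such as the exploration epsilon. A minimal stand-in sketch (not the coax implementation; the breakpoints are illustrative):

    import numpy as np

    def piecewise_linear(breakpoints):
        """Interpolate linearly between (t, value) pairs; hold constant outside them."""
        ts, vs = zip(*breakpoints)
        return lambda t: float(np.interp(t, ts, vs))

    # E.g. anneal epsilon from 1.0 to 0.1 over the first 1M steps, then hold.
    epsilon = piecewise_linear([(0, 1.0), (1_000_000, 0.1)])
    epsilon(500_000)  # 0.55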

v0.1.3

Implemented support for distributional RL:

  • Added two new methods to all proba_dists: mean and affine_transform; see coax.proba_dists and the sketch after this list.
  • Made TD-learning updaters compatible with coax.StochasticV and coax.StochasticQ.
  • Made value-based policies compatible with coax.StochasticQ.
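The affine_transform method added above is what turns a next-state value distribution into a distributional Bellman target: every atom or quantile is shifted by the reward and scaled by the discount. A minimal sketch of the underlying idea for a quantile-valued coax.StochasticQ (plain JAX, not the coax API; the numbers are illustrative):

    import jax.numpy as jnp

    gamma = 0.99
    r, done = 1.0, False                         # transition reward and termination flag
    next_quantiles = jnp.array([0.5, 1.2, 3.0])  # quantile values of Z(s', a')

    # Distributional Bellman target: an affine transform of the next-state distribution.
    target_quantiles = r + gamma * (1.0 - float(done)) * next_quantiles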

v0.1.2

First version to go public.