If you need any of the features from the pre-release version listed under "Upcoming", you can install coax directly from the main branch:
$ pip install git+https://github.com/coax-dev/coax.git@main
- ...
- Switch from legacy ``gym`` to ``gymnasium`` (#21)
- Upgrade dependencies.
- Add DeepMind Control Suite example (#29); see ``/examples/dmc/sac``.
- Add ``coax.utils.sync_shared_params`` utility; example in A2C stub </examples/stubs/a2c>.
- Improved performance for replay buffer (#25)
- Bug fix: ``random_seed`` in ``_prioritized`` (#24)
- Update to new Jax API (#27)
- Update to ``gym==0.26.x`` (#28).
- Bug fix: set logging level on ``TrainMonitor.logger`` itself (`550a965 <https://github.com/coax-dev/coax/commit/550a965d17002bf552ab2fbea49801c65b322c7b>`_).
- Bug fix: fix affine transform for composite distributions (`48ca9ce <https://github.com/coax-dev/coax/commit/48ca9ced42123e906969076dff88540b98e6d0bb>`_)
- Bug fix: #33
- Bug fix: #21
- Fix deprecation warnings from using ``jax.tree_multimap`` and ``gym.envs.registry.env_specs``.
- Upgrade dependencies.
- Bug fixes: #16
- Replace old ``jax.ops.index*`` scatter operations with the new ``jax.numpy.ndarray.at`` interface.
- Upgrade dependencies.
Bumped version to drop the hard dependency on ``ray``.
Implemented stochastic Q-learning using quantile regression in ``coax.StochasticQ``; see example: IQN <examples/cartpole/iqn>
- Use ``coax.utils.quantiles`` for equally spaced quantile fractions, as in QR-DQN.
- Use ``coax.utils.quantiles_uniform`` for uniformly sampled quantile fractions, as in IQN.
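The difference between the two quantile-fraction schemes is easiest to see side by side. Below is a small illustrative sketch in plain NumPy (the function names here are made up for illustration; they are not the actual coax implementations):

```python
import numpy as np

def quantiles_midpoints(num_quantiles):
    """Equally spaced quantile fraction midpoints, as used in QR-DQN:
    tau_i = (2 * i + 1) / (2 * N) for i = 0, ..., N - 1."""
    return (2 * np.arange(num_quantiles) + 1) / (2 * num_quantiles)

def quantiles_uniform_sampled(num_quantiles, rng):
    """Uniformly sampled quantile fractions, as used in IQN:
    tau_i ~ U(0, 1), sorted into an increasing sequence."""
    return np.sort(rng.uniform(size=num_quantiles))

print(quantiles_midpoints(4))  # [0.125 0.375 0.625 0.875]
print(quantiles_uniform_sampled(4, np.random.default_rng(0)))
```

QR-DQN's fixed midpoints give a deterministic, evenly covered approximation of the return distribution, while IQN's resampled fractions let the network be trained on arbitrary quantile levels.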
This is not much of a release. It's only really the dependencies that were updated.
- Add basic support for distributed agents; see example: Ape-X DQN <examples/atari/apex_dqn>
- Fixed issues with serialization of jit-compiled functions, see jax#5043 and jax#5153
- Add support for sample weights in reward tracers
- Implemented ``coax.td_learning.SoftQLearning``.
- Added soft q-learning stub <examples/stubs/soft_qlearning> and script <examples/atari/dqn_soft>.
- Added serialization utils: ``coax.utils.dump``, ``coax.utils.dumps``, ``coax.utils.load``, ``coax.utils.loads``.
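The four utils follow the familiar ``dump``/``dumps``/``load``/``loads`` naming convention from Python's ``pickle`` and ``json`` modules: ``dump``/``load`` work with files, ``dumps``/``loads`` with in-memory bytes. Here is a sketch of that convention using stdlib ``pickle`` only; the coax versions additionally handle jit-compiled JAX functions, and their exact signatures may differ:

```python
import os
import pickle
import tempfile

obj = {"params": [1.0, 2.0, 3.0], "step": 42}

# dumps/loads: serialize to and from an in-memory bytes object
blob = pickle.dumps(obj)
assert pickle.loads(blob) == obj

# dump/load: serialize to and from a file on disk
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
with open(path, "wb") as f:
    pickle.dump(obj, f)
with open(path, "rb") as f:
    assert pickle.load(f) == obj
```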
Implemented Prioritized Experience Replay:
- Implemented SegmentTree <coax.experience_replay.SegmentTree> that allows for batched updating.
- Implemented SumTree <coax.experience_replay.SumTree> subclass that allows for batched weighted sampling.
- Dropped ``TransitionSingle`` (only use TransitionBatch <coax.reward_tracing.TransitionBatch> from now on).
- Added TransitionBatch.from_single <coax.reward_tracing.TransitionBatch.from_single> constructor.
- Added TransitionBatch.idx <coax.reward_tracing.TransitionBatch.idx> field to identify specific transitions.
- Added TransitionBatch.W <coax.reward_tracing.TransitionBatch.W> field to collect sample weights.
- Made all td_learning <coax.td_learning> and policy_objectives <coax.policy_objectives> updaters compatible with TransitionBatch.W <coax.reward_tracing.TransitionBatch.W>.
- Implemented the PrioritizedReplayBuffer <coax.experience_replay.PrioritizedReplayBuffer> class itself.
- Added scripts and notebooks: agent stub <examples/stubs/dqn_per> and pong <examples/atari/dqn_per>.
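The sum-tree is the core data structure behind prioritized sampling: leaves hold per-transition priorities, internal nodes hold subtree sums, so sampling a leaf with probability proportional to its priority is an O(log n) descent from the root. A minimal single-item sketch of the idea (not coax's batched implementation; the class name is made up):

```python
import numpy as np

class MiniSumTree:
    """Minimal sum-tree over `capacity` leaves (capacity a power of two).
    tree[1] is the root; leaf idx lives at tree[idx + capacity]."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity)

    def set(self, idx, priority):
        """Set a leaf priority and update all ancestor sums."""
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, u):
        """Map u in [0, total_mass) to a leaf index by descending the tree:
        go left if u falls in the left subtree's mass, else subtract it
        and go right."""
        i = 1
        while i < self.capacity:
            left = self.tree[2 * i]
            if u < left:
                i = 2 * i
            else:
                u -= left
                i = 2 * i + 1
        return i - self.capacity

tree = MiniSumTree(4)
for idx, p in enumerate([1.0, 0.0, 3.0, 0.0]):
    tree.set(idx, p)
# total mass is 4.0: u in [0, 1) maps to leaf 0, u in [1, 4) to leaf 2
```

Batched updating and sampling (as in coax's ``SegmentTree``/``SumTree``) vectorize these per-leaf operations, but the tree invariant is the same.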
Other utilities:
- Added FrameStacking <coax.wrappers.FrameStacking> wrapper that respects the ``gym.space`` API and is compatible with the ``jax.tree_util`` module.
- Added data summary (min, median, max) for arrays in the pretty_repr <coax.utils.pretty_repr> util.
- Added StepwiseLinearFunction <coax.utils.StepwiseLinearFunction> utility, which is handy for hyperparameter schedules; see example usage here <examples/stubs/dqn_per>.
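A piecewise-linear schedule of this kind interpolates a hyperparameter between waypoints and holds the end values outside them. An illustrative sketch with ``numpy.interp`` (not the actual ``StepwiseLinearFunction`` API; the function name here is made up):

```python
import numpy as np

def piecewise_linear_schedule(timesteps, values):
    """Return a schedule t -> value that interpolates linearly between
    (timestep, value) waypoints and clamps to the end values outside
    the waypoint range (np.interp's default behavior)."""
    def schedule(t):
        return float(np.interp(t, timesteps, values))
    return schedule

# e.g. anneal epsilon from 1.0 to 0.1 over the first 1000 steps, then hold
epsilon = piecewise_linear_schedule([0, 1000], [1.0, 0.1])
print(epsilon(500))  # 0.55
```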
Implemented distributional RL:

- Added two new methods to all proba_dists: ``mean`` and ``affine_transform``; see ``coax.proba_dists``.
- Made TD-learning updaters compatible with ``coax.StochasticV`` and ``coax.StochasticQ``.
- Made value-based policies compatible with ``coax.StochasticQ``.
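The ``affine_transform`` method exists because a distributional Bellman target has the form r + gamma * Z, i.e. an affine transform of the next-state value distribution. A hedged sketch of what that means for a Gaussian value distribution (coax's actual method works on its own distribution objects, including composite ones; this helper is made up for illustration):

```python
def affine_transform_normal(mu, sigma, scale, shift):
    """If Z ~ N(mu, sigma^2), then scale * Z + shift is distributed as
    N(scale * mu + shift, (|scale| * sigma)^2)."""
    return scale * mu + shift, abs(scale) * sigma

# distributional TD target: Z_target = r + gamma * Z_next
mu_next, sigma_next = 2.0, 0.5
r, gamma = 1.0, 0.9
mu_t, sigma_t = affine_transform_normal(mu_next, sigma_next, gamma, r)
print(mu_t, sigma_t)  # 2.8 0.45
```

The companion ``mean`` method collapses such a distribution back to a scalar expected value, which is what lets value-based policies treat a ``coax.StochasticQ`` like an ordinary q-function.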
First version to go public.