
Check that the code implementation is accurate and reasonable #34

Open
2 of 10 tasks
StepNeverStop opened this issue Jan 6, 2021 · 2 comments
StepNeverStop commented Jan 6, 2021

  • check and fix C51 [deaab73]
  • check qrdqn [deaab73]
  • check iqn
  • check and fix Rainbow
  • check on-policy buffer sampling
  • check function discounted_sum (see the sketch after this list)
  • check function calculate_td_error (see the sketch after this list)
  • check whether training works well with visual input
  • fix TRPO: step_size sometimes becomes NaN
  • check vdn and qmix
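
For reference when checking `discounted_sum` and `calculate_td_error`, here is a minimal sketch of what these two functions are conventionally expected to compute; the actual signatures in this repository may differ.

```python
import numpy as np

def discounted_sum(rewards, gamma, dones, bootstrap_value=0.0):
    """Backward recursion G_t = r_t + gamma * (1 - done_t) * G_{t+1}."""
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    returns = np.zeros_like(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * (1.0 - dones[t]) * running
        returns[t] = running
    return returns

def calculate_td_error(rewards, gamma, dones, values, next_values):
    """One-step TD error: delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)."""
    return rewards + gamma * (1.0 - dones) * next_values - values
```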
@StepNeverStop StepNeverStop created this issue from a note in Tasks (In Progress) Jan 6, 2021
@StepNeverStop StepNeverStop self-assigned this Jan 6, 2021
@StepNeverStop StepNeverStop added the `optimization` (Better performance or solution) label Jan 6, 2021
StepNeverStop added a commit that referenced this issue Jan 7, 2021
StepNeverStop added a commit that referenced this issue Jul 4, 2021
1. fixed n-step replay buffer
2. reconstructed representation net
3. removed 'use_stack'
4. implemented multi-agent algorithms with shared parameters
5. optimized agent network

StepNeverStop commented Jul 6, 2021

  • Check the choice of operation dimensions (dim/axis) throughout the code and set every one that can be -1 to -1 (see the example below).
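
A small illustration of the `dim=-1` convention; the shapes here are made up. Reducing over the last dimension keeps the same call valid no matter how many leading batch/time dimensions are present:

```python
import torch

x = torch.randn(4, 8, 3)            # (batch, time, action) - illustrative shape
probs = torch.softmax(x, dim=-1)     # always normalizes over the trailing action dim
q_max = x.max(dim=-1).values         # the same call also works for (batch, action) inputs
```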

StepNeverStop added a commit that referenced this issue Jul 26, 2021
1. fixed MLP
2. fixed the gradient-passing problem when sharing the representation network between actor and critic, which resolved the error "one of the variables needed for gradient computation has been modified by an inplace operation" (see the sketch below)
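
A minimal sketch of the in-place pitfall and one common remedy; the modules and optimizers here are hypothetical stand-ins, not the repo's actual classes. Stepping one optimizer updates the shared parameters in place, so a loss that still references the old forward graph can no longer be backpropagated; recomputing the shared features (or detaching them for one head) avoids this.

```python
import torch

repr_net = torch.nn.Linear(4, 16)       # shared representation (hypothetical)
actor_head = torch.nn.Linear(16, 2)
critic_head = torch.nn.Linear(16, 1)

critic_oplr = torch.optim.Adam(list(repr_net.parameters()) + list(critic_head.parameters()), lr=1e-3)
actor_oplr = torch.optim.Adam(list(repr_net.parameters()) + list(actor_head.parameters()), lr=1e-3)

obs = torch.randn(8, 4)

feat = repr_net(obs)
critic_loss = critic_head(feat).pow(2).mean()
critic_oplr.zero_grad()
critic_loss.backward()
critic_oplr.step()                      # modifies the shared weights in place

# Reusing `feat` for the actor loss here would raise
# "one of the variables needed for gradient computation has been modified by an inplace operation".
feat = repr_net(obs)                    # fresh forward pass -> fresh graph
actor_loss = -actor_head(feat).mean()
actor_oplr.zero_grad()
actor_loss.backward()
actor_oplr.step()
```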
StepNeverStop added a commit that referenced this issue Jul 28, 2021
1. fixed bugs in `iqn`, `c51`, `rainbow`, `dddqn`, `maxsqn`, `sql`, `bootstrappeddqn`, `averaged_dqn`
StepNeverStop added a commit that referenced this issue Jul 28, 2021
1. moved `logger2file` from agent class to main loop
2. updated folder `gym_env_list`
3. fixed bugs in `*.yaml`
4. added class property `n_copys` instead of using `env._n_copys`
5. updated README
StepNeverStop added a commit that referenced this issue Jul 29, 2021
1. added `test.yaml` for quickly verifying RLs
2. changed folder name from `algos` to `algorithms` for better readability
3. removed the single-agent recorder; all algorithms (SARL & MARL) now use `SimpleMovingAverageRecoder`
4. removed `GymVectorizedType` in `common/specs.py`
5. removed `common/train/*` and implemented a unified training interface in `rls/train`
6. reconstructed the `make_env` function in `rls/envs/make_env`
7. optimized function `load_config`
8. moved `off_policy_buffer.yaml` to `rls/configs/buffer`
9. removed configurations like `eval_while_train`, `add_noise2buffer`, etc.
10. optimized environments' configuration files
11. optimized environment wrappers and implemented a unified env interface for `gym` and `unity`, see `env_base.py`
12. updated dockerfiles
13. updated README
StepNeverStop added a commit that referenced this issue Jul 29, 2021
…ng. (#34, #25)

1. updated `setup.py`
2. removed redundant packages
3. fixed bugs in unity wrapper
4. fixed bugs in agent models that occurred in continuous-action training tasks
5. fixed bugs in class `MLP`
StepNeverStop added a commit that referenced this issue Jul 29, 2021
1. optimized `iTensor_oNumpy`
2. renamed `train_time_step` to `rnn_time_steps` and `burn_in_time_step` to `burn_in_time_steps`
3. optimized `on_policy_buffer.py`
4. optimized `EpisodeExperienceReplay`
5. fixed off-policy RNN training
6. optimized and fixed `to_numpy` and `to_tensor`
7. reimplemented `call` and now invoke it from `__call__` (see the sketch below)
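
A rough sketch of the `call` / `__call__` split; the class names and conversion logic are illustrative guesses, not the repository's actual API. The idea is that `__call__` centralizes the numpy-to-tensor and tensor-to-numpy conversion, while subclasses only implement `call` in pure torch.

```python
import numpy as np
import torch

class PolicyBase:
    def __call__(self, obs: np.ndarray) -> np.ndarray:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        out = self.call(obs_t)                 # subclass logic, tensors in and out
        return out.detach().cpu().numpy()

    def call(self, obs: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

class GreedyQPolicy(PolicyBase):
    def __init__(self, q_net: torch.nn.Module):
        self.q_net = q_net

    def call(self, obs: torch.Tensor) -> torch.Tensor:
        return self.q_net(obs).argmax(dim=-1)  # greedy action per batch element
```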
StepNeverStop added a commit that referenced this issue Jul 30, 2021
StepNeverStop added a commit that referenced this issue Jul 30, 2021
1. fixed bugs in maddpg and vdn
2. implemented `VDNMixer` (sketched below)
3. optimized the parameter-synchronization function
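
For context on item 2, a minimal sketch of what a VDN-style mixer computes (the interface of the repository's `VDNMixer` may differ): the joint Q-value is simply the sum of the per-agent chosen-action Q-values, with no learnable parameters.

```python
import torch

class VDNMixer(torch.nn.Module):
    def forward(self, agent_qs: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) -> joint Q: (batch, 1)
        return agent_qs.sum(dim=-1, keepdim=True)
```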
StepNeverStop added a commit that referenced this issue Aug 26, 2021
StepNeverStop added a commit that referenced this issue Aug 27, 2021
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. updated README
2. added `__repr__` to class `TargetTwin`
3. fixed bugs in marl algorithms
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. added `_has_global_state` in pettingzoo env wrapper and marl policies
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. removed a redundant function
2. optimized `q_target_func`
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. optimized `vdn`
StepNeverStop added a commit that referenced this issue Aug 29, 2021
1. updated README
2. optimized representation model
StepNeverStop added a commit that referenced this issue Aug 30, 2021
StepNeverStop added a commit that referenced this issue Aug 30, 2021
1. fixed a bug in the pettingzoo wrapper that did not scale continuous actions from [-1, 1] to [low, high] (see the rescaling sketch after this list)
2. fixed bugs in `sac`, `sac_v`, `tac`, `maxsqn`
3. implemented `masac`
4. fixed bugs in `squash_action`
5. implemented PER in marl
6. added several env configuration files for the pettingzoo platform
7. updated README
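
A sketch of the kind of rescaling item 1 refers to (the function name is made up): linearly mapping a policy output in [-1, 1] onto the environment's action bounds.

```python
import numpy as np

def scale_action(action, low, high):
    """Map an action in [-1, 1] linearly onto [low, high]."""
    action = np.clip(action, -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)

# e.g. scale_action(0.0, low=np.array([0.0]), high=np.array([4.0])) -> array([2.0])
```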

StepNeverStop commented Aug 31, 2021

  • Corrected the iterative update of RNN hidden states when using exploration policies abf6b0a
  • Implemented updating the policy at a configurable interval of policy-environment interactions abf6b0a

StepNeverStop added a commit that referenced this issue Aug 31, 2021
1. fixed RNN hidden-state iteration
2. renamed `n_time_step` to `chunk_length`
3. added `train_interval` to both SARL and MARL off-policy algorithms to control the training frequency relative to data collection
4. added `n_step_value` for calculating the n-step return (see the sketch after this list)
5. updated README
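
For item 4, a minimal sketch of an n-step return target, truncated at the first terminal step inside the window; the repository's actual `n_step_value` handling may differ.

```python
def n_step_return(rewards, dones, gamma, n, bootstrap_value):
    """G = r_0 + gamma*r_1 + ... + gamma^{n-1}*r_{n-1} + gamma^n * V(s_n)."""
    g, discount = 0.0, 1.0
    for r, d in zip(rewards[:n], dones[:n]):
        g += discount * r
        if d:                       # episode ended inside the window: no bootstrap
            return g
        discount *= gamma
    return g + discount * bootstrap_value
```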
@StepNeverStop StepNeverStop pinned this issue Aug 31, 2021
StepNeverStop added a commit that referenced this issue Sep 3, 2021
1. renamed `iTensor_oNumpy` to `iton`
2. optimized `auto_format.py`
3. added general params `oplr_params` for initializing optimizers
StepNeverStop added a commit that referenced this issue Sep 4, 2021
*. redefined version as v0.0.1
1. removed package `supersuit`
2. implemented class `MPIEnv`
3. implemented class `VECEnv`
4. optimized env wrappers and implemented the `render` method for `gyms` environments
5. reconstructed some of the returns of `env.step`, splitting `obs` into `obs_fa` and `obs_fs` (a small illustration follows this list):
  - `obs_fa` is the observation the agent/policy uses to choose the next action. At the crossing point between episodes i and i+1, `obs_fa` represents $observation_{i+1}^{0}$; otherwise it is identical to `obs_fs`, which represents $observation_{i}^{t}$.
  - `obs_fs` is the observation stored in the buffer. At the crossing point between episodes i and i+1, `obs_fs` represents $observation_{i}^{T}$; otherwise it is identical to `obs_fa`.
6. optimized `rssm`-related code based on the `obs_fs` described above.
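
An illustration of the `obs_fa` / `obs_fs` split for an auto-resetting vectorized env; the helper and the `info` key below are hypothetical, not the repository's actual API.

```python
def split_step_obs(next_obs, done, info):
    """Return (obs_fs, obs_fa).

    obs_fs: stored in the buffer. At an episode boundary it is the terminal
            observation of episode i (stashed in `info` by the wrapper here);
            otherwise it equals next_obs.
    obs_fa: fed to the agent/policy for the next action. At an episode boundary
            it is the reset observation of episode i+1, i.e. next_obs itself.
    """
    obs_fa = next_obs
    obs_fs = info["final_observation"] if done else next_obs
    return obs_fs, obs_fa
```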
StepNeverStop added a commit that referenced this issue Sep 4, 2021
1. optimized on-policy algorithms
2. renamed `cell_state` to `rnncs`
3. renamed `next_cell_state` to `rnncs_`
4. fixed bugs when storing the first experience into replay buffer
5. optimized algorithm code format.
6. fixed bugs in `c51` and `qrdqn`