
A generic experience representation format, combined with the replay buffer mechanism, to optimize the data flow #31

Open
StepNeverStop opened this issue Jan 2, 2021 · 1 comment

@StepNeverStop
Owner

No description provided.

@StepNeverStop StepNeverStop created this issue from a note in Tasks (In Progress) Jan 2, 2021
@StepNeverStop StepNeverStop self-assigned this Jan 2, 2021
@StepNeverStop StepNeverStop added the enhancement New feature or request label Jan 2, 2021
StepNeverStop added a commit that referenced this issue Jan 2, 2021
- rename `indexs.py` to `specs.py`
- add 3 namedtuples: `ModelObservations`, `Experience` and `SingleModelInformation`
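A minimal sketch of what these containers plausibly look like; the field names below are assumptions for illustration, and the real definitions live in `specs.py`:

```python
from collections import namedtuple

# Field names are assumed for illustration; see `specs.py` for the real ones.
ModelObservations = namedtuple('ModelObservations', ['vector', 'visual'])
Experience = namedtuple('Experience', ['obs', 'action', 'reward', 'obs_', 'done'])

exp = Experience(obs=[0.1, 0.2], action=1, reward=0.5, obs_=[0.2, 0.3], done=False)
print(exp.reward)  # fields are accessed by name rather than by index
```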
StepNeverStop added a commit that referenced this issue Jan 3, 2021
- add `BatchExperiences`, `ModelObservations`, `NamedTupleStaticClass` in `specs.py`
- fix bugs in `dpg` and `ClippedNormalActionNoise`
- refactor ExperienceReplay
- remove a redundant function
@StepNeverStop
Owner Author

  • Adapt to gym
  • Test whether training with LSTM starts correctly
  • Fix on-policy algorithms

StepNeverStop added a commit that referenced this issue Jan 4, 2021
- rename `memories` to `BATCH` for better readability
- implement a new type of `DataBuffer` adapted to the namedtuple `BatchExperiences`
- fix PPO
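A minimal sketch of a buffer that keeps experiences as namedtuples end to end, reusing the field names guessed above; the real `DataBuffer` implementation differs:

```python
import numpy as np
from collections import namedtuple

# Hypothetical container; the real `BatchExperiences` is defined in `specs.py`.
BatchExperiences = namedtuple('BatchExperiences', ['obs', 'action', 'reward', 'obs_', 'done'])

class DataBuffer:
    """Ring buffer that stores and samples namedtuple experiences."""
    def __init__(self, capacity=10000):
        self._capacity = capacity
        self._data = []
        self._ptr = 0

    def add(self, exp: BatchExperiences):
        if len(self._data) < self._capacity:
            self._data.append(exp)
        else:
            self._data[self._ptr] = exp      # overwrite the oldest when full
        self._ptr = (self._ptr + 1) % self._capacity

    def sample(self, batch_size: int) -> BatchExperiences:
        idxs = np.random.randint(0, len(self._data), size=batch_size)
        # Stack fieldwise so the sample is itself a BatchExperiences.
        return BatchExperiences(*(np.stack([getattr(self._data[i], f) for i in idxs])
                                  for f in BatchExperiences._fields))
```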
StepNeverStop added a commit that referenced this issue Jan 4, 2021
- fix `pg`, `a2c`, `aoc`, `ppoc`, `trpo`
StepNeverStop added a commit that referenced this issue Jan 4, 2021
StepNeverStop added a commit that referenced this issue Jan 4, 2021
… (#32)

- remove a redundant calculation
- add a length-equality check so on-policy algorithms store only valid data
- optimize `trpo`
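A sketch of what such a length-equality check might look like; the function name is hypothetical:

```python
def all_lengths_equal(exp) -> bool:
    """Hypothetical helper: a namedtuple iterates over its field values,
    so this verifies every field of an on-policy rollout holds the same
    number of timesteps before the rollout is written to the buffer."""
    lengths = {len(field) for field in exp}
    return len(lengths) <= 1
```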
@StepNeverStop StepNeverStop moved this from In Progress to Done in Tasks Jan 5, 2021
StepNeverStop added a commit that referenced this issue Jan 5, 2021
- support multi-vector and multi-visual input
- optimize the `gym` and `unity` wrappers
- fix `ActorCriticValueCts`
- tag 2.0.0
- add `ObsSpec`
- refactor `SingleAgentEnvArgs` and `MultiAgentEnvArgs`
- remove `self.s_dim`; use `self.concat_vector_dim` instead
- temporarily stop using vector input normalization
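A guess at the shape of `ObsSpec` and how `concat_vector_dim` could replace the old `self.s_dim`; the field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObsSpec:
    """Assumed layout: one dim per vector input, one shape per visual input."""
    vector_dims: List[int] = field(default_factory=list)
    visual_shapes: List[Tuple[int, int, int]] = field(default_factory=list)

    @property
    def concat_vector_dim(self) -> int:
        # Plausibly what replaces `self.s_dim`: the total width after
        # concatenating all vector inputs.
        return sum(self.vector_dims)

spec = ObsSpec(vector_dims=[8, 4], visual_shapes=[(84, 84, 3)])
assert spec.concat_vector_dim == 12
```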
@StepNeverStop StepNeverStop reopened this Jan 5, 2021
@StepNeverStop StepNeverStop added the optimization Better performance or solution label Jan 6, 2021
@StepNeverStop StepNeverStop moved this from Done to In Progress in Tasks Jan 7, 2021
StepNeverStop added a commit that referenced this issue Jul 2, 2021
…training. (#41,#25,#31)

1. changed the variable name from `is_lg_batch_size` to `can_sample`
2. optimized the unity wrapper
3. optimized the multi-agent replay buffers
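The new name reads naturally as a predicate; a sketch of the intent, with everything except the name `can_sample` assumed:

```python
class ReplayBuffer:
    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self._size = 0

    @property
    def can_sample(self) -> bool:
        # Clearer than the old `is_lg_batch_size`: sampling is allowed once
        # the buffer holds at least one full batch.
        return self._size >= self.batch_size
```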
StepNeverStop added a commit that referenced this issue Jul 4, 2021
1. fixed the n-step replay buffer
2. reconstructed the representation net
3. removed `use_stack`
4. implemented multi-agent algorithms with shared parameters
5. optimized the agent network
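For reference, a minimal sketch of the transition an n-step buffer emits under the standard definition; the actual fix lives in the repository's buffer code:

```python
def n_step_transition(traj, start, n, gamma):
    """traj: list of (obs, action, reward, obs_, done) tuples from one
    episode stream; returns the collapsed n-step transition at `start`."""
    obs, action = traj[start][0], traj[start][1]
    R, obs_, done = 0.0, traj[start][3], traj[start][4]
    for k in range(min(n, len(traj) - start)):
        _, _, r, o_, d = traj[start + k]
        R += (gamma ** k) * r      # discounted n-step return
        obs_, done = o_, d         # bootstrap from the last obs reached
        if d:                      # never accumulate across an episode end
            break
    return obs, action, R, obs_, done
```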
StepNeverStop added a commit that referenced this issue Jul 29, 2021
1. added `test.yaml` for quickly verifying RLs
2. changed the folder name from `algos` to `algorithms` for better readability
3. removed the single-agent recorder; all algorithms (SARL & MARL) now use `SimpleMovingAverageRecoder`
4. removed `GymVectorizedType` in `common/specs.py`
5. removed `common/train/*` and implemented a unified training interface in `rls/train`
6. reconstructed the `make_env` function in `rls/envs/make_env`
7. optimized the `load_config` function
8. moved `off_policy_buffer.yaml` to `rls/configs/buffer`
9. removed configurations such as `eval_while_train`, `add_noise2buffer`, etc.
10. optimized the environments' configuration files
11. optimized the environment wrappers and implemented a unified env interface for `gym` and `unity`; see `env_base.py`
12. updated the dockerfiles
13. updated the README
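A guess at the minimal surface such a unified interface needs so the `gym` and `unity` wrappers are interchangeable; the method names are assumptions, not the actual contents of `env_base.py`:

```python
from abc import ABC, abstractmethod

class EnvBase(ABC):
    """Assumed shape of the unified env interface."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial observation(s)."""

    @abstractmethod
    def step(self, actions):
        """Apply actions; return observations, rewards, dones, info."""

    @abstractmethod
    def close(self):
        """Release simulator resources."""
```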
StepNeverStop added a commit that referenced this issue Sep 3, 2021
use `Once` to ensure the buffer is built only once.
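A sketch of a one-shot flag that would give this behavior; the real `Once` may differ:

```python
class Once:
    """Returns True on the first call only, so guarded code runs one time."""
    def __init__(self):
        self._pending = True

    def __call__(self) -> bool:
        pending, self._pending = self._pending, False
        return pending

build_buffer = Once()
if build_buffer():   # True only on the first pass through the training loop
    pass             # ... construct the replay buffer here
```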
StepNeverStop added a commit that referenced this issue Sep 4, 2021
*. redefined the version as v0.0.1
1. removed the `supersuit` package
2. implemented the `MPIEnv` class
3. implemented the `VECEnv` class
4. optimized the env wrappers and implemented the `render` method for the `gyms` environment
5. reconstructed some of the returns of `env.step`, splitting `obs` into `obs_fa` and `obs_fs`:
  - `obs_fa` is the observation the agent/policy chooses actions from. At the crossing point of episodes i and i+1, `obs_fa` represents $observation_{i+1}^{0}$; otherwise it is identical to `obs_fs`, which represents $observation_{i}^{t}$.
  - `obs_fs` is the observation stored in the buffer. At the crossing point of episodes i and i+1, `obs_fs` represents $observation_{i}^{T}$; otherwise it is identical to `obs_fa`.
6. optimized the `rssm`-related code based on the `obs_fs` described above
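A minimal sketch of the `obs_fa`/`obs_fs` split under an auto-resetting environment; `collect_step`, `select_action`, and the `terminal_observation` info key are hypothetical names, not RLs's actual API:

```python
def collect_step(env, agent, buffer, obs):
    """One interaction step. On episode end, an auto-resetting env returns
    the first obs of episode i+1; the buffer still needs the terminal obs
    of episode i, hence the split."""
    action = agent.select_action(obs)               # policy always acts on obs_fa
    next_obs, reward, done, info = env.step(action)
    obs_fs = info['terminal_observation'] if done else next_obs  # o_i^T -> buffer
    obs_fa = next_obs                               # o_{i+1}^0 -> next action
    buffer.add(obs, action, reward, obs_fs, done)
    return obs_fa
```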