A major (performance) update on the submodule: srl-zoo; fixes several issues #50

Closed
wants to merge 79 commits into from


ncble
Collaborator

@ncble ncble commented Jun 13, 2019

Since srl_zoo is less popular than robotics-rl-srl, I decided to post the changelog here:

Highlights

  1. Overall 2~5 times speed-up of srl_zoo compared to the current version of origin/master.
  2. Better SRL training mechanism (more intuitive, more modular) that supports sophisticated update schemes (e.g. GAN).
  3. Added GAN to srl_zoo.
  4. Scalable SRL models (any image shape is supported; see the details below).
  5. ~9 times faster DataLoader, which is also simpler since it is natively supported by PyTorch.
  6. Removed redundant code to speed up training.
  7. Fixes several issues in robotics-rl-srl (listed below).
  8. [NEW 19/7/2019] 2~7 times speed-up of RL training (for 2D environments) compared to the current version of origin/master.
  9. [NEW 19/7/2019] Added new environments Labyrinth and MobileRobotX, which run at 36,000 FPS (frames per second) on 10 threads of an Intel i9-9900K CPU at image resolution 128² (20,000 FPS at 224²). For comparison, the previous MobileRobotGymEnv-v0 runs at only 800 FPS.

Fixes #41
Fixes #42
Fixes #43
Fixes #46
Fixes #47
Fixes #48
Fixes #49
Fixes #51
Fixes #53
Fixes #54

Note: Previously, adding one new model required modifying 8 scripts. Now only two (at most three) are needed: models/modules.py and models/my_custom_model.py (see the template models/new_model_template.py).
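
To illustrate how a workflow like this can get by with so few touched files, here is a generic registry pattern. It is not taken from srl_zoo: `MODEL_REGISTRY`, `register_model`, `build_model` and `MyCustomModel` are hypothetical names, and the actual code in models/modules.py may be organized differently.

```python
# Hypothetical sketch of a model registry; none of these names come from srl_zoo.
MODEL_REGISTRY = {}


def register_model(name):
    """Class decorator that records a model class under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise KeyError("model %r is already registered" % name)
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def build_model(name, **kwargs):
    """Instantiate a registered model by its name."""
    return MODEL_REGISTRY[name](**kwargs)


# This part would live in a file such as models/my_custom_model.py.
@register_model("my_custom_model")
class MyCustomModel:
    def __init__(self, state_dim=2, img_shape=(3, 128, 128)):
        self.state_dim = state_dim
        self.img_shape = img_shape


model = build_model("my_custom_model", state_dim=3)
print(type(model).__name__, model.state_dim, model.img_shape)
```

With a pattern of this kind, the training script only needs the model name, so a new model means one new file plus one registration point.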

Changelog

SRL part

  • Support any image resolution across the entire toolbox (--img-shape="(3,128,128)"), including the DataLoader, the environments and the SRL models. Previously, the SRL models did not scale with the image shape, and changing only the input shape was not enough (e.g. each layer's shape and size had to be recalculated by hand). Now all models behave more like Keras models.

  • Support adversarial state representation learning (e.g. GAN).

  • [New scripts] models/base_trainer.py, models/new_model_template.py, models/gan.py

    • base_trainer.py: new training pipeline for better modularity.
    • new_model_template.py: a simple example of how to add a new model to srl_zoo.
    • gan.py: adversarial state representation learning.
  • Better plots (simpler, ~10 times faster).

  • Support a new monitor mode "loss" for --monitor (previously only the "pbar" progress bar was available): it monitors the losses during training and calculates the GTC per epoch.

  • Support controlling the number of CPUs used by the DataLoader (--num-worker).

  • Support "anytime training": load previously trained SRL model weights to continue training with --srl-pre-weights (path to the weights).

  • Change the validation mechanism to the classic one (i.e. within each epoch, train first and then validate). Previously, training and validation alternated at the batch level.

  • Support selecting a specific GPU (--gpu-num=0, --gpu-num=1, etc.).

  • [Remove] preprocessing/preprocess.py (it is useless) and models/custom_layers.py.

  • [Rename] models/models.py is renamed to models/base_models.py, which is more intuitive for outsiders. Currently there are several confusing names: "custom_layers.py", "modules.py", "learner.py", "models.py".
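
To make the image-shape point concrete, the sketch below does the bookkeeping that used to be done by hand: it parses an --img-shape style string and propagates the spatial size through a stack of convolutions using the standard output-size formula. The helper names and the three-layer `ENCODER` are assumptions for illustration only, not the srl_zoo architecture or its actual mechanism.

```python
import ast


def parse_img_shape(text):
    """Parse a string such as "(3,128,128)" into a (channels, height, width) tuple."""
    shape = ast.literal_eval(text)
    if not (isinstance(shape, tuple) and len(shape) == 3):
        raise ValueError("expected a (channels, height, width) tuple, got %r" % (shape,))
    return tuple(int(v) for v in shape)


def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of one spatial dimension after a convolution layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def flattened_size(img_shape, layers):
    """Number of features left after applying `layers` to an image of `img_shape`.

    `layers` is a list of (out_channels, kernel, stride, padding) tuples.
    """
    channels, height, width = img_shape
    for out_channels, kernel, stride, padding in layers:
        height = conv_out_size(height, kernel, stride, padding)
        width = conv_out_size(width, kernel, stride, padding)
        channels = out_channels
    return channels * height * width


# Hypothetical three-layer encoder, chosen only for illustration.
ENCODER = [(32, 3, 2, 1), (64, 3, 2, 1), (64, 3, 2, 1)]

for text in ("(3,128,128)", "(3,224,224)"):
    shape = parse_img_shape(text)
    print(shape, "->", flattened_size(shape, ENCODER))
```

Computing the size of the first fully connected layer this way (or with a dummy forward pass) is what lets the same model definition accept any input resolution.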

RL part

  • Support any image shape.
  • Support selecting a specific GPU (--gpu-num=0, --gpu-num=1, etc.).
  • Support --srl-model-path to specify the path of the SRL model weights. Previously, we could only load the latest model (with --latest) or manually change the weights path in config/srl_model.yaml.
  • Register the new SRL models.
  • Add new environments (extremely fast, about 20 times faster than all current environments): Labyrinth-v0 and MobileRobotX-v0 (with interactive play enabled), which run at 36,000 FPS (frames per second) on 10 threads of an Intel i9-9900K CPU at image resolution 128² (20,000 FPS at 224²). For comparison, the previous MobileRobotGymEnv-v0 runs at only 800 FPS.
  • 2~7 times speed-up of RL training (for 2D environments).
  • Fix issues:
    • Images rotated by 90 degrees.
    • The --log-folder folder does not exist.
    • Several issues in the environments.
  • [New] replay/plot_pipeline.py: aggregates all losses and plots them on one figure (draft code, to be refined and merged with replay/aggregate_plots.py, compare_plots.py and gather_results.py).
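
For readers who want to reproduce FPS figures of this kind, a minimal way to time an environment's step function is sketched below. `measure_fps` and `DummyEnv` are hypothetical names; `DummyEnv` stands in for a real environment and does no rendering. The sketch times a single thread, whereas the numbers above were obtained on 10 threads.

```python
import time


def measure_fps(step_fn, n_steps=10000):
    """Call step_fn n_steps times and return the average number of calls per second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_steps / elapsed


class DummyEnv:
    """Hypothetical stand-in for an environment; it only counts steps."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


env = DummyEnv()
fps = measure_fps(env.step, n_steps=10000)
print("%.0f steps per second" % fps)
```

To compare two environments fairly, the same image resolution and the same number of threads should be used for both.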

@ncble ncble changed the title from "A major (performance) update on the submodule: srl-zoo; fix several issues #41~43, #46~49." to "A major (performance) update on the submodule: srl-zoo; fixes several issues" on Jul 16, 2019
@ncble
Collaborator Author

ncble commented Aug 9, 2019

The original version (master) of the whole toolbox has 18026 lines of code (including srl_zoo). My pull request has already added and removed +5886/-249 and +4145/-1453 lines (total +10031/-1702), so more than 40% of the lines of code have been modified. There is no need to merge it into the master branch.
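
A quick check of these figures, using only the numbers quoted in this comment (how the "more than 40%" is measured is not stated, so two possible ratios are shown as an assumption):

```python
BASELINE_LOC = 18026  # lines of code on master, including srl_zoo

# (added, removed) for the two diffs quoted in the comment
diffs = [(5886, 249), (4145, 1453)]

added = sum(a for a, _ in diffs)
removed = sum(r for _, r in diffs)

print("total: +%d/-%d" % (added, removed))
print("added / baseline: %.1f%%" % (100 * added / BASELINE_LOC))
print("(added + removed) / baseline: %.1f%%" % (100 * (added + removed) / BASELINE_LOC))
```

Both ratios are above 40%, so the claim holds under either reading.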

@ncble ncble closed this Aug 9, 2019