Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A major (performance) update on the submodule: srl-zoo; fix several issues #41~43, #46~49. #44

Closed
wants to merge 36 commits into from

Conversation

ncble
Copy link
Collaborator

@ncble ncble commented May 21, 2019

Fix the following issues:

@ncble ncble requested a review from araffin May 21, 2019 09:21
@araffin araffin requested a review from kalifou May 21, 2019 09:28
Copy link
Collaborator

@kalifou kalifou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ncble thanks for pointing out bugs
As the repo as been update soon after your PR,
please merge again, also fix 42 should be a bit adapted, however fix 43 seems to be good.
For fix 41 I will take a look asap.
Regarding sub-modules, please consider another PR

@@ -1,4 +1,4 @@
[submodule "srl_zoo"]
path = srl_zoo
branch = master
url = https://github.com/araffin/srl-zoo.git
url = https://github.com/ncble/srl-zoo.git
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, If you want to add additional fixes in srl-zoo submodule,
please do so with a PR in https://github.com/araffin/srl-zoo/.
This url shall not be changed.

@@ -82,7 +83,7 @@ def env_thread(args, thread_num, partition=True, use_ppo2=False):
(thread_num if thread_num <= args.num_episode % args.num_cpu else args.num_episode % args.num_cpu)

env.seed(seed)
prng.seed(seed) # this is for the sample() function from gym.space
# prng.seed(seed) # this is for the sample() function from gym.space
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the repository as been update this morning,
we now use gym==0.11.0 and already removed line 85.
Since ValueError: Seed must be between 0 and 2**32 - 1is still remaining,
the suggested fix would be appreciated (seed = seed % 2^32).
Thanks @ncble

@kalifou kalifou closed this May 21, 2019
@kalifou kalifou reopened this May 21, 2019
@ncble
Copy link
Collaborator Author

ncble commented Jun 11, 2019

Since the srl_zoo is less popular than robotic-rl-srl, I decided to post the changelog here:

Highlights

  1. 2~5 times speed-up (overall) (srl_zoo) compared to the current version of origin/master
  2. Better SRL training mechanism (more intuitive, better modularity) that support sophisticated update (e.g. GAN)
  3. Add GAN to srl_zoo
  4. Scalable SRL models (support any image shape, see the detail below)
  5. ~9 times faster DataLoader which is also simpler, since it's natively supported by pytorch.
  6. Remove several redundant codes to speed-up training.
  7. Fix several issues of (robotics-rl_srl): [bug report] Images rotated by 90 degrees  #41, [bug report] Seed must be between 0 and 2**32 - 1; gym version too old #42, [bug report] sac, ddpg, deepq: train.py: error: unrecognized arguments: --policy #43, [feature improving request (srl_zoo)] Around 30% speed-up by changing few lines code #46, [bug report (unsolved)/question] SRL training slows down per epoch (Memory leak?). #47, [bug report] SRL training mechanism is wrong: validation set is used to update weights during training. #48, [bug report] GTC seems incorrect. #49...

Note: Before, we need to modify 8 scripts in order to add one new model, now only two scripts (at most three): models/modules.py and models/my_custom_model.py (see the template models/new_model_template.py)

Changelog

SRL part

  • Support any image resolution for the entire toolbox (--img-shape="(3,128,128)"), including the DataLoader, Environment, SRL models. Before, the SRL models are not scalable with respect to image shape, and it's not sufficient to modify only the input shape (e.g. need to manually calculate each layers' shape, size, etc). Now, all models function more like keras.

  • Support adversarial state representation learning. (e.g. GAN)

  • [New scripts] models/base_trainer.py, models/new_model_template.py, models/gan.py

    • base_trainer.py: new trainig pipeline for better modularity.
    • new_model_template.py: a simple example of "how to add new model to srl_zoo".
    • gan.py: adversarial state representation learning.
  • Better (simpler, ~10 times faster) plots

  • Support new monitor mode (--monitor) "loss" (before, there is only "pbar" progressbar): monitor losses during training, calculate GTC per epoch.

  • Support control of number of CPU for dataloader (--num-worker).

  • Support "anytime training": load the previous trained SRL model weights to continue the training --srl-pre-weights (weights path)

  • Change validation mechanism to the classic one (i.e. within one epoch: train then valid). Before, we alternate between training and validation mode at batch level.

  • Support specific GPU number. (by --gpu-num=0, --gpu_num1, etc)

  • [Remove]: preprocessing/preprocess.py (it's useless), models/custom_layers.py

  • [Rename] the models/models.py is renamed to models/base_models.py, since it's more intuitive for the outsider. Currently, there are several confusing names "custom_layers.py", "modules.py", "learner.py" "models.py

RL part

  • support any image shape.

  • support specific GPU number. (by --gpu-num=0, --gpu_num1, etc)

  • support --srl-model-path indicate the SRL model weights path. Before, we can only load either the latest (by calling --latest) or manually change the config/srl_model.yaml model weights path.

  • register new srl models

  • fix issues:

    • image rotated by 90 degrees
    • --log-folder folder doesn't exist.
  • [New] replay/plot_pipeline.py: aggregate all losses and plot on one figure. (draft code to be refined, merged with replay/aggregate_plots.py, compare_plots.py, gather_results.py)

@ncble ncble changed the title Fix issues #41 #42 #43 A major (performance) update on the submodule: srl-zoo; fix several issues #41~43, #46~49. Jun 11, 2019
@araffin
Copy link
Owner

araffin commented Jun 12, 2019

Quick question: why didn't you update both changelog? (I mean the .rst files) (and I'm ok if you duplicate the srl zoo changelog to include it in the rl toolbox)

@ncble
Copy link
Collaborator Author

ncble commented Jun 13, 2019

@araffin Yes, once the code pass review, I will update then to the .rst files ;)


Since there was a pull request (with OmniRobot) before mine, and I merged them directly to my branch. Now the code comparison has become a little messy, thus I will close this one and do a clean pull request again.

@ncble ncble closed this Jun 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants