Multi-gpu support in training loop #657
Conversation
Thanks! I think if we set `ctx` to be a list, we might want to rename it to `ctxs` or something similar.
@jaheba thanks for the review! All the comments have been addressed. I think it's better to avoid name changes that are inconsistent with mxnet, so I'm keeping the `ctx` name.
Codecov Report
```
@@            Coverage Diff             @@
##           master     #657      +/-   ##
==========================================
+ Coverage   84.63%   84.64%   +<.01%
==========================================
  Files         178      178
  Lines       10401    10425      +24
==========================================
+ Hits         8803     8824      +21
- Misses       1598     1601       +3
```
@istorch the changes look good to me, so this could be merged into master soon and be included in the next release (either this or next week).
Hey @ehsanmok, see my specific comments inline. The main concern I have is that we currently lack proper regression tests (running extensive training on some datasets), and also I'm not familiar with multiple contexts and `split_and_load` in mxnet, so I'll need to look into this deeper or have someone else also review this.
Could you maybe add a MWE to the original post, describing expected behavior before/after the PR?
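For reference, a minimal example of the intended before/after behavior might look like the following sketch; the dataset choice, import paths, and estimator/Trainer parameters are assumptions based on the gluonts API of that time, not code from this PR:
```python
import mxnet as mx
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.trainer import Trainer

dataset = get_dataset("m4_hourly")

def make_estimator(ctx):
    # assumed estimator/Trainer parameters, for illustration only
    return SimpleFeedForwardEstimator(
        freq=dataset.metadata.freq,
        prediction_length=dataset.metadata.prediction_length,
        trainer=Trainer(ctx=ctx, epochs=5, num_batches_per_epoch=16),
    )

# Before this PR: only a single context.
predictor_single = make_estimator(mx.gpu(0)).train(dataset.train)

# After this PR (expected behavior): a list of contexts trains on multiple GPUs.
predictor_multi = make_estimator([mx.gpu(0), mx.gpu(1)]).train(dataset.train)
```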
```diff
 params = net_source.collect_params()
 for p in params:
     ctx = params[p].list_ctx()
     break
 # force init otherwise may not load params
 # since not all params have the same ctx in net_source
 net_dest.initialize(ctx=ctx, force_reinit=True)
 net_dest.load_parameters(
     model_dir_path,
-    ctx=mx.current_context(),
+    ctx=ctx,
```
I think this limits the way this function can act: correct me if I'm wrong, but this way the code puts all the parameters of `net_dest` in the same context, and this context is that of the first parameter of `net_source`, right?
I think the point of using `mx.current_context()` here is being able to (potentially, not sure this is done anywhere) perform training on GPU and have the predictor network use the CPU instead.
Edit: can you explain why this is needed?
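A tiny sketch of the mechanism being described (illustrative only; the toy network and file path are not from this PR): whoever loads the parameters controls the device through the active default context.
```python
import mxnet as mx
from mxnet.gluon import nn

# toy network standing in for the predictor; the path is made up
net = nn.Dense(4)
net.initialize(ctx=mx.cpu())
net(mx.nd.zeros((1, 8)))  # trigger shape inference so parameters materialize
net.save_parameters("/tmp/toy.params")

# The caller decides where the predictor lives: inside this block,
# mx.current_context() is the CPU, regardless of the training device.
with mx.cpu():
    net.load_parameters("/tmp/toy.params", ctx=mx.current_context())
```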
If I remember correctly, during testing, when not all params were in a single context, forcing a reinit with a list of contexts solved the issue. Please see my general comment at the end.
```diff
@@ -96,7 +96,7 @@ def __init__(
     cyclic=self.cyclic,
     is_train=self.is_train,
     batch_size=self.batch_size,
-    ctx=self.ctx,
+    ctx=self.ctx[0],
```
I'm not sure I understand this: one can specify a list of contexts, but then the data is loaded into the first one? I'm really not familiar with this topic, though, so maybe I'm missing something.
Edit: is the idea to put everything in the first context, and then have `split_and_load` in the training loop take care of that?
> is the idea to put everything in the first context, and then have split_and_load in the training loop take care of that?

Yes, indeed!
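For context, a rough self-contained sketch of that pattern (the toy network and hyperparameters are assumptions, not code from this PR): the whole batch lives on one device and `gluon.utils.split_and_load` scatters it across the context list inside the training loop.
```python
import mxnet as mx
from mxnet import autograd, gluon, nd

# assumed setup: fall back to the CPU when fewer than two GPUs are available
ctxs = [mx.gpu(0), mx.gpu(1)] if mx.context.num_gpus() >= 2 else [mx.cpu()]

net = gluon.nn.Dense(1)
net.initialize(ctx=ctxs)
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), "adam")

batch_size = 32
x = nd.random.normal(shape=(batch_size, 8))  # whole batch on the default context
y = nd.random.normal(shape=(batch_size, 1))

# scatter the batch across the contexts, then one forward/backward per slice
xs = gluon.utils.split_and_load(x, ctx_list=ctxs)
ys = gluon.utils.split_and_load(y, ctx_list=ctxs)
with autograd.record():
    losses = [loss_fn(net(xi), yi) for xi, yi in zip(xs, ys)]
for l in losses:
    l.backward()
trainer.step(batch_size)
```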
```python
if F is mx.ndarray:
    ctx = (
        inputs.context
        if isinstance(inputs, mx.gluon.tensor_types)
        else inputs[0].context
    )
    with ctx:
        begin_state = self.rnn.begin_state(
            func=F.zeros, dtype=self.dtype, batch_size=inputs.shape[0]
        )
else:
    begin_state = self.rnn.begin_state(
        func=F.zeros, dtype=self.dtype, batch_size=0
    )
```
It looks to me like this context logic should not be in the network code. Maybe it would be better to control the context once and for all from the outside somehow? Otherwise the code of every model could end up filled with similar context logic and lose readability. What do you think?
Edit: maybe one idea would be to invoke the network within `with ctx` for the right `ctx`? I guess that would be in the trainer.
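A sketch of what that suggestion could look like (an assumption, not the PR's code): the trainer picks the context and wraps the network invocation in it, so factories that default to the current context need no per-model plumbing.
```python
import mxnet as mx
from mxnet.gluon import nn

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()  # trainer's choice

net = nn.Dense(4)
net.initialize(ctx=ctx)

with ctx:  # a Context is a context manager that sets the default device
    # arrays created without an explicit ctx (e.g. F.zeros in begin_state)
    # now land on `ctx` automatically
    x = mx.nd.random.normal(shape=(2, 8))
    out = net(x)
    assert out.context == ctx
```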
Agreed, and in fact this logic is repeated elsewhere. Though I was expecting to have this handled in upstream mxnet rather than hacking it in here.
@lostella Thanks for your comments! In general, I think, since gluonts hadn't taken multi-gpu into consideration and was designed for a single gpu (at least it felt that way at the time of creating this PR), this PR initiates a sub-optimal solution, with room for improvement of course. As for the MWE, I've run all the existing examples and observed improved (though still sub-optimal) speed-ups, which I can report here later.
Hi @istorch, thanks for your patience! As @lostella pointed out earlier, this PR offers a solution and it'd be helpful to have some external validation too, so if you can, please run your application to test things further. You can install this patch directly from the PR branch, e.g. as sketched below.
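A typical way to install a PR branch directly with pip would be something like the line below; the fork and branch names are placeholders, not the actual ones from this PR:
```
pip install git+https://github.com/<fork>/gluon-ts.git@<multi-gpu-branch>
```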
@ehsanmok Thanks for picking this up again! I have installed your branch and passed the context in as a list, along the lines of the sketch below.
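A plausible shape for such a list (an assumption for illustration, using eight devices to match the p2.8xlarge benchmark below):
```python
import mxnet as mx

# hypothetical: one context per GPU on an 8-GPU machine
ctx = [mx.gpu(i) for i in range(8)]
```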
@istorch thanks a lot for the report! I'll look into the prediction part.
@ehsanmok also, I did a little benchmarking and I'm not seeing any improvement from using multiple GPUs. With the same settings, my training ran in 7min 40s on a single GPU, and 8min 51s on 8 GPUs. I set epochs to 50, batch_size to 512, and num_batches_per_epoch to 16 (this is on a p2.8xlarge instance).
@istorch The speedup also depends on how parallelizable your model is. To make a correct comparison, did you multiply the batch size by the number of GPUs (e.g. if you used 512 for a single GPU, you'd need to use 512 * 8 for your 8-GPU context)? Also, please keep
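For reference, a scaled-up configuration along those lines might look as follows (a sketch only; the Trainer parameters and import path are assumed from the gluonts API of that time):
```python
import mxnet as mx
from gluonts.trainer import Trainer

num_gpus = 8
base_batch_size = 512  # the single-GPU batch size used in the benchmark above

# multiply the global batch size by the number of GPUs so each device still
# processes 512 samples per step; other settings as in the benchmark
trainer = Trainer(
    ctx=[mx.gpu(i) for i in range(num_gpus)],
    epochs=50,
    batch_size=base_batch_size * num_gpus,  # 4096
    num_batches_per_epoch=16,
)
```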
@lostella @istorch after some investigation it turns out that adding the complete multi-gpu feature is harder than I initially thought. This PR adds multi-gpu support for the training loop only. The current data loader seems to be a major bottleneck, as it was written for a single context only, and the other parts of the GluonTS estimator are hard to decouple and extend. This is one more reason to consider #829.
Issue #162
Description of changes:
Adding multi-gpu support. Besides local tests, all examples are replicable and have been tested on p3.8xlarge with 4 GPUs.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.