Conversation

@williamFalcon
Contributor

@williamFalcon williamFalcon commented Nov 1, 2019

This PR does the following:

  1. Fixes #446 (Add a way to operate on all outputs from training_step).
  2. Adds a hook for modifying DDP init. Related to #353 (Support alternative distributed communication).
  3. Adds a hook for modifying apex.
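
For illustration, a rough sketch of what overriding the new apex hook from item 3 might look like on a LightningModule subclass. The hook name configure_apex and its signature are assumptions here, not taken from this PR's diff; only apex's amp.initialize call is a known API.

import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    # hypothetical override of the apex hook; name and signature are assumed
    def configure_apex(self, amp, model, optimizers, amp_level):
        # let apex patch the model and optimizers for mixed precision,
        # then hand the wrapped objects back to the trainer
        model, optimizers = amp.initialize(model, optimizers, opt_level=amp_level)
        return model, optimizers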

return loss, dict with metrics for tqdm
:param called with batch, batch_nb
additional: optimizer_i if multiple optimizers used
:return:
Collaborator

what's the return? if there is none, drop this line
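
For reference, a minimal sketch of what the filled-in docstring and a matching training_step could look like. The MyModel class, the toy layer, and the extra dict key are illustrative assumptions; only the "loss plus tqdm metrics" return described above comes from the PR.

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 10)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_nb):
        """Called with batch, batch_nb (and optimizer_i when multiple optimizers are used).

        :return: dict with the loss and any metrics to show in the tqdm bar
        """
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)
        # keys other than 'loss' are illustrative; they are surfaced in the progress bar
        return {'loss': loss, 'train_loss': loss}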

Must return model.
:param model:
:param device_ids:
:return:
Collaborator

return model...
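
A sketch of an override that satisfies this docstring, with the :return: filled in as suggested. Wrapping with torch's DistributedDataParallel is an assumption (Lightning may use its own wrapper); only the model and device_ids parameters come from the hunk above.

import pytorch_lightning as pl
from torch.nn.parallel import DistributedDataParallel

class MyModel(pl.LightningModule):
    def configure_ddp(self, model, device_ids):
        """Override how the model is wrapped for distributed training.

        :param model: the LightningModule to wrap
        :param device_ids: GPU ids assigned to this process
        :return: the DDP-wrapped model
        """
        model = DistributedDataParallel(model, device_ids=device_ids)
        return model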

output[k] = self.reduce_distributed_output(output[k], nb_gpus)

# do nothing when there's a scalar
elif isinstance(output[k], torch.Tensor) and output[k].dim() == 0:
Collaborator

in such case, you can skip this branching condition, right?
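
For context, a reconstructed sketch of how the surrounding reduce_distributed_output method might be structured; the details are assumptions, not the exact trainer code. If the method later reduces per-GPU tensors by their first dimension, a 0-d tensor would error on size(0), so the explicit no-op branch is one way to keep scalars out of that path rather than being redundant.

import torch

class TrainerSketch:
    # reconstructed sketch of the method under review; structure is an assumption
    def reduce_distributed_output(self, output, nb_gpus):
        # nothing to reduce on a single device
        if nb_gpus <= 1:
            return output

        # a bare tensor carries one value per GPU: average it
        if isinstance(output, torch.Tensor):
            return output.mean()

        for k in output:
            # recurse into nested dicts
            if isinstance(output[k], dict):
                output[k] = self.reduce_distributed_output(output[k], nb_gpus)

            # do nothing when there's a scalar (0-d tensors have no dim 0 to reduce over)
            elif isinstance(output[k], torch.Tensor) and output[k].dim() == 0:
                pass

            # reduce metrics that carry one entry per GPU
            elif isinstance(output[k], torch.Tensor) and output[k].size(0) == nb_gpus:
                output[k] = torch.mean(output[k])

        return output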

@williamFalcon williamFalcon changed the title from Ddp2 fix to [WIP] Ddp2 fix on Nov 3, 2019
@williamFalcon
Contributor Author

GPU tests passed. Waiting on CircleCI.

@williamFalcon williamFalcon merged commit 3e38005 into master Nov 5, 2019
@neggert
Contributor

neggert commented Nov 5, 2019

It's probably too late, but it occurs to me that since DDP and AMP have nothing to do with the actual research code, it might be better to pass them as callbacks or something rather than bundling them with the model. Maybe something to think about for the future.

@williamFalcon williamFalcon deleted the ddp2_fix branch November 5, 2019 15:30
@Borda
Collaborator

Borda commented Nov 6, 2019

Do we have CircleCI? I do not see its config in the repo master... @williamFalcon
