
【paddle.fleet】add auto parallel L1 implementations #27090

Merged

Conversation

guru4elephant (Member) commented Sep 6, 2020

PR types

New features

PR changes

APIs

Describe

Make the distributed strategy easy to configure, as shown in the following example:

import paddle
import paddle.distributed.fleet as fleet

# initialize fleet in collective (NCCL-based) training mode
fleet.init(is_collective=True)

# build a simple static-graph network
input_x = paddle.fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = paddle.fluid.layers.data(name="y", shape=[1], dtype='int64')

fc_1 = paddle.fluid.layers.fc(input=input_x, size=64, act='tanh')
fc_2 = paddle.fluid.layers.fc(input=fc_1, size=64, act='tanh')
prediction = paddle.fluid.layers.fc(input=[fc_2], size=2, act='softmax')
cost = paddle.fluid.layers.cross_entropy(input=prediction, label=input_y)
avg_cost = paddle.fluid.layers.mean(x=cost)

# with auto enabled, fleet chooses the distributed strategies automatically
strategy = paddle.distributed.fleet.DistributedStrategy()
strategy.auto = True
optimizer = paddle.fluid.optimizer.SGD(learning_rate=0.01)
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
optimizer.minimize(avg_cost)

paddle-bot-old bot commented Sep 6, 2020

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot-old bot commented Sep 6, 2020

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@guru4elephant guru4elephant changed the title 【paddle.fleet】add auto parallel L1 implementation 【paddle.fleet】add auto parallel L1 implementations Sep 6, 2020
    return __impl__


is_strict_auto = wrap_decorator(__non_auto_func_called__)
Collaborator

This keyword is a little hard to understand. Could we name it 'reset_auto_flag', which is what it actually does?

Member Author

This is just a check of whether a user is using a strictly auto configuration. Here is a strict auto configuration:

strategy = DistributedStrategy()
strategy.auto = True

Here is a case that is not a strict auto configuration.

strategy = DistributedStrategy()
strategy.amp = True
strategy.auto = True
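
For illustration, here is a minimal self-contained sketch of how such a strictness check could work (the names ToyStrategy, _mark_non_auto, and _non_auto_setter_called are hypothetical, not the actual Paddle source): every setter other than auto is wrapped by a decorator that records that some other option was configured, so is_strict_auto() is true only when auto is the sole option that was set.

# Hypothetical sketch of a strict-auto check; not the Paddle implementation.
_non_auto_setter_called = False  # flipped whenever a non-auto option is configured


def _mark_non_auto(func):
    # decorator applied to every setter except 'auto'
    def _impl(*args, **kwargs):
        global _non_auto_setter_called
        _non_auto_setter_called = True
        return func(*args, **kwargs)
    return _impl


class ToyStrategy(object):
    def __init__(self):
        self._auto = False
        self._amp = False

    @property
    def auto(self):
        return self._auto

    @auto.setter
    def auto(self, flag):
        self._auto = flag

    @property
    def amp(self):
        return self._amp

    @amp.setter
    @_mark_non_auto  # configuring amp breaks "strict auto"
    def amp(self, flag):
        self._amp = flag

    def is_strict_auto(self):
        # strict auto: auto is on and no other option was touched
        return self._auto and not _non_auto_setter_called


s = ToyStrategy()
s.auto = True
print(s.is_strict_auto())  # True: only auto was set
s.amp = True
print(s.is_strict_auto())  # False: another option was configured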

gavin1332 (Collaborator) left a comment

I think the global auto flag should not be shared with other strategies, because it is hard to control. For example, two strategies may both have auto capability, and one may require auto while the other does not.

So I suggest assigning auto the highest priority, which shields the other strategies. And if one strategy can be 'auto' on its own, it should have its own auto configuration.

gavin1332 (Collaborator)

It is not good to see the is_strict_auto decorator scattered everywhere.

@@ -69,6 +69,10 @@ def _disable_strategy(self, dist_strategy):
        dist_strategy.dgc = False
        dist_strategy.dgc_configs = {}

    def _enable_strategy(self, dist_strategy):
        dist_strategy.dgc = True
Contributor

Will auto parallel turn on every option that can be enabled? I think some options should not be turned on by default, otherwise accuracy may be lost.
For example, DGC may require a warm-up period that has to be set by users, otherwise accuracy may be lost. Auto parallel needs to guarantee no loss of accuracy.

Member Author

It depends on the network condition. I think the current DGC is not intelligent enough; we should improve it to remove the warm-up requirement. The _enable_strategy interface does not require all meta optimizers to be valid.
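
As a rough illustration of that interface, the outline below sketches how an auto pass might invoke these hooks on each candidate meta optimizer; apply_auto and the _can_apply check are assumptions for illustration, not the actual Paddle code.

# Hypothetical outline only; names are illustrative, not the Paddle source.
def apply_auto(dist_strategy, candidate_meta_optimizers):
    for meta_opt in candidate_meta_optimizers:
        if meta_opt._can_apply():
            # let a valid meta optimizer enable its options with default configs
            meta_opt._enable_strategy(dist_strategy)
        else:
            # meta optimizers that cannot apply keep their options switched off
            meta_opt._disable_strategy(dist_strategy)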

guru4elephant (Member Author)

> I think the global auto flag should not be shared with other strategies, because it is hard to control. For example, two strategies may both have auto capability, and one may require auto while the other does not.
>
> So I suggest assigning auto the highest priority, which shields the other strategies. And if one strategy can be 'auto' on its own, it should have its own auto configuration.

The aim of auto is to make the interface easy to learn and use, not to show that we can apply different strategies automatically.
It is recommended to remove the user-defined configurations of each meta optimizer; otherwise, a meta optimizer should not be enabled when user-defined configurations exist.

guru4elephant (Member Author)

> It is not good to see the is_strict_auto decorator scattered everywhere.

If you set localsgd together with auto, strict auto will not hold:

strategy = DistributedStrategy()
strategy.localsgd = True
strategy.auto = True

gavin1332 (Collaborator) left a comment

LGTM

wangxicoding (Contributor) left a comment

LGTM

@guru4elephant guru4elephant merged commit 0443b48 into PaddlePaddle:develop Sep 7, 2020