
【paddle.fleet】add auto parallel L1 implementations #27090

Merged

Conversation

guru4elephant (Member) commented Sep 6, 2020

PR types

New features

PR changes

APIs

Describe

Make the distributed strategy easy to configure, as shown in the following example:

import paddle
import paddle.distributed.fleet as fleet

# initialize fleet in collective (NCCL-based) training mode
fleet.init(is_collective=True)

# build a simple static-graph network
input_x = paddle.fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = paddle.fluid.layers.data(name="y", shape=[1], dtype='int64')

fc_1 = paddle.fluid.layers.fc(input=input_x, size=64, act='tanh')
fc_2 = paddle.fluid.layers.fc(input=fc_1, size=64, act='tanh')
prediction = paddle.fluid.layers.fc(input=[fc_2], size=2, act='softmax')
cost = paddle.fluid.layers.cross_entropy(input=prediction, label=input_y)
avg_cost = paddle.fluid.layers.mean(x=cost)

# with auto enabled, fleet chooses the distributed strategies automatically
strategy = paddle.distributed.fleet.DistributedStrategy()
strategy.auto = True
optimizer = paddle.fluid.optimizer.SGD(learning_rate=0.01)
optimizer = fleet.distributed_optimizer(optimizer, strategy=strategy)
optimizer.minimize(avg_cost)

paddle-bot-old bot commented Sep 6, 2020

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot-old bot commented Sep 6, 2020

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@guru4elephant guru4elephant changed the title 【paddle.fleet】add auto parallel L1 implementation 【paddle.fleet】add auto parallel L1 implementations Sep 6, 2020
    return __impl__


is_strict_auto = wrap_decorator(__non_auto_func_called__)
Collaborator

This keyword is a little hard to understand. Could we name it 'reset_auto_flag', which is what it actually does?

Member Author

This is just a check of whether a user is using a strictly auto configuration. Here is a strict auto configuration:

strategy = DistributedStrategy()
strategy.auto = True

Here is a case that is not a strict auto configuration.

strategy = DistributedStrategy()
strategy.amp = True
strategy.auto = True
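
For illustration, here is a minimal self-contained sketch of how such a strictness check could work (the names ToyStrategy, _mark_non_auto, and _non_auto_setter_called are hypothetical, not the actual Paddle source): every setter other than auto is wrapped by a decorator that records that some other option was configured, so is_strict_auto() is true only when auto is the sole option that was set.

# Hypothetical sketch of a strict-auto check; not the Paddle implementation.
_non_auto_setter_called = False  # flipped whenever a non-auto option is configured


def _mark_non_auto(func):
    # decorator applied to every setter except 'auto'
    def _impl(*args, **kwargs):
        global _non_auto_setter_called
        _non_auto_setter_called = True
        return func(*args, **kwargs)
    return _impl


class ToyStrategy(object):
    def __init__(self):
        self._auto = False
        self._amp = False

    @property
    def auto(self):
        return self._auto

    @auto.setter
    def auto(self, flag):
        self._auto = flag

    @property
    def amp(self):
        return self._amp

    @amp.setter
    @_mark_non_auto  # configuring amp breaks "strict auto"
    def amp(self, flag):
        self._amp = flag

    def is_strict_auto(self):
        # strict auto: auto is on and no other option was touched
        return self._auto and not _non_auto_setter_called


s = ToyStrategy()
s.auto = True
print(s.is_strict_auto())  # True: only auto was set
s.amp = True
print(s.is_strict_auto())  # False: another option was configured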

gavin1332 (Collaborator) left a comment

I think the global auto flag should not be shared with other strategies, because it is hard to control. For example, two strategies may both have auto capability, and one may require auto while the other does not.

So I suggest assigning auto the highest priority, which shields the other strategies. And if one strategy can be 'auto' on its own, it should have its own auto configuration.

gavin1332 (Collaborator)

It is not good to see the is_strict_auto decorator scattered everywhere.

@@ -69,6 +69,10 @@ def _disable_strategy(self, dist_strategy):
        dist_strategy.dgc = False
        dist_strategy.dgc_configs = {}

    def _enable_strategy(self, dist_strategy):
        dist_strategy.dgc = True
Contributor

Will auto parallel turn on every option that can be enabled? I think some options should not be turned on by default, otherwise accuracy may be lost.
For example, DGC may require a warm-up period that has to be set by users, otherwise accuracy may be lost. Auto parallel needs to guarantee no loss of accuracy.

Member Author

It depends on the network condition. I think the current DGC is not intelligent enough; we should improve it to remove the warm-up requirement. The _enable_strategy interface does not require all meta optimizers to be valid.
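
As a rough illustration of that interface, the outline below sketches how an auto pass might invoke these hooks on each candidate meta optimizer; apply_auto and the _can_apply check are assumptions for illustration, not the actual Paddle code.

# Hypothetical outline only; names are illustrative, not the Paddle source.
def apply_auto(dist_strategy, candidate_meta_optimizers):
    for meta_opt in candidate_meta_optimizers:
        if meta_opt._can_apply():
            # let a valid meta optimizer enable its options with default configs
            meta_opt._enable_strategy(dist_strategy)
        else:
            # meta optimizers that cannot apply keep their options switched off
            meta_opt._disable_strategy(dist_strategy)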

guru4elephant (Member Author)

> I think the global auto flag should not be shared with other strategies, because it is hard to control. For example, two strategies may both have auto capability, and one may require auto while the other does not.
>
> So I suggest assigning auto the highest priority, which shields the other strategies. And if one strategy can be 'auto' on its own, it should have its own auto configuration.

The aim of auto is to make the interface easy to learn and use, not to show that we can apply different strategies automatically.
It is recommended to remove the user-defined configurations of each meta optimizer; otherwise, a meta optimizer should not be enabled when user-defined configurations exist.

guru4elephant (Member Author)

> It is not good to see the is_strict_auto decorator scattered everywhere.

If you set localsgd together with auto, strict auto will not hold:

strategy = DistributedStrategy()
strategy.localsgd = True
strategy.auto = True

gavin1332 (Collaborator) left a comment

LGTM

wangxicoding (Contributor) left a comment

LGTM

@guru4elephant guru4elephant merged commit 0443b48 into PaddlePaddle:develop Sep 7, 2020