
【paddle.fleet】refine launch and distributed repr string for print #27093

Merged

merged 5 commits into PaddlePaddle:develop on Sep 9, 2020

Conversation

guru4elephant
Member

@guru4elephant guru4elephant commented Sep 6, 2020

PR types

Function optimization

PR changes

Others

Describe

In Paddle 2.0, paddle.distributed.fleet.DistributedStrategy can be serialized into protobuf, but the serialized form is not pretty when printed. This PR optimizes the log format of DistributedStrategy. Log samples are listed below.

fleetrun log sample:

    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:25538               |
    |                     PADDLE_TRAINERS_NUM                        4                      |
    |                     FLAGS_selected_gpus                        4                      |
    |                PADDLE_TRAINER_ENDPOINTS  ... 0.1:49874,127.0.0.1:15274,127.0.0.1:58230|
    |                       PADDLE_TRAINER_ID                        0                      |
    +=======================================================================================+
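The boxed key/value layout in the fleetrun sample above can be sketched with a small helper. The following is a minimal stand-alone illustration, not the actual fleet launch code; the helper name `print_env_table` and the fixed width are assumptions for the sketch:

```python
def print_env_table(envs, width=87):
    """Print a boxed key/value table in the style of the fleetrun log:
    a '=' border, a centered two-column header, a '-' rule, then rows."""
    half = (width - 2) // 2
    print("+" + "=" * width + "+")
    print("|" + "Distributed Envs".center(half) + "Value".center(width - half) + "|")
    print("+" + "-" * width + "+")
    for key, value in envs.items():
        # Center each key and value in its column, like the sample log.
        print("|" + str(key).center(half) + str(value).center(width - half) + "|")
    print("+" + "=" * width + "+")

print_env_table({
    "PADDLE_CURRENT_ENDPOINT": "127.0.0.1:25538",
    "PADDLE_TRAINERS_NUM": 4,
    "PADDLE_TRAINER_ID": 0,
})
```

Centering both columns keeps long environment values readable without per-row width calculations.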

DistributedStrategy log sample:

    +==============================================================================+                        
    |                                                                              |
    |                         DistributedStrategy Overview                         |
    |                                                                              |
    +==============================================================================+
    |                     amp = True, please check amp_configs                     |
    +------------------------------------------------------------------------------+
    |                     init_loss_scaling                 32768.0                |
    |                    incr_every_n_steps                   1000                 |
    |               decr_every_n_nan_or_inf                    2                   |
    |                            incr_ratio                   2.0                  |
    |                            decr_ratio              0.800000011921            |
    |              use_dynamic_loss_scaling                   True                 |
    +==============================================================================+
    |               recompute = True, please check recompute_configs               |
    +------------------------------------------------------------------------------+
    |                           checkpoints              pool2d_0.tmp_0            |
    |                                               res2a.add.output.5.tmp_1       |
    |                                               res2b.add.output.5.tmp_1       |
    |                                               res2c.add.output.5.tmp_1       |
    |                                               res3a.add.output.5.tmp_1       |
    |                                               res3b.add.output.5.tmp_1       |
    |                                               res3c.add.output.5.tmp_1       |
    |                                               res3d.add.output.5.tmp_1       |
    |                                               res4a.add.output.5.tmp_1       |
    |                                               res4b.add.output.5.tmp_1       |
    |                                               res4c.add.output.5.tmp_1       |
    |                                               res4d.add.output.5.tmp_1       |
    |                                               res4e.add.output.5.tmp_1       |
    |                                               res4f.add.output.5.tmp_1       |
    |                                               res5a.add.output.5.tmp_1       |
    |                                               res5b.add.output.5.tmp_1       |
    |                                               res5c.add.output.5.tmp_1       |
    |                                                    pool2d_1.tmp_0             |
    |                                                      fc_0.tmp_1               |
    +==============================================================================+
    |                  a_sync = True, please check a_sync_configs                  |
    +------------------------------------------------------------------------------+
    |                               k_steps                    -1                  |
    |                     max_merge_var_num                    1                   |
    |                       send_queue_size                    16                  |
    |               independent_recv_thread                  False                 |
    |         min_send_grad_num_before_recv                    1                   |
    |                      thread_pool_size                    1                   |
    |                       send_wait_times                    1                   |
    |               runtime_split_send_recv                  False                 |
    +==============================================================================+
    |                    Environment Flags, Communication Flags                    |
    +------------------------------------------------------------------------------+
    |                                  mode                    1                   |
    |                               elastic                  False                 |
    |                                  auto                  False                 |
    |                   sync_nccl_allreduce                   True                 |
    |                         nccl_comm_num                    1                   |
    |            use_hierarchical_allreduce                  False                 |
    |   hierarchical_allreduce_inter_nranks                    1                   |
    |                       sync_batch_norm                  False                 |
    |                   fuse_all_reduce_ops                   True                 |
    |                  fuse_grad_size_in_MB                    32                  |
    |              fuse_grad_size_in_TFLOPS                   50.0                 |
    |               cudnn_exhaustive_search                   True                 |
    |             conv_workspace_size_limit                   4000                 |
    |    cudnn_batchnorm_spatial_persistent                   True                 |
    +==============================================================================+
    |                                Build Strategy                                |
    +------------------------------------------------------------------------------+
    |           enable_sequential_execution                  False                 |
    |              fuse_elewise_add_act_ops                  False                 |
    |                       fuse_bn_act_ops                  False                 |
    |              fuse_relu_depthwise_conv                  False                 |
    |                    fuse_broadcast_ops                  False                 |
    |                fuse_all_optimizer_ops                  False                 |
    |                        enable_inplace                  False                 |
    |     enable_backward_optimizer_op_deps                   True                 |
    |                 cache_runtime_context                  False                 |
    +==============================================================================+
    |                              Execution Strategy                              |
    +------------------------------------------------------------------------------+
    |                           num_threads                    1                   |
    |          num_iteration_per_drop_scope                    10                  |
    |                 num_iteration_per_run                    1                   |
    |                    use_thread_barrier                  False                 |
    +==============================================================================+
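The sectioned DistributedStrategy overview above follows the same idea: each section opens with a `=` band and a centered title, a `-` rule separates the title from its rows, and key/value pairs are aligned beneath it. A minimal sketch of that layout (the helper `print_strategy_sections` is hypothetical, not the actual `DistributedStrategy` repr code):

```python
def print_strategy_sections(sections, width=78):
    """Print sectioned key/value tables in the style of the
    DistributedStrategy overview log sample."""
    band = "+" + "=" * width + "+"   # opens/closes each section
    rule = "+" + "-" * width + "+"   # separates title from rows
    half = width // 2
    print(band)
    for title, fields in sections:
        print("|" + title.center(width) + "|")
        print(rule)
        for key, value in fields.items():
            # Right-align keys, center values, as in the sample.
            print("|" + str(key).rjust(half - 2)
                      + str(value).center(width - half + 2) + "|")
        print(band)

print_strategy_sections([
    ("amp = True, please check amp_configs",
     {"init_loss_scaling": 32768.0, "incr_ratio": 2.0}),
    ("Execution Strategy",
     {"num_threads": 1, "use_thread_barrier": False}),
])
```

Grouping options by section header (amp, recompute, a_sync, build, execution) lets a user scan straight to the config block named in the section title.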

@paddle-bot-old

paddle-bot-old bot commented Sep 6, 2020

Thanks for your contribution!
Please wait for the CI result first. See Paddle CI Manual for details.

danleifeng
danleifeng previously approved these changes Sep 6, 2020
@guru4elephant guru4elephant changed the title refine launch and distributed repr string for print 【paddle.fleet】refine launch and distributed repr string for print Sep 7, 2020
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Sep 8, 2020
@PaddlePaddle PaddlePaddle unlocked this conversation Sep 8, 2020
@guru4elephant guru4elephant merged commit f7d08b7 into PaddlePaddle:develop Sep 9, 2020
3 participants