
[AutoTuner] Add auto tuner to obtain optimal configuration #54460

Merged
8 commits merged into PaddlePaddle:develop on Jun 14, 2023

Conversation

Caozhou1995
Contributor

@Caozhou1995 Caozhou1995 commented Jun 8, 2023

PR types

New features

PR changes

Others

Description

Pcard-72023

Finding the optimal configuration for distributed training or inference of a large model usually requires designing multiple sets of experiments based on experience (network, parameter size, GPU memory, FLOPs, etc.) and comparing the results. This process relies heavily on human expertise, and the configuration it produces may not be globally optimal. Whenever any condition changes, the whole process has to be repeated, which makes large models hard to use.

To address the above issues, we have implemented AutoTuner based on Profiling, with the main modules as follows:

  1. Provide a clear JSON configuration so users can use AutoTuner directly, without extra coding work.
  2. Launch multiple tasks one by one, and schedule and monitor them automatically.
  3. Implement a search module and a pruning module, supporting multiple search algorithms and pruning strategies (see the sketch after this list).
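
The search and pruning modules follow a standard grid-search-with-pruning pattern. A minimal, self-contained sketch of that pattern (illustrative only; it does not use the PR's AutoTuner API, and the dimension names and pruning rule are made up):

import itertools

# Illustrative grid search with pruning; NOT the PR's AutoTuner API.
search_space = {
    "dp_degree": [1, 2, 4],
    "mp_degree": [1, 2],
    "micro_batch_size": [1, 2, 4],
}

def prune(cfg, world_size=8):
    # Example pruning rule: skip configs whose parallel degrees exceed the device count.
    return cfg["dp_degree"] * cfg["mp_degree"] > world_size

def run_trial(cfg):
    # Stand-in for launching and profiling one training task; returns a cost metric.
    return sum(cfg.values())

best_cfg, best_metric = None, float("inf")
for values in itertools.product(*search_space.values()):
    cfg = dict(zip(search_space, values))
    if prune(cfg):
        continue
    metric = run_trial(cfg)
    if metric < best_metric:
        best_cfg, best_metric = cfg, metric
print(best_cfg, best_metric)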

At present, grid search is built in for 8 dimensions: dp degree, mp degree, pp degree, micro batch size (mbs), sharding degree, sharding stage, recompute, and recompute granularity. An example JSON is shown below:
[screenshot: example auto_tuner JSON configuration]
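
The screenshot is not reproduced here; the snippet below conveys the shape of such a configuration. Except for task_limit and max_time_per_task, which appear in the code quoted later in this review, the key names are illustrative guesses and the real schema may differ:

import json

# Hypothetical auto-tuner configuration; key names other than task_limit and
# max_time_per_task are guesses, not the PR's actual schema.
tuner_cfg = {
    "dp_degree": "auto",
    "mp_degree": "auto",
    "pp_degree": "auto",
    "micro_batch_size": "auto",
    "sharding_degree": "auto",
    "sharding_stage": "auto",
    "use_recompute": "auto",
    "recompute_granularity": "auto",
    "task_limit": 100,
    "max_time_per_task": 1800,
}

with open("test.json", "w") as f:
    json.dump(tuner_cfg, f, indent=4)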

The usage is as follows:
python -m paddle.distributed.launch --devices "0,1,2,3,4,5,6,7" --auto_tuner_json=test.json your_train.py your_args

NOTE: Since the auto_tuner is non-invasive, users need to expose the corresponding args in their training script so that the configurations generated by auto_tuner can take effect.
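
For example, a training script can expose the tunable dimensions as command-line arguments so that the flags generated by auto_tuner actually take effect (a minimal, hypothetical sketch; the argument names are placeholders and must match whatever the tuner is configured to pass):

# your_train.py -- hypothetical sketch; argument names are placeholders.
import argparse

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--dp_degree", type=int, default=1)
    parser.add_argument("--mp_degree", type=int, default=1)
    parser.add_argument("--pp_degree", type=int, default=1)
    parser.add_argument("--micro_batch_size", type=int, default=1)
    parser.add_argument("--use_recompute", action="store_true")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # ... build the distributed strategy from args and start training ...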

@paddle-bot

paddle-bot bot commented Jun 8, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

Contributor

@XieYunshen XieYunshen left a comment


LGTM for set_tests_properties(test_auto_tuner PROPERTIES LABELS "RUN_TYPE=EXCLUSIVE" TIMEOUT 100)

Contributor

@zhiqiu zhiqiu left a comment


LGTM, you can refine the code based on the comments in the next PR.


process = subprocess.Popen(cmd)
process.wait()
self.assertEqual(process.returncode, 0)
Contributor


Check the config searched?

Comment on lines +297 to +305
import copy
import json
import signal
import sys
import time

from ..auto_tuner.tuner import AutoTuner
from ..auto_tuner.utils import gen_new_args
from . import controllers
Contributor


Better to import these at the top of the file.

cur_cfg = auto_tuner.search_once()

# get max time per task run
max_time_per_task = tuner_cfg.get("max_time_per_task", 1800)
Contributor


max_time_per_task -> max_time_in_seconds_per_task?
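
For reference, a per-task time budget like this is typically enforced by waiting on the launched process with a timeout (an illustrative sketch, not necessarily how the launcher in this PR implements it):

import subprocess

max_time_per_task = 1800  # seconds; matches the default in the snippet above
cmd = ["python", "your_train.py"]  # placeholder launch command for one trial
process = subprocess.Popen(cmd)
try:
    process.wait(timeout=max_time_per_task)
except subprocess.TimeoutExpired:
    process.kill()   # treat the trial as failed / over budget
    process.wait()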


def __init__(self, tuner_cfg):
self.cur_task_id = 1
self.task_limit = tuner_cfg.get("task_limit", 100)
Contributor


DEFAULT_MAX_TASK_LIMIT = 100 ?
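
That is, something along these lines (a sketch of the reviewer's suggestion, not code from the PR):

# Name the default instead of inlining the magic number 100.
DEFAULT_MAX_TASK_LIMIT = 100

def __init__(self, tuner_cfg):
    self.cur_task_id = 1
    self.task_limit = tuner_cfg.get("task_limit", DEFAULT_MAX_TASK_LIMIT)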

@zhiqiu zhiqiu merged commit e12d286 into PaddlePaddle:develop Jun 14, 2023