Add distributed training robust cases into fluid benchmark test #11206
Comments
when this is done. I recommend adding more model variations to the suite |
After communicating with @typhoonzero, I divided the table above into two parts: a CI part and a CE part. The features in the CE part should be added to the test cases in ce-latest-kpi and implemented in fluid benchmark. Features in the CE part include:
The features in the CI part should be covered by unit tests, as listed in #11213. Features in the CI part include:
|
The test-case pairs in the CE part were constructed with AllPairs as follows:

    import metacomm.combinatorics.all_pairs2

    all_pairs = metacomm.combinatorics.all_pairs2.all_pairs2


    def generate_aws_pserver_cmd(i, v):
        # Drop empty entries (a feature left at its default) and join the rest.
        args = []
        for arg in v:
            if arg:
                args.append(arg)
        trainer_command = ','.join(args)
        # The pserver uses the same argument string as the trainer.
        pserver_command = trainer_command
        print(trainer_command)


    aws_parameters = [
        ["gpus:2", "gpus:1"],                            # whether to use ParallelExecutor
        ["", "no_split_var:"],                           # whether to split variables into blocks
        ["", "async_mode:"],                             # whether to use ASGD
        ["device:GPU", "device:CPU"],                    # whether to train on GPU
        ["model:resnet", "model:machine_translation"],   # models
        ["update_method:pserver", "update_method:nccl2"] # parameter update method
    ]

    aws_pairwise = all_pairs(aws_parameters)
    print("PAIRWISE:")
    for i, v in enumerate(aws_pairwise):
        print("%i:" % i)
        generate_aws_pserver_cmd(i, v)

And 7 cases were constructed as follows:
|
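The pairwise construction in the comment above depends on the `metacomm` AllPairs package. As a rough, dependency-free sketch of the same idea, the following uses a simple greedy selection over the full Cartesian product. This is not the AllPairs algorithm itself, and the helper names (`uncovered_pairs`, `pairs_of`, `greedy_pairwise`, `to_command`) are hypothetical. The number of cases it produces may differ from the 7 reported in the thread; it only guarantees that every pair of values from two different parameters appears in at least one case.

```python
from itertools import combinations, product

# Same parameter table as in the comment above.
aws_parameters = [
    ["gpus:2", "gpus:1"],
    ["", "no_split_var:"],
    ["", "async_mode:"],
    ["device:GPU", "device:CPU"],
    ["model:resnet", "model:machine_translation"],
    ["update_method:pserver", "update_method:nccl2"],
]


def uncovered_pairs(parameters):
    """Return every (i, value_i, j, value_j) pair with i < j."""
    pairs = set()
    for (i, vals_i), (j, vals_j) in combinations(enumerate(parameters), 2):
        for a, b in product(vals_i, vals_j):
            pairs.add((i, a, j, b))
    return pairs


def pairs_of(case):
    """Return the value pairs that a single test case covers."""
    return {(i, case[i], j, case[j])
            for i, j in combinations(range(len(case)), 2)}


def greedy_pairwise(parameters):
    """Pick cases greedily until every value pair is covered (hypothetical helper)."""
    remaining = uncovered_pairs(parameters)
    candidates = list(product(*parameters))
    cases = []
    while remaining:
        # Choose the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & remaining))
        cases.append(best)
        remaining -= pairs_of(best)
    return cases


def to_command(case):
    """Join the non-empty arguments with commas, as in the original snippet."""
    return ",".join(arg for arg in case if arg)


if __name__ == "__main__":
    for index, case in enumerate(greedy_pairwise(aws_parameters)):
        print("%i: %s" % (index, to_command(case)))
```

Enumerating the full product is only practical for small tables like this one (64 combinations); for larger feature tables a dedicated pairwise library is the better choice.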
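Models that distributed CE needs to verify
In addition, the training scale needs to cover: multi-machine single-CPU, multi-machine single-GPU, multi-machine multi-CPU, and multi-machine multi-GPU |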
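Hello, this issue has had no updates in the past month, so we will close it today. If you still need to follow up after it is closed, you can reopen the issue and we will reply within 24 hours. We apologize for any inconvenience caused by the closure. Thank you for your support of PaddlePaddle! |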
The PaddlePaddle robustness feature combinations include:
I constructed the test-case pairs with AllPairs as follows: