add vgg16_aws_dist #33

putcn · 2018-06-04T18:55:33Z

Add vgg16_aws_dist.
This test runs on AWS. It uses aws_runner to dynamically allocate instances and network resources, collect metrics, and save the results to CE for further analysis.
The test models come from benchmark/fluid. This case runs a VGG16 distributed training test, but it is easy to add more rounds of tests with other models and cluster configs. These configs can be updated in run.xsh and continuous_evaluation.py.

Please note: NCCL2 is not supported yet; it will be added in the next PR.

Superjomn · 2018-06-05T06:32:03Z

Judging by the code formatting, it looks like pre-commit wasn't run?

putcn · 2018-06-05T06:33:29Z

I did run it:

paddle-ce-latest-kpis git:(vgg16_aws_dist) ✗ pre-commit run -a
yapf.....................................................................Passed
Check for added large files..............................................Passed
Check for merge conflicts................................................Passed
Check for broken symlinks................................................Passed
Fix End of Files.........................................................Passed

Superjomn · 2018-06-05T06:34:27Z

vgg16_aws_dist/ce_runner.py

+            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
+            writer.writeheader()
+            for row in rows:
+                writer.writerow(row)


add an empty line here.
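The excerpt above writes a list of dict rows with `csv.DictWriter`. As a minimal self-contained sketch of the same pattern (the helper name `save_rows_to_csv` is hypothetical, and it assumes every row shares the keys of the first row):

```python
import csv
import io


def save_rows_to_csv(csvfile, rows):
    """Write a list of dicts to an open file object as CSV (hypothetical helper).

    The header is taken from the keys of the first row; an empty list writes nothing.
    """
    if not rows:
        return
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage with an in-memory buffer; for a real file, open it with newline="".
buf = io.StringIO()
save_rows_to_csv(buf, [{"metric": "speedup", "value": 1.0}])
```

Note that the csv module terminates rows with `\r\n` by default, which is why real files should be opened with `newline=""`.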

Yancey1989

Could we reference fluid_benchmark.py and kube_gen_job.py directly from the Paddle repo (https://github.com/PaddlePaddle/Paddle/tree/develop/benchmark/fluid), so that we don't need to update them in two places?

putcn · 2018-06-06T03:15:37Z

@Yancey1989 Currently we can't, because the metric-retrieval part is different: we have to output metrics in a specific format so that aws_runner can identify and capture these values.
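To illustrate why a fixed output format matters: if the training script prints each metric on a recognizable line, a runner can scrape values from the logs. The sketch below assumes a hypothetical `**metrics_data: <name> <value>` line format; aws_runner's actual format is not shown in this PR.

```python
import re

# Hypothetical line prefix; not aws_runner's confirmed format.
METRIC_PREFIX = "**metrics_data: "
_METRIC_RE = re.compile(
    "^" + re.escape(METRIC_PREFIX) + r"(\w+) (-?\d+(?:\.\d+)?)$")


def format_metric(name, value):
    """Render one metric as a single, easily matched log line."""
    return "%s%s %s" % (METRIC_PREFIX, name, value)


def parse_metrics(log_lines):
    """Collect {name: [values...]} from lines matching the metric format.

    All other log output (losses, progress messages) is ignored.
    """
    metrics = {}
    for line in log_lines:
        match = _METRIC_RE.match(line.strip())
        if match:
            name, value = match.group(1), float(match.group(2))
            metrics.setdefault(name, []).append(value)
    return metrics


log = [
    "Pass 0, batch 10, loss 2.31",
    format_metric("train_speed", 123.4),
    format_metric("train_speed", 125.0),
]
print(parse_metrics(log))  # {'train_speed': [123.4, 125.0]}
```

Because ordinary log lines never match the prefix, the benchmark script can keep printing free-form progress output alongside the metrics.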

Yancey1989

LGTM++ with some tiny comments.

Yancey1989 · 2018-06-06T03:27:41Z

vgg16_aws_dist/run.xsh

+    --online_mode yes \
+    --pserver_command $training_command \
+    --trainer_command $training_command \
+    --docker_image $fluid_benchmark_dockerhub_tag


It seems a blank line is needed at the end of the file.

Sure, will update.
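The "Fix End of Files" pre-commit hook listed in the earlier output enforces this rule. A minimal sketch of an equivalent check (the function names are hypothetical, and this is not the hook's own implementation):

```python
def ensure_trailing_newline(text):
    """Collapse trailing '\\n' characters to exactly one; empty text stays empty."""
    if not text:
        return text
    return text.rstrip("\n") + "\n"


def fix_end_of_file(path):
    """Rewrite path in place if needed; return True if the file was changed."""
    # newline="" keeps the file's original line endings untouched.
    with open(path, "r", newline="") as f:
        original = f.read()
    fixed = ensure_trailing_newline(original)
    if fixed == original:
        return False
    with open(path, "w", newline="") as f:
        f.write(fixed)
    return True
```

Calling `fix_end_of_file("vgg16_aws_dist/run.xsh")` would append the missing newline and return `True`; a second call would return `False`.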

Yancey1989 · 2018-06-06T03:31:00Z

vgg16_aws_dist/speedup_vgg_16_1_1_0_factor.txt

@@ -0,0 +1 @@
+[1.0]


Can we remove these KPI result files?

No, CE needs initial KPI data.
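The reply implies each KPI file stores a baseline that later runs are compared against. As a minimal sketch, assuming the file holds a JSON list such as `[1.0]` and a hypothetical 5% tolerance (CE's actual comparison logic is not shown in this PR):

```python
import json
import os
import tempfile


def load_kpi_baseline(path):
    """Read a KPI file holding a JSON list such as "[1.0]" and return the values."""
    with open(path) as f:
        return json.load(f)


def within_tolerance(baseline, current, max_diff_ratio=0.05):
    """Hypothetical check: is current within max_diff_ratio of the baseline?"""
    return abs(current - baseline) <= abs(baseline) * max_diff_ratio


# Usage: write a baseline file shaped like speedup_vgg_16_1_1_0_factor.txt, then read it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speedup_vgg_16_1_1_0_factor.txt")
    with open(path, "w") as f:
        f.write("[1.0]\n")
    baseline = load_kpi_baseline(path)[-1]  # use the latest recorded value
```

Without such a file there is nothing to compare the first run against, which is why the initial KPI data has to be committed.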

Yancey1989

LGTM again.

add vgg16_aws_dist

ec08436

putcn requested review from Superjomn and guochaorong June 4, 2018 18:55

putcn mentioned this pull request Jun 4, 2018

Add vgg16 aws dist test #27

Closed

Superjomn reviewed Jun 5, 2018

run through pre-commit

80bebe5

Yancey1989 reviewed Jun 6, 2018

Yancey1989 previously approved these changes Jun 6, 2018

add blank line at the EOF

9e570df

putcn dismissed Yancey1989’s stale review via 9e570df June 6, 2018 03:43

Yancey1989 approved these changes Jun 6, 2018

putcn merged commit 07cc82f into PaddlePaddle:master Jun 6, 2018
