Judging from the code formatting, it looks like it wasn't run.
I did run it.
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
for row in rows:
    writer.writerow(row)
Add an empty line here.
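For context, the csv.DictWriter fragment quoted above can be sketched as a self-contained snippet. This is only an illustration, not the code from the PR: the `write_rows` helper, the field names, the sample row, and the in-memory buffer are all assumptions made so the example runs on its own.

```python
import csv
import io


def write_rows(csvfile, fieldnames, rows):
    """Write a header line, then one CSV line per dict in rows."""
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical sample data, for illustration only.
fieldnames = ["metric", "value"]
rows = [{"metric": "train_speed", "value": 1.0}]

# An in-memory buffer stands in for the real output file.
buffer = io.StringIO()
write_rows(buffer, fieldnames, rows)
output = buffer.getvalue()
print(output)
```

When writing to a real file instead of a buffer, the csv module documentation recommends opening it with `newline=""` so that line endings are not translated twice.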
Do we have a proposal to reference fluid_benchmark.py and kube_gen_job.py from the Paddle repo (https://github.com/PaddlePaddle/Paddle/tree/develop/benchmark/fluid), so that we don't need to update them twice?
@Yancey1989 currently we can't, because the metric retrieval part is different: we have to output metrics in a specific format so that aws_runner can identify and capture these values.
LGTM++ with some tiny comments.
--online_mode yes \
--pserver_command $training_command \
--trainer_command $training_command \
--docker_image $fluid_benchmark_dockerhub_tag
It seems a blank line is needed at the end of the file.
Sure, will update.
@@ -0,0 +1 @@
[1.0]
Can we remove these KPI result files?
No, CE has to have initial KPI data.
LGTM again.
Add vgg16_aws_dist.
This test runs on AWS and uses aws_runner to dynamically allocate instances and network resources, collect metrics data, and save results to CE for further analysis.
The test models come from benchmark/fluid. This case runs the vgg16 distributed training test, but it is easy to add more rounds of tests with other models and cluster configs. These configs can be updated in run.xsh and continuous_evaluation.py.
Please note: nccl2 is not yet supported; it will be added in the next PR.