
make benchmark really working #11215

Merged: 4 commits into PaddlePaddle:develop on Jun 7, 2018

Conversation

panyx0718 (Contributor)

Users complain that they crash when following our doc.

gongweibao (Contributor) left a comment:

If we have run_fluid_benchmark.sh, do we need the run step in README.md?

```diff
@@ -29,9 +29,11 @@ Currently supported `--model` argument include:
 You can choose to use GPU/CPU training. With GPU training, you can specify
 `--gpus <gpu_num>` to run multi GPU training.
 * Run distributed training with parameter servers:
+  * see run_fluid_benchmark.sh as an example.
```
gongweibao (Contributor):

Need a link here.

panyx0718 (Contributor, Author):

Not sure this is needed. A link can break, and the file is right in this folder.

gongweibao (Contributor), Jun 6, 2018:

[run_fluid_benchmark.sh](./run_fluid_benchmark.sh)

````diff
@@ -29,9 +29,11 @@ Currently supported `--model` argument include:
 You can choose to use GPU/CPU training. With GPU training, you can specify
 `--gpus <gpu_num>` to run multi GPU training.
 * Run distributed training with parameter servers:
+  * see run_fluid_benchmark.sh as an example.
 * start parameter servers:
 ```bash
 PADDLE_TRAINING_ROLE=PSERVER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --device GPU --update_method pserver
````
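For context, the README presumably pairs this with a trainer-side launch; a minimal sketch, assuming the same environment-variable convention (`PADDLE_TRAINING_ROLE=TRAINER` flips the process role and `PADDLE_TRAINER_ID` distinguishes trainers), not necessarily the exact README text:

```bash
# Hypothetical trainer-side counterpart to the pserver command above
PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --device GPU --update_method pserver
```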
Contributor:

The pserver runs on CPU, so this should use `--device CPU`.

panyx0718 (Contributor, Author):

Done.
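Presumably the fix switches the pserver command to `--device CPU`; a sketch of the resulting line (assumed, not the exact committed text):

```bash
# Assumed result of the fix: the pserver process runs on CPU
PADDLE_TRAINING_ROLE=PSERVER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=1 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model mnist --device CPU --update_method pserver
```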

In the new run_fluid_benchmark.sh:

```diff
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+PADDLE_TRAINING_ROLE=PSERVER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=2 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model resnet --device GPU --update_method pserver --iterations=10000 &
```
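The hunk shows only 3 of the file's 10 lines; given `PADDLE_TRAINERS=2`, the rest of the script presumably launches the two trainer processes. A sketch of what those lines might look like, not the actual committed script:

```bash
# Hypothetical continuation: one backgrounded trainer process per PADDLE_TRAINER_ID
PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=2 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=0 python fluid_benchmark.py --model resnet --device GPU --update_method pserver --iterations=10000 &
PADDLE_TRAINING_ROLE=TRAINER PADDLE_PSERVER_PORT=7164 PADDLE_PSERVER_IPS=127.0.0.1 PADDLE_TRAINERS=2 PADDLE_CURRENT_IP=127.0.0.1 PADDLE_TRAINER_ID=1 python fluid_benchmark.py --model resnet --device GPU --update_method pserver --iterations=10000 &
```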
Contributor:

It seems this command would print all the logs to the terminal; we could start the processes as follows:

```bash
PADDLE_TRAINING_ROLE=PSERVER ... stdbuf -oL nohup python fluid_benchmark.py <args> > server.log 2>&1 &
```

Users would then check the logs in the server.log file.
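A user could then follow progress with the standard:

```bash
# Stream new log lines as the benchmark runs
tail -f server.log
```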

panyx0718 (Contributor, Author):

I think it's fine to print some logs to give the user some feedback. There isn't much output:

```
Pass 0, batch 162, loss [2.7855887 2.973915 ]
Pass 0, batch 162, loss [3.0754983 3.2426462]
Pass 0, batch 171, loss [3.4701207 4.438573 ]
Pass 0, batch 171, loss [3.7791452 3.3191109]
```

typhoonzero previously approved these changes on Jun 6, 2018.
gongweibao previously approved these changes on Jun 6, 2018.
gongweibao (Contributor) left a comment:

LGTM

gongweibao merged commit 53a509d into PaddlePaddle:develop on Jun 7, 2018.