
Refine fluid_benchmark.py #11118

Closed

Conversation

chengduoZH
Contributor

No description provided.

"./flowers_1.recordio", "./flowers_1.recordio",
"./flowers_1.recordio", "./flowers_1.recordio",
"./flowers_1.recordio", "./flowers_1.recordio"
]
Collaborator

file_list = ["./flowers_1.recordio"] * 8

Contributor Author

Done

train_losses.append(loss)
print("Pass: %d, Iter: %d, Loss: %f\n" %
(pass_id, iters, np.mean(train_losses)))
if args.use_recordio:
Contributor

Does use_recordio have any effect for CPU training?

Contributor Author

RecordIO is mainly used to speed up data reading on the GPU; it probably doesn't help much for CPU, and the parallel executor does not support CPU yet.
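For reference, a rough sketch (not the exact code in this PR) of how a RecordIO-backed input pipeline was typically wired up in Fluid at the time; it assumes the fluid.layers.open_files / batch / double_buffer / read_file API of that era, and file_list, dshape and the batch parameters are placeholders.

import paddle.fluid as fluid

def build_recordio_input(file_list, dshape, batch_size, thread_num, pass_num):
    # Open the RecordIO shards with a multi-threaded, C++-side file reader.
    data_file = fluid.layers.open_files(
        filenames=file_list,
        shapes=[[-1] + dshape, [-1, 1]],
        lod_levels=[0, 0],
        dtypes=['float32', 'int64'],
        thread_num=thread_num,
        pass_num=pass_num)
    # Batch and double-buffer on the C++ side so the GPU does not wait for
    # Python to feed the next minibatch.
    data_file = fluid.layers.double_buffer(
        fluid.layers.batch(data_file, batch_size=batch_size))
    image, label = fluid.layers.read_file(data_file)
    return image, label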

kolinwei
kolinwei previously approved these changes Jun 1, 2018
for batch_id, data in enumerate(train_reader()):
train_losses = []
if args.use_recordio:
pass_id = 0
Contributor

Chengduo, is the use_recordio case only meant to train for 1 pass?

Contributor Author
@chengduoZH chengduoZH Jun 4, 2018

With recordio you can also train many passes; it's just that the number of passes to train is specified here, so this pass_id is actually not used for anything.

examples_per_sec = num_samples / train_elapsed
print('Total examples: %d, total time: %.5f, %.5f examples/sec' %
(num_samples, train_elapsed, examples_per_sec))
print("Pass: %d, Loss: %f" % (pass_id, np.mean(train_losses)))
Contributor

The indentation of the lines above seems wrong.

Contributor

@chengduoZH does the code still run?

Contributor Author

@panyx0718 @guochaorong This code can work, but line 265 is unnecessary.

(num_samples, train_elapsed, examples_per_sec))
if not args.no_test and batch_acc != None:
test_acc = test(startup_exe, infer_prog, test_reader, feeder, batch_acc)
print("Pass: %d, Test Accuracy: %f" % (pass_id, test_acc))
Contributor

Isn't the indentation of lines 6 and 7 above also wrong?

fluid.layers.data(
name='label', shape=[1], dtype='int64'),
],
place=fluid.CPUPlace())
Contributor

Is place CPU or CUDA here?

Contributor Author

CPU
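For context, a minimal sketch of what a generate_recordio helper like the one used by this PR could look like (the exact helper is not shown in this thread); it assumes the fluid.recordio_writer.convert_reader_to_recordio_file API of that era. The feeder only serializes samples into a file, so CPUPlace is sufficient even when the training itself runs on GPU.

import paddle
import paddle.fluid as fluid

def generate_recordio(dshape, data_set_iterator, recordio_name, batch_size=32):
    # Writing the RecordIO file is pure CPU work; the GPU only gets involved
    # later, when the file is read back during training.
    with fluid.program_guard(fluid.Program(), fluid.Program()):
        feeder = fluid.DataFeeder(
            feed_list=[
                fluid.layers.data(name='data', shape=dshape, dtype='float32'),
                fluid.layers.data(name='label', shape=[1], dtype='int64'),
            ],
            place=fluid.CPUPlace())
        reader = paddle.batch(data_set_iterator, batch_size=batch_size)
        fluid.recordio_writer.convert_reader_to_recordio_file(
            recordio_name, reader, feeder)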

def get_model(args):
model = resnet_cifar10
Contributor

This line was removed; model may be undefined at the third-to-last line.

Contributor Author

No, model is defined.

startup_exe = fluid.Executor(place)
startup_exe.run(startup_prog)
strategy = fluid.ExecutionStrategy()
strategy.num_threads = 1
Contributor

why does this default to 1?

Contributor Author
@chengduoZH chengduoZH Jun 4, 2018

Because I saw that strategy.num_threads is set to 1 in train_parallel, and @Yancey1989 once said that the distributed program hangs when strategy.num_threads is greater than 1.

Contributor

I see

startup_exe.run(startup_prog)
strategy = fluid.ExecutionStrategy()
strategy.num_threads = 1
strategy.allow_op_delay = False
Contributor

should this default to false if it is not often used?

Contributor Author

You are right; in fact this line is unnecessary because the default value is already False.

train_losses = []
for batch_id, data in enumerate(train_reader()):
train_losses = []
if args.use_recordio:
Contributor

Why can't the original training loop use recordio? It seems use_recordio is hacked into this place.

Contributor Author

Because the number of passes has already been set here, the for pass_id in range(args.pass_num) loop is meaningless for use_recordio.
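A self-contained toy illustration (plain Python, no Fluid) of why the outer pass loop becomes redundant once the reader itself is opened with a pass count:

def make_reader(samples, pass_num):
    # Mimics open_files(..., pass_num=pass_num): one generator streams all
    # passes back to back, so the consumer needs only a single flat loop.
    def reader():
        for _ in range(pass_num):
            for s in samples:
                yield s
    return reader

if __name__ == '__main__':
    samples = list(range(5))
    reader = make_reader(samples, pass_num=3)
    consumed = sum(1 for _ in reader())   # no outer "for pass_id in ..." needed
    print(consumed)                       # 15 == len(samples) * pass_num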


@@ -337,21 +389,26 @@ def print_arguments(args):

def main():
args = parse_args()
cards = os.getenv("CUDA_VISIBLE_DEVICES") or ""
Contributor

Use --gpus?

Contributor Author

Done

print_arguments(args)

# the unique trainer id, starting from 0, needed by trainer
# only
nccl_id_var, num_trainers, trainer_id = (
None, 1, int(os.getenv("PADDLE_TRAINER_ID", "-1")))
None, 1, int(os.getenv("PADDLE_TRAINER_ID", "0")))
Contributor

why?

Contributor Author

Because trainer_id is the last parameter of ParallelExecutor, and that parameter's type is size_t, it cannot be -1.

pe.def(py::init<const std::vector<platform::Place> &,
const std::unordered_set<std::string> &,
const std::unordered_set<std::string> &, const ProgramDesc &,
const std::string &, Scope *, std::vector<Scope *> &,
const ExecutionStrategy &, const BuildStrategy &, size_t,
size_t>())

Contributor

I see. I made this an illegal default number so that PADDLE_TRAINER_ID must be set to a positive value for ParallelExecutor.
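A hypothetical guard (not part of this PR) that makes the size_t concern explicit: a negative trainer_id would wrap to a huge unsigned value once it reaches the C++ size_t parameter, so validating it on the Python side before building the ParallelExecutor is safer than relying on the default.

import os

def get_trainer_id(default='0'):
    trainer_id = int(os.getenv('PADDLE_TRAINER_ID', default))
    if trainer_id < 0:
        raise ValueError(
            'PADDLE_TRAINER_ID must be >= 0 (it is passed to a size_t '
            'parameter); got %d' % trainer_id)
    return trainer_id

if __name__ == '__main__':
    print(get_trainer_id())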

input = fluid.layers.data(name='data', shape=dshape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
if args.use_recordio:
recordio_name = './cifar10_1.recordio' if args.data_set == 'cifar10' else './flowers_1.recordio'
Contributor

line too long?

num_samples = 0
start_time = time.time()

if args.use_recordio:
Contributor

pass_num doesn't work when use_recordio is enabled?

Contributor Author

Yes.

Contributor
@guochaorong guochaorong left a comment

Please fix the logical problems.

Contributor
@typhoonzero typhoonzero left a comment

I thought this was the same work as #11121?

@@ -39,6 +39,11 @@ def parse_args():
help='The model to run benchmark with.')
parser.add_argument(
'--batch_size', type=int, default=32, help='The minibatch size.')
parser.add_argument(
Contributor

Just changing the meaning of --batch_size would be OK.

Contributor Author

Thanks, done.

num_samples += len(data)
iters += 1
if batch_id % 1 == 0:
num_samples += args.batch_size # dev_cnt * args.batch_size?
Contributor

The last batch size may be different

Contributor Author

You are right, but currently we can't get the actual batch size when using recordio; that is the problem.
The batch size is fixed here: if the last batch of a pass has fewer than args.batch_size samples and the current pass is not the last one, recordio will read data from the next pass to fill up the batch.
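A plain-Python sketch (not Fluid code) of the accounting difference being discussed: the feed path can count the real batch sizes, while the recordio path only knows the nominal batch size per iteration.

def count_feed_path(samples, batch_size):
    # Feed path: Python sees every batch, so a short last batch is counted
    # exactly (num_samples += len(data)).
    num_samples = 0
    for i in range(0, len(samples), batch_size):
        num_samples += len(samples[i:i + batch_size])
    return num_samples

def count_recordio_path(iterations, batch_size):
    # RecordIO path: the reader lives on the C++ side, so the script can only
    # assume a full batch per iteration (num_samples += args.batch_size).
    return iterations * batch_size

if __name__ == '__main__':
    print(count_feed_path(list(range(10)), batch_size=4))    # 10 (exact)
    print(count_recordio_path(iterations=3, batch_size=4))   # 12 (nominal)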

@chengduoZH chengduoZH force-pushed the refine_benchmark_py branch 5 times, most recently from 2a7a1dd to f7414a5 Compare June 4, 2018 05:43

generate_recordio(dshape, data_set_iterator, recordio_name)

batch_size_per_gpu = args.batch_size / args.gpus
file_list = [recordio_name] * 8
Contributor

Would it be better to generate 8 sharded files? With the current approach, it's easy to end up with duplicated data if the shuffling buffer is not big enough.

Contributor Author

You are right. This script currently only cares about performance. If the length of file_list and thread_num are both equal to the device count, the program runs faster.
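One possible way to produce N disjoint shards instead of repeating the same file 8 times: round-robin split the sample stream into per-shard readers and write each one out separately (e.g. with fluid.recordio_writer.convert_reader_to_recordio_file, as in the generate_recordio sketch earlier). The shard count and helper names below are illustrative, not taken from the PR.

import itertools

def make_shard_readers(data_set_iterator, shard_count):
    readers = []
    for shard_id in range(shard_count):
        def reader(shard_id=shard_id):
            # Each shard keeps every shard_count-th sample, offset by its id,
            # so the shards are disjoint and roughly equal in size.
            return itertools.islice(
                data_set_iterator(), shard_id, None, shard_count)
        readers.append(reader)
    return readers

if __name__ == '__main__':
    samples = lambda: iter(range(10))
    shards = make_shard_readers(samples, shard_count=4)
    print([list(r()) for r in shards])
    # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]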

@@ -337,21 +386,26 @@ def print_arguments(args):

def main():
args = parse_args()
gpus = os.getenv("CUDA_VISIBLE_DEVICES") or ""
args.gpus = len(gpus.split(","))
Contributor

It seems users can use --gpus instead of CUDA_VISIBLE_DEVICES.
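A small sketch of a more defensive GPU-count parse than len(gpus.split(",")): with an empty CUDA_VISIBLE_DEVICES, "".split(",") is [""], which still counts as one device. An explicit --gpus flag, as suggested above, could simply override this; the fallback below is an illustrative choice, not code from the PR.

import os

def infer_gpu_count(default=1):
    visible = os.getenv('CUDA_VISIBLE_DEVICES', '')
    # Keep only non-empty device ids so an unset/empty variable falls back.
    devices = [d for d in visible.split(',') if d.strip()]
    return len(devices) if devices else default

if __name__ == '__main__':
    print(infer_gpu_count())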

train_exe.bcast_params()

num_samples += args.batch_size
if iters % 1 == 0:
Contributor

This line seems unnecessary (iters % 1 == 0 is always true).
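If periodic logging was the intent, a hypothetical --log_interval argument (not part of this PR) would make the check meaningful; with the hard-coded % 1 the condition is always true and can simply be dropped.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--log_interval', type=int, default=10,
                    help='Print training stats every N iterations.')
args = parser.parse_args([])  # empty argv so this sketch runs standalone

for iters in range(25):
    if iters % args.log_interval == 0:
        print('iter %d: would log the loss here' % iters)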

num_samples, start_time = 0, time.time()

if args.use_recordio:
for iters in xrange(args.iterations):
Contributor

It seems adding a pass_num control would be better.

@chengduoZH
Contributor Author

This PR was only used by @kolinwei to measure the performance of the current Fluid, and that purpose has now been fulfilled. Meanwhile, #11121 provides a better benchmark script, so this PR can be closed.

@chengduoZH chengduoZH closed this Jun 11, 2018