The current version is 4x slower on CPU than the version from half a month ago #10639

dyning · 2018-05-14T10:15:39Z

The earlier version used for comparison is commit 4a5bfa89c342771688f5d62dc2156df85933af50.
Running the program below on CPU, the time went from 3s to 12s.

import paddle
import paddle.fluid as fluid

import sys, os
import time
import numpy as np
import math
import random 

def reader_test():
    def reader():
        index = range(0, 10000)
        random.shuffle(index)
        for idx in index:
            image = np.random.rand(3, 224, 224)
            loc = np.random.rand(4)
            weight = np.random.rand(4)
            yield image, loc, weight     
    return reader 

def main():
    data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
    bbox_targets = fluid.layers.data(name='bbox_targets', shape=[4], dtype='float32')
    bbox_loss_weights = fluid.layers.data(name='bbox_loss_weights', shape=[4], dtype='float32')
    fea_fc = fluid.layers.fc(input=data, size=1024, act='relu') 
    fc_loc = fluid.layers.fc(input=fea_fc, size=4, act='relu') 
    loss_loc = fluid.layers.smooth_l1(fc_loc, bbox_targets)
    #loss_loc = fluid.layers.smooth_l1(fc_loc, bbox_targets, inside_weight=bbox_loss_weights, outside_weight=bbox_loss_weights)
    avg_loss = fluid.layers.mean(x=loss_loc)

    bd = [80000]
    lr = [0.001, 0.0001]
    optimizer = fluid.optimizer.Momentum(learning_rate=fluid.layers.piecewise_decay(boundaries=bd, values=lr), momentum=0.9,
        regularization=fluid.regularizer.L2Decay(1e-4))
    opts = optimizer.minimize(avg_loss)  
    #place = fluid.CUDAPlace(0)
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())    
    
    train_reader = paddle.batch(reader_test(), batch_size=128)
    feeder = fluid.DataFeeder(place=place, feed_list=[data, bbox_targets, bbox_loss_weights])
    
    #train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=avg_loss.name)

    for pass_id in range(0, 20):
        for batch_id, blobs in enumerate(train_reader()):
            starttime = time.time()
            #train_exe.run([avg_loss.name], feed=feeder.feed(blobs))
            exe.run(fluid.default_main_program(), feed=feeder.feed(blobs), fetch_list=[avg_loss.name])
            print batch_id, time.time() - starttime
    print "ok"  


if __name__ == '__main__':
    main()

guochaorong · 2018-05-14T13:09:40Z

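I built the latest version of paddle and tested it; the numbers are about the same as with the version from a month ago. You could run top while the model is training to check CPU and memory usage and see whether anything external is interfering.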
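
As a complement to watching top by hand, here is a minimal sketch that records wall time, process CPU time and the system load average around a block of work. It is an illustration only: `measure` and `busy_work` are hypothetical names, `busy_work` stands in for the `exe.run(...)` call, and `time.perf_counter` / `time.process_time` need Python 3.3+, whereas the script above is Python 2.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def measure(label, results):
    """Record wall time, process CPU time and the 1-minute load average."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        try:
            load1 = os.getloadavg()[0]  # Unix only
        except (AttributeError, OSError):
            load1 = None
        results[label] = {
            "wall_s": wall,
            "cpu_s": cpu,
            # Roughly the number of cores this process kept busy.
            "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
            "load1": load1,
        }


def busy_work(n=200000):
    # Hypothetical stand-in for one training step.
    return sum(i * i for i in range(n))


results = {}
with measure("batch", results):
    busy_work()
print(results["batch"])
```

A load average well above what this process alone accounts for would point to other jobs competing for the CPU.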

Latest version:
69 1.69508504868
70 1.71081590652
71 1.70212697983
72 1.71085691452
73 1.69607281685
74 1.71060395241
75 1.69598388672
76 1.71035504341
77 1.6958591938
78 0.792373895645
0 1.71148610115
1 1.70132303238
2 1.71033000946
3 1.69501399994
4 1.71113204956
5 1.69488310814
6 1.7130792141
7 1.6964969635
8 1.71299219131
9 1.69600582123
10 1.7109541893
11 1.69559907913
12 1.71031713486
13 1.69494009018
14 1.71046519279
15 1.69500017166
16 1.71071195602
17 1.69542908669
18 1.71022891998
19 1.69576907158
20 1.71009302139
21 1.69588899612

Version from a month ago:
51 1.72976112366
52 1.57088303566
53 1.79439806938
54 1.5992538929
55 1.59160399437
56 1.76902699471
57 1.71167206764
58 1.71636605263
59 1.68664383888
60 1.71221590042
61 1.72089004517
62 1.5582549572
63 1.96613001823
64 1.67614603043
65 1.56630992889
66 1.75715208054
67 1.70690798759
68 1.72569799423
69 1.73480200768
70 1.7015838623
71 1.70937609673
72 1.55996894836
73 1.82411003113
74 1.7260529995
75 1.57758808136
76 1.79415202141
77 1.70930290222
78 0.918473958969
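
To compare two such logs by more than eye, the per-batch times can be summarized. The sketch below assumes the `batch_id seconds` format printed by the script above; `summarize` is a hypothetical helper, and the sample lines are copied from the latest-version log. It skips the final batch of each pass: 10000 samples at batch size 128 leave only 16 samples for batch 78, which is presumably why that line is so much faster than the rest.

```python
import statistics


def summarize(log_text, batch_size=128, dataset_size=10000):
    """Summarize 'batch_id seconds' lines, ignoring the short final batch."""
    last_id = dataset_size // batch_size          # 78 for this script
    has_partial = dataset_size % batch_size != 0  # final batch is smaller
    times = []
    for line in log_text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a timing line
        batch_id, seconds = int(parts[0]), float(parts[1])
        if has_partial and batch_id == last_id:
            continue
        times.append(seconds)
    return {
        "n": len(times),
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


sample = """
76 1.71035504341
77 1.6958591938
78 0.792373895645
0 1.71148610115
"""
stats = summarize(sample)
print(stats)
```

Running it over both complete logs gives a mean and spread per build, which is easier to compare than raw columns of numbers.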

dyning · 2018-05-16T09:28:21Z

The slow build was compiled without MKL. Rebuilding as follows gives normal speed:
cmake .. -DWITH_GPU=ON \
    -DWITH_TESTING=ON \
    -DWITH_FAST_BUNDLE_TEST=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCUDA_ARCH_NAME=Auto \
    -DCMAKE_INSTALL_PREFIX=`pwd`/output \
    -DPYTHON_LIBRARY=/opt/_internal/cpython-2.7.11-ucs4/lib/libpython2.7.so \
    -DCMAKE_EXE_LINKER_FLAGS="-lutil"
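
For a sense of why the math library matters so much for this script, here is a rough size estimate of its first fc layer. It assumes the layer flattens the 3x224x224 input into one dense matrix multiply and uses the usual 2*m*k*n flop count; bias, activation and the backward pass are ignored. The function names are hypothetical.

```python
def fc_forward_flops(batch, in_features, out_features):
    # One multiply and one add per weight per sample: 2 * m * k * n.
    return 2 * batch * in_features * out_features


def fc_weight_bytes(in_features, out_features, bytes_per_value=4):
    # float32 weights, 4 bytes each.
    return in_features * out_features * bytes_per_value


in_features = 3 * 224 * 224  # flattened image size
flops = fc_forward_flops(128, in_features, 1024)
weight_mb = fc_weight_bytes(in_features, 1024) / 1e6
print("first fc layer: %.1f GFLOP per forward pass, %.0f MB of weights"
      % (flops / 1e9, weight_mb))
```

With tens of GFLOP per batch going through a single matrix multiply, the speed of the underlying BLAS routine is likely to dominate the per-batch time on CPU.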

guochaorong · 2018-05-16T09:38:41Z

Got it. On Tuesday I confirmed with Yuning that the build option -DWITH_MKL=OFF should not be added.

panyx0718 · 2018-05-16T13:14:52Z

@qingqing01 let's also add this to FAQ.

qingqing01 assigned guochaorong May 14, 2018

gongweibao assigned panyx0718, gongweibao, guochaorong and Superjomn and unassigned panyx0718 and guochaorong May 16, 2018

dyning closed this as completed May 16, 2018

panyx0718 assigned qingqing01 May 16, 2018
