Current version is 4x slower on CPU than the version from half a month ago #10639

Closed
dyning opened this issue May 14, 2018 · 4 comments

dyning commented May 14, 2018

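The earlier version used for comparison is commit 4a5bfa89c342771688f5d62dc2156df85933af50.
Running the program below on CPU, the time went from 3 s to 12 s, so the current version is slower.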

import paddle
import paddle.fluid as fluid

import sys, os
import time
import numpy as np
import math
import random 

def reader_test():
    def reader():
        index = range(0, 10000)
        random.shuffle(index)
        for idx in index:
            image = np.random.rand(3, 224, 224)
            loc = np.random.rand(4)
            weight = np.random.rand(4)
            yield image, loc, weight     
    return reader 

def main():
    data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
    bbox_targets = fluid.layers.data(name='bbox_targets', shape=[4], dtype='float32')
    bbox_loss_weights = fluid.layers.data(name='bbox_loss_weights', shape=[4], dtype='float32')
    fea_fc = fluid.layers.fc(input=data, size=1024, act='relu') 
    fc_loc = fluid.layers.fc(input=fea_fc, size=4, act='relu') 
    loss_loc = fluid.layers.smooth_l1(fc_loc, bbox_targets)
    #loss_loc = fluid.layers.smooth_l1(fc_loc, bbox_targets, inside_weight=bbox_loss_weights, outside_weight=bbox_loss_weights)
    avg_loss = fluid.layers.mean(x=loss_loc)

    bd = [80000]
    lr = [0.001, 0.0001]
    optimizer = fluid.optimizer.Momentum(learning_rate=fluid.layers.piecewise_decay(boundaries=bd, values=lr), momentum=0.9,
        regularization=fluid.regularizer.L2Decay(1e-4))
    opts = optimizer.minimize(avg_loss)  
    #place = fluid.CUDAPlace(0)
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())    
    
    train_reader = paddle.batch(reader_test(), batch_size=128)
    feeder = fluid.DataFeeder(place=place, feed_list=[data, bbox_targets, bbox_loss_weights])
    
    #train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=avg_loss.name)

    for pass_id in range(0, 20):
        for batch_id, blobs in enumerate(train_reader()):
            starttime = time.time()
            #train_exe.run([avg_loss.name], feed=feeder.feed(blobs))
            exe.run(fluid.default_main_program(), feed=feeder.feed(blobs), fetch_list=[avg_loss.name])
            print batch_id, time.time() - starttime
    print "ok"  


if __name__ == '__main__':
    main()

guochaorong commented May 14, 2018

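I built the latest version of Paddle and ran the test; the timings are about the same as with the version from a month ago. You could run top while the model is training to check CPU and memory usage and see whether something external is interfering.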

Latest version (batch id, seconds per batch):
69 1.69508504868
70 1.71081590652
71 1.70212697983
72 1.71085691452
73 1.69607281685
74 1.71060395241
75 1.69598388672
76 1.71035504341
77 1.6958591938
78 0.792373895645
0 1.71148610115
1 1.70132303238
2 1.71033000946
3 1.69501399994
4 1.71113204956
5 1.69488310814
6 1.7130792141
7 1.6964969635
8 1.71299219131
9 1.69600582123
10 1.7109541893
11 1.69559907913
12 1.71031713486
13 1.69494009018
14 1.71046519279
15 1.69500017166
16 1.71071195602
17 1.69542908669
18 1.71022891998
19 1.69576907158
20 1.71009302139
21 1.69588899612

Version from a month ago (batch id, seconds per batch):
51 1.72976112366
52 1.57088303566
53 1.79439806938
54 1.5992538929
55 1.59160399437
56 1.76902699471
57 1.71167206764
58 1.71636605263
59 1.68664383888
60 1.71221590042
61 1.72089004517
62 1.5582549572
63 1.96613001823
64 1.67614603043
65 1.56630992889
66 1.75715208054
67 1.70690798759
68 1.72569799423
69 1.73480200768
70 1.7015838623
71 1.70937609673
72 1.55996894836
73 1.82411003113
74 1.7260529995
75 1.57758808136
76 1.79415202141
77 1.70930290222
78 0.918473958969
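
To compare two runs like the ones above without reading the numbers by eye, the per-batch times can be averaged. The following is only a minimal sketch: the helper names `parse_timings` and `mean` are made up for illustration, and the sample strings hold just the first three lines of each log, so the printed values describe those excerpts rather than the full runs.

```python
def parse_timings(log_text):
    """Parse lines of the form '<batch_id> <seconds>' into a list of floats."""
    times = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrelated lines
        try:
            int(parts[0])                 # first field must be a batch id
            times.append(float(parts[1])) # second field is the elapsed time
        except ValueError:
            continue                      # not a timing line
    return times


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / float(len(values))


# Excerpts copied from the two logs (first three lines of each).
latest = """
69 1.69508504868
70 1.71081590652
71 1.70212697983
"""
month_old = """
51 1.72976112366
52 1.57088303566
53 1.79439806938
"""

latest_mean = mean(parse_timings(latest))
old_mean = mean(parse_timings(month_old))
print("latest:    %.3f s/batch" % latest_mean)
print("month-old: %.3f s/batch" % old_mean)
print("ratio latest/old: %.2f" % (latest_mean / old_mean))
```

When averaging a whole pass, it may be worth dropping batch 78: with 10000 samples and a batch size of 128, the last batch holds only 16 samples, which is why its time is much shorter.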


dyning commented May 16, 2018

My build was compiled without MKL. After rebuilding as follows, the speed is back to normal:
cmake .. -DWITH_GPU=ON \
    -DWITH_TESTING=ON \
    -DWITH_FAST_BUNDLE_TEST=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCUDA_ARCH_NAME=Auto \
    -DCMAKE_INSTALL_PREFIX=`pwd`/output \
    -DPYTHON_LIBRARY=/opt/_internal/cpython-2.7.11-ucs4/lib/libpython2.7.so \
    -DCMAKE_EXE_LINKER_FLAGS="-lutil"

dyning closed this as completed May 16, 2018

guochaorong commented May 16, 2018

Got it. I confirmed this with Yuning on Tuesday: do not pass the build option -DWITH_MKL=OFF.
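
A rough way to see why the math library matters so much for this particular script: its first fc layer flattens each 3x224x224 image and maps it to 1024 units, which is one large matrix multiply per batch. The sketch below only counts the multiply-add work of the forward pass under that assumption; the helper `fc_forward_flops` is hypothetical, and bias, activation, the backward pass and the optimizer step are ignored.

```python
def fc_forward_flops(batch_size, in_features, out_features):
    """Approximate floating-point operations of one fc forward pass.

    Counts one multiply and one add per weight per sample (matrix multiply only).
    """
    return 2 * batch_size * in_features * out_features


batch_size = 128                 # batch_size used in the script
in_features = 3 * 224 * 224      # the image is flattened before the first fc

fc1 = fc_forward_flops(batch_size, in_features, 1024)  # data -> fea_fc
fc2 = fc_forward_flops(batch_size, 1024, 4)            # fea_fc -> fc_loc

print("fc1: %.2f GFLOP per batch" % (fc1 / 1e9))
print("fc2: %.6f GFLOP per batch" % (fc2 / 1e9))
```

Nearly all of the arithmetic sits in that first matrix multiply, so a slower matrix-multiply backend would be expected to show up directly in the per-batch time. This is only an estimate and says nothing about which library a build without MKL actually falls back to.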

panyx0718 commented

@qingqing01 let's also add this to FAQ.
