In the C inference program, how can I skip part of the lower network and feed an intermediate layer as input? #3915

Closed
fty8788 opened this issue Sep 6, 2017 · 13 comments
Labels
Good Question, User (used to mark user questions)

Comments

fty8788 commented Sep 6, 2017

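Background:
In a DSSM model, one input is the query and the other is the candidate term. To improve online serving performance, the candidate side can be computed offline; online, only the query part of the network is computed, and the online and offline results are fed directly to the softmax layer.

Approach:

  1. Offline, export the output of the top hidden layer on the candidate side.
  2. For online computation, modify trainer_config.py: remove the target part of the network and set the input type to dense_vector_sequence.
  3. In main.c, build an ivector from the offline result and feed it to the model to compute the prediction.

Problem:
After loading the original model, the program crashes with a core dump: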

F0906 16:00:37.229890 28618 Parameter.cpp:349] Check failed: header.size == getSize() (32768 vs. 3455094) The size (32768) in the file does not match the size (3455094) of the parameter: ___fc_layer_1__.w0
*** Check failure stack trace: ***
    @     0x7ffff6cee87d  google::LogMessage::Fail()
    @     0x7ffff6cf232c  google::LogMessage::SendToLog()
    @     0x7ffff6cee3a3  google::LogMessage::Flush()
    @     0x7ffff6cf383e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7ffff6c6ea0a  paddle::Parameter::load()
    @     0x7ffff6c6f0c7  paddle::Parameter::load()
    @     0x7ffff6b610f6  paddle::GradientMachine::loadParameters()
    @     0x7ffff6a1c433  paddle_gradient_machine_load_parameter_from_disk
    @           0x401557  main
    @     0x7ffff6483bd5  __libc_start_main
    @           0x401259  (unknown)
    @              (nil)  (unknown)

Program received signal SIGABRT, Aborted.
0x00007ffff64973f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
(gdb) bt
#0  0x00007ffff64973f7 in raise () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#1  0x00007ffff64987d8 in abort () from /opt/compiler/gcc-4.8.2/lib/libc.so.6
#2  0x00007ffff6cf7715 in google::DumpStackTraceAndExit() () from /home/yanchunwei/third_party/tengfei/Paddle/paddle/capi/examples/model_inference/usr/local/lib/libpaddle_capi_shared.so
#3  0x00007ffff6cee87d in google::LogMessage::Fail() () from /home/yanchunwei/third_party/tengfei/Paddle/paddle/capi/examples/model_inference/usr/local/lib/libpaddle_capi_shared.so
#4  0x00007ffff6cf232c in google::LogMessage::SendToLog() () from /home/yanchunwei/third_party/tengfei/Paddle/paddle/capi/examples/model_inference/usr/local/lib/libpaddle_capi_shared.so
#5  0x00007ffff6cee3a3 in google::LogMessage::Flush() () from /home/yanchunwei/third_party/tengfei/Paddle/paddle/capi/examples/model_inference/usr/local/lib/libpaddle_capi_shared.so
#6  0x00007ffff6cf383e in google::LogMessageFatal::~LogMessageFatal() () from /home/yanchunwei/third_party/tengfei/Paddle/paddle/capi/examples/model_inference/usr/local/lib/libpaddle_capi_shared.so
#7  0x00007ffff6c6ea0a in paddle::Parameter::load(std::basic_istream<char, std::char_traits<char> >&) () at /home/yuyang/BuildAgent3/work/d55918cf60d51073/paddle/parameter/Parameter.cpp:400
#8  0x00007ffff6c6f0c7 in paddle::Parameter::load(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at /home/yuyang/BuildAgent3/work/d55918cf60d51073/paddle/parameter/Parameter.cpp:339
#9  0x00007ffff6b610f6 in paddle::GradientMachine::loadParameters(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at /home/yuyang/BuildAgent3/work/d55918cf60d51073/paddle/gserver/gradientmachines/GradientMachine.cpp:79
#10 0x00007ffff6a1c433 in paddle_gradient_machine_load_parameter_from_disk () at /home/yuyang/BuildAgent3/work/d55918cf60d51073/paddle/capi/gradient_machine.cpp:67
#11 0x0000000000401557 in main ()

I suspect the model parameter files also need to be changed. Should I delete the parameter files for right_fc?

luotao1 (Contributor) commented Sep 6, 2017

The message "Parameter.cpp:349] Check failed: header.size == getSize() (32768 vs. 3455094)" shows that the trained parameter ___fc_layer_1__.w0, used as-is, does not match your current inference configuration.

fty8788 (Author) commented Sep 6, 2017

@luotao1 Does deleting some of the model parameter files help? I deleted the parameters for right and still get the same error.

luotao1 (Contributor) commented Sep 6, 2017

You also need to change the parameter's header accordingly.

fty8788 (Author) commented Sep 6, 2017

How do I change the header, and which file is it in?

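luotao1 (Contributor) commented Sep 6, 2017

Every parameter has its own header. I don't know which file you mean by the right parameter. Judging from the error message, the header of ___fc_layer_1__.w0 does not match the inference configuration.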

fty8788 (Author) commented Sep 7, 2017

right is the part of the network whose name is right.
The tail of the output of od -f ___fc_layer_1__.w0 looks like this:

0377720   2.860865e-01  -1.993349e-01  -1.586919e-01  -1.650321e-01
0377740  -2.726264e-01   2.844545e-01   3.533762e-01  -2.132430e-01
0377760   2.185809e-01  -4.334270e-01  -1.166841e-01   7.720743e-02
0400000   3.310674e-01   6.301481e-02  -2.428743e-01   2.548038e-01
0400020
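
A quick sanity check on the dump above: od prints offsets in octal, so the final offset 0400020 is the file length in bytes. Assuming the parameter file starts with a 16-byte header (int32 version, uint32 value size, uint64 element count; this layout is an assumption, not something stated in this thread) followed by raw 4-byte floats, the file holds 32768 values, the same number the error message reports for the file. A minimal sketch of that arithmetic, with a hypothetical read_header helper for inspecting a file directly:

```python
import struct

# Assumed header layout: int32 version, uint32 value size, uint64 element count.
HEADER_FORMAT = "<iIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes, no padding


def values_in_file(total_bytes, value_size=4):
    """Number of stored values implied by a file's total size in bytes."""
    return (total_bytes - HEADER_SIZE) // value_size


def read_header(path):
    """Hypothetical helper: return (version, value_size, element_count)."""
    with open(path, "rb") as f:
        return struct.unpack(HEADER_FORMAT, f.read(HEADER_SIZE))


# The last offset printed by od is octal and equals the file length.
end_offset = int("0400020", 8)
print(end_offset, values_in_file(end_offset))
```

So the file on disk is internally consistent; the mismatch comes from what the new configuration expects under that parameter name.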

My model's structure should look like this:

                            ___fc_layer_2
                            /          \
                ___fc_layer_0       ___fc_layer_1
                           |               |
                       left               right

Now that I have removed right, ___fc_layer_1__.w0 no longer matches. How should I change it?

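lcy-seso (Contributor) commented Sep 7, 2017

In the DSSM network, the right and left networks are fully symmetric and share parameters. Removing right should not cause a mismatch in ___fc_layer_1__.w0, so there must be some other problem. Did you change anything else?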

fty8788 (Author) commented Sep 7, 2017

  1. Network configuration
    https://github.com/PaddlePaddle/models/blob/develop/dssm/network_conf.py
    I changed the following parts of this configuration:
target = paddle.layer.data(
        name='target_input',
        type=paddle.data_type.dense_vector_sequence(self.vocab_sizes[1]))

The input type was changed from integer_value_sequence to dense_vector_sequence.

for id, input in enumerate([source]):
        x = self.create_embedding(input, prefix=embed_prefixs[id])
        word_vecs.append(x)

The original target input was removed.

semantics = []
for id, input in enumerate(word_vecs):
        x = self.model_arch_creater(input, prefix=prefixs[id])
        semantics.append(x)
semantics.append(target)

target is fed directly into the semantics layer.

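  2. Delete the parameters of the corresponding layers from the model

After unpacking the model file, I deleted every file whose name starts with right, including right_emb.w, right_fc_0_128.b, and so on.

  3. Format of the infer input data

The left input is unchanged: {330070,1515788,1606717,163247,1622216,251207,304166,729241,1177768};

The right input becomes the following (exported offline):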

float intent_semantic[] = {-0.434024,-0.999805,0.999574,0.999854,0.99991,-0.999943,-1.0,0.906624,-0.999991,0.999976,-0.999891,0.999057,0.999854,0.999814,-0.999994,-0.999516,0.818375,0.263941,-0.99892,-0.95015,-0.999959,0.80347,0.99926,0.999913,-0.0611929,-0.999988,0.999909,0.999227,-0.999974,0.0621019,-0.997075,0.665512};

paddle_matrix mat = paddle_matrix_create(/* sample_num */ 1, 32, false);
paddle_real* intent_array;
CHECK(paddle_matrix_get_row(mat, 0, &intent_array));
for (int i = 0; i < 32; ++i) {
    intent_array[i] = intent_semantic[i];
}
CHECK(paddle_arguments_set_value(in_args, 1, mat));

lcy-seso added the User (used to mark user questions) label Sep 7, 2017

lcy-seso (Contributor) commented Sep 7, 2017

target = paddle.layer.data(
            name='target_input',
            type=paddle.data_type.dense_vector_sequence(self.vocab_sizes[1]))

What is the value of the variable below? Is it 32?

self.vocab_sizes[1]

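Please organize the question as well as you can. A problem that has not been distilled cannot be solved quickly.

What are all these variables? Someone with no background has to spend time working them out. Were they changed or not? What are their concrete values? Have you checked them yourself?

Please also make sure the information you provide is complete (complete does not mean pasting all of the code).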

Check failed: header.size == getSize() (32768 vs. 3455094) The size (32768) in the file does not match the size (3455094) of the parameter: ___fc_layer_1__.w0

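This error message is very clear and very simple: the configuration has changed, and the stored parameter ___fc_layer_1__.w0 does not match the size in the current configuration.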

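fty8788 (Author) commented Sep 7, 2017

@lcy-seso Thanks.
I had overlooked self.vocab_sizes[1]. It should be 32, because the output of the right network is 32-dimensional (in the new network this is the target data layer).

Error message after re-running:
Check failed: header.size == getSize() (32768 vs. 128) The size (32768) in the file does not match the size (128) of the parameter: ___fc_layer_1__.w0
32768 = 32*1024. Why does the parameter file for ___fc_layer_1__ hold 32*1024 values?
And the input intent_semantic is 32-dimensional, so why does the expected size become 128 here?

Network structure at training time: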

                        ___fc_layer_2
                        /          \
            ___fc_layer_0       ___fc_layer_1
                       |               |
                   left               right (top hidden layer has 32 nodes)

Network structure at inference time:

                        ___fc_layer_2
                        /          \
            ___fc_layer_0       ___fc_layer_1
                       |               |
                   left               intent_semantic (paddle_matrix type)

I hope this makes the situation clear.

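lcy-seso (Contributor) commented Sep 7, 2017

I understand your task and what you need.

Here is one thing to try: https://github.com/lcy-seso/models/blob/8f10929112909c20dd4643fae05eb8c116a4792d/globally_normalized_reader/model.py#L327
This method prints out the parsed network configuration.

Compare the parsed result of the modified network with that of the unmodified network. Apart from the discarded embedding layer, the parameter names and parameter sizes of the two networks should be exactly the same, with no differences.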
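
The comparison suggested above can be sketched as follows, assuming the parameter names and element counts from each side have already been collected into plain dicts. The function name diff_parameters and the dict layout are illustrative and not part of Paddle; the only values taken from this thread are the two sizes in the error message.

```python
def diff_parameters(saved, expected):
    """Compare two {parameter_name: element_count} mappings.

    Returns (missing, unused, mismatched):
      missing    - names the configuration expects but no saved file provides
      unused     - saved names the configuration never asks for
      mismatched - names present on both sides with different sizes,
                   mapped to (saved_size, expected_size)
    """
    missing = sorted(set(expected) - set(saved))
    unused = sorted(set(saved) - set(expected))
    mismatched = {
        name: (saved[name], expected[name])
        for name in sorted(set(saved) & set(expected))
        if saved[name] != expected[name]
    }
    return missing, unused, mismatched


# Sizes reported by the error message in this thread.
saved = {"___fc_layer_1__.w0": 32768}
expected = {"___fc_layer_1__.w0": 128}
print(diff_parameters(saved, expected))
```

A same-name, different-size entry like this one suggests that a saved file is being matched to a different layer than the one it was trained for.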

fty8788 (Author) commented Sep 8, 2017

@lcy-seso I printed the parsed configuration with your method, and the network configuration parameters look correct:

              fc_layer_1
                  |
              concat_64
             /         \
  left_fc_2_32       target_input_32
       |
  left_fc_1_64
       |
  left_fc_0_128
       |
  fc_layer_0
       |
  seq_pooling_0
       |
  embedding_0
       |
  source_input

The original network configuration was:

              fc_layer_2
                  |
              concat_64
             /         \
  left_fc_2_32       right_fc_2_32
       |                  |
  left_fc_1_64       right_fc_1_64
       |                  |
  left_fc_0_128      right_fc_0_128
       |                  |
  fc_layer_0         fc_layer_1
       |                  |
  seq_pooling_0      seq_pooling_1
       |                  |
  embedding_0        embedding_1
       |                  |
  source_input       target_input

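My guess is that the model parameter files were not handled properly.
Previously, I deleted all parameter files whose names start with right. But at inference time, the parameter file of the old network's fc_layer_1 was mistaken for that of the new network's output layer, which is now also named fc_layer_1.
The new approach: also delete the parameter file of fc_layer_1, and rename the parameter file of fc_layer_2 to fc_layer_1.

I ran it again, and it succeeded.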
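
The fix described above could be scripted roughly as follows. This is a sketch under assumptions: the function remap_parameters is hypothetical, the file left_emb.w is a made-up placeholder, and the demonstration runs on empty files in a scratch directory rather than on a real model. On a real unpacked model, working on a copy of the directory would be safer.

```python
import os
import tempfile


def remap_parameters(model_dir, drop_prefixes, drop_layer, rename_from, rename_to):
    """Delete stale parameter files, then rename one layer's files in place."""
    # Pass 1: remove the dropped branch and the layer whose name is being reused.
    for name in sorted(os.listdir(model_dir)):
        if name.startswith(tuple(drop_prefixes)) or name.startswith(drop_layer + "."):
            os.remove(os.path.join(model_dir, name))
    # Pass 2: give the surviving layer's files the name the new config expects.
    for name in sorted(os.listdir(model_dir)):
        if name.startswith(rename_from + "."):
            new_name = rename_to + name[len(rename_from):]
            os.rename(os.path.join(model_dir, name),
                      os.path.join(model_dir, new_name))
    return sorted(os.listdir(model_dir))


# Demonstration on empty placeholder files (names partly hypothetical).
with tempfile.TemporaryDirectory() as model_dir:
    for name in ["right_emb.w", "right_fc_0_128.b", "left_emb.w",
                 "___fc_layer_0__.w0", "___fc_layer_1__.w0", "___fc_layer_2__.w0"]:
        open(os.path.join(model_dir, name), "wb").close()
    remaining = remap_parameters(model_dir, ["right"], "___fc_layer_1__",
                                 "___fc_layer_2__", "___fc_layer_1__")
    print(remaining)
```

Deleting before renaming matters here, because the rename target has the same name as the file being dropped. As for the 128 in the second error message: if a fully connected weight holds input width times output width values, 128 would be consistent with the 64-wide concat input and a 2-wide output layer, though that reading is an inference from the parsed layout and is not stated in the thread.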

Yancey1989 (Contributor) commented

Thanks for the clear description. I'm closing this issue for now; feel free to reopen it at any time if there are updates.
