
[question] Usage of nce_layer #1388

Closed
pkuyym opened this issue Feb 20, 2017 · 5 comments
pkuyym (Contributor) commented Feb 20, 2017

The documentation says:

Noise-contrastive estimation. Implements the method in the following paper: A fast and simple algorithm for training neural probabilistic language models.

cost = nce_layer(input=layer1, label=layer2, weight=layer3,
                 num_classes=3, neg_distribution=[0.1, 0.3, 0.6])

My understanding: during training, negative samples are drawn from the specified distribution for each update, so the model's quality depends heavily on both the sampling distribution and the number of negative samples.

Question 1:
Is this understanding correct? Is there any practical experience on how much nce_layer affects model convergence compared with a full softmax plus negative log loss?

Question 2:
This layer outputs a cost. How can I obtain the probability of each class at prediction time?
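For reference, the binary-classification form of NCE from the Mnih & Teh paper cited above can be sketched in plain NumPy (the function and variable names below are illustrative, not Paddle API): each true word is scored against k noise samples drawn from the noise distribution Pn, and the classification logit is s(w) - log(k · Pn(w)).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(score_pos, scores_neg, pn_pos, pn_neg, k):
    """NCE loss for one true sample and k noise samples (sketch).

    score_pos / scores_neg: unnormalized model scores s(w)
    pn_pos / pn_neg: noise-distribution probabilities Pn(w)
    """
    # Logit for the "data vs. noise" classifier: s(w) - log(k * Pn(w))
    logit_pos = score_pos - np.log(k * pn_pos)
    logits_neg = scores_neg - np.log(k * pn_neg)
    # The true word should be classified as data, the samples as noise.
    return (-np.log(sigmoid(logit_pos))
            - np.sum(np.log(1.0 - sigmoid(logits_neg))))
```

Because only the true class and the k sampled classes are touched, each update costs O(k) instead of O(num_classes), which is the whole appeal over a full softmax.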

pkuyym (Contributor, Author) commented Feb 20, 2017

When using nce_layer, the following error occurs:

[INFO 2017-02-20 16:57:45,611 dataprovider.py:20] dict len : 1972305
I0220 16:57:45.611878 28999 GradientMachine.cpp:135] Initing parameters..
I0220 16:58:14.458161 28999 GradientMachine.cpp:142] Init parameters done.
I0220 16:58:15.707015 31345 ThreadLocal.cpp:40] thread use undeterministic rand seed:31346
Thread [140038641317632] Forwarding nce_layer_0, fc_layer_1, fc_layer_0, last_seq_0, simple_gru2_0, __simple_gru2_0___transform, embedding_0, label, bidword_seq,
*** Aborted at 1487581099 (unix time) try "date -d @1487581099" if you are using GNU date ***
PC: @ 0x7f69e3625764 __log
*** SIGFPE (@0x7f69e3625764) received by PID 28999 (TID 0x7f5d49787700) from PID 18446744073229457252; stack trace: ***
@ 0x7f69e4238160 (unknown)
@ 0x7f69e3625764 __log
@ 0x5fd4d5 paddle::NCELayer::forward()
@ 0x6ecb40 paddle::NeuralNetwork::forward()
@ 0x6e15e9 paddle::TrainerThread::forward()
@ 0x6e3bd5 paddle::TrainerThread::computeThread()
@ 0x7f69e39b28a0 execute_native_thread_routine
@ 0x7f69e42301c3 start_thread
@ 0x7f69e312312d __clone
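One plausible reading of the trace above (an assumption, not confirmed in this thread): the SIGFPE lands inside `__log` called from `NCELayer::forward()`, which suggests `log()` received an invalid argument, for example a zero probability in the noise distribution. A hypothetical guard in plain Python illustrates the idea:

```python
import math

def safe_log_probs(probs, eps=1e-12):
    """Normalize a distribution and clamp entries before taking log.

    Hypothetical helper, not PaddlePaddle API: a zero probability in the
    noise distribution would make log() undefined.
    """
    total = sum(probs)
    return [math.log(max(p / total, eps)) for p in probs]
```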

luotao1 (Contributor) commented Feb 20, 2017

Please post your configuration. Also, please note that nce_layer cannot run on GPU.

pkuyym (Contributor, Author) commented Feb 20, 2017

@luotao1 Thanks for the reminder. The configuration is as follows:
bidword_seq = data_layer(name='bidword_seq', size=dict_size)
label = data_layer(name='label', size=dict_size)

###################### Algorithm Configuration ######################
settings(
    batch_size=2000,
    learning_rate=1e-7,
    learning_method=MomentumOptimizer(momentum=0.95),
    regularization=L2Regularization(1e-5)
)

###################### Network Configuration ########################
embed = embedding_layer(input=bidword_seq, size=embed_dim)
gru = simple_gru2(input=embed, size=rnn_dim)
fw_seq = last_seq(input=gru)
hid_layer1 = fc_layer(input=fw_seq, size=200, act=ReluActivation())
hid_layer2 = fc_layer(input=hid_layer1, size=100, act=ReluActivation())
outputs(nce_layer(input=[hid_layer2], label=label, num_classes=dict_size))
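The config above passes no neg_distribution. If one wants to supply it explicitly, as in the documentation snippet, a common choice in NCE is a smoothed unigram distribution built from corpus word counts. A minimal helper (illustrative only, not part of the Paddle config):

```python
def unigram_noise_distribution(counts, power=0.75):
    """Smoothed unigram noise distribution for NCE sampling.

    counts: per-class occurrence counts from the training corpus.
    power: flattening exponent; 0.75 is a common heuristic choice.
    """
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]
```

The resulting list could then be passed as `neg_distribution=...`; its length must match num_classes and it must sum to 1.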

dqcao commented Apr 6, 2017

I've run into the same problem. Has it been solved? If so, how? @luotao1 @pkuyym

@lcy-seso lcy-seso self-assigned this Apr 7, 2017
pkuyym (Contributor, Author) commented Jul 31, 2017

@dqcao Please refer to the nce_cost example for how to use NCE; it includes the logic for both training and prediction with nce_cost.
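On the prediction question earlier in the thread: a common approach (a sketch, under the assumption that the trained output-layer weight matrix W and bias b can be extracted from the model) is to drop the NCE sampling at inference time and score every class with a full softmax over the same parameters:

```python
import numpy as np

def predict_softmax(hidden, W, b):
    """Per-class probabilities at prediction time (sketch).

    hidden: final hidden vector (hidden_dim,)
    W: output weights (num_classes, hidden_dim), b: bias (num_classes,)
    These are assumed to be exported from the trained NCE layer.
    """
    logits = W @ hidden + b
    logits -= logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

This is O(num_classes) per query, but the NCE speedup only matters during training, where the softmax normalization would otherwise dominate every update.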
