What is the implementation of embedding in PaddlePaddle #3867

Closed
bdluping opened this issue Sep 5, 2017 · 8 comments


bdluping commented Sep 5, 2017

We would like to understand how the embedding function is implemented internally, but we cannot find it using tools such as Source Insight. Could you help us locate this function in the Paddle project?

@lcy-seso self-assigned this on Sep 5, 2017
@lcy-seso added the User 用于标记用户问题 label on Sep 5, 2017
Contributor

lcy-seso commented Sep 5, 2017

@lcy-seso changed the title from "issue about paddle.layer.embedding" to "What is the implementation of embedding in PaddlePaddle" on Sep 5, 2017
Author

bdluping commented Sep 5, 2017

Thanks for locating it.
In our script we see the following processing flow:

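import paddle.v2 as paddle

# Input: a sequence of integer word ids, each in [0, input_dim)
data = paddle.layer.data("word", paddle.data_type.integer_value_sequence(input_dim))
# Map each word id to an emb_dim-dimensional vector
emb = paddle.layer.embedding(input=data, size=emb_dim)
conv_3 = paddle.networks.sequence_conv_pool(
    input=emb, context_len=3, hidden_size=hid_dim)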

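We want to replace the embedding step ourselves, that is, the conversion from data to emb. How is data parsed, and once we have converted it into our own vectors, how do we package them as emb? We are looking at embedding to understand its internal format, so that our output can be connected to paddle.networks.sequence_conv_pool.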

Contributor

lcy-seso commented Sep 5, 2017

Please refer to the replies under the following issue. I will add this question to the Paddle FAQ:
#2812

Author

bdluping commented Sep 5, 2017

Thanks, we will look into it.

Contributor

lcy-seso commented Sep 5, 2017

The SRL (semantic role labeling) chapter of PaddleBook, https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/train.py#L142, loads pretrained parameters. You can also refer to that example for how to do it.

Author

bdluping commented Sep 5, 2017

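I read https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/train.py#L142

It looks very useful. Could the maintainers on duty check whether my understanding below is correct?

Through parameters.set, a pretrained 44068*32 matrix is loaded (44068 word vectors, each 32-dimensional, with every dimension a float32; the i-th word vector in the matrix corresponds to the word at position i in the dictionary). Once it is loaded, the embedding layer uses this matrix as a lookup table: each word in the input (given as its dictionary position) is mapped to the corresponding word vector in the loaded matrix, and when the mapping is done the result is passed on to the other layers. To get a custom embedding, we only need to load the corresponding matrix that we trained ourselves.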

@hedaoyuan
Contributor

@bdluping Your understanding is correct. For loading a custom embedding, see #2927.
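
The lookup confirmed here can be sketched in plain Python. This is only an illustration of the idea (a table with one row per dictionary entry, indexed by word id), not PaddlePaddle's actual implementation. The helper names `reshape_table` and `embedding_lookup`, and the toy 5x3 table standing in for the 44068*32 matrix, are hypothetical.

```python
from array import array


def reshape_table(flat, vocab_size, emb_dim):
    """Split a flat sequence of floats into vocab_size rows of emb_dim values."""
    if len(flat) != vocab_size * emb_dim:
        raise ValueError(
            "expected %d values, got %d" % (vocab_size * emb_dim, len(flat)))
    return [list(flat[i * emb_dim:(i + 1) * emb_dim])
            for i in range(vocab_size)]


def embedding_lookup(table, word_ids):
    """Map each word id (dictionary position) to its row in the table."""
    return [table[i] for i in word_ids]


# Toy stand-in for a pretrained matrix: 5 words, 3 dimensions, float32 values.
# (Assumption: the real matrix would be 44068 x 32, as described in the thread.)
VOCAB_SIZE, EMB_DIM = 5, 3
flat = array("f", [float(v) for v in range(VOCAB_SIZE * EMB_DIM)])
table = reshape_table(flat, VOCAB_SIZE, EMB_DIM)

# One input sequence of word ids, analogous to an integer_value_sequence sample.
sentence = [4, 0, 2]
embedded = embedding_lookup(table, sentence)
print(embedded)
```

The output is one vector per input word, in input order, which is the kind of sequence a downstream layer would consume. Replacing the contents of `table` with a matrix trained elsewhere changes the vectors without changing the lookup itself.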

Author

bdluping commented Sep 6, 2017

OK, thank you.

@lcy-seso closed this as completed on Sep 8, 2017
heavengate added a commit to heavengate/Paddle that referenced this issue Aug 16, 2021