
add embedding 2.0 #26649

Merged: 27 commits, Sep 1, 2020
Conversation

@seiriosPlus (Collaborator) commented Aug 25, 2020

PR types

New features

PR changes

OPs

Describe

Add a new embedding function and change its interface.

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

[0.0, 0.0, ..., 0.0 ]] # padding data
It will pad all-zero data when ids is 0.
Args:
input(Variable): A Tensor or LoDTensor with type int64, which contains the id information.

For 2.0, dtype Variable -> Tensor

input(Variable): A Tensor or LoDTensor with type int64, which contains the id information.
The last dimension of Tensor shape must be equal to 1. The value of the input id should
satisfy :math:`0<= id < size[0]` .
weight (Variable): The weight. A Tensor with shape of lookup table parameter. It should have two elements which

Same as above: use Tensor. Recommend checking and amending the wording throughout.

to :ref:`api_guide_Name`. Usually name is no need to set and
None by default.
Returns:
Variable: Embedding Tensor or LoDTensor mapped by input. The data type is the same as :attr:`dtype` .

Variable -> Tensor
No LoDTensor. show Tensor only in docs.

[0.0, 0.0, ..., 0.0 ]]] # padding data
The input padding_idx is less than 0, it is automatically converted to padding_idx = -1 + 128 = 127
It will pad all-zero data when ids is 127.
Case 2:

In 2.0, LoDTensor is not recommended. LoD examples can be removed.

@@ -0,0 +1,57 @@
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.

2019 -> 2020

indicates the size of the dictionary of embeddings and the size of each embedding vector respectively.
is_sparse(bool): The flag indicating whether to use sparse update. This parameter only
affects the performance of the backwards gradient update. It is recommended to set
True because sparse update is faster. But some optimizer does not support sparse update,

... some optimizers do not support...

such as :ref:`api_fluid_optimizer_AdadeltaOptimizer` , :ref:`api_fluid_optimizer_AdamaxOptimizer` ,
:ref:`api_fluid_optimizer_DecayedAdagradOptimizer` , :ref:`api_fluid_optimizer_FtrlOptimizer` ,
:ref:`api_fluid_optimizer_LambOptimizer` and :ref:`api_fluid_optimizer_LarsMomentumOptimizer` .
In these case, is_sparse must be False. Default: False.

case -> cases

@@ -108,3 +108,113 @@ def one_hot(x, num_classes, name=None):
outputs={'Out': one_hot_out},
stop_gradient=True)
return one_hot_out


def embedding(input, weight, padding_idx=None, is_sparse=True, name=None):
@Heeenrrry (Contributor) commented Aug 27, 2020

A preview page of this part is needed to check the display format and layout.

@iclementine

Confirmed the detailed parameter names; recording them here:

paddle.nn.Embedding(num_embeddings, embedding_dim, 
                    padding_idx=None, 
                    sparse=False, 
                    weight_attr=None, 
                    name=None)

paddle.nn.Embedding.forward(self, x)

paddle.nn.functional.embedding(x, weight, padding_idx=None,  sparse=False, name=None)
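The semantics behind these signatures can be related to the docstring's padding behavior with a small emulation. This is a hedged plain-Python sketch, not the Paddle implementation; `embedding_lookup` is an illustrative helper, and it assumes the documented rule that a negative `padding_idx` wraps by `num_embeddings` and that matching ids map to all-zero rows.

```python
def embedding_lookup(x, weight, padding_idx=None):
    # Resolve a negative padding_idx the way the docstring describes:
    # padding_idx = padding_idx + num_embeddings.
    num_embeddings = len(weight)
    dim = len(weight[0])
    if padding_idx is not None and padding_idx < 0:
        padding_idx += num_embeddings

    def lookup(node):
        if isinstance(node, int):
            # ids equal to padding_idx map to an all-zero row
            return [0.0] * dim if node == padding_idx else list(weight[node])
        return [lookup(child) for child in node]

    return lookup(x)

# 128 x 4 toy table: row i holds the value i in every slot
weight = [[float(i)] * 4 for i in range(128)]
out = embedding_lookup([[1, 3], [2, 4], [4, 127]], weight, padding_idx=-1)
# padding_idx = -1 wraps to 127, so entries with id 127 come back all-zero
```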

@iclementine left a comment

The docs need some changes, because lookup_v2_op does not require the last dimension of the input to be 1; the output shape is the input shape with embedding_size appended.
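The shape rule described here (no trailing dimension of 1; the output appends embedding_size) can be illustrated with NumPy fancy indexing, which behaves the same way. A hedged sketch, not the Paddle kernel:

```python
import numpy as np

x = np.array([[1, 3], [2, 4], [4, 7]])   # shape [3, 2], no trailing 1
weight = np.random.rand(128, 16)         # [num_embeddings, embedding_size]
out = weight[x]                          # indexing appends the weight's last dim
assert out.shape == (3, 2, 16)           # input shape + [embedding_size]
```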

Case 1:
input is a Tensor. padding_idx = -1
input.data = [[[1], [3]], [[2], [4]], [[4], [127]]]
input.shape = [3, 2, 1]


Should the shape description here be updated? The new op does not require the input's last dimension to be 1; instead, an embedding_size dimension is appended to the input's shape.

.. code-block:: python
import paddle.fluid as fluid
import numpy as np
data = fluid.data(name='x', shape=[None, 1], dtype='int64')


Same for the shapes here: the last dimension is not required to be 1, and the output appends an embedding_size dimension after the input's dimensions.

def embedding(input, weight, padding_idx=None, is_sparse=False, name=None):
"""
The operator is used to lookup embeddings vector of ids provided by :attr:`input` .
It automatically constructs a 2D embedding matrix based on the


These lines should also be revised, because functional.embedding does not create the weight.

Heeenrrry previously approved these changes Aug 27, 2020
**weight** (Parameter): the learnable weights of this layer.

Returns:
Variable: Embedding Tensor mapped by input. The data type is the same as :attr:`dtype` .

Here is Tensor


Case 1:
input is a Tensor. padding_idx = -1
input.data = [[[1], [3]], [[2], [4]], [[4], [127]]]


Written this way, the actual shape is [3, 2, 1].

"""
The operator is used to lookup embeddings vector of ids provided by :attr:`input` .

This OP requires the last dimension of Tensor shape must be equal to 1. The shape


The behavior here also appends an emb_size dimension, rather than replacing the last dimension of the input Tensor shape with emb_size.

None by default.

Returns:
Tensor: Embedding Tensor or LoDTensor mapped by input. The data type is the same as :attr:`dtype` .


The data type is the same as the dtype of weight.

Examples:
.. code-block:: python

import paddle.fluid as fluid


Recommend using dynamic-graph (dygraph) code for the example.

iclementine previously approved these changes Aug 27, 2020
@iclementine left a comment

LGTM
I added some comments, may be left for another document_fix PR?

@seiriosPlus (Collaborator, Author)

LGTM
I added some comments, may be left for another document_fix PR?

I will pull another request for document_fix.

@seiriosPlus dismissed stale reviews from iclementine and Heeenrrry via c2ebb07 August 28, 2020 01:50
iclementine previously approved these changes Aug 28, 2020
@iclementine left a comment

LGTM

iclementine previously approved these changes Aug 28, 2020
@iclementine left a comment

LGTM

@seiriosPlus (Collaborator, Author)

(screenshot attached)

sparse(bool): The flag indicating whether to use sparse update. This parameter only
affects the performance of the backwards gradient update. It is recommended to set
True because sparse update is faster. But some optimizers does not support sparse update,
such as :ref:`api_fluid_optimizer_AdadeltaOptimizer` , :ref:`api_fluid_optimizer_AdamaxOptimizer` ,

These references should later be updated to the API docs under paddle.optimizer.

The operator is used to lookup embeddings vector of ids provided by :attr:`input` .

The shape of output Tensor is generated by appending the last dimension of the input Tensor shape
with emb_size.

emb_size -> embedding size


The shape of output Tensor is generated by appending the last dimension of the input Tensor shape
with emb_size.
**Note:** The id in :attr:`input` must satisfy :math:`0 =< id < size[0]` ,

size[0] -> weight.shape[0]

padding_idx = -1
input.data = [[1, 3], [2, 4], [4, 127]]
input.shape = [3, 2]
Given size = [128, 16]

input.data -> x.data
input.shape -> x.shape
Given size -> weight.shape

It will pad all-zero data when ids is 127.

Args:
x(Tensor): A Tensor or LoDTensor with type int64, which contains the id information.

  1. Remove LoDTensor.
  2. Add int32 support.
  3. The last dimension is not required to be 1.
  4. size[0] -> weight.size[0]
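The points above about int32 ids and the dropped trailing-1 requirement can be sketched with NumPy indexing, which accepts either id dtype and appends the embedding dimension. A hedged emulation, not the Paddle op:

```python
import numpy as np

weight = np.random.rand(20, 8)               # [num_embeddings, embedding_dim]
for dtype in (np.int32, np.int64):           # both id dtypes the review asks for
    ids = np.array([[0, 3, 5], [2, 4, 19]], dtype=dtype)  # no trailing 1
    out = weight[ids]
    assert out.shape == (2, 3, 8)            # ids.shape + [embedding_dim]
    assert out.dtype == weight.dtype         # output dtype follows the weight
```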

inp_word.shape # [2, 3]
dict_size = 20

emb = nn.Embedding(dict_size, 32, weight_attr='emb.w', sparse=False)

Use an embedding example here.

helper = LayerHelper('embedding', **locals())
dtype = helper.input_dtype()

check_variable_and_dtype(x, 'input', ['int64'], 'embedding')

Support int32 as well.

@@ -15,7 +15,7 @@
# TODO: define the common classes to build a neural network
from ...fluid.dygraph import BilinearTensorProduct #DEFINE_ALIAS
from ...fluid.dygraph import Pool2D #DEFINE_ALIAS
from ...fluid.dygraph import Embedding #DEFINE_ALIAS
from ...fluid.dygraph import Linear #DEFINE_ALIAS

Why was Embedding changed to Linear?

@seiriosPlus (Collaborator, Author)

This came from resolving merge conflicts; this spot was not actually modified.

sparse(bool): The flag indicating whether to use sparse update. This parameter only
affects the performance of the backwards gradient update. It is recommended to set
True because sparse update is faster. But some optimizer does not support sparse update,
such as :ref:`api_fluid_optimizer_AdadeltaOptimizer` , :ref:`api_fluid_optimizer_AdamaxOptimizer` ,

Same as above.


emb = nn.Embedding(dict_size,
32,
weight_attr='emb.w',

What is emb.w?

XiaoguangHu01 previously approved these changes Aug 31, 2020
@XiaoguangHu01 (Contributor) left a comment

LGTM

XiaoguangHu01 previously approved these changes Aug 31, 2020
@XiaoguangHu01 (Contributor) left a comment

LGTM

chalsliu previously approved these changes Aug 31, 2020
@phlrain self-requested a review August 31, 2020 13:30
The shape of output Tensor is generated by appending the last dimension of the input Tensor shape
with embedding size.
**Note:** The id in :attr:`input` must satisfy :math:`0 =< id < weight.shape[0]` ,
otherwise the program will throw an exception and exit.

input -> x


.. code-block:: python

import paddle

This example code is static graph, right? It needs to be changed to a dynamic-graph example. Would a tensor created directly with paddle.randn work as the weight?

"""
:alias_main: paddle.nn.Embedding
:alias: paddle.nn.Embedding,paddle.nn.layer.Embedding,paddle.nn.layer.common.Embedding
:old_api: paddle.fluid.dygraph.Embedding

These three alias lines are not needed.

For specific usage, refer to code examples. It implements the function of the Embedding Layer.
This layer is used to lookup embeddings vector of ids provided by :attr:`input` .
It automatically constructs a 2D embedding matrix based on the
input :attr:`size` (vocab_size, emb_size) and :attr:`dtype` .

:attr:`size` no longer exists, and neither does :attr:`dtype`.

The shape of output Tensor is generated by appending an emb_size dimension to the
last dimension of the input Tensor shape.

**Note:** The id in :attr:`input` must satisfy :math:`0 =< id < size[0]` ,

size[0] -> num_embeddings

such as :ref:`api_optimizer_AdadeltaOptimizer` , :ref:`api_optimizer_AdamaxOptimizer` ,
:ref:`api_optimizer_DecayedAdagradOptimizer` , :ref:`api_optimizer_FtrlOptimizer` ,
:ref:`api_optimizer_LambOptimizer` and :ref:`api_optimizer_LarsMomentumOptimizer` .
In these case, is_sparse must be False. Default: False.

is_sparse

default weight parameter property is used. See usage for details in :ref:`api_fluid_ParamAttr` . In addition,
user-defined or pre-trained word vectors can be loaded with the :attr:`param_attr` parameter.
The local word vector needs to be transformed into numpy format, and the shape of local word
vector should be consistent with :attr:`size` . Then :ref:`api_fluid_initializer_NumpyArrayInitializer`

The size attribute no longer exists.

user-defined or pre-trained word vectors can be loaded with the :attr:`param_attr` parameter.
The local word vector needs to be transformed into numpy format, and the shape of local word
vector should be consistent with :attr:`size` . Then :ref:`api_fluid_initializer_NumpyArrayInitializer`
is used to load custom or pre-trained word vectors. See code example 2 for details.

Where is code example 2?

emb = nn.Embedding(
dict_size,
32,
sparse=False)

The call to emb is missing.


Attribute:
**weight** (Parameter): the learnable weights of this layer.


Use the form
"""
Shape:
    Input:
    Output:
"""
to describe the input and output shapes of forward.
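The Shape-section format suggested above could look like the following sketch. The wording is hypothetical, not the merged docstring:

```python
def forward(self, x):
    """Illustrative sketch of the suggested Shape section (hypothetical
    wording, not the merged docstring):

    Shape:
        Input: x, an int32 or int64 Tensor of shape [d_0, d_1, ..., d_n].
        Output: a Tensor of shape [d_0, d_1, ..., d_n, embedding_dim],
            with the same data type as the layer's weight.
    """
```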

@jzhang533 (Contributor) left a comment

will have followup pr to fix docs, after this pr is landed.

@phlrain (Collaborator) left a comment

LGTM for api

const int64_t *ids_p = nullptr;

if (ids_t->type() == framework::proto::VarType::INT32) {
InputTypeCovert<

This kind of type conversion hurts performance; suggest moving it into the kernel. This optimization can be left to a follow-up PR.

@seiriosPlus (Collaborator, Author)

I'll see whether there is a better solution; ideally one that avoids the forced cast.

@chalsliu self-requested a review September 1, 2020 03:25
@seiriosPlus merged commit ebc5f99 into PaddlePaddle:develop Sep 1, 2020
seiriosPlus added a commit to seiriosPlus/Paddle that referenced this pull request Sep 2, 2020
* add embedding 2.0

* add embedding support input int32
seiriosPlus added a commit that referenced this pull request Sep 2, 2020
* add embedding 2.0

* add embedding support input int32
7 participants