
fix cumsum op for API 2.0, optimize performance test=develop #25505

Merged (3 commits, Aug 10, 2020)

Conversation

@LutaoChu (Contributor) commented on Jul 13, 2020

PR types

Function optimization, Performance optimization

PR changes

OPs

Describe

1. Optimize forward performance. (The forward benchmark screenshot is omitted here.)
2. Optimize backward performance:

| | Before | After | Conclusion |
| --- | --- | --- | --- |
| Backward time | 313 ms | 0.185 ms | 1691x speedup |

3. Add the parameters `dtype` and `name`.
4. If `axis` is None, flatten the input before accumulating.
5. Support full negative indexing for the `axis` parameter (see the NumPy sketch after this list).
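The new axis/dtype semantics (items 3-5) follow NumPy's cumsum conventions, matching the new signature `cumsum(x, axis=None, dtype=None, name=None)` introduced in this PR. A minimal sketch of what each rule means, illustrated with NumPy:

import numpy as np

x = np.arange(6, dtype=np.int32).reshape(2, 3)

np.cumsum(x)                  # axis=None: the input is flattened first -> [0 1 3 6 10 15]
np.cumsum(x, axis=-1)         # negative axis: -1 addresses the last dimension
np.cumsum(x, dtype=np.int64)  # dtype: accumulate in a wider type to avoid overflow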

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

"""
:alias_main: paddle.cumsum
:alias: paddle.cumsum,paddle.tensor.cumsum,paddle.tensor.math.cumsum
:old_api: paddle.fluid.layers.cumsum
Contributor: Delete this line.

Author: Deleted.

TCChenlong (Contributor) previously approved these changes on Jul 16, 2020:
LGTM

@@ -22,7 +22,14 @@ class CumOp : public framework::OperatorWithKernel {
  using framework::OperatorWithKernel::OperatorWithKernel;

  void InferShape(framework::InferShapeContext *ctx) const override {
    ctx->SetOutputDim("Out", ctx->GetInputDim("X"));
    if (ctx->Attrs().Get<bool>("flatten")) {
Contributor: This uses the newly added attribute — could it break compatibility here?
Consider the scenario of saving an inference_model after training with 1.8, then running inference with 2.0.

Author: Is the scenario in question only whether a static-graph model saved with 1.8 can run inference under the 2.0 static graph?

Author: Verified locally; this does not break compatibility.

@@ -37,6 +44,10 @@ class CumsumOpMaker : public framework::OpProtoAndCheckerMaker {
                 "dimension [default -1].")
        .SetDefault(-1)
        .EqualGreaterThan(-1);
    AddAttr<bool>("flatten",
Contributor: A new attribute — can compatibility be preserved?
Consider the scenario of saving an inference_model after training with 1.8, then running inference with 2.0.

Author: Verified locally; this does not break compatibility.
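One reason this typically holds: an op desc saved by 1.8 simply lacks the `flatten` key, and loading falls back to the registered default. A minimal sketch of that lookup pattern, assuming attribute resolution behaves like a default-on-missing dictionary read:

# Attributes recorded in a 1.8-era op desc (no "flatten" key yet).
op_attrs_from_18_model = {"axis": -1, "exclusive": False, "reverse": False}

# On load, a missing attribute resolves to its registered default (False here),
# so old programs keep their original cumsum semantics.
flatten = op_attrs_from_18_model.get("flatten", False)
assert flatten is False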

// size of the 'axis' dimension. Invalid in the reverse case because the
// Thrust APIs do not support it.
if (size == out_dims[axis] && !reverse) {
  if (exclusive) {
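For reference, the two scan flavors this branch dispatches between (e.g. thrust::inclusive_scan vs. thrust::exclusive_scan) differ only in whether an element contributes to its own output position — a NumPy sketch of the semantics:

import numpy as np

x = np.array([1, 2, 3, 4])

inclusive = np.cumsum(x)                              # [1, 3, 6, 10]
# Exclusive scan: shift right and seed with the identity (0 for sums).
exclusive = np.concatenate(([0], np.cumsum(x)[:-1]))  # [0, 1, 3, 6]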
Contributor: For the backward pass, is it worth adding a CUDA kernel to improve speed?

Author: Added a CUDA kernel for the backward pass; it yields a 1691x speedup.
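For context, the backward of cumsum is itself a cumsum over the reversed axis (each x[i] feeds every y[j] with j >= i), which is why the same GPU scan machinery can serve the gradient — a minimal NumPy sketch of the identity:

import numpy as np

def cumsum_grad(grad_out, axis=0):
    # grad_x[i] = sum_{j >= i} grad_out[j]: a cumsum over the reversed axis.
    g = np.flip(grad_out, axis=axis)
    return np.flip(np.cumsum(g, axis=axis), axis=axis)

# Check against the explicit suffix sums.
grad_y = np.random.rand(5)
manual = np.array([grad_y[i:].sum() for i in range(5)])
assert np.allclose(cumsum_grad(grad_y), manual)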

The cumulative sum of the elements along a given axis. The first element of the result is the same as the first element of the input.

Args:
x (Variable): Input of cumsum operator, the Tensor/LoDTensor needed to be cumsumed.
Contributor: Variable -> Tensor

Author: Done.

name(str, optional): Normally there is no need for user to set this property. For more information, please refer to :ref:`api_guide_Name`. The default value is None.

Returns:
Variable(Tensor/LoDTensor): The result of cumsum operator, output of cumsum operator.
Contributor: Variable -> Tensor

Contributor: Tensor, the result of cumsum operator, output of cumsum operator.

Author: Done.

@LutaoChu (Contributor, Author) commented on Aug 7, 2020

> Could you post the performance-comparison code? Thanks!

Sure. You need to run two cases, backward=True and backward=False; the backward time is then derived by subtracting the forward-only time from the backward-case time.

import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler


def test_speed(num_epochs=5, axis=0, backward=False):
    num_class = 3
    probas_data = np.random.random((2, 3, 1000, 1000)).astype(np.float32)
    labels_data = np.random.randint(0, num_class, [1, 1], dtype='int32')

    probas_shape = [3, 1000, 1000]  # per-sample shape; the batch dim is implicit
    labels_shape = list(labels_data.shape)
    probas = fluid.layers.data(name='p', shape=probas_shape, dtype='float32')
    labels = fluid.layers.data(name='l', shape=labels_shape, dtype='int32')

    param_attr = fluid.ParamAttr(
        name='conv2d.weight',
        initializer=fluid.initializer.ConstantInitializer(value=2.0))
    y_predict = fluid.layers.conv2d(
        input=probas, num_filters=num_class, filter_size=2, param_attr=param_attr)
    y_predict = paddle.reshape(y_predict, shape=[-1, 1])
    y_predict = paddle.cumsum(y_predict, axis=axis)  # the op under test

    labels = fluid.layers.cast(labels, dtype='float32')
    cost = fluid.layers.square_error_cost(input=y_predict, label=labels)
    loss = fluid.layers.mean(cost)

    if backward:
        sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.1)
        sgd_optimizer.minimize(loss)

    use_cuda = True
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    main_program = fluid.default_main_program()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    # Warm-up runs, excluded from profiling.
    for i in range(5):
        exe.run(main_program,
                fetch_list=[],
                feed={'p': probas_data, 'l': labels_data},
                return_numpy=True)

    with profiler.profiler('GPU', 'total', '/tmp/profile') as prof:
        for i in range(num_epochs):
            exe.run(main_program,
                    fetch_list=[],
                    feed={'p': probas_data, 'l': labels_data},
                    return_numpy=True)


# test_speed(num_epochs=300)  # forward only
test_speed(num_epochs=300, backward=True)
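To turn the two profiler totals into the backward-only figure reported in the table above, subtract the forward-only run from the backward-enabled run (the values here are placeholders, not measurements):

t_forward_only = 0.0    # ms/iter from test_speed(..., backward=False) -- placeholder
t_with_backward = 0.0   # ms/iter from test_speed(..., backward=True)  -- placeholder
t_backward = t_with_backward - t_forward_only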

@LutaoChu force-pushed the cumsum_fix branch 2 times, most recently from 9068bc7 to 340c592, on August 7, 2020 at 05:33
@@ -1543,3 +1542,73 @@ def kron(x, y, name=None):
    out = helper.create_variable_for_type_inference(dtype=x.dtype)
    helper.append_op(type="kron", inputs={"X": x, "Y": y}, outputs={"Out": out})
    return out


def cumsum(x, axis=None, dtype=None, name=None):
Contributor: Add the @deprecated decorator to the fluid.layers.cumsum function to point out its relationship to the new API.

Author: The @deprecated decorator has been added in a separate PR, #26104.
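For context, such a decorator usually wraps the old entry point and warns callers toward the replacement — a generic, hypothetical sketch (not the actual Paddle helper, whose name and signature may differ):

import functools
import warnings

def deprecated(update_to):
    """Hypothetical deprecation decorator, for illustration only."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn("{} is deprecated, use {} instead".format(
                fn.__name__, update_to), DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@deprecated(update_to="paddle.cumsum")
def legacy_cumsum(x, axis=-1):  # stands in for fluid.layers.cumsum
    ...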

TCChenlong (Contributor) left a comment:
LGTM

def cumsum(x, axis=None, dtype=None, name=None):
"""
:alias_main: paddle.cumsum
:alias: paddle.cumsum,paddle.tensor.cumsum,paddle.tensor.math.cumsum
Contributor: The aliases don't need to be written at all; these two lines can be deleted.
