
Refine the gradient calculation errors caused by renaming in while_grad #27814

Merged 1 commit into PaddlePaddle:develop on Oct 12, 2020

Conversation

gfwm2013 (Contributor)

PR types

Bug fixes

PR changes

OPs

Describe

Fix a bug in WhileOp's backward computation that produced incorrect gradients.
Problem 1. The current backward computation of WhileOp computes incorrect gradients for variables that are both an input and an output of the op, as in the following code:

import paddle.fluid.layers as layers
import paddle.fluid as fluid
import numpy as np

x = fluid.data(name='x', shape=[1], dtype='float32')
x.stop_gradient = False
i = fluid.data(name='i', shape=[1], dtype='float32')
i.stop_gradient = False
feed_x = np.ones(1).astype('float32')
feed_i = np.ones(1).astype('float32')

def cond(i, x):
    return i < 3

def body(i, x):
    x = x + i            # loop-carried update: x is both an input and an output
    layers.increment(i)  # increments i in place
    return i, x

out = layers.while_loop(cond, body, [i, x])
mean = fluid.layers.mean(x)
fluid.backward.append_backward(mean)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

res = exe.run(fluid.default_main_program(), feed={'x': feed_x, 'i': feed_i}, fetch_list=[x.grad_name, i.grad_name, x, i])

Under the previous logic, the final fetched value of i.grad_name was 1 and that of x.grad_name was 0, which is wrong.
After this PR, the fetched value of i.grad_name is 2 and that of x.grad_name is 1, which is correct.
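
As a sanity check (illustrative only, not part of this PR), the loop can be unrolled by hand: x_final = x0 + i0 + (i0 + 1), so d(x_final)/d(i0) = 2 and d(x_final)/d(x0) = 1. A finite-difference estimate in plain Python gives the same numbers:

# Sanity check for Problem 1 (illustrative only, not part of this PR).
def unrolled(i0, x0):
    # Mirror of the while_loop body: runs until i reaches 3.
    i, x = i0, x0
    while i < 3:
        x = x + i
        i = i + 1
    return x  # mean of a 1-element tensor equals the element itself

eps = 1e-6
base = unrolled(1.0, 1.0)
grad_i = (unrolled(1.0 + eps, 1.0) - base) / eps  # ~= 2, matches i.grad_name
grad_x = (unrolled(1.0, 1.0 + eps) - base) / eps  # ~= 1, matches x.grad_name
print(grad_i, grad_x)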

Problem 2. When the backward computation uses a variable from the forward computation, the current WhileOp always uses the value from the last forward iteration:

import paddle.fluid.layers as layers
import paddle.fluid as fluid
import numpy as np

x = fluid.data(name='x', shape=[1], dtype='float32')
x.stop_gradient = False
i = fluid.data(name='i', shape=[1], dtype='float32')
i.stop_gradient = False
feed_x = np.ones(1).astype('float32')
feed_i = np.ones(1).astype('float32')

def cond(i, x):
    return i < 3

def body(i, x):
    x = x * i            # uses the per-iteration forward value of i
    layers.increment(i)  # increments i in place
    return i, x

out = layers.while_loop(cond, body, [i, x])
mean = fluid.layers.mean(x)
fluid.backward.append_backward(mean)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

res = exe.run(fluid.default_main_program(), feed={'x': feed_x, 'i': feed_i}, fetch_list=[x.grad_name, i.grad_name, x, i])

Under the previous logic, the final fetched value of i.grad_name was 6 and that of x.grad_name was 0, which is wrong.
After this PR, the fetched value of i.grad_name is 3 and that of x.grad_name is 2, which is correct.
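
Again as an illustrative sanity check, unrolling gives x_final = x0 * i0 * (i0 + 1), so d(x_final)/d(i0) = x0 * (2*i0 + 1) = 3 and d(x_final)/d(x0) = i0 * (i0 + 1) = 2, which a finite-difference estimate confirms:

# Sanity check for Problem 2 (illustrative only, not part of this PR).
def unrolled(i0, x0):
    # Mirror of the second while_loop body.
    i, x = i0, x0
    while i < 3:
        x = x * i
        i = i + 1
    return x

eps = 1e-6
base = unrolled(1.0, 1.0)
grad_i = (unrolled(1.0 + eps, 1.0) - base) / eps  # ~= 3, matches i.grad_name
grad_x = (unrolled(1.0, 1.0 + eps) - base) / eps  # ~= 2, matches x.grad_name
print(grad_i, grad_x)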

Fix approach:
Problem 1

  • Analysis: when a variable is both an Output and an Input of WhileOp, the current while_grad logic accumulates gradients over inputs and params during the backward pass; mathematically, however, the gradient of an Output variable must not be accumulated.
  • Solution: following the rules of gradient computation, the output's gradient should not be summed across loop iterations but propagated multiplicatively (by the chain rule); output variables are therefore excluded from the sum step of the gradient computation to keep it correct. See the sketch after this list.
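
A conceptual sketch of this idea in plain Python (hypothetical names; the real while_grad lives in Paddle's C++ core, and this is not its actual code):

# Sketch only: gradients of loop-carried outputs are propagated, not summed.
def while_grad(n_iters, grad_of_output, step_backward, param_names):
    # step_backward(g) runs one iteration's backward pass and returns the
    # propagated gradient of the loop-carried variable plus per-step
    # gradients of external params (hypothetical helper).
    param_grads = {name: 0.0 for name in param_names}
    g = grad_of_output
    for _ in range(n_iters):           # walk the loop in reverse
        g, step_grads = step_backward(g)
        for name in param_names:       # external inputs/params: accumulate
            param_grads[name] += step_grads[name]
        # g is overwritten, not summed: the gradient of a variable that is
        # both input and output chains through iterations instead of
        # being accumulated.
    return g, param_grads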

Problem 2:

  • Analysis: when the backward pass needs a variable from the forward pass, the current logic keeps input variables only in the main scope during the forward computation, never in the per-iteration sub_scope. The backward pass can therefore only read the single copy in the main scope, whose value is the one left by the last forward iteration, so every backward iteration sees the same value. This makes the gradient computation wrong.
  • Solution: the root cause is that only a single copy of each variable is kept. To avoid this, the input variables' values are now saved into the corresponding sub_scope before each forward iteration starts. Because in_place ops behave differently from non-in_place ops here, variables involved in in_place ops are deliberately excluded before saving, to keep the gradient computation correct. See the sketch after this list.
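
A conceptual sketch of this part of the fix (hypothetical names; the actual change is in WhileOp's C++ implementation):

# Sketch only: snapshot inputs into each iteration's sub_scope.
def run_while_forward(cond, body, scope, input_names, inplace_names):
    step_scopes = []                       # one snapshot per iteration
    while cond(scope):
        sub_scope = {}
        for name in input_names:
            if name not in inplace_names:  # skip in_place-modified vars
                sub_scope[name] = scope[name]  # save pre-iteration value
        step_scopes.append(sub_scope)
        body(scope)                        # forward step mutates the scope
    # The backward pass can now read each iteration's values from its own
    # sub_scope instead of only the last value left in the main scope.
    return step_scopes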

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@zhhsplendid (Member) left a comment


LGTM

gfwm2013 merged commit 2e1bca9 into PaddlePaddle:develop on Oct 12, 2020
gfwm2013 deleted the refine_while_op_error branch on October 12, 2020 at 11:59
chen-zhiyu pushed a commit to chen-zhiyu/Paddle that referenced this pull request Oct 15, 2020