Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1 #17872

zixuanweeei · 2020-03-19T07:41:44Z

Description

Patch for issue #17818. The RNN operator produces zero gradients for the bias when num_layers > 1. The cause is a mistake in computing the offset of the bias pointer: we advanced it by the size of the fused bias (i2h_bias + h2h_bias combined into one vector), but MXNet stores i2h_bias and h2h_bias separately, so each layer's bias takes twice the fused size.
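To make the offset arithmetic concrete, here is a minimal sketch in plain Python. It assumes a unidirectional network whose flat bias buffer stores, for each layer, an i2h_bias block followed by an h2h_bias block, each of num_gates * hidden_size elements. The function name and its parameters are hypothetical and are not taken from the MXNet source.

```python
def bias_layer_offsets(num_layers, hidden_size, num_gates, separate_biases=True):
    """Return the start offset of each layer's bias block in a flat buffer.

    Assumed layout (illustrative only): per layer, i2h_bias then h2h_bias,
    each holding num_gates * hidden_size elements.
    """
    single_bias = num_gates * hidden_size
    # With separate i2h/h2h biases, a layer occupies two blocks.
    # Striding by only one block reproduces the mistake described above.
    stride = 2 * single_bias if separate_biases else single_bias
    return [layer * stride for layer in range(num_layers)]


# LSTM-like example: 4 gates, hidden size 4, 2 layers.
correct = bias_layer_offsets(2, 4, 4, separate_biases=True)
buggy = bias_layer_offsets(2, 4, 4, separate_biases=False)
print(correct, buggy)
```

Under these assumptions the two strides agree for layer 0 and differ from layer 1 onward, which matches the bug appearing only when num_layers > 1: with the smaller stride, layer 1's offset falls inside layer 0's block.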

Checklist

Changes

Use the correct offset for the bias pointer.
Change how parameter values are shared between the fused RNN layer and the stacked one.
Add a check for the RNN output states.
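As an illustration of the kind of regression check that would catch this bug, the sketch below flags any bias whose gradient is entirely zero. It is plain Python operating on lists of floats; the helper name and the parameter names in the example (such as l1_i2h_bias) are hypothetical and do not reflect the actual test code in this PR.

```python
def find_all_zero_gradients(bias_grads, tol=0.0):
    """Return the names of biases whose gradient is zero everywhere.

    bias_grads maps a parameter name to a flat list of gradient values.
    """
    return sorted(
        name
        for name, grad in bias_grads.items()
        if all(abs(value) <= tol for value in grad)
    )


# Hypothetical gradients for a 2-layer network: layer 1 shows the symptom.
example_grads = {
    "l0_i2h_bias": [0.1, -0.3, 0.2],
    "l0_h2h_bias": [0.1, -0.3, 0.2],
    "l1_i2h_bias": [0.0, 0.0, 0.0],
    "l1_h2h_bias": [0.0, 0.0, 0.0],
}
suspicious = find_all_zero_gradients(example_grads)
print(suspicious)
```

A test could assert that this list is empty after a backward pass through a multi-layer network.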

@ciyongch @pengzhao-intel @TaoLv

ciyongch · 2020-03-25T02:08:05Z

There are still some failures in the fusedlstm tests. Please take a look.

stu1130 · 2020-04-07T17:18:42Z

@zixuanweeei Thanks for your contribution, could you also cherry-pick the commit to 1.7? DJL LSTM model depends on this commit. Thanks!

pengzhao-intel · 2020-04-08T00:53:22Z

> @zixuanweeei Thanks for your contribution, could you also cherry-pick the commit to 1.7? DJL LSTM model depends on this commit. Thanks!

Sure, please add this requirement to the 1.7 roadmap, #16864.

zixuanweeei · 2020-04-11T07:28:38Z

CI passed. Please review. Thanks. @ciyongch @TaoLv @pengzhao-intel

We will also backport this patch to the v1.7 branch, @stu1130.

zixuanweeei requested a review from szha as a code owner March 19, 2020 07:41

pengzhao-intel added the MKLDNN label Apr 8, 2020

pengzhao-intel added this to In progress in CPU Performance and Quantization via automation Apr 8, 2020

zixuanweeei mentioned this pull request Apr 8, 2020

[MKLDNN] Support projection feature of LSTM #17996

Merged

6 tasks

stu1130 mentioned this pull request Apr 8, 2020

[Discussion] 1.7.0 Roadmap #16864

Open

zixuanweeei added 4 commits April 11, 2020 12:16

Fix issue of zeros gradients w.r.t. RNN bias when num_layers > 1

97722d6

* Use nd.copy() to initialize parameters of new operator
* Add check for output states

Initialize i2h/h2h_weights with zeros for rnn_relu/tanh, and reduce size

ea01d3f

Split fused rnn layer test into tests of individual mode

fc64b5b

Skip lstm and gru tests on CPU context without DNNL

40c57f3

zixuanweeei force-pushed the rnn/gradients branch from b34e7a5 to 40c57f3 Compare April 11, 2020 04:26

TaoLv approved these changes Apr 11, 2020

CPU Performance and Quantization automation moved this from In progress to Reviewer approved Apr 11, 2020

pengzhao-intel merged commit 7dd7e7e into apache:master Apr 12, 2020

CPU Performance and Quantization automation moved this from Reviewer approved to Done Apr 12, 2020

zixuanweeei mentioned this pull request Apr 13, 2020

[v1.x] Backport #17702 and #17872 to v1.x branch #18038

Merged

stu1130 mentioned this pull request Apr 15, 2020

[v1.7] Backport #17702 and #17872 to v1.7 branch (#18038) #18070

Closed

7 tasks
