
A temporary guide to stabilize models' training #2262

Closed
kuke opened this issue May 25, 2017 · 2 comments


kuke commented May 25, 2017

Currently, the convergence of many model configurations under the v2 APIs has not been validated. One major challenge is that the training process is often interrupted at the very beginning by a fatal error related to a floating-point exception. We have concluded that this type of error results from overflow caused by gradient explosion during backpropagation.

A threshold for clipping the gradients of parameters can be enabled to suppress gradient explosion. But gradient clipping alone is often not enough to prevent the floating-point exception. A necessary and more effective measure is to clip the error at crucial positions with a proper threshold. Here error means the gradient of the cost function with respect to the output of each layer, which propagates backward following the chain rule. In PaddlePaddle, the threshold for error clipping can be set layer by layer via the layer_attr argument:

layer_attr=paddle.attr.ExtraAttr(error_clipping_threshold=10.0)
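For instance, here is a minimal sketch (v2 API) of attaching this threshold to an ordinary fully connected layer; the layer, its size, and the upstream prev_layer are illustrative, not part of the original demo:

import paddle.v2 as paddle

# Any v2 layer that accepts layer_attr can carry an error-clipping
# threshold; the size and activation below are illustrative.
hidden = paddle.layer.fc(
    input=prev_layer,  # assumed upstream layer
    size=512,
    act=paddle.activation.Tanh(),
    layer_attr=paddle.attr.ExtraAttr(error_clipping_threshold=10.0))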

As long as the error is clipped after the layers that are sensitive to numerical instability, the floating-point exception can be avoided as expected. Take the seq2seq demo adapted from the machine translation chapter of PaddleBook as an example:

  • Enable logging of error clipping by setting log_error_clipping=True and log_clipping=True in paddle.init();
  • Set the threshold for error clipping in the decoder's input mixed layer, right after simple_attention(), which is numerically sensitive because of its softmax computation (see the sketch after this list).
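A sketch of both steps, adapted from the PaddleBook machine translation demo; enc_vec, enc_proj, decoder_mem, current_word, and decoder_size are assumed to be defined as in that demo:

import paddle.v2 as paddle

# Step 1: enable logging of clipping events.
paddle.init(use_gpu=False, trainer_count=1,
            log_error_clipping=True, log_clipping=True)

# Step 2: clip error in the decoder's input mixed layer, right after
# the attention computation, which is sensitive due to its softmax.
context = paddle.networks.simple_attention(
    encoded_sequence=enc_vec,
    encoded_proj=enc_proj,
    decoder_state=decoder_mem)

decoder_inputs = paddle.layer.mixed(
    size=decoder_size * 3,
    input=[paddle.layer.full_matrix_projection(input=context),
           paddle.layer.full_matrix_projection(input=current_word)],
    layer_attr=paddle.attr.ExtraAttr(error_clipping_threshold=10.0))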

Then start training the model. At first, a lot of error-clipping log messages appear:

Pass 0, Batch 50, Cost 237.458789, {'classification_error_evaluator': 0.9572649598121643}
I0525 00:46:05.145277 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=150.321 avg error=6.00274
I0525 00:46:05.186667 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=1453.23 avg error=57.54
I0525 00:46:05.237247 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=3381.55 avg error=143.592
I0525 00:46:05.304813 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=5546.9 avg error=235.026
I0525 00:46:05.364843 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=7751.46 avg error=330.331
I0525 00:46:05.423358 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=9669.27 avg error=351.612
I0525 00:46:05.490279 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=11335.1 avg error=474.481
I0525 00:46:05.529522 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=12848.3 avg error=594.606
I0525 00:46:05.577287 2043461632 Layer.cpp:363]  layer=input_recurrent@decoder_group need clipping, max error=14418.5 avg error=709.603

Take it easy

Usually after several hundred batches, training becomes stable. As the error gets smaller, the threshold is rarely triggered again.

Pass 0, Batch 340, Cost 163.044409, {'classification_error_evaluator': 0.9482758641242981}
.........
Pass 0, Batch 350, Cost 146.991077, {'classification_error_evaluator': 0.9910714030265808}
.........
Pass 0, Batch 360, Cost 167.668896, {'classification_error_evaluator': 0.93388432264328}
.........
Pass 0, Batch 370, Cost 180.562292, {'classification_error_evaluator': 0.9770992398262024}
.........
Pass 0, Batch 380, Cost 211.419653, {'classification_error_evaluator': 0.9756097793579102}
.........
Pass 0, Batch 390, Cost 155.465637, {'classification_error_evaluator': 0.9576271176338196}
.........
Pass 0, Batch 400, Cost 157.087720, {'classification_error_evaluator': 0.9473684430122375}

Otherwise, a larger threshold, or other hyper-parameters with more suitable values, may be needed.
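For example, the parameter-gradient clipping mentioned at the beginning can also be tuned through the optimizer. A hedged sketch, assuming the v2 optimizer accepts gradient_clipping_threshold as in other Paddle demos; the values are illustrative:

import paddle.v2 as paddle

# Clip parameter gradients globally; tune the threshold per model.
optimizer = paddle.optimizer.Adam(
    learning_rate=5e-4,
    gradient_clipping_threshold=25.0)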

Enjoy!


kuke commented Sep 29, 2017

This guide needs to be re-edited to keep up with the evolution of Paddle.

@lcy-seso
Contributor

The content related to this issue has been added to the FAQ, so this issue is now closed.
