error while training #5

Closed
stexandev opened this issue Nov 13, 2014 · 16 comments

Comments

@stexandev

After executing the following command (on 156 files of ground-truth text and images):
ocropus-rtrain gt/????/*.png -F 10000 -o mub_combined &
I get the following reproducible error:

454 150.32 (1486, 48) gt/0001/01000b.bin.png
TRU: u'quod dicitur Fulda, quod est situm in pago Grapfeld, constructum in honore sancti'
ALN: u'quuod dicituur Fuulda, qquod et situumm in pagoo Grapfeld, construuctuuumm in honnore '
OUT: u' iiii ii te ti imm tm e iii eutmut m mi eii '

oops, got FloatingPointError overflow encountered in exp

Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 228, in <module>
    pcs = network.trainSequence(line,cs,update=do_update,key=fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 863, in trainSequence
    self.outputs = array(self.lstm.forward(xs))
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 587, in forward
    xs = net.forward(xs)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 636, in forward
    outputs = [net.forward(xs) for net in self.nets]
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 545, in forward
    self.WIP,self.WFP,self.WOP)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 419, in forward_py
    go[t] = ffunc(gox[t])
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 367, in ffunc
    return 1.0/(1.0+exp(-x))
FloatingPointError: overflow encountered in exp
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 232, in <module>
    network = ocrolib.load_object(last_save)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 502, in load_object
    fname = ocropus_find_file(fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 680, in ocropus_find_file
    if os.path.exists(fname):
  File "/usr/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

Another case, with half of the files (directory 0001 only):

960 110.63 (1490, 48) gt/0001/010022.bin.png
TRU: u'in honorem\u2074 domini salvatoris Jesu Christi et beate Marie genetricis\u2075 eius episco-'
ALN: u'in honorem~ domini salvatoris Jesu Christi et beate MMarie genetricis eius episco-'
OUT: u'iu bouoreu ouiui salvatoris lesu bristi et beate arie geuetricis eius episoo-'

oops, got FloatingPointError overflow encountered in exp

Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 228, in <module>
    pcs = network.trainSequence(line,cs,update=do_update,key=fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 863, in trainSequence
    self.outputs = array(self.lstm.forward(xs))
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 587, in forward
    xs = net.forward(xs)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 636, in forward
    outputs = [net.forward(xs) for net in self.nets]
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 619, in forward
    return self.net.forward(xs[::-1])[::-1]
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 545, in forward
    self.WIP,self.WFP,self.WOP)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 419, in forward_py
    go[t] = ffunc(gox[t])
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 367, in ffunc
    return 1.0/(1.0+exp(-x))
FloatingPointError: overflow encountered in exp
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 232, in <module>
    network = ocrolib.load_object(last_save)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 502, in load_object
    fname = ocropus_find_file(fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 680, in ocropus_find_file
    if os.path.exists(fname):
  File "/usr/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

@tmbdev
Collaborator

tmbdev commented Nov 24, 2014

I haven't seen that happen in the past. It's fairly easy to fix by clipping the value, but I'm concerned that your weights are getting large enough to trigger this in the first place. What learning rate and number of hidden units are you using?

@stexandev
Author

I am sorry, but I can't answer your question about learning rates and hidden units: I didn't change anything in the source code (except for extending the codec) and ran the command as stated above.

@tmbdev
Collaborator

tmbdev commented Nov 26, 2014

If you changed the codec, how many output classes are there?

@stexandev
Author

I just added a class of superdigits = u"⁰¹²³⁴⁵⁶⁷⁸⁹" and attached it to the default codec.

@tmbdev
Collaborator

tmbdev commented Dec 3, 2014

OK, thanks. I'll keep the bug open and will incorporate a workaround (or you can send me a patch; basically, to avoid the overflow just clip the argument to the exp to some reasonable range and test it).
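
A minimal sketch of the clipping idea described above, assuming NumPy. The unclipped expression is the one shown for ffunc in the traceback; the helper names and the `limit` parameter are illustrative and not taken from ocrolib. The FloatingPointError in the report suggests NumPy was set to raise on overflow, so the sketch reproduces that setting locally with `np.errstate`.

```python
import numpy as np

def sigmoid_unclipped(x):
    # Same expression as ffunc in the traceback: overflows for very negative x.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_clipped(x, limit=20.0):
    # `limit` is an illustrative bound, not a value read from ocrolib.
    # Clipping the argument keeps exp() within the representable range.
    return 1.0 / (1.0 + np.exp(np.clip(-x, -limit, limit)))

def overflows(func, x):
    """Return True if func(x) raises FloatingPointError when overflow is an error."""
    with np.errstate(over="raise"):
        try:
            func(x)
        except FloatingPointError:
            return True
    return False

x = np.array([-1000.0, 0.0, 1000.0])
print(overflows(sigmoid_unclipped, x))  # unclipped version hits exp(1000)
print(overflows(sigmoid_clipped, x))    # clipped version stays finite
print(sigmoid_clipped(x))
```

Clipping only hides the symptom: if the gate inputs reach such magnitudes, the weights themselves may be diverging, which is the concern raised earlier in the thread.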

@danvk
Contributor

danvk commented Jan 5, 2015

I'm also running into this error. I started to see it increasingly often as I let my model train. The quality of its outputs started decreasing around the same time. My assumption is that the model started diverging, causing both of these problems. I have sample data, command lines and model files if they would be helpful.

@hzhangwd

hzhangwd commented May 8, 2015

Also running into this issue. I think it has something to do with the exploding/vanishing gradient behavior of RNNs, which can occur even with LSTM.

@tmbdev
Collaborator

tmbdev commented May 12, 2015

I've run a lot of benchmarks now, and generally, the gradients don't explode haphazardly. They explode at high learning rates, but lowering the learning rate reliably makes things work.
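
The dependence on the learning rate can be illustrated with a toy example. This is a sketch on a one-dimensional quadratic, not on the ocropy LSTM; the function name and the chosen rates are purely illustrative.

```python
def gradient_descent(lrate, steps=50, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w          # derivative of w**2
        w = w - lrate * grad    # each step multiplies w by (1 - 2*lrate)
    return abs(w)

print(gradient_descent(lrate=0.1))  # |1 - 0.2| < 1, so w shrinks toward 0
print(gradient_descent(lrate=1.1))  # |1 - 2.2| > 1, so w grows every step
```

In this toy problem the iterates blow up as soon as the step factor exceeds 1 in magnitude, and lowering the rate restores convergence, which mirrors the behavior described above.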

QuLogic pushed a commit to QuLogic/ocropy that referenced this issue Jun 6, 2015
@slimanef

I used ocropus-rtrain to train OCRopus on handwritten historical word images. It gave the following error after 13605 iterations.
ocropus-rtrain -o /home/p/models/test /home/p/WordImages/*.jpg

Could you help me with this error?

Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 270, in <module>
    line = network.lnorm.normalize(line,cval=amax(line))
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lineest.py", line 59, in normalize
    dewarped = self.dewarp(img,cval=cval,dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lineest.py", line 56, in dewarp
    dewarped = array(dewarped,dtype=dtype).T
ValueError: setting an array element with a sequence.

I also get this error:

/usr/lib/python2.7/dist-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
  warnings.warn("Mean of empty slice.", RuntimeWarning)
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 269, in <module>
    network.lnorm.measure(amax(line)-line)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lineest.py", line 43, in measure
    self.mad = mean(deltas[line!=0])
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 2716, in mean
    out=out, keepdims=keepdims)
  File "/usr/lib/python2.7/dist-packages/numpy/core/_methods.py", line 67, in _mean
    ret = ret.dtype.type(ret / rcount)
FloatingPointError: invalid value encountered in double_scalars

Thank you

@anupamaray

anupamaray commented May 5, 2016

I am using ocropus-rtrain and get this error right at the start. I am running: ocropus-rtrain -o model words/*.bin.png
The error is:

got FloatingPointError divide by zero encountered in double_scalars
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 286, in <module>
    pcs = network.trainSequence(line,cs,update=do_update,key=fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 902, in trainSequence
    self.error_log.append(self.error**.5/len(cs))
FloatingPointError: divide by zero encountered in double_scalars
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 290, in <module>
    network = load_lstm(last_save)
  File "/usr/local/bin/ocropus-rtrain", line 176, in load_lstm
    network = ocrolib.load_object(last_save)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 503, in load_object
    fname = ocropus_find_file(fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 682, in ocropus_find_file
    if os.path.exists(fname):
  File "/usr/lib/python2.7/genericpath.py", line 18, in exists
    os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

Can someone help me with this error, please?
Thank you

@kba
Collaborator

kba commented May 5, 2016

@anupamaray Can you open a new issue for this one? Please give your operating system and put the backtrace in a fenced code block, so it's more readable.

os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found

Are you sure that the pictures are in word/*.bin.png? What does ls word/*.bin.png return?
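
The same check can be scripted. The sketch below lists the images matched by a glob pattern and flags those whose ground-truth file is missing or empty. It assumes that the ground truth for `NAME.bin.png` lives next to it as `NAME.gt.txt`; that naming convention, the function name and the `gt_suffix` parameter are assumptions made for illustration, so adjust them to your layout.

```python
import glob
import os

def check_training_files(pattern, gt_suffix=".gt.txt"):
    """Return (files, problems) for the images matched by a glob pattern.

    problems lists (image, reason) pairs for images whose ground-truth
    file is missing or empty. The gt_suffix convention is an assumption.
    """
    files = sorted(glob.glob(pattern))
    problems = []
    for fname in files:
        # Drop everything after the first dot of the file name:
        # "010022.bin.png" -> "010022".
        directory, name = os.path.split(fname)
        base = os.path.join(directory, name.split(".")[0])
        gt = base + gt_suffix
        if not os.path.isfile(gt):
            problems.append((fname, "missing ground truth"))
            continue
        with open(gt, "rb") as stream:
            if not stream.read().strip():
                problems.append((fname, "empty ground truth"))
    return files, problems

files, problems = check_training_files("words/*.bin.png")
print(len(files), "images matched")
for fname, reason in problems:
    print(fname, reason)
```

If no images are matched, the shell pattern passed to ocropus-rtrain is the first thing to check.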

@ChillarAnand
Contributor

Any updates on this issue? I am using all the default values except for this code:

 ఁంఃఅఆఇఈఉఊఋఌఎఏఐఒఓఔకఖగఘఙచఛజఝఞటఠడఢణతథదధనపఫబభమయరఱలళవశషసహఽాిుూృౄెేౘౙౠౡౢౣ౦౧౨౩౪౫౬౭౮౯

For the first 3K samples, everything went fine. After that, 4 out of 10 samples fail with a FloatingPointError.

@maluz

maluz commented Apr 17, 2017

I think I have the same issue. Can someone tell me what I should do to avoid this kind of error? I'm only just starting out with ocropy and ocrocis. Thank you!

# oops, got FloatingPointError overflow encountered in exp
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 289, in <module>
    pcs = network.trainSequence(line,cs,update=do_update,key=fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 900, in trainSequence
    self.outputs = array(self.lstm.forward(xs))
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 612, in forward
    xs = net.forward(xs)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 668, in forward
    outputs = [net.forward(xs) for net in self.nets]
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 566, in forward
    self.WIP,self.WFP,self.WOP)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 435, in forward_py
    go[t] = ffunc(gox[t])
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/lstm.py", line 383, in ffunc
    return 1.0/(1.0+exp(-x))
FloatingPointError: overflow encountered in exp
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-rtrain", line 293, in <module>
    network = load_lstm(last_save)
  File "/usr/local/bin/ocropus-rtrain", line 179, in load_lstm
    network = ocrolib.load_object(last_save)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 436, in load_object
    fname = ocropus_find_file(fname)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 639, in ocropus_find_file
    full = os.path.join(prefix, basename, fname)
  File "/usr/lib/python2.7/posixpath.py", line 75, in join
    if b.startswith('/'):
AttributeError: 'NoneType' object has no attribute 'startswith'
[OCROCIS] [ERROR] Ocropus command failed: ocropus-rtrain --ntrain 30000 --savefreq 1000 --codec ./book/charset.txt --output ./iterations/01/models/model ./training/*/*.bin.png 2>&1

zuphilip added a commit that referenced this issue Apr 18, 2017
This should avoid (hopefully) some possible FloatingPointError overflow errors.

For any x < -20 or x > 20, the sigmoid function ffunc already equals 0 or 1, respectively, to within about 2*10^-9,
so clipping the argument will not change the function substantially.

This idea is from @tmbdev in #5 (comment)
Implemented first in #49 (comment)
Additional infos from #79 (comment)
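
The size of the change introduced by clipping can be checked numerically. This is a sketch assuming NumPy; the function names and the sampled range are illustrative, and the bound of 20 is the one quoted in the commit message.

```python
import numpy as np

def gate(x):
    # Plain logistic function, as in the original ffunc expression.
    return 1.0 / (1.0 + np.exp(-x))

def gate_clipped(x, limit=20.0):
    # Same function with the argument of exp() clipped to [-limit, limit].
    return 1.0 / (1.0 + np.exp(np.clip(-x, -limit, limit)))

# Sample a range wide enough to cover the clipped region but small
# enough that the plain version does not overflow.
xs = np.linspace(-100.0, 100.0, 20001)
max_diff = np.max(np.abs(gate(xs) - gate_clipped(xs)))
print(max_diff)
```

The largest deviation occurs in the saturated tails and equals 1/(1 + e^20), which is on the order of 10^-9.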
@zuphilip
Collaborator

I tried to implement a possible fix in #201 for ocropy. Can someone check this out?

I don't have any details for ocrocis but maybe @uvius can help there.

@kba
Collaborator

kba commented Dec 11, 2017

Since #201 was merged, can we close this?

@zuphilip
Collaborator

Yes, closing this issue, which was resolved by #201.

If you encounter new problems, then please open a new issue.
