
LSTM_Captioning using minpy: memory issue #91

Closed

zhouxu179 opened this issue Dec 3, 2016 · 3 comments

Comments

@zhouxu179

I implemented an LSTM captioning model, which can be found at https://github.com/zx0502/cs231-homework-with-MinPy/blob/master/LSTM_Captioning_minpy_ans.ipynb, based on the code in https://github.com/lryta/cs231n-winter2016-sol-minpy/tree/master/assignment3.
It appears that the minpy version is very memory hungry, taking over 7GB (only 8GB is available on my PC). In contrast, the numpy version uses less than 1.5GB with the same parameters and a batch_size of 512, which is 4 times as large as what I test in the minpy notebook.

A number of warnings appear and the algorithm seems to crash; see below:

(Iteration 561 / 1560) loss: 18.893697
(Iteration 571 / 1560) loss: 19.122100
(Iteration 581 / 1560) loss: 17.164288
(Iteration 591 / 1560) loss: 19.072623
(Iteration 601 / 1560) loss: 17.494023
(Iteration 611 / 1560) loss: 18.056801
(Iteration 621 / 1560) loss: 18.518261
(Iteration 631 / 1560) loss: 17.263882
(Iteration 641 / 1560) loss: 17.908625
(Iteration 651 / 1560) loss: 17.269566
(Iteration 661 / 1560) loss: 17.401004

/usr/local/lib/python2.7/dist-packages/minpy/array_variants/numpy/numpy_core.py:215: RuntimeWarning: overflow encountered in cosh
prims('tanh').def_grad(lambda ans, x: lambda g: g / np.cosh(x)**2)
/usr/local/lib/python2.7/dist-packages/minpy/primitive.py:136: RuntimeWarning: overflow encountered in exp
result_value = self._func(*arg_values, **kwargs_values)
/usr/local/lib/python2.7/dist-packages/minpy/array_variants/numpy/numpy_core.py:217: RuntimeWarning: invalid value encountered in multiply
prims('exp').def_grad(lambda ans, x: lambda g: ans * g)

(Iteration 671 / 1560) loss: nan
(Iteration 681 / 1560) loss: nan
(Iteration 691 / 1560) loss: nan
(Iteration 701 / 1560) loss: nan
(Iteration 711 / 1560) loss: nan
(Iteration 721 / 1560) loss: nan

@jermainewang
Copy link
Member
Member

The overflow is due to the gradient definition of tanh (g / np.cosh(x) ** 2). It seems that np.cosh(x) overflows for large x, which leads to the numerical problem. Let me see whether I can implement it in a more stable way. Thanks for reporting!
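
One possible, more stable formulation (just a sketch, not necessarily the fix that was actually applied) is to express the gradient through the already-computed output ans = tanh(x), since d/dx tanh(x) = 1 - tanh(x)**2 stays in [0, 1] and never evaluates cosh:

import numpy as np

# Gradient as currently defined: np.cosh(x) overflows to inf for large |x|
# (around |x| > 710 in float64), triggering RuntimeWarnings and collapsing
# the gradient to 0 (or nan once infs propagate further).
naive_tanh_grad = lambda ans, x: lambda g: g / np.cosh(x) ** 2

# Stable alternative: 1 - ans**2 is always in [0, 1], so it cannot overflow.
stable_tanh_grad = lambda ans, x: lambda g: g * (1 - ans ** 2)

x = np.array([0.5, 50.0, 800.0])
ans = np.tanh(x)
g = np.ones_like(x)

print(naive_tanh_grad(ans, x)(g))   # overflow warning, last entry collapses to 0
print(stable_tanh_grad(ans, x)(g))  # finite and warning-free everywhere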

For the memory issue, @hotpxl could you have a look? I remember there was a weak reference problem in the autograd part before.

@sneakerkg
Copy link
Member
Member

A similar issue was detected in the rnn perf test. Please check the code: https://github.com/dmlc/minpy/blob/rnn_perf/examples/nn/rnn_test/rnn_minpy_perf.py (copy the file to the examples/nn folder to run it).

I did some math on how much memory we should be using:
Input: 256, Hidden: 2560, Out: 1, seq_len: 30, batch_size: 100

Weights:
Wx = 256 * 2560 * 4 / 1024 / 1024 = 2.5M
b = 2560 * 4 / 1024 / 1024 = 0.01M
Wh = 2560 * 2560 * 4 / 1024 / 1024 = 25M
hb = 0.01M
Wout = 0.01M
Sum = 27.53M

Activation:
Input: 100 * 256 * 30 * 4 / 1024 / 1024 = 2.9M
Activation: 100 * 2560 * 30 * 4 / 1024 / 1024 = 29.3M
Sum: 32.2M

When involving BP, roughly double the space to hold the error derivatives and the weight gradients, and if we have momentum, add another copy of the weights. The total memory needed should be about:
32.2 * 2 + 27.53 * 3 = 64.4 + 82.6 ≈ 147M
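
For reference, a small Python sketch that just reproduces the arithmetic above (assuming float32, i.e. 4 bytes per element, and the sizes listed in this comment):

# Back-of-the-envelope memory estimate for the rnn perf test.
BYTES = 4                     # float32
MB = 1024.0 * 1024.0

input_dim, hidden, out_dim = 256, 2560, 1
seq_len, batch_size = 30, 100

weights = {
    'Wx': input_dim * hidden,
    'b': hidden,
    'Wh': hidden * hidden,
    'hb': hidden,
    'Wout': hidden * out_dim,
}
weight_mb = sum(weights.values()) * BYTES / MB            # ~27.5 MB

act_mb = (batch_size * input_dim * seq_len                # input activations
          + batch_size * hidden * seq_len) * BYTES / MB   # hidden activations, ~32 MB

# BP roughly doubles activation storage (error derivatives), and weights are
# kept three times (values + gradients + momentum).
total_mb = act_mb * 2 + weight_mb * 3
print('weights %.2f MB, activations %.2f MB, total %.1f MB'
      % (weight_mb, act_mb, total_mb))                    # ~147 MB total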

The minpy example seems to use more than 1500M on my device.

@jermainewang
Copy link
Member
Member

This should have been solved in #112 and #117.
