
LSTM_Captioning using minpy: memory issue #91

Closed

zhouxu179 opened this issue Dec 3, 2016 · 3 comments

Comments

@zhouxu179

I implemented an LSTM captioning model, which can be found at https://github.com/zx0502/cs231-homework-with-MinPy/blob/master/LSTM_Captioning_minpy_ans.ipynb, based on the code in https://github.com/lryta/cs231n-winter2016-sol-minpy/tree/master/assignment3.
It appears that the minpy version is very memory hungry, taking over 7GB (only 8GB is available on my PC). In contrast, the numpy version uses less than 1.5GB with the same parameters and a batch_size of 512, which is 4 times as large as what I test in the minpy notebook.

A number of warnings appear and the algorithm seems to crash; see below:

(Iteration 561 / 1560) loss: 18.893697
(Iteration 571 / 1560) loss: 19.122100
(Iteration 581 / 1560) loss: 17.164288
(Iteration 591 / 1560) loss: 19.072623
(Iteration 601 / 1560) loss: 17.494023
(Iteration 611 / 1560) loss: 18.056801
(Iteration 621 / 1560) loss: 18.518261
(Iteration 631 / 1560) loss: 17.263882
(Iteration 641 / 1560) loss: 17.908625
(Iteration 651 / 1560) loss: 17.269566
(Iteration 661 / 1560) loss: 17.401004

/usr/local/lib/python2.7/dist-packages/minpy/array_variants/numpy/numpy_core.py:215: RuntimeWarning: overflow encountered in cosh
prims('tanh').def_grad(lambda ans, x: lambda g: g / np.cosh(x)**2)
/usr/local/lib/python2.7/dist-packages/minpy/primitive.py:136: RuntimeWarning: overflow encountered in exp
result_value = self._func(*arg_values, **kwargs_values)
/usr/local/lib/python2.7/dist-packages/minpy/array_variants/numpy/numpy_core.py:217: RuntimeWarning: invalid value encountered in multiply
prims('exp').def_grad(lambda ans, x: lambda g: ans * g)

(Iteration 671 / 1560) loss: nan
(Iteration 681 / 1560) loss: nan
(Iteration 691 / 1560) loss: nan
(Iteration 701 / 1560) loss: nan
(Iteration 711 / 1560) loss: nan
(Iteration 721 / 1560) loss: nan

@jermainewang
Copy link
Member
Member

The overflow is due to the gradient definition of tanh (g / np.cosh(x) ** 2). It seems that np.cosh(x) overflows for large x, which leads to the numerical problem. Let me see whether I can implement it in a more stable way. Thanks for reporting!
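
One possible, more stable formulation (just a sketch, not necessarily the fix that was actually applied) is to express the gradient through the already-computed output ans = tanh(x), since d/dx tanh(x) = 1 - tanh(x)**2 stays in [0, 1] and never evaluates cosh:

import numpy as np

# Gradient as currently defined: np.cosh(x) overflows to inf for large |x|
# (around |x| > 710 in float64), triggering RuntimeWarnings and collapsing
# the gradient to 0 (or nan once infs propagate further).
naive_tanh_grad = lambda ans, x: lambda g: g / np.cosh(x) ** 2

# Stable alternative: 1 - ans**2 is always in [0, 1], so it cannot overflow.
stable_tanh_grad = lambda ans, x: lambda g: g * (1 - ans ** 2)

x = np.array([0.5, 50.0, 800.0])
ans = np.tanh(x)
g = np.ones_like(x)

print(naive_tanh_grad(ans, x)(g))   # overflow warning, last entry collapses to 0
print(stable_tanh_grad(ans, x)(g))  # finite and warning-free everywhere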

For the memory issue, @hotpxl could you have a look? I remember there was a weak reference problem in the autograd part before.

@sneakerkg
Copy link
Member
Member

A similar issue was detected in the rnn perf test. Please check the code: https://github.com/dmlc/minpy/blob/rnn_perf/examples/nn/rnn_test/rnn_minpy_perf.py (copy the file to the examples/nn folder to run it).

I did some math on how much memory we should be using:
Input: 256, Hidden: 2560, Out: 1, seq_len: 30, batch_size: 100

Weights:
Wx = 256 * 2560 * 4 / 1024 / 1024 = 2.5M
b = 2560 * 4 / 1024 / 1024 = 0.01M
Wh = 2560 * 2560 * 4 / 1024 / 1024 = 25M
hb = 0.01M
Wout = 0.01M
Sum = 27.53M

Activation:
Input: 100 * 256 * 30 * 4 / 1024 / 1024 = 2.9M
Activation: 100 * 2560 * 30 * 4 / 1024 / 1024 = 29.3M
Sum: 32.2M

When involving BP, roughly double the space to hold the error derivatives and the weight gradients, and if we have momentum, add another copy of the weights. The total memory needed should be about:
32.2 * 2 + 27.53 * 3 = 64.4 + 82.6 ≈ 147M
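
For reference, a small Python sketch that just reproduces the arithmetic above (assuming float32, i.e. 4 bytes per element, and the sizes listed in this comment):

# Back-of-the-envelope memory estimate for the rnn perf test.
BYTES = 4                     # float32
MB = 1024.0 * 1024.0

input_dim, hidden, out_dim = 256, 2560, 1
seq_len, batch_size = 30, 100

weights = {
    'Wx': input_dim * hidden,
    'b': hidden,
    'Wh': hidden * hidden,
    'hb': hidden,
    'Wout': hidden * out_dim,
}
weight_mb = sum(weights.values()) * BYTES / MB            # ~27.5 MB

act_mb = (batch_size * input_dim * seq_len                # input activations
          + batch_size * hidden * seq_len) * BYTES / MB   # hidden activations, ~32 MB

# BP roughly doubles activation storage (error derivatives), and weights are
# kept three times (values + gradients + momentum).
total_mb = act_mb * 2 + weight_mb * 3
print('weights %.2f MB, activations %.2f MB, total %.1f MB'
      % (weight_mb, act_mb, total_mb))                    # ~147 MB total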

The minpy example seems to use more than 1500M on my device.

@jermainewang
Copy link
Member
Member

This should have been solved in #112 and #117.
