Training the transition model is too resource intensive, uses too much memory. Possible bug #27

Open
kamal94 opened this issue Sep 24, 2016 · 4 comments

Comments

kamal94 commented Sep 24, 2016

After training the autoencoder, I tried to train the transition model as described in the same document, using

./server.py --time 60 --batch 64

and

./train_generative_model.py transition --batch 64 --name transition

in two separate tmux sessions.

Within about a minute of running the training command, the process is killed because my memory and swap (16 GB + 10 GB) are used up, and I'm still on epoch one.

Here is a dump:

./train_generative_model.py transition --batch 64 --name transition
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: 
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7085
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.58GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
T.shape:  (64, 14, 512)
Transition variables:
transition/dreamyrnn_1_W:0
transition/dreamyrnn_1_U:0
transition/dreamyrnn_1_b:0
transition/dreamyrnn_1_V:0
transition/dreamyrnn_1_ext_b:0
Epoch 1/200
Killed
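
The T.shape: (64, 14, 512) line shows the transition model consuming batches of 64 sequences of 512-dimensional codes, and the crash is on host RAM and swap rather than GPU memory, so the pressure comes from the data the server process and the training generator keep buffered. One workaround worth trying (the values below are illustrative, not taken from this thread) is to shrink --batch, and possibly --time, on both processes and check whether the first epoch completes:

./server.py --time 30 --batch 32
./train_generative_model.py transition --batch 32 --name transition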
@EderSantana
Contributor

It is super resource intensive, yes. I have seen reports elsewhere that Keras leaks a lot of memory. I used to have a TensorFlow-only implementation that seemed lighter, but it was less convenient, which is why I opted for Keras for the release.

@sunny1986

@kamal94: Were you able to resolve this issue? I am having the same problem; my training sometimes fails on epoch 1/200 or 2/200 and never gets beyond that. Any suggestions?

@zhaohuaqing1993

How did you manage to train the autoencoder with train_generative_model.py successfully? I ran into some difficulty; did you have to change something in the code?
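
For reference, the autoencoder stage is launched the same way as the transition stage, with server.py feeding data in one tmux session and the trainer running in another. The invocation below simply mirrors the transition commands earlier in this thread; the batch value is only illustrative, so check the repo's README for the exact flags:

./server.py --batch 64
./train_generative_model.py autoencoder --batch 64 --name autoencoder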


pandamax commented May 17, 2017

Have you solved this issue? I am having the same problem; my training sometimes fails on epoch 10/200 or 40/200 and never gets beyond that. Any suggestions?
Traceback (most recent call last):
  File "./train_generative_model.py", line 168, in <module>
    nb_epoch=args.epoch, verbose=1, saver=saver
  File "./train_generative_model.py", line 84, in train_model
    z, x = next(generator)
  File "./train_generative_model.py", line 31, in gen
    X = cleanup(tup)
  File "/home/deep-learning/research-master/models/transition.py", line 34, in cleanup
    X = X/127.5 - 1.
MemoryError
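
The MemoryError on X = X/127.5 - 1. is consistent with NumPy's type promotion: dividing a uint8 image batch by a Python float allocates a brand-new float64 array roughly eight times the size of the input, so a batch that fits comfortably as uint8 can exhaust RAM the moment it is normalized. Below is a minimal sketch of the effect and of a lower-memory variant, assuming X is a plain NumPy uint8 array; the shape is only an example, not the repo's actual batch shape.

import numpy as np

X = np.zeros((64, 60, 3, 160, 320), dtype=np.uint8)  # example batch, ~560 MB as uint8

# Pattern from the traceback: promotes to float64, ~8x the uint8 footprint.
# X = X / 127.5 - 1.

# Lower-memory variant: one float32 copy (~4x), then scale in place with no extra temporaries.
Xf = X.astype(np.float32)
Xf /= 127.5
Xf -= 1.0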
