
when test "python synthesize.py --model='WaveNet' --GTA='False'", get OOM when allocating tensor with shape[17,80,1,470250] #235

Closed
sysuzyx opened this issue Oct 8, 2018 · 3 comments


sysuzyx commented Oct 8, 2018

Hi~
When I train WaveNet I don't get an OOM error, but when I test WaveNet with the same hparams I do. The command to reproduce is python synthesize.py --model='WaveNet' --GTA='False'.

Even setting wavenet_batch_size to 2 does not resolve the error.

I'm using a single K80 GPU, and no other programs are running on it. I'm on the code version from 14 days ago, not the newest one.

The full log is below. Can anyone help me?

/data/yxzou/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.

0%| | 0/1 [00:00<?, ?it/s][1][470250]
[2][470250]
[3][470250]
[4][470250]
[5][470250]
[6][470250]
[7][470250]
[8][470250]
[9][470250]
[10][470250]
[11][470250]
[12][470250]
[13][470250]
[14][470250]
[15][470250]
[16][470250]
[17][470250]
[18][470250]
[19][470250]
[20][470250]
[21][470250]
[22][470250]
[23][470250]
[24][470250]
[25][470250]
[26][470250]
[27][470250]
[28][470250]
[29][470250]
[30][470250]
[31][470250]
[32][470250]
loaded model at exp/logs-WaveNet/wave_pretrained/wavenet_model.ckpt-395000
Hyperparameters:
allow_clipping_in_normalization: True
attention_dim: 128
attention_filters: 32
attention_kernel: (31,)
cbhg_conv_channels: 128
cbhg_highway_units: 128
cbhg_highwaynet_layers: 4
cbhg_kernels: 8
cbhg_pool_size: 2
cbhg_projection: 256
cbhg_projection_kernel_size: 3
cbhg_rnn_units: 128
cin_channels: 80
cleaners: english_cleaners
clip_mels_length: True
cross_entropy_pos_weight: 1
cumulative_weights: True
decoder_layers: 2
decoder_lstm_units: 1024
embedding_dim: 512
enc_conv_channels: 512
enc_conv_kernel_size: (5,)
enc_conv_num_layers: 3
encoder_lstm_units: 256
fmax: 7600
fmin: 95
frame_shift_ms: None
freq_axis_kernel_size: 3
gate_channels: 256
gin_channels: -1
griffin_lim_iters: 60
hop_size: 275
input_type: mulaw-quantize
kernel_size: 3
layers: 20
leaky_alpha: 0.4
log_scale_min: -32.23619130191664
log_scale_min_gauss: -7.0
mask_decoder: False
mask_encoder: False
max_abs_value: 4.0
max_iters: 2000
max_mel_frames: 1000
max_time_sec: None
max_time_steps: 11000
min_level_db: -100
n_fft: 2048
n_speakers: 5
natural_eval: False
normalize_for_wavenet: True
num_freq: 1025
num_mels: 80
out_channels: 256
outputs_per_step: 3
postnet_channels: 512
postnet_kernel_size: (5,)
postnet_num_layers: 5
power: 1.5
predict_linear: True
preemphasis: 0.97
preemphasize: True
prenet_layers: [256, 256]
quantize_channels: 256
ref_level_db: 20
rescale: True
rescaling_max: 0.999
residual_channels: 128
sample_rate: 22050
signal_normalization: True
silence_threshold: 2
skip_out_channels: 128
smoothing: False
stacks: 2
stop_at_any: True
symmetric_mels: True
tacotron_adam_beta1: 0.9
tacotron_adam_beta2: 0.999
tacotron_adam_epsilon: 1e-06
tacotron_batch_size: 32
tacotron_clip_gradients: True
tacotron_data_random_state: 1234
tacotron_decay_learning_rate: True
tacotron_decay_rate: 0.4
tacotron_decay_steps: 50000
tacotron_dropout_rate: 0.5
tacotron_final_learning_rate: 1e-05
tacotron_initial_learning_rate: 0.001
tacotron_random_seed: 5339
tacotron_reg_weight: 1e-06
tacotron_scale_regularization: False
tacotron_start_decay: 50000
tacotron_swap_with_cpu: False
tacotron_synthesis_batch_size: 512
tacotron_teacher_forcing_decay_alpha: 0.0
tacotron_teacher_forcing_decay_steps: 280000
tacotron_teacher_forcing_final_ratio: 0.0
tacotron_teacher_forcing_init_ratio: 1.0
tacotron_teacher_forcing_mode: scheduled
tacotron_teacher_forcing_ratio: 1.0
tacotron_teacher_forcing_start_decay: 10000
tacotron_test_batches: 41
tacotron_test_size: None
tacotron_zoneout_rate: 0.1
train_with_GTA: True
trim_fft_size: 512
trim_hop_size: 128
trim_silence: True
trim_top_db: 23
upsample_activation: LeakyRelu
upsample_conditional_features: True
upsample_scales: [5, 5, 11]
upsample_type: 1D
use_bias: True
use_lws: False
use_speaker_embedding: True
wavenet_adam_beta1: 0.9
wavenet_adam_beta2: 0.999
wavenet_adam_epsilon: 1e-08
wavenet_batch_size: 8
wavenet_clip_gradients: False
wavenet_data_random_state: 1234
wavenet_decay_rate: 0.5
wavenet_decay_steps: 300000
wavenet_dropout: 0.05
wavenet_ema_decay: 0.9999
wavenet_init_scale: 1.0
wavenet_learning_rate: 0.0001
wavenet_lr_schedule: exponential
wavenet_random_seed: 5339
wavenet_swap_with_cpu: False
wavenet_synthesis_batch_size: 20
wavenet_test_batches: None
wavenet_test_size: 0.0441
wavenet_warmup: 4000.0
wavenet_weight_normalization: False
win_size: 1100
Constructing model: WaveNet
Initializing Wavenet model. Dimensions (? = dynamic shape):
Train mode: False
Eval mode: False
Synthesis mode: True
local_condition: (?, 80, ?)
outputs: (?, ?)
Receptive Field: (4093 samples / 185.6 ms)
WaveNet Parameters: 3.398 Million.
Loading checkpoint: exp/logs-WaveNet/wave_pretrained/wavenet_model.ckpt-395000
Starting synthesis! (this will take a while..)
Traceback (most recent call last):
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[17,80,1,470250] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 11], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](WaveNet_model/inference/conv_transpose1d_2/stack, WaveNet_model/inference/conv_transpose1d_2/kernel/read, WaveNet_model/inference/upsample_leaky_relu_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: WaveNet_model/inference/strided_slice_3/_373 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_978_WaveNet_model/inference/strided_slice_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "synthesize.py", line 102, in
main()
File "synthesize.py", line 94, in main
wavenet_synthesize(args, hparams, wave_checkpoint)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize
run_synthesis(args, checkpoint_path, output_dir, hparams)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 55, in run_synthesis
audio_files = synth.synthesize(mel_spectros, speaker_id_batch, basenames, wav_dir, log_dir)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesizer.py", line 71, in synthesize
generated_wavs = self.session.run(self.model.y_hat, feed_dict=feed_dict)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[17,80,1,470250] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 11], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](WaveNet_model/inference/conv_transpose1d_2/stack, WaveNet_model/inference/conv_transpose1d_2/kernel/read, WaveNet_model/inference/upsample_leaky_relu_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: WaveNet_model/inference/strided_slice_3/_373 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_978_WaveNet_model/inference/strided_slice_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose', defined at:
File "synthesize.py", line 102, in
main()
File "synthesize.py", line 94, in main
wavenet_synthesize(args, hparams, wave_checkpoint)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize
run_synthesis(args, checkpoint_path, output_dir, hparams)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 19, in run_synthesis
synth.load(checkpoint_path, hparams)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesizer.py", line 26, in load
input_lengths=None, synthesis_length=self.synthesis_length)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/models/wavenet.py", line 384, in initialize
softmax=False, quantize=True, log_scale_min=hparams.log_scale_min, log_scale_min_gauss=hparams.log_scale_min_gauss)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/models/wavenet.py", line 641, in incremental
c = upsample_conv(c)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 362, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 736, in call
outputs = self.call(inputs, *args, **kwargs)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/models/modules.py", line 519, in call
return super(ConvTranspose1D, self).call(inputs)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 781, in call
data_format=conv_utils.convert_data_format(self.data_format, ndim=4))
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1254, in conv2d_transpose
name=name)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1228, in conv2d_backprop_input
dilations=dilations, name=name)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[17,80,1,470250] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 11], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](WaveNet_model/inference/conv_transpose1d_2/stack, WaveNet_model/inference/conv_transpose1d_2/kernel/read, WaveNet_model/inference/upsample_leaky_relu_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: WaveNet_model/inference/strided_slice_3/_373 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_978_WaveNet_model/inference/strided_slice_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Thank you for all your help!
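
For reference, a rough sketch of why this single activation blows up a 12 GiB K80 (my own back-of-the-envelope arithmetic from the hparams above, not output from the repo):

```python
# Back-of-the-envelope estimate of the OOM'd activation (assuming float32).
# 470250 samples = 1710 mel frames * hop_size 275, where 275 = 5 * 5 * 11
# (the upsample_scales above); 17 is the number of utterances in the batch.
batch, channels, height, time = 17, 80, 1, 470250

tensor_gib = batch * channels * height * time * 4 / 1024**3
print(f"one upsampled activation: {tensor_gib:.2f} GiB")  # ~2.38 GiB

# The stacked transposed convolutions keep several buffers of roughly this
# size alive at once during synthesis, while training only ever sees
# max_time_steps = 11000 sample crops -- which would explain why training
# fits but full-utterance synthesis does not.
```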


sysuzyx commented Oct 12, 2018

I solved it by reducing wavenet_synthesis_batch_size from 20 to 4.
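
For anyone else hitting this, a minimal sketch of the override (assuming the stock tf.contrib.training.HParams setup in hparams.py; my copy of synthesize.py also accepts a --hparams override string, but check yours):

```python
import tensorflow as tf  # TF 1.x, as used by this repo

# Minimal sketch: the same override the --hparams flag would apply,
# assuming the default value of 20 from hparams.py.
hparams = tf.contrib.training.HParams(wavenet_synthesis_batch_size=20)
hparams.parse('wavenet_synthesis_batch_size=4')
print(hparams.wavenet_synthesis_batch_size)  # 4

# Equivalent command line, if your synthesize.py exposes --hparams:
#   python synthesize.py --model='WaveNet' --GTA='False' \
#       --hparams='wavenet_synthesis_batch_size=4'
```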

zshy1205 commented

Hi, can you try input_type='raw'?
I trained WaveNet with that and the loss I get is negative.
Do you know anything about this?


sysuzyx commented Oct 17, 2018

@zshy1205 see #220
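
In short (a general note, assumptions mine, not a summary of #220): with input_type='raw' the loss is the negative log-likelihood of a continuous density, and a density can exceed 1, so negative loss values are expected rather than a bug. A tiny illustration:

```python
import math

# A probability *density* can exceed 1, so its negative log can be negative.
# The logistic pdf peaks at 1 / (4 * scale); with a sharp scale over
# [-1, 1] audio:
scale = 0.05
peak_density = 1 / (4 * scale)   # = 5.0
print(-math.log(peak_density))   # ~ -1.61: a perfectly valid negative NLL
```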

sysuzyx closed this as completed Nov 18, 2018