
when test "python synthesize.py --model='WaveNet' --GTA='False'", get OOM when allocating tensor with shape[17,80,1,470250] #235

Closed
sysuzyx opened this issue Oct 8, 2018 · 3 comments


sysuzyx commented Oct 8, 2018

Hi~
When I train WaveNet I don't get an OOM error, but when I test WaveNet with the same hparams I do. The command to reproduce is python synthesize.py --model='WaveNet' --GTA='False'.

Even setting wavenet_batch_size to 2 does not resolve the error.

I'm using a single K80 GPU, and no other programs are running on it. I'm on the code version from 14 days ago, not the newest one.

The full log is below. Can anyone help me?

/data/yxzou/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.

0%| | 0/1 [00:00<?, ?it/s][1][470250]
[2][470250]
[3][470250]
[4][470250]
[5][470250]
[6][470250]
[7][470250]
[8][470250]
[9][470250]
[10][470250]
[11][470250]
[12][470250]
[13][470250]
[14][470250]
[15][470250]
[16][470250]
[17][470250]
[18][470250]
[19][470250]
[20][470250]
[21][470250]
[22][470250]
[23][470250]
[24][470250]
[25][470250]
[26][470250]
[27][470250]
[28][470250]
[29][470250]
[30][470250]
[31][470250]
[32][470250]
loaded model at exp/logs-WaveNet/wave_pretrained/wavenet_model.ckpt-395000
Hyperparameters:
allow_clipping_in_normalization: True
attention_dim: 128
attention_filters: 32
attention_kernel: (31,)
cbhg_conv_channels: 128
cbhg_highway_units: 128
cbhg_highwaynet_layers: 4
cbhg_kernels: 8
cbhg_pool_size: 2
cbhg_projection: 256
cbhg_projection_kernel_size: 3
cbhg_rnn_units: 128
cin_channels: 80
cleaners: english_cleaners
clip_mels_length: True
cross_entropy_pos_weight: 1
cumulative_weights: True
decoder_layers: 2
decoder_lstm_units: 1024
embedding_dim: 512
enc_conv_channels: 512
enc_conv_kernel_size: (5,)
enc_conv_num_layers: 3
encoder_lstm_units: 256
fmax: 7600
fmin: 95
frame_shift_ms: None
freq_axis_kernel_size: 3
gate_channels: 256
gin_channels: -1
griffin_lim_iters: 60
hop_size: 275
input_type: mulaw-quantize
kernel_size: 3
layers: 20
leaky_alpha: 0.4
log_scale_min: -32.23619130191664
log_scale_min_gauss: -7.0
mask_decoder: False
mask_encoder: False
max_abs_value: 4.0
max_iters: 2000
max_mel_frames: 1000
max_time_sec: None
max_time_steps: 11000
min_level_db: -100
n_fft: 2048
n_speakers: 5
natural_eval: False
normalize_for_wavenet: True
num_freq: 1025
num_mels: 80
out_channels: 256
outputs_per_step: 3
postnet_channels: 512
postnet_kernel_size: (5,)
postnet_num_layers: 5
power: 1.5
predict_linear: True
preemphasis: 0.97
preemphasize: True
prenet_layers: [256, 256]
quantize_channels: 256
ref_level_db: 20
rescale: True
rescaling_max: 0.999
residual_channels: 128
sample_rate: 22050
signal_normalization: True
silence_threshold: 2
skip_out_channels: 128
smoothing: False
stacks: 2
stop_at_any: True
symmetric_mels: True
tacotron_adam_beta1: 0.9
tacotron_adam_beta2: 0.999
tacotron_adam_epsilon: 1e-06
tacotron_batch_size: 32
tacotron_clip_gradients: True
tacotron_data_random_state: 1234
tacotron_decay_learning_rate: True
tacotron_decay_rate: 0.4
tacotron_decay_steps: 50000
tacotron_dropout_rate: 0.5
tacotron_final_learning_rate: 1e-05
tacotron_initial_learning_rate: 0.001
tacotron_random_seed: 5339
tacotron_reg_weight: 1e-06
tacotron_scale_regularization: False
tacotron_start_decay: 50000
tacotron_swap_with_cpu: False
tacotron_synthesis_batch_size: 512
tacotron_teacher_forcing_decay_alpha: 0.0
tacotron_teacher_forcing_decay_steps: 280000
tacotron_teacher_forcing_final_ratio: 0.0
tacotron_teacher_forcing_init_ratio: 1.0
tacotron_teacher_forcing_mode: scheduled
tacotron_teacher_forcing_ratio: 1.0
tacotron_teacher_forcing_start_decay: 10000
tacotron_test_batches: 41
tacotron_test_size: None
tacotron_zoneout_rate: 0.1
train_with_GTA: True
trim_fft_size: 512
trim_hop_size: 128
trim_silence: True
trim_top_db: 23
upsample_activation: LeakyRelu
upsample_conditional_features: True
upsample_scales: [5, 5, 11]
upsample_type: 1D
use_bias: True
use_lws: False
use_speaker_embedding: True
wavenet_adam_beta1: 0.9
wavenet_adam_beta2: 0.999
wavenet_adam_epsilon: 1e-08
wavenet_batch_size: 8
wavenet_clip_gradients: False
wavenet_data_random_state: 1234
wavenet_decay_rate: 0.5
wavenet_decay_steps: 300000
wavenet_dropout: 0.05
wavenet_ema_decay: 0.9999
wavenet_init_scale: 1.0
wavenet_learning_rate: 0.0001
wavenet_lr_schedule: exponential
wavenet_random_seed: 5339
wavenet_swap_with_cpu: False
wavenet_synthesis_batch_size: 20
wavenet_test_batches: None
wavenet_test_size: 0.0441
wavenet_warmup: 4000.0
wavenet_weight_normalization: False
win_size: 1100
Constructing model: WaveNet
Initializing Wavenet model. Dimensions (? = dynamic shape):
Train mode: False
Eval mode: False
Synthesis mode: True
local_condition: (?, 80, ?)
outputs: (?, ?)
Receptive Field: (4093 samples / 185.6 ms)
WaveNet Parameters: 3.398 Million.
Loading checkpoint: exp/logs-WaveNet/wave_pretrained/wavenet_model.ckpt-395000
Starting synthesis! (this will take a while..)
Traceback (most recent call last):
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[17,80,1,470250] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 11], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](WaveNet_model/inference/conv_transpose1d_2/stack, WaveNet_model/inference/conv_transpose1d_2/kernel/read, WaveNet_model/inference/upsample_leaky_relu_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: WaveNet_model/inference/strided_slice_3/_373 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_978_WaveNet_model/inference/strided_slice_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "synthesize.py", line 102, in
main()
File "synthesize.py", line 94, in main
wavenet_synthesize(args, hparams, wave_checkpoint)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize
run_synthesis(args, checkpoint_path, output_dir, hparams)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 55, in run_synthesis
audio_files = synth.synthesize(mel_spectros, speaker_id_batch, basenames, wav_dir, log_dir)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesizer.py", line 71, in synthesize
generated_wavs = self.session.run(self.model.y_hat, feed_dict=feed_dict)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[17,80,1,470250] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 11], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](WaveNet_model/inference/conv_transpose1d_2/stack, WaveNet_model/inference/conv_transpose1d_2/kernel/read, WaveNet_model/inference/upsample_leaky_relu_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: WaveNet_model/inference/strided_slice_3/_373 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_978_WaveNet_model/inference/strided_slice_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose', defined at:
File "synthesize.py", line 102, in
main()
File "synthesize.py", line 94, in main
wavenet_synthesize(args, hparams, wave_checkpoint)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize
run_synthesis(args, checkpoint_path, output_dir, hparams)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesize.py", line 19, in run_synthesis
synth.load(checkpoint_path, hparams)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/synthesizer.py", line 26, in load
input_lengths=None, synthesis_length=self.synthesis_length)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/models/wavenet.py", line 384, in initialize
softmax=False, quantize=True, log_scale_min=hparams.log_scale_min, log_scale_min_gauss=hparams.log_scale_min_gauss)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/models/wavenet.py", line 641, in incremental
c = upsample_conv(c)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 362, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 736, in call
outputs = self.call(inputs, *args, **kwargs)
File "/data/yxzou/Tacotron-2-master-9.18/wavenet_vocoder/models/modules.py", line 519, in call
return super(ConvTranspose1D, self).call(inputs)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 781, in call
data_format=conv_utils.convert_data_format(self.data_format, ndim=4))
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1254, in conv2d_transpose
name=name)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1228, in conv2d_backprop_input
dilations=dilations, name=name)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/data/yxzou/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[17,80,1,470250] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: WaveNet_model/inference/conv_transpose1d_2/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 11], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](WaveNet_model/inference/conv_transpose1d_2/stack, WaveNet_model/inference/conv_transpose1d_2/kernel/read, WaveNet_model/inference/upsample_leaky_relu_2)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

 [[Node: WaveNet_model/inference/strided_slice_3/_373 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_978_WaveNet_model/inference/strided_slice_3", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Thank you for all your help!
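
For reference, a rough sketch of why this single activation blows up a 12 GiB K80 (my own back-of-the-envelope arithmetic from the hparams above, not output from the repo):

```python
# Back-of-the-envelope estimate of the OOM'd activation (assuming float32).
# 470250 samples = 1710 mel frames * hop_size 275, where 275 = 5 * 5 * 11
# (the upsample_scales above); 17 is the number of utterances in the batch.
batch, channels, height, time = 17, 80, 1, 470250

tensor_gib = batch * channels * height * time * 4 / 1024**3
print(f"one upsampled activation: {tensor_gib:.2f} GiB")  # ~2.38 GiB

# The stacked transposed convolutions keep several buffers of roughly this
# size alive at once during synthesis, while training only ever sees
# max_time_steps = 11000 sample crops -- which would explain why training
# fits but full-utterance synthesis does not.
```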


sysuzyx commented Oct 12, 2018

I solved it by reducing wavenet_synthesis_batch_size from 20 to 4.
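
For anyone else hitting this, a minimal sketch of the override (assuming the stock tf.contrib.training.HParams setup in hparams.py; my copy of synthesize.py also accepts a --hparams override string, but check yours):

```python
import tensorflow as tf  # TF 1.x, as used by this repo

# Minimal sketch: the same override the --hparams flag would apply,
# assuming the default value of 20 from hparams.py.
hparams = tf.contrib.training.HParams(wavenet_synthesis_batch_size=20)
hparams.parse('wavenet_synthesis_batch_size=4')
print(hparams.wavenet_synthesis_batch_size)  # 4

# Equivalent command line, if your synthesize.py exposes --hparams:
#   python synthesize.py --model='WaveNet' --GTA='False' \
#       --hparams='wavenet_synthesis_batch_size=4'
```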

zshy1205 commented

Hi, can you try input_type='raw'?
I trained WaveNet with that and the loss I get is negative.
Do you know anything about this?


sysuzyx commented Oct 17, 2018

@zshy1205 see #220
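
In short (a general note, assumptions mine, not a summary of #220): with input_type='raw' the loss is the negative log-likelihood of a continuous density, and a density can exceed 1, so negative loss values are expected rather than a bug. A tiny illustration:

```python
import math

# A probability *density* can exceed 1, so its negative log can be negative.
# The logistic pdf peaks at 1 / (4 * scale); with a sharp scale over
# [-1, 1] audio:
scale = 0.05
peak_density = 1 / (4 * scale)   # = 5.0
print(-math.log(peak_density))   # ~ -1.61: a perfectly valid negative NLL
```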

sysuzyx closed this as completed Nov 18, 2018