
failed to initialize batched cufft plan with customized allocator #711

Closed
abaddon-moriarty opened this issue Nov 29, 2021 · 2 comments
Labels: bug 🐛 Something isn't working


@abaddon-moriarty

Hello everyone,
I am currently training a phoneme-based HiFi-GAN model and I recently ran into the following issue. It started when I tried using multiple GPUs, but now I can't even train on a single GPU.

I've seen suggestions to reduce the batch size, but these are the settings in my hifigan.v1.yaml file:

[screenshot: batch-size settings from hifigan.v1.yaml]

I saw this issue with the same `failed to initialize batched cufft plan with customized allocator` error, but in that case the GPU ran out of memory, which is not the case for me.
I also saw in this issue that the problem was with `batch_max_steps_valid`, but I've used the same file to train other vocoders and this is the first time the error has arisen. What should the correct value be?

```
INFO:tensorflow:batch_all_reduce: 156 all-reduces with algorithm = nccl, num_packs = 1
2021-11-29 16:45:53,870 (cross_device_ops:702) INFO: batch_all_reduce: 156 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 102 all-reduces with algorithm = nccl, num_packs = 1
2021-11-29 16:46:15,996 (cross_device_ops:702) INFO: batch_all_reduce: 102 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 156 all-reduces with algorithm = nccl, num_packs = 1
2021-11-29 16:46:53,329 (cross_device_ops:702) INFO: batch_all_reduce: 156 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 102 all-reduces with algorithm = nccl, num_packs = 1
2021-11-29 16:47:14,118 (cross_device_ops:702) INFO: batch_all_reduce: 102 all-reduces with algorithm = nccl, num_packs = 1
2021-11-29 16:48:33.400178: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-11-29 16:48:38.176008: E tensorflow/stream_executor/cuda/cuda_fft.cc:223] failed to make cuFFT batched plan:5
2021-11-29 16:48:38.176052: E tensorflow/stream_executor/cuda/cuda_fft.cc:426] Initialize Params: rank: 1 elem_count: 2048 input_embed: 2048 input_stride: 1 input_distance: 2048 output_embed: 1025 output_stride: 1 output_distance: 1025 batch_count: 480
2021-11-29 16:48:38.176062: F tensorflow/stream_executor/cuda/cuda_fft.cc:435] failed to initialize batched cufft plan with customized allocator: Failed to make cuFFT batched plan.
```
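For what it's worth, a rough back-of-the-envelope estimate from the `Initialize Params` line above (assuming float32 real input and complex64 output, as used by a real-to-complex FFT where `output_embed = elem_count // 2 + 1`) suggests the plan's I/O buffers themselves are only a few MiB, so the buffers alone shouldn't exhaust GPU memory:

```python
# Rough estimate of the I/O buffer sizes implied by the cuFFT
# "Initialize Params" log line. Assumes float32 (4-byte) real input
# and complex64 (8-byte) output for a real-to-complex transform.
elem_count = 2048      # FFT length (rank-1 transform)
output_embed = 1025    # elem_count // 2 + 1 for a real-to-complex FFT
batch_count = 480      # number of transforms in the batch

input_bytes = batch_count * elem_count * 4     # float32 input
output_bytes = batch_count * output_embed * 8  # complex64 output

print(f"input : {input_bytes / 2**20:.2f} MiB")   # ~3.75 MiB
print(f"output: {output_bytes / 2**20:.2f} MiB")  # ~3.75 MiB
```

(cuFFT also allocates internal workspace beyond these buffers, so this is only a lower bound.)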

Any ideas on how to correct this?
Thank you

@dathudeptrai dathudeptrai self-assigned this Dec 5, 2021
@dathudeptrai dathudeptrai added the bug 🐛 Something isn't working label Dec 5, 2021
@dathudeptrai
Collaborator

@ZDisket do you know what the problem is here?

@abaddon-moriarty
Author

I have re-initialised everything and started from scratch; I no longer have this issue.
