[Bug] cuFFT error #2365

mesut92 · 2023-02-26T09:01:43Z

Describe the bug

I am trying to train VITS on LJSpeech with an RTX 4090. I get the error below and could not fix it. I have already updated torch and the NVIDIA drivers.

To Reproduce

Run this command: python recipes/turk/vits_tts/train_vits.py

It produces the following error:
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
! Run is removed from /media/mesut/Depo1/works/TTS/recipes/turk/vits_tts/vits_ljspeech-February-26-2023_08+55AM-0000000
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1126, in train_step
    batch = self.format_batch(batch)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 926, in format_batch
    batch = self.model.format_batch_on_device(batch)
  File "/media/mesut/Depo1/works/TTS/TTS/tts/models/vits.py", line 1503, in format_batch_on_device
    batch["spec"] = wav_to_spec(wav, ac.fft_size, ac.hop_length, ac.win_length, center=False)
  File "/media/mesut/Depo1/works/TTS/TTS/tts/models/vits.py", line 123, in wav_to_spec
    spec = torch.stft(
  File "/usr/local/lib/python3.8/dist-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
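
Since the traceback ends in torch.stft, one way to narrow this down is to call torch.stft directly on a small random signal, outside the trainer. The snippet below is only a sketch of that idea. The function name run_stft_check and the sizes (n_fft=1024, hop_length=256, win_length=1024, 22050 samples) are assumptions chosen for illustration, not values read from the recipe config. If this also fails on the GPU, the problem is probably in the PyTorch/CUDA setup rather than in the training code.

```python
import importlib.util


def run_stft_check(n_fft=1024, hop_length=256, win_length=1024):
    """Run torch.stft once on a short random signal and report the outcome.

    The default sizes are illustrative assumptions, not the recipe's settings.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    # Use the GPU when one is visible, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    wav = torch.randn(1, 22050, device=device)  # about one second of noise
    window = torch.hann_window(win_length, device=device)

    try:
        spec = torch.stft(
            wav,
            n_fft,
            hop_length=hop_length,
            win_length=win_length,
            window=window,
            center=False,
            return_complex=True,
        )
    except RuntimeError as err:
        # A cuFFT failure would be reported here as a RuntimeError.
        return f"stft failed on {device}: {err}"

    return f"stft ok on {device}: output shape {tuple(spec.shape)}"


if __name__ == "__main__":
    print(run_stft_check())
```

Running the same check with CUDA_VISIBLE_DEVICES set to an empty string forces the CPU path, which gives a quick comparison between the two backends.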

Expected behavior

Training starts.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.13.1+cu117",
        "TTS": "0.11.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.8.10",
        "version": "#66~20.04.1-Ubuntu SMP Wed Jan 25 09:41:30 UTC 2023"
    }
}

Additional context

No response

erogol · 2023-02-27T06:12:31Z

I can't reproduce this. In general, this is an OOM (out-of-memory) issue.
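
A quick way to test the OOM hypothesis is to look at how much GPU memory is free just before the failing call. The sketch below uses torch.cuda.mem_get_info, which returns free and total bytes for the current device. The helper name gpu_memory_report is made up for this example. If many gigabytes are reported free and a small torch.stft call still fails, memory pressure looks less likely as the cause, although this check alone does not prove it.

```python
import importlib.util


def gpu_memory_report():
    """Return free/total GPU memory as text, or the reason it is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if not torch.cuda.is_available():
        return "no CUDA device visible"

    # mem_get_info reports (free, total) in bytes for the current device.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    gib = 1024 ** 3
    return f"free {free_bytes / gib:.2f} GiB of {total_bytes / gib:.2f} GiB"


print(gpu_memory_report())
```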

pathnirvana · 2023-03-05T05:29:28Z

I am getting the same error on an RTX 4090 with the LJSpeech dataset, using !CUDA_VISIBLE_DEVICES=0 python3 recipes/ljspeech/vits_tts/train_vits.py

edit: a solution is mentioned here

mesut92 added the bug label Feb 26, 2023

erogol closed this as completed Feb 27, 2023
