[Bug] cuFFT error #2365

Closed
mesut92 opened this issue Feb 26, 2023 · 2 comments
Labels
bug Something isn't working

Comments

mesut92 commented Feb 26, 2023

Describe the bug

I am trying to train VITS with LJSpeech on an RTX 4090. I get the error below and could not fix it. I have already updated torch and the NVIDIA drivers.

To Reproduce

Run this command: python recipes/turk/vits_tts/train_vits.py

This produces the following error:
/usr/local/lib/python3.8/dist-packages/torch/functional.py:632: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:801.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
 ! Run is removed from /media/mesut/Depo1/works/TTS/recipes/turk/vits_tts/vits_ljspeech-February-26-2023_08+55AM-0000000
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1126, in train_step
    batch = self.format_batch(batch)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 926, in format_batch
    batch = self.model.format_batch_on_device(batch)
  File "/media/mesut/Depo1/works/TTS/TTS/tts/models/vits.py", line 1503, in format_batch_on_device
    batch["spec"] = wav_to_spec(wav, ac.fft_size, ac.hop_length, ac.win_length, center=False)
  File "/media/mesut/Depo1/works/TTS/TTS/tts/models/vits.py", line 123, in wav_to_spec
    spec = torch.stft(
  File "/usr/local/lib/python3.8/dist-packages/torch/functional.py", line 632, in stft
    return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR
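
The traceback ends inside torch.stft, so one way to narrow the problem down is to call torch.stft on its own, outside the trainer. The following is a minimal sketch and not part of the TTS codebase: the function name stft_smoke_test, the values n_fft=1024, hop_length=256, win_length=1024, and the 22050-sample random input are assumptions chosen for illustration. Only center=False mirrors the call shown in the traceback.

```python
import importlib.util


def stft_smoke_test(device="cpu", n_fft=1024, hop_length=256, win_length=1024):
    """Run a single torch.stft call on `device` and return (ok, message).

    All numeric defaults are illustrative assumptions, not values read
    from the recipe's audio config.
    """
    if importlib.util.find_spec("torch") is None:
        return False, "torch is not installed"
    import torch

    if device.startswith("cuda") and not torch.cuda.is_available():
        return False, "CUDA is not available"

    # One second of random audio at an assumed 22050 Hz sample rate.
    wav = torch.randn(1, 22050, device=device)
    window = torch.hann_window(win_length, device=device)
    try:
        spec = torch.stft(
            wav,
            n_fft,
            hop_length=hop_length,
            win_length=win_length,
            window=window,
            center=False,  # same setting as the wav_to_spec call in the traceback
            return_complex=True,
        )
    except RuntimeError as err:
        # A cuFFT failure surfaces here as a RuntimeError.
        return False, f"{type(err).__name__}: {err}"
    return True, f"ok, spec shape {tuple(spec.shape)}"


if __name__ == "__main__":
    for dev in ("cpu", "cuda"):
        print(dev, stft_smoke_test(dev))
```

If the CPU call succeeds and the CUDA call fails with the same cuFFT error, the problem is likely in the GPU FFT path rather than in the recipe or the dataset.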

Expected behavior

Training starts without errors.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.13.1+cu117",
        "TTS": "0.11.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.8.10",
        "version": "#66~20.04.1-Ubuntu SMP Wed Jan 25 09:41:30 UTC 2023"
    }
}

Additional context

No response

mesut92 added the bug label Feb 26, 2023
Member

erogol commented Feb 27, 2023

I can't reproduce this. In general, this is an out-of-memory (OOM) issue.
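
To check the out-of-memory hypothesis, free and total GPU memory can be inspected just before training starts. This is a minimal sketch, assuming a PyTorch version that provides torch.cuda.mem_get_info; the helper name gpu_memory_report is hypothetical and not part of TTS or the trainer.

```python
import importlib.util


def gpu_memory_report(device_index=0):
    """Return the GPU name plus free and total memory in GiB, or None if unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    # mem_get_info returns (free_bytes, total_bytes) for the given device.
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    gib = 1024 ** 3
    return {
        "name": torch.cuda.get_device_name(device_index),
        "free_gib": free_bytes / gib,
        "total_gib": total_bytes / gib,
    }


if __name__ == "__main__":
    print(gpu_memory_report())
```

If most of the memory is still free when the cuFFT error appears on the first batch, an out-of-memory condition is a less likely explanation.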

erogol closed this as completed Feb 27, 2023

pathnirvana commented Mar 5, 2023

I am getting the same error on an RTX 4090 with the LJSpeech dataset when running !CUDA_VISIBLE_DEVICES=0 python3 recipes/ljspeech/vits_tts/train_vits.py

edit: a solution is mentioned here
