
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4

Open
sujeendran opened this issue Jun 4, 2020 · 18 comments

@sujeendran

Hi,
Which versions of PyTorch, CUDA, and cuDNN do you use? I always run into this error.
My setup:
Windows 10 Pro
python 3.6.5
torch 1.1.0
Cuda 10.0
cuDNN 7.6

I tried explicitly setting the device to 'cpu' in synthesis.py and fastspeech.py to try running on CPU only.
I also tried model.to(device) as well as moving the model arguments with .to(device). The same issue happens in both FastSpeech and SqueezeWave for me.
(screenshot of the error traceback)
Same result on Ubuntu 18.04 with Python 3.6.9, torch 1.5, CUDA 10.2 and cuDNN 8!

@alokprasad
Owner

No need for CUDA or cuDNN if you only want to run on CPU.
Try `export CUDA_VISIBLE_DEVICES=-1`
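As a minimal sketch of why this works (the device-selection helper below is an assumption about the usual PyTorch pattern, not code from this repo): hiding all CUDA devices before the framework is imported makes the GPU invisible, so "cuda if available else cpu" code falls back to the CPU path.

```python
import os

# Hide all CUDA devices before any framework import; with this set,
# torch.cuda.is_available() would report False in a real run.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def pick_device(cuda_available):
    # Mirrors the common torch.device("cuda" if torch.cuda.is_available() else "cpu") pattern.
    return "cuda" if cuda_available else "cpu"

print(pick_device(False))  # with the GPU hidden, this is "cpu"
```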

@sujeendran
Author

sujeendran commented Jun 5, 2020

Thanks! I wanted to run tests on the GPU versions of both FastSpeech and SqueezeWave on a Jetson. This issue is resolved.

@alokprasad
Owner

Thanks. I had specifically made changes in the repo to make it work on CPU; I haven't tested GPU.
You may want to share a comparison between GPU and CPU inference times.
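A minimal timing sketch for such a comparison, assuming a `synthesize` placeholder in place of the real FastSpeech + SqueezeWave calls (the function name and contents are hypothetical, not from this repo):

```python
import time

def synthesize(text):
    # Stand-in for a full FastSpeech + SqueezeWave forward pass;
    # returns a fake waveform of 256 samples per input character.
    return [0.0] * (len(text) * 256)

# The same wall-clock measurement works on CPU and GPU builds alike
# (for real GPU timing, synchronize the device before reading the clock).
start = time.perf_counter()
wav = synthesize("hello world")
elapsed = time.perf_counter() - start
print(f"inference took {elapsed:.3f} s for {len(wav)} samples")
```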

@sujeendran
Author

@alokprasad I was able to merge the code of both FastSpeech and SqueezeWave for testing end-to-end TTS from one application, skipping the storing and loading of mel spectrograms. The full pipeline synthesized audio from text in 0.5 seconds on the Jetson Nano's GPU. I haven't tested the CPU implementation yet; I will let you know once I do.
PS: I want to try training the networks on my custom dataset (same structure as LJSpeech). Maybe you have an idea how I can do this quickly? From the looks of it, I feel I need to generate alignments using Tacotron2 to start with this.
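The merged-pipeline idea above can be sketched as follows. This is a hypothetical outline with placeholder function bodies, not the repo's actual code: the point is only that the mel output is handed to the vocoder in memory instead of going through a `.pt` file.

```python
# Hypothetical two-stage pipeline: FastSpeech output passed directly to
# the SqueezeWave vocoder, with no torch.save / torch.load in between.

def fastspeech_infer(text):
    # Would return a mel spectrogram of shape [frames, 80];
    # here: one fake 80-bin frame per input character.
    return [[0.0] * 80 for _ in text]

def squeezewave_infer(mel, hop_length=256):
    # Would return a waveform; one hop of samples per mel frame.
    return [0.0] * (len(mel) * hop_length)

mel = fastspeech_infer("hello")   # no intermediate file written
wav = squeezewave_infer(mel)      # no intermediate file read
```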

@varungujjar

@sujeendran I am trying to run this on a Raspberry Pi 4 and do some tests. However, I need some help with which TTS engine you used to synthesize audio from text? I am kind of lost on where to begin. Do you have a notebook for the same?

@alokprasad
Owner

@varungujjar this repo uses FastSpeech (for generating features from text) and SqueezeWave as a vocoder for generating a wav from the features FastSpeech produces.
There are two neural networks.

@varungujjar

@alokprasad Thanks a lot, that helps a beginner :D However, looking at your FastSpeech repo, it says you need CUDA. Has it also been modified for CPU support as well?

@alokprasad
Owner

@varungujjar I guess before jumping into TTS you should read about older TTS implementations (svox Pico, eSpeak), how newer NN-based TTS works (Tacotron2, FastSpeech), understand vocoders, etc.

Regarding the repo: it is only tested on CPU, as I wanted to run it on an embedded board without NVIDIA hardware. CUDA is only for NVIDIA devices; maybe you need to modify it for CUDA support.

@varungujjar

@alokprasad Sure, I'll check that. Regarding the repo: yes, that's exactly what I wanted to confirm, since my target is to run on CPU without NVIDIA hardware. Thanks a ton.

@varungujjar

@alokprasad Managed to run your SqueezeWave CPU vocoder on a Pi 4 4GB: it took 16 seconds :)

@alokprasad
Owner

@varungujjar Good that you were able to do so.
What was the text, and how much audio did it generate in those 16 seconds?
You can use soxi/sox to get details on the generated wav file.
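If sox is not installed, Python's standard-library `wave` module gives the same basic facts. A self-contained sketch (it writes a one-second silent wav first so there is something to inspect; the filename is just for illustration):

```python
import wave

# Write one second of 16-bit mono silence at 22050 Hz (a common TTS rate).
path = "out.wav"
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)                     # 16-bit samples
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 22050)    # 22050 frames = 1 second

# Read back the header fields that `soxi out.wav` would report.
with wave.open(path, "rb") as w:
    duration = w.getnframes() / w.getframerate()
print(f"{path}: {duration:.2f} s")  # → out.wav: 1.00 s
```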

@varungujjar

@alokprasad Not yet; I first just managed to compile PyTorch after 4 hours and then ran:
python inference.py -f <(ls mel_spectrograms/*.pt) -w squeezewave.pt -o . -s 0.6
I was getting a libpack error with the precompiled PyTorch wheels on the internet.

@varungujjar

@alokprasad So I managed to run both inferences, FastSpeech and SqueezeWave, successfully, but when I play the wav file it is blank.

@varungujjar

Ok, finally got it to work. Here are the results:

Raspberry Pi 4 4GB
Model: L128_small_pretrain
FastSpeech mel calculation: 2.8617560863494873 s
SqueezeWave vocoder time: 14.423999309539795 s
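A quick check of what these two numbers add up to, and how much of the total the vocoder accounts for (pure arithmetic on the figures above):

```python
mel_time = 2.8617560863494873      # FastSpeech mel calculation, seconds
vocoder_time = 14.423999309539795  # SqueezeWave vocoder, seconds

total = mel_time + vocoder_time
vocoder_share = vocoder_time / total
print(f"total {total:.2f} s, vocoder is {vocoder_share:.0%} of it")
```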

@alokprasad
Owner

Thanks For Sharing!

@alokprasad
Owner


@varungujjar you may see that the vocoder takes most of the time, so you might spend some time optimizing the code to run faster on the Rpi, maybe an optimized build of PyTorch that runs faster on ARM using NEON instructions.

@varunquartic

@alokprasad I am actually not an ML guy yet, I have that learning curve going on, so I wouldn't know how to do that. However, I was also wondering if I could use this vocoder with the Mozilla TTS engine to get better sound output? Is that possible?

@alokprasad
Owner

I would say: read, learn, dig more. I am also not an ML guy :)
