What if I have no GPU? How long will it take to train this model in Kaldi? #11

Open
AlexPeng19 opened this issue Aug 7, 2018 · 5 comments

Comments

@AlexPeng19

This script is intended to be used with GPUs but you have not compiled Kaldi with CUDA
If you want to use GPUs (and have them), go to src/, and configure and make on a machine
where "nvcc" is installed.

I see this warning. It may not block the training, but I would like to know how to shorten the training time without a GPU. I think my machine is well equipped: it has 256 GB of memory and 26 processors, but after two weeks of training it has only completed half of the run.sh script. Can anybody help?

@tramphero
Contributor

Generally speaking, a single GPU is dozens of times faster than a CPU, so I am afraid it will take you months to train this model using CPUs.

@AlexPeng19
Author

@tramphero Could I ask another question? While I am running librispeech/s5/run.sh, the following message appears:

"This script is intended to be used with GPUs but you have not compiled Kaldi with CUDA
If you want to use GPUs (and have them), go to src/, and configure and make on a machine
where "nvcc" is installed."

after the command:
steps/align_fmllr.sh --nj 30 --cmd "$train_cmd"
240 data/train_clean_100 data/lang exp/tri4b exp/tri4b_ali_clean_100

and it exits without any errors each time. Does this mean I need to change something somewhere?
Looking forward to your answer.

@AlexPeng19
Author

@tramphero I see. I checked local/nnet2/run_5a_clean_100.sh and set use_gpu=false; now it moves on.
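
The use_gpu=false change can probably also be made without editing the script by hand. This is only a minimal sketch. It assumes Kaldi's `cuda-compiled` helper is on the PATH once path.sh has been sourced, and that the recipe script accepts `--use-gpu` via utils/parse_options.sh; neither is confirmed in this thread. `pick_use_gpu` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "true" only if Kaldi appears to be built with CUDA.
# Assumption: `cuda-compiled` (a Kaldi binary) exits with status 0 when Kaldi
# was compiled with CUDA. If it is not on PATH, fall back to CPU training.
pick_use_gpu() {
  if command -v cuda-compiled >/dev/null 2>&1 && cuda-compiled; then
    echo true
  else
    echo false
  fi
}

use_gpu=$(pick_use_gpu)
echo "use_gpu=$use_gpu"

# Assumed invocation (left commented out; it needs a Kaldi checkout):
# local/nnet2/run_5a_clean_100.sh --use-gpu "$use_gpu"
```

On a machine without CUDA support this selects the CPU path, which is the same setting as the manual edit.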

@AlexPeng19
Author

I used the GridEngine parallel configuration with 24 threads for this run. Is there any way to shorten the training time? I intended to run on multiple nodes, but it fails with an error when looking up the master node path, so I had to disable the other nodes. Have you ever come across this kind of problem?
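
While only one node is usable, one option might be to bypass GridEngine and let Kaldi's run.pl launch the jobs locally. This is only a sketch. It assumes the recipe reads `train_cmd` and `decode_cmd` from cmd.sh, as the standard recipes do. The variable `nj` and the choice to match it to the core count are illustrative, not settings from this thread.

```shell
#!/usr/bin/env bash
# Run jobs on the local machine with run.pl instead of submitting them
# to GridEngine through queue.pl.
export train_cmd="run.pl"
export decode_cmd="run.pl"

# Illustrative choice: one parallel job per online CPU core.
nj=$(getconf _NPROCESSORS_ONLN)
echo "train_cmd=$train_cmd nj=$nj"

# These values would then be passed to the recipe steps, for example as
# --nj "$nj" --cmd "$train_cmd" on the steps/align_fmllr.sh call above.
```

This does not fix the master node path error. It only removes the queue from the picture on a single machine, and it will not close the gap to GPU training.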

@AlexPeng19
Author

@tramphero


2 participants