Personalized/Incremental Speech Model Training #13
Comments
Although this isn't officially advised/supported in Kaldi, I recently got it working. I have more testing to do to determine what the sweet spot is for the amount of training data, but preliminary results are very positive. I plan on posting some numbers on this soon. However, performing the training on the client machine is extremely difficult, due to Kaldi's many dependencies for training. So for the near term, I think collecting the data locally and then performing the training in the cloud may be most practical.
Some numbers are posted in https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md
@daanzu Is there anything I can do to help beta test this?
@kendonB Eventually I would like to streamline it and package it up in Docker or something, but there's more work to be done for that. However, if you're comfortable sending me some audio and transcripts for training, I can use you for testing. I will be posting more info on recommended ways for collecting this training data soon.
I would really love to see this Docker container! Something semi-ready would be welcome, too.
@kendonB Thanks for reminding me! I have been slow in making this completely noob-friendly, and therefore afraid of releasing it, but I do have something you could try out. Do you have some audio data with transcripts, and a Docker installation? CUDA is nice to have, but not a strict requirement if you don't mind being patient with the CPU.
I can make some training data - do you have some suggested transcripts to read? Docker + CUDA are both easy to get going. Would it be easiest to set up a private repo that we can iterate on?
@kendonB Great! I would say there are two good ways to gather training data: retaining data from your current speech recognition usage, and recording training data directly. Recording directly is certainly the most efficient way to gather the best training data, though retaining is a relatively easy way to gather a lot, and there are a few ways to try to weed out any bad data. Here's an app I threw together for recording data directly and storing it in an easy format; it comes with a few standard sets of training sentences that try to cover the range of English sounds equally: https://github.com/daanzu/speech-training-recorder

I will put up a repo with the training setup.
Hi, @daanzu. I'm also super interested in this and happy to help in any way that I can. I'm pretty new to dictation in general, but strong with Python and Docker. I'm not sure what you're using to train; I have some experience with TensorFlow and Keras, though nothing too advanced. I've recorded a bunch of data with your recorder and have NVIDIA GPUs with CUDA ready to throw at it :)
@bluecamel Great! I should have scripts to play with uploaded within another day or two. FYI, all standard Kaldi model training uses Kaldi's own nnet code (rather than TensorFlow/PyTorch/etc.), but of course it has full CUDA support. I put together separate Docker images for CPU and CUDA: https://hub.docker.com/u/daanzu
@daanzu I have made some recordings and have the Docker image downloaded. I presume I just need to place the audio data in
@kendonB Ah, yes, there are still some top-level scripts I need to add to the image, and instructions. Hopefully tonight!
@daanzu I can't see those added on Docker Hub (https://hub.docker.com/u/daanzu). Are they somewhere else? Apologies for the noob questions.
@kendonB Sorry, I've been busy, and haven't had a chance to finish adding the necessary scripts yet. Really hope to soon, and will post an update when it's ready.
Terribly late and entirely untested (as of yet): https://github.com/daanzu/kaldi_ag_training
@daanzu Yay! Can't wait to try it, but the mentioned tag (
Ah, sorry, I see that I can just build the image. I'll try that. If you don't mind, I'll make a PR with some adjustments to the docs as I go along to help other amateurs like me. :)
@bluecamel Oops, actually I think that is leftover and
FYI, I decided to try to treat the Docker image as evergreen, and to keep the things liable to change a lot, like scripts, in the git repo instead.
Hi @daanzu, have you made any progress on this one? I don't think I mentioned it here, but I never got it to work, even with many recordings. Do you know of anyone who has managed to get it to work?
I'm not sure if Kaldi is capable of this, but Dragon Professional Individual (DPI) 15 seems to allow for "incremental" training, where the program starts with a base model, then very quickly learns from the user's training speech. On my setup, recognition seems to get better within a few seconds of just saving the user profile. Is Kaldi capable of this type of learning?