Personalized/Incremental Speech Model Training #13

Open
kendonB opened this issue Jan 20, 2020 · 21 comments

Comments

@kendonB

kendonB commented Jan 20, 2020

I'm not sure if Kaldi is capable of this, but DPI 15 seems to allow for "incremental" training, where the program starts with a base model, then very quickly learns from the user's training speech. On mine it seems to get better in a few seconds after just saving the user profile. Is Kaldi capable of this type of learning?

@daanzu
Owner

daanzu commented Jan 20, 2020

Although this isn't officially advised/supported in Kaldi, I recently got it working. I have more testing to do to determine what the sweet spot is for the amount of training data, but preliminary results are very positive. I plan on posting some numbers on this soon.

However, performing the training on the client machine is extremely difficult, due to Kaldi's many dependencies for training. So for the near term, I think collecting the data locally and then performing the training in the cloud may be most practical.

@daanzu
Owner

daanzu commented Apr 20, 2020

@kendonB
Author

kendonB commented Apr 26, 2020

@daanzu Anything I can do to help beta test this?

@daanzu
Owner

daanzu commented Apr 27, 2020

@kendonB Eventually I would like to streamline it and package it up in Docker or something, but there's more work to be done for that.

However, if you're comfortable sending me some audio and transcripts for training, I can use you for testing. I will be posting more info on recommended ways for collecting this training data soon.

daanzu changed the title from "Incremental training" to "Personalized/Incremental Speech Model Training" on May 1, 2020

@JohnDoe02

I would really love to see this Docker container! Something semi-ready would be welcome, too.

@kendonB
Author

kendonB commented Jul 13, 2021

Hi @daanzu, just revisiting this. I know there was some progress discussed here: #33

Do you have anything to share that would be noob-friendly?

@daanzu
Owner

daanzu commented Jul 13, 2021

@kendonB Thanks for reminding me! I have been slow in making this completely noob-friendly, and therefore afraid of releasing it, but I do have something you could try out. Do you have some audio data with transcripts, and a docker installation? Having CUDA is nice, but not a strict requirement if you don't mind having patience with the CPU.

@kendonB
Author

kendonB commented Jul 27, 2021

I can make some training data - do you have some suggested transcripts to read? Docker + CUDA are both easy to get going. Would it be easiest to set up a private repo that we can iterate on?

@daanzu
Owner

daanzu commented Jul 28, 2021

@kendonB Great! I would say there are two good ways to gather training data: retaining data from your current speech recognition usage, and recording training data directly. Recording directly is certainly most efficient in gathering the best training data, though retaining is a relatively easy way to gather a lot, and there are a few ways to try to weed out any bad data.

Here's an app I threw together for recording data directly and storing it in an easy format. It comes with a few standard sets of training sentences that try to cover the range of English sounds equally: https://github.com/daanzu/speech-training-recorder

I will put up a repo with the training setup.
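
As an illustration of weeding out bad data, here is a minimal sketch that drops utterances with missing audio, empty transcripts, or implausible durations. It is not the method used in this thread. The two-column TSV manifest (WAV path, transcript), the function names, and the duration thresholds are all assumptions made for the example, and the recorder's actual output format may differ.

```python
import csv
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_utterances(manifest_path, min_seconds=0.5, max_seconds=20.0):
    """Keep (wav_path, transcript) pairs that look usable for training.

    Assumes a hypothetical manifest: one utterance per line, with the WAV
    path and the transcript separated by a tab.
    """
    manifest_path = Path(manifest_path)
    kept = []
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # malformed line
            # Relative paths are resolved against the manifest's directory.
            wav_path = manifest_path.parent / row[0]
            text = row[1].strip()
            if not text or not wav_path.is_file():
                continue  # empty transcript or missing audio
            try:
                duration = wav_duration_seconds(wav_path)
            except (wave.Error, EOFError):
                continue  # unreadable or truncated WAV file
            if min_seconds <= duration <= max_seconds:
                kept.append((wav_path, text))
    return kept
```

Calling `filter_utterances("manifest.tsv")` returns only the pairs that pass every check. Duration limits are a crude filter: they catch accidental clicks and runaway recordings, but not misrecognized transcripts.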

@bluecamel

Hi, @daanzu. I'm also super interested in this and happy to help in any way that I can. I'm pretty new to dictation in general, but strong with Python and Docker. I'm not sure what you're using to train, but I have some (not too advanced) experience with TensorFlow and Keras. I've recorded a bunch of data with your recorder and have NVIDIA GPUs with CUDA ready to throw at it :)

@daanzu
Owner

daanzu commented Aug 1, 2021

@bluecamel Great! I should have scripts to play with uploaded within another day or two. FYI, all standard Kaldi model training uses Kaldi's own nnet code (rather than TensorFlow/PyTorch/etc.), but of course it has full CUDA support. I put together separate Docker images for CPU and CUDA: https://hub.docker.com/u/daanzu

@kendonB
Author

kendonB commented Aug 4, 2021

@daanzu I have made some recordings and have the Docker image downloaded. I presume I just need to get the audio data in audio_data into the Docker image and then hit go. How do I do that? I have CUDA.

@daanzu
Owner

daanzu commented Aug 4, 2021

@kendonB Ah, yes, there are still some top-level scripts I need to add to the image, and instructions. Hopefully tonight!

@kendonB
Author

kendonB commented Aug 9, 2021

@daanzu I can't see those added at https://hub.docker.com/u/daanzu yet. Are they somewhere else? Apologies for the noob questions.

@daanzu
Owner

daanzu commented Aug 9, 2021

@kendonB Sorry, I've been busy, and haven't had a chance to finish adding the necessary scripts yet. Really hope to soon, and will post an update when it's ready.

@daanzu
Owner

daanzu commented Aug 13, 2021

Terribly late and entirely untested (as of yet): https://github.com/daanzu/kaldi_ag_training

@bluecamel

@daanzu Yay! Can't wait to try it, but the mentioned tag (2021-08-04) isn't on Docker Hub.

@bluecamel

Ah, sorry, I see that I can just build the image. I'll try that. If you don't mind, I'll make a PR with some adjustments to the docs as I go along, to help other amateurs like me. :)

@daanzu
Owner

daanzu commented Aug 15, 2021

@bluecamel Oops, actually I think that is leftover and 2020-11-28 would be fine. I pushed an update. Thanks for any help!

@daanzu
Owner

daanzu commented Aug 15, 2021

FYI, I decided to try to treat the Docker image as evergreen, and to keep the things liable to change a lot, like scripts, in the git repo instead.
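
With a stable image and scripts that live outside it, one plausible workflow is to bind-mount a local directory holding the scripts and audio data into the container at run time. The sketch below only builds the `docker run` argument list and does not run anything. The image name, script name, and `/mnt/input` mount point are hypothetical placeholders rather than values confirmed in this thread; check the repo's own instructions for the real ones.

```python
import shlex
from pathlib import Path


def build_training_command(image, data_dir, script, use_gpu=False):
    """Build a `docker run` argument list that mounts local data into a container.

    `image`, `script`, and the /mnt/input mount point are placeholders for
    illustration, not names taken from the actual training setup.
    """
    data_dir = Path(data_dir).resolve()  # Docker bind mounts need absolute paths
    command = ["docker", "run", "--rm"]
    if use_gpu:
        # Expose all NVIDIA GPUs; requires the NVIDIA container toolkit on the host.
        command += ["--gpus", "all"]
    command += [
        "-v", f"{data_dir}:/mnt/input",  # host directory -> container path
        "-w", "/mnt/input",              # run the script from the mounted directory
        image,
        "bash", script,
    ]
    return command


if __name__ == "__main__":
    cmd = build_training_command(
        "example/kaldi-training:latest",  # hypothetical image name
        ".",
        "run_training.sh",                # hypothetical script name
        use_gpu=True,
    )
    print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced. Mounting the data instead of copying it into the image means the image never needs rebuilding when recordings or scripts change.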

@kendonB
Author

kendonB commented Dec 7, 2022

Hi @daanzu, did you have any progress on this one? I don't think I mentioned it here, but I never got it to work, even with many recordings. Do you know of anyone that managed to get it to work?

4 participants