Personalized/Incremental Speech Model Training #13

kendonB · 2020-01-20T04:00:55Z

I'm not sure if Kaldi is capable of this, but DPI 15 seems to allow for "incremental" training, where the program starts with a base model and then very quickly learns from the user's training speech. On my machine it seems to improve within a few seconds, just from saving the user profile. Is Kaldi capable of this type of learning?

daanzu · 2020-01-20T04:28:08Z

Although this isn't officially advised/supported in Kaldi, I recently got it working. I have more testing to do to determine what the sweet spot is for the amount of training data, but preliminary results are very positive. I plan on posting some numbers on this soon.

However, performing the training on the client machine is extremely difficult, due to Kaldi's many dependencies for training. So for the near term, I think collecting the data locally and then performing the training in the cloud may be most practical.

daanzu · 2020-04-20T14:02:40Z

Some numbers are posted in https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md

kendonB · 2020-04-26T21:54:46Z

@daanzu Anything I can do to help beta test this?

daanzu · 2020-04-27T08:13:07Z

@kendonB Eventually I would like to streamline it and package it up in Docker or something, but there's more work to be done for that.

However, if you're comfortable sending me some audio and transcripts for training, I can use you for testing. I will be posting more info on recommended ways for collecting this training data soon.

JohnDoe02 · 2020-05-24T17:29:37Z

I would really love to see this Docker container! Even something semi-ready would be welcome.

kendonB · 2021-07-13T01:43:07Z

Hi @daanzu, just revisiting this. I know there was some progress discussed here: #33

Do you have anything to share that would be noob-friendly?

daanzu · 2021-07-13T06:33:35Z

@kendonB Thanks for reminding me! I have been slow in making this completely noob-friendly, and therefore afraid of releasing it, but I do have something you could try out. Do you have some audio data with transcripts, and a docker installation? Having CUDA is nice, but not a strict requirement if you don't mind having patience with the CPU.

kendonB · 2021-07-27T22:04:16Z

I can make some training data - do you have some suggested transcripts to read? Docker + CUDA are both easy to get going. Would it be easiest to set up a private repo that we can iterate on?

daanzu · 2021-07-28T01:25:42Z

@kendonB Great! I would say there are two good ways to gather training data: retaining data from your current speech recognition usage, and recording training data directly. Recording directly is certainly the most efficient way to gather the best training data, though retaining is a relatively easy way to gather a lot, and there are a few ways to try to weed out any bad data.
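The thread doesn't say which weeding-out methods are meant. As one hypothetical heuristic for retained data, the sketch below drops utterances whose transcript is empty, whose length is implausible, or whose speaking rate (words per second) is implausible. The thresholds `MIN_SEC`, `MAX_SEC`, `MIN_WPS`, `MAX_WPS` and the `Utterance` type are illustrative assumptions, not values from the thread; it uses only Python's standard `wave` module and therefore assumes PCM WAV files.

```python
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List

# Hypothetical thresholds: tune them for your own recordings.
MIN_SEC, MAX_SEC = 0.5, 20.0  # plausible utterance length in seconds
MIN_WPS, MAX_WPS = 0.5, 5.0   # plausible speaking rate in words per second


@dataclass
class Utterance:
    wav_path: Path
    transcript: str


def wav_duration(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def looks_usable(transcript: str, duration: float) -> bool:
    """Reject empty transcripts, odd lengths, and implausible speaking rates."""
    words = transcript.split()
    if not words or not (MIN_SEC <= duration <= MAX_SEC):
        return False
    return MIN_WPS <= len(words) / duration <= MAX_WPS


def filter_utterances(utts: Iterable[Utterance]) -> List[Utterance]:
    """Keep only utterances whose audio is readable and passes the heuristics."""
    kept = []
    for u in utts:
        try:
            duration = wav_duration(u.wav_path)
        except (wave.Error, EOFError, OSError):
            continue  # missing, truncated, or not a PCM WAV file
        if looks_usable(u.transcript, duration):
            kept.append(u)
    return kept
```

A speaking-rate check like this is crude: it catches transcripts that clearly don't match their audio (for example, a long sentence paired with a one-second clip), but it won't catch a misrecognized word.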

Here's an app I threw together for recording data directly and storing it in an easy format. It comes with a few standard sets of training sentences that try to cover the range of English sounds evenly: https://github.com/daanzu/speech-training-recorder
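Standard Kaldi recipes read training data from a "data directory" containing `wav.scp` (utterance ID and audio path), `text` (utterance ID and transcript), `utt2spk`, and `spk2utt`, all sorted by ID. The sketch below converts recordings plus transcripts into that layout. It ASSUMES the recorder output is a TSV with one `<wav filename><TAB><transcript>` row per utterance; check the actual format the recorder writes before using it. The function name and the default speaker ID `spk1` are hypothetical.

```python
import csv
from pathlib import Path


def tsv_to_kaldi_data_dir(tsv_path, audio_dir, out_dir, speaker="spk1"):
    """Write Kaldi's wav.scp, text, utt2spk and spk2utt for a single speaker.

    ASSUMPTION: each TSV row is "<wav filename>\t<transcript>".
    Returns the number of utterances written.
    """
    audio_dir, out_dir = Path(audio_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 2 or not row[1].strip():
                continue  # skip malformed rows and empty transcripts
            filename = row[0]
            transcript = " ".join(row[1].split())  # collapse whitespace
            # Kaldi convention: prefix utterance IDs with the speaker ID so
            # that utt2spk and spk2utt sort consistently.
            stem = "_".join(Path(filename).stem.split())
            utt_id = f"{speaker}-{stem}"
            entries.append((utt_id, (audio_dir / filename).resolve(), transcript))
    entries.sort(key=lambda e: e[0])  # Kaldi expects files sorted by utterance ID

    with open(out_dir / "wav.scp", "w", encoding="utf-8") as scp, \
         open(out_dir / "text", "w", encoding="utf-8") as text, \
         open(out_dir / "utt2spk", "w", encoding="utf-8") as u2s:
        for utt_id, wav, transcript in entries:
            scp.write(f"{utt_id} {wav}\n")  # paths must not contain spaces
            text.write(f"{utt_id} {transcript}\n")
            u2s.write(f"{utt_id} {speaker}\n")
    with open(out_dir / "spk2utt", "w", encoding="utf-8") as s2u:
        if entries:
            s2u.write(speaker + " " + " ".join(e[0] for e in entries) + "\n")
    return len(entries)
```

For example, `tsv_to_kaldi_data_dir("recordings.tsv", "audio_data", "data/train")` (hypothetical file names) would produce a single-speaker `data/train` directory. Kaldi also ships `utils/utt2spk_to_spk2utt.pl` and `utils/validate_data_dir.sh`, which are worth running on the result.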

I will put up a repo with the training setup.

bluecamel · 2021-07-31T05:42:16Z

Hi, @daanzu. I'm also super interested in this and happy to help in any way that I can. I'm pretty new to dictation in general, but strong with Python and Docker. I'm not sure what you're using to train, but I have some (not too advanced) experience with TensorFlow and Keras. I've recorded a bunch of data with your recorder and have NVIDIA GPUs with CUDA ready to throw at it :)

daanzu · 2021-08-01T15:01:19Z

@bluecamel Great! I should have scripts to play with uploaded within another day or two. FYI, all standard Kaldi model training uses Kaldi's own nnet code (rather than TensorFlow/PyTorch/etc.), but of course it has full CUDA support. I put together separate Docker images for CPU and CUDA: https://hub.docker.com/u/daanzu

kendonB · 2021-08-04T01:25:18Z

@daanzu I have made some recordings and have the Docker image downloaded. I presume I just need to put the audio data in audio_data inside the Docker image and then hit go. How do I do that? I have CUDA.

daanzu · 2021-08-04T03:09:11Z

@kendonB Ah, yes, there are still some top-level scripts I need to add to the image, and instructions. Hopefully tonight!

kendonB · 2021-08-09T04:05:42Z

@daanzu I can't see those added here: https://hub.docker.com/u/daanzu. Are they somewhere else? Apologies for the noob questions.

daanzu · 2021-08-09T14:36:48Z

@kendonB Sorry, I've been busy, and haven't had a chance to finish adding the necessary scripts yet. Really hope to soon, and will post an update when it's ready.

daanzu · 2021-08-13T08:54:48Z

Terribly late and entirely untested (as of yet): https://github.com/daanzu/kaldi_ag_training

bluecamel · 2021-08-14T20:38:18Z

@daanzu Yay! Can't wait to try it, but the mentioned tag (2021-08-04) isn't on docker hub.

bluecamel · 2021-08-14T22:21:44Z

Ah, sorry, I see that I can just build the image. I'll try that. If you don't mind, I'll make a PR with some adjustments to the docs as I go along, to help other amateurs like me. :)

daanzu · 2021-08-15T01:25:10Z

@bluecamel Oops, actually I think that is leftover and 2020-11-28 would be fine. I pushed an update. Thanks for any help!

daanzu · 2021-08-15T01:26:43Z

FYI, I decided to try to treat the docker image as evergreen, and keep the things liable to change a lot like scripts in the git repo instead.
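With an evergreen image and the changing scripts in a git repo, training data doesn't need to be copied "into" the image: the usual Docker pattern is to bind-mount the cloned repo (with your audio inside it) into a container from a pinned image tag. The sketch below only builds the `docker run` argument list. The image names `CPU_IMAGE`/`GPU_IMAGE`, the mount point `/mnt/input`, and the script name in the usage line are hypothetical placeholders; check the kaldi_ag_training README for the real ones. The tag defaults to `2020-11-28`, the one mentioned above. The flags `--rm`, `-it`, `-v`, `-w`, and `--gpus all` are standard `docker run` options; `--gpus` additionally needs the NVIDIA Container Toolkit on the host.

```python
import shlex
from pathlib import Path

# Hypothetical image names: see https://hub.docker.com/u/daanzu for the real ones.
CPU_IMAGE = "daanzu/example-training-cpu"
GPU_IMAGE = "daanzu/example-training-gpu"
MOUNT_POINT = "/mnt/input"  # hypothetical path inside the container


def docker_run_args(repo_dir, command, tag="2020-11-28", use_cuda=False):
    """Build a `docker run` argument list that mounts repo_dir as the workdir."""
    repo = str(Path(repo_dir).resolve())
    image = f"{GPU_IMAGE if use_cuda else CPU_IMAGE}:{tag}"
    args = ["docker", "run", "--rm", "-it"]
    if use_cuda:
        args += ["--gpus", "all"]  # requires the NVIDIA Container Toolkit
    # Bind-mount the repo (scripts + audio_data) instead of baking it into the image.
    args += ["-v", f"{repo}:{MOUNT_POINT}", "-w", MOUNT_POINT, image]
    return args + list(command)


def as_shell_line(args):
    """Quote the argument list so it can be pasted into a POSIX shell."""
    return shlex.join(args)  # Python 3.8+
```

For example, `print(as_shell_line(docker_run_args(".", ["bash", "train.sh"], use_cuda=True)))` (with `train.sh` standing in for whatever script the repo provides) prints a command you can inspect before running it, or pass the list to `subprocess.run`. Keeping data and scripts on the host this way means updating the repo never requires rebuilding the image.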

kendonB · 2022-12-07T20:11:39Z

Hi @daanzu, did you have any progress on this one? I don't think I mentioned it here, but I never got it to work, even with many recordings. Do you know of anyone that managed to get it to work?

daanzu changed the title ~~Incremental training~~ Personalized/Incremental Speech Model Training May 1, 2020

daanzu mentioned this issue May 2, 2020

Add wav2letter engine support dictation-toolbox/dragonfly#245

Closed
