Compilation issue: Alan's voice introduces long compile times and failures on some machines. #61
Comments
The large files in the Alan voice, when compiled in parallel, can eat about 300MB of RAM during compilation. It seems that the main benefit of embedding a voice is a shorter load time. Once @forslund has his pymimic module, we will only need to load the voice once at the beginning, so the main advantage of embedding it will disappear. We can then move to not embedding by default. |
@zeehio, no pressure then :) |
A possible workaround until we have a better solution:
Or maybe provide already built binaries? |
Hmm, the flitevox file is just huge and much slower... Pre compiled would be an option, but what about different architectures? |
Do you have a list of architectures/OS you would like to support? |
Maybe I will profile the voice loading to see where the bottleneck is and whether the performance can be improved. |
Hmm, that might be a good idea. Go ahead and release the build process though, if you like. |
I profiled the voice loading once and if I remember correctly the main issue was at https://github.com/MycroftAI/mimic/blob/master/src/cg/cst_cg_map.c#L93 where the mcep trees are read. There are many nested calls reading a lot of numbers value by value, with an error check per call. The voice loading typically takes some seconds on a background thread on a mobile device once on startup, so this wasn't a huge problem (I work for VocaliD, in case you wonder). |
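To illustrate the kind of bottleneck described above, here is a minimal Python sketch (not mimic's actual C code) that contrasts reading numbers value by value, with one check per read, against reading the whole block in a single call. The function names, the little-endian float32 layout, and the sample payload are assumptions made for the example only.

```python
import io
import struct


def read_floats_one_by_one(stream, count):
    """Read `count` float32 values with one read and one error check per value."""
    values = []
    for _ in range(count):
        chunk = stream.read(4)
        if len(chunk) != 4:
            raise EOFError("unexpected end of voice data")
        values.append(struct.unpack("<f", chunk)[0])
    return values


def read_floats_bulk(stream, count):
    """Read the same values with a single read and a single error check."""
    chunk = stream.read(4 * count)
    if len(chunk) != 4 * count:
        raise EOFError("unexpected end of voice data")
    return list(struct.unpack(f"<{count}f", chunk))


# Hypothetical payload standing in for a block of tree values.
PAYLOAD = struct.pack("<5f", 0.5, 1.5, -2.0, 3.25, 4.0)
```

Both functions return the same list; the bulk version simply makes far fewer calls, which is where the savings would come from if the per-value reads are indeed the slow part.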
Thanks for the info @m-toman, I will try to do that. Given that you are working for VocaliD, do you know if it would be possible to train an HTS version of the Mycroft voice? Adding HTS support to mimic shouldn't be hard, as there is Flite+hts_engine out there. As you probably already know, HTS voices have a much smaller footprint (<5MB) and, in my limited experience (a speech synthesis demo in Catalan), quite good quality, great for embedded apps. (In case you wonder, I am just collaborating with mimic in my spare time. I worked with speech synthesis in the past at the TALP-UPC group under the supervision of Antonio Bonafonte -great person- and now I just spend some time on it for fun) |
@zeehio if you like I can take a look at optimizing the flitevox-loading. |
Ah, I was at SSW 2013 (http://ssw8.talp.cat) in Barcelona :). I also trained an HTS version, but it turned out to be rather disappointing with the regular hts_engine MLSA vocoder (in research we always used STRAIGHT). Mixed excitation as in flite is much smoother (but we had to make some changes to the festvox training to get the 44.1kHz version working). But yes, it is also much larger due to the random forest. Our German voice model was also much better when trained using the regular HTS demo (3 samples here: http://m-toman.github.io/SALB/). |
@forslund That would be great, thanks! I am thinking that if you can move forward with pymimic once we have released a new mimic version, then maybe it is worth releasing right now and pushing on pymimic a bit more. The main drawback we have with voice loading is not that it is slow (a few seconds); the main issue is that Mycroft loads the voice on each mimic call (on each sentence) instead of once per session. It would be great to have pymimic, as it would allow us to keep the loaded voice in memory, so we would not pay the several-second delay on each sentence. If you feel that it is easier to get pymimic working than to work on optimizations here, then I suggest that we release right now, focus on having a pymimic release too, and adapt Mycroft to use it. It is up to you :-) @m-toman I will write you an email :-) I helped in the ssw8 organization (passing microphones, etc). It is a pity that there is not a better free software vocoder implementation. I know that at TALP they have been working with both STRAIGHT and AHOcoder, both of which improve on the MLSA filter, but unfortunately neither of them has a free software implementation. I believe they (at TALP) are also using SALB, and I am sure they are thankful for it! |
A bit off-topic, but perhaps a discussion interesting for others too: I've also been thinking about hybrid synthesis, so replacing the vocoder with a unit selection search. In the end, probably a DNN will directly synthesize waveforms, I guess :). Regarding SALB, yes, I've been contacted with some questions on it. I've also considered ICU, but it seemed a bit huge and I wanted to keep the dependencies low, so I just added special treatment for UTF-8 characters for my small German text analysis. The connection from flite to hts_engine is rather simple: there is a huge function converting the utterance structure to an HTS label, and a dummy voice without a synthesis function. But I guess discussion on that would belong in a new issue (like the whole post, but I'm not sure where :)). |
Sorry for the offtopic issue, if I could I would split it. After your comments I contacted Antonio Bonafonte, Asuncion Moreno both from TALP and Daniel Erro from AHOLAB and Dani sent me not one but two possible alternatives: AhoTTS is a GPL3 speech synthesis system for Basque and Spanish based on aholab vocoder. To train the voices aholab binaries are needed though, although if we are going to train HTS voices HTK is also needed and non free... The other solution is a free (BSD) implementation of something similar to the STRAIGHT vocoder called World. I believe it is worth looking into it. I will open a new issue and try to see if it is possible to move these vocoder comments there ;-) |
Thanks guys. We could really use a more optimized version :D |
STRAIGHT is now open source: https://github.com/HidekiKawahara/legacy_STRAIGHT |
I'm not sure what is going on here, because some machines have no problem compiling while others do. It seems to be due in part to heavy resource usage.
I will return with error snippets as soon as I can.