
Compilation issue: Alan's voice introduces long compile times and failures on some machines. #61

Open
aatchison opened this issue Jul 10, 2016 · 16 comments


@aatchison
Contributor

I'm not sure what is going on here because some machines have no problem compiling while others do. It seems to be due in part to heavy resource usage.
I will return with error snippets as soon as I can.

@zeehio
Contributor

zeehio commented Jul 11, 2016

The large files in the Alan voice, when compiled in parallel, can each eat about 300 MB of RAM during compilation.

If make -j 4 or similar is used on a device with 1 GB of RAM, there is a chance those files are compiled simultaneously and an out-of-memory situation occurs.

The main benefit of embedding a voice seems to be a shorter load time. Once @forslund has his pymimic module we will only need to load the voice once at startup, so the main advantage of embedding it will disappear. We can then switch to not embedding by default.

@forslund
Collaborator

@zeehio, no pressure then :)

@zeehio
Contributor

zeehio commented Aug 14, 2016

A possible workaround until we have a better solution:

  1. Disable Alan voice at compile time ./configure --disable-vid_gb_ap
  2. Copy the Mycroft voice file from the voices directory
  3. Use mimic with the voice from a file:
    mimic -voice /path/to/Mycroft.flitevox

Or maybe provide prebuilt binaries?

@aatchison
Contributor Author

Hmm, the flitevox file is just huge and much slower... Precompiled binaries would be an option, but what about different architectures?

@zeehio
Contributor

zeehio commented Aug 14, 2016

Do you have a list of architectures/OS you would like to support?

@zeehio
Contributor

zeehio commented Aug 15, 2016

Maybe I'll profile the voice loading to see where the bottleneck is and whether the performance can be improved.

@aatchison
Contributor Author

Hmm, that might be a good idea. Go ahead and release the build process though if you like.

@m-toman
Contributor

m-toman commented Aug 15, 2016

I profiled the voice loading once, and if I remember correctly the main issue was at https://github.com/MycroftAI/mimic/blob/master/src/cg/cst_cg_map.c#L93, where the mcep trees are read. There are many nested calls reading a lot of numbers value by value, with an error check per call. Reading larger chunks with fread could certainly help here.

The voice loading typically takes a few seconds on a background thread on a mobile device, once at startup, so this wasn't a huge problem for us (I work for VocaliD, in case you wonder).

@zeehio
Contributor

zeehio commented Aug 15, 2016

Thanks for the info @m-toman, I will try to do that.

Given that you work for VocaliD, do you know whether it would be possible to train an HTS version of the Mycroft voice? Adding HTS support to mimic shouldn't be hard, as Flite+hts_engine already exists.

As you probably know, HTS voices have a much smaller footprint (<5 MB) and, in my limited experience (a speech synthesis demo in Catalan), quite good quality, which makes them great for embedded apps.

(In case you wonder, I am just collaborating with mimic in my spare time. I worked on speech synthesis in the past at the TALP-UPC group under the supervision of Antonio Bonafonte -great person- and now I just spend some time on it for fun.)

@forslund
Collaborator

@zeehio if you like I can take a look at optimizing the flitevox-loading.

@m-toman
Contributor

m-toman commented Aug 15, 2016

Ah, I have been at the SSW 2013 (http://ssw8.talp.cat) in Barcelona :).

I also trained an HTS version, but it turned out rather disappointing with the regular hts_engine MLSA vocoder (in research we always used STRAIGHT). Mixed excitation, as in flite, is much smoother (though we had to make some changes to the festvox training to get the 44.1 kHz version working). But yes, it is also much larger due to the random forest.

Our German voice model was also much better when trained using the regular HTS demo (3 samples here: http://m-toman.github.io/SALB/). I suppose that is because it was recorded in a studio setting with a professional speaker and with manually cleaned labels.
We can talk about this by email if you like - m dot toman at neuratec dot com :).

@zeehio
Contributor

zeehio commented Aug 15, 2016

@forslund That would be great, thanks! If you can move forward with pymimic once we have released a new mimic version, then maybe it is worth releasing right now and pushing on pymimic a bit more.

The main drawback of the voice loading time is not that it is slow in absolute terms (a few seconds); the main issue is that Mycroft loads the voice on each mimic call (each sentence) instead of once per session. pymimic would let us keep the loaded voice in memory, so we would not pay the several-second delay on every sentence.

If you feel it is easier to get pymimic working than to work on optimizations here, then I suggest we release right now, focus on having a pymimic release too, and adapt Mycroft to use it. It is up to you :-)

@m-toman I will write you an email :-) I helped in the ssw8 organization (passing microphones, etc.). It is a pity that there is no better free-software vocoder implementation. I know at TALP they have been working with both STRAIGHT and AHOcoder, both improving on the MLSA filter, but unfortunately neither has a free-software implementation. I believe they (at TALP) are also using SALB; I am sure they are thankful for it!

@m-toman
Contributor

m-toman commented Aug 15, 2016

A bit off-topic, but perhaps this discussion is interesting for others too:
Yes, the vocoder is a big bottleneck.
I wrote a small tool to do feature extraction and resynthesis using the flite/mimic MLSA+ME vocoder, and it was actually much better than regular MLSA, but still...
If some other vocoder comes up, it would be interesting to integrate it, but I'm not sure how generic the flite parameter generation is.
The festvox training scripts for clustergen voices are also a lot messier than the HTS demo training scripts and can hardly be parameterized (well, except by sed-replacing Scheme script contents).

I've also been thinking about hybrid synthesis, so replacing the vocoder with a unit selection search. In the end, probably a DNN will directly synthesize waveforms, I guess :).

Regarding SALB, yes, I've been contacted with some questions about it.
Back then I decided to build around flite instead of extending it because of https://sourceforge.net/p/at-flite/wiki/AddingNewLanguage/

I've also considered ICU, but it seemed a bit huge and I wanted to keep the dependencies low, so I just added special treatment of UTF-8 characters to my small German text analysis.
I've been using flite in SALB only for text analysis of English, with hts_engine attached and abstractions in between. If you build that into mimic, SALB probably becomes obsolete :).

The connection from flite to hts_engine is rather simple: there is a huge function converting the utterance structure to an HTS label, and a dummy voice without a synthesis function. But I guess discussion of that belongs in a new issue (like this whole post, though I'm not sure where :)).

@zeehio
Contributor

zeehio commented Aug 15, 2016

Sorry for the off-topic discussion; if I could, I would split this issue.

After your comments I contacted Antonio Bonafonte and Asuncion Moreno, both from TALP, and Daniel Erro from AHOLAB, and Dani sent me not one but two possible alternatives:

AhoTTS is a GPL-3 speech synthesis system for Basque and Spanish based on the aholab vocoder. Training new voices requires aholab binaries, though; then again, if we are going to train HTS voices, HTK is also needed and is non-free...

The other option is a free (BSD-licensed) implementation of something similar to the STRAIGHT vocoder, called World. I believe it is worth looking into.

I will open a new issue and see whether it is possible to move these vocoder comments there ;-)

@aatchison
Contributor Author

Thanks guys. We could really use a more optimized version :D

@Shallowmallow

> if some other vocoder comes up, it would be interesting to integrate it, but I'm not sure how generic the flite parameter generation is.

STRAIGHT is now open source: https://github.com/HidekiKawahara/legacy_STRAIGHT
