Hello,
Thanks for this awesome work.
I saw the result on https://dtreskunov.github.io/tiny-kaldi/ and had a few questions.
A bit of context first.
My colleague and I at Inria are working on the same topic: porting the online decoding part of Kaldi to WebAssembly (wasm) so that it can run in a web-like environment. So far, we have been able to compile Kaldi to wasm and put together a demo that runs an online decoder. The only available model is the small English model from the Zamia Speech project, which is about 150 MB, so downloading it may take a while. The good news is that it works fairly well with continuous speech. The problem is that the decoding graph is too large to be loaded on mobile devices.
One possible solution is to use the program from vosk-api that builds the decoding graph dynamically. If I understood correctly, this is the program you use in your demo. However, we were wondering whether the latency we observed in your demo was due to the graph being built dynamically, or to JS (which may be executing the decoding step on the UI thread)?
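To illustrate the second hypothesis, here is a minimal sketch of keeping decoding off the UI thread with a Web Worker. This is not the demo's actual code: the worker script name `decoder-worker.js`, the message shapes, and the frame size are all assumptions for illustration.

```javascript
// Hypothetical sketch: run the decoder in a Web Worker so the UI thread
// only handles audio capture and lightweight result display.

// Split a Float32Array of samples into fixed-size frames suitable for
// posting to the worker (a trailing partial frame is dropped here for
// simplicity).
function frameSamples(samples, frameSize) {
  const frames = [];
  for (let i = 0; i + frameSize <= samples.length; i += frameSize) {
    frames.push(samples.slice(i, i + frameSize));
  }
  return frames;
}

// Browser-only part, guarded so the sketch also loads outside a browser.
if (typeof Worker !== 'undefined') {
  // 'decoder-worker.js' is a hypothetical script that wraps the
  // wasm decoder and posts back recognition results.
  const worker = new Worker('decoder-worker.js');
  worker.onmessage = (e) => {
    // Only a lightweight UI update happens on the main thread.
    console.log('partial result:', e.data.text);
  };
  // e.g. from an audio capture callback, transferring each frame's
  // underlying buffer to avoid copies:
  // frameSamples(inputBuffer, 4096).forEach((f) =>
  //   worker.postMessage({ samples: f }, [f.buffer]));
}
```

If decoding runs on the main thread instead, each decode call blocks rendering and input handling for its full duration, which would show up as exactly the kind of latency described above.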
In case you are wondering: we would like to release our code under the Apache-2.0 license, but we are waiting to hear back from the LAPACK people, as we want to make sure we are not infringing the terms of their license; we had to modify CLAPACK, CBLAS, and BLAS to make them compile to wasm.
Cheers