How do I use German tuda-de model in kaldi-android-demo? #40

Closed
prvit opened this issue Mar 24, 2020 · 17 comments

@prvit

prvit commented Mar 24, 2020

The vosk-api docs say that the tuda-de model is compatible with vosk-api. I ran kaldi-android-demo and everything worked, but that was with the simple English model. I need to run some experiments on German speech recognition. I have a local Kaldi server and have followed everything in the vosk-api docs as well. I also have both the pretrained tuda-de models and one I built from their sources myself. What I can't figure out is how to make the Android demo work with a tuda-de model. Simply copying the pretrained tuda-de model files in place of the files in kaldi-android-demo/models/src/main/assets/sync/model-android/ does not work for me either.
I see you have a few models in Releases with proper lookahead; maybe I need to update the tuda-de graph somehow too?
Could you describe in a bit more detail how to use this Android demo with the German tuda-de model, please?

@nshmyrev
Collaborator

> I need to run some experiments on German speech recognition.

There is a German model here: https://github.com/alphacep/kaldi-android-demo/releases/download/2020-01/alphacep-model-android-de-zamia-0.3.tar.gz. It is actually good, and it includes all the tuda data.

> Simply copying the pretrained tuda-de model files in place of the files in kaldi-android-demo/models/src/main/assets/sync/model-android/ does not work for me either.

You need to provide details if you want help with this issue.

> Could you describe in a bit more detail how to use this Android demo with the German tuda-de model, please?

The default tuda-de model is very big. First of all, you need to train a smaller model with fewer parameters. Second, you can use this script to create the small graph; that's it:

https://github.com/kaldi-asr/kaldi/blob/master/egs/mini_librispeech/s5/local/lookahead/run_lookahead.sh

@prvit
Author

prvit commented Mar 24, 2020

@nshmyrev Thanks for your response.

> The default tuda-de model is very big. First of all, you need to train a smaller model with fewer parameters.

Is that a requirement or a recommendation? Let's say I have a separate device that will run only one app for one purpose, if I succeed, so I believe I should be OK using a ~500 MB model.

I was going to try tuda_swc_mailabs_voc400k from https://github.com/uhh-lt/kaldi-tuda-de (from the size and name, I understood it is the same one you list as tuda-de at https://github.com/alphacep/vosk-api/blob/master/doc/models.md).

Should I be able to drop the pretrained model files into assets without doing anything to them? The file structures of the example model and the pretrained tuda-de model are different.

@nshmyrev
Collaborator

> Is that a requirement or a recommendation? Let's say I have a separate device that will run only one app for one purpose, if I succeed, so I believe I should be OK using a ~500 MB model.

It depends on the compute capabilities of your device. Not every Android device will be able to process audio in real time with a big model.

> Should I be able to drop the pretrained model files into assets without doing anything to them? The file structures of the example model and the pretrained tuda-de model are different.

You need to arrange the files in the same way; see https://github.com/alphacep/vosk-server/blob/1057086fa9e4dccaafd2c6ab0ceeca14e845c205/docker/Dockerfile.kaldi-de#L4 for details.
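
A minimal Python sketch of such a rearrangement, not a copy of the linked Dockerfile. The target names on the right are the ones that show up in the demo's asset log later in this thread (mfcc.conf, HCLG.fst, ivector/final.mat, and so on). Every source path on the left is an assumption about how the extracted archive is laid out and has to be checked against the actual files. `arrange_model` and `LAYOUT` are illustrative names.

```python
import shutil
from pathlib import Path

# Hypothetical mapping: path inside the extracted model archive -> path inside
# model-android/. The right-hand names come from the demo's log output; the
# left-hand paths are ASSUMPTIONS and must be verified against the archive.
LAYOUT = {
    "final.mdl": "final.mdl",
    "conf/mfcc_hires.conf": "mfcc.conf",  # the hires config is the one to use
    "graph/HCLG.fst": "HCLG.fst",
    "graph/words.txt": "words.txt",
    "graph/phones/word_boundary.int": "word_boundary.int",
    "ivector_extractor/final.mat": "ivector/final.mat",
    "ivector_extractor/final.dubm": "ivector/final.dubm",
    "ivector_extractor/final.ie": "ivector/final.ie",
    "ivector_extractor/global_cmvn.stats": "ivector/global_cmvn.stats",
    "ivector_extractor/online_cmvn.conf": "ivector/online_cmvn.conf",
    "ivector_extractor/splice.conf": "ivector/splice.conf",
}


def arrange_model(src_dir, dst_dir, layout=LAYOUT):
    """Copy files into the target layout; return the source paths that were missing."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    missing = []
    for src_rel, dst_rel in layout.items():
        src = src_dir / src_rel
        if not src.is_file():
            missing.append(src_rel)  # report instead of failing halfway through
            continue
        dst = dst_dir / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return missing
```

Calling `arrange_model("de_400k_nnet3chain_tdnn1f_2048_sp_bi", "model-android")` and printing the returned list shows which of the assumed source paths do not exist and need adjusting.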

@KLytvynenko

I have a problem loading de_400k_nnet3chain_tdnn1f_2048_sp_bi.
Steps:

  1. Copy all files from the de_400k_nnet3chain_tdnn1f_2048_sp_bi folder to the model-android folder
  2. Launch the Android app

Expected result:
The application starts and the buttons are active

Actual result:
Crash
Output:
2020-03-24 22:09:21.993 12372-12411/org.kaldi.demo D/!!!!: /storage/emulated/0/Android/data/org.kaldi.demo/files/sync
2020-03-24 22:09:21.994 12372-12411/org.kaldi.demo V/KaldiDemo: Cannot open config file: /storage/emulated/0/Android/data/org.kaldi.demo/files/sync/model-android/mfcc.conf
2020-03-24 22:09:21.995 12372-12411/org.kaldi.demo E/libc++abi: terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError
2020-03-24 22:09:21.995 12372-12411/org.kaldi.demo A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 12411 (AsyncTask #1), pid 12372 (org.kaldi.demo)

When I move the mfcc.conf file from conf/ to the parent folder, I get the next error: Error opening input stream /ivector/final.mat
I can fix that by moving this file from ivector_extractor to a new folder named ivector/

Then I get another error for another file and move that one as well, and in the end I managed to launch the app.
But after clicking the "Recognize Microphone" button it crashes:

V/KaldiDemo: Dimension mismatch: source features have dimension 91 and LDA #cols is 280
2020-03-24 22:20:33.072 12676-12676/org.kaldi.demo E/libc++abi: terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError
2020-03-24 22:20:33.072 12676-12676/org.kaldi.demo A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 12676 (org.kaldi.demo), pid 12676 (org.kaldi.demo)

Can you please help me?

  1. It seems that copying files around is not the right way to solve the problem
  2. How can I fix the final crash, "Dimension mismatch: source features have dimension 91 and LDA #cols is 280"?

Thank you very much. Very interesting project!

@nshmyrev
Collaborator

> When I move the mfcc.conf file from conf/ to the parent folder

It should be mfcc_hires.conf. I sent you the link above, you just need to follow it.
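
The numbers in the "Dimension mismatch" error are consistent with the wrong MFCC config being picked up. A rough check, under two assumptions that are not confirmed for this model: that features are spliced with 3 frames of context on each side (a common Kaldi setting), and that the non-hires config leaves `--num-ceps` at Kaldi's default of 13. `spliced_dim` is an illustrative helper, not a Kaldi function.

```python
def spliced_dim(num_ceps, left_context=3, right_context=3):
    """Feature dimension after splicing neighbouring frames together."""
    return num_ceps * (left_context + 1 + right_context)


# Assumed splice context of +/-3 frames; check ivector/splice.conf for the real values.
print(spliced_dim(13))  # 91: what 13-dimensional MFCCs produce
print(spliced_dim(40))  # 280: what --num-ceps=40 from mfcc_hires.conf produces
```

Under these assumptions, 91 input columns against an LDA matrix expecting 280 points to 13-dimensional features being fed to a model trained on 40-dimensional ones.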

@KLytvynenko

KLytvynenko commented Mar 24, 2020

Ok, I opened it:

# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why 
# we prefer this method.
--use-energy=false   # use average of log energy, not energy.
--num-mel-bins=40     # similar to Google's setup.
--num-ceps=40     # there is no dimensionality reduction.
--low-freq=20     # low cutoff frequency for mel bins... this is high-bandwidth data, so
                  # there might be some information at the low end.
--high-freq=-400 # high cutoff frequently, relative to Nyquist of 8000 (=7600) 

What should I change to get it to work?

@KLytvynenko

After running this script, it again complained about missing files. I moved them manually and then got the error:
Could not read symbol table from file /storage/emulated/0/Android/data/org.kaldi.demo/files/sync/model-android/words.txt

@KLytvynenko

This is the full output:
2020-03-24 22:43:56.369 14800-14800/? I/org.kaldi.demo: Not late-enabling -Xcheck:jni (already on)
2020-03-24 22:43:56.398 14800-14800/? E/org.kaldi.demo: Unknown bits set in runtime_flags: 0x8000
2020-03-24 22:43:56.400 14800-14800/? W/org.kaldi.demo: Unexpected CPU variant for X86 using defaults: x86_64
2020-03-24 22:43:56.597 14800-14830/org.kaldi.demo D/libEGL: Emulator has host GPU support, qemu.gles is set to 1.
2020-03-24 22:43:56.588 14800-14800/org.kaldi.demo W/RenderThread: type=1400 audit(0.0:61): avc: denied { write } for name="property_service" dev="tmpfs" ino=957 scontext=u:r:untrusted_app:s0:c114,c256,c512,c768 tcontext=u:object_r:property_socket:s0 tclass=sock_file permissive=0 app=org.kaldi.demo
2020-03-24 22:43:56.598 14800-14830/org.kaldi.demo W/libc: Unable to set property "qemu.gles" to "1": connection failed; errno=13 (Permission denied)
2020-03-24 22:43:56.632 14800-14830/org.kaldi.demo D/libEGL: loaded /vendor/lib64/egl/libEGL_emulation.so
2020-03-24 22:43:56.633 14800-14830/org.kaldi.demo D/libEGL: loaded /vendor/lib64/egl/libGLESv1_CM_emulation.so
2020-03-24 22:43:56.634 14800-14830/org.kaldi.demo D/libEGL: loaded /vendor/lib64/egl/libGLESv2_emulation.so
2020-03-24 22:43:56.721 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/ivector/final.dubm: checksums are equal
2020-03-24 22:43:56.721 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/ivector/splice.conf: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/ivector/final.mat: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/mfcc.conf: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/words.txt: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/word_boundary.int: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/ivector/global_cmvn.stats: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/ivector/online_cmvn.conf: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/mfcc_hires.conf: checksums are equal
2020-03-24 22:43:56.722 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/ivector/final.ie: checksums are equal
2020-03-24 22:43:56.723 14800-14836/org.kaldi.demo I/Assets: Skipping asset model-android/HCLG.fst: checksums are equal
2020-03-24 22:43:56.762 14800-14828/org.kaldi.demo W/OpenGLRenderer: Failed to choose config with EGL_SWAP_BEHAVIOR_PRESERVED, retrying without...
2020-03-24 22:43:56.763 14800-14828/org.kaldi.demo D/eglCodecCommon: setVertexArrayObject: set vao to 0 (0) 0 0
2020-03-24 22:43:56.763 14800-14828/org.kaldi.demo D/EGL_emulation: eglCreateContext: 0x71c9b872c860: maj 3 min 0 rcv 3
2020-03-24 22:43:56.764 14800-14828/org.kaldi.demo D/EGL_emulation: eglMakeCurrent: 0x71c9b872c860: ver 3 0 (tinfo 0x71c9b86903e0)
2020-03-24 22:43:56.792 14800-14828/org.kaldi.demo W/Gralloc3: mapper 3.x is not supported
2020-03-24 22:43:56.838 14800-14828/org.kaldi.demo D/EGL_emulation: eglMakeCurrent: 0x71c9b872c860: ver 3 0 (tinfo 0x71c9b86903e0)
2020-03-24 22:43:56.841 14800-14828/org.kaldi.demo D/eglCodecCommon: setVertexArrayObject: set vao to 0 (0) 1 0
2020-03-24 22:43:58.453 14800-14815/org.kaldi.demo W/System: A resource failed to call close.
2020-03-24 22:44:17.068 14800-14836/org.kaldi.demo I/Assets: Copying asset model-android/final.mdl to /storage/emulated/0/Android/data/org.kaldi.demo/files/sync/model-android/final.mdl
2020-03-24 22:44:17.080 14800-14836/org.kaldi.demo I/Assets: Removing asset /storage/emulated/0/Android/data/org.kaldi.demo/files/sync/model-android/ivector/final.mdl
2020-03-24 22:44:17.082 14800-14836/org.kaldi.demo D/!!!!: /storage/emulated/0/Android/data/org.kaldi.demo/files/sync
2020-03-24 22:44:17.156 14800-14836/org.kaldi.demo V/KaldiDemo: Computing derived variables for iVector extractor
2020-03-24 22:44:17.575 14800-14836/org.kaldi.demo V/KaldiDemo: Done.
2020-03-24 22:44:18.382 14800-14836/org.kaldi.demo V/KaldiDemo: Removed 1 orphan nodes.
2020-03-24 22:44:18.382 14800-14836/org.kaldi.demo V/KaldiDemo: Removing 2 orphan components.
2020-03-24 22:44:18.382 14800-14836/org.kaldi.demo V/KaldiDemo: Added 1 components, removed 2
2020-03-24 22:44:18.396 14800-14836/org.kaldi.demo V/KaldiDemo: Spent 0.00959492 seconds in looped compilation.
2020-03-24 22:45:15.710 14800-14836/org.kaldi.demo V/KaldiDemo: Could not read symbol table from file /storage/emulated/0/Android/data/org.kaldi.demo/files/sync/model-android/words.txt
2020-03-24 22:45:15.731 14800-14836/org.kaldi.demo E/libc++abi: terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError
2020-03-24 22:45:15.731 14800-14836/org.kaldi.demo A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 14836 (AsyncTask #1), pid 14800 (org.kaldi.demo)

And this is the file structure that I have:
(two screenshots of the model directory; images not preserved)

Can you help me with it?

@nshmyrev
Collaborator

You can monitor memory usage on your emulator. Most likely it is running out of memory.
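
As a first, rough indication of whether a model is likely to fit, the on-disk size of the model directory can be summed up and compared with the memory available on the device. This is only a sketch: on-disk size is a proxy, not a measurement of runtime memory use, and `model_size_mib` is an illustrative name.

```python
from pathlib import Path


def model_size_mib(model_dir):
    """Total size of all regular files under model_dir, in MiB."""
    total = sum(p.stat().st_size for p in Path(model_dir).rglob("*") if p.is_file())
    return total / (1024 * 1024)
```

For example, `print(model_size_mib("models/src/main/assets/sync/model-android"))` reports the size of the model bundled with the demo, where the path is the assets directory mentioned at the top of this thread.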

@KLytvynenko

You are right. Are there any requirements or statistics on this, or what does it depend on?
For instance:
An Android device with 4 GB of RAM, 16 GB of internal storage and so on -> can use a pretrained model of up to 600 MB (for instance)
How can I determine which model best fits which device?

I managed to launch alphacep-model-android-de-zamia-0.3, but the result is not good. It cannot recognize the simplest German phrases such as "hallo, Wie geht es dir?" or numbers. Maybe I did something wrong?

After multiple attempts to launch any of the models from https://github.com/uhh-lt/kaldi-tuda-de, the application behaves as if there is not enough memory.

I highly appreciate any help, thank you!

@nshmyrev
Collaborator

> I managed to launch alphacep-model-android-de-zamia-0.3, but the result is not good. It cannot recognize the simplest German phrases such as "hallo, Wie geht es dir?" or numbers. Maybe I did something wrong?

You need to test the accuracy with vosk-api python and prerecorded audio files first.
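
Once reference transcripts and recognizer output are available as plain text, accuracy can be quantified as word error rate (WER). A small self-contained sketch follows. The example sentences are made up for illustration and are not results from this thread, and punctuation is assumed to have been stripped beforehand.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# Made-up example: one of five reference words is substituted.
print(word_error_rate("hallo wie geht es dir", "hallo wie steht es dir"))  # 0.2
```

A WER near 1.0 across all test files, as opposed to a scattering of errors, usually suggests a systematic problem with the input or the setup rather than with the model itself.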

@KLytvynenko

So, after playing around for a day, the results are as follows (using prerecorded audio and pure speech):
alphacep-model-android-en-us-0.3.tar.gz - accuracy very good
alphacep-model-android-de-zamia-0.3.tar.gz - accuracy about 0%
any of the models from https://github.com/uhh-lt/kaldi-tuda-de - accuracy about 0%

FYI: the results are the same on Android devices/emulators and with vosk-api.

Prerecorded audio files used:
Archive.zip

My mfcc.conf for German:
--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
# there might be some information at the low end.
--high-freq=-400 # high cutoff frequently, relative to Nyquist of 8000 (=7600)

@nshmyrev can you help me please?

@nshmyrev
Collaborator

Audio should be 16 kHz, 16-bit PCM, mono. Your first file is stereo, and the second is an MP4.
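
A quick way to check a file against these requirements before feeding it to the recognizer, sketched with Python's standard `wave` module. It only reads PCM WAV containers; other formats such as MP4 are reported as unreadable and need converting first. The function name `check_wav` is illustrative.

```python
import wave


def check_wav(path, rate=16000, channels=1, sample_width=2):
    """Return a list of problems; an empty list means 16 kHz, 16-bit, mono PCM WAV."""
    try:
        with wave.open(str(path), "rb") as wf:
            problems = []
            if wf.getnchannels() != channels:
                problems.append(f"channels: {wf.getnchannels()} (want {channels})")
            if wf.getsampwidth() != sample_width:
                problems.append(f"sample width: {wf.getsampwidth()} bytes (want {sample_width})")
            if wf.getframerate() != rate:
                problems.append(f"sample rate: {wf.getframerate()} Hz (want {rate})")
            return problems
    except (wave.Error, EOFError) as exc:
        # Raised for files that are not PCM WAV at all (e.g. MP4) or are truncated.
        return [f"not a readable PCM WAV file: {exc}"]
```

Running `check_wav` over every test recording and printing the non-empty results catches stereo, wrong-rate, and non-WAV files in one pass.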

@KLytvynenko

Profit! Thank you very much!

@nshmyrev
Collaborator

We have new German models now, more accurate than external ones.

@prvit
Author

prvit commented Dec 14, 2020

@nshmyrev Great news! Could you please share some details about how you preprocess it? As I understand from the README, it's based on the biggest pretrained kaldi-tuda-de model. I was also using that one with vosk by changing the initial structure, moving some directories, etc., so that vosk accepts it. But yours has new files, like G.carpa and G.fst in the rescore folder and frame_subsampling_factor and tree in the am folder, while I had just a final.mdl in that one.
