Verify alphabet in pb* and tflite models #4

JRMeyer · 2021-06-24T08:50:12Z

The alphabet files from Jaco models are inconsistent with the output of the models at runtime. It has been observed that the Jaco Spanish model can produce accented vowels, but the alphabet file does not include them. The alphabet file should be confirmed and uploaded to the zoo for language model generation.

TFModelState::init and TFLiteModelState::init can be modified to print out the loaded alphabet used to train the model here: https://github.com/coqui-ai/STT/blob/653ce25a7ce5bd6cbb564416d847d8afcd5c5e8c/native_client/tfmodelstate.cc#L120
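Alongside printing the alphabet from the native client, the mismatch can also be checked from the outside by comparing an alphabet file against transcripts the model actually produced. The following is a minimal sketch, not part of the STT code base. The helper names `parse_alphabet` and `missing_characters` are hypothetical, and the file format (one label per line, `#` for comment lines, `\#` for a literal `#`) is assumed to match the usual STT alphabet files.

```python
import unicodedata


def parse_alphabet(text):
    """Parse the contents of an alphabet file into a list of labels.

    Assumed format: one label per line, lines starting with '#' are
    comments, and '\\#' stands for a literal '#'.
    """
    labels = []
    for line in text.split("\n"):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if line.startswith("\\#"):
            labels.append("#")
        elif line.startswith("#") or line == "":
            continue  # skip comments and empty lines
        else:
            labels.append(unicodedata.normalize("NFC", line))
    return labels


def missing_characters(labels, transcripts):
    """Return the sorted characters seen in transcripts but absent from labels."""
    known = set(labels)
    missing = set()
    for transcript in transcripts:
        # Normalize so that composed and decomposed accents compare equal.
        for ch in unicodedata.normalize("NFC", transcript):
            if ch not in known:
                missing.add(ch)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical alphabet without accented vowels, and one model output.
    alphabet_text = "# sample alphabet\n \na\nc\ni\nm\nn\no\n"
    labels = parse_alphabet(alphabet_text)
    print(missing_characters(labels, ["cami\u00f3n"]))
```

An empty result means every character in the transcripts is covered by the alphabet file. Any character reported here, such as an accented vowel, indicates that the file does not match what the model emits.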


mariano-balto · 2022-09-08T19:57:37Z

The above could be the cause of the problem we are seeing in a dockerized ARM environment when using the Jaco models for Spanish with the Python (3.9) bindings.

coqui-ai/STT#2284

zuazo · 2023-01-27T13:24:40Z

The correct alphabet files seem to be the following: https://gitlab.com/Jaco-Assistant/Scribosermo/-/tree/deepspeech/data

As you said, the Spanish alphabet includes accented vowels. The same goes for other languages' alphabets, such as French and Polish.

JRMeyer self-assigned this Jun 24, 2021
