Cannot integrate custom en-in AM and custom LM: ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored #146

MuruganR96 · 2018-09-22T03:46:35Z

Problem

Sir, I cannot get a custom en-in acoustic model (adapted from the default Indian English acoustic model) to work together with a custom language model.
I downloaded an acoustic model from this link:
I followed the instructions at this link: https://cmusphinx.github.io/wiki/tutorialam/

sphinx_fe -argfile en_in/feat.params -samprate 16000 -c audio.fileids -di . -do . -ei wav -eo mfc -mswav yes
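The sphinx_fe step (-c audio.fileids) and the later bw step (-lsnfn audio.transcription) both depend on two list files. The following is a minimal sketch of generating them, assuming the format shown in the adaptation tutorial: one wav basename per line in the .fileids file, and one "<s> words </s> (basename)" line per utterance in the .transcription file. The helper name and the example utterances are hypothetical.

```python
def build_training_lists(pairs):
    """Build the contents of audio.fileids and audio.transcription.

    pairs: iterable of (utterance_id, transcript) tuples, where utterance_id
    is the wav file name without the ".wav" extension (sphinx_fe adds the
    extension itself because of -ei wav).
    Returns a (fileids_text, transcription_text) tuple.
    """
    fileids = []
    transcriptions = []
    for utt_id, text in pairs:
        fileids.append(utt_id)
        # Collapse repeated whitespace; every word must be spelled exactly as
        # in the dictionary given to bw via -dictfn.
        words = " ".join(text.split())
        transcriptions.append(f"<s> {words} </s> ({utt_id})")
    return "\n".join(fileids) + "\n", "\n".join(transcriptions) + "\n"


if __name__ == "__main__":
    # Hypothetical utterances; replace them with your own recordings and text.
    fileids_text, transcription_text = build_training_lists(
        [("38", "open a savings account"), ("39", "check my balance")]
    )
    print(fileids_text, end="")
    print(transcription_text, end="")
```

Writing the two strings to audio.fileids and audio.transcription gives files in the layout the tutorial commands expect. Words that are missing from en_in.dic will still cause bw to complain, so the transcripts should use only dictionary words.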

pocketsphinx_mdef_convert -text en_in/mdef en_in/mdef.txt

cp -a /usr/local/libexec/sphinxtrain/bw .
cp -a /usr/local/libexec/sphinxtrain/mk_s2sendump .
cp -a /usr/local/libexec/sphinxtrain/map_adapt .
cp -a /usr/local/libexec/sphinxtrain/mllr_solve .

./bw \
  -hmmdir en_in \
  -moddeffn en_in/mdef.txt \
  -ts2cbfn .cont. \
  -feat 1s_c_d_dd \
  -cmn current \
  -agc none \
  -dictfn en_in.dic \
  -ctlfn audio.fileids \
  -lsnfn audio.transcription \
  -accumdir .

./mllr_solve \
  -meanfn en_in/means \
  -varfn en_in/variances \
  -outmllrfn mllr_matrix \
  -accumdir .

cp -a en_in en_in_own

./map_adapt \
  -moddeffn en_in/mdef.txt \
  -ts2cbfn .cont. \
  -meanfn en_in/means \
  -varfn en_in/variances \
  -mixwfn en_in/mixture_weights \
  -tmatfn en_in/transition_matrices \
  -accumdir . \
  -mapmeanfn en_in_own/means \
  -mapvarfn en_in_own/variances \
  -mapmixwfn en_in_own/mixture_weights \
  -maptmatfn en_in_own/transition_matrices

./mk_s2sendump \
  -pocketsphinx yes \
  -moddeffn en_in_own/mdef.txt \
  -mixwfn en_in_own/mixture_weights \
  -sendumpfn en_in_own/sendump

pocketsphinx_continuous -hmm en_in_own -lm en-us.lm.bin -dict en_in.dic -infile 38.wav > 4.txt

This works, but it does not recognize certain words, which are specific to the banking sector. So I built my own language model with the language model tool ("Building a simple language model using a web service").

Own language model files: lm.dict and lm.bin
Transcript file: own_vocab.txt

sphinx_lm_convert -i own.lm -o own.lm.bin
sphinx_lm_convert -i own.lm.bin -ifmt bin -o own.lm -ofmt arpa

pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic

Sir, this works fine and recognizes those particular words. But one thing confuses me:

which default acoustic model is used when I run "pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic"?

But when I combine the two (my AM and my LM) and run:

pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 1.wav > result_own.txt

it does not return any words and shows errors: the phones used by the dictionary words are not present in the AM.

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

I think I have identified the issue. The phones used in own.dic (EH, EY, AH, AE) are also present in the en_in acoustic model (the Indian English mdef phones), but there they are in lowercase (en_in/mdef file).
But in other English mdef phone sets, such as wsj_all_cd30.mllt_cd_cont_4000 and hub4_cd_continuous_8gau_1s_c_d_dd:
Column definitions:
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
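To see exactly which phone names an acoustic model defines, the context-independent rows of the text mdef can be collected. In the excerpt above these are the rows with "-" in the lft, rt and p columns. This is a minimal sketch under that assumption; the function name is hypothetical, and the sample string simply reuses the rows shown above plus one triphone row.

```python
def base_phones(mdef_text):
    """Return the set of context-independent (base) phones in a text mdef."""
    phones = set()
    for line in mdef_text.splitlines():
        fields = line.split()
        # Skip blank lines, '#' comments and short numeric header lines
        # such as "0.3" or "5 n_base".
        if not fields or fields[0].startswith("#") or len(fields) < 7:
            continue
        base, left, right, pos = fields[:4]
        # Base-phone rows have no left/right context and no word position.
        if left == "-" and right == "-" and pos == "-":
            phones.add(base)
    return phones


SAMPLE_MDEF = """0.3
5 n_base
1 n_tri
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
ia f aa s n/a 20 2023 2038 2063 N
"""

if __name__ == "__main__":
    print(sorted(base_phones(SAMPLE_MDEF)))  # ['SIL', 'UNK', 'aa', 'ae', 'ah']
```

For a real model, pass the contents of the file produced by pocketsphinx_mdef_convert, for example open("en_in/mdef.txt").read(). Phone names are compared as exact strings, so "ae" and "AE" count as different phones.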

I tried converting the phones in own.dic to lowercase, but it still did not work with both the AM and the LM.
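One way to check whether lowercasing is enough is to compare every phone in own.dic against the acoustic model's base phone set before running the decoder. The following is a minimal sketch, assuming the dictionary format shown in this issue (word followed by space-separated phones) and a set of base phones taken from en_in/mdef.txt. The function name and the phone set in the example are hypothetical. Unlike the dict.c log, which names only one phone per ignored word, this lists every unmatched phone.

```python
def check_dictionary(dict_text, am_phones, lowercase=False):
    """Return {word: [phones not found in am_phones]} for a dictionary.

    dict_text: dictionary contents, one "word PH1 PH2 ..." entry per line.
    am_phones: set of base phone names defined by the acoustic model.
    lowercase: if True, lowercase each phone before looking it up.
    """
    missing = {}
    for line in dict_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        word, phones = fields[0], fields[1:]
        if lowercase:
            phones = [p.lower() for p in phones]
        unmatched = [p for p in phones if p not in am_phones]
        if unmatched:
            missing[word] = unmatched
    return missing


if __name__ == "__main__":
    # Hypothetical lowercase phone set; read the real one from en_in/mdef.txt.
    am_phones = {"SIL", "aa", "ae", "ah", "aw", "b", "l", "t"}
    own_dic = "about AH B AW T\nabsolutely AE B S AH L UW T L IY\n"
    print(check_dictionary(own_dic, am_phones))
    print(check_dictionary(own_dic, am_phones, lowercase=True))
```

If the second call still reports phones, the en_in phone set differs from the CMUdict-style set in more than letter case, and lowercasing alone will not make the dictionary usable with en_in_own.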

Basically, the LM tool produces words and phones in this format, which does not match the acoustic model; the two are not in sync.

I also tried another way to create own.lm.bin and own.dic.

Building the LM another way:
text2wfreq < own_vocab.txt | wfreq2vocab > own_vocab.tmp.vocab

text2idngram -vocab own_vocab.tmp.vocab -idngram own_vocab.idngram < own_vocab.txt

idngram2lm -vocab_type 0 -idngram own_vocab.idngram -vocab own_vocab.tmp.vocab -arpa own.lm

sphinx_lm_convert -i own.lm -o own.lm.bin
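The first pipeline step (text2wfreq piped into wfreq2vocab) counts word frequencies in the transcript and keeps the most frequent words as the vocabulary. The following is a rough Python analogue for checking which words end up in own_vocab.tmp.vocab; it is a sketch, not the CMUCLMTK implementation. The top parameter is only loosely modeled on the wfreq2vocab option of the same name, and the alphabetical output order is an assumption of this sketch.

```python
from collections import Counter


def build_vocab(corpus_text, top=20000):
    """Count whitespace-separated tokens and keep the `top` most frequent.

    Returns the kept words sorted alphabetically. Ties at the cutoff are
    broken by first occurrence, as Counter.most_common does.
    """
    counts = Counter(corpus_text.split())
    kept = [word for word, _ in counts.most_common(top)]
    return sorted(kept)


if __name__ == "__main__":
    corpus = "<s> open account </s>\n<s> close account </s>\n"
    print(build_vocab(corpus))
```

Note that sentence markers such as <s> and </s> in the transcript are counted like any other token, so they appear in the vocabulary as well.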

Building own.dic another way:
I followed these links: &

g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq/g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic
pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt
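The g2p-seq2seq output shown further down in this issue contains the same entry twice ("s EH S"). The following is a minimal sketch that removes exact duplicates and labels genuinely different pronunciations with the CMUdict-style word(2), word(3) suffixes already used in the LM tool's output (for example A(2)). The function name is hypothetical, and it assumes one "word PH1 PH2 ..." entry per line with no existing (n) suffixes.

```python
def clean_dictionary(lines):
    """Drop exact duplicate entries; number alternate pronunciations.

    The first pronunciation of a word keeps the bare word; later, different
    pronunciations become word(2), word(3), ...
    """
    kept = {}  # word -> pronunciations kept so far, in order
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        word, pron = fields[0], " ".join(fields[1:])
        prons = kept.setdefault(word, [])
        if pron in prons:
            continue  # exact duplicate, such as the repeated "s EH S"
        prons.append(pron)
        label = word if len(prons) == 1 else f"{word}({len(prons)})"
        out.append(f"{label} {pron}")
    return out


if __name__ == "__main__":
    g2p_output = ["s EH S", "s EH S", "a EY", "a AH", "able EY B AH L"]
    print("\n".join(clean_dictionary(g2p_output)))
```

This only tidies the dictionary; it does not change the phone names, so the phone-set mismatch with en_in_own remains.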

This works fine and recognizes those particular words, but the same confusion remains:

which acoustic model is used when I run "pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt"?

But when I combine the two (my AM and my LM) and run:
pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt

Again it returns the same error and does not display any text. The error log is:

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

Dictionary (word-phone) format produced by the LM tool:
A AH
A(2) EY
ABLE EY B AH L
ABOUT AH B AW T
ABSOLUTELY AE B S AH L UW T L IY

Dictionary (word-phone) format produced by g2p-seq2seq:
s EH S
s EH S
a EY
able EY B AH L
about AH B AW T
absolutely AE B S AH L UW T L IY

en_in_own mdef phone structure:
ia f aa s n/a 20 2023 2038 2063 N
ia f ae e n/a 20 2023 2038 2063 N
ia f ae s n/a 20 2023 2038 2063 N
ia f ah e n/a 20 2023 2038 2063 N
ia f ah s n/a 20 2023 2038 2063 N
ia f ao e n/a 20 2023 2038 2063 N
ia f ao s n/a 20 2023 2038 2063 N
ia f aw e n/a 20 2023 2038 2063 N
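Each row above follows the column definitions quoted earlier (base, lft, rt, p, attrib, tmat, then state ids). The following is a minimal sketch for reading such rows, assuming the usual mdef word-position codes (b = word begin, e = word end, i = word internal, s = single-phone word) and treating the trailing "N" as the end of the state list. The type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple

# Word-position codes in the "p" column of a text mdef.
POSITIONS = {
    "b": "word-begin",
    "e": "word-end",
    "i": "word-internal",
    "s": "single-phone word",
    "-": "context-independent",
}


class MdefRow(NamedTuple):
    base: str
    left: str
    right: str
    position: str
    attrib: str
    tmat: int
    states: Tuple[int, ...]


def parse_mdef_row(line):
    """Split one mdef row into named fields."""
    fields = line.split()
    base, left, right, pos, attrib, tmat = fields[:6]
    # State ids follow the tmat column; the trailing "N" closes the list.
    states = tuple(int(s) for s in fields[6:] if s != "N")
    return MdefRow(base, left, right, pos, attrib, int(tmat), states)


if __name__ == "__main__":
    row = parse_mdef_row("ia f aa s n/a 20 2023 2038 2063 N")
    print(row.base, row.left, row.right, POSITIONS[row.position], row.states)
```

Read this way, every phone name in these rows (base and context alike) is lowercase, which is what the uppercase phones in own.dic are compared against.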

Is the lowercase really the issue or not? I was not able to figure it out.

Sir, how can I fix this issue?
I cannot integrate the custom en-in AM and custom LM: ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

OS: Linux, version 16.04
Python3:
Sphinx version: PocketSphinx 5prealpha

nshmyrev · 2018-09-22T06:42:17Z

Same as cmusphinx/pocketsphinx-python#47

nshmyrev closed this as completed Sep 22, 2018
