Cannot integrate custom en-in AM with custom LM: ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored #146

Closed
MuruganR96 opened this issue Sep 22, 2018 · 1 comment

@MuruganR96

Problem

Sir, I could not integrate a custom en-in acoustic model (the default Indian English acoustic model, adapted) with a custom language model.
I downloaded an acoustic model from this link:
I followed the instructions at this link: https://cmusphinx.github.io/wiki/tutorialam/

sphinx_fe -argfile en_in/feat.params -samprate 16000 -c audio.fileids -di . -do . -ei wav -eo mfc -mswav yes

pocketsphinx_mdef_convert -text en_in/mdef en_in/mdef.txt

cp -a /usr/local/libexec/sphinxtrain/bw .
cp -a /usr/local/libexec/sphinxtrain/mk_s2sendump .
cp -a /usr/local/libexec/sphinxtrain/map_adapt .
cp -a /usr/local/libexec/sphinxtrain/mllr_solve .

./bw \
 -hmmdir en_in \
 -moddeffn en_in/mdef.txt \
 -ts2cbfn .cont. \
 -feat 1s_c_d_dd \
 -cmn current \
 -agc none \
 -dictfn en_in.dic \
 -ctlfn audio.fileids \
 -lsnfn audio.transcription \
 -accumdir .

./mllr_solve \
 -meanfn en_in/means \
 -varfn en_in/variances \
 -outmllrfn mllr_matrix -accumdir .

cp -a en_in en_in_own

./map_adapt \
 -moddeffn en_in/mdef.txt \
 -ts2cbfn .cont. \
 -meanfn en_in/means \
 -varfn en_in/variances \
 -mixwfn en_in/mixture_weights \
 -tmatfn en_in/transition_matrices \
 -accumdir . \
 -mapmeanfn en_in_own/means \
 -mapvarfn en_in_own/variances \
 -mapmixwfn en_in_own/mixture_weights \
 -maptmatfn en_in_own/transition_matrices

./mk_s2sendump \
 -pocketsphinx yes \
 -moddeffn en_in_own/mdef.txt \
 -mixwfn en_in_own/mixture_weights \
 -sendumpfn en_in_own/sendump

pocketsphinx_continuous -hmm en_in_own -lm en-us.lm.bin -dict en_in.dic -infile 38.wav > 4.txt

This works, but it does not recognize certain words that are specific to the banking sector, so I built my own language model with the language model build tool (see "Building a simple language model using a web service").

own language model: lm.dict & lm.bin:
transcript file: own_vocab.txt

sphinx_lm_convert -i own.lm -o own.lm.bin
sphinx_lm_convert -i own.lm.bin -ifmt bin -o own.lm -ofmt arpa

pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic

Sir, this works fine and detects those particular words. But I have one point of confusion:

Which default acoustic model does the command "pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic" use?

When I combine the two, the adapted AM and the custom LM, and run:

pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 1.wav > result_own.txt

it does not return any words and shows errors: phones used in the LM dictionary are reported as not present in the AM.

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

I think I have identified the issue, though. The phones used in own.dic (EH, EY, AH, AE) are all present in the en_in acoustic model (the Indian English mdef phones, see the en_in/mdef file), but in lower case, unlike other English models such as wsj_all_cd30.mllt_cd_cont_4000 and hub4_cd_continuous_8gau_1s_c_d_dd.
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
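A quick way to confirm a case mismatch like this is to compare the phones used in the dictionary against the context-independent phones listed in the text mdef. The following is a minimal sketch, not part of PocketSphinx: the helper names `mdef_base_phones` and `missing_phones` are hypothetical, the inline samples are copied from this report, and it assumes phone names are compared case-sensitively, as the error messages suggest.

```python
def mdef_base_phones(mdef_text):
    """Collect context-independent phone names from a text-format mdef."""
    phones = set()
    for line in mdef_text.splitlines():
        fields = line.split()
        # Skip comments and short header lines.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        # Base phones carry "-" in the lft, rt and p columns.
        if fields[1:4] == ["-", "-", "-"]:
            phones.add(fields[0])
    return phones


def missing_phones(dict_text, model_phones):
    """Map each dictionary phone absent from the model to the words using it."""
    missing = {}
    for line in dict_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        word, phones = fields[0], fields[1:]
        for phone in phones:
            if phone not in model_phones:
                missing.setdefault(phone, []).append(word)
    return missing


# Samples taken from this report (the mdef excerpt is partial).
MDEF_SAMPLE = """\
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
"""
DICT_SAMPLE = """\
a EY
about AH B AW T
absolutely AE B S AH L UW T L IY
"""

model = mdef_base_phones(MDEF_SAMPLE)
report = missing_phones(DICT_SAMPLE, model)
for phone, words in sorted(report.items()):
    print(phone, "not in model; used by:", ", ".join(words))
```

Against the real files, the two text arguments would be the contents of en_in/mdef.txt and own.dic. Because the mdef excerpt here is partial, phones such as B or T are also reported as missing in this toy run.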

I tried converting the phones in own.dic to lower case, but that did not make the AM and LM work together.
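For reference, the kind of conversion meant here can be sketched as follows. This is an illustration under stated assumptions rather than a confirmed fix: `lowercase_phones` is a hypothetical helper, it lower-cases only the phone columns and leaves the word column (including alternate markers such as `A(2)`) untouched so the words still match the language model, and the sample entries come from the dictionary shown in this report.

```python
def lowercase_phones(dict_text):
    """Lower-case the phone columns of a dictionary, keeping words as-is."""
    converted = []
    for line in dict_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # drop blank lines
        word, phones = fields[0], fields[1:]
        converted.append(" ".join([word] + [p.lower() for p in phones]))
    return "\n".join(converted) + "\n"


# Entries in the format produced by the LM tool.
SAMPLE = """\
A AH
A(2) EY
ABLE EY B AH L
"""

print(lowercase_phones(SAMPLE), end="")
```

If the words themselves were lower-cased as well, they would no longer match an LM whose vocabulary is in upper case, which is one possible reason a blanket case conversion has no effect.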

Basically, the LM tool produces words and phones in this form, and that conflicts with the acoustic model; the two are not in sync.

I also tried another way of creating own.lm.bin and own.dic.

Building the LM another way:
text2wfreq < own_vocab.txt | wfreq2vocab > own_vocab.tmp.vocab

text2idngram -vocab own_vocab.tmp.vocab -idngram own_vocab.idngram < own_vocab.txt

idngram2lm -vocab_type 0 -idngram own_vocab.idngram -vocab own_vocab.tmp.vocab -arpa own.lm

sphinx_lm_convert -i own.lm -o own.lm.bin

Building own.dic another way:
I followed these links: &

g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq/g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic
pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt

This works fine and recognizes those particular words, but my confusion is the same:

Which acoustic model is used when I run the command "pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt"?

When I combine the two, the adapted AM and the custom LM, and run:
pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt

it returns the same error again and does not display any text. The error log is:

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

Dictionary (word and phones) format produced by the LM tool:
A AH
A(2) EY
ABLE EY B AH L
ABOUT AH B AW T
ABSOLUTELY AE B S AH L UW T L IY

Dictionary (word and phones) format produced by g2p-seq2seq:
s EH S
s EH S
a EY
able EY B AH L
about AH B AW T
absolutely AE B S AH L UW T L IY
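The g2p-seq2seq output above lists the entry `s EH S` twice. A minimal sketch for dropping exact duplicate entries while keeping the original order is shown here; `drop_duplicate_entries` is a hypothetical helper, and the sketch makes no claim about whether PocketSphinx tolerates the repeated line.

```python
def drop_duplicate_entries(dict_text):
    """Return dictionary lines with exact repeats removed, order preserved."""
    seen = set()
    kept = []
    for line in dict_text.splitlines():
        entry = tuple(line.split())  # normalizes spacing between columns
        if not entry or entry in seen:
            continue
        seen.add(entry)
        kept.append(" ".join(entry))
    return kept


# First lines of the g2p-seq2seq dictionary from this report.
G2P_SAMPLE = """\
s EH S
s EH S
a EY
able EY B AH L
"""

for entry in drop_duplicate_entries(G2P_SAMPLE):
    print(entry)
```

Only entries that match in both word and pronunciation are dropped, so genuine alternate pronunciations of the same word are kept.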

en_in_own mdef phones structure:
ia f aa s n/a 20 2023 2038 2063 N
ia f ae e n/a 20 2023 2038 2063 N
ia f ae s n/a 20 2023 2038 2063 N
ia f ah e n/a 20 2023 2038 2063 N
ia f ah s n/a 20 2023 2038 2063 N
ia f ao e n/a 20 2023 2038 2063 N
ia f ao s n/a 20 2023 2038 2063 N
ia f aw e n/a 20 2023 2038 2063 N

Is the lower case really the issue or not? I have not been able to work it out.

Sir, how can I fix this issue?
I could not integrate the custom en-in AM with the custom LM: ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

  • OS: Linux with version 16.04
  • Python3:
  • Sphinx version:
    PocketSphinx 5prealpha
@nshmyrev
Contributor

Same as cmusphinx/pocketsphinx-python#47
