Can't integrate custom en-in AM and custom LM, ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored #47

MuruganR96 · 2018-09-22T03:47:41Z

Problem

Sir, I could not integrate my custom en-in acoustic model (the default Indian English acoustic model, adapted) with my custom language model.
I downloaded an acoustic model from this link:
I followed the instructions at this link: https://cmusphinx.github.io/wiki/tutorialam/

sphinx_fe -argfile en_in/feat.params -samprate 16000 -c audio.fileids -di . -do . -ei wav -eo mfc -mswav yes

pocketsphinx_mdef_convert -text en_in/mdef en_in/mdef.txt

cp -a /usr/local/libexec/sphinxtrain/bw .
cp -a /usr/local/libexec/sphinxtrain/mk_s2sendump .
cp -a /usr/local/libexec/sphinxtrain/map_adapt .
cp -a /usr/local/libexec/sphinxtrain/mllr_solve .

./bw \
  -hmmdir en_in \
  -moddeffn en_in/mdef.txt \
  -ts2cbfn .cont. \
  -feat 1s_c_d_dd \
  -cmn current \
  -agc none \
  -dictfn en_in.dic \
  -ctlfn audio.fileids \
  -lsnfn audio.transcription \
  -accumdir .

./mllr_solve \
  -meanfn en_in/means \
  -varfn en_in/variances \
  -outmllrfn mllr_matrix -accumdir .

cp -a en_in en_in_own

./map_adapt \
  -moddeffn en_in/mdef.txt \
  -ts2cbfn .cont. \
  -meanfn en_in/means \
  -varfn en_in/variances \
  -mixwfn en_in/mixture_weights \
  -tmatfn en_in/transition_matrices \
  -accumdir . \
  -mapmeanfn en_in_own/means \
  -mapvarfn en_in_own/variances \
  -mapmixwfn en_in_own/mixture_weights \
  -maptmatfn en_in_own/transition_matrices

./mk_s2sendump \
  -pocketsphinx yes \
  -moddeffn en_in_own/mdef.txt \
  -mixwfn en_in_own/mixture_weights \
  -sendumpfn en_in_own/sendump

pocketsphinx_continuous -hmm en_in_own -lm en-us.lm.bin -dict en_in.dic -infile 38.wav > 4.txt

It works, but it does not recognize certain words that are specific to the banking sector. So I built my own language model using the language model build tool (Building a simple language model using a web service).

Own language model: lm.dict & lm.bin
Transcript file: own_vocab.txt

sphinx_lm_convert -i own.lm -o own.lm.bin
sphinx_lm_convert -i own.lm.bin -ifmt bin -o own.lm -ofmt arpa

pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic

Sir, it works fine and detects those particular words. But I have one point of confusion:

Which default acoustic model does the command "pocketsphinx_continuous -inmic yes -lm own.lm.bin -dict own.dic" use?

But when I combine the two, the adapted AM and the custom LM, and run:

pocketsphinx_continuous -hmm en_in_own -lm own.lm.bin -dict own.dic -infile 1.wav > result_own.txt

It did not return any words, and it showed errors: phones used in the LM dictionary are not present in the AM.

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

I think I have partly identified the issue. The phones in own.dic (EH, EY, AH, AE) are all present in the en_in acoustic model (the Indian English mdef phones), but in lowercase (en_in/mdef file), unlike the mdef phones of other English models such as wsj_all_cd30.mllt_cd_cont_4000 and hub4_cd_continuous_8gau_1s_c_d_dd.
# Columns definitions
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
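
To see exactly which context-independent phone names a model defines, the text mdef can be scanned for rows whose left context, right context, and position columns are all `-`. The following is a minimal sketch, assuming the column layout shown in the excerpt above. `base_phones` is a hypothetical helper, not part of any Sphinx tool, and the sample text is copied from the excerpt rather than read from `en_in/mdef.txt`.

```python
def base_phones(mdef_text):
    """Return the set of context-independent phone names in a text-format mdef.

    Assumes each model row looks like: base lft rt p attrib tmat ... N
    and that context-independent rows carry "-" in the lft, rt and p columns.
    """
    phones = set()
    for line in mdef_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0].startswith("#"):
            continue  # skip comments, counts and blank lines
        if fields[1:4] == ["-", "-", "-"]:
            phones.add(fields[0])
    return phones


# Sample copied from the mdef excerpt in this issue (not the full file).
sample = """\
#base lft rt p attrib tmat ... state id's ...
SIL - - - filler 0 0 1 2 N
UNK - - - n/a 1 3 4 5 N
aa - - - n/a 2 6 7 8 N
ae - - - n/a 3 9 10 11 N
ah - - - n/a 4 12 13 14 N
"""

print(sorted(base_phones(sample)))
```

For a real model, the same function could be applied to the contents of the converted `mdef.txt`; the names it returns are the ones the dictionary's phone symbols would have to match.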

I tried converting the phones in own.dic to lowercase, but it had no effect with either the AM or the LM.
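
For reference, the case conversion described here could be sketched as below. This is only an illustration: `lowercase_phones` is a hypothetical helper, and it changes nothing but the case of the phone symbols. If the Indian English model uses a different phone inventory, and not just different casing, words would presumably still be ignored after this step.

```python
def lowercase_phones(dict_text):
    """Lowercase the phone symbols of each dictionary line, leaving the word as-is."""
    out_lines = []
    for line in dict_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0], fields[1:]
        out_lines.append(" ".join([word] + [p.lower() for p in phones]))
    return "\n".join(out_lines) + "\n"


# Entries copied from the LM-tool dictionary shown in this issue.
sample = "A AH\nA(2) EY\nABLE EY B AH L\n"
print(lowercase_phones(sample), end="")
```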

Basically, the LM tool produces words and phones in this structure, which does not match the acoustic model. The two are not in sync.

I also tried another way of creating own.lm.bin and own.dic.

Building the LM another way:
text2wfreq < own_vocab.txt | wfreq2vocab > own_vocab.tmp.vocab

text2idngram -vocab own_vocab.tmp.vocab -idngram own_vocab.idngram < own_vocab.txt

idngram2lm -vocab_type 0 -idngram own_vocab.idngram -vocab own_vocab.tmp.vocab -arpa own.lm

sphinx_lm_convert -i own.lm -o own.lm.bin

Building own.dic another way:
I followed these links: &

g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq/g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic
pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt

It works fine and recognizes those particular words, but my confusion is:

Which acoustic model is used when running the command "pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt"?

But when I combine the two, the adapted AM and the custom LM, and run:
pocketsphinx_continuous -lm own.lm.bin -dict own.dic -infile 10.wav > 10.txt -hmm en_in_own

Again it returned the same error and did not output any text. The error log is:

INFO: dict.c(333): Reading main dictionary: lm_model_resources/other/own.dic
ERROR: "dict.c", line 195: Line 5: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 6: Phone 'EH' is mising in the acoustic model; word 's' ignored
ERROR: "dict.c", line 195: Line 7: Phone 'EY' is mising in the acoustic model; word 'a' ignored
ERROR: "dict.c", line 195: Line 8: Phone 'EY' is mising in the acoustic model; word 'able' ignored
ERROR: "dict.c", line 195: Line 9: Phone 'AH' is mising in the acoustic model; word 'about' ignored
ERROR: "dict.c", line 195: Line 10: Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

Dictionary (word-phone) format produced by the LM tool:
A AH
A(2) EY
ABLE EY B AH L
ABOUT AH B AW T
ABSOLUTELY AE B S AH L UW T L IY

Dictionary (word-phone) format produced by g2p-seq2seq:
s EH S
s EH S
a EY
able EY B AH L
about AH B AW T
absolutely AE B S AH L UW T L IY
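
One way to check a dictionary against a model before decoding is to list every phone symbol that has no exact match in the model's phone set. The sketch below assumes the lookup is an exact, case-sensitive string match, which is consistent with the errors above but is not confirmed here. `missing_phones` is a hypothetical helper, and `model_phones` is a small hand-typed subset taken from the mdef excerpt in this issue, not the full en_in phone set.

```python
def missing_phones(dict_text, model_phones):
    """Map each phone absent from model_phones to the dictionary words using it."""
    missing = {}
    for line in dict_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0], fields[1:]
        for phone in phones:
            # Exact comparison: "AE" and "ae" are treated as different symbols.
            if phone not in model_phones and word not in missing.get(phone, []):
                missing.setdefault(phone, []).append(word)
    return missing


# Assumption: partial phone set, copied from the mdef excerpt in this issue.
model_phones = {"SIL", "UNK", "aa", "ae", "ah"}

# Entries copied from the g2p-seq2seq dictionary shown above.
dict_text = """\
a EY
about AH B AW T
absolutely AE B S AH L UW T L IY
"""

report = missing_phones(dict_text, model_phones)
for phone in sorted(report):
    print(phone, "used by", ", ".join(report[phone]))
```

An empty report would mean every symbol in the dictionary has an exact counterpart in the supplied phone set.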

en_in_own mdef phones structure:
ia f aa s n/a 20 2023 2038 2063 N
ia f ae e n/a 20 2023 2038 2063 N
ia f ae s n/a 20 2023 2038 2063 N
ia f ah e n/a 20 2023 2038 2063 N
ia f ah s n/a 20 2023 2038 2063 N
ia f ao e n/a 20 2023 2038 2063 N
ia f ao s n/a 20 2023 2038 2063 N
ia f aw e n/a 20 2023 2038 2063 N

Is the lowercase really the issue or not? I was not able to work this out.

Sir, how can I fix this issue?
I could not integrate the custom en-in AM with the custom LM. ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored

OS: Linux with version 16.04
Python3:
Sphinx version:
PocketSphinx 5prealpha

nshmyrev · 2018-09-22T06:41:52Z

You have to use an Indian English phonetic dictionary with this model and train an Indian English g2p model with seq2seq.
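
As a rough illustration of the first half of this advice, the custom vocabulary could be looked up in the Indian English dictionary so that the resulting entries use that model's phone symbols, with any words not found left over for a g2p model. This is a sketch under assumptions: `subset_dictionary` is a hypothetical helper, and the sample pronunciations below are made up for the example and are not taken from `en_in.dic`.

```python
def subset_dictionary(vocab_words, base_dict_text):
    """Pick entries for vocab_words out of an existing phonetic dictionary.

    Returns (kept_lines, oov_words). Alternate pronunciations written as
    "word(2)" are grouped with their base word. Matching is case-insensitive
    on the word only; phone symbols are copied unchanged.
    """
    entries = {}
    for line in base_dict_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        base_word = fields[0].split("(")[0].lower()
        entries.setdefault(base_word, []).append(line.strip())

    kept, oov = [], []
    for word in vocab_words:
        key = word.lower()
        if key in entries:
            kept.extend(entries[key])
        else:
            oov.append(word)  # would need a g2p model or a manual entry
    return kept, oov


# Hypothetical dictionary content; the phone strings are illustrative only.
sample_dict = """\
account ah k aw n t
balance b ae l ah n s
balance(2) b ah l ae n s
"""

kept, oov = subset_dictionary(["account", "balance", "neft"], sample_dict)
print("\n".join(kept))
print("OOV:", oov)
```

The out-of-vocabulary list is where a g2p model trained on the same Indian English dictionary would come in.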

MuruganR96 · 2018-09-22T07:17:16Z

@nshmyrev Thank you, sir. I will train, build, and run a model, and then post an update.

MuruganR96 · 2018-09-22T09:24:02Z

Sir, I did not understand what this means:

"train Indian English g2p model with seq2seq"

We have our en_in.dic (the predefined Indian English phonetic dictionary) and the custom acoustic model (en_in_own).
We also have a g2p model:

```
wget -O g2p-seq2seq-cmudict.tar.gz https://sourceforge.net/projects/cmusphinx/files/G2P%20Models/g2p-seq2seq-model-6.2-cmudict-nostress.tar.gz/download
tar xf g2p-seq2seq-cmudict.tar.gz
```

and,
G2P Models:
g2p-seq2seq-model-6.2-cmudict-nostress.tar.gz
g2p-seq2seq-model-6.2-pronasyl.tar.gz
g2p-seq2seq-model-5.2-cmudict.tar.gz
phonetisaurus-cmudict-split.tar.gz

fst:
it.tar.gz (Italian)
en_us_nostress.tar.gz (English)
zh.tar.gz (Mandarin)
ru.tar.gz (Russian)
nl.tar.gz (Dutch)
fr.tar.gz (French)
es.tar.gz (Spanish)
es_mx.tar.gz (Mexican Spanish)
de.tar.gz (German)

This is the seq2seq g2p model. None of these is specifically described as Indian English, so what is meant by an Indian English g2p model with seq2seq?

How can I train this Indian English g2p model? I am really confused, sir.
We take the word list, a text file with one word per line (own_vocab.tmp.vocab),
and run the command below:
**g2p-seq2seq --decode own_vocab.tmp.vocab --model_dir g2p-seq2seq-model-6.2-cmudict-nostress --output own.dic**
This gives own.dict. That is all I know, sir.
How can I train an Indian English g2p model with seq2seq? Sir, can you explain it to me?
@nshmyrev Thank you so much, sir.

nshmyrev closed this as completed Sep 22, 2018

nshmyrev mentioned this issue Sep 22, 2018

didn't integrete for custom en-in AM and custom LM, ERROR: "dict.c", Phone 'AE' is mising in the acoustic model; word 'absolutely' ignored cmusphinx/pocketsphinx#146

Closed

MuruganR96 mentioned this issue Sep 24, 2018

train Indian English g2p model with seq2seq #48

Closed
